Search engines employ automated programs, informally known as ‘spiders’ or ‘crawlers’, to find websites. They are an important part of the internet’s infrastructure, but why is that so? What exactly do they do?
These spiders actually have a rather limited scope of understanding, far less than you might expect given that they work for names as big as Google and Yahoo. Many things are beyond them, such as frames, visual content such as movies or pictures, and client-side scripting such as JavaScript. Nor can they peek into password-protected parts of a site or click buttons. Well, that’s what they can’t do. What CAN they do?
When you submit a URL through a search engine’s ‘submit a URL’ page, the robot adds it to a list of pages to visit and works through that list in order the next time it goes out on the web. A robot may also find your page whether or not you submitted it, because links on other sites can lead it to yours. This is one reason why building your link popularity and getting links back to your site from other topical sites is important.
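The queue-and-links behaviour described above can be sketched roughly as follows. This is a minimal illustration, not how any particular search engine works: the `LINKS` dictionary is a hypothetical in-memory stand-in for the web, the `.example` addresses are made up, and `crawl_order` is a name invented for this sketch.

```python
from collections import deque

# Hypothetical link graph standing in for the web: page -> pages it links to.
LINKS = {
    "http://submitted.example/": ["http://submitted.example/about"],
    "http://partner.example/": ["http://yoursite.example/"],
    "http://submitted.example/about": [],
    "http://yoursite.example/": [],
}


def crawl_order(submitted_urls, link_graph):
    """Return the order in which a simple robot would visit pages."""
    queue = deque()
    seen = set()
    # Submitted URLs go into the queue first, in the order they were submitted.
    for url in submitted_urls:
        if url not in seen:
            seen.add(url)
            queue.append(url)

    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        # Links found on a visited page are queued too, which is how a
        # page that was never submitted can still be discovered.
        for target in link_graph.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


order = crawl_order(
    ["http://submitted.example/", "http://partner.example/"], LINKS
)
for url in order:
    print(url)
```

In this sketch `http://yoursite.example/` is never submitted, yet it ends up in the visit order because the partner page links to it.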
The first thing a robot does when it arrives at your site is check for a file called ‘robots.txt’. This file tells robots which areas of the website they are not allowed to visit. Off-limits areas typically hold binaries or other files that the robot has no need to read or report back.
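As a rough illustration of the robots.txt check, the sketch below uses Python's standard `urllib.robotparser` module. The robots.txt content, the `ExampleBot` name, and the `www.example.com` URLs are all assumptions made up for this example; a real robot would download the file from the site rather than use an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /downloads/
"""


def build_parser(robots_text):
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# An ordinary page is not covered by any Disallow rule.
allowed_home = parser.can_fetch("ExampleBot", "http://www.example.com/index.html")
# A binary under /downloads/ falls inside an off-limits area.
allowed_binary = parser.can_fetch(
    "ExampleBot", "http://www.example.com/downloads/setup.exe"
)

print("index.html allowed:", allowed_home)
print("setup.exe allowed:", allowed_binary)
```

Keep in mind that robots.txt is a request that well-behaved robots honour, not an access control; it does not protect private content.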
Search engines perform quick calculations to ensure that searchers get the most relevant results for their query. You can check your server logs, or the output of a log statistics program, to see which pages the robots have visited and how often. Some robots are easy to identify, such as Google’s ‘Googlebot’, while less well-known ones, such as Inktomi’s ‘Slurp’, are harder to spot. Some robots even appear in the logs as ordinary human-operated browsers.
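For readers who want to check their own logs, here is a minimal sketch of counting robot visits. It assumes the server writes the common ‘combined’ log format, where the user agent is the last quoted field; the sample log lines, IP addresses, and the `count_robot_visits` helper are invented for illustration. It only catches robots that identify themselves, so the ones that pose as ordinary browsers will not be counted.

```python
import re
from collections import Counter

# Substrings to look for in the user-agent field. These two names come from
# the article; extend the tuple for other robots you care about.
KNOWN_ROBOTS = ("Googlebot", "Slurp")

# Assumes combined log format: the request is a quoted field that starts with
# the HTTP method, and the user agent is the last quoted field on the line.
LOG_PATTERN = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+)[^"]*".*"(?P<agent>[^"]*)"\s*$'
)


def count_robot_visits(log_lines):
    """Count visits per (robot name, page path) for self-identifying robots."""
    visits = Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if not match:
            continue  # Not a line in the format we assumed.
        agent = match.group("agent").lower()
        for robot in KNOWN_ROBOTS:
            if robot.lower() in agent:
                visits[(robot, match.group("path"))] += 1
                break
    return visits


# Made-up sample lines using documentation-only IP addresses.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2005:13:55:36 +0000] "GET /index.html HTTP/1.1" '
    '200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '192.0.2.10 - - [10/Oct/2005:13:56:01 +0000] "GET /index.html HTTP/1.1" '
    '200 2326 "-" "Mozilla/5.0 (Windows; U) Firefox/1.0"',
    '192.0.2.20 - - [10/Oct/2005:14:02:11 +0000] "GET /about.html HTTP/1.0" '
    '200 1204 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
]

visits = count_robot_visits(SAMPLE_LOG)
for (robot, path), count in sorted(visits.items()):
    print(robot, path, count)
```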
A robot ‘reads’ your site by collecting data on any visible text, on tags you may have in the coding of your page, and on any links available. These are the things that determine what the search engines ‘think’ your content is about, so these are the things you really need to pay attention to when building a site that you want to have high visibility in search results.
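To make this concrete, the sketch below pulls the same three kinds of data (visible text, tags, and links) out of a small page using Python's standard `html.parser` module. The `PageReader` class and the sample page are hypothetical, and treating only the title and named meta tags as ‘tags’ is a simplification chosen for this example; real search engine robots are far more elaborate.

```python
from html.parser import HTMLParser


class PageReader(HTMLParser):
    """Collects visible text, title/meta tags, and links from one page."""

    SKIPPED = {"script", "style"}  # Content a reader never sees as text.

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.links = []
        self.text_parts = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return  # Ignore whitespace and script/style content.
        if self._in_title:
            self.title = text
        else:
            self.text_parts.append(text)


# Made-up page used only to exercise the sketch.
SAMPLE_PAGE = """<html><head>
<title>Handmade Oak Tables</title>
<meta name="description" content="Oak tables built to order.">
<script>var hidden = "not visible";</script>
</head><body>
<h1>Oak Tables</h1>
<p>Every table is built to order. <a href="/prices.html">See prices</a></p>
</body></html>"""

reader = PageReader()
reader.feed(SAMPLE_PAGE)
reader.close()

print("Title:", reader.title)
print("Meta:", reader.meta)
print("Links:", reader.links)
print("Text:", " ".join(reader.text_parts))
```

Running a page of your own through something like this is a quick way to see how little is left once scripts and images are stripped away, which is roughly the material a robot has to work with.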
Search engines don’t update instantly from moment to moment, and the exact timing of their database updates varies. Once your site is in the database, however, the bots will make a point of visiting it frequently to pick up changes. If your site is down when a bot visits, the bot may not be able to update your entry in the search engine database, so do keep that in mind. Robots may be scary things in movies, but as far as the internet goes they are nothing but helpful tools that guide us from site to site. Embrace them, learn how to help them work more efficiently, and work with them to get your website ranked highly so that you can maximize your visitors.
Justin Harrison is an internationally recognised Internet marketing consultant who provides world-class search engine optimization to website owners. For more information visit: http://www.seorankings.co.za