Posted: Sat Jan 17, 2004 8:15 pm Post subject: Googlebot indexing behaviour?
Hello everyone,
How does a search engine robot, e.g. Google, index a page with lots of external links, if the meta tag includes "index, follow, all"?
Will the Googlebot leave the page being indexed to follow an external link, therefore failing to index the rest of the current page, and indeed failing to index the rest of the site, or does a Googlebot follow a link and then turn round and come back to index the rest of the page and site?
Or alternatively, does it spawn multiple copies of itself and continue to index the current page and site while following the external links?
My concern is that with external links on the page, and the "index, follow, all" tag in place for the benefit of the internal links, Googlebot might be led away from the site, which would then not be indexed. Is this correct, and if so, how can Googlebot be prevented from following external links, while still following internal links and indexing the entire site?
As a general question, how does Google manage to index the entire Web? The bandwidth requirement to copy billions of pages back to its databases must be absolutely astronomical! Does it really index the entire web as it finds it, or does it just index what it can and leave the rest?
Posted: Sun Jan 18, 2004 12:02 am Post subject: Re: Googlebot indexing behaviour?
1. Google, and other SEs, spider the site they are at, recording all data and links. After the spider is done with what it is "programmed to pick up" from that site, it continues to the next site in its list, and so on.
I say "programmed to pick up" because sometimes GB will pick up your index page and that's it; other times it will pick up a few pages, or the whole site.
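To picture the "record the links, then move on to the next in the list" idea, here is a rough Python sketch. It is not how Googlebot is actually written; it just assumes a simple breadth-first queue. The `TOY_WEB` dictionary, the `example` URLs and the `crawl` function are all made up for illustration, and no real pages are fetched.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical miniature "web": each URL maps to the links found on that page.
TOY_WEB = {
    "http://mysite.example/": [
        "http://mysite.example/about",
        "http://other.example/",          # an external link
        "http://mysite.example/contact",
    ],
    "http://mysite.example/about": ["http://mysite.example/"],
    "http://mysite.example/contact": [],
    "http://other.example/": ["http://other.example/page2"],
    "http://other.example/page2": [],
}

def crawl(seed, web):
    """Breadth-first crawl: read a whole page, queue its links, move on."""
    frontier = deque([seed])  # URLs still waiting to be visited
    seen = {seed}             # URLs already queued, so nothing is visited twice
    order = []                # the order in which pages were visited
    while frontier:
        url = frontier.popleft()
        order.append(url)
        # Links are only recorded here; none is followed until the
        # current page has been read to the end.
        for link in web.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order

visited = crawl("http://mysite.example/", TOY_WEB)
internal = [u for u in visited if urlparse(u).netloc == "mysite.example"]
```

In this sketch the external link does not pull the spider away: it simply joins the queue, and the remaining pages of `mysite.example` still end up in `internal`.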
2. Astronomical servers? Absolutely. Having the technology to do such indexing requires phenomenal resources we ordinary folk can only dream about.
Googlebot, like most major SE spiders, has a "restrainer" built in so that it doesn't hit a site and pull pages so fast that it overloads the server it is indexing.
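A "restrainer" of this kind could look something like the sketch below: a minimum gap between two requests to the same host. This is only an assumption about the general technique, not Google's real mechanism. The class name `PolitenessThrottle`, the 2-second default and the `wait_for` method are invented for the example, and the clock and sleep functions are passed in so the idea can be tried without actually waiting.

```python
import time
from urllib.parse import urlparse

class PolitenessThrottle:
    """Enforce a minimum delay between requests to the same host."""

    def __init__(self, min_delay=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay  # seconds between hits on one host (assumed value)
        self._clock = clock
        self._sleep = sleep
        self._last_hit = {}         # host -> time of the most recent request

    def wait_for(self, url):
        """Pause if this host was hit too recently; return the seconds waited."""
        host = urlparse(url).netloc
        last = self._last_hit.get(host)
        waited = 0.0
        if last is not None:
            remaining = self.min_delay - (self._clock() - last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        # Record the moment the request is allowed to go ahead.
        self._last_hit[host] = self._clock()
        return waited
```

A spider would call `wait_for(url)` just before each fetch. Requests to different hosts are not held up by each other, which is one reason a crawler can stay busy while still going easy on any single server.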
Sometimes, having a dynamic site, or a very large content site, can cause a bot to hang around your site so long your host might shut down access to your site for going over your bandwidth limit for the month!
Debs
I think a lot of people picture Googlebot as this one robot that goes out and does all the work, instead of the numerous spiders they have all working out there at once, hehe.
I have this odd picture in my head of the googlebot coming and all the natives bowing down to it lol