What's in a Search Engine?
To optimize effectively for search engines, and to better understand
what is really happening, it helps to know how modern search
algorithms work. This article walks through the creation of a
hypothetical search engine and shows how each step affects search
engine optimization.
Step One: Make a List of URLs and Crawl Them
Before anything else can be done, the engine needs an initial list of
URLs to crawl. The most popular option is to load the URLs in the DMOZ
database. These aren't the only sites that will be crawled: because
the crawler follows links, the pages linked to by sites in the DMOZ
directory are crawled as well. It certainly helps to be in DMOZ,
especially if you don't have enough links from other sites to be sure
that you'll be sufficiently crawled.
Next, a group of computers is set up to download all of the pages on
the list. These computers are called the "crawlers." They also look at
the links on those pages and crawl those URLs as well (the crawlers
will continue following links until their hard drives are full).
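
The crawl described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not how any
real search engine is built: the names crawl, fetch, LinkExtractor and
TOY_WEB are made up for this example, the max_pages limit stands in
for "until their hard drives are full," and a small in-memory
dictionary plays the role of the web so that the sketch runs without
a network connection.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch, max_pages=1000):
    """Breadth-first crawl starting from the seed list.

    fetch(url) is a hypothetical callable that returns the page's HTML,
    or None if the page cannot be retrieved. max_pages is an assumed
    stand-in for the crawlers running out of disk space.
    Returns a dict mapping each crawled URL to its HTML.
    """
    queue = deque(seed_urls)   # URLs waiting to be downloaded
    seen = set(seed_urls)      # URLs already queued, to avoid repeats
    pages = {}

    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html

        # Follow the links on this page, resolving relative URLs
        # and dropping any #fragment part.
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link, _fragment = urldefrag(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                queue.append(link)

    return pages


# A toy "web" used in place of real HTTP requests (made-up URLs).
TOY_WEB = {
    "http://a.example/": '<a href="/about">About</a> <a href="http://b.example/">B</a>',
    "http://a.example/about": '<a href="/">Home</a>',
    "http://b.example/": '<a href="http://c.example/">C</a>',
    "http://c.example/": "<p>No links here.</p>",
}

# Only a.example is on the seed list; the rest are found through links.
pages = crawl(["http://a.example/"], TOY_WEB.get)
```

Here only one site is on the seed list, yet all four pages end up
crawled because each is reachable by following links. That is the same
reason a site outside the seed directory can still be found, provided
that something already crawled links to it.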