How Search Works
Welcome back for your next lesson. This one’s all about search engines, or more specifically, how they work. Now, if you want to get more traffic from search engines, you first need to know what makes them tick, and then you need to know what they look out for. And in this lesson, I’m pleased to say, that’s exactly what we’ll be covering.
The 3 Stages of How Search Engines Work
Now, you’ll remember that search engines behave in a similar way to librarians, and that this involves three key steps: first they crawl, then they index and finally, they rank.
Now, let’s look at each step in more detail:
Step 1: Crawling
As we know, the crawling stage is where search engines discover publicly available webpages. Or in the library analogy, they discover books. The process has a beginning, a middle and an end.
At the beginning, search engines crawl webpages to find information. They send out automated crawlers, which we sometimes call bots or spiders, or in Google’s case, Googlebot. And they discover pages through:
• Revisiting previously crawled pages
• Reviewing pages submitted by webmasters
• Following links off previously crawled pages
That last one, following links off previously crawled pages, is perhaps the most common and natural way they do it. So, if you want a new page to be found, it should always be linked from an existing page.
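Following links is easy to picture as a walk through a graph of pages. Here’s a minimal sketch in Python of link-based discovery as a breadth-first traversal. It’s a simplification: the `SITE_LINKS` graph and page paths are hypothetical, and real crawlers fetch pages over the network, respect robots.txt and schedule revisits, none of which is modelled here. Notice that the orphan page is never found, because nothing links to it.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
# A real crawler would get these links by fetching and parsing each page.
SITE_LINKS = {
    "/": ["/winter-olympics", "/football"],
    "/winter-olympics": ["/bobsleigh-gold"],
    "/football": [],
    "/bobsleigh-gold": [],
    "/ski-jump-gold": [],  # a new page that nothing links to yet (an orphan)
}


def crawl(start_pages, links):
    """Breadth-first discovery: start from known pages, then follow their links."""
    discovered = set(start_pages)
    queue = deque(start_pages)
    while queue:
        page = queue.popleft()
        for linked_page in links.get(page, []):
            if linked_page not in discovered:
                discovered.add(linked_page)
                queue.append(linked_page)
    return discovered


found = crawl(["/"], SITE_LINKS)
print(sorted(found))  # ['/', '/bobsleigh-gold', '/football', '/winter-olympics']
print("/ski-jump-gold" in found)  # False
```

Adding `"/ski-jump-gold"` to the homepage’s list of links is enough for the same crawl to discover it.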
In the middle part of the crawl, search engines pay close attention to three things:
• New pages: these are pages not seen in previous crawls, so they are not yet indexed
• Updated pages: this is where the content of a page has changed since the previous crawl
• Dead links: these are links that return an error, rather than useful content
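One simple way to picture how a crawler could tell these three cases apart is to compare what it fetches now against what it saved last time. The sketch below is a hedged illustration, not how any particular search engine does it: the data formats, the URLs and the use of a SHA-256 content fingerprint are all assumptions. It also treats any 4xx or 5xx response as a dead link, whereas real crawlers are more forgiving of temporary server errors.

```python
import hashlib


def fingerprint(html):
    """Hash a page's content so a later crawl can cheaply tell whether it changed."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def classify(previous, current):
    """Label each URL fetched in this crawl.

    previous: {url: fingerprint} saved from the last crawl (hypothetical format)
    current:  {url: (http_status, html)} fetched in this crawl
    """
    labels = {}
    for url, (status, html) in current.items():
        if status >= 400:
            # Assumption: any 4xx/5xx response counts as an error (a dead link).
            labels[url] = "dead"
        elif url not in previous:
            labels[url] = "new"        # not seen in a previous crawl
        elif previous[url] != fingerprint(html):
            labels[url] = "updated"    # content changed since the last crawl
        else:
            labels[url] = "unchanged"
    return labels


# Hypothetical data: the homepage changed, one page is new, one was removed.
previous = {
    "/": fingerprint("<h1>Winter Olympics: day 15</h1>"),
    "/old-story": fingerprint("<p>An older story</p>"),
}
current = {
    "/": (200, "<h1>Winter Olympics: final day</h1>"),
    "/bobsleigh-gold": (200, "<p>Jamaica take Bobsleigh gold!</p>"),
    "/old-story": (404, ""),
}
labels = classify(previous, current)
print(labels)
# {'/': 'updated', '/bobsleigh-gold': 'new', '/old-story': 'dead'}
```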
Winter Olympic Games Example
Okay, let’s now look at an example. Imagine you are a sports writer for the popular Sport Lion website.
It’s the last day of the Winter Olympic Games and it’s been a busy morning. You’ve written two new pages: the first on the Bobsleigh event and the second on the Ski Jump.
But you’ve also been asked to take down a page due to a privacy complaint. That said, the two new pages provide great information, and search engines want it!
And as we know, one way search engines discover content is by revisiting previously crawled pages and then following the links off those pages.
So, let’s now look at how a crawler would find your content and record its information.
First, the search engine crawler would start at its index. Then it would move on to the Sport Lion homepage. There, the crawler would see that the page has changed since its last visit. And because of this update, the crawler would take a snapshot of what the page now looks like, including the date and time.
After this, it would continue its journey and visit a random page linked off the homepage. In this case, it would find another previously crawled page that has also been updated since the previous crawl. And again, it would take a snapshot and then continue its crawl.
Next, the crawler would discover one of the new pages: the Jamaican team taking Bobsleigh gold. Hey, it had to happen one day!
Then, the crawler finds the page that was removed. Because the page now returns a 404 (Not Found) error, it’s considered a dead link. Dead links no longer offer content, so search engines remove them from their indexes. So, to fix this, as well as removing the page, you should also remove any references or links to it.
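Finding the links that still point at a removed page is a simple check you can run over your own site. Here’s a minimal sketch, assuming you already have a map of internal links and the HTTP status of each link target; the `LINKS` and `STATUS` data and the `/athlete-profile` path for the removed page are hypothetical. In practice a site crawler or audit tool would collect this data for you.

```python
# Hypothetical internal links on the site after the privacy-complaint
# page ("/athlete-profile") was deleted.
LINKS = {
    "/": ["/bobsleigh-gold", "/athlete-profile"],
    "/winter-olympics": ["/athlete-profile", "/ski-jump-gold"],
    "/bobsleigh-gold": ["/winter-olympics"],
}

# Hypothetical HTTP status codes from checking each link target.
STATUS = {
    "/bobsleigh-gold": 200,
    "/athlete-profile": 404,
    "/winter-olympics": 200,
    "/ski-jump-gold": 200,
}


def find_dead_links(links, status):
    """Return {page: [dead link targets]} so the stale links can be removed."""
    report = {}
    for page, targets in links.items():
        dead = [target for target in targets if status.get(target) == 404]
        if dead:
            report[page] = dead
    return report


report = find_dead_links(LINKS, STATUS)
print(report)
# {'/': ['/athlete-profile'], '/winter-olympics': ['/athlete-profile']}
```

Each page listed in the report still links to the removed page, so those links are the ones to take out.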
From there, the crawler discovers that the Austrian team took the Ski Jump gold, with a new world record.
And finally, the crawler returns to the index and passes back all the information it recorded, and that signals the end of the crawl.
And of course, there isn’t just one crawler; there are many, all working at the same time and recording important information. Information like cute Fox Red Labradors and delicious Welsh Cawl Stew, both big favourites in our household and, I expect, in many others too.
Step 2: Indexing
You’ll remember that the indexing stage is where search engines organise and file away webpages, so that they can be retrieved quickly when serving relevant results for keyword searches. And incredibly, Google has said it knows of more than 130 trillion individual pages.
When search engines review webpages, they look at key signals so they can accurately classify them before filing them away in their index. These include things like the keywords used in the page’s title and content, the topic of the page, and how authoritative the content is perceived to be.
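A classic way to "file away" pages for fast keyword lookup is an inverted index: instead of storing which words each page contains, it stores which pages contain each word. The sketch below is a textbook simplification, not a description of Google’s actual index, which isn’t published. The page data is hypothetical, and it only records keyword counts and whether a keyword appears in the title; topic and authority signals are not modelled.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: term -> {url: {"count": n, "in_title": bool}}.

    pages: {url: {"title": str, "body": str}} (hypothetical format)
    """
    index = defaultdict(dict)
    for url, page in pages.items():
        title_terms = set(tokenize(page["title"]))
        for term in tokenize(page["title"] + " " + page["body"]):
            posting = index[term].setdefault(
                url, {"count": 0, "in_title": term in title_terms}
            )
            posting["count"] += 1
    return dict(index)


PAGES = {
    "/bobsleigh-gold": {
        "title": "Jamaica win Bobsleigh gold",
        "body": "The Jamaican bobsleigh team took gold.",
    },
    "/ski-jump-gold": {
        "title": "Austria win Ski Jump gold",
        "body": "A new world record in the ski jump.",
    },
}

index = build_index(PAGES)
# Looking up a keyword is now a single dictionary access, not a scan of every page.
print(index["gold"])
# {'/bobsleigh-gold': {'count': 2, 'in_title': True}, '/ski-jump-gold': {'count': 1, 'in_title': True}}
print(index.get("record"))
# {'/ski-jump-gold': {'count': 1, 'in_title': False}}
```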
And that brings us to the final stage of how search engines work.
Step 3: Ranking
Ranking is how search engines put webpages in order. Ranking systems are made up of a series of algorithms, and their job is to:
• Analyse what it is you are looking for
• Then, decide on the best information and return it to you
There are believed to be 200+ ranking factors, but fortunately, you don’t need to know them all intimately. As we’ll see in a later lesson, many are similar and can be placed into common categories.
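To make those two jobs concrete, here’s a toy ranking sketch: it "analyses" the query by splitting it into keywords, scores each page, and returns the best matches first. It uses just one kind of signal (where the keywords appear), and the `TITLE_WEIGHT` and `BODY_WEIGHT` values and page data are hypothetical. Real ranking systems combine many more factors, and their weights are not published.

```python
import re

# Hypothetical weights: a keyword in the title counts more than one in the body.
TITLE_WEIGHT = 3.0
BODY_WEIGHT = 1.0


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, page):
    """Toy relevance score based only on where query keywords appear."""
    title_terms = tokenize(page["title"])
    body_terms = tokenize(page["body"])
    total = 0.0
    for term in set(tokenize(query)):  # job 1: analyse what you're looking for
        total += TITLE_WEIGHT * title_terms.count(term)
        total += BODY_WEIGHT * body_terms.count(term)
    return total


def rank(query, pages):
    """Job 2: return matching pages, best score first (ties broken by URL)."""
    scored = [(score(query, page), url) for url, page in pages.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [url for s, url in ordered if s > 0]


PAGES = {
    "/bobsleigh-gold": {
        "title": "Jamaica win Bobsleigh gold",
        "body": "The Jamaican bobsleigh team took gold.",
    },
    "/ski-jump-gold": {
        "title": "Austria win Ski Jump gold",
        "body": "A new world record in the ski jump.",
    },
    "/cawl-recipe": {"title": "Welsh cawl recipe", "body": "A hearty stew."},
}

print(rank("ski jump gold", PAGES))  # ['/ski-jump-gold', '/bobsleigh-gold']
```

The Ski Jump page wins because all three keywords appear in its title, while the Bobsleigh page only matches "gold"; the cawl recipe doesn’t match at all, so it isn’t returned.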
So, just to recap, there are three stages to how search engines work: crawling, indexing and ranking.
And to recap further, the crawling stage is all about content discovery, which we’ll explore in more detail in the technical optimisation module.
The indexing stage is to do with how search engines organise all the webpages they hold.
As for the ranking stage, that’s what the rest of this course is all about. You’ll understand what it takes to get traffic from search engines and attention from your target audience.
Crawl, Index and Rank
Now, here’s an important thing to remember. We all want our webpages to rank at the top of search engines, BUT this is not possible if the pages are not in the search engine index to begin with. And, of course, for this to happen the content first needs to be crawled by search engines.
Or to put it another way: if search engines can’t crawl your content, they can’t index it, and if they can’t index it, they can’t rank it.
But how do you know if a page is in Google’s index? Well, we’ll be covering that and more in the upcoming How to Use Google Like an SEO Pro lesson.