How Search Engines Work

by Paul O'Brien / Tuesday, 09 October 2007 / Published in Advertising, Search

About a week ago, Kosmix, one of the few emerging businesses doing anything significant in Search, announced the release of a technology that gives it a real advantage in the search space: the Kosmos File System.

With all our talk of algorithms and indexes, I’m sure you realize that an engine isn’t as simple as a search box and a keyword query, but are you familiar with what is really involved and how it affects us? My most fascinating experience of late has been working with the technology behind search and learning that its implications and opportunities deserve your attention, because this foundation is what drives our industry.

As best I can put it, a search engine has four critical components, and it is the combination of these components that makes the difference between Google and a simple site search function:

Content and Data

Science

Software

Distributed Systems (servers)

A site search box, which is what most websites not using Google Custom Search rely on, really only needs content and an algorithm (albeit a simple one: if keyword, then result) to return results. Search engines must have all four components, and good search engines excel at them.
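To make the contrast concrete, here is a minimal Python sketch of that “if keyword, then result” rule. The `Page` record, the `site_search` function and the sample pages are hypothetical illustrations, not any real site search product; the point is that a page either matches or it doesn’t, with no ranking signal at all.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    title: str
    description: str


def site_search(pages: list[Page], keyword: str) -> list[str]:
    """Return URLs of pages whose title or description contains the keyword.

    This is the "if keyword, then result" rule: a page matches or it doesn't,
    and matches come back in their original order with no ranking at all.
    """
    needle = keyword.lower()
    return [
        p.url
        for p in pages
        if needle in p.title.lower() or needle in p.description.lower()
    ]


# Hypothetical pages on a small site.
pages = [
    Page("/kfs", "Kosmos File System", "A distributed file system from Kosmix"),
    Page("/seo", "SEO basics", "Getting your content indexed"),
    Page("/about", "About us", "Who we are"),
]
print(site_search(pages, "file system"))  # ['/kfs']
```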

Kosmix’s Kosmos File System (KFS) is a significant enhancement to that fourth component, but before I go into how and why, let’s look at each one.

Content and Data

A good part of Search Engine Optimization is about supporting this piece of the puzzle (not the “algorithms”). Far more important than keyword optimization, density, anchor text, or even page titles is exposing your content to search engines so they can index your site and serve your pages for relevant queries. No matter how good your titles and page copy, you don’t exist to a search engine if it can’t get to you. Search engines that aggregate content well (Google) excel; those that don’t (ahem… Yahoo) frustrate us to no end.
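One common way a site hides itself from engines is a crawl rule that blocks the very pages it wants found. As an illustrative assumption (the post doesn’t discuss robots.txt), Python’s standard `urllib.robotparser` module can check whether a crawler may fetch a URL under a given set of robots.txt rules; the rules and URLs below are made up.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: everything is crawlable
# except the /private/ section.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A page an engine can reach and index.
print(parser.can_fetch("Googlebot", "https://example.com/articles/seo"))  # True
# A page that, as far as this crawler is concerned, doesn't exist.
print(parser.can_fetch("Googlebot", "https://example.com/private/drafts"))  # False
```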
Think about content not as a marketer but as a user: it is a critical pillar of a search engine’s popularity. Users stick with an engine that returns relevant results and abandon one that fails to return what they seek.
This pillar is not as simple as aggregating content, though. Good search engines also leverage the data that user behavior feeds into their platform: the popularity of results, the search query stream, and the click stream. Mining that data makes all the difference in the world.
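As a rough illustration of mining that data, the sketch below turns a click stream into a click-through rate per query and result, then uses it to reorder results. The log format, the function names and the choice of raw click-through rate as the popularity signal are all assumptions for illustration; real engines combine many signals and correct for biases such as result position.

```python
from collections import Counter

# Hypothetical log of results shown to users: (query, url, clicked).
events = [
    ("kfs", "/kfs", True),
    ("kfs", "/kfs", True),
    ("kfs", "/kfs", False),
    ("kfs", "/hadoop", True),
    ("kfs", "/hadoop", False),
]


def click_through_rates(log):
    """Return clicks / impressions for every (query, url) pair in the log."""
    impressions = Counter()
    clicks = Counter()
    for query, url, clicked in log:
        impressions[(query, url)] += 1
        if clicked:
            clicks[(query, url)] += 1
    return {key: clicks[key] / impressions[key] for key in impressions}


def rerank(query, urls, ctr):
    """Order candidate URLs for a query by observed click-through rate, highest first."""
    return sorted(urls, key=lambda url: ctr.get((query, url), 0.0), reverse=True)


ctr = click_through_rates(events)
print(rerank("kfs", ["/hadoop", "/kfs"], ctr))  # ['/kfs', '/hadoop']
```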

Science

Now let’s talk about the algorithm. True search relevance is determined not merely by keywords but by clicks, search activity, popularity, the quality of content, and the context of the visit (personalization). Simply put, Google’s algorithm takes these into account with PageRank (PR), click stream analysis, inlink anchor text, and page keywords. The best illustration of why the science matters is a comparison between site search and a search engine. A site search box finds results by looking at the keywords in the titles and descriptions of your pages and ranks them by keyword density: no context, no weighting by demand, no measure of how timely your content is.
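For readers who want to see what PageRank actually computes, here is a minimal power-iteration sketch of the textbook formula: each page passes a damped share of its score to the pages it links to, and every page also receives a small “random jump” share. The damping factor of 0.85 is the value suggested in the original PageRank paper; the link graph and function name are hypothetical, and Google’s production ranking blends PageRank with many other signals that aren’t public.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the pages it links to. Returns a score per page;
    the scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share of (1 - damping) / n.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page with no outlinks: spread its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical three-page site: "a" and "b" both link to "c", and "c" links back to "a".
scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(sorted(scores, key=scores.get, reverse=True))  # ['c', 'a', 'b']
```

Page "c" wins because two pages link to it, while "b", which nothing links to, keeps only the random-jump share. A keyword-density site search has no equivalent of this link signal.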

Software

The algorithm only directs the engine; a platform is required to model data, crunch results, and apply that algorithm to the content the engine has crawled. Google does this using MapReduce (I hate linking to Wikipedia and increasing its credibility, but this is one of those terms better explained by an encyclopedia) and Bigtable, a massive-scale storage technology that lets Google better mine time-indexed data and directly support data-intensive applications like personalized search, analytics, and ad targeting. Bigtable gives Google a significant advantage because it scales by simply adding more commodity servers, with very little intervention, as it handles load balancing automatically.
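To show the shape of a MapReduce job, here is a single-process Python sketch that builds an inverted index (term to documents), a textbook MapReduce example. The function names and documents are hypothetical, and this is not Google’s code: in a real cluster the framework runs many map and reduce tasks in parallel across machines and performs the shuffle between them, whereas here all three phases run in one process.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(doc_id: str, text: str) -> Iterator[tuple[str, str]]:
    """Map: emit a (term, doc_id) pair for every word in the document."""
    for term in text.lower().split():
        yield term, doc_id


def shuffle(pairs: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups: dict[str, list[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(term: str, doc_ids: list[str]) -> tuple[str, list[str]]:
    """Reduce: collapse one term's postings into a sorted, de-duplicated list."""
    return term, sorted(set(doc_ids))


def build_inverted_index(docs: dict[str, str]) -> dict[str, list[str]]:
    pairs = (pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text))
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


# Hypothetical crawled documents.
docs = {
    "d1": "distributed file system",
    "d2": "search engine file index",
}
index = build_inverted_index(docs)
print(index["file"])  # ['d1', 'd2']
```

Because each map call sees one document and each reduce call sees one term, both phases can be split across as many machines as the data requires.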
Zvents is working in this space with Hypertable, an exceptional technology to be released shortly. Until then, since both of Google’s technologies are proprietary, Google’s dominance can only be undone by work from the industry here (no, it is not the better algorithm that makes the engine); this is where search engines really compete. Well, this and…

Distributed Systems

I oversimplified above when I called this the “servers” on which search engines depend, though servers are part of it. As you can imagine, search engines need thousands of servers to house terabytes of data and handle the load of millions of users submitting unique queries. The real challenge, though, is mining data spread across that distributed system of many servers. Google’s distributed file system, GFS, is what gives it a platform advantage over all other engines today. Kosmix’s KFS, an open source release, is the next big step in distributed systems, following HDFS (Hadoop) and Gluster (GNU Cluster Distribution), which are other iterations in the space. What makes KFS so monumental is that it lets businesses easily scale beyond a single machine to build applications and run data-mining clusters on par with Google.
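Distributed file systems in the GFS mold store a large file as fixed-size chunks, each copied to several servers, so losing one machine doesn’t lose data and many machines can read in parallel. Google’s published GFS design uses 64 MB chunks with three replicas by default. The sketch below is a hypothetical, in-memory simplification rather than how GFS or KFS actually work: round-robin placement stands in for the smarter placement policies real systems use, and the tiny chunk size is just for the example.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    index: int
    data: bytes
    servers: list[str]  # servers holding a replica of this chunk


def place_chunks(data: bytes, servers: list[str], chunk_size: int, replicas: int = 3) -> list[Chunk]:
    """Split data into fixed-size chunks and assign each to `replicas` distinct servers."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 1 <= replicas <= len(servers):
        raise ValueError("replicas must be between 1 and the number of servers")
    chunks = []
    for i, start in enumerate(range(0, len(data), chunk_size)):
        # Replica r of chunk i goes to server (i + r) mod N, so one chunk's
        # replicas always land on different servers and load is spread evenly.
        holders = [servers[(i + r) % len(servers)] for r in range(replicas)]
        chunks.append(Chunk(i, data[start:start + chunk_size], holders))
    return chunks


def read_file(chunks: list[Chunk], alive: set[str]) -> bytes:
    """Reassemble the file as long as every chunk has at least one live replica.

    In this simulation the bytes live on the Chunk object; a real client would
    fetch them over the network from one of chunk.servers.
    """
    parts = []
    for chunk in chunks:
        if not any(server in alive for server in chunk.servers):
            raise OSError(f"chunk {chunk.index} is unavailable: no live replica")
        parts.append(chunk.data)
    return b"".join(parts)


servers = ["s1", "s2", "s3", "s4"]
chunks = place_chunks(b"abcdefghij", servers, chunk_size=4)
print([c.servers for c in chunks])
# Any single server can fail and the file is still readable.
print(read_file(chunks, alive={"s1", "s2", "s3"}))  # b'abcdefghij'
```

Adding capacity in this model means appending to `servers`; the same placement rule then spreads new chunks over the larger pool, which is the “scale beyond a single machine” property the post describes.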

So, why does this matter to you?
It is not all about the algorithm, and SEO is about more than keywords. Pay attention to content and keep your eye on the real players in the search engine space who are keeping Google on its toes and delivering the future of Search to users and businesses.

Tagged under: bing, Google, Search, search engines

6 Comments to “How Search Engines Work”

dave says :

October 10, 2007 at 12:57 am

Who would you consider to be the “real players” in search? With so many verticals, it’s hard to keep up.
Marty says :

October 10, 2007 at 3:44 pm

Nice post. I stumbled it :).
Improving Search through Google's Distributed File System | SEO'Brien - Search & online marketing blog says :

November 8, 2007 at 7:14 pm

[…] those of you interested in my post last month about how search engines really work, Doug Judd has written a slightly technical brief about the Google File System and specifically, […]
Yahoo, Microsoft, Google oh my! | SEO'Brien - Search & online marketing blog says :

February 6, 2008 at 9:56 pm

[…] Site – Google Adsense Links to YouTube | How to Explain SEO to the Illiterate – the Library Analogy | How Search Engines Work | Is Your Paid Seach Vendor Ignorant (or […]
Is the media helping or hurting our understanding of SEO? | SEO'Brien - Search & online marketing blog says :

July 25, 2008 at 8:23 pm

[…] done some amazing work but make sure you take the hype with a grain of salt. Really understand how search engines work if you have any expectation of success as a search marketer. Ethan’s perspective is […]
Vishal Sharma says :

May 12, 2021 at 7:44 pm

Thank you for sharing. I have been obsessed with SEO recently so this is on top of my reading list. So much power in using them right.
