I love Google. I love almost everything about them. I love their motto (Don’t Be Evil), I love their employee-focused work environment, and of course I love their web/image/news/video search tools. I also love most of their other products – Google Maps was groundbreaking when it was introduced, Gmail has taken web-based email to the next level, and I use Google Reader every day as my news and blog aggregator. I’m a devoted Google-lover because everything they do is so Googley.
But one of Google’s products stands out as very Un-Googley. You may not know it, but Google has packaged up their search technology into a standalone appliance that it sells to companies for use inside their firewalls as an Enterprise Search Engine. Enterprise Search Engines are used to find corporate information that has been stored in various nooks and crannies of file systems, internal web sites, and content management repositories. Although Google has the best Internet search engine in the world, it has one of the worst Enterprise Search Engines. To find out why, let’s start from the beginning.
Back in the 90s, there were a lot of researchers thinking about search technology. Information on the Internet was growing at an astounding rate (it still is) and was driving the market for search tools through the roof. Back then, the main problem researchers were trying to solve was the “query expression” problem – that is, they wanted to make it easier for users to tell the computer what they were looking for. A lot of effort was put into natural language processing, and out of this research came search engines like Ask Jeeves, where the user could literally type in a question like “What store has the least expensive shoes?”
When Google came along, it blew these search engines out of the water. It turns out that all these researchers were trying to solve the wrong problem. Google showed us that the hard problem in search technology was not in expressing the query, it was in expressing the results – specifically, in placing the most relevant results at the top of the list.
Prior to Google, search engines determined relevancy by how many times the search term appeared in the web page and by where it appeared. If the word “Penguin” appeared in the title of the page and also appeared several times in the body of the page, that page was deemed to be more relevant to penguins than a page that only contained the word once. This led unscrupulous web masters to game the system, filling their pages with the most popular search terms repeated over and over, often in a tiny font the same color as the background. It wasn’t uncommon for these unscrupulous web sites to be near the top of a search engine’s result list.
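To make the old approach concrete, here is a rough sketch in Python of that kind of keyword counting. It is not any particular search engine's actual formula: the function name `keyword_score` and the `title_weight` of 5 are assumptions made up for illustration.

```python
import re


def keyword_score(title, body, term, title_weight=5):
    """Score a page by counting occurrences of `term`.

    Hypothetical scoring rule: each hit in the title counts
    `title_weight` times as much as a hit in the body.
    """
    term = term.lower()

    def tokenize(text):
        # Lowercase the text and split it into alphanumeric words.
        return re.findall(r"[a-z0-9]+", text.lower())

    title_hits = tokenize(title).count(term)
    body_hits = tokenize(body).count(term)
    return title_weight * title_hits + body_hits


# An honest page about penguins: one title hit, one body hit.
honest = keyword_score(
    "Penguin habitats",
    "The emperor penguin breeds in Antarctica.",
    "penguin",
)

# A keyword-stuffed page that has nothing to do with penguins.
stuffed = keyword_score("Cheap shoes", "penguin " * 50, "penguin")

print(honest, stuffed)
```

Under these made-up weights the stuffed page outscores the honest one, which is exactly the loophole the spammers were exploiting.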
Google uses a different approach for determining if a web page is relevant. Each time a web page links to another web page, Google treats that link as a “vote”. Pages that collect the most votes are considered the most relevant, and a vote from a page that is itself highly ranked counts for more. This algorithm, called PageRank, rests on a simple premise – if your page is popular (a lot of other pages link to it), it is likely to be relevant.
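The voting idea can be sketched in a few lines of Python. This is a simplified textbook version of PageRank, not Google's production code: the damping factor of 0.85, the fixed 50 iterations, and the tiny four-page web are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over a dict mapping page -> list of linked pages."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    # Start with every page equally ranked.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: pages a, b, and d all link to c; c links back to a.
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
best = max(ranks, key=ranks.get)
print(best, ranks)
```

Page c collects three votes and comes out on top, while a, with a single vote from the well-ranked c, still lands well ahead of b and d, which nobody links to.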
This approach works amazingly well for Internet searches, and I almost never have to look at the second page of results when I search for something on Google.
The problem with Google’s Search Appliance is that although the PageRank algorithm works great for web pages, it doesn’t work at all for Word documents, Excel spreadsheets, PowerPoint presentations and the like, because these documents don’t link to each other the way web pages do. And the information that companies most want to search for is contained within office documents like these. As a result, Google’s inside-the-firewall search appliances aren’t any better than Internet search engines of the 1990s.
The biggest issue with all of this is that Google’s search appliances diminish their brand. When I use Google inside my firewall, I have to scroll through pages of results to find the document I’m looking for. This just doesn’t measure up to the expectations that I have for a Google product – it’s just not Googley enough.