Tuesday, March 16, 2004
Understanding Google and other search engines.
First, a side note. I had originally read this article here:
Search Beyond Google
However, it is no longer freely available there; you now need a paid subscription to read it. The same article is still available here:
Search Beyond Google (minus the illustrations)
How does that help anything?
What makes the article interesting is its look "behind the scenes" at how search engines perform some of their tricks, and at how the idea of "search" will change in the very near future. For example:
- Take Microsoft Research's AskMSR program, which Brill and his colleagues have been testing on Microsoft's internal network for more than a year. At its core is a simple search box where users can enter questions such as "Who killed Abraham Lincoln?" and, instead of getting back a list of sites that may have the information they seek, receive a plain answer: "John Wilkes Booth." The software relies not on any advanced artificial-intelligence algorithm but rather on two surprisingly simple tricks.
First, it uses language rules learned from a large database of sample sentences to rewrite the search phrase so that it resembles possible answers: for example, "__ killed Abraham Lincoln" or "Abraham Lincoln was killed by __." Those text strings are then used as the queries in a sequence of standard keyword-based Web searches. If the searches produce an exact match, the program is done, and it presents that answer to the user.
In many cases, though, the program won't find an exact match, but only oblique variations on the text strings, such as "John Wilkes Booth's violent deed at the Ford Theater ended Lincoln's second term before it had started." That's okay, too. As its second trick, AskMSR reasons that if "Booth" frequently appears in the same sentence as "Lincoln," there must be an important relationship between them, which allows it to posit an answer, even if it's not 100 percent confident.
"We are tapping into the redundancy of the Web," explains Brill. "If you have a lot of places where you are somewhat certain that you have found the answer, the redundancy makes it more certain." As the Web grows, so will its redundancy, making AskMSR ever more powerful, Brill reasons. While plans for AskMSR aren't definite, Brill believes the code will see the light of day, perhaps as part of a future Microsoft search engine.
I'm still looking for a way to solve this simple search problem: "How Many _____ Are There." These are really simple questions, like, "How many teenagers are there in the United States?" or "How many cars did Ford make last year?" If you type either of those questions into Google, you get gibberish back. So you end up trying to find roundabout ways to ferret out the answer. "Number teens million" might be a typical attempt. And forget it if you want to know the answer for 1982. "How much" questions can be just as bad. Easy solutions for simple questions like that would be great.
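To make the two tricks from the excerpt a little more concrete, here is a minimal Python sketch. It is only an illustration, not AskMSR's actual code: the function names `rewrite_question` and `vote_on_answer` are made up, the rewrite rule is a single hand-written pattern rather than rules learned from sample sentences, and the `SNIPPETS` list is a set of invented sentences (plus the one quoted in the article) standing in for real Web search results.

```python
import re
from collections import Counter


def rewrite_question(question):
    """Trick 1 (simplified): turn "Who <verb> <object>?" into answer-shaped
    strings with a blank where the answer would go.

    Assumption: one hard-coded pattern. It only works for verbs whose past
    tense and past participle are the same word (e.g. "killed").
    """
    match = re.match(r"who\s+(\w+)\s+(.+?)\??$", question.strip(), re.IGNORECASE)
    if not match:
        return []
    verb, obj = match.group(1), match.group(2)
    return [f"__ {verb} {obj}", f"{obj} was {verb} by __"]


def vote_on_answer(snippets, question, anchor):
    """Trick 2 (simplified): count which capitalized words keep showing up
    in the same sentence as the anchor word, and return the most frequent
    one together with the share of relevant snippets that contain it.
    """
    question_words = {w.lower() for w in re.findall(r"[A-Za-z]+", question)}
    relevant = [s for s in snippets if anchor.lower() in s.lower()]
    votes = Counter()
    for snippet in relevant:
        # Each snippet gets one vote per distinct capitalized word,
        # ignoring words that already appear in the question itself.
        words = set(re.findall(r"[A-Z][a-z]+", snippet))
        votes.update(w for w in words if w.lower() not in question_words)
    if not votes:
        return None, 0.0
    answer, count = votes.most_common(1)[0]
    return answer, count / len(relevant)


# Hypothetical stand-ins for search result text; only the first sentence
# comes from the article, the rest are invented for this example.
SNIPPETS = [
    "John Wilkes Booth's violent deed at the Ford Theater ended Lincoln's "
    "second term before it had started.",
    "Abraham Lincoln was shot by John Wilkes Booth in 1865.",
    "Booth fled after attacking Lincoln.",
    "Lincoln was killed at the Ford Theater in Washington.",
]

QUESTION = "Who killed Abraham Lincoln?"

print(rewrite_question(QUESTION))
print(vote_on_answer(SNIPPETS, QUESTION, "Lincoln"))
```

With these sample sentences, the rewrite step produces the same two strings the article mentions, and the voting step settles on "Booth" because it shows up in three of the four sentences that mention Lincoln. A real system would need far better candidate extraction (full names, dates, numbers), which is exactly where the "How many" questions get hard.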