Clustering technology used at Google

October 12, 2004

Google just gave a sneak preview of its next steps to improve Internet search, and clustering technology played a critical role.

During a panel discussion of research lab leaders at the Web 2.0 conference here, one of Google's top researchers previewed the search company's work in clustering both entities and words as a way to better glean users' intentions and distill information on the Web. Another space in Google's research net is statistical machine translation for turning Web pages into other languages, said Peter Norvig, director of search quality at Google.

"[We're] trying to go just beyond keywords and the linking structure of the Web, the innovation that we brought to search, and get behind the deeper meaning," Norvig said during his presentation.

In clustering, Norvig demonstrated a six-month-old project called "named entities abstraction," where Google's researchers are analyzing the company's large Web index to extract entities (such as the name of a company) from the structure of content and then decipher their relationship to one another.
For example, Norvig said, researchers are looking for ways to break down sentences by looking for a phrase like "such as" and grabbing the names that follow it. The goal is to not only pull out the name but also its clusters, so that a name such as "Java" can be associated both with the computer language and with language in general, Norvig said.

"We want to be able to search and find these [entities] and the relationships between them, rather than you typing in the words specifically," Norvig said.

With word clustering, the focus is on making the search engine better at understanding the multiple meanings of a word, Norvig said. Google started working on word clustering about three years ago.

Apropos of the heated U.S. presidential election, Norvig demonstrated a prototype of word clustering with results both for President Bush and for his Democratic contender, Sen. John Kerry. Bush appeared in clusters for words around "president" and "White House," to name some examples, but the results drew laughter when he also appeared in descriptive categories such as "idiot" and "chimp."

"This is what the Web says, not my opinion," Norvig said following the laughter.

Kerry appeared within groups for "senator" and for his wife, "Teresa Heinz Kerry," as well as for "Bob Kerry," a former senator with whom some people may confuse him.

None of the clustering approaches is publicly available, though Norvig said in an interview following the panel that they may become Google Labs betas in the future. Google Labs often prototypes features and services publicly that, sometimes, become new offerings. News alerts and Google's local search are among the labs' graduates.

"Certainly one application for clusters is in results pages, and it may be something we do at some time," Norvig said in the interview.

A growing number of search startups have targeted the automatic clustering of search results.
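
As an aside, the "such as" heuristic Norvig described can be illustrated with a minimal sketch. This is not Google's implementation, which was not shown; the regular expression, the function name extract_such_as and the sample sentences are all assumptions made for illustration. The sketch takes the single lowercase word before "such as" as a category and the capitalized words after it as names.

```python
import re
from collections import defaultdict

# Hypothetical patterns, chosen for illustration only.
NAME = r"[A-Z]\w*"                       # a capitalized word stands in for a "name"
SEP = r"(?:, (?:and |or )?| and | or )"  # separators inside the list of names
PATTERN = re.compile(
    rf"\b(?P<category>[a-z]+),? such as (?P<names>{NAME}(?:{SEP}{NAME})*)"
)

def extract_such_as(text):
    """Map each name found after 'such as' to the set of categories before it."""
    found = defaultdict(set)
    for match in PATTERN.finditer(text):
        category = match.group("category")
        for name in re.split(SEP, match.group("names")):
            found[name].add(category)
    return dict(found)

sample = (
    "He knows programming languages such as Java, Python and Perl. "
    "She visited islands such as Java and Bali."
)
for name, categories in sorted(extract_such_as(sample).items()):
    print(name, sorted(categories))
```

On these two sample sentences, "Java" ends up under both "languages" and "islands," which is the kind of multiple association the article describes. A real system would need far more than one pattern and a way to handle multiword names.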
Vivisimo Inc., one of the best-known of these startups, recently launched the Clusty search site, which groups results gathered from other search engines into clusters, or categories, as a way of drilling down into results.

While it might make sense for startups to deploy clustering technology today, Norvig said, Google still views the technology as too immature. It is useful for only a small percentage of search results, he said, so Google is focusing on improving the technology and increasing its usefulness.

"Our take is that the state of the art is not there yet," Norvig said.

With machine translation, Google is bringing to bear its formidable Web index, which at last count included 6 billion documents, images and items, as well as its computing resources. Google is well-known for having one of the largest clusters of Linux-based servers, which number in the thousands.

Google already provides a Web-page translation feature, but Norvig said it is based on technology from a third party. Its research project is based on homegrown technology that eventually could translate Web pages and links more automatically, he said.

Source: SF Gate.com