Google's SafeSearch filter too sensitive?

April 23, 2004

The domain name of PartsExpress.com includes an unfortunate string of letters, "sex," which is enough to block the Web site from Google's 'Adult Content' filtered results.

PartsExpress.com proudly touts itself as the Net's No. 1 source for audio, video and speaker components--but online shoppers who rely on an optional feature in the Google search engine to block porn sites would never know it.

Ironically, PartsExpress.com is not alone. A CNET News.com investigation shows that Google's SafeSearch filter technology incorrectly blocks many innocuous Web sites based solely on strings of letters such as "sex," "girls" or "porn" embedded in their domain names.

Google's SafeSearch flaws are more than academic--they can have serious consequences for innocent Web site operators blocked out by them. Google is the most widely used search engine on the Web, and failure to appear in its listings can have a direct impact on sales for some companies, particularly smaller enterprises with limited marketing budgets. Research company WebSideStory reported last month that Google claimed an all-time high in search referrals, 41 percent of the United States total, and the search giant's market share is steadily expanding.

"Traffic from Google can make or break a business," said Maria Medina, whose family-run clothing business at ALittleGirlsBoutique.com doesn't pass the SafeSearch censor. "Here I am, a mom of four children, creating an at-home business that sells little girl dresses and accessories, in order to spend more time with my children, and I have been filtered out as not being family friendly. Ridiculous."

Matt Cutts, the Google engineer who designed SafeSearch four years ago, said his algorithm looks for a "relatively small" number of trigger words in a Web page's address.
If one of those words appears, the SafeSearch algorithm puts the address on a block list and does not take the next step of evaluating the content of the site.

"We try to find the best trade-off of precision, recall and safety," Cutts said. "People who opt in to SafeSearch are mostly OK with us being on the conservative side."

Cutts would not disclose how many Web searches are done with SafeSearch enabled, saying only that it's a small percentage of the millions of queries handled by Google each day.

But the sloppy filter stands out as a rare black eye for a company that prides itself on superior search technology and boasts on its payroll one of the world's highest concentrations of computer science doctoral degrees. Google claims SafeSearch "uses advanced proprietary technology that checks keywords and phrases" and filters out only Web pages "containing pornography and explicit sexual content."

"That's not very bright," said Karen Schneider, a librarian who runs the Librarians' Index to the Internet and has made a study of filtering software. SafeSearch is "certainly evocative of the very primitive CyberSitter-type tools of the mid-1990s--not a tool of fairly sophisticated development."

The Scunthorpe problem

Other Web sites misidentified by SafeSearch because of "sex" in their domain names include ArkansasExtermination.com, which claims to offer the "best in termite and pest control." The owner of the business, who declined to give his name, said he was puzzled by Google's categorization: "My brother wrote the Web site. I don't know anything about that."
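The trigger-word check Cutts describes, and the false positives it produces on domains such as PartsExpress.com, can be illustrated with a short sketch. This is a minimal Python illustration, not Google's implementation: the function name `is_blocked`, the three-entry trigger list (taken from the strings named in this article) and the choice to scan only the host name are all assumptions made for the example.

```python
from urllib.parse import urlsplit

# Hypothetical trigger list: only the three strings this article names.
# Google's actual list is not public.
TRIGGER_WORDS = ("sex", "girls", "porn")


def is_blocked(url: str, triggers=TRIGGER_WORDS) -> bool:
    """Return True if any trigger string occurs anywhere in the host name.

    The page content is never examined, mirroring the behavior the
    article attributes to the filter.
    """
    # urlsplit only recognizes the host when a scheme or "//" is present.
    if "//" not in url:
        url = "//" + url
    host = (urlsplit(url).hostname or "").lower()
    # Plain substring test: "sex" matches inside "partsexpress" and "sussex".
    return any(word in host for word in triggers)


if __name__ == "__main__":
    for site in ("PartsExpress.com", "RomansInSussex.co.uk",
                 "Pornichet.org", "example.org"):
        print(site, is_blocked(site))
```

Run as a script, the first three sites come back blocked and example.org does not. Because the test is a bare substring match with no notion of word boundaries, a harmless name is indistinguishable from an explicit one, which is the overblocking the site owners quoted here are complaining about.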
SafeSearch also marked as unsafe for children JewishSussex.com, a religious Web site; EssexCountyBeeKeepers.org of Topsfield, Mass.; BluesExcuse.SouthBurnett.com.au, an Australian blues band's site; BassExpert.com; and the Anglo-Saxon history site RomansInSussex.co.uk.

Gareth Roelofse, the Web designer of RomansInSussex.co.uk, said his filtering complaints are broader than just Google. "We also found many library Net stations, school networks and Internet cafes block sites with the word 'sex' in" the domain name, Roelofse said. "This was a challenge for RomansInSussex.co.uk because its target audience is school children."

"I think it would be nice if Google would have a 'white list' for sites like ours, but this would involve human man-hours, I guess," said Roelofse, who designed the site on behalf of the Sussex Archaeological Society and local museums.

Cutts, the Google software engineer, noted that the SafeSearch Web page permits visitors to contact the company with complaints. "In most cases it's a pretty unambiguous usage," Cutts said about the word "sex" in domain names and Web addresses. "No filter can be 100 percent accurate. We're always willing to take a fresh look at our filter and see how we can improve it."

Google is not alone in seeking to lure searchers worried about encountering online raunch and ribaldry: Yahoo offers a "mature Web content" search filter, and Ask Jeeves has set up a separate Web site for kid-friendly searches. But Yahoo's filter isn't as hypersensitive as Google's, and lists domains mentioning Sussex, Essex and Scunthorpe as acceptable.

The flaws in Google's filter have persisted despite research published about a year ago that highlighted overblocking in SafeSearch. An April 2003 report from Harvard University's Berkman Center described similar but less extensive problems with SafeSearch. That report said some news articles and political Web sites were filtered.
David Drummond, Google's vice president for business development, said that at the time of its development, SafeSearch was designed to be overly cautious. "The thinking was that SafeSearch was an opt-in feature," Drummond said. "People who turn it on care a lot more about something sneaking through than they do about something getting filtered out."

"Plainly silly" blocking

"None of that surprises me," said Barry Steinhardt, director of the American Civil Liberties Union's (ACLU) technology and liberty program. "The evidence that we put on in the library filtering case shows that it's very difficult to do filtering without being overinclusive, without blocking things that are just plainly silly. That's the reality of relying on blocking: You're going to block a lot of legitimate material."

The ACLU, which has warned against buggy filters since publishing a report on the topic in 1997, unsuccessfully sued to overturn a federal law compelling public libraries to install filtering products.

"In the end, the lists are proprietary," Steinhardt said. "Without access to the lists, you don't know precisely what's being blocked. You have to rely on the authors of the lists to have the right judgment."

The word "girls" also tends to lead SafeSearch astray. It incorrectly blocks the Web sites of the private school GirlsSchoolOfAustin.org; the bridesmaid dress shop DressyGirls.com; TatuGirls.com, a Russian band's site; and TheCalicoGirls.com, a Web site devoted to cat poetry.

"Porn" in a domain name can confuse SafeSearch just as thoroughly.
It won't display Pornichet.org, devoted to improving tourism for the French seaside town of Pornichet; SpornGroup.com, a New York-based business consultancy; Sporn.com, which sells dog leashes; PornkRocks.com, a site devoted to the band Pornk; and Anti-Kinderporno.de, a German effort to oppose child pornography.

Aaron Wolfe, information systems director for SafeSearch-banned PartsExpress.com, said the company is planning to excise that unfortunate string of letters from its domain name. "We are going to modify our domain name to Parts-Express.com," Wolfe said, adding that the renaming will also help "get around spam filters on e-mail servers."

Source: C-Net News