The Search Engine Professionals at Rank for $ales.com --- In business since 1997.

Bayesian spam filters

November 28, 2003

On November 26, Stephen Lynch, a journalist at the New York Post, interviewed me by telephone about an article I had written the previous day. The article concerned the current November Google update "dance", dubbed “Florida”.

The following day, Mr. Lynch's article was published in the New York Post. It offered his comments and explained, in non-technical terms, some of the negative effects such an update can have on the average site owner or Webmaster.

As the latest “Florida” monthly Google update ‘dance’ has shown, when a website that ranks highly on Google, the Internet’s number one search engine, drops precipitously in the rankings without warning, as some did, it can be a devastating blow to online stores and other commercial websites.

In the last 10 days, many articles have also been written by my colleagues, some in the SEO field and some, like Seth Finkelstein, who are more concerned with the free flow of information that the Internet can provide. In this article, I will attempt to describe some of the spam-filtering techniques that Google is reported to be using during this Florida “dance”. This spam-filtering technology is reportedly based on Bayesian filtering.

The inner workings of a spam filter for a search engine
For quite a long time now, Google’s search results have been under attack by search-engine spammers who continuously attempt to manipulate them, cluttering the search engines' databases with irrelevant information.

As Google's popularity grows and it handles more and more searches across the Web, fouling its search results has become increasingly tempting to certain spammers, leading to a substantial degradation in the quality and relevancy of those results.

Since Google is primarily concerned with the quality and relevance of its search results, it is now cracking down on these unscrupulous spammers with new spam-filtering algorithms that use Bayesian filtering technology.

Of all the search engines in use today, Google probably has the most refined and sophisticated search algorithm, called PageRank™. The recently added Bayesian spam filters complement the search algorithm already in place, in an effort to 'win the war' on spam.

At the end of October 2003, Google deployed its new Bayesian anti-spam algorithm, which appeared to make its search results crash whenever a previously identified spam site would normally have been displayed. In fact, the search results were completely aborted when such a site was encountered.

The first shoe that fell
On or around November 5th, the spam problem was in fact reduced significantly as the new Bayesian spam filters “kicked in”. Although not perfect, the new spam-filtering technology seemed to work, albeit with crashes in some cases.

On or about November 15th, 2003, Google started its “dance”, as it does every month, performing its extensive monthly deep crawl of the Web and indexing more than 3.5 billion pages. This update had some rather strange results, reminding some observers of a previous major algorithm change made in April of 2003, dubbed update Dominic, where similar and very unpredictable results could be noted across the Web.

It was generally observed that many ‘old’, high-ranking sites, some of them highly regarded as ‘authoritative’ and certainly not spammers in any way, fell sharply in the rankings or disappeared entirely from Google’s search results.

More on the Bayesian spam filter
My research and observations in this matter point to the Bayesian spam filter that Google started to implement in late October. A Bayesian spam filter is an algorithm that estimates the probability that certain content detected by Google is in fact spam. In its most basic form, the filter determines whether something "looks spammy" or whether, on the other hand, it is relevant content that will truly help the user.
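To make the idea concrete, here is a minimal sketch of a naive Bayes text classifier in Python, the general technique behind Bayesian spam filters. The class name, the tiny training set and the whitespace tokenizer are all assumptions made for illustration; nothing here reflects how Google actually implements its filter.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplistic whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Toy naive Bayes classifier with two labels: 'spam' and 'ham'."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        # Requires at least one training example for each label.
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Start from the log prior: how common this label is overall.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Add-one (Laplace) smoothing so unseen words never give zero.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocabulary)))
            log_scores[label] = score
        # Turn the two log scores into a probability that the text is spam.
        return 1.0 / (1.0 + math.exp(log_scores["ham"] - log_scores["spam"]))


# Hypothetical training data, invented for this example.
spam_filter = NaiveBayesSpamFilter()
spam_filter.train("cheap pills cheap pills buy now", "spam")
spam_filter.train("buy cheap watches now", "spam")
spam_filter.train("appliance reliability testing guide", "ham")
spam_filter.train("how to test appliance reliability", "ham")

print(spam_filter.spam_probability("buy cheap pills"))
print(spam_filter.spam_probability("appliance reliability guide"))
```

With this toy data, the first query scores above 0.5 (it "looks spammy") and the second scores below 0.5. A real filter would be trained on a vastly larger corpus and would need a threshold chosen with care.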

To a certain degree, the Bayesian algorithm has proven effective in the war against spam in the search engines. Having been ‘bombarded’ by spam for the past couple of years, Google has no choice but to implement such anti-spam safeguards to protect the quality and relevancy of its search results.

However, the general feeling in the SEO community is that, unfortunately, the current Bayesian spam filter implementation has extreme and unpredictable consequences that were practically impossible to foresee.

At the outset, one problem with estimating the probability that certain content is spam is that, given a very large dataset such as the entire Web, many “false positives”, that is, legitimate pages wrongly flagged as spam, can and will occur. It is exactly these false positives that are at the centre of the current problem.
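To see why wrongly flagged pages become a real problem at Web scale, a little arithmetic helps. The sketch below uses the 3.5 billion pages mentioned earlier in this article; the share of spam pages (5%) and the error rate of the filter (0.1%) are purely hypothetical figures chosen for illustration, not measurements of Google's filter.

```python
def expected_false_positives(total_pages, spam_fraction, false_positive_rate):
    """Expected number of legitimate pages wrongly flagged as spam."""
    legitimate_pages = total_pages * (1 - spam_fraction)
    return legitimate_pages * false_positive_rate


# 3.5 billion pages comes from the article; the two rates are assumptions.
wrongly_flagged = expected_false_positives(
    total_pages=3_500_000_000,
    spam_fraction=0.05,
    false_positive_rate=0.001,
)
print(f"{wrongly_flagged:,.0f} legitimate pages wrongly flagged")
```

Under these assumptions, a filter that is wrong about only one legitimate page in a thousand would still misclassify more than three million pages, which is why even a small error rate can be felt by so many site owners.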

Since this whole event began to unfold, many people have noted in tests and evaluations that making a search more selective, for example by excluding an irrelevant string, tends to deactivate the new search results algorithm, which in turn effectively shuts down the newly implemented Bayesian anti-spam filter at Google.

One more observation
While we are still on the subject of the new filter, here is a side note unrelated to spam: while testing the new Florida update, I noticed that Google is now “stemming”. To my knowledge, it is the first time that Google has offered such an important search feature. How does stemming work? For example, if you search for “reliability testing in appliances”, Google would also suggest “reliable testing in appliances”.

To a certain degree, variants of your search terms will be highlighted in the snippet of text that accompanies each result. The new stemming feature will certainly help a lot of people with their searching. Again, Google tries to make its search results as relevant as possible, and this new stemming feature seems like a continuation of these efforts.
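As a rough illustration of how stemming can treat “reliability” and “reliable” as the same term, here is a minimal Python sketch. The suffix rules and function names are hypothetical and far simpler than anything a real search engine would use; they are not Google's actual rules.

```python
# Toy suffix rules, checked in order: (suffix, replacement).
# These are illustrative assumptions, not a real stemming algorithm.
SUFFIX_RULES = [("ability", "able"), ("ing", ""), ("s", "")]


def stem(word):
    """Reduce a word to a crude stem using the toy rules above."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        # Require at least three remaining letters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word


def stem_query(query):
    """Stem every word of a search query."""
    return [stem(word) for word in query.split()]


def matching_variants(query, snippet):
    """Return the snippet words whose stem matches a stem in the query."""
    query_stems = set(stem_query(query))
    return [word for word in snippet.split() if stem(word) in query_stems]


print(stem_query("reliability testing in appliances"))
print(stem_query("reliable testing in appliances"))
print(matching_variants("reliability testing", "We test how reliable appliances are"))
```

Both queries reduce to the same list of stems, so a search for one can match pages written with the other, and `matching_variants` picks out the words a results page might highlight in a snippet.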

Conclusion
In retrospect, reviewing the events of this major dance, it is clear that Google is still experimenting with its newly implemented algorithm and that many important adjustments will be needed to make it more effective. With spam a growing problem day by day, today’s search engines have no choice but to implement better and more “intelligent” spam-filtering algorithms that can tell the difference between what is spam and what isn’t.

The next 30 days may prove critical for the proper "fine-tuning" and deployment of this new breed of application in the war against spam. How the major search engines handle it will be crucial for commercial websites and online storefronts that rely solely on their Google rankings for the bulk of their sales.

In light of all this, companies in this position would perhaps be well advised to evaluate complementary alternatives such as PPC and paid-inclusion marketing programs. At any rate, it is my guess that search will continue to be an important and growing part of online marketing, locally, nationally and globally.

References:
1) An anticensorware investigation by Seth Finkelstein
http://sethf.com/anticensorware/general/google-spam.php

2) Better Bayesian filtering by Paul Graham
http://www.paulgraham.com/better.html

Article written by Serge Thibodeau,
President & CEO,
Rank for $ales
Copyright (c) Serge Thibodeau 2003

Unless otherwise specified, all content and material on this site is copyrighted by Serge Thibodeau of rankforsales.com and may not be reproduced by any means without express written permission. Using my content without permission is a theft of my work. Please contact sthibodeau@rankforsales.com to discuss certain reprint options that would be acceptable.
