Insurance against potential penalties

September 3, 2004

With the Robots.txt protocol, a webmaster or web site owner can genuinely protect himself, provided it is done correctly. Web domain names are certainly plentiful on the Internet today: there is a multitude of sites on just about any subject anyone can think of. Most sites offer good content that is of value to most people and can help with just about any query. However, as in the real world, what you see is not always what you get. A lot of sites out there are spamming the engines. Spam is best defined as search engine results that have nothing to do with the keywords or key phrases used in the search. Enter any good SEO forum today and most of the daily spam threads point to hidden text, keyword stuffing in the meta tags, doorway pages and cloaking. Thanks to newer and more powerful search engine algorithms, the domain networks that spam the engines are increasingly being penalized or banned altogether.

The risk of getting a web site banned for spam rises sharply when it appears to carry duplicate listings or duplicate content. Rank for $ales does not recommend machine-generated pages, because such pages have a tendency to generate spam. Most of those so-called "page generators" were never designed to be search engine-friendly, and no attention was given to the engines when they were built. One major drawback of these "machines" is that once a page is "optimized" for a single keyword or key phrase, first-level and at times second-level keywords tend to flood the results with listings that will most assuredly look like 100% spam. Stay away from any of those so-called "automated page generators". A good optimization process starts with content that is entirely written by a human; that way, you can be certain that each page of your site will be absolutely unique.

How do search engines deal with duplicate content? Take this practical example, in which three identical web sites, all owned and operated by the same company, make obvious use of duplicate content. Google, AltaVista and most other crawler-based search engines have noticed and indexed all three domains. In this scenario, the right thing to do is to use individual IP addresses and implement a server redirect command (a 301 redirect). An alternative would be to at least give each site its own folder or sub-directory and use the Robots.txt exclusion protocol to disallow two of the three affected domains, so that the search engines would not index the two duplicate sites. In such cases, the Robots.txt exclusion protocol should always be used; it is in fact your best "insurance" against getting your site penalized or banned. Both approaches are sketched below.

Since that was not done in this example, let us look at the duplicate content and assess where the risk of a penalty is highest. The three sites are listed as site one (the main, primary domain), site two and site three, and the four major crawler-based engines analyzed were Google, Teoma, Fast and AltaVista. All three domain names point to the same IP address, which made it simple, using Fast's Internet Protocol filter, to confirm that no more than three domains were affected. However, all three web sites point to the same IP address AND the same content folder.
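In a situation like this one, had the two secondary domains at least been given their own content folders, each of them could have served a Robots.txt file that keeps compliant crawlers out entirely. A minimal sketch of what such a file might contain (it must be named robots.txt, sit at the root of each duplicate domain, and never be served by the primary one):

    # robots.txt served only by the two duplicate domains
    # A single slash under Disallow blocks compliant crawlers from the whole site
    User-agent: *
    Disallow: /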
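Better still is the preferred fix named above, the 301 server redirect. One possible sketch, assuming an Apache server with mod_rewrite available; the domain name shown is a hypothetical stand-in, not one of the domains from the example:

    # .htaccess placed in the shared content folder
    RewriteEngine On
    # Any request arriving under a host name other than the primary domain...
    RewriteCond %{HTTP_HOST} !^www\.primary-domain\.com$ [NC]
    # ...is sent permanently (301) to the same path on the primary domain
    RewriteRule ^(.*)$ http://www.primary-domain.com/$1 [R=301,L]

With a rule like that in place, the two secondary domains stop answering with duplicate pages and simply hand their visitors, and the spiders, over to the primary site.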
As things stand, with one IP address and one content folder serving all three domains, the sites are exact duplicates, and that raises every duplicate content flag in all four engines analyzed. Even though all three sites share the same Robots.txt file, neither the hosting arrangement nor the syntax of that file does anything effective to solve this duplicate content problem.

Major spider-based search engines, which today rely heavily on hypertext to compute relevancy and importance, are very good at discovering and dealing with sites that get into duplicate content. As a direct result, a webmaster who carries duplicate content runs a large risk with these engines, because their algorithms make it a simple task to analyze, sort out and finally reject duplicate web sites. If a "spam technician" discovers duplicate listings, chances are very good they will take action against the offending sites, and those chances increase further when someone, often a competitor, files a spam complaint alleging that a certain site is "spam-dexing" the engines. To be sure, any page created through duplicate content can improperly "populate" a search query, and the end result is a site unfairly dominating most search results.

Marketing analysis and PPC "landing" pages

If businesses or their marketing departments are running marketing tests or surveys, there is usually more than one domain that could potentially appear in the engines' results pages. In such cases, I strongly recommend rewriting all the content and making certain that no real duplicate content gets indexed. Your index count will certainly decrease, but that is the right thing to do; you are actually doing the search engines a service, and a webmaster need not worry about impending penalties from the engines.

One way to achieve this is to use some form of meta refresh tag or JavaScript solution to direct visitors to the most recent versions of the pages while the Robots.txt exclusion protocol is being written correctly. The JavaScript effectively indicates where the redirect is intended to go, ensuring the final document ends up in its proper place. A "301 server redirect" command is always the best thing to use in these cases and constitutes the best insurance against any penalties, as it informs the search engines that the affected document(s) have moved permanently.
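Where a server-side 301 cannot be put in place right away, the meta refresh or JavaScript stop-gap described above might look something like the following on each outdated page; the target URL is a hypothetical placeholder, and the 301 remains the better fix:

    <!-- Placed in the <head> of an outdated duplicate page -->
    <meta http-equiv="refresh" content="0; url=http://www.primary-domain.com/current-page.html">
    <script type="text/javascript">
      // JavaScript fallback pointing visitors at the canonical copy
      window.location.replace("http://www.primary-domain.com/current-page.html");
    </script>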
(Updated from my original February 2000 article.)

Article written by Serge Thibodeau of rankforsales.com.