New search engine for handwritten historical documents
December 6, 2004

If you need to search through handwritten documents, such as the approximately 140,000 pages that constitute George Washington’s personal papers in the Library of Congress, you now have access to a powerful new search engine: a first-of-its-kind manuscript retrieval system developed at UMass Amherst. The search tool was developed by the Center for Intelligent Information Retrieval in the university’s computer science department. R. Manmatha, research assistant professor of computer science, and graduate students Toni Rath and Victor Lavrenko have created a demonstration of the tool using 1,000 scanned pages of Washington’s manuscripts.
Manmatha says the interface is similar to that of the popular search engine Google. The scanned pages of Washington’s papers can be searched by typing in a word such as “Washington” or “Virginia,” and the program produces a ranked list of pages on which the word appears. “Right now, searching a scanned handwritten document is very hard to do. Scanned historical documents are basically images, or pictures, and currently can only be searched if someone manually transcribes the documents or creates an index of their contents. This is time consuming and expensive to do. Given the cost, most handwritten documents are never transcribed or indexed,” Manmatha says. “But there is an enormous amount of handwritten, historical material.”

According to Toni Rath, “The basic idea is analogous to searching text documents in one language, say French, using queries in another language, say English. This is usually done by learning models from documents written in both languages. By analogy, our system learns from a parallel body of transcribed scanned images.
That is, the word images form a ‘visual language’ and the transcriptions are in English.” Once the model is learned it may be used for searching scanned pages for which no transcriptions are available. A research paper describing the work was presented this summer at the leading information retrieval conference – the 27th Annual International ACM SIGIR conference in Sheffield, England. The work is partly funded by a grant from the National Science Foundation and the National Endowment for the Humanities.

Source: University of Massachusetts (Amherst)
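
The cross-language analogy Rath describes can be illustrated with a toy sketch. This is not the UMass system; it is a minimal illustration under stated assumptions. Each word image is assumed to have already been reduced to a discrete "visual token" (the hypothetical labels `v1`, `v2`, `v3`), and the training pairs, page names, and function names are all invented for the example. The sketch learns how often each visual token was transcribed as each English word, then ranks untranscribed pages for an English query.

```python
from collections import Counter, defaultdict


def learn_translation(pairs):
    """Estimate P(english_word | visual_token) from transcribed examples.

    `pairs` is a list of (visual_token, english_word) tuples, standing in
    for the "parallel body of transcribed scanned images".
    """
    counts = defaultdict(Counter)
    for visual, word in pairs:
        counts[visual][word] += 1
    return {
        visual: {w: c / sum(words.values()) for w, c in words.items()}
        for visual, words in counts.items()
    }


def rank_pages(query, pages, model):
    """Rank untranscribed pages for an English query word.

    A page's score is the average probability, over its visual tokens,
    that a token is a rendering of the query word.
    """
    scores = {}
    for name, tokens in pages.items():
        if not tokens:
            scores[name] = 0.0
            continue
        total = sum(model.get(t, {}).get(query, 0.0) for t in tokens)
        scores[name] = total / len(tokens)
    return sorted(scores, key=lambda n: scores[n], reverse=True)


# Hypothetical training data: visual tokens paired with their transcriptions.
training_pairs = [
    ("v1", "washington"), ("v1", "washington"), ("v1", "virginia"),
    ("v2", "virginia"), ("v2", "virginia"),
    ("v3", "letter"),
]
model = learn_translation(training_pairs)

# Hypothetical pages with no transcription, only visual tokens.
untranscribed_pages = {
    "page_a": ["v1", "v3", "v3"],
    "page_b": ["v2", "v2", "v1"],
    "page_c": ["v3", "v3"],
}
ranking = rank_pages("virginia", untranscribed_pages, model)
print(ranking)  # ['page_b', 'page_a', 'page_c']
```

The ranked list mirrors what the demonstration returns for a typed query: pages most likely to contain the word come first, even though none of them was transcribed by hand.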