Indexing the Web

Back-of-the-Book Style Indexing
Indexed Sites
Metadata and Web Indexing
Subject Tree Indexing
Search Engine Technologies

Indexing the Web is not a simple task. Three kinds of indexing are evolving to meet the informational needs of Web users: a back-of-the-book style of hard-coded index links within a Web site, subject trees of reviewed sites, and search engines. Members of ASI who are interested in this specialized area of indexing may wish to join ASI’s Digital Publications Indexing SIG.

Some organizations are seeing that including indexes on their web sites is just as important as including indexes in books and online manuals. We’ve seen some good and some bad, some computer-generated, some obviously not constructed by professional indexers, and some professionally prepared. In any case, all site owners should be commended for recognizing the need for an index. We’d like to share some interesting indexes with you, and information about how search engine indexing works. Have a look and see what value these indexes add! This list will be changing from time to time, so be sure to bookmark, print, download, or save by other means the ones to which you think you’ll refer later.

Back-of-the-Book Style Web Indexing

Many web sites opt to provide a search function for the site. While this is certainly better than nothing, users encounter the same problems in that scenario as they do in other full-text database searching. The major problem is, of course, the relevancy of the items found via the search. For example, on a software publisher’s site, a search for a product called Home Office ends up retrieving every document containing the word “office”, because the word “home” appears at the end of every page. If there is a site index, you can go directly to the “H” section and find the one relevant page, thus saving time for other projects. Not only will an index weed out such irrelevant items, but among the many relevant ones, subheadings give users a clue as to which are more likely to answer their questions.
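
The Home Office example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page URLs, page text, index headings, and function names below are all invented for the sketch and do not come from any real site or search product.

```python
# Hypothetical pages from a software publisher's site. Every page ends
# with a "Home" navigation link, as in the Home Office example.
PAGES = {
    "/products/home-office": "Home Office suite overview and pricing. Home",
    "/support/contact": "Contact our office by phone or email. Home",
    "/about/locations": "Our head office is in the city centre. Home",
}

# Hypothetical hand-built site index: heading -> {subheading: page}.
SITE_INDEX = {
    "Home Office (product)": {
        "overview and pricing": "/products/home-office",
    },
    "offices, company": {
        "contacting": "/support/contact",
        "locations": "/about/locations",
    },
}


def full_text_search(query, pages):
    """Return every page that contains all of the query words."""
    words = query.lower().split()
    hits = []
    for url, text in pages.items():
        # Split the page text into lowercase words without punctuation.
        tokens = {t.strip(".,").lower() for t in text.split()}
        if all(w in tokens for w in words):
            hits.append(url)
    return sorted(hits)


def index_lookup(heading, index):
    """Return the pages listed under one index heading."""
    return sorted(index.get(heading, {}).values())


if __name__ == "__main__":
    print(full_text_search("Home Office", PAGES))
    print(index_lookup("Home Office (product)", SITE_INDEX))
```

In this sketch the full-text search matches all three pages, because each contains both “home” and “office” somewhere, while the index heading leads to the single product page. The subheadings in the hypothetical index play the role described above: they tell the user which entry is likely to answer the question before any page is opened.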

The sites listed below are merely a selection of sites with interesting indexes that we have happened to run into. The descriptions are written by those submitting the site suggestion. Sites listed here are listed for educational purposes only. The American Society for Indexing does not endorse the information content of these sites.

NOTE ON SUBMISSIONS: We welcome any suggestions from users about sites to add. Suggested URLs must be accompanied by (1) instructions on how to get to the index from the site’s home page, and (2) a description of what is useful or unusual about the index. Please remember that we wish to show actual indexes, not mere collections of links related to a certain topic.

Indexed Sites

BC Hydro
To reach the site index, scroll to the bottom of the home page and click the Site Index link. This alphabetical hyperlinked index displays user-friendly typography and layout.
Rochester History Index
This is a periodical index with hyperlinks to articles, including multiple hyperlinks for some topics.
UNIXhelp for Users
This online manual includes both a browsable index and a keyword-searchable index. Select “Manual Index” from the menu on the home page.
U.S. Census Bureau
To reach the index, click “Index A to Z” on the home page.


Metadata and Web Indexing

The META tag in HTML was intended to give search engines hints about the content of a web page. Abuse of the tag has run rampant, however: webmasters try to artificially raise the relevancy of a page by stuffing its META tags with terms unrelated to the page’s actual content. Most commercial search engines now assign very little weight to text found in META tags.

In response, movements to standardize META tag content have emerged. Corporations and governmental bodies with many web sites often develop a public portal to their web content. They can improve search results for users by the careful use of structured META tags to guide their on-site search engines. Indexers can apply their analysis skills to creating these structured tags. Here are links about metadata, metatags and web page indexing.

Digital Object Identifier System
The Digital Object Identifier (DOI) is a system for identifying and exchanging intellectual property in the digital environment. It provides a framework for managing intellectual content, for linking customers with content suppliers, for facilitating electronic commerce, and enabling automated copyright management for all types of media. Using DOIs makes managing intellectual property in a networked environment much easier and more convenient, and allows the construction of automated services and transactions for e-commerce.
Dublin Core Metadata Initiative
The Dublin Core Metadata Initiative is an open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models. DCMI’s activities include consensus-driven working groups, global workshops, conferences, standards liaison, and educational efforts to promote widespread acceptance of metadata standards and practices.
How To Use Meta Tags
This 2007 Search Engine Watch article explains meta tags, including their limitations.
US Government Information Locator Service (GILS)
The goal of the Global Information Locator Service is to make it easier for people to find all of the information they need. GILS is an open standard for searching basic information descriptions. Such descriptions may be inserted into Web documents with tools like TagGen, generated from databases with tools like MetaStar and Microsoft Access, or edited by catalogers and simply stored as documents. Based on the ISO 23950 search standard, GILS includes the most commonly understood concepts by which people worldwide find information sources in libraries: concepts like Title, Author, Publisher, Date, and Place.
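
To make the idea of structured META tags more concrete, here is a minimal Python sketch of how an on-site search tool might read them from a page. It uses only the standard library’s html.parser module. The sample page, its values, and the names MetaTagParser and extract_meta are assumptions made up for illustration, and the “DC.” prefix follows the common convention for embedding Dublin Core elements in HTML rather than anything prescribed in this article.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect name/content pairs from the META tags of one page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name, content = attr.get("name"), attr.get("content")
        if name and content:
            # A name such as DC.subject may repeat, so keep a list.
            self.meta.setdefault(name.lower(), []).append(content)


# Hypothetical page with hand-assigned, Dublin Core style META tags.
SAMPLE_PAGE = """
<html><head>
<title>Water Conservation Rebates</title>
<meta name="DC.title" content="Water Conservation Rebates">
<meta name="DC.subject" content="water conservation">
<meta name="DC.subject" content="rebates">
<meta name="DC.date" content="2008-05-01">
</head><body><p>Page text goes here.</p></body></html>
"""


def extract_meta(html):
    """Return a dict mapping each META name to its list of values."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return parser.meta


if __name__ == "__main__":
    for name, values in extract_meta(SAMPLE_PAGE).items():
        print(name, "=", "; ".join(values))
```

An on-site search engine that trusts its own organization’s pages could give terms found under a field like the subject element more weight than words in the body text, which is where an indexer’s term selection would pay off.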

Subject Tree and Reviewed Site Indexes

Some Web search tools review each site with human eyes and brains to decide which categories and keywords fit the site, and then index it accordingly. An example would be Yahoo, where hordes of people are building an index to the Web, which is also searchable by a search engine.

Search Engine Technologies

The vast majority of indexing on the Web is automatic, with a high level of retrieval and a low rate of relevancy. Most indexers feel that the precision most search engines provide is just not as good as that of true indexing. But as search engine technologies become more sophisticated, we should see some changes in the frustration level of people using these tools. Most search engines actually search an index, a list of terms that robots return from their voyages. Indexes could be manipulated or constructed for these engines to use, especially on an intranet, by careful use of the META tag. This is an area that indexers should be researching and understanding, so that we can index for these engines.
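
The statement that search engines search an index rather than the pages themselves can be illustrated with a toy inverted index. The following Python sketch is a deliberate simplification and not a description of any real engine: the crawled pages are invented, the function names are hypothetical, and there is no ranking, stemming, or stop-word handling.

```python
import re
from collections import defaultdict


def build_inverted_index(pages):
    """Map each term to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(url)
    return index


def search(index, query):
    """Return pages containing every query term, without ranking."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return []
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return sorted(results)


# Hypothetical text that a robot might have brought back from three pages.
CRAWLED = {
    "/a": "Indexing the Web with search engines",
    "/b": "Back-of-the-book indexing for web sites",
    "/c": "Search engines and relevancy",
}
INDEX = build_inverted_index(CRAWLED)

if __name__ == "__main__":
    print(search(INDEX, "indexing web"))
    print(search(INDEX, "search engines"))
```

Because every word on every page becomes a term, retrieval is high but nothing in the structure says which page is actually about a topic. That gap is what human-assigned terms, for example in META tags on an intranet, are meant to fill.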

You can see a version of a search engine working with a carefully constructed set of indexes if you have Windows 95 and any of the Microsoft products that feature the Answer Wizard. The topics in these help systems were indexed in a special way that helps the engine retrieve them in response to natural language queries and present them in weighted order. You have to understand both the compiling engine and the searching engine in order to index for it.

Below are some good sources for information on how search engines work, and the current state of search engine technology.

Search Engines
What they are, how they work, and practical suggestions for getting the most out of them. (1997 article)

Search Engine Watch
Web searching tips, listing of all the major search engines and meta search engines, kid-safe searches, tests and ratings of search engines, search engine technology and news. Also contains the current issue of an e-zine on search engine news and technology; subscribers can search an archive of back issues.

Mind Maps: Hot New Tools Proposed for Cyberspace Librarians, by Nancy Humphreys
This 1999 article appearing in Searcher takes the back-of-the-book index in a new direction.

Why On-Site Searching Stinks
A fascinating 1997 study done by User Interface Engineering. They measured successful task performance using site-based full-text search engines, with dismal results.