BENEFITS OF INDEXING
Organizational Benefits of Indexing
Indexing digital documents produces both tangible and intangible benefits to the organization. Tangible benefits include financial, legal, employee, and value-added benefits. Intangible benefits include less concrete measures of success, such as improved perception of the organization by both employees and customers. Combined tangible and intangible benefits result in financial gain for the organization through increased employee productivity, customer service, and competitive advantage in the marketplace.
Increased production. The speed of many routine office procedures (such as production of statistical reports, records management tasks, access to and retrieval of digital documents, etc.) is increased.
Decreased future staff requirements. Increases in production can be handled by current staff.
Increased access to current information. Quick and accurate updates of indexes throughout the organization decrease information retrieval time and increase accuracy of information.
Improved customer service. Prompt, accurate information retrieval increases repeat revenue for the organization.
Decreases in human filing mistakes. Large legal practices often spend 8 or more hours to locate misfiled documents (Socha, 1996).
A legal firm using an image management system found that their cases could be handled by 2.5 fewer temporary full-time clerks than before they implemented the system. With the previous paper-based system, clerks spent large amounts of time retrieving documents identified in database searches, photocopying the documents, delivering the copies to attorneys and legal assistants, and refiling the originals. The clerks also spent considerable time searching for misfiled originals. At $14/hour for 160 hours/month over 14 months, 2.5 clerks would have cost the firm $78,400 (Socha, 1996).
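The arithmetic behind the $78,400 figure can be checked with a short Python sketch. The function and parameter names are illustrative and do not come from Socha (1996); only the input figures are taken from the case above.

```python
def avoided_labor_cost(clerks, hourly_wage, hours_per_month, months):
    """Return the wages avoided when `clerks` fewer clerks are needed."""
    return clerks * hourly_wage * hours_per_month * months


# Figures quoted in the case above (Socha, 1996).
savings = avoided_labor_cost(clerks=2.5, hourly_wage=14, hours_per_month=160, months=14)
print(f"${savings:,.0f}")  # prints $78,400
```

Substituting an organization's own wage and staffing figures gives a rough first estimate of avoided labor cost; it does not account for the cost of the system itself.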
Litigation protection. In a lawsuit, records need to be produced very quickly. An indexing system that can identify and retrieve documents needed for litigation can pay for itself if a single multi-million dollar lawsuit is avoided.
Response to Rule 26. A new law requires parties involved in a federal lawsuit to identify and produce relevant records within 85 days of the beginning of the litigation (Skupsky, 1995). Quick and accurate retrieval of records is required.
Records retention compliance. Federal, state and local governments regulate record retention periods for organizations. There are over 10,000 federal recordkeeping laws alone (Skupsky, 1989). Good indexing systems include indexing fields related to retention (such as creation date, retention period, and disposition date).
As a result of Rule 26, courts will probably require each party involved in a lawsuit to make a full disclosure of their records in the early stages of the case. Sanctions will follow for parties which fail to produce relevant information. Disorganization of records will not excuse parties from compliance. For instance, in United States v. ABC Sales & Service, the court concluded that “‘a business that generates millions of files cannot frustrate discovery by creating an inadequate filing system so that individual files cannot readily be located’” (Skupsky, 1995).
Currency of business information. New documents can be added to the indexing system quickly, and if documents are indexed when they are created, all users can access them immediately. Employees can do their jobs better.
Document version control. Indexing digital documents makes it possible to control which version of a document users can access. Employees don’t waste time working on outdated documents, or updating a version that’s already been revised.
Remote access. An organization-wide standard indexing language allows authorized users to retrieve documents from anywhere in the world. Employees don’t have to take their whole office with them when they travel.
Simultaneous access. Employees can share a document if it is indexed properly and retrieved from a computer network. The “file folder” is never missing from the file cabinet. Hard copy production and distribution are also eliminated.
Decreased training time. New employees become quickly and fully productive in the organization.
When the U.S. Patent and Trademark Office (PTO) implemented a new imaging system, its most noticeable benefits involved customer service and employee training. The PTO Commissioner said that new patent examiners learned the business much faster because of the indexing system. The old manual indexing system required about 12 years to master; new examiners trained on the imaging system were up to speed in just a few months (Koulopoulos, 1995).
Customer service improvements. Organizations that provide high levels of service will gain customer loyalty and increase business.
Competitive advantage. Organizations that can retrieve information quickly and accurately will be able to accomplish more during the work week. Time is money, and indexing saves time.
Perceived excellence. Companies that project an image of excellence will attract more clients and better employees.
Pharmaceutical giant Glaxo implemented an EDMS and saved over $1 million per year in search and retrieval time. However, the financial savings were not the most valuable benefit realized. Each New Drug Application process requires about 50,000 pages of data preparation and documentation; the EDMS and its indexing system allowed Glaxo to prepare this documentation and receive clearance from the Food and Drug Administration much more quickly than before. Thus, EDMS implementation enabled Glaxo to collapse its business cycle and get its product to market sooner than its competitors (Perkins, 19??).
Costs of Indexing Digital Documents
How much will it cost to index your digital documents? One vendor quickly replied, “How much do you have?” But that answer is neither realistic nor helpful. Companies contemplating development of an indexing system for digital documents want to spend as little as possible to obtain a retrieval system that is needed to conduct business. More specifically, they want a system that provides quick and accurate access to frequently-retrieved information and reliable (but not necessarily fast) access to infrequently-retrieved information.
Because the types of business documents which meet these criteria in different organizations vary so widely, it is obvious that there is no one “best” indexing scheme. One size will never fit all. Therefore, indexing costs will be detailed in two ways: 1) factors that affect the cost of indexing, and 2) cost information reported in published studies (see Table 1).
Factors That Affect the Cost of Indexing:
One of the first decisions which must be made is whether documents not currently in digital form will be converted. A paper or microfilm document is converted to digital format by scanning it into a computer; OCR/ICR (optical character recognition/intelligent character recognition) software may then be used to convert the document to ASCII text (Thiel, 1992). Documents can be indexed before or after they are scanned. Spencer (1996) estimates that the true cost of batch scanning 10,000 documents is about $.09/page before indexing costs are included. Thus, undertaking a large document conversion project can be costly. Cullen (1991) notes that 20% of the documents to be scanned will require special handling (because of size or condition) and that rated equipment speeds are not reliable guides to how long jobs will actually take; special conditions like these further increase the cost of document conversion. Other factors which affect the costs of indexing include the cost of keying index field data, technological costs, retrieval costs, and costs of updating.
Manual field indexing of digital documents can be performed when the documents are created or when they are stored. For example, electronic document processing systems often require that employees who produce letters and reports using word processing/spreadsheet software fill some index fields when the document is saved. Although the time required to index a single word-processed document is small, the individuals who do this indexing may be highly paid, which increases the overall cost of indexing digital documents. The most variable (and often the highest) cost associated with indexing is labor.
Indexing cost can be minimized by searching for ways to fill index fields from information already contained in existing corporate databases. If manual entry of a customer number allows the system to automatically access name, address, or ZIP code, a great deal of manual keying time may be eliminated (Devlin, 1996). Barcoding is a new and cost-effective way to quickly and accurately identify batches of document types or individual documents (Spencer, 1994). For example, if a type of business form is preprinted with a bar code that identifies what type of document it is, the EDMS can automatically populate the “document type” indexing field when the document is scanned and OCRed. No one has to key the document type, which decreases cost.
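A minimal Python sketch of this idea follows. The customer table, the bar code table, and the function name are all hypothetical stand-ins invented for illustration; a real EDMS would query its own databases and bar code definitions.

```python
# Hypothetical stand-ins for a corporate customer database and a table of
# preprinted bar codes; a real EDMS would query its own systems instead.
CUSTOMERS = {
    "C1001": {"name": "Acme Supply", "address": "12 Main St", "zip_code": "30301"},
}
BARCODE_TO_DOCUMENT_TYPE = {"0451": "invoice", "0452": "purchase order"}


def build_index_record(customer_number, barcode):
    """Fill index fields from one keyed value and one scanned bar code."""
    record = {"customer_number": customer_number}
    # Name, address, and ZIP code are copied from the existing database
    # instead of being keyed by hand.
    record.update(CUSTOMERS.get(customer_number, {}))
    # The document type comes from the bar code, so no one has to key it.
    record["document_type"] = BARCODE_TO_DOCUMENT_TYPE.get(barcode, "unknown")
    return record


record = build_index_record("C1001", "0451")
print(record["name"], record["document_type"])
```

In this sketch only the customer number is keyed manually; the other four fields are filled automatically. An unrecognized bar code yields the document type "unknown" so the record can be flagged for manual review.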
The number of index fields used to identify a particular document is a significant cost factor, especially when indexing is performed manually. A study of indexing projects showed that the average number of index fields is 8-12 (Cisco, 1993). However, an ANSI Technical Report prepared by the Association for Information and Image Management International suggests 50 possible index fields which might be used with electronic image management systems (AIIM, 1995). If the average field contains 12-20 characters, the cost of manually keying each additional field must be considered.
Sometimes the cost of indexing documents can be reduced or eliminated by using full text retrieval systems which create an additional file (usually called an inverted file) in which each non-trivial word is listed with a locator key (Thiel, 1992). Full text retrieval systems also allow users to construct search queries in their own words, rather than having to conform to the constraints of pre-selected terms (Fidel, 1994). However, full text systems often return an unacceptably low number of relevant documents, fewer than 20% in one study (Blair & Maron, 1985). Some organizations will be unable to afford the cost of not finding relevant documents every time they look for them.
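To make the idea of an inverted file concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not a description of any particular product: the stop list of "trivial" words, the sample documents, and the function names are all invented for the example.

```python
import re
from collections import defaultdict

# Illustrative stop list; real systems use much longer lists of trivial words.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is"}


def tokenize(text):
    """Lowercase the text and return its non-trivial words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def build_inverted_file(documents):
    """Map each non-trivial word to the sorted keys of documents containing it."""
    postings = defaultdict(set)
    for key, text in documents.items():
        for word in tokenize(text):
            postings[word].add(key)
    return {word: sorted(keys) for word, keys in postings.items()}


def search(inverted_file, query):
    """Return keys of documents that contain every non-trivial query word."""
    words = tokenize(query)
    if not words:
        return []
    result = set(inverted_file.get(words[0], []))
    for word in words[1:]:
        result &= set(inverted_file.get(word, []))
    return sorted(result)


documents = {
    "doc1": "Retention schedule for the patent files",
    "doc2": "Invoice for scanning of patent drawings",
}
inverted_file = build_inverted_file(documents)
print(search(inverted_file, "patent"))          # ['doc1', 'doc2']
print(search(inverted_file, "patent invoice"))  # ['doc2']
print(search(inverted_file, "patents"))         # []
```

The last query shows one limitation of exact word matching: a search for "patents" finds nothing even though both documents mention "patent". Mismatches of this kind between the searcher's wording and the document's wording are one way a full text system can miss relevant documents.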
Although most organizations are already computerized and the cost of adding computer capability and memory storage is becoming increasingly economical, there still remain technological cost implications in choosing indexing systems. The size of the index itself must be considered. Inverted files (used by full text retrieval systems) are often very large, sometimes requiring more storage space than the documents which they index (Thiel, 1992). Timely document retrieval may require faster processing speeds than the organization presently supports. And if documents are being shared by many users, local area networks may have to be installed.
The cost of data migration (which includes index migration) must also be considered. Organizations should appoint an information management professional to administer data migration and indexing so that documents remain accessible as technological change occurs. Many organizations already own systems that contain non-standard or proprietary software which makes integration and migration difficult. Planning for future technological change now will save costs later.
If minimizing the costs of indexing documents ultimately increases the cost of retrieval, it may be false economy. Kind and Eppendahl (1992) suggest a number of questions which must be asked about document retrieval, including who performs searches, how frequently items are needed, how long each search takes, how quickly the information must be made available, and how often a needed document cannot be found. Answers to such questions have cost implications which must be considered when designing an indexing system. For example, an inexpensive indexing system will require more search and retrieval time than a more expensive one. Can you afford to have your highly-paid employees spend time searching for and retrieving documents? If you don’t invest in the indexing system, you will pay for it (and pay more for it) in retrieval.
Another retrieval cost involves training employees to use the system. The more complicated the indexing scheme, the more time and training will be required before users feel comfortable and confident about their ability to access the information they need.
Cost of Updating
Two different kinds of updating costs must be considered. First is updating the documents in the system. If most documents exist in only one version, it may be economically feasible to simply start indexing over each time a document is revised, essentially giving it a new identity. However, if documents are frequently revised or modified, the organization may need to identify the most recent or official version of a document. Additional indexing fields may be needed to ensure that multiple users all have access to the latest version.
Second, the index itself must be kept current and updated. Griffiths and King (1993) surveyed 16 organizations and suggest that direct costs of an “index maintenance” project average $.29 per document (the project included creation and addition of new terms, removal of obsolete terms, and authority and location control work). Index maintenance may cost more than the original cost of indexing documents. Time and effort spent on initial index design may eliminate costly projects to correct or update after the system is in place.
Cost of Indexing
Table 1 shows examples of costs and ranges found in published studies of indexing projects. Koulopoulos (1995) reports that the time spent designing a typical system is divided among field identification and data standardization (20%), data entry (20%), and system correction and fine-tuning (60%). Initial purchase of digital imaging systems with capacity to process and store 300,000 to 3 million pages per year costs $.15 to $.25 per page, depending on use.
Costs reported by companies indexing their documents in-house range from $.12 to $.20 per page (Cisco, 1993). Typical service bureau charges currently range from $.15 to $.30 per page for scanning and indexing (it is not clear how many index fields would be included).
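The per-page ranges above can be turned into a rough budget range for a given page volume. The Python sketch below is only an illustration: the 300,000-page volume is a hypothetical figure, the names are invented, and the two ranges are not strictly comparable because the service bureau figure includes scanning.

```python
# Per-page cost ranges in dollars, as quoted in the text above.
COST_RANGES = {
    "in_house_indexing": (0.12, 0.20),              # Cisco (1993)
    "service_bureau_scan_and_index": (0.15, 0.30),  # typical bureau charges
}


def estimate_cost(pages, option):
    """Return the (low, high) dollar estimate for processing `pages` pages."""
    low, high = COST_RANGES[option]
    return round(pages * low, 2), round(pages * high, 2)


pages = 300_000  # hypothetical annual volume, chosen only for illustration
for option in COST_RANGES:
    low, high = estimate_cost(pages, option)
    print(f"{option}: ${low:,.0f} to ${high:,.0f}")
```

Because the published figures do not say how many index fields are included, an estimate like this is best treated as a starting point for discussion with vendors rather than a quote.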
So how few index fields can your organization get by on? You need at least two fields to ensure data retrieval: one uniquely identifies each document, and another provides an alternate pathway in case the first one fails. One recommendation is to index a unique identifier and the document type for each document (personal communication, August 3, 1996). You need additional fields to manage records retention and disposal. You also need to index processing information about the software and hardware used to create each document so that data can be properly migrated when necessary. The Association for Information and Image Management (AIIM) identifies 30 possible processing information fields and 20 possible retrieval information fields (1995). The United States Department of Defense uses 22 records management fields to index their documents (Prescott, Underwood, & Kindl, 1995).
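A bare-minimum index record along these lines might look like the following Python sketch. The field names, the whole-year retention period, and the sample values are assumptions made for illustration; they are not drawn from the AIIM or Department of Defense field lists.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class IndexRecord:
    document_id: str      # unique identifier (primary retrieval path)
    document_type: str    # alternate retrieval path
    creation_date: date
    retention_years: int  # retention period, assumed here to be whole years
    created_with: str     # processing information kept for future migration

    def disposition_date(self) -> date:
        """Return the date on which the retention period ends."""
        target_year = self.creation_date.year + self.retention_years
        try:
            return self.creation_date.replace(year=target_year)
        except ValueError:
            # Created on Feb 29 and the target year is not a leap year.
            return self.creation_date.replace(year=target_year, day=28)


record = IndexRecord(
    document_id="DOC-000123",
    document_type="contract",
    creation_date=date(1996, 8, 3),
    retention_years=7,
    created_with="word processor, version unknown",
)
print(record.disposition_date())  # 2003-08-03
```

Here the disposition date is derived from the creation date and retention period rather than keyed separately, which removes one manually entered field. An organization whose retention rules depend on events other than creation (for example, contract expiry) would need a different calculation.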
Answering the questions in “Taking Stock of Your Company’s Indexing Needs: Full-Text, Field or a Combination?” will help you identify what sort of data needs to be stored in index fields. Obviously, we cannot recommend a minimum number of indexing fields needed to effectively retrieve business documents. Each organization has unique requirements that should be thoroughly studied before implementing an indexing system.