What Is Document Imaging?
Document imaging is an information technology category for systems capable of replicating documents commonly used in business. Document imaging systems can take many forms, including microfilm, on-demand printers, facsimile machines, copiers, multifunction printers, document scanners, computer output microfilm (COM) and archive writers. Since the 1990s, “document imaging” has been used to describe software-based computer systems that capture, store and reprint images.
Document imaging is a form of enterprise content management. In the early days of content management technologies, the term “document imaging” was used interchangeably with “document image management” as the industry tried to separate itself from the micrographic and reprographic technologies.
In the late 1980s, a new document management technology emerged: electronic document management. This technology was built around the need to manage and secure the escalating volume of electronic documents (spreadsheets, word-processing documents, PDFs, e-mails) created in organizations.
Scanning or digitizing paper documents for storage places different requirements on the scanning equipment than scanning pictures for reproduction. Although documents can be scanned on general-purpose scanners, the work is performed more efficiently on dedicated document scanners.
When scanning large quantities of documents, speed and paper handling are very important, but the scan resolution is normally much lower than that needed for good reproduction of pictures.
Document scanners have document feeders, usually larger than those found on some copiers or all-purpose scanners. Scans are made at high speed, perhaps 20 to 150 pages per minute, often in grayscale, although many scanners support color. Many scanners can scan both sides of double-sided originals (duplex operation). Sophisticated document scanners have firmware or software that cleans up scans of text as they are produced, eliminating accidental marks and sharpening type; this would be unacceptable for photographic work, where marks cannot reliably be distinguished from desired fine detail. The resulting files are compressed as they are created.
The resolution used for scanning legal pages is usually 150 to 300 dpi, although the hardware may be capable of somewhat higher resolution; 200 dpi produces images of text good enough to read and to process with optical character recognition (OCR), without the greater storage demands of higher-resolution images.
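The storage trade-off can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a legal-size page of 8.5 by 14 inches and rows padded to whole bytes; the function name and the page dimensions are illustrative assumptions, and the figures are raw sizes before any compression is applied.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw (uncompressed) size in bytes of one scanned page side."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Assumption: each pixel row is padded up to a whole number of bytes.
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


# Assumed legal-size page, in inches.
LEGAL = (8.5, 14.0)

for dpi in (150, 200, 300):
    bilevel = uncompressed_bytes(*LEGAL, dpi, 1)   # black and white, 1 bit per pixel
    gray = uncompressed_bytes(*LEGAL, dpi, 8)      # grayscale, 8 bits per pixel
    print(f"{dpi} dpi: bilevel {bilevel:,} bytes, grayscale {gray:,} bytes")
```

Because the pixel count grows with the square of the resolution, a 300 dpi grayscale page holds 2.25 times as much raw data as the same page at 200 dpi.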
Document scans are often processed using OCR technology to create editable and searchable files. Most scanners use ISIS or TWAIN device drivers to scan documents into TIFF format so that the scanned pages can be fed into a document management system that will handle the archiving and retrieval of the scanned pages. Lossy JPEG compression, which is very efficient for pictures, is undesirable for text documents, because slanted straight edges take on a jagged appearance. Solid black (or other color) text on a light background, by contrast, compresses well with lossless compression formats.
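To illustrate why text on a plain background suits lossless compression, here is a minimal sketch of run-length encoding applied to one row of black-and-white pixels. It is a simplified stand-in chosen for illustration, not the scheme any particular scanner or TIFF writer uses, and the sample row is hypothetical.

```python
from itertools import groupby


def rle_encode(row):
    """Encode a sequence of 0/1 pixels as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(row)]


def rle_decode(runs):
    """Rebuild the exact original pixel row from its runs."""
    row = []
    for value, length in runs:
        row.extend([value] * length)
    return row


# Hypothetical scanline: white margin, two thin black strokes, white margin.
row = [0] * 40 + [1] * 3 + [0] * 5 + [1] * 3 + [0] * 49
runs = rle_encode(row)

# Lossless: decoding returns the row unchanged.
assert rle_decode(runs) == row
print(f"{len(row)} pixels stored as {len(runs)} runs")
```

Long stretches of identical pixels collapse into a handful of runs, and decoding restores every pixel exactly, so character edges stay as sharp as they were scanned.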
While paper feeding and scanning can be done automatically and quickly, preparation and indexing are necessary and require considerable manual work. Preparation involves manually inspecting the papers to be scanned and making sure that they are in order, unfolded, and free of staples or anything else that might jam the scanner. Additionally, some industries such as legal and medical may require documents to carry Bates numbering or some other mark giving a document identification number and the date and time of the scan.
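A sequential identification scheme of this kind might be generated as in the sketch below. The label layout (a prefix followed by a zero-padded counter), the `ACME` prefix, and the function names are assumptions made for illustration; real Bates formats vary by organization.

```python
from datetime import datetime, timezone


def bates_labels(prefix, start, count, width=6):
    """Return sequential labels such as 'ACME000001' (hypothetical layout)."""
    return [f"{prefix}{n:0{width}d}" for n in range(start, start + count)]


def stamp_pages(prefix, start, page_count, scanned_at):
    """Pair each page of a document with its label and the scan time."""
    labels = bates_labels(prefix, start, page_count)
    return [
        {"page": i + 1, "bates": label, "scanned_at": scanned_at.isoformat()}
        for i, label in enumerate(labels)
    ]


# Usage: stamp a three-page document, continuing from number 41.
scan_time = datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)
for record in stamp_pages("ACME", 41, 3, scan_time):
    print(record)
```

The next document in the batch would start at 44, so every page in the collection keeps a unique number.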
Indexing involves associating relevant keywords with files so that they can be retrieved by content. This process can sometimes be automated to some extent, but it often requires manual labor performed by data-entry clerks. One common practice is the use of barcode-recognition technology: during preparation, barcode sheets with folder names or index information are inserted into the document files, folders, and document groups. Using automatic batch scanning, the documents are saved into appropriate folders, and an index is created for integration into document management systems.
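The barcode separator workflow can be sketched as follows. This is a simplified illustration that assumes barcode decoding has already happened upstream, so each scanned page arrives as a pair of image name and decoded barcode value (or `None` for an ordinary page). The function name, the `UNSORTED` fallback folder, and the file names are all hypothetical.

```python
def split_batch(pages):
    """Group scanned pages into folders using barcode separator sheets.

    pages: list of (image_name, barcode_value_or_None) in scan order.
    Returns a dict mapping folder name to the list of page images in it.
    """
    index = {}
    folder = "UNSORTED"  # assumed fallback for pages before the first separator
    for image_name, barcode in pages:
        if barcode is not None:
            # Separator sheet: start a new folder; the sheet itself is not stored.
            folder = barcode
            index.setdefault(folder, [])
        else:
            index.setdefault(folder, []).append(image_name)
    return index


batch = [
    ("p1.tif", None),
    ("sep1.tif", "INVOICES-2023"),
    ("p2.tif", None),
    ("p3.tif", None),
    ("sep2.tif", "CONTRACTS"),
    ("p4.tif", None),
]
for folder, images in split_batch(batch).items():
    print(folder, images)
```

The resulting mapping of folder names to page images is the kind of index a document management system could import, although the actual import format depends on the system.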