In computing, a scanner is a device that optically scans images, printed text, handwriting, or an object, and converts it to a digital image. Common examples found in offices are variations of the desktop (or flatbed) scanner where the document is placed on a glass window for scanning. Hand-held scanners, where the device is moved by hand, were briefly popular but are now less common due to the difficulty of obtaining a high-quality image. Mechanically driven scanners that move the document are typically used for large-format documents, where a flatbed design would be impractical.
Modern scanners typically use a charge-coupled device (CCD) or a contact image sensor (CIS) as the image sensor, whereas older drum scanners use a photomultiplier tube. A rotary scanner, used for high-speed document scanning, is another type of drum scanner, using a CCD array instead of a photomultiplier. Other types of scanners are planetary scanners, which take photographs of books and documents, and 3D scanners, for producing three-dimensional models of objects.
Another category of scanner is digital camera scanners, which are based on the concept of reprographic cameras. Due to increasing resolution and new features such as anti-shake, digital cameras have become an attractive alternative to regular scanners. While still having disadvantages compared to traditional scanners, digital cameras offer advantages in speed and portability.
Scanners can be considered the successors of early telephotography input devices, which consisted of a rotating drum with a single photodetector, turning at a standard speed of 60 or 120 rpm (later models up to 240 rpm). They sent a linear analog AM signal through standard telephone voice lines to receivers, which synchronously printed the proportional intensity on special paper. This system was in use by the press from the 1920s to the mid-1990s. Color photos were sent as three separate RGB-filtered images consecutively, but this was used only for special events due to transmission costs.
Drum scanners capture image information with photomultiplier tubes (PMT), rather than the charge-coupled-device (CCD) arrays found in flatbed scanners and inexpensive film scanners. Reflective and transmissive originals are mounted on an acrylic cylinder, the scanner drum, which rotates at high speed while it passes the object being scanned in front of precision optics that deliver image information to the PMTs. Most modern color drum scanners use 3 matched PMTs, which read red, blue, and green light respectively. Light from the original artwork is split into separate red, blue, and green beams in the optical bench of the scanner.
The drum scanner gets its name from the large glass drum on which the original artwork is mounted for scanning. Drums usually take 11"x17" documents, but the maximum size varies by manufacturer. One of the unique features of drum scanners is the ability to control sample area and aperture size independently. The sample size is the area that the scanner encoder reads to create an individual pixel. The aperture is the actual opening that allows light into the optical bench of the scanner. The ability to control aperture and sample size separately is particularly useful for smoothing film grain when scanning black-and-white and color negative originals.
While drum scanners are capable of scanning both reflective and transmissive artwork, a good-quality flatbed scanner can produce excellent scans from reflective artwork. As a result, drum scanners are rarely used to scan prints now that high-quality, inexpensive flatbed scanners are readily available. Film, however, is where drum scanners continue to be the tool of choice for high-end applications. Because film can be wet-mounted to the scanner drum and because of the exceptional sensitivity of the PMTs, drum scanners are capable of capturing very subtle details in film originals.
Only a few companies continue to manufacture drum scanners. While prices of both new and used units have come down over the last decade, they still require a considerable monetary investment when compared to CCD flatbed and film scanners. However, drum scanners remain in demand due to their capacity to produce scans that are superior in resolution, color gradation, and value structure. Also, since drum scanners are capable of resolutions up to 12,000 PPI, their use is generally recommended when a scanned image is going to be enlarged.
In most graphic-arts operations, very-high-quality flatbed scanners have replaced drum scanners, being both less expensive and faster. However, drum scanners continue to be used in high-end applications, such as museum-quality archiving of photographs and print production of high-quality books and magazine advertisements. In addition, due to the greater availability of pre-owned units, many fine-art photographers are acquiring drum scanners, which has created a new niche market for the machines.
The first image scanner ever developed was a drum scanner. It was built in 1957 at the US National Bureau of Standards by a team led by Russell Kirsch. The first image ever scanned on this machine was a 5 cm square photograph of Kirsch's then-three-month-old son, Walden. The black and white image had a resolution of 176 pixels on a side.
A flatbed scanner is usually composed of a glass pane (or platen), under which there is a bright light (often xenon or cold cathode fluorescent) which illuminates the pane, and a moving optical array, whether CCD or CIS. Color scanners typically contain three rows (arrays) of sensors with red, green, and blue filters. Images to be scanned are placed face down on the glass, an opaque cover is lowered over them to exclude ambient light, and the sensor array and light source move across the pane, reading the entire area. An image is therefore visible to the sensor only because of the light it reflects. Transparent images do not work in this way, and require special accessories that illuminate them from the upper side. Many scanners offer this as an option.
"Slide" (positive) or negative film can be scanned in equipment specially manufactured for this purpose. Usually, uncut film strips of up to six frames, or four mounted slides, are inserted in a carrier, which is moved by a stepper motor across a lens and CCD sensor inside the scanner. Some models even have adaptors for APS film cassettes. Dedicated film scanners often offer better resolution than flatbed scanners, partly because they do not need to scan large areas.
Hand scanners are manual devices that are dragged across the surface of the image to be scanned. Scanning documents in this manner requires a steady hand, as an uneven scanning rate produces distorted images; a small indicator light on the scanner shows if the motion is too fast. They typically have a "start" button, which is held by the user for the duration of the scan; some switches to set the optical resolution; and a roller, which generates a clock pulse for synchronisation with the computer. Most hand scanners were monochrome, and produced light from an array of green LEDs to illuminate the image. A typical hand scanner also had a small window through which the document being scanned could be viewed. They were popular during the early 1990s and usually had a proprietary interface module specific to a particular type of computer, usually an Atari ST or Commodore Amiga.
Scanners typically read red-green-blue color (RGB) data from the array. This data is then processed with a proprietary algorithm to correct for different exposure conditions and sent to the computer via the device's input/output interface (usually SCSI or LPT in machines pre-dating the USB standard). Color depth varies depending on the scanning array characteristics, but is usually at least 24 bits. High-quality models have a color depth of 48 bits or more. The other qualifying parameter for a scanner is its resolution, measured in pixels per inch (ppi), sometimes more accurately referred to as samples per inch (spi). Instead of using the scanner's true optical resolution, the only meaningful parameter, manufacturers like to refer to the interpolated resolution, which is much higher thanks to software interpolation. As of 2004, a good flatbed scanner has an optical resolution of 1600–3200 ppi, high-end flatbed scanners can scan up to 5400 ppi, and a good drum scanner has an optical resolution of 8000–14,000 ppi.
Manufacturers often claim interpolated resolutions as high as 19,200 ppi; but such numbers carry little meaningful value, because the number of possible interpolated pixels is unlimited.
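The point about interpolated resolution can be illustrated with a minimal sketch. It assumes simple linear interpolation along one row of samples; actual scanner software may use other methods, and the function name `interpolate_row` is hypothetical. Every inserted value is computed from its two optical neighbours, so the pixel count rises without any new detail being measured.

```python
def interpolate_row(samples, factor):
    """Linearly interpolate one row of optical samples.

    `factor` new pixels are produced for each gap between two real
    samples. The values are derived purely from the existing samples.
    """
    if factor < 1 or len(samples) < 2:
        return [float(s) for s in samples]
    out = []
    for left, right in zip(samples, samples[1:]):
        for step in range(factor):
            t = step / factor  # position between the two real samples
            out.append(left + (right - left) * t)
    out.append(float(samples[-1]))
    return out


optical = [10, 50, 30]  # three real samples from the sensor
interpolated = interpolate_row(optical, 4)
print(len(optical), "optical samples ->", len(interpolated), "interpolated pixels")
print(interpolated)
```

The factor can be raised without limit, which is why an interpolated figure says little about what the sensor actually resolves.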
The higher the resolution, the larger the file. In most cases, there is a trade-off between manageable file size and level of detail.
The third important parameter for a scanner is its density range. A high density range means that the scanner is able to reproduce detail in both the shadows and the highlights in a single scan.
Scanning the document is only one part of the process. For the scanned image to be useful, it must be transferred from the scanner to an application running on the computer. There are two basic issues: (1) how the scanner is physically connected to the computer and (2) how the application retrieves the information from the scanner.
Physical Connection to the Computer
The amount of data generated by a scanner can be very large: a 600 dpi scan of a 9"x11" page (slightly larger than A4 paper) in 24-bit color produces about 100 megabytes of uncompressed data, which must be transferred to and stored on the host computer. Recent scanners can generate this volume of data in a matter of seconds, making a fast connection desirable.
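The figure of about 100 megabytes can be checked with a short calculation. This is a sketch that assumes one megabyte means 1,000,000 bytes and that no padding or file header is added; the function name is illustrative.

```python
def uncompressed_size_bytes(width_in, height_in, ppi, bits_per_pixel):
    """Size of an uncompressed scan: pixels wide x pixels high x bytes per pixel."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bits_per_pixel // 8


size = uncompressed_size_bytes(9, 11, 600, 24)
print(f"{size:,} bytes = {size / 1e6:.1f} MB")
```

Because the pixel count grows with the square of the resolution, halving the resolution to 300 dpi cuts the size to a quarter.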
There are four common connections used by scanners:
- Parallel - Connecting through a parallel port is the slowest common transfer method. Early scanners had parallel port connections that could not transfer data faster than 70 kilobytes/second. The primary advantage of the parallel port connection was economy -- it avoided adding an interface card to the computer.
- Small Computer System Interface (SCSI), which is supported by most computers only via an additional SCSI interface card. Some SCSI scanners are supplied together with a dedicated SCSI card for a PC, although any SCSI controller can be used. During the evolution of the SCSI standard, speeds increased while backwards compatibility was kept; a SCSI connection can transfer data at the highest speed which both the controller and the device support. SCSI has been largely replaced by USB and FireWire, one or both of which are directly supported by most computers, and which are easier to set up than SCSI.
- Universal Serial Bus (USB) scanners can transfer data quickly, and they are easier to use and cheaper than SCSI devices. The early USB 1.1 standard could transfer data at only 1.5 megabytes per second (slower than SCSI), but the later USB 2.0 standard can theoretically transfer up to 60 megabytes per second (although everyday rates are much lower), resulting in faster operation.
- FireWire is an interface that is much faster than USB 1.1 and comparable to USB 2.0. The original FireWire standard supports speeds of 100, 200, and 400 megabits per second (about 12.5, 25, and 50 megabytes per second), although a device may not support all speeds. The newer FireWire 800 runs at 800 megabits per second (about 100 megabytes per second).
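To show why the connection speed matters, the following sketch estimates how long one uncompressed 9"x11" page scanned at 600 dpi in 24-bit color (106,920,000 bytes) would take to transfer. It uses only the parallel-port and USB rates quoted in the list above and assumes that a kilobyte is 1,000 bytes, a megabyte is 1,000,000 bytes, and that the link runs at its full quoted rate, which real transfers rarely achieve.

```python
# One uncompressed page: 5400 x 6600 pixels x 3 bytes per pixel.
IMAGE_BYTES = 5400 * 6600 * 3

# Rates in bytes per second, taken from the figures quoted in the text.
RATES = {
    "Parallel (early)": 70_000,
    "USB 1.1": 1_500_000,
    "USB 2.0 (theoretical)": 60_000_000,
}


def transfer_seconds(size_bytes, rate_bytes_per_s):
    """Idealised transfer time, ignoring protocol overhead."""
    return size_bytes / rate_bytes_per_s


for name, rate in RATES.items():
    print(f"{name:24s} {transfer_seconds(IMAGE_BYTES, rate):10.1f} s")
```

Under these assumptions the same page takes roughly 25 minutes over an early parallel port, a little over a minute over USB 1.1, and under two seconds over USB 2.0.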
Applications Programming Interface
An application such as Adobe Photoshop must communicate with the scanner. There are many different scanners, and many of those scanners use different protocols. In order to simplify applications programming, several Applications Programming Interfaces ("APIs") were developed. An API presents a uniform interface to the scanner, so the application does not need to know the specific details of the scanner that it would need in order to access it directly. For example, Adobe Photoshop supports the TWAIN standard; consequently, in an ideal world, Photoshop can acquire an image from any scanner that also supports TWAIN.
In practice, there are often problems with an application communicating with a scanner. Either the application or the scanner manufacturer (or both) may have faults in their implementation of the API.
Typically, the API is implemented as a dynamically linked library. Each scanner manufacturer provides software that translates the API procedure calls into primitive commands that are issued to a hardware controller (such as the SCSI, USB, or FireWire controller). The manufacturer's part of the API is commonly called a device driver, but that designation is not strictly accurate: the API does not run in kernel mode and does not directly access the device.
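The division of labour described above can be sketched as follows. This is an illustration only and does not reproduce TWAIN, ISIS, SANE, or WIA: the class names, the `acquire` function, and the command strings are all hypothetical. The application calls one uniform function, and each manufacturer's module translates that call into its own primitive commands.

```python
from abc import ABC, abstractmethod
from typing import List


class ScannerBackend(ABC):
    """Hypothetical manufacturer-supplied layer behind the uniform API."""

    @abstractmethod
    def commands_for_scan(self, ppi: int) -> List[str]:
        """Translate a scan request into device-specific commands."""


class VendorABackend(ScannerBackend):
    # Invented command set for an imaginary device.
    def commands_for_scan(self, ppi: int) -> List[str]:
        return [f"SET_RES {ppi}", "LAMP_ON", "START"]


class VendorBBackend(ScannerBackend):
    # A second imaginary device with a different protocol.
    def commands_for_scan(self, ppi: int) -> List[str]:
        return [f"res={ppi}", "go"]


def acquire(backend: ScannerBackend, ppi: int = 300) -> List[str]:
    """Uniform entry point: the application never sees vendor details."""
    return backend.commands_for_scan(ppi)


for backend in (VendorABackend(), VendorBBackend()):
    print(type(backend).__name__, acquire(backend, ppi=600))
```

The application code in the final loop is identical for both devices; only the backend object differs.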
Some scanner manufacturers will offer more than one API.
Most scanners use the TWAIN API. The TWAIN API, originally used for low-end and home-use equipment, is now widely used for large-volume scanning.
Other scanner APIs are:
- ISIS, created by Pixel Translations, is used by large, departmental-scale machines; it still uses SCSI-II for performance reasons.
- SANE (Scanner Access Now Easy) is a free/open source API for accessing scanners. Originally developed for Unix and Linux operating systems, it has been ported to OS/2, Mac OS X, and Microsoft Windows. Unlike TWAIN, SANE does not handle the user interface. This allows batch scans and transparent network access without any special support from the device driver.
- Windows Image Acquisition ("WIA") is an API provided by Microsoft.
In addition to the API, many scanners come bundled with other software: typically a scanning utility, some type of image-editing application (such as Photoshop), and OCR software. OCR, or optical character recognition, software makes possible the conversion of graphical images of text into standard text that can be edited using common word-processing and text-editing software. OCR utilises an averaging algorithm to determine the character shape, then matches that shape to a corresponding letter or number.
The scanned result is a non-compressed RGB image, which can be transferred to a computer's memory. Some scanners compress and clean up the image using embedded firmware. Once on the computer, the image can be processed with a raster graphics program (such as Photoshop or the GIMP) and saved on a storage device (such as a hard disk).
In common use, images are stored on a computer's hard disk. Pictures are normally stored in image formats such as JPEG, TIFF, Bitmap, and PNG. Documents are usually stored in TIFF or PDF format. Some scanners can also be used to capture editable text, so long as the text can be read by the computer in a discernible font. This process is called optical character recognition (OCR).
Document processing
Scanning or digitizing paper documents for storage places different requirements on the scanning equipment than scanning pictures for reproduction. While documents can be scanned on general-purpose scanners, the work is done more efficiently on dedicated document scanners manufactured by Atiz Innovation, Böwe Bell & Howell, Canon, Epson, Fujitsu, HP, Kodak and other companies.
When scanning large quantities of documents, speed and paper handling are very important, but the resolution of the scan will normally be much lower than for good reproduction of pictures.
Document scanners have document feeders, usually larger than those sometimes found on copiers or all-purpose scanners. Scans are made at high speed, perhaps 20 to 150 pages per minute, often in grayscale, although many scanners support color. Many scanners can scan both sides of double-sided originals (duplex operation). Sophisticated document scanners have firmware or software that cleans up scans of text as they are produced, eliminating accidental marks and sharpening type; this would be unacceptable for photographic work, where marks cannot reliably be distinguished from desired fine detail. Files created are compressed as they are made.
The resolution used is usually from 150 to 300 dpi, although the hardware may be capable of somewhat higher resolution; this produces images of text good enough to read and for optical character recognition (OCR), without the higher demands on storage space required by higher-resolution images.
Document scans are often processed using OCR technology to create editable and searchable files. Most scanners use ISIS or TWAIN device drivers to scan documents into TIFF format so that the scanned pages can be fed into a document management system that will handle the archiving and retrieval of the scanned pages. Lossy JPEG compression, which is very efficient for pictures, is undesirable for text documents, because slanted straight edges take on a jagged appearance. Solid black (or other color) text on a light background also compresses well with lossless compression formats.
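The reason solid text on a light background suits lossless compression can be seen with a minimal run-length encoding sketch. Run-length encoding is used here only as a simple illustrative lossless scheme; the formats actually used for document scans are more elaborate, and the sample row is invented.

```python
def run_length_encode(row):
    """Encode a row of pixels as [value, run_length] pairs (lossless)."""
    runs = []
    for pixel in row:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([pixel, 1])  # start a new run
    return runs


def run_length_decode(runs):
    """Rebuild the original row exactly from its runs."""
    return [value for value, count in runs for _ in range(count)]


# One scan line crossing two vertical strokes: 0 = white, 1 = black.
row = [0] * 40 + [1] * 6 + [0] * 10 + [1] * 6 + [0] * 38
runs = run_length_encode(row)
assert run_length_decode(runs) == row  # nothing was lost
print(len(row), "pixels ->", len(runs), "runs")
```

Long uniform stretches collapse into a handful of runs, and the edges of the strokes are reproduced exactly rather than smeared.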
While paper feeding and scanning can be done automatically and quickly, preparation and indexing are necessary and require much work by humans. Preparation involves manually inspecting the papers to be scanned and making sure that they are in order, unfolded, without staples or anything else that might jam the scanner. Additionally, some industries such as legal and medical may require documents to have Bates Numbering or some other mark giving a document identification number and date/time of the document scan.
Indexing involves associating keywords with files so that they can be retrieved by content. This process can sometimes be automated to some extent, but is likely to involve manual labour. One common practice is the use of barcode-recognition technology: during preparation, barcode sheets with folder names are inserted into the document files, folders, and document groups. Using automatic batch scanning, the documents are saved into the appropriate folders, and an index is created for integration into document-management software systems.
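The barcode practice described above can be sketched as a simple grouping step. This is an assumption-laden illustration rather than the behaviour of any particular product: it supposes that barcode recognition has already run, that each scanned page arrives as a `(page_id, barcode)` pair with `barcode` set to `None` for ordinary pages, and that a separator sheet applies to all pages that follow it. The names `index_batch` and `UNSORTED` are hypothetical.

```python
def index_batch(pages):
    """Group scanned pages into folders named by barcode separator sheets.

    `pages` is a list of (page_id, barcode) tuples in scan order.
    A page with a barcode is a separator sheet and is not filed itself.
    """
    index = {}
    current_folder = None
    for page_id, barcode in pages:
        if barcode is not None:
            current_folder = barcode  # separator sheet: switch folder
            index.setdefault(current_folder, [])
        elif current_folder is None:
            # Pages scanned before any separator need manual attention.
            index.setdefault("UNSORTED", []).append(page_id)
        else:
            index[current_folder].append(page_id)
    return index


batch = [
    ("p1", None),
    ("s1", "INVOICES"),
    ("p2", None),
    ("p3", None),
    ("s2", "CONTRACTS"),
    ("p4", None),
]
print(index_batch(batch))
```

The resulting dictionary maps each folder name to its pages and could serve as the index handed to a document-management system.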
A specialized form of document scanning is book scanning. Technical difficulties arise from the books usually being bound and sometimes fragile and irreplaceable, but some manufacturers have developed specialized machinery to deal with this. For instance, the Atiz DIY scanner uses a V-shaped cradle and a V-shaped transparent platen to handle brittle books. Often special robotic mechanisms are used to automate the page turning and scanning process.
Infrared cleaning is a technique used to remove dust and scratches from film, and most modern scanners incorporate this feature. It works by scanning the film with infrared light. From this, it is possible to detect dust and scratches that cut off the infrared light; and they can then be automatically removed, by considering their position, size, shape, and surroundings.
Scanner manufacturers usually have their own name attached to this technique. For example, Epson, Nikon, Microtek, and others use Digital ICE, while Canon uses its own Film Automatic Retouching and Enhancement system.
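The general idea of infrared cleaning can be sketched in a few lines. This is a deliberately simplified illustration and not the algorithm of Digital ICE or any other product: it assumes a single-channel image and an infrared scan of the same size, both held as lists of rows, treats any pixel whose infrared reading falls below a fixed threshold as a defect, and fills it with the average of its unaffected neighbours. The function name and the threshold value are hypothetical.

```python
def clean_with_infrared(image, infrared, threshold=128):
    """Fill pixels flagged by the infrared scan from clean neighbours.

    A low infrared reading means something blocked the infrared light,
    which is taken here as a sign of dust or a scratch.
    """
    height, width = len(image), len(image[0])
    defect = [[infrared[y][x] < threshold for x in range(width)]
              for y in range(height)]
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if not defect[y][x]:
                continue
            # Collect the surrounding pixels that are not defects themselves.
            neighbours = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if not defect[ny][nx]
            ]
            if neighbours:
                cleaned[y][x] = sum(neighbours) / len(neighbours)
    return cleaned


image = [[100, 100, 100], [100, 0, 100], [100, 100, 100]]      # dark speck in the centre
infrared = [[255, 255, 255], [255, 10, 255], [255, 255, 255]]  # speck blocks infrared
print(clean_with_infrared(image, infrared))
```

A production implementation would also weigh the size, shape, and surroundings of each defect, as the text notes, rather than applying a plain neighbourhood average.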
Flatbed scanners are capable of synthesising simple musical scores, due to the variable speed (and tone) of their stepper motors. This property can be applied for hardware diagnostics: for example, the HP Scanjet 5 plays Ode to Joy if powered on with its SCSI ID set to zero. Windows- and Linux-based software is available for several brands and types of flatbed scanners to play MIDI files for entertainment.
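As a rough sketch of how such software could choose motor speeds, the code below converts MIDI note numbers into step rates. It assumes that the audible pitch of the motor equals its stepping frequency and uses the standard equal-temperament relation with A4 (MIDI note 69) at 440 Hz; the function name, the choice of key, and the note list are illustrative and not taken from any particular program.

```python
def step_rate_hz(midi_note):
    """Step rate (steps per second) assumed to sound the given MIDI note."""
    return 440.0 * 2 ** ((midi_note - 69) / 12)


# Opening of Ode to Joy in C major: E E F G G F E D
ODE_TO_JOY = [64, 64, 65, 67, 67, 65, 64, 62]

for note in ODE_TO_JOY:
    print(note, round(step_rate_hz(note), 1), "steps/s")
```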