High Volume Requirements of OCR

Today's business and administrative environment frequently requires mass scanning of documents and then converting the scanned images to text with OCR software. The need arises because of official requirements that policy documents be placed in the public domain, or because companies increasingly wish to make their balance sheets and other financial documents more transparent and available for public scrutiny. To meet these objectives, you will need to process documents in high volume and then quickly make them editable by using OCR software.

Tools Needed for Mass Scanning in High Volume

To handle high volumes of documents you will need a high-quality industrial scanner that scans at a good resolution, has a high throughput and a fast turnaround time. That means you can feed in large volumes of documents, and the scanner will process them with very little gap between individual pages. Once the mass scanning is over, the scanner sends its output to the OCR software, which takes over the job of processing these images.

Mechanism of High Volume Processing by the OCR

Once the OCR software receives the output from the scanner, it cleans up the images by removing smudges, scratches or other noise, and aligns all the pages properly in portrait mode. In the next step, it optimizes the image resolution for its detection engine and starts the character recognition. The processing time depends on the number of documents, and for high volumes it can take 4-5 hours. After recognition is finished, the OCR software sends the recognized pages to the word processing software that you specified beforehand, and you can begin editing the pages, saving each one as you finish editing it.
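Because processing time grows with the number of pages, a rough estimate can help when planning a batch. The sketch below is only an illustration: the function name `estimate_ocr_hours` and the throughput of 60 pages per minute are assumptions, not figures from any particular scanner or OCR package.

```python
def estimate_ocr_hours(page_count: int, pages_per_minute: float) -> float:
    """Return the estimated OCR time in hours for a batch of pages."""
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    if pages_per_minute <= 0:
        raise ValueError("pages_per_minute must be positive")
    # pages / (pages per minute) gives minutes; divide by 60 for hours.
    return page_count / pages_per_minute / 60


# Hypothetical batch: 15,000 pages at an assumed 60 pages per minute.
hours = estimate_ocr_hours(15_000, 60)
print(f"Estimated OCR time: {hours:.1f} hours")
```

Under these assumed numbers the batch falls in the 4-5 hour range mentioned above. Measure the actual throughput of your own setup on a small sample before relying on such an estimate.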
2. (Electronic redaction with OCR)
The Need for Electronic Redaction

The U.S. Supreme Court has recently issued several rulings that public interest documents should be made widely accessible, and this is also one of the stated objectives of the Obama administration. As a result, a very high number of documents needs to be processed daily, and sensitive information regarding individuals needs to be redacted. Since manual redaction can be a time-consuming and tedious affair, this calls for electronic redaction with OCR software. Many government offices are employing OCR and similar tools for electronic redaction of their documents, and you should consider this option thoroughly as well.

How Electronic Redaction with OCR Works

To electronically redact your documents, you first need to install an industrial scanner that can process images quickly at high resolution. Once the images are processed, the scanner sends its output to OCR software that you specify beforehand. While many scanners come with bundled OCR packages, some very high-quality OCR software is also available commercially, and it is recommended that you install one of the latter. The small extra investment will save you a great deal of time in two ways: commercial OCR packages can process images faster, and they have superior detection algorithms that minimize your editing time.

How the OCR Redacts Your Electronic Documents

Commercial OCR packages have built-in redaction software, and you can specify the type of information you want to redact, for example Social Security numbers (SSNs), dates of birth, company logos or individual names. The detection engine then scans all documents, automatically redacts any information that matches your specification, and presents the result to you in an editable format. You can go through the output file and remove any information that has been left behind, or any clues to the redacted information, and then save the file as a fresh electronic document so that it becomes impossible to retrieve the redacted information.
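The pattern-based redaction described above can be sketched on text that has already been recognized by the OCR. This is a minimal illustration and not the method of any commercial package: the `PATTERNS` table, the `redact` function and the sample sentence are all hypothetical, and the date pattern matches any date written as MM/DD/YYYY, not only dates of birth.

```python
import re

# Illustrative patterns only; real documents need broader, locale-aware rules.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def redact(text: str, kinds=("SSN", "DATE")) -> str:
    """Replace every match of the selected patterns with a placeholder."""
    for kind in kinds:
        text = PATTERNS[kind].sub(f"[REDACTED {kind}]", text)
    return text


# Hypothetical sample of OCR output.
sample = "Jane Doe, SSN 123-45-6789, born 04/12/1975."
print(redact(sample))
```

Because the matched characters are replaced and not merely covered, the original values are absent from the returned string. A manual review is still needed, since simple patterns like these will miss names, logos and unusually formatted numbers.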