
1. (Mass OCR of High Volume)


High Volume Requirements of OCR
Today's business and administrative environment frequently requires mass scanning
of documents and then converting the resulting images with OCR software in
high volume. The need arises because of official requirements that policy
documents be placed in the public domain, or because companies increasingly wish
to make their balance sheets and other financial documents more transparent and
available for public scrutiny. To meet these objectives, you will need to process
documents on a mass scale, and then quickly make these
documents editable by using OCR software.
Tools needed for Mass Scanning in High Volume
To handle high volumes of documents you will need a high-quality industrial
scanner that scans at good resolution and has high throughput and a fast
turnaround time. That means you can feed in large volumes of documents, and the
scanner will scan them in bulk with very little gap between individual pages.
Once the mass scanning is over, the scanner sends its output to the OCR software,
which takes over the job of processing these images.
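As a rough planning aid, the scan time for a batch can be estimated from the scanner's throughput. The sketch below is only an illustration: the function name, the pages-per-minute rating, the feeder capacity and the reload pause are all assumed values, not figures from any particular scanner.

```python
import math


def estimate_scan_minutes(total_pages, pages_per_minute, feeder_capacity,
                          reload_minutes=1.0):
    """Estimate wall-clock minutes needed to scan a batch.

    All parameters are illustrative assumptions, not vendor figures.
    """
    if total_pages <= 0:
        return 0.0
    if pages_per_minute <= 0 or feeder_capacity <= 0:
        raise ValueError("pages_per_minute and feeder_capacity must be positive")
    # Time the scanner spends actually feeding pages.
    feeding_time = total_pages / pages_per_minute
    # One reload pause between consecutive feeder loads (none after the last).
    loads = math.ceil(total_pages / feeder_capacity)
    return feeding_time + (loads - 1) * reload_minutes
```

For example, `estimate_scan_minutes(6000, 100, 500)` models 6,000 pages on a hypothetical 100 pages-per-minute scanner whose feeder holds 500 sheets.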
Mechanism of High Volume Processing by the OCR
Once the OCR software receives the output from the scanner, it cleans up the images by
removing smudges, scratches and other noise, and aligns all the pages
properly in portrait mode. In the next step, it optimizes the image resolution
for its detection engine and starts character recognition. The time required depends
on the number of documents to be processed, and for high volumes it can take
4-5 hours. After recognition is finished, the OCR software sends the recognized pages
to the word processing software that you have previously specified, and you can
begin editing the pages, saving each one as you finish editing it.
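To make the cleanup step more concrete, here is a minimal sketch of one kind of noise removal: deleting isolated specks from a black-and-white page image. It is a toy illustration and not the method of any specific OCR product, which would normally use more elaborate filters. The function name `despeckle` and the list-of-rows image format are assumptions made for this example.

```python
def despeckle(image):
    """Return a copy of a binary image with isolated ink pixels removed.

    `image` is a list of rows; 1 means ink, 0 means background.
    A pixel is treated as noise when none of its 8 neighbours is ink.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            # Look at the surrounding 3x3 window, clipped at the page edges.
            has_neighbour = any(
                image[ny][nx] == 1
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbour:
                cleaned[y][x] = 0
    return cleaned
```

Reading from the original image while writing to a copy keeps the result independent of the order in which pixels are visited.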

2. (Electronic redaction with OCR)


The need for electronic redaction
The U.S. Supreme Court has recently issued several rulings that public interest
documents should be made widely accessible, and this is also one of the stated
objectives of the Obama administration. As a result, a very high number of
documents need to be processed daily, and sensitive information regarding
individuals needs to be redacted. Since manual redaction can be a time-consuming
and tedious affair, this calls for electronic redaction with OCR software. Many
government offices are employing OCR and similar tools for electronic redaction of
their documents, and you should consider this option as well.
How electronic redaction with OCR works
To redact your documents electronically, you first need to install
an industrial scanner that can capture images at high resolution
quickly. Once the pages are scanned, the scanner sends its output to the OCR
software that you specify beforehand. While many scanners come with bundled OCR
packages, there are also some very high-quality OCR packages available
commercially, and it is recommended that you install one of the latter. The small
extra investment will translate into a large saving in time in two ways:
commercial OCR packages can process images faster, and they have superior detection
algorithms that minimize your editing time.
How the OCR redacts your electronic documents
Commercial OCR packages have built-in redaction software, and you can specify the
type of information you want to redact, for example SSNs, dates of birth,
company logos or individual names. The detection engine then scans all
documents, automatically redacts any information that matches your
specification, and presents the result to you in an editable format. You can go
through the output file and remove any information that has been left
behind, or any clues to the redacted information, and then save the file as a fresh
electronic document so that it becomes impossible to retrieve the redacted
information.
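The pattern-matching part of this process can be sketched on recognized text as follows. This is a minimal illustration, not how any particular commercial package works: the `redact` function, the `PATTERNS` table and the two regular expressions (a U.S. SSN written as 123-45-6789 and a date written as M/D/YYYY) are assumptions chosen for the example, and real documents would need broader, tested rules.

```python
import re

# Illustrative patterns only; real documents need broader, tested rules.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def redact(text, patterns=PATTERNS):
    """Replace every match of each pattern with a fixed label.

    Returns the redacted text and the number of replacements per label.
    """
    counts = {}
    for label, pattern in patterns.items():
        # subn returns the new string and how many substitutions were made.
        text, n = pattern.subn(f"[REDACTED {label}]", text)
        counts[label] = n
    return text, counts
```

Because the matched characters are replaced rather than covered over, the saved output no longer contains the original values. The per-label counts give a quick check on what was caught before the manual review pass.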
