BPMS Guide

Which Scanner Settings Are Best for Boosting Text Recognition?

Whether you are using a document management solution or just wish to rid your office of space-consuming filing, you will doubtless have copious amounts of paper that regularly need to be scanned.

Over decades, we at BPMS have scanned millions of pages of data, and increasingly these images are delivered to our clients via document management systems such as DocuWare.

When scanning documents, it is vital to deliver the best possible image from the original page, especially if you wish later to interrogate the file's text. Whilst to a certain extent scanning is driven by the RIRO principle (Rubbish In, Rubbish Out), any scanned document can be enhanced to its optimum level if you get the settings of your scanner right.

In the latest of our guides we explain how you can achieve this.

Which file format?

Should you wish to process information contained within a document – for example to read text and barcodes and then use this information for either indexing or in workflows, the document must be captured in either a PDF or PDF/A format. Make sure to select this setting before scanning the document(s).

If you only wish to view the document or forward it to others via email, then any of the common file formats will suffice. In addition to PDF and PDF/A, you also have available PNG, JPEG and TIF. Put simply, if you only intend to archive a document, it doesn’t really matter in which format you retain it.
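The rule of thumb above can be sketched as a small check. This is only an illustration: the function names and the `needs_processing` flag are our own assumptions and do not correspond to any scanner or DocuWare setting.

```python
# Formats named in this guide, grouped by the rule of thumb above.
PROCESSING_FORMATS = ["PDF", "PDF/A"]
VIEW_ONLY_FORMATS = ["PDF", "PDF/A", "PNG", "JPEG", "TIF"]


def suitable_formats(needs_processing):
    """Return the formats this guide considers suitable for the intended use.

    needs_processing is a hypothetical flag: True when text or barcodes
    will later be read for indexing or workflows.
    """
    if needs_processing:
        return list(PROCESSING_FORMATS)
    return list(VIEW_ONLY_FORMATS)


def is_suitable(file_format, needs_processing):
    """Check a chosen format (case-insensitive) against the rule."""
    return file_format.upper() in suitable_formats(needs_processing)


if __name__ == "__main__":
    print(is_suitable("jpeg", needs_processing=True))
    print(is_suitable("pdf/a", needs_processing=True))
```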

Is Colour Capture Required?

The basic rule is to scan in black and white where possible to keep file sizes as low as possible. In certain instances, however, doing so would mean the copy is not a true facsimile or, worse, would fail to convey key information needed for future reference. Examples might be photographs, graphs and pie charts, or originals in which coloured inks have been used.

You may also wish to consider your scanner's grey-scale setting, which can be very useful for capturing detail and is significantly more subtle than traditional black and white.
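To see why colour mode has such an effect on file size, the raw size of a page can be estimated from its dimensions, the dpi and the number of bits stored per pixel. The sketch below assumes the usual uncompressed depths (1 bit for black and white, 8 for grey-scale, 24 for colour) and an A4 page of roughly 8.27 by 11.69 inches. Real files are compressed and will be considerably smaller, so treat the figures as a comparison between modes rather than a prediction.

```python
# Bits stored per pixel for each scan mode (typical uncompressed depths,
# assumed here for illustration).
BITS_PER_PIXEL = {"black_and_white": 1, "greyscale": 8, "colour": 24}


def uncompressed_size_bytes(width_in, height_in, dpi, mode):
    """Estimate the raw, uncompressed size of one scanned page in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * BITS_PER_PIXEL[mode]
    # Round up to a whole number of bytes.
    return (total_bits + 7) // 8


if __name__ == "__main__":
    # A4 page (about 8.27 x 11.69 inches) scanned at 300 dpi.
    for mode in BITS_PER_PIXEL:
        size = uncompressed_size_bytes(8.27, 11.69, 300, mode)
        print(f"{mode}: {size / 1_000_000:.1f} MB before compression")
```

Before compression, a grey-scale page holds eight times as much data as the same page in black and white, and a colour page twenty-four times as much.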

Finding the right resolution

Setting the appropriate resolution is perhaps the most important aspect when scanning your documents.

The dpi (dots per inch) setting largely dictates the clarity of the end result: the higher the dpi, the sharper the image. This is particularly critical if you require the scanned documents to be text searchable via OCR (optical character recognition).

Too low a resolution will mean that your software struggles to identify the text correctly and, as a result, may record it inaccurately or, worse, not at all. Whilst errors or omissions may be corrected manually, this can be time-consuming and should be avoided.

Where black and white or grey-scale documents are required to be text searchable, we would strongly recommend using a dpi of not less than 300. In the case of colour, a higher resolution of 400 dpi should be considered.

Another factor you may wish to bear in mind is the font size of the original document. As a general guide, the smaller the font size, the higher the resolution should be.
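The recommendations above can be sketched as a simple pre-scan check. The 300 and 400 dpi figures come from this guide. The `small_print_extra_dpi` parameter is a hypothetical allowance of our own, since the guide gives no specific figure for small fonts, and the function names are illustrative rather than part of any scanner software.

```python
# Minimum dpi recommended in this guide for text-searchable scans.
MIN_DPI = {"black_and_white": 300, "greyscale": 300, "colour": 400}


def recommended_dpi(mode, small_print_extra_dpi=0):
    """Return the recommended minimum dpi for a scan mode.

    small_print_extra_dpi is a hypothetical allowance the caller chooses
    for small fonts; the guide only says smaller text needs more resolution.
    """
    if mode not in MIN_DPI:
        raise ValueError(f"Unknown scan mode: {mode}")
    return MIN_DPI[mode] + small_print_extra_dpi


def resolution_warning(mode, dpi, small_print_extra_dpi=0):
    """Return a warning string if dpi is below the recommendation, else None."""
    target = recommended_dpi(mode, small_print_extra_dpi)
    if dpi < target:
        return f"{dpi} dpi is below the recommended {target} dpi for {mode} scans"
    return None


if __name__ == "__main__":
    print(resolution_warning("black_and_white", 200))
    print(resolution_warning("colour", 400))
```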

The Right Settings Deliver the Best Results

Whilst choosing the right settings for your scanner may at first seem complicated, with a little practice it will soon become second nature.

At BPMS we offer a fully staffed document scanning bureau utilising the latest scanner technology. Should you be uncertain as to how to tackle your document archive or would prefer us to take this concern from you, please get in touch.