Tutorial Abbyy Recognition Server (en)
Creating a Workflow
Create the folder structure
It is advisable to create it as follows:
- _Input: scans are copied or moved into this folder
- _Output: OCR results are saved in its subfolders, one subfolder per output file format
Share the folder:
- Right-click the folder
- Share -> Advanced Sharing
- In the Advanced Sharing dialog, select the checkbox "Share this folder"
- Permissions
- Choose the user(s) who should have access. These users must have an account with a password on the server.
- Remove "Everyone"
- Add
- Advanced
- Find Now
- Choose the user
- OK -> OK
- Activate the permissions:
- Change
- Read
- Advanced
- Add
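If you prefer to create the folder structure described at the start of this section with a script instead of in Windows Explorer, a minimal Python sketch follows. The subfolder names under _Output (PDF, DOCX, TXT) are assumptions for illustration; use the output formats actually configured in your workflow. The script only creates folders; sharing and permissions still have to be set as described above.

```python
from pathlib import Path

# Hypothetical output formats: replace them with the formats your workflow exports.
OUTPUT_FORMATS = ("PDF", "DOCX", "TXT")


def workflow_folders(root: Path, formats=OUTPUT_FORMATS) -> list:
    """Return the folders of the suggested layout below `root` (nothing is created)."""
    return [root / "_Input"] + [root / "_Output" / fmt for fmt in formats]


def create_workflow_folders(root: Path, formats=OUTPUT_FORMATS) -> list:
    """Create _Input and one _Output subfolder per format; existing folders are kept."""
    folders = workflow_folders(root, formats)
    for folder in folders:
        # parents=True also creates root and _Output; exist_ok=True makes reruns harmless.
        folder.mkdir(parents=True, exist_ok=True)
    return folders
```

For example, `create_workflow_folders(Path(r"D:\Daheim"))` would create the layout below a (hypothetical) folder D:\Daheim. Separating the pure path list from the creation step makes it easy to review the layout before anything is written to disk.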
Map the parent folder (e.g. Daheim) as a network drive on every computer on which Verification Station is installed:
- Open Windows Explorer
- Tools
- Map network drive
- Choose a drive letter
- Enter the path to the server, e.g. \\IP-address\shared folder
- Select "Reconnect at logon"
- Finish
Workflow settings

Explanation of "Entzerren/dewarping"
This option straightens the image based on the straight lines it detects. If the image contains shapes such as the trapezoid-shaped flag in this example, the result looks like this:

For such images, you should therefore clear the checkbox "Entzerren/dewarping".

Quality check
For documents with simple text formatting, e.g. continuous text, you may choose "Keine Überprüfung/No check".

For documents with more complex formatting, activate the option "Alle Dokumente überprüfen/Check all documents".

Training Data
Create a new language in FineReader
You can define the character set and exclude characters that do not occur in your documents, such as the @ sign: it does not exist in older prints but is sometimes recognized by mistake. In FineReader, choose Options -> Tools -> Options
- New
- Create a new language based on an existing one
- OK
The dialog "Language Properties" opens.
- Advanced
- Illegal characters
- Add the character, e.g. the @ sign.
To add more characters, type them in a row without any whitespace or separators, e.g. @µ€
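Before adding characters, it can help to see which implausible characters actually appear in your OCR results. The Python sketch below counts them in a recognized text and builds the string to type into the field. The example set "@µ€" is only an illustration; which characters are implausible depends on your prints, and nothing here is part of FineReader itself.

```python
from collections import Counter

# Characters that cannot occur in the original print (example set), typed in a row
# exactly as in the "Illegal characters" field.
ILLEGAL = "@µ€"


def count_illegal(text: str, illegal: str = ILLEGAL) -> Counter:
    """Count how often each excluded character occurs in an OCR result."""
    return Counter(ch for ch in text if ch in illegal)


def illegal_field_value(chars) -> str:
    """Join characters without whitespace or separators, dropping duplicates in order."""
    return "".join(dict.fromkeys(ch for ch in chars if not ch.isspace()))
```

Running `count_illegal` over the TXT output of a few test pages shows which of the candidate characters are really being produced by misrecognition and are worth excluding.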
Create a training file for characters
If Recognition Server does not recognize certain characters, you can create a training file in FineReader, export it as an .fbt file, and import it into Recognition Server. A demo version of FineReader can be downloaded from the Abbyy website.
To create a training file, e.g. for Gothic type, select the checkbox "Use training in order to recognize new characters and ligatures". Then click "Read page". A dialog opens.

In this case, the letter "M" was not recognized correctly. Enlarge the box with the open double arrow >> until the M is completely covered, then click "Training". If the box covers more than one letter (except for ligatures), shrink it with the closing double arrow << until only the single letter is covered.


You can end the training at any time. When you have finished, click Close.
The trained letters can be reviewed via:
- Tools
- Pattern Editor
There you can also set their properties, such as bold or italic style.

To check the OCR accuracy, process some documents both with and without your custom training and compare the results.
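One common way to make this comparison measurable is the character error rate (CER): the edit (Levenshtein) distance between the OCR output and a manually corrected reference text, divided by the length of the reference. A minimal Python sketch, assuming you have the OCR output and the corrected text of the same page as plain strings (e.g. from a TXT export); the function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,            # delete ca
                           cur[j - 1] + 1,         # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def character_error_rate(ocr_text: str, reference: str) -> float:
    """Edit distance relative to the reference length (0.0 means a perfect match)."""
    if not reference:
        return 0.0 if not ocr_text else 1.0
    return levenshtein(ocr_text, reference) / len(reference)
```

Compute the CER for the same pages once with and once without the training file; the variant with the lower value recognized those pages more accurately. Use several pages, since a single page can be misleading.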
The training file can be exported for Recognition Server via:
- Tools
- OCR
- User-defined pattern
- Save language
- User-defined pattern
- OCR
Then add this file as described in the workflow settings.


Correction
Specifics of the Gothic print type
Unrecognized page numbers
If a text box that encloses a page number shows no text, click at the approximate position of the number as it appears in the original scan. The cursor should now blink, and you can type the page number. Then set the reading-order number of this box to "1".

Settings in Verification Station
Spell Checking
Verification Station has a built-in spell check, which can be started with the corresponding button. Because the spell check is quite unreliable, error messages appear sooner or later (see Fehlermeldungen/Error messages). Alternatively, a plain text file encoded in UTF-16 can serve as a dictionary when you add it in the workflow settings: on the second tab, Verarbeitung/Processing, use Benutzerdefiniertes Wörterbuch verwenden/Use user-defined dictionary.
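Such a dictionary file can be generated from a word list. The Python sketch below assumes one word per line with Windows line endings, a common word-list layout; check which format your Recognition Server version expects. Python's "utf-16" codec writes a byte-order mark followed by the text in the platform's byte order (little-endian on typical Windows PCs). The example words and file name are hypothetical.

```python
from pathlib import Path


def dictionary_bytes(words) -> bytes:
    """Encode a word list as UTF-16 text, one word per line (assumed format)."""
    # Strip whitespace, skip empty entries and drop duplicates while keeping order.
    unique = dict.fromkeys(w.strip() for w in words if w.strip())
    text = "".join(word + "\r\n" for word in unique)
    return text.encode("utf-16")  # prepends a byte-order mark


def write_user_dictionary(words, path) -> None:
    """Write the encoded word list; write_bytes avoids any newline translation."""
    Path(path).write_bytes(dictionary_bytes(words))
```

For example, `write_user_dictionary(["Theil", "Thür"], r"D:\Daheim\woerterbuch.txt")` would write old spellings common in Gothic prints to a (hypothetical) file that can then be selected in the workflow settings.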
Create new text boxes
Every time a new text box is created, its print type is set to the default, which only includes "normal" print types. If Gothic type is to be recognized, the Gothic checkbox must be selected.
In addition, check the reading order and correct it if necessary.
