Tutorial Abbyy Recognition Server (en)

From Kallimachos
Revision as of 16 August 2017, 13:01


Installation and Configuration of Recognition Server

Please note: this documentation covers prerequisites and details that are not described comprehensively in the Abbyy Recognition Server manual, with reference to the magazine "Daheim" and comparable digitized media.


The settings below can be adopted as provided; doing so saves time-consuming troubleshooting.

The minimum prerequisites are:

  • Server Manager
  • Verification Station

Choosing the user account the server runs under


The Recognition Server manual assumes a Windows-based domain and Windows network shares. If the OCR results are to be saved to a network share, you need to select a user account with privileges to access the network. The Local System account has comprehensive privileges on the local machine but none on the network, so it is advisable to choose the Network Service account.

In our case we had a Novell domain and Novell network shares, which appear to be incompatible with this setup: Recognition Server reported the error "cannot access drive xxx" (the network share). We therefore saved the OCR results on the local server and copied them to the network share with a synchronization program.
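The synchronization step can be sketched in Python. This is a minimal one-way copy, not the program used in this project: it assumes the OCR results sit in a local folder and the Novell share is reachable under a path (both paths are hypothetical), copies files that are missing or newer on the local side, and never deletes anything on the share.

```python
import shutil
from pathlib import Path


def sync_one_way(source: Path, target: Path) -> list[Path]:
    """Copy new or updated files from source to target (one-way, no deletions).

    Returns the paths, relative to source, of the files that were copied.
    """
    copied: list[Path] = []
    for src_file in source.rglob("*"):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = target / rel
        # Copy if the file is missing on the share or the local copy is newer.
        if (not dst_file.exists()
                or src_file.stat().st_mtime > dst_file.stat().st_mtime):
            dst_file.parent.mkdir(parents=True, exist_ok=True)
            # copy2 also copies the modification time, so an unchanged file
            # is skipped on the next run.
            shutil.copy2(src_file, dst_file)
            copied.append(rel)
    return copied
```

Called periodically, for example from the Windows Task Scheduler, as `sync_one_way(Path(r"C:\Daheim\_Output"), Path(r"N:\Daheim\_Output"))` (hypothetical paths), it returns the relative paths it copied. Comparing modification times keeps reruns cheap; a dedicated synchronization tool would additionally handle deletions and conflicts.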

Create the folder structure

It is advisable to create it as follows:

  • _Input
    • Scans are copied or moved into this folder
  • _Output
    • OCR results are saved in subfolders, one for each output file format
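The folder layout above can be created with a short Python script. The base folder (e.g. C:\Daheim) and the format subfolder names PDF, XML and TXT are assumptions for illustration; use whatever output formats your workflow actually produces.

```python
from pathlib import Path

# Hypothetical output formats; one subfolder under _Output is created per format.
OUTPUT_FORMATS = ("PDF", "XML", "TXT")


def create_folder_structure(base: Path, formats=OUTPUT_FORMATS) -> list[Path]:
    """Create base/_Input and base/_Output/<format>; return the folder paths."""
    folders = [base / "_Input"] + [base / "_Output" / fmt for fmt in formats]
    for folder in folders:
        # parents=True also creates base and _Output; exist_ok makes reruns harmless.
        folder.mkdir(parents=True, exist_ok=True)
    return folders
```

For example, `create_folder_structure(Path(r"C:\Daheim"))` on the server sets up the input folder and one output subfolder per format.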

Share the Daheim folder

  • right-click the folder
    • Share with -> Advanced sharing
    • Advanced Sharing
    • select the checkbox Share this folder
    • Permissions
      • choose the user(s) who should have access

These users must have an account with a password on the server.

  • remove Everyone
    • Add
      • Advanced
        • Find Now
        • choose the user
        • OK -> OK
      • under Allow, select
        • Change
        • Read
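As an alternative to the dialog steps above, the share can be created from an elevated command prompt with the Windows command `net share`. The Python sketch below only assembles and runs that command; the share name Daheim, the path C:\Daheim and the user name ocruser are hypothetical, and the resulting permissions should be checked in the sharing dialog afterwards.

```python
import subprocess


def build_share_command(share_name: str, folder: str, users: list[str]) -> list[str]:
    """Assemble 'net share' arguments that grant each user Change access.

    Change implies Read, which covers the two permissions selected above.
    """
    command = ["net", "share", f"{share_name}={folder}"]
    for user in users:
        command.append(f"/GRANT:{user},CHANGE")
    return command


def create_share(share_name: str, folder: str, users: list[str]) -> None:
    """Run the command; needs an administrator session on the Windows server."""
    subprocess.run(build_share_command(share_name, folder, users), check=True)
```

Building the argument list separately from running it makes the command easy to inspect before it touches the server.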

Mount the parent folder, e.g. Daheim, as a network drive on the computers on which Verification Station is installed.

  • open Windows Explorer
    • Tools
    • Map network drive
      • choose a drive letter
      • enter the path to the share, e.g. \\IP-address\shared folder
      • select Reconnect at logon
      • Finish
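The same mapping can be scripted with the Windows command `net use`, where `/persistent:yes` corresponds to reconnecting at logon. In this Python sketch the drive letter Z:, the IP address 192.168.0.10 and the share name Daheim are hypothetical; depending on the setup, the server account's credentials may also have to be passed with the `/user:` option of `net use`.

```python
import subprocess


def build_map_command(drive: str, server: str, share: str) -> list[str]:
    """Assemble 'net use' arguments mapping a share on the server to a drive letter.

    /persistent:yes plays the role of the 'Reconnect at logon' checkbox.
    """
    unc_path = "\\\\" + server + "\\" + share  # e.g. \\192.168.0.10\Daheim
    return ["net", "use", drive, unc_path, "/persistent:yes"]


def map_drive(drive: str, server: str, share: str) -> None:
    """Run the command on a Verification Station computer (Windows only)."""
    subprocess.run(build_map_command(drive, server, share), check=True)
```

For example, `map_drive("Z:", "192.168.0.10", "Daheim")` would make the shared folder available as drive Z: on that computer.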

Workflow settings

Explanation of "Entzerren/dewarping":

This option straightens the image on the basis of the straight lines it contains. If those lines are not meant to be parallel, as with the trapezoid-shaped flag in this example, the image is distorted as a result.

For this reason, clear the "Entzerren/dewarping" checkbox.

Quality check

For documents with simple text formatting, e.g. continuous text, you may choose "Keine Überprüfung/No check".

For documents with more complex formatting, the option "Alle Dokumente überprüfen/Check all documents" should be activated.

Including training data

Create a new language in FineReader

You can choose the character set and exclude characters that do not occur in the source, such as the @ sign. It does not exist in older prints but is nevertheless recognized by mistake. In FineReader, choose Options -> Tools -> Options:

  • new
  • create a new language based on an existing one
  • OK

The Language Properties dialog opens:

  • advanced
  • illegal characters
  • add, e.g., the @ sign

To add more characters, just type them in a row without any whitespace or separators, e.g. @µ€
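If the list of excluded characters is kept elsewhere, a small helper can produce the entry in the required form (one row, no whitespace, no separators) and check existing OCR output for those characters. This is a hypothetical Python sketch; FineReader itself only needs the resulting string.

```python
from collections.abc import Iterable


def illegal_chars_entry(chars: Iterable[str]) -> str:
    """Join characters in a row, without whitespace or separators, dropping duplicates."""
    result: list[str] = []
    for ch in chars:
        # Skip whitespace and characters that are already in the list.
        if not ch.isspace() and ch not in result:
            result.append(ch)
    return "".join(result)


def count_illegal(text: str, illegal: str) -> dict[str, int]:
    """Count how often each excluded character appears in an OCR text."""
    counts: dict[str, int] = {}
    for ch in text:
        if ch in illegal:
            counts[ch] = counts.get(ch, 0) + 1
    return counts
```

Running `count_illegal` on output produced before the new language was set up shows how often such characters were recognized by mistake.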

Create a training file for characters

If Recognition Server does not recognize certain characters, a training file can be created in FineReader, exported as an .fbt file and imported into Recognition Server. A demo version of FineReader can be downloaded from the Abbyy homepage.

To create a training file, e.g. for Gothic type, select the checkbox "Use training in order to recognize new characters and ligatures". Then click "Read page". A dialog opens.



In this case the letter "M" was not recognized properly. Extend the box with the open double arrow >> until the M is completely covered, then click "Training". If the box contains more than one letter (except for ligatures), shrink it with the closing double arrow << until it covers the single letter on its own.




Training can be ended at any time. When you have finished, click Close.

The trained characters can be reviewed via

  • Tools
    • Pattern Editor

Their properties, such as bold or italic style, can be set there as well.



To check OCR accuracy, you can process some documents with and without your custom training and compare the results.
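One way to quantify this comparison is the character error rate (CER): the edit distance between the OCR output and a manually corrected transcription, divided by the length of the transcription. The Python sketch below assumes you have such a ground-truth text for a few pages; it is not part of FineReader or Recognition Server.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr_text: str, ground_truth: str) -> float:
    """Edit distance divided by the length of the ground truth."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ocr_text, ground_truth) / len(ground_truth)
```

Computing the CER for the same pages once without and once with the training file shows whether the training actually helps; for example, reading "Nuster" for "Muster" gives a CER of 1/6.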

The training file can be exported for use in Recognition Server via

  • Tools
    • OCR
      • user-defined pattern
        • save language

Then add this file as described in the workflow settings.