Current version as of 19 September 2017, 09:28


Tutorial Abbyy Recognition Server

Installation and Configuration of Recognition Server

Please note: This documentation covers the prerequisites and specifics that are not described comprehensively in the Abbyy Recognition Server manual, with regard to the magazine "Daheim" and comparable digitized media.


These settings can be accepted as provided, which saves time-consuming troubleshooting.

Choosing the user account the server runs under


The Recognition Server manual assumes a Windows environment and Windows network shares. If the OCR results are to be saved to a network share, you need to select a user account with network access privileges. The local system account has comprehensive privileges on the local machine, but none on the network, so it is advisable to choose the network service account. In our case we had a Novell domain and Novell network shares, which appear to be incompatible with this setup: Recognition Server reported the error "cannot access drive xxx" (the network share). We therefore saved the OCR results on the local server and copied them to a network share with a synchronization program.
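The copy step described above can be sketched in a few lines of Python. This is only an illustration of what a synchronization program does, not the tool we actually used; the function name `sync_new_files` and the two-second tolerance are assumptions made for this example.

```python
import shutil
from pathlib import Path

# Assumed tolerance: some file systems store modification times coarsely.
MTIME_TOLERANCE = 2.0


def sync_new_files(source: Path, target: Path) -> list[Path]:
    """Copy files that are missing or outdated in target; return the copied paths."""
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        dst = target / src.relative_to(source)
        # Copy when the target file is missing or clearly older than the source.
        outdated = dst.exists() and src.stat().st_mtime - dst.stat().st_mtime > MTIME_TOLERANCE
        if not dst.exists() or outdated:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also copies the modification time
            copied.append(dst)
    return copied
```

Run on a schedule, with the local output folder as `source` and the mounted network share as `target`, a script like this only transfers results that are not yet on the share.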


Creating a Workflow

Creating the folder structure

It's advisable to create a folder structure like the following:




  • _Input
    • Scans are copied or moved into this folder
  • _Output
    • OCR results are saved in these subfolders with the corresponding file format
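If the structure has to be set up for several titles, it can be created with a short script. The following is a minimal sketch; the function name and the format names passed to it (for example "PDF" or "XML") are placeholders chosen for illustration, so use the output formats configured in your own workflow.

```python
from pathlib import Path


def create_workflow_folders(root: Path, output_formats: list[str]) -> list[Path]:
    """Create _Input and one _Output subfolder per file format below root."""
    folders = [root / "_Input"]
    folders += [root / "_Output" / fmt for fmt in output_formats]
    for folder in folders:
        # exist_ok=True makes the script safe to run more than once.
        folder.mkdir(parents=True, exist_ok=True)
    return folders
```

For example, `create_workflow_folders(Path("Daheim"), ["PDF", "XML"])` would create `Daheim/_Input`, `Daheim/_Output/PDF` and `Daheim/_Output/XML`.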

Share the Daheim-folder

  • right-click the folder
    • share -> extended share
    • set the checkbox share this folder
    • privileges
      • choose the user(s) that may get access


Each of these users must have a password-protected account on the server.

  • remove everyone
    • add
      • extended
        • search now
        • choose user
        • OK -> OK
      • activate privileges
        • change
        • read


Mount the parent folder, e.g. Daheim, as a network share on the computers on which Verification Station is installed.


  • open Windows-Explorer
    • extras
    • mount network share
      • choose drive letter
      • enter the path to the server, e.g. \\IP-address\shared folder
      • reconnect during login
      • finish
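The same mapping can also be scripted with the Windows command `net use`, which is convenient when several Verification Station computers have to be set up. The sketch below only wraps that command; the helper names, the drive letter and the server address in the usage note are assumptions for illustration.

```python
import subprocess


def build_net_use_command(drive: str, unc_path: str, persistent: bool = True) -> list[str]:
    """Build the argument list for 'net use'; persistent=True reconnects at login."""
    flag = "/persistent:yes" if persistent else "/persistent:no"
    return ["net", "use", drive, unc_path, flag]


def mount_share(drive: str, unc_path: str) -> None:
    """Map the share to the drive letter (Windows only); raises if 'net use' fails."""
    subprocess.run(build_net_use_command(drive, unc_path), check=True)
```

A call such as `mount_share("Z:", r"\\192.0.2.10\Daheim")` corresponds to choosing the drive letter, entering the path and ticking reconnect during login in the dialog.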


Workflow settings


Explanation of "Entzerren/dewarping"

This option straightens the image on the basis of straight lines. If those lines form a trapezoid, as with the flag in this example, the result looks like this:





That's why you should deactivate the checkbox Entzerren/dewarping.


Quality check

For documents with simple text formatting, e.g. continuous text, you may choose "Keine Überprüfung/No check".

For documents with more sophisticated formatting, the option "Alle Dokumente überprüfen/Check all documents" should be activated.



Insert training data

Create training file for characters

If Abbyy Recognition Server doesn't recognize some characters, a training file can be created in FineReader, exported as an .fbt file and imported into Recognition Server. A demo version of FineReader can be downloaded from the Abbyy homepage.

In order to create a training file, e.g. for the Gothic type, set the checkbox "Use training in order to recognize new characters and ligatures". Now click on "read page". A dialog box opens.



In this case the letter "M" wasn't recognized properly. Extend the box with the opening double arrow >> until the letter is completely covered, then click on "Training". If the box covers more than one letter (ligatures excepted), you can shrink it with the closing double arrow << until it covers only the single letter.




The training can be stopped at any time. When you are finished, click on Close.

The trained letters can be reviewed via

  • Tools
    • Pattern Editor


Their properties, such as bold or italic style, can be set as well.



To check the accuracy of the OCR, you can run OCR on some documents with and without your custom training file.
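One way to put a number on this comparison is the character accuracy against a manually corrected transcription of the same page. The sketch below assumes that such a reference text exists; it is not a feature of FineReader or Recognition Server, and the function names are chosen for illustration. It uses the Levenshtein edit distance between the OCR text and the reference.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions and substitutions from a to b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def character_accuracy(ocr_text: str, reference: str) -> float:
    """Share of reference characters recognized correctly (1.0 means identical)."""
    if not reference:
        return 1.0 if not ocr_text else 0.0
    return max(0.0, 1.0 - levenshtein(ocr_text, reference) / len(reference))
```

Compute `character_accuracy` once for the OCR result without the training file and once for the result with it; the training is worthwhile if the second value is clearly higher.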

The training file can be exported to Recognition Server by clicking on

  • Tools
    • OCR
      • user-defined pattern
        • save language


and then add this file as described in the workflow settings.




Specifics

Page numbers not displayed

If a text box shows no page number even though the number is enclosed by the box, the font color may be the same as the background color. Click on the double arrow as shown in the picture to open the menu, click into the color box and choose black.



Specifics of print type Gothic when drawing a new text box

Every time a new text box is drawn, the print type is reset to the default, which only includes "normal" print types. If Gothic type is to be recognized, the Gothic checkbox has to be set.



After this you need to check the reading order and correct it if necessary.


Settings in Verification Station

Spell Checking

Verification Station has a built-in spell check that can be invoked by clicking the corresponding button. As the spell check is quite faulty, error messages will appear sooner or later (see Error messages). Alternatively, a plain text file in UTF-16 encoding can serve as a dictionary when you add it in the workflow settings. It is added via the 2nd tab -> Verarbeitung/Processing (see Workflow settings).
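A UTF-16 dictionary file of this kind can be generated from a word list with a short script. The sketch below assumes one word per line, which is our assumption rather than a documented requirement of Recognition Server; the function name is chosen for illustration.

```python
from pathlib import Path


def write_dictionary(words, path: Path) -> int:
    """Write the unique, non-empty words to a UTF-16 text file, one word per line."""
    unique_words = sorted({word.strip() for word in words if word.strip()})
    # encoding="utf-16" writes a byte order mark; newline="\r\n" gives Windows line ends.
    with open(path, "w", encoding="utf-16", newline="\r\n") as handle:
        for word in unique_words:
            handle.write(word + "\n")
    return len(unique_words)
```

The return value is the number of words written, which makes it easy to confirm that duplicates and empty lines were dropped.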



Error messages ...

occur when:

  • using the clipboard (copy & paste)
  • using the spell checking
  • quite rarely, during normal runtime


These are bugs in the program.