Tutorial anonymus translators (en): Difference between revisions
From Kallimachos
#'''[http://mk270.github.io/whitakers-words/ Whitaker’s Words]''', an Ada-based analysis program for Latin texts.
#'''[https://github.com/PerseusDL/morpheus Morpheus]''', the parser used by the Perseus program.
Both programs are quite complex and can take some effort to compile correctly, all the more so if you want to integrate them into your own scripts. As an easier alternative, at least for initial tests, the corresponding web services ([http://services.perseids.org/bsp/morphologyservice/analysis/word?lang=lat&engine=morpheuslat&word=et example]) can be used instead.
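As a minimal sketch of how such a web service could be queried from a script, the following Python code rebuilds the example URL above for an arbitrary word. The function names <code>build_query_url</code> and <code>fetch_analysis</code> are hypothetical, and the response format is not described here, so the body is returned as raw text, on the assumption that it is UTF-8 encoded. It is also an assumption that the service is reachable and still accepts these parameters.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and parameter names are taken from the example link above.
BASE_URL = "http://services.perseids.org/bsp/morphologyservice/analysis/word"


def build_query_url(word, lang="lat", engine="morpheuslat"):
    """Return the query URL for one word (hypothetical helper)."""
    params = {"lang": lang, "engine": engine, "word": word}
    return BASE_URL + "?" + urlencode(params)


def fetch_analysis(word, timeout=10):
    """Fetch the raw response body for one word.

    The structure of the response is not assumed here; parsing is left
    to the caller. Decoding as UTF-8 is an assumption.
    """
    with urlopen(build_query_url(word), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling <code>build_query_url("et")</code> reproduces the example link. For larger texts it is probably sensible to cache responses locally rather than query the service once per token.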
If the analysis program is configured correctly, it should recognize large portions of the texts as orthographically correct Latin. Unrecognized words can then be replaced routinely by their classical counterparts via a progressively refined rule set. Useful replacement rules include, for instance, ci/ti, diff/def and ch/c, but also corrections for typical OCR mistakes such as ic/it, ee/ec and b/h.
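The rule-based replacement can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure actually used: each rule pair is read as "form found in the text / form to try instead", the names <code>RULES</code>, <code>candidates</code> and <code>normalize</code> are hypothetical, and <code>is_known</code> stands in for whatever lookup the analysis program provides.

```python
# Rule pairs from the text, read as (form in the text, form to try).
# The direction of each pair is an assumption.
RULES = [
    ("ci", "ti"), ("diff", "def"), ("ch", "c"),   # orthographic variants
    ("ic", "it"), ("ee", "ec"), ("b", "h"),       # typical OCR confusions
]


def candidates(word, rules=RULES):
    """Yield variants made by applying one rule at one position."""
    seen = set()
    for old, new in rules:
        start = word.find(old)
        while start != -1:
            variant = word[:start] + new + word[start + len(old):]
            if variant not in seen:
                seen.add(variant)
                yield variant
            start = word.find(old, start + 1)


def normalize(word, is_known, rules=RULES):
    """Return the word, its first recognized variant, or None.

    is_known is any callable that says whether the lexicon accepts
    a word (a stand-in for the analysis program).
    """
    if is_known(word):
        return word
    for variant in candidates(word, rules):
        if is_known(variant):
            return variant
    return None
```

With a toy lexicon such as <code>{"gratia", "habet"}</code>, <code>normalize("gracia", lexicon.__contains__)</code> yields <code>"gratia"</code>. Applying only one rule at one position per attempt keeps the replacements conservative; words that remain unresolved are left for manual correction.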
For a usable stylometric analysis, at least 95% of the words in the processed text should be recognized as correct Latin by the reference lexica; the goal, however, should be 100% recognition. To help with correcting the Latin texts, it may be advisable to program simple comparison and input masks that let the user compare each word in question directly with the word in the original scan and correct it on the spot. Furthermore, it is advisable to expand the lexica with custom word lists that cover the specific vocabulary of Arabic-Latin translations and the corresponding disciplines.
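To check a text against the 95% threshold, the recognition rate could be computed along the following lines. This is only a sketch: the tokenization by a simple regular expression, the function name <code>recognition_rate</code> and the <code>is_known</code> callable are assumptions made for illustration.

```python
import re


def recognition_rate(text, is_known):
    """Return (share of recognized words, list of unrecognized words).

    Words are runs of letters, lowercased; is_known is any callable
    that says whether the lexicon accepts a word.
    """
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return 0.0, []
    unknown = [w for w in words if not is_known(w)]
    return (len(words) - len(unknown)) / len(words), unknown


def extended_lookup(is_known, custom_words):
    """Combine a base lookup with a custom word list."""
    custom = {w.lower() for w in custom_words}
    return lambda word: word in custom or is_known(word)
```

The returned list of unrecognized words can feed a correction mask directly, and <code>extended_lookup</code> shows one way a custom word list could be layered on top of the reference lexicon without modifying it.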