Tutorial anonymus translators (en): Difference between revisions
From Kallimachos
The citable text (3) is not yet usable for stylometry, but it can be useful for other scholarly tasks. To compare texts stylometrically, they first have to be made comparable. In medieval editions, punctuation and orthography are the major obstacles. Punctuation often follows the national conventions of the editors (German, French, English, etc.) rather than the author, so the author's "signal" is lost. Orthography, in turn, ranges from "classicized" editions (e.g. Avicenna Latinus) to faithful reproductions of the exact spelling of a single medieval manuscript. These problems can be mitigated by radically removing all punctuation marks, converting all uppercase letters to lowercase and, finally, classicizing the orthography. The last step is quite painful for medievalists, but there is no better alternative. As a first step, it helps, for instance, to replace every v with u and every j with i.
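The normalization steps described above (removing punctuation, lowercasing, replacing v with u and j with i) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tutorial's own tooling: the function name `normalize_latin` is hypothetical, "punctuation" is approximated as every character that is neither a word character nor whitespace, and the full classicizing of the orthography is not attempted.

```python
import re


def normalize_latin(text: str) -> str:
    """Hypothetical helper: make a Latin text more comparable for stylometry.

    Assumption: punctuation = any character that is neither a word
    character (letter, digit, underscore) nor whitespace.
    """
    # Lowercase first, so that uppercase V and J are also caught below.
    text = text.lower()
    # Replace punctuation marks with a space (not with nothing), so that
    # words separated only by punctuation, e.g. "uerba,quae", stay apart.
    text = re.sub(r"[^\w\s]", " ", text)
    # First classicizing step: v -> u and j -> i.
    text = text.replace("v", "u").replace("j", "i")
    # Collapse the runs of whitespace left behind by removed punctuation.
    return " ".join(text.split())


print(normalize_latin("Iuvenis, vir justus; Venit."))
```

Run on the sample sentence, this prints `iuuenis uir iustus uenit`. Further classicizing rules (e.g. for medieval spellings of the ae diphthong) would have to be added as additional replacements after the v/j step.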
This process can be supported digitally by checking whether Latin reference lexica recognize the words in the texts of the corpus. The simplest approach is a comparison with a Latin word list (for instance '''[https://github.com/cisocrgroup/Resources/tree/master/lexica/latin here]''' or the word list of the '''[http://extensions.openoffice.org/en/project/latin-spelling-and-hyphenation-dictionaries OpenOffice lexicon]''', which can also be used in a Python script via PyEnchant). The alternative is a morphology program that can lemmatize and categorize every word in the text.
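The word-list comparison mentioned above can be sketched with a plain Python set. This is an illustrative sketch only: the function names are hypothetical, and it assumes a lexicon file with one word form per line (the actual format of the linked lexica may differ). The lexicon entries receive the same v/j normalization as the corpus, so that both use the same orthography.

```python
def load_wordlist(lines):
    """Hypothetical helper: build a set of normalized word forms.

    Assumes one word form per line; empty lines are skipped.
    """
    words = set()
    for line in lines:
        word = line.strip().lower()
        if word:
            # Same first classicizing step as for the corpus texts.
            words.add(word.replace("v", "u").replace("j", "i"))
    return words


def unknown_words(tokens, lexicon):
    """Return the tokens not found in the lexicon, deduplicated and sorted."""
    return sorted({token for token in tokens if token not in lexicon})


lexicon = load_wordlist(["vir", "justus", "venit", "et"])
tokens = "uir iustus uenit et iuuenis".split()
print(unknown_words(tokens, lexicon))
```

Here only `iuuenis` is reported as unknown. With a real list, the lexicon could be read via `with open(path, encoding="utf-8") as f: lexicon = load_wordlist(f)`, where `path` is a placeholder. If a Latin dictionary is installed for PyEnchant, `enchant.Dict(...).check(word)` could take the place of the set lookup. In that case, the dictionary's own spelling conventions would have to match the normalized corpus.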
For morphological analysis, there are currently two open-source solutions: