Tutorial anonymus translators (en): Difference between revisions
From Kallimachos
| Line 52: | Line 52: | ||
For very short texts, it may be advisable to extend the analysis to less frequent words. In that case, however, the less frequent words of the other translators have to be compared as well. Experience shows that only a large accumulation of these less typical words and phrases in an anonymously translated text allows for a credible attribution. | For very short texts, it may be advisable to extend the analysis to less frequent words. In that case, however, the less frequent words of the other translators have to be compared as well. Experience shows that only a large accumulation of these less typical words and phrases in an anonymously translated text allows for a credible attribution. | ||
===(II) | ===(II) Computerized Stylometry using ''Burrows Delta''=== | ||
The second method is based on the ideas of John Burrows, which assume that authorship can be identified by comparing the standardized relative frequencies of the most frequent words (MFW) in texts. The method has proven highly successful for computerized authorship attribution. Many open-source implementations of this method can be found on the web. A user-friendly interface is provided by the Stylo package for R by Maciej Eder and Jan Rybicki. We used our own implementation in Python, based on Fotis Jannidis's [https://github.com/fotis007/pydelta pydelta]. Usually, these implementations offer a choice between different distance measures or "Deltas", i.e. different methods for computing the stylistic "distance" between two texts. Recent studies have shown that the so-called "Cosine Delta" is a particularly well-performing stylometric distance measure. We got our best results with Cosine Delta as well. | |||
In | In the first step, we analyzed the texts in the corpus with a known translator. The number of most frequent words considered (100, 200 or more) can be adjusted in most implementations of the method. We obtained our results with the 150 most frequent words in the texts. Each text of the corpus is processed as a vector containing the standardized relative frequencies of these words. The distance between these vectors is calculated using Cosine Delta. After that, the computer forms groups or clusters based on these distances, which can then be visualized in a dendrogram. Using this method, the computer was indeed able to sort the texts with known translators into groups according to these translators, i.e. one group for the translations by Dominicus Gundisalvi, one for Gerhard of Cremona etc. Once this clustering succeeds, the method is calibrated, so to speak. | ||
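The vector and distance computation described in this step can be sketched in plain Python. This is a minimal illustration, not our actual implementation and not the pydelta API: the function names, the choice to rank the MFW by raw counts over the whole corpus, and the invented toy tokens are all assumptions made for the example. It expects at least two non-empty texts.

```python
import math
from collections import Counter


def most_frequent_words(texts, n):
    """Return the n most frequent words, counted over the whole corpus.

    Assumption: ranking by raw corpus counts; implementations may rank differently.
    """
    corpus_counts = Counter()
    for tokens in texts.values():
        corpus_counts.update(tokens)
    return [word for word, _ in corpus_counts.most_common(n)]


def standardized_frequencies(texts, vocabulary):
    """Map each text to a vector of z-scored relative word frequencies."""
    names = list(texts)
    rel = {}
    for name in names:
        counts = Counter(texts[name])
        total = len(texts[name])
        rel[name] = [counts[word] / total for word in vocabulary]
    z = {name: [] for name in names}
    for j in range(len(vocabulary)):
        column = [rel[name][j] for name in names]
        mean = sum(column) / len(column)
        # Sample standard deviation of this word's frequency across all texts.
        std = math.sqrt(sum((x - mean) ** 2 for x in column) / (len(column) - 1))
        for name in names:
            z[name].append((rel[name][j] - mean) / std if std > 0 else 0.0)
    return z


def cosine_delta(u, v):
    """Cosine distance (1 - cosine similarity) between two z-score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # convention chosen here for degenerate all-zero vectors
    return 1.0 - dot / (norm_u * norm_v)


def delta_matrix(texts, n_mfw=150):
    """Return Cosine Delta for every ordered pair of distinct texts."""
    vocabulary = most_frequent_words(texts, n_mfw)
    z = standardized_frequencies(texts, vocabulary)
    return {(a, b): cosine_delta(z[a], z[b]) for a in texts for b in texts if a != b}


# Invented toy tokens, only to show the call; real texts are far longer.
toy_corpus = {
    "text_a": "et in et non et in".split(),
    "text_b": "et in et non et in".split(),
    "text_c": "non non sed sed et quod".split(),
}
distances = delta_matrix(toy_corpus, n_mfw=4)
```

The resulting values range from 0 (identical profiles) to 2 (opposite profiles) and can be passed to any hierarchical clustering routine to draw the dendrogram.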
In | In the second step, the anonymous translations are added to the system. The resulting dendrograms have to be interpreted carefully: if the Gundisalvi cluster (or the Gerhard cluster etc.) remains stable and is merely expanded by additional anonymous translations, these texts were likely produced by Gundisalvi (or by Gerhard etc., respectively). However, if the groups disperse, the computer is evidently unable to attribute the anonymous translations reliably. | ||
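Reading the dendrogram can be approximated programmatically. The sketch below is only a simplified aid, not a replacement for careful interpretation: it runs a plain average-linkage clustering (our choice of linkage for the example, not necessarily the one used by a given implementation) and reports which cluster an anonymous text joins first. All names and distance values are invented for illustration.

```python
from itertools import combinations


def average_linkage(names, dist):
    """Agglomerative clustering; returns the merges in the order they happen."""
    clusters = [frozenset([name]) for name in names]
    merges = []

    def cluster_distance(a, b):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        merges.append((a, b))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


def first_partner(text, merges):
    """Return the cluster that `text` is merged with first (its dendrogram sibling)."""
    single = frozenset([text])
    for a, b in merges:
        if a == single:
            return b
        if b == single:
            return a
    return None


# Hypothetical texts: two each by known translators A and B, plus one anonymous text.
names = ["A_1", "A_2", "B_1", "B_2", "anon"]
translator = {"A_1": "A", "A_2": "A", "B_1": "B", "B_2": "B"}

# Invented distances, chosen so that "anon" lies close to translator A.
raw = {
    ("A_1", "A_2"): 0.20, ("B_1", "B_2"): 0.25,
    ("A_1", "B_1"): 0.90, ("A_1", "B_2"): 0.95,
    ("A_2", "B_1"): 0.92, ("A_2", "B_2"): 0.97,
    ("anon", "A_1"): 0.30, ("anon", "A_2"): 0.35,
    ("anon", "B_1"): 0.95, ("anon", "B_2"): 0.98,
}
dist = {frozenset(pair): value for pair, value in raw.items()}

merges = average_linkage(names, dist)
partner = first_partner("anon", merges)
candidates = {translator[name] for name in partner if name in translator}
# A single candidate suggests an attribution; several candidates mean the result is unclear.
print(sorted(partner), sorted(candidates))
```

If the partner cluster contains texts by exactly one known translator, the anonymous text has joined that translator's group; if it contains several translators, the groups have dispersed and no attribution should be drawn from the result.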
Luckily, the results of method 1 (exclusive words) and method 2 (MFW) mostly agreed in our attempts, at least for the philosophical corpus. The astronomical/astrological corpus, however, is not yet large enough for method 2. | |||
<headertabs /> | <headertabs /> | ||
{{Sprachauswahl|Tutorial for identification of anonymous arabic-latin translators (en)|Tutorial_Anonyme_Übersetzer}} | {{Sprachauswahl|Tutorial for identification of anonymous arabic-latin translators (en)|Tutorial_Anonyme_Übersetzer}} | ||