From 25 to 26 February 2016, the 13th Philtag workshop was hosted by KALLIMACHOS. Würzburg University's well-established conference series on the use of digital methods in the humanities offers an overview of, and an opportunity for scholarly exchange about, current trends, projects and technologies in the growing field of the digital humanities.
A key subject of the 13th Philtag was OCR methods, which were presented, discussed and tried out on the first day of the 2016 convention. After a welcome by Dr. Hans-Günter Schmidt and a short introduction to the KALLIMACHOS project, the first lecture by Dr. Uwe Springmann (CIS München) covered OCR of incunabula. Although these texts had so far been considered unsuitable for automated text recognition, new methods and approaches based on artificial neural networks have recently emerged. Subsequently, Dirk Wintergrün (MPIWG Berlin) highlighted the importance of OCR methods for research on the transmission of academic heritage and for capturing scientific records. Elisa Herrmann (OCR-D Wolfenbüttel) presented the new coordination project OCR-D, which intends to optimize the recognition of German prints from the 16th to the 19th century and to prepare new DFG funding lines. Dr. Syed Saqib Bukhari (DFKI Kaiserslautern) gave a glimpse of the new OCR system OCRopus++ developed at the DFKI, which is expected to reach recognition rates of 98% and higher for historical prints. Afterwards, Dr. Josep Lladós (CVC Barcelona) concluded the first block of lectures with a report on the automated extraction of written information from historical documents, marriage records and other genealogical sources, and on the use of the captured data to reconstruct historical social networks.
The afternoon program consisted of an OCR-themed interactive workshop: the workflows and tools used or developed by KALLIMACHOS were not only presented, but the audience also had the opportunity to try them out hands-on. First, Felix Kirchner and Marco Dittrich discussed the requirements and challenges of image capture and preprocessing, and the specifics of the fonts, types and glyphs used in historical documents. Prof. Dr. Frank Puppe and Christian Reul (Chair of Computer Science VI Würzburg) [...] presented the latest breakthroughs in the automated segmentation of text blocks. Benedikt Budig (Chair of Computer Science I Würzburg) presented the tool Glyph Miner, developed in-house for the simplified extraction of glyphs, which was met with excitement by the participants. Guided by the student assistants Phillip Beckenbauer and Maximilian Nöth, the attendees were able to explore the workings of Aletheia and Franken++, which serve as preprocessing tools for the glyph inventories used to train the OCR system Tesseract. The workshop ended with the creation and validation of the OCR results.
To conclude the evening, the participants were invited to dinner at the local restaurant Bürgerspital, where the impressions of the first conference day (and more) were discussed in lively fashion and numerous new contacts were made.
The second day was devoted to recent projects in the digital humanities at Würzburg University, with an emphasis on text mining procedures. Stefan Evert (FAU Erlangen-Nürnberg) illustrated the statistical basis of Burrows's Delta, the predominant stylometric measure for authorship attribution. Subsequently, Andreas Büttner presented the KALLIMACHOS subproject Identification of translators, in which Delta is used to identify previously anonymous translators of Arabic philosophical texts in the 12th century. Daniel Schlör, Stefanie Popp and Christof Schöch (Junior research group CLiGS) outlined the recognition of direct speech in French novels. Since these usually contain no quotation marks, direct speech has to be detected using other features. To this end, the project group uses machine learning methods. Markus Krug presented the research methods and first results of the KALLIMACHOS subproject Affectation of readers towards fictional characters. Here, characters in novels are automatically identified and annotated. The recovered information is then used to visualize character networks. For this purpose, not only the names of characters but also coreferences, for instance in the form of pronouns, have to be identified. Finally, Isabella Reger (KALLIMACHOS subproject Narrative techniques) explained how the atmospheric flow of a novel can be recognized by means of sentiment analysis and how this information can be used to distinguish literary genres.
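For readers unfamiliar with Burrows's Delta, the following minimal Python sketch illustrates the usual textbook formulation: relative frequencies of the most frequent words are z-standardized across the corpus, and the distance between two texts is the mean absolute difference of their z-scores. It is not the implementation used in the talks or by KALLIMACHOS. The function name `burrows_delta`, the parameter `n_words` and the use of the population standard deviation are assumptions made here for illustration only.

```python
from collections import Counter
from statistics import mean, pstdev


def burrows_delta(corpus, text_a, text_b, n_words=100):
    """Illustrative Burrows's Delta between two texts of a corpus.

    corpus: dict mapping a text name to its list of tokens (hypothetical layout).
    """
    # Pick the n most frequent words over the whole corpus.
    overall = Counter()
    for tokens in corpus.values():
        overall.update(tokens)
    vocabulary = [word for word, _ in overall.most_common(n_words)]

    # Relative frequency of each vocabulary word in each text.
    freqs = {}
    for name, tokens in corpus.items():
        counts = Counter(tokens)
        freqs[name] = [counts[word] / len(tokens) for word in vocabulary]

    # Mean absolute difference of z-scores; words with zero variance are skipped.
    differences = []
    for i in range(len(vocabulary)):
        column = [freqs[name][i] for name in corpus]
        sigma = pstdev(column)  # assumption: population standard deviation
        if sigma == 0:
            continue
        mu = mean(column)
        z_a = (freqs[text_a][i] - mu) / sigma
        z_b = (freqs[text_b][i] - mu) / sigma
        differences.append(abs(z_a - z_b))
    return mean(differences) if differences else 0.0
```

In an attribution setting, an anonymous text would typically be assigned to the candidate author whose texts yield the smallest Delta. The choice of word count and any tokenization are left open here.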
With an audience of at times nearly 80 people, an intense, focused working atmosphere and a very positive response from all participants, the 13th <philtag> can be considered a complete success. We would like to thank both the dedicated speakers and the audience most sincerely for the many suggestions and the active participation in the conference program. We look forward to welcoming you again soon.
(All software requires Windows 7 or higher.)