=<philtag n="13"/>=
 
[[File:Philtag13Main.jpg| 900px | link= | alt= Attendees of the 13th Philtag workshop]]
 
  
 
From 25 to 26 February 2016, the 13th Philtag workshop was hosted by KALLIMACHOS. Würzburg University's well-established conference series on the use of digital methods in the humanities offers an overview of current trends, projects and technologies in the growing field of Digital Humanities, as well as an opportunity for scholarly exchange about them.
 
 
=Conference report=
 
 
===Day 1: OCR===
 
<div class="tagungsbericht">
 
 
OCR methods were a key subject of the 13th Philtag and were presented, discussed and evaluated on the first day of the conference. After the welcome address by Dr. Hans-Günter Schmidt and a short introduction to the KALLIMACHOS project, the first lecture, by Dr. Uwe Springmann ([http://www.cis.uni-muenchen.de CIS München]), covered the topic ''OCR of incunabula''. Although these texts had so far been regarded as unsuitable for automated text recognition, new methods and approaches based on artificial neural networks have recently emerged. Subsequently, Dirk Wintergrün ([https://www.mpiwg-berlin.mpg.de MPIWG Berlin]) highlighted the importance of OCR methods for research on the transmission of academic heritage and the capture of scientific records. Elisa Herrmann ([http://www.ocr-d.de/ OCR-D Wolfenbüttel]) presented the new coordination project ''OCR-D'', which aims to optimize the recognition of German prints from the 16th to the 19th century and to prepare new DFG funding lines. Dr. Syed Saqib Bukhari ([http://www.dfki.de DFKI Kaiserslautern]) gave a glimpse of the new OCR system ''OCRopus++'' developed at the DFKI, which is expected to offer a recognition rate of 98% or higher for historical prints. Afterwards, Dr. Josep Lladós ([http://www.cvc.uab.es CVC Barcelona]) concluded the first block of lectures with a report on the automated extraction of written information from historical documents, marriage records and other genealogical sources, and on the use of the captured data to reconstruct historical social networks.
 
 
[[File: Vortrag1.jpg | link= | alt= Dr. Uwe Springmann and Dr. Hans-Günter Schmidt]]
 
[[File: Vortrag5.jpg | link= | alt= Presentation by Elisa Herrmann]]
 
[[File: Vortrag2.jpg | link= | alt= Presentation by Josep Lladós]]
 
 
The afternoon program consisted of an OCR-themed interactive workshop: the workflows and tools used and developed by KALLIMACHOS were not only presented; the audience also had the opportunity to try them out hands-on. First, Felix Kirchner and Marco Dittrich discussed the requirements and challenges of image capture and preprocessing and the specifics of the fonts, types and glyphs used in historical documents. Prof. Dr. Frank Puppe and Christian Reul ([http://www.is.informatik.uni-wuerzburg.de/startseite/ Chair of Computer Science VI Würzburg]) presented the latest breakthroughs in the automated segmentation of text blocks. Benedikt Budig ([http://www1.informatik.uni-wuerzburg.de/ Chair of Computer Science I Würzburg]) presented [https://github.com/benedikt-budig/glyph-miner Glyph Miner], a self-developed tool for the simplified extraction of glyphs, which was met with excitement by the participants. Guided by the student assistants Phillip Beckenbauer and Maximilian Nöth, the attendees were able to explore ''Aletheia'' and ''Franken+'', which serve as preprocessing tools for the glyph inventories used to train the OCR system ''Tesseract''. The workshop ended with the creation and validation of OCR results.
 
 
To conclude the day, all participants were invited to dinner at the local restaurant ''Bürgerspital'', where the impressions of the first conference day (and more) were discussed in lively fashion and numerous new contacts were made.
 
 
[[File: Vortrag6.jpg | link= | alt=The audience]]
 
[[File: Vortrag7.jpg | link= | alt= Presentation by Benedikt Budig]]
 
[[File: Vortrag8.jpg | link= | alt= Presentation by Benedikt Budig]]
 
 
===Day 2: Textmining===
 
The second day covered recent projects in the digital humanities at Würzburg University, with an emphasis on text mining procedures. Stefan Evert ([http://www.linguistik.uni-erlangen.de/index.shtml FAU Erlangen-Nürnberg]) illustrated the statistical basis of ''Burrows's Delta'', the predominant stylometric measure for authorship attribution. Subsequently, Andreas Büttner presented the KALLIMACHOS subproject [[Identifikation von Übersetzern|Identification of translators]], where ''Delta'' is used to identify previously anonymous translators of Arabic philosophical texts in the 12th century. Daniel Schlör, Stefanie Popp and Christof Schöch ([https://cligs.hypotheses.org/ Junior research group CLiGS]) outlined the recognition of direct speech in French novels. Since these usually contain no quotation marks, direct speech has to be detected using other features. To this end, the project group uses machine learning methods. Markus Krug presented the research methods and first results of the KALLIMACHOS subproject [[Romanfiguren | Affectation of readers towards fictional characters]]. Here, characters in novels are automatically identified and annotated, and the extracted information is then used to visualize character networks. This requires identifying not only the names of characters but also coreferences, e.g. in the form of pronouns. Finally, Isabella Reger (KALLIMACHOS subproject [[Narrative Techniken|Narrative techniques]]) explained how the atmospheric flow of a novel can be recognized by means of sentiment analysis and how this information can be used to distinguish literary genres.
 
 
===Conclusion===
 
Considering the attendance of up to 80 participants, the intense and focussed working atmosphere and the overall positive feedback, the 13th <philtag> has proven to be a great success. We would like to sincerely thank our lecturers as well as our audience. We are looking forward to the next installment of the workshop in 2017.
 
</div>
 
 
=Schedule=
 
==Day 1: OCR==
 
<div style="padding:0.4em">
 
{| class="wikitable" cellpadding="10"
 
| ca. 10:00
 
| Registration and greeting
 
|-
 
| 10:15-12:30
 
| Lectures
 
|-
 
| 10:15-10:30
 
| Hans-Günter Schmidt: ''KALLIMACHOS and the PhilTag, organisational information''
 
|-
 
| 10:30-10:50
 
| Uwe Springmann ([http://www.cis.uni-muenchen.de/personen/mitarbeiter/springmann/index.html CIS München]): ''OCR von Inkunabeln: Herausforderungen und Herangehensweisen''
 
|-
 
| 10:50-11:10
 
| Dirk Wintergrün ([https://www.mpiwg-berlin.mpg.de/de/users/dwinter MPIWG Berlin]): ''Von Handarbeit zur Massenware - OCR als Grundlage für die Forschung in der Wissenschaftsgeschichte''
 
|-
 
| 11:10-11:30
 
| Elisa Herrmann ([http://www.ocr-d.de/ OCR-D Wolfenbüttel]): ''OCR-D: Koordinierungsprojekt zur Weiterentwicklung von OCR-Verfahren''
 
|-
 
| 11:30-11:50
 
| ''Coffee break''
 
|-
 
| 11:50-12:10
 
| Syed Saqib Bukhari ([http://www.dfki.de/~dengel/content/index_ger.html DFKI Kaiserslautern]): ''OCRopus++: A High performance OCR System For Medieval Documents''
 
|-
 
| 12:10-12:30
 
| Josep Lladós ([http://www.cvc.uab.es/~josep/ CVC Barcelona]): ''Social networks of the past: information extraction from historical demographic documents''
 
|-
 
|12:30-13:30
 
| ''Lunch break''
 
|-
 
| 13:30-16:30
 
| OCR workshop: ''Hands-on presentation of the tools and workflows for OCR of early modern prints established at the Würzburg Center for Digitalisation''
 
|-
 
| 13:30-13:40
 
| Greetings, presentation of the basic problem
 
|-
 
| 13:40-14:45
 
| Segmentation, glyphs and letter inventories
 
|-
 
|14:45-15:00
 
| ''Coffee break''
 
|-
 
|15:00-16:00
 
| OCR training with ''Aletheia'' and ''Franken+''
 
|-
 
|16:00-16:30
 
| Validation of OCR output
 
|-
 
| 16:30-17:00
 
| Concluding discussion
 
|-
 
| from 19:30
 
| Dinner at the Restaurant [https://www.google.de/maps/place/Bürgerspital+Weingut/ Bürgerspital]
 
|}
 
</div>
 
 
==Day 2: Textmining==
 
<div style="padding:0.4em">
 
{| class="wikitable" cellpadding="10"
 
| 9:00-9:30
 
| Stefan Evert, Thomas Proisl ([http://www.linguistik.uni-erlangen.de/index.shtml FAU Nürnberg]): ''Burrows’s Delta verstehen''
 
|-
 
| 9:30-10:00
 
| Andreas Büttner, Thomas Proisl ([[Identifikation von Übersetzern |AG Identifikation von Übersetzern]]): ''Delta und Merkmalsselektion: Welche Wörter unterscheiden arabisch-lateinische Übersetzer?''
 
|-
 
| 10:00-10:30
 
| ''Coffee break''
 
|-
 
| 10:30-11:00
 
| Daniel Schlör, Stefanie Popp, Christof Schöch ([https://cligs.hypotheses.org/ Junior research group CLiGS]): ''Direkte Rede im französischen Roman: Automatische Erkennung und gattungsabhängige Verteilungen''
 
|-
 
| 11:00-11:30
 
| Markus Krug ([[Romanfiguren| AG Romanfiguren]]): ''Figuren und ihre Beziehungen in Romanen''
 
|-
 
| 11:30-12:00
 
| ''Coffee break''
 
|-
 
| 12:00-12:30
 
| Lena Hettinger, Isabella Reger ([[Narrative Techniken| AG Narrative Techniken]]): ''Mit Sentimentanalyse zum Happy End? Experimente zur Klassifikation literarischer Gattungen''
 
|}
 
</div>
 
 
<p>
 
* [[media:Tagungsplan.pdf| Downloadable conference schedule]]
 
</p>
 
 
=Materials=
 
==Schedule==
 
<!-- Tagungsplan und Materialien -->
 
<p>
 
* [[media:Tagungsplan.pdf| Downloadable conference schedule]]
 
</p>
 
 
==Abstracts and Presentations==
 
=== Day 1: OCR ===
 
* Uwe Springmann: OCR von Inkunabeln: Herausforderungen und Herangehensweisen.
 
** [[media:AbstractSpringmann.pdf | Abstract]]
 
** [[media:PresentationSpringmann.pdf | Presentation]]
 
* Elisa Herrmann: OCR-D: Koordinierungsprojekt zur Weiterentwicklung von OCR-Verfahren.
 
** [[media:AbstractHerrmann.pdf | Abstract]]
 
** [[media:PresentationHerrmann.pdf | Presentation]]
 
* Josep Lladós: Social networks of the past: information extraction from historical demographic documents
 
** [[media:AbstractLlados.pdf | Abstract]]
 
<!--** [[media:PresentationLlados.pdf | Presentation]]-->
 
* Dirk Wintergrün (MPIWG Berlin): Von Handarbeit zur Massenware - OCR als Grundlage für die Forschung in der Wissenschaftsgeschichte.
 
** [[media:AbstractWintergrün.pdf | Abstract]]
 
* Syed Saqib Bukhari (DFKI Kaiserslautern): OCRopus++: A High performance OCR System For Medieval Documents.
 
** [[media:AbstractBukhari.pdf | Abstract]]
 
** [[media:PresentationBukhari.pdf | Presentation]]
 
* Marco Dittrich, Felix Kirchner <!--,Christian Reul, Benedikt Budig, Phillip Beckenbauer und Maximilian Nöth--> (JMU Würzburg): Presentation accompanying the OCR workshop.
 
** [[media:OCRWorkshop.pdf | Presentation]]
 
* Christian Reul (JMU Würzburg): Segmentierung von historischen Drucken.
 
** [[media:AbstractReul.pdf | Abstract]]
 
* Benedikt Budig (JMU Würzburg): Erstellung von Typeninventaren mit ''Glyph Miner''.
 
** [[media:AbstractBudig.pdf | Abstract]]
 
* Phillip Beckenbauer (JMU Würzburg): Extraktion von Glyphen mit ''Aletheia''.
 
** [[media:VortragBeckenbauer.pdf | Presentation and Exercises]]
 
* Maximilian Nöth (JMU Würzburg): Erstellen von Trainingsdaten mit ''Franken+''.
 
** [[media:VortragNoeth.pdf | Presentation and Exercises]]
 
 
===Day 2: Textmining===
 
<p>
 
* Stefan Evert, Thomas Proisl (FAU Nürnberg): Understanding Burrows’s Delta.
 
** [[media:AbstractFAU.pdf | Abstract]]
 
** [[media:PresentationFAU.pdf | Presentation]]
 
* Andreas Büttner, Thomas Proisl (AG Identifikation von Übersetzern): Delta and feature selection: Which words distinguish Arabic-Latin translators?
 
** [[media:AbstractÜbersetzer.pdf | Abstract]]
 
** [[media:PresentationÜbersetzer.pdf | Presentation]]
 
* Daniel Schlör, Stefanie Popp, Christof Schöch (Junior research group CLiGS): Direct speech in French novels: automatic recognition and genre-specific distributions.
 
** [[media:AbstractDirekteRede.pdf | Abstract]]
 
** [[media:PresentationDirekteRede.pdf | Presentation]]
 
* Lena Hettinger, Isabella Reger (AG Romangattungen): Mit Sentimentanalyse zum Happy End? Experimente zur Klassifikation literarischer Gattungen.
 
** [[media:AbstractGattungen.pdf | Abstract]]
 
** [[media:PresentationGattungen.pdf | Presentation]]
 
</p>
 
 
==Software and Data for the OCR workshop==
 
<p>
 
* [http://primaresearch.org/tools/Aletheia/Editions PRImA Aletheia Lite]
 
* [http://emop.tamu.edu/sites/all/themes/bluemasters/docs/Franken+.zip EMOP Franken+]
 
* [http://vietocr.sourceforge.net/ VietOCR]
 
* [http://folk.uib.no/hnooh/mufi/fonts/index.html#Andron Andron Scriptor Web (MUFI TrueType Font)]
 
* [[media:PhilTag_OCR-Workshop_Installation-Help.pdf|Installation notes (PDF)]]
 
* [[media:PhilTag_OCR-Workshop_Daten.zip|Sample data for Aletheia and Franken+ (ZIP)]]
 
</p>
 
<p>
 
(All software requires Windows 7 or higher.)
 
</p>
 
<headertabs />
 
{{Sprachauswahl|Philtag 13 (english)|Philtag 13}}
 

Revision as of 21 January 2018, 10:33