KallimachosEngines: Difference between revisions




Revision as of 16 May 2017, 12:48

KallimachosEngines

https://gitlab2.informatik.uni-wuerzburg.de/kallimachos/KallimachosEngines

This repository is a collection of NLP tools created during the Kallimachos project. Each tool comes with an Apache UIMA analysis engine, so its results are produced as annotations.

The components created during Kallimachos were written to be compatible with DKPro Core; where necessary, a type conversion is integrated.


Content

It currently contains the following components:


DROC-Tagger (named-entity and character-reference detection), work in progress


DROC-Tagger

The repository currently contains the DROC-Tagger, a named-entity tagger that uses word-embedding features derived from 160,000,000 tokens of German novels.


NLP For Everyone

If you want to try our components and do not mind a 450 MB download, you can download the fat JAR. Once it is finished, Nappi will replace this all-in-one component.


Usage of the Kallimachos-Preprocessing Jar

Download the .zip and extract it into any folder on your local filesystem. Prepare a folder containing the (text) documents to be processed, as well as a folder that will store the output documents.

Next, configure the .ini to your needs. If a line starts with '#', the engine on that line is skipped.
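The skipping rule can be illustrated with a minimal Python sketch. It is not the parser used by the Jar: the actual layout of the .ini is not shown here, so the sketch assumes one engine per line, and the engine names in the example are hypothetical. Ignoring leading whitespace and blank lines is also an assumption.

```python
def active_engines(ini_text):
    """Return the lines of an .ini that are not commented out with '#'."""
    engines = []
    for raw_line in ini_text.splitlines():
        line = raw_line.strip()
        # Blank lines carry no engine; lines starting with '#' are skipped.
        if not line or line.startswith("#"):
            continue
        engines.append(line)
    return engines


# Hypothetical engine names, for illustration only.
example_ini = """\
Tokenizer
#CompoundSplitter
POSTagger
"""

print(active_engines(example_ini))
```

In this example the commented-out second line is dropped, so only the first and third engines remain active.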

Assume you created a folder "in" and a folder "out" right next to the .jar.

This configuration of the .ini then sets up the full pipeline, consisting of:


Tokenizing
Sentence Splitting
Compound Word Splitting
Paragraph-Detection
POS-Tagging
Morphology-Tagging (Number|Gender|Person)
Lemmatizing
Chunking
Dependency-Parsing
Named-Entity Detection using the DROC-Tagger
Coreference Resolution
Relation detection
Output written as CONLL tab format
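To work with the output files, a small reader along the following lines may be enough. This is a sketch under assumptions: it relies only on the usual CoNLL convention of one token per line, tab-separated columns, and a blank line between sentences. The column layout written by the pipeline is not specified here, so the columns in the sample are hypothetical.

```python
def read_conll(text):
    """Split CoNLL-style text into sentences, each a list of column lists."""
    sentences = []
    current = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence, if there is one.
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(line.split("\t"))
    # Keep the last sentence when the text does not end with a blank line.
    if current:
        sentences.append(current)
    return sentences


# Hypothetical columns: index, token, lemma, POS tag.
sample = "1\tDer\tder\tART\n2\tHund\tHund\tNN\n\n1\tEr\ter\tPPER\n"

for sentence in read_conll(sample):
    print([row[1] for row in sentence])
```

Each sentence is returned as a list of rows, so a given column can be picked by index once the real layout of the output is known.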