KallimachosEngines
KallimachosEngines-Repository
https://gitlab2.informatik.uni-wuerzburg.de/kallimachos/KallimachosEngines
This repository is a collection of NLP tools created during the Kallimachos project. Each tool comes with an Apache UIMA analysis engine and is therefore built to create annotations.
The components created during Kallimachos were written to be compatible with DKPro Core, integrating a type conversion where needed.
Content
It currently contains the following components:
DROC-Tagger (named-entity and character-reference detection)
WIP
DROC-Tagger
The repository currently contains the DROC-Tagger, a named-entity tagger that uses word-embedding features derived from 160,000,000 tokens of German novels.
NLP For Everyone
If you want to try our components and do not mind downloading 450 MB, you can download the Fat-Jar. Once it is finished, Nappi will replace this all-in-one component.
Usage of the Kallimachos-Preprocessing Jar
Download the .zip and extract it into any folder on your local filesystem. Have ready a folder with the (text) documents to be processed, as well as a folder that will store the output documents.
Next, configure the .ini to your needs. If a line starts with '#', the corresponding engine is skipped.
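The '#' rule can be illustrated with a minimal Python sketch. It is not the tool's own parser: the function name `active_lines` and the sample lines are hypothetical, and ignoring leading whitespace before the '#' is an assumption. The real keys are the ones found in the shipped .ini.

```python
def active_lines(ini_text):
    """Return the lines of an .ini text that are not commented out with '#'."""
    active = []
    for raw in ini_text.splitlines():
        # Assumption: leading/trailing whitespace is ignored before checking '#'.
        line = raw.strip()
        # Skip blank lines and lines starting with '#', mirroring the rule above.
        if not line or line.startswith("#"):
            continue
        active.append(line)
    return active


# Hypothetical content; these placeholder names are not the real .ini keys.
example = "engine_a\n#engine_b\nengine_c\n"
print(active_lines(example))  # the '#engine_b' line is dropped
```

Commenting a line out this way leaves it in the file, so an engine can be re-enabled later by removing the '#'.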
The following assumes that you created a folder "in" and a folder "out" right next to the .jar.
With no engine line commented out, the .ini configures the full pipeline, consisting of:
Tokenizing
Sentence Splitting
Compound Word Splitting
Paragraph-Detection
POS-Tagging
Morphology-Tagging (Number|Gender|Person)
Lemmatizing
Chunking
Dependency-Parsing
Named-Entity Detection using the DROC-Tagger
Coreference Resolution
Relation detection
Output written in CoNLL tab format
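To work with the output of the last step, a small reader along the following lines may be enough. This is only a sketch: it assumes the usual CoNLL conventions of one token per line, tab-separated columns, and a blank line between sentences. The column order written by this pipeline is not documented here, so the function `read_conll` (a hypothetical name) returns the raw columns without interpreting them.

```python
def read_conll(text):
    """Split tab-separated CoNLL-style text into sentences of token rows."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # Assumption: a blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        # Each token row is kept as a plain list of column strings.
        current.append(line.split("\t"))
    if current:
        sentences.append(current)
    return sentences


# Made-up two-column sample; real output will have more columns.
sample = "1\tDer\n2\tHund\n\n1\tJa\n"
for sentence in read_conll(sample):
    print(len(sentence), "tokens")
```

Keeping the rows as plain lists avoids committing to a column layout until the actual output files have been inspected.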