Projects

An overview of projects I have worked on, with pointers to software, datasets, and other associated resources.

Historical Text Normalization

I’ve worked extensively on machine learning approaches to historical text/spelling normalization, which ultimately became the topic of my PhD thesis.

Tools & Resources for Historical Text Normalization

A repository containing datasets, utility scripts, and instructions on how to use various tools to perform normalization.

CorA (Corpus Annotator)

A web-based annotation tool for word-level annotation of historical and other non-standard language data. It was originally developed to annotate historical texts for the Anselm and ReF corpora, but has since been used for a variety of other projects, including the annotation of social media data.

Norma (Normalization Tool)

A tool for automatic spelling normalization of non-standard language data. It was originally developed for use with historical documents in the Anselm project. I first wrote it in Python; it was later ported to C++ (with optional bindings for Python 2.x) with the help of Florian Petran.

Morphological Representations

In 2019, I was awarded an MSCA Individual Fellowship to work on “Morphologically-Informed Representations for NLP” (MorphIRe). More information will be published here over time.

Websites

I’ve designed and maintained several websites, as the intersection of design and technology has always been an interest of mine.

ACL Anthology

I’m Site Development Lead for the ACL Anthology and have implemented the recent static rewrite, including some design and layout changes.

Research Projects & Conferences

Text Generation

I worked briefly on natural language generation during my Master’s studies.

SimpleNLG for German

An adaptation of the SimpleNLG library for natural language generation, written in Java and created as part of my Master’s studies. It is in dire need of an update for the current SimpleNLG v4 framework, and it also needs a lexical resource (not provided) to inflect words properly.

But wait, there’s more…

Occasionally, I contribute to other open-source software projects or publish some of my own. You can visit my GitHub profile to see all my contributions.