Projects

An overview of projects I have worked on, with pointers to software, datasets, and other associated resources.

Historical Text Normalization

I’ve worked extensively on machine learning approaches to historical text/spelling normalization, which ultimately became the topic of my PhD thesis.

Tools & Resources for Historical Text Normalization

A repository containing datasets, utility scripts, and instructions on how to use various tools to perform normalization.

CorA (Corpus Annotator)

A web-based annotation tool for word-level annotation of historical and other non-standard language data. It was originally developed to annotate historical texts for the Anselm and ReF corpora, but has since been used for a variety of other projects, including the annotation of social media data.

Norma (Normalization Tool)

A tool for automatic spelling normalization of non-standard language data. It was originally developed for use with historical documents in the Anselm project. I first wrote it in Python; it was later ported to C++ (with optional bindings for Python 2.x) with the help of Florian Petran.

Text Generation

I worked briefly on natural language generation during my Master’s studies.

SimpleNLG for German

An adaptation of the SimpleNLG library for natural language generation, written in Java, and created as part of my studies for my Master’s degree. It has been superseded by the SimpleNLG-DE library, but my original adaptation is still provided for archival reasons.

Morphological Representations

In 2019, I was awarded an MSCA Individual Fellowship to work on “Morphologically-Informed Representations for NLP” (MorphIRe).

This resulted in a large-scale analysis of the role of morphology for error analysis in NLP, which was awarded “Best Long Paper” at EACL 2021. I have also worked on word segmentation algorithms in highly multilingual settings (forthcoming), and contributed to a meta-study of how NLP researchers cite older literature.

The project received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 845995.

Websites

I’ve designed and maintained several websites, as the intersection of design and technology has always been an interest of mine.

ACL Anthology

I’m Site Development Lead for the ACL Anthology and implemented the 2019 static rewrite, including some design and layout changes.

Research Projects & Conferences

…and even more

Occasionally, I contribute to other open-source software projects or publish some of my own. You can visit my GitHub profile to see all my contributions.