Swiss German is a dialect continuum whose dialects differ substantially from Standard German, the official language of the German-speaking part of Switzerland. Nevertheless, natural language processing for Swiss German usually takes a detour through Standard German. As writing in Swiss German has become increasingly popular in recent years, we would like to provide data and resources that serve as a stepping stone toward processing the dialects automatically.

We compiled NOAH's Corpus of Swiss German Dialects, which covers various text genres and is manually annotated with part-of-speech tags. Furthermore, we used this corpus as the training set for a statistical part-of-speech tagger (BTagger) and achieved an accuracy of 90.62%.
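
For readers unfamiliar with the metric, tagging accuracy is commonly computed as the share of tokens whose predicted tag matches the manually assigned one. The following is a minimal sketch under that assumption; the function name `token_accuracy` and the toy tag sequences are hypothetical illustrations, not part of NOAH's Corpus or BTagger, and the evaluation setup behind the reported figure may differ.

```python
from typing import Sequence


def token_accuracy(
    gold: Sequence[Sequence[str]],
    predicted: Sequence[Sequence[str]],
) -> float:
    """Return the fraction of tokens whose predicted tag equals the gold tag.

    Both arguments are lists of sentences, each sentence a list of tags.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of sentences")

    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("each sentence pair must contain the same number of tokens")
        # Count positions where the predicted tag agrees with the gold tag.
        correct += sum(g == p for g, p in zip(gold_sent, pred_sent))
        total += len(gold_sent)

    if total == 0:
        raise ValueError("cannot compute accuracy on zero tokens")
    return correct / total


# Toy example with made-up tag sequences (hypothetical, for illustration only).
gold_tags = [["PPER", "VVFIN", "ART", "NN"], ["ART", "NN", "VAFIN", "ADJD"]]
predicted_tags = [["PPER", "VVFIN", "ART", "NN"], ["ART", "NE", "VAFIN", "ADJD"]]

accuracy = token_accuracy(gold_tags, predicted_tags)
print(f"{accuracy:.2%}")  # 7 of 8 tags match
```

Counting over tokens rather than sentences means a single wrong tag lowers the score only slightly, which is why token-level accuracy is usually much higher than the share of fully correct sentences.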

The latest version of NOAH's Corpus can be downloaded here.

The BTagger model trained on NOAH's Corpus can be found here.