Jekyll2026-01-26T09:15:11+00:00https://dialect-erc.github.io//feed.xmlDIALECTNatural Language Understanding for non-standard languages and dialects Keynote Talk at ACL 20242024-08-14T00:00:00+00:002024-08-14T00:00:00+00:00https://dialect-erc.github.io//news/Keynote-Talk-at-ACL-2024<p>At <a href="https://2024.aclweb.org/program/keynotes/#barbara-plank">ACL 2024</a> Prof. Barbara Plank gave a keynote presentation on the topic “Are LLMs Narrowing Our Horizon? Let’s Embrace Variation in NLP!”</p> <p>While acknowledging the remarkable achievements in NLP and their increasing integration into society, Barbara highlighted concerns about the field becoming more homogeneous. She presented a compelling case for embracing variation across three essential dimensions: model inputs, outputs, and research approaches. This strategy, she argued, is key to developing more trustworthy and innovative human-facing NLP systems.</p> <p><img src="/assets/img/news/acl-talk2.jpg" alt="Keynote Talk at ACL 2024" class="object-cover object-center w-full" itemprop="image" /></p>At ACL 2024 Prof. Barbara Plank gave a keynote presentation on the topic “Are LLMs Narrowing Our Horizon? Let’s Embrace Variation in NLP!”Natural Language Processing For Bavarian2024-04-17T00:00:00+00:002024-04-17T00:00:00+00:00https://dialect-erc.github.io//news/Natural-Language-Processing-for-Bavarian<p>We are proud to present our recent research on NLP for Bavarian / <strong>NLP fi Bairisch</strong>!</p> <p>Dialects have long been a blind spot for NLP research, which has focused largely on the ‘standard’ language variant(s). With this project, we aim to help close this gap.</p> <p>Papers accepted to appear at LREC-COLING 2024 in Turin this year:</p> <ul> <li> <p>Siyao Peng, Zihang Sun, Huangyan Shan, Marie Kolm, Verena Blaschke, Ekaterina Artemova and Barbara Plank. <em>Sebastian, Basti, Wastl?! 
Recognizing Named Entities in Bavarian Dialectal Data.</em> In LREC-COLING 2024.</p> </li> <li> <p>Verena Blaschke, Barbara Kovačić, Siyao Peng, Hinrich Schütze and Barbara Plank. <em>MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank</em>. In LREC-COLING 2024.</p> </li> <li> <p>Miriam Winkler, Virginija Juozapaityte, Rob van der Goot and Barbara Plank. <em>Slot and Intent Detection Resources for Bavarian and Lithuanian: Assessing Translations vs Natural Queries to Digital Assistants.</em> In LREC-COLING 2024.</p> </li> </ul>We are proud to present our recent research on NLP for Bavarian / NLP fi Bairisch!Survey: Corpora For Germanic Low Resource Language Varieties2023-05-01T00:00:00+00:002023-05-01T00:00:00+00:00https://dialect-erc.github.io//news/Survey:-Corpora-for-Germanic-low-resource-language-varieties<p>What corpora are available for Germanic low-resource language varieties?</p> <p>We presented a survey and <a href="https://github.com/mainlp/germanic-lrl-corpora">repository</a> for Germanic low-resource language varieties at NoDaLiDa 2023 in Tórshavn, Faroe Islands on May 22nd-24th, 2023:</p> <ul> <li>Verena Blaschke, Hinrich Schütze and Barbara Plank. <a href="https://aclanthology.org/2023.nodalida-1.41/">A Survey of Corpora for Germanic Low-Resource Languages and Dialects.</a> In NoDaLiDa 2023.</li> </ul>What corpora are available for Germanic low-resource language varieties?Language Technologies For Digital Inclusion2023-04-29T00:00:00+00:002023-04-29T00:00:00+00:00https://dialect-erc.github.io//news/Language-technologies-for-digital-inclusion<p>Featured in the LMU news: “Barbara Plank researches natural language processing (NLP) at LMU. 
She works on language technologies and artificial intelligence with a strong focus on human concerns.”</p> <p>Read the article in English: <a href="https://www.lmu.de/en/newsroom/news-overview/news/language-technologies-for-digital-inclusion.html">Language technologies for digital inclusion</a> - or in German: <a href="https://www.lmu.de/de/newsroom/newsuebersicht/news/sprachtechnologien-fuer-die-digitale-teilhabe.html">Sprachtechnologien für die digitale Teilhabe</a></p>Featured in the LMU news: “Barbara Plank researches natural language processing (NLP) at LMU. She works on language technologies and artificial intelligence with a strong focus on human concerns.”On Ground Truth In Machine Learning: Human Label Variation2022-12-13T00:00:00+00:002022-12-13T00:00:00+00:00https://dialect-erc.github.io//news/On-Ground-Truth-in-machine-learning:-Human-Label-Variation<p>The problem of <em>human label variation</em> arises in AI when human annotators assign different valid labels to the same item. This is a ubiquitous problem in AI in general, and especially pronounced in problems where language is involved, since language itself is ambiguous (among other reasons). Yet, most AI systems today are trained on the assumption that there exists a single <em>ground truth</em>, i.e., a single valid interpretation per item.</p> <p>We presented <a href="https://twitter.com/MaiNLPlab/status/1600795488605073409">several papers at EMNLP 2022</a> in Abu Dhabi that challenge this assumption of a single ground truth and look at human label variation. Here are some selected highlights:</p> <ul> <li> <p>Barbara Plank. <a href="https://aclanthology.org/2022.emnlp-main.731/"><em>The “Problem” of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation.</em></a> In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP 2022.</p> </li> <li> <p>Joris Baan, Wilker Aziz, Barbara Plank and Raquel Fernández. 
<a href="https://aclanthology.org/2022.emnlp-main.124/"><em>Stop Measuring Calibration When Humans Disagree</em>.</a> In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP 2022.</p> </li> </ul> <p>The first is a position paper on the problem of human label variation. The second looks at calibration under the lens of human label variation: Calibration is a popular framework to evaluate whether a neural network knows when it does not know, i.e., whether its predictive probabilities are a good indication of how likely a prediction is to be correct. Correctness is commonly estimated against the human majority class (a single ground truth). What does this mean in light of human label variation? Read more here:</p> <p><a href="https://twitter.com/mxmeij/status/1601832608073388032">Image credits: Max Müller-Eberstein</a></p>The problem of human label variation arises in AI when human annotators assign different valid labels to the same item. This is a ubiquitous problem in AI in general, and especially pronounced in problems where language is involved, since language itself is ambiguous (among other reasons). Yet, most AI systems today are trained on the assumption that there exists a single ground truth, i.e., a single valid interpretation per item.
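<p>To make the calibration idea above concrete: it is commonly measured with a binned expected calibration error (ECE), which compares a model’s confidence to its accuracy against the majority-vote label. The sketch below is a rough illustration of that standard metric (not the method of the paper); the function name and the data are hypothetical.</p>

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: bin-size-weighted average of |accuracy - confidence|.

    probs: (N, C) predicted class probabilities.
    labels: (N,) majority-vote labels -- the single-ground-truth
    assumption that the EMNLP 2022 paper calls into question.
    """
    confidences = probs.max(axis=1)          # model's top probability
    predictions = probs.argmax(axis=1)       # predicted class
    correct = (predictions == labels).astype(float)

    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            # gap between empirical accuracy and mean confidence in this bin
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece
```

<p>A perfectly calibrated model has ECE 0; note that every step above presumes one correct label per item, which is exactly what breaks down when annotators legitimately disagree.</p>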