CLARIAH-NL | Common Lab Research Infrastructure for the Arts and Humanities | https://clariah.nl

Challenge Call 2026 process of TDCC-SSH now online
Thu, 12 Mar 2026 15:42:07 +0000 | https://clariah.nl/challenge-call-2026-process-of-tdcc-ssh-now-online/

The next round of the NWO TDCC Call has been announced. Information about the project process for the Thematic DCC Social Sciences & Humanities is now available online – take a look and get involved!

The TDCCs will once again work through a community-based process for the upcoming NWO TDCC Call 2026. To submit a proposal to the NWO for the social sciences and humanities domain, you must first complete this process and be selected.

The project process will broadly follow the same steps as last year: it will guide you from submitting an initial idea as an expression of interest to submitting your proposal to the NWO.

Postdoc Job Opportunity: Postdoc in Artificial Intelligence at UG
Thu, 26 Feb 2026 14:37:54 +0000 | https://clariah.nl/postdoc-job-opportunity-postdoc-in-artificial-intelligence-at-ug/

Application deadline: March 15, 2026.

Join the HAICu project at the University of Groningen and help transform how we engage with history. As a Postdoc in Artificial Intelligence, you will advance continual machine learning and human-in-the-loop systems to analyze diverse digital heritage collections. This multidisciplinary position offers opportunities to collaborate with leading researchers and major Dutch cultural institutions, with a focus on developing algorithms for deep, multimodal understanding of historical documents. If you are a motivated researcher eager to connect AI and the Humanities in a renowned research environment, we encourage you to apply to help make cultural heritage more accessible and advance AI research.

What are you going to do?

In this role, you will lead the development of machine learning and deep learning algorithms with continual learning capabilities to improve the analysis of complex historical manuscripts. As part of the HAICu research project, you will collaborate with PhD researchers and digital humanities experts to design systems that integrate human labeling and machine-based clustering. Your work will address multimodal challenges, including layout analysis, logical reading order, and the detection of specialized textual and graphic patterns. In addition to research, you will publish your findings in international journals and make your algorithms available to project partners under open-access conditions.

Find all details and info on the Academic Transfer website.

Postdoc Job Opportunity: HAICu Postdoc position at the RU (WP4)
Thu, 08 Jan 2026 08:29:49 +0000 | https://clariah.nl/postdoc-job-opportunity-haicu-postdoc-position-at-the-ru-wp4/

Do you want to explore how Explainable AI can support cultural heritage institutes and ensure that AI innovations can make a meaningful societal impact? If so, we invite you to apply for this Postdoctoral Researcher position in Explainable AI for Cultural Heritage at Radboud University.

Application deadline: 13 January 2026.

As a postdoctoral researcher, you will conduct research focused on questions that are related to the mission of the National Library of the Netherlands. You will also contribute to the supervision of a PhD candidate working on XAI and metadata creation, including both image and text analysis. You will organise benchmarks that assess AI technologies, particularly in the domain of newspapers, both current and historic. Further, you will contribute to joint initiatives with other project partners, which aim to improve the explainability of AI approaches and connect the development of AI technology to the needs of the cultural heritage sector and society as a whole. 

The postdoc project is part of the larger HAICu project, a Netherlands-wide initiative on AI and cultural heritage. In the HAICu project, AI researchers, Digital Humanities researchers, heritage professionals, journalists and engaged citizens collaborate to realise scientific breakthroughs in accessibility and contextualisation of massive multimodal digital heritage collections. The challenges of these collections offer a unique opportunity to take AI to the next level. Future AI technologies should be applicable outside laboratories and be able to learn from sparse examples, at the same time learning continuously from users. The technology of HAICu pays attention to present-day societal demands with respect to responsible and explainable approaches to multimodal narratives based on the Netherlands’ rich cultural heritage collections.

The postdoctoral project will focus on developing Explainable AI (XAI) technologies tailored to the CH domain, and in particular on the collections at the National Library. The XAI techniques will not only enhance understanding of AI-driven decisions but also support informed decision-making by explicitly expressing the limitations and uncertainties inherent in automated analysis. Ultimately, this work will contribute to more responsible and context-aware applications of AI in preserving and interpreting cultural heritage.

Find all details and info here.

Reflections from CLARIAH-NL Community Connect 2025
Thu, 11 Dec 2025 10:05:55 +0000 | https://clariah.nl/reflections-from-clariah-nl-community-connect-2025/

On 3 December 2025, researchers, data specialists, heritage professionals, infrastructure providers, and policymakers gathered in Utrecht for CLARIAH-NL’s Community Connect 2025. This year’s programme combined strategic consultation, expert discussion, and practical knowledge exchange, all centred on a single overarching question: How should CLARIAH-NL evolve to meet the needs of researchers and cultural institutions in the next decade?

A Morning Dedicated to Strategy: Building the CLARIAH Roadmap 2026–2030

The invitation-only morning session launched with an update on CLARIAH’s current organisational phase: the consortium agreement is being finalised, membership fees are being implemented, and the infrastructure, which is by nature always “under construction”, is entering a new phase of long-term sustainability.

Roeland Ordelman outlined the perspectives shaping the forthcoming CLARIAH Roadmap 2026–2030, touching on methodological requirements (scholarly primitives), organisational realities (distributed infrastructure components and partners such as SURF), and strategic themes across the SSH sector, including AI, new data types, public values, and societal challenges such as climate and health.

Participants then joined one of three structured roundtables: Research, Landscape, and Resources, each tasked with identifying pressing gaps and articulating short-, medium-, and long-term priorities.

Research Roundtable: Emerging Fields, Interdisciplinarity, and AI

The Research Roundtable highlighted several domains insufficiently supported by current infrastructure: art and visual culture, archaeology, digital-born materials, and multimodal or experiential data such as games and social media. Participants emphasised that methodological boundaries, not just disciplinary ones, continue to limit cross-domain research.

Several insights emerged:

  • Interdisciplinary collaboration benefits from co-location and hands-on exchange between researchers and engineers, as demonstrated by past Media Suite collaborations.
  • AI is becoming both a tool and an object of study, raising questions about safe environments, methodological transparency, and the risks of expanding “black box” processes without adequate training.
  • Fragmentation across tools, datasets, and institutions remains a fundamental obstacle for researchers who need integrated workflows rather than isolated resources.

Short-term recommendations ranged from community workshops and AI literacy efforts to voucher systems providing hands-on engineering support. Medium-term priorities called for expanded compute access, while long-term ambitions emphasised disciplinary widening, critical digital humanities, and stronger integration in the European landscape.

Landscape Roundtable: Bridging National and European Ecosystems

The Landscape Roundtable addressed the structural connections between CLARIAH-NL, national partners, and European infrastructures. Representatives from NDE, NWO, SURF, EHRI, E-RIHS, ODISSEI, and TDCC-SSH discussed the persistent challenge of silos—institutional, disciplinary, technical, and organisational.

Key themes included:

  • The need for clearer role definition for CLARIAH-NL within the broader ecosystem.
  • A shift toward two-way communication, where user needs actively shape infrastructure development.
  • The challenge of data afterlife, particularly in heritage and scientific contexts.
  • The importance of shared standards and vocabularies, without resorting to one-size-fits-all approaches that may not suit sensitive or specialised data.
  • Recognising that infrastructure must be more than a project lifecycle; it must become a durable public good.

There needs to be more focus on understanding user needs, defining CLARIAH’s role, building bridges across infrastructures, and ensuring sustainable collaboration.

Resources Roundtable: Interoperability, New Data Types, and AI Readiness

The Resources Roundtable focused on the infrastructure’s building blocks: data, tools, workflows, standards, and their usability across the humanities.

Several core gaps emerged:

  • Interoperability and Metadata Quality
    Participants agreed that true interoperability—particularly the “I” of FAIR—is still the most challenging dimension. Metadata must cover semantics, usage, provenance, and quality. Without this, even excellent resources remain isolated.
  • A Humanities Knowledge Graph
    A medium-term ambition is the development of a national Humanities Knowledge Graph, supported by a lifecycle management system and anchored in authoritative institutions. This would provide a foundation for cross-collection analysis, discovery, and responsible integration of LLMs and RAG-based tools.
  • New Data Types and Data Lifecycle Management
    Humanities research increasingly relies on born-digital materials, synthetic datasets, trained models, 3D objects, digital twins, HTR output, streaming data, and user-generated annotations. CLARIAH must plan for long-term stewardship, storage, and accessibility of these materials.
  • AI Literacy and Provenance
    The group highlighted the need for training and documentation that foreground data provenance and “data genealogy,” especially when AI is applied. Given the long history of entity recognition and other automated methods in the humanities, participants emphasised continuity rather than novelty.

Short-term work will focus on interoperability at metadata level; longer-term ambitions include trust networks that ensure robustness and transparency across the whole ecosystem.

Afternoon Keynote: Rethinking Digital Infrastructure

In the open afternoon programme, Melvin Wevers (UvA) delivered a keynote titled Rethinking Digital Infrastructure: What Computational Humanities Truly Need. Wevers spoke candidly about the practical challenges computational historians encounter when attempting to apply large-scale analytical methods to heritage collections.

He identified persistent gaps in:

  • Discovery: metadata overviews without data-level access
  • Access: lack of bulk availability, unclear documentation, copyright barriers
  • Compute: insufficient integration between heritage collections and computational environments

Wevers recommended a shift toward API-first design, shared models as infrastructure, providing compute resources, documenting with working code and integration over invention.

Parallel Sessions: Learn & Teach, AI & Humanities Research, and Data Flows & Data Stories

Learn & Teach

Discussions focused on integrating CLARIAH tools and datasets into humanities curricula. Examples from the University of Groningen’s DH programmes highlighted the importance of small, modular teaching assets, scaffolding, and the challenge of “de-Googling” and now “de-AI-ing” students. Participants stressed the need for sustainable platforms, greater discoverability across learning resources, and interfaces that meet students where they already work.

AI & Humanities Research

This session addressed methodological assumptions embedded in data creation and AI usage. Presenters underscored that data are never “raw” and always shaped by human perspectives. Work on linked open data and retrieval-augmented generation (RAG) demonstrated how LLMs might enhance cultural heritage accessibility, provided that provenance and conceptual modelling remain central.

Data Flows & Data Stories

Participants examined how data stories can make research methods transparent and support reproducibility. A case study on propaganda in the Netherlands during World War II illustrated how combining newspapers and radio collections in the Media Suite can reveal historical dynamics otherwise invisible in traditional narratives. The session emphasised interactive visualisation, metadata transparency, and future publication possibilities within CLARIAH.

Looking Ahead: Toward a Shared National Narrative

The closing plenary reinforced a core insight from the entire day: CLARIAH-NL must articulate a compelling narrative about the role of humanities infrastructure in addressing societal challenges, AI developments, and long-term stewardship of cultural data.

Several overarching themes resonated across all discussions:

  • Fragmentation must be actively addressed technically, institutionally, and socially.
  • AI literacy and responsible practice will become central to all future infrastructure.
  • Sustainability beyond projects requires structural partnerships and coherent national positioning.
  • Community is both the driver and beneficiary of infrastructure, and continuous dialogue is essential.
  • The humanities bring indispensable critical perspectives to AI, datafication, and societal change, and must claim a central position in national and European discourse.

As CLARIAH-NL moves into the next phase of roadmap development, the insights from Community Connect 2025 will feed directly into strategic planning, ensuring that the infrastructure continues to evolve in step with the needs of scholars, cultural institutions, and society at large.


Thank you for joining us

To those who were part of the CLARIAH Community Connect 2025,

We want to express our gratitude for joining us at this year’s event. Your participation is what makes our events vibrant and inspiring gatherings.

We hope you found the discussions and networking opportunities both valuable and impactful. 

Your feedback is incredibly important to us. If you have any thoughts or suggestions about the event, we’d love to hear from you here. Together, we can make future events even better.

If you would like to stay up-to-date with CLARIAH developments, please consider subscribing to our newsletter, if you haven’t already done so.  

Once again, thank you for being a part of the CLARIAH Community Connect 2025. We hope to see you again at future events.

Warm regards,

CLARIAH-NL Management Board

The Macroscope: Building a New Lens on Culture and Change
Mon, 03 Nov 2025 14:54:01 +0000 | https://clariah.nl/macroscope/

How do stories shape the way people see one another and how have these stories changed over time? Why do some communities hold together while others fall apart? And what helps trust flourish—or fade—in an age of constant information?

With an investment of €16.8 million from the Netherlands Organisation for Scientific Research (NWO), a consortium of Dutch institutions from the domain of the social sciences and the humanities will build the Macroscope—the world’s first population-level research infrastructure designed to observe and understand how societies evolve over time.

The Macroscope unites the strengths of ODISSEI (the national infrastructure for social and economic data) and CLARIAH-NL (the national infrastructure for the Arts and the Humanities). Together, they aim to create a living, ethical, and secure observatory of Dutch society—one that captures both its data and its deeply human stories.

The Macroscope will allow researchers to securely link and analyse massive datasets spanning social, cultural, and digital domains across the entire Dutch population. The project unites 14 Dutch universities with leading institutes, including Statistics Netherlands (CBS), Centerdata, SURF, the Netherlands eScience Center, the Instituut voor de Nederlandse Taal, DANS, the National Library (KB), the Netherlands Institute for Sound and Vision (B&G), and the KNAW Humanities Cluster.

 Designing a new lens

“Just as the microscope revealed the hidden world of cells, the Macroscope will reveal the hidden dynamics of societies,” said Dr. Tom Emery, Principal Investigator and Executive Director of ODISSEI. “It will allow us to trace how ideas, languages, and inequalities flow across communities—safely, ethically, and collaboratively.”

By linking and analysing large-scale, pseudonymized datasets—from surveys and archives to digital traces—the Macroscope will help researchers study how cultural and social change unfold. It will provide tools to explore how trust forms between neighbours, how misinformation ripples through conversations, or how collective memories are shaped by language, art, and media.

The Macroscope is composed of four interconnected elements:

  1. Secure data vaults, where sensitive information is stored and protected;
  2. Unified data sources, combining surveys, archives, and digital records;
  3. AI tools, developed and evaluated to assist research;
  4. A public access portal, enabling both scholars and citizens to engage with findings.

“The Macroscope is an extraordinary meeting point between the social sciences and the humanities,” said Prof. Susan Aasman (University of Groningen), Co-Principal Investigator and Chair of CLARIAH-NL. “It allows us to zoom in and out—to see how our language, culture, and institutions shift over time, and how people’s everyday choices shape those transformations.”

By 2030, the Macroscope will serve as a common good for the Dutch research community—a bridge across disciplines, from sociology and linguistics to data science, media studies, and history. 

Example of research in the Humanities

Within the field of Arts and Humanities, the Macroscope will offer many opportunities to track complex historical, social and cultural questions. Integrating the vast and dynamic array of media in all its forms and formats into a unified Netherlands Media Corpus is central to the Macroscope. This corpus includes a wide range of multimodal data types: some are archived, maintained, and made available by research and heritage institutions, while others are non-curated and, if at all, only partially accessible through online sources. The data spans text, audiovisual and born-digital formats, each governed by technical, ethical, and legal constraints. Managing this diversity necessitates complex workflows with advanced dynamic capture capabilities.

To support researchers in accessing data and advanced technology for data enrichment and analysis, the Macroscope’s AI Collaboratory develops AI tools, including topic modelling, opinion mining, and fine-tuned LLMs for diachronic corpus analysis. However, the Macroscope does not depend only on generic solutions, but also on detailed, specific results based on individual datasets. A custom-made, individually adjustable annotation platform therefore remains necessary alongside standardised tools and workflows, especially in the SSH domain with its wide range of approaches and research questions.

For researchers like Nathalie Fridzema at the University of Groningen, who studies the early history of the web, combining various data sources and applying AI tools is particularly significant. 

“Studying contemporary media involves navigating transmedial events in various overlapping databases, media types, search systems, and levels of accessibility. Much of my time is spent orienting myself among the available resources and making connections, but I inevitably overlook crucial links within the current heritage landscape. The Macroscope helps me to manage this complexity.”

Contact: prof. dr. S.I.Aasman, Co-PI & Chair CLARIAH-NL

Faculty of Arts,  University of Groningen [email protected] | ☎ +31 6 31984112

FAIR Data for Historical Games
Wed, 16 Apr 2025 20:44:22 +0000 | https://clariah.nl/fair-data-for-historical-games/

PLAYFAIR is concerned with how semantic web technologies can help connect Ancient Games data available from various sources and formats in a universal and FAIR (Findable, Accessible, Interoperable and Reusable) manner.

The PLAYFAIR team will develop a knowledge graph using CLARIAH, define requirements, and explore ideas, new methods, and improvements for making data FAIR, with special attention to whether the WP4 Tools help enhance data published on the web in the digital humanities.

Fellowship presentation by Carlos Utrilla Guerrero: https://player.vimeo.com/video/756717562

About the project

The ERC Digital Ludeme Project (DLP) is constructing a database of historical evidence for Ancient games, aiming to model the evolution of games throughout history. This database is unique in its scale, and its development is constrained by the unreliable nature of the data, which lacks the shared standards of other historical datasets. PLAYFAIR will apply FAIR principles to our dataset to maximise its usefulness and longevity, and explore the use of Semantic Web and Linked Data (LD) approaches for this purpose.

We will connect our dataset — the world’s most comprehensive dataset on Ancient games — with others, to make it universally FAIR for everyone, and find sources for additional data to complete our own set.

PLAYFAIR will tackle the challenge of developing an LD workflow using CLARIAH, show the power of the Semantic Web in answering a research question, and enhance data published on the Web across applications of the digital humanities.

Current developments

Datasets and graphs

Project Info

Partners: Maastricht University

Researchers

Carlos Utrilla Guerrero
Researcher, Maastricht University

Automatic CHAT Annotation (AuChAnn)
Wed, 16 Apr 2025 20:29:24 +0000 | https://clariah.nl/automatic-chat-annotation-auchann/

AuChAnn is a Python library that can read a Dutch transcript and interpretation pair and generate a fitting CHAT annotation.

About the project

In order to find accurate parses of non-standard utterances, transcriptions need to be annotated in accordance with CHAT guidelines. However, CHAT annotating is intricate and time-consuming. Instead, researchers and clinicians mostly annotate by adding ‘correct versions’ of child utterances. The present project seeks to generate complete and correct CHAT annotations on the basis of transcribed actual utterance-‘correct version’ pairs. The initial approach will be based on alignment of these two representations through weighted edit distance calculation.
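The alignment step described above can be sketched as a word-level weighted edit distance with a backtrace that recovers which words were substituted, omitted, or inserted. The uniform costs and the child-utterance example below are illustrative assumptions, not AuChAnn's actual parameters or cost model.

```python
def align(source, target, sub_cost=1.0, ins_cost=1.0, del_cost=1.0):
    """Word-level weighted edit distance with backtrace.

    Returns (distance, alignment), where alignment is a list of
    (source_word_or_None, target_word_or_None) pairs. A sketch of the
    alignment idea, not AuChAnn's actual implementation.
    """
    n, m = len(source), len(target)
    # dp[i][j] = cost of aligning source[:i] with target[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + del_cost
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if source[i - 1] == target[j - 1] else sub_cost
            dp[i][j] = min(dp[i - 1][j - 1] + sub,
                           dp[i - 1][j] + del_cost,
                           dp[i][j - 1] + ins_cost)
    # Backtrace to recover the aligned word pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        sub = 0.0 if i and j and source[i - 1] == target[j - 1] else sub_cost
        if i and j and dp[i][j] == dp[i - 1][j - 1] + sub:
            pairs.append((source[i - 1], target[j - 1]))
            i, j = i - 1, j - 1
        elif i and dp[i][j] == dp[i - 1][j] + del_cost:
            pairs.append((source[i - 1], None))   # word absent in correction
            i -= 1
        else:
            pairs.append((None, target[j - 1]))   # word added in correction
            j -= 1
    return dp[n][m], pairs[::-1]

# Hypothetical child utterance vs. adult 'correct version':
dist, pairs = align("hij doet lopen".split(), "hij loopt".split())
```

The aligned pairs are exactly what a CHAT annotator needs: each (actual, target) mismatch can then be rendered as the corresponding CHAT error code.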

Frank Wijnen Fellowship Presentation

If successful, this project will make a large amount of new data available for research into language acquisition and language disorders, and will provide a stepping stone to modeling adults’ interpretations of grammatically deviant utterances. An additional benefit is that it will facilitate the implementation of SASTA in clinical practice.

This project is also expected to increase the user-friendliness of the treebank-based approach to grammatical analysis of child language, in particular by reducing the time transcribers need to spend on creating parsable transcripts.

CLARIAH components

We use GrETEL 4 (CLARIAH-CORE), based on earlier versions of GrETEL (CLARIN-NL). In an ongoing project, GrETEL 4 has been applied to the grammatical analysis of child language and integrated into a clinical tool, SASTA. If this project is successful, it is obviously also of great importance for the SASTA application, a derivative of GrETEL 4 developed in the SASTA project, partially funded by the 2019 CLARIAH-PLUS Societal Impact Call.

Current developments

AuChAnn has recently been launched by the UU Digital Humanities Lab. You can download, install and use AuChAnn via PyPI. Find the package here. You can also install the package by running pip install auchann in your terminal.

Project Info

Partners: Utrecht University

Researchers

Frank Wijnen

Bridging the Gap: Digital Humanities and the Arabic-Islamic Corpus
Wed, 16 Apr 2025 20:16:16 +0000 | https://clariah.nl/bridging-the-gap-digital-humanities-and-the-arabic-islamic-corpus/

This project harnesses state-of-the-art Digital Humanities approaches and technologies to make pioneering forays into the vast corpus of digitised Arabic texts. This is done along the lines of primarily two case studies: Islamic jurisprudence and the Arabic literature on proselytism.

Despite some pioneering efforts in recent times, the computational analysis of Islamic intellectual history remains a largely unexplored field of research. Researchers still tend to study a narrow canon of texts, made available by previous Western researchers of the Islamic world largely based on considerations of the relevance of these texts for Western theories, concepts and ideas. Indigenous conceptual developments and innovations are therefore insufficiently understood, particularly as concerns the transition from premodern to modern thought in Islam.

This project harnesses state-of-the-art Digital Humanities approaches and technologies to make pioneering forays into the vast corpus of digitised Arabic texts (ca. 10 times the size of the ‘classical’ Greek and Latin corpus) that has become available in the last decade. This is done along the lines of primarily two case studies, each of which examines a separate genre of Arabic and Islamic literary history: Islamic jurisprudence; and the Arabic literature on proselytism. By way of ‘distant reading’, these two corpora are studied in terms of the semantic shifts they gradually underwent (from the 8th to the 20th c.), and the terminological and conceptual differences obtaining between different clusters of texts within the corpus (e.g. the different schools of law in Islam, that is, the four major Sunni schools and the Shi’i school).

This project has developed an openly accessible, Arabic-compatible version of the corpus search engine BlackLab (based on Apache Lucene) that enables easy access to the two marked-up corpora and offers a set of tools for Arabic text mining and computational analysis. The project is embedded in an ongoing ERC project on Islamic intellectual history housed at the Department of Philosophy and Religious Studies at Utrecht University, and has collaborated closely with international initiatives in the field of Arabic Digital Humanities, culminating in the organisation of a KNAW academy colloquium, ‘Whither Islamicate Digital Humanities? Analytics, Tools, Corpora’ (13-15 December 2018).


Project Info

Partners: Utrecht University

Researchers

Christian Lange
Professor Arabic and Islamic Studies , Utrecht University

Melle Lyklema
Ph.D. candidate , Utrecht University

EviDENce: Ego Documents Events Modelling, Recalling mass violence
Wed, 16 Apr 2025 20:16:16 +0000 | https://clariah.nl/evidence-ego-documents-events-modelling-recalling-mass-violence/

Much of our historical knowledge is based on oral or written accounts of eyewitnesses, particularly in cases of war and violence, when regular ways of documentation and record keeping are often absent. EviDENce studies how eyewitnesses reported on violence, and how this may have changed over time.

We use a collection of nearly 500 oral history interview transcripts about the Second World War (Getuigen Verhalen, stored at DANS) as well as the egodocuments (diaries, memoires, letters, autobiographies) available in Nederlab, covering a time span of 5 centuries.

Whereas humanities scholars are good at assessing texts for their relevance to a particular topic or research question such as this, automating that assessment, for example for distant reading or for building large corpora, is known to be problematic, especially when it comes to implicit mentions. EviDENce compares existing NLP methods for detecting fragments that mention an ambiguous concept such as violence, in a way that meets the standards of historical research.
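A naive lexicon-matching baseline illustrates why implicit mentions are the hard part: exact term matching flags only explicit mentions. The Dutch term list and fragments below are invented for illustration and are not the project's actual methods or data.

```python
import re

# Illustrative lexicon; the project's NLP methods go beyond exact matching.
VIOLENCE_TERMS = ["geweld", "bombardement", "razzia", "executie"]

def flag_fragments(fragments):
    """Indices of fragments containing an explicit lexicon term."""
    pattern = re.compile(r"\b(" + "|".join(VIOLENCE_TERMS) + r")\b",
                         re.IGNORECASE)
    return [i for i, frag in enumerate(fragments) if pattern.search(frag)]

fragments = [
    "Na het bombardement was de straat onherkenbaar.",
    "We hoorden schoten in de verte en doken weg.",   # implicit: no term
    "Het dagelijks leven ging gewoon door.",
]
hits = flag_fragments(fragments)  # only the explicit mention is flagged
```

The second fragment clearly recounts violence but contains no lexicon term, so the baseline misses it; this is the gap that the NLP methods under comparison must close.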

Project Info

Partners: Open Universiteit

Researchers

Susan Hogervorst
Assistant professor Historical Culture and History Didactics , Open Universiteit

TICCLAT: Text induced Corpus Correction and Lexical Assessment Tool
Wed, 16 Apr 2025 20:16:16 +0000 | https://clariah.nl/ticclat-text-induced-corpus-correction-and-lexical-assessment-tool/

Extend TICCL’s correction capabilities with classification facilities based on specific data from the full Nederlab corpus: word statistics, document and time references and linguistic annotations.

The Text-Induced Corpus Clean-up tool TICCL, an integral part of the CLARIN infrastructure, is globally unique in utilizing corpus-derived word form statistics to fully automatically post-correct texts digitized by means of Optical Character Recognition. The NWO ‘Groot’ project Nederlab has delivered a uniformly processed and linguistically enriched diachronic corpus of Dutch containing an estimated 5-6 billion word tokens.

We aim to extend TICCL’s correction capabilities with classification facilities based on specific data collected from the full Nederlab corpus: word statistics, document and time references, and linguistic annotations, i.e. Part-of-Speech and Named-Entity labels. These data will complement a solid, renewed basis composed of the available validated lexicons and name lists for Dutch. In this way, TICCL as a post-correction tool will be transformed into TICCLAT, a lexical assessment tool capable of delivering not only correction candidates but also, for example, more accurately dated diachronic Dutch word forms and more securely classified person and place names. To achieve this at scale, the TICCLAT project relies on a successful extension of TICCL’s anagram hashing towards text-induced morphological classification.
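The anagram hashing idea can be sketched briefly: each word receives an order-independent numeric key, so anagrams collide and single-character OCR confusions differ by a predictable delta. Raising each character's code point to a fixed power and summing follows the published TICCL formulation; the toy lexicon and the 'u' → 'ii' confusion below are illustrative assumptions.

```python
def anagram_key(word):
    """Order-independent key: identical for all anagrams of a word.

    Each character's code point is raised to a fixed power and summed;
    the exponent keeps different character multisets from colliding.
    """
    return sum(ord(c) ** 5 for c in word.lower())

# Index a toy lexicon by key: anagrams such as "kat" and "tak" collide.
index = {}
for w in ["kat", "tak", "boom", "druk"]:
    index.setdefault(anagram_key(w), []).append(w)
candidates = index[anagram_key("tak")]

# Key arithmetic: the OCR confusion 'u' -> 'ii' shifts the key by a
# fixed delta, so such variants can be found without scanning the lexicon.
delta = 2 * ord("i") ** 5 - ord("u") ** 5
assert anagram_key("driik") == anagram_key("druk") + delta
```

Because every confusion pattern corresponds to one key delta, a full corpus can be scanned for variants by iterating over deltas rather than over word pairs, which is what makes the approach tractable at Nederlab scale.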

TICCLAT’s capabilities will also be evaluated in comparison to human performance by an expert psycholinguist. The data collected will be exportable for storage in a data repository, as RDF triples, for broad reuse. The project will greatly contribute to a more comprehensive overview of the lexicon of Dutch since its earliest days and of the person and place names that share its history. Its partners are the Dutch experts in lexicology, person names and toponyms.


Project Info

Partners: Tilburg University

Researchers

Martin Reynaert
Postdoc Researcher, Tilburg University
