Estonian Centre of Excellence in Artificial Intelligence official homepage https://exai.ee Data Science Seminar: Creating Impact from Real-World Health Data https://exai.ee/2026/03/03/data-science-seminar-creating-impact-from-real-world-health-data/ Tue, 03 Mar 2026 06:35:58 +0000 https://exai.ee/?p=2485 The Data Science Seminar “Creating Impact from Real-World Health Data” takes place on 18 March from 15.45 in Delta auditorium 1037.



Real-World Health Data (RWD) is expanding at an unprecedented scale, generated through every clinical encounter, diagnosis, laboratory measurement, prescription, procedure, and health-related transaction. These digital traces form a high-resolution map of how health systems function in real clinical settings, for real patients, across time. Properly harnessed, they enable evidence grounded not only in controlled trials, but in the lived complexity of everyday care.
Yet transforming raw data into actionable knowledge is far from trivial. Meaningful reuse requires robust data linkage across heterogeneous sources, harmonisation of vocabularies and formats, interoperable data models, scalable analytical pipelines, and adaptive governance frameworks. Ethics approvals and regulatory processes must evolve in parallel with technological capability.
When these foundations are in place, large-scale health data enables questions of direct societal impact: Where does care delivery create measurable value? How do outcomes vary across populations? Which interventions work in routine practice? Combined with modern AI methods, RWD opens the door to proactive, adaptive health systems—capable of earlier detection, personalised prevention, and continuous learning at population scale.
This seminar convenes the key actors shaping that transformation: researchers developing advanced analytics and AI systems; clinicians articulating real-world needs; health-system leaders responsible for implementation; and decision-makers guiding national funding and governance.


Speakers
Tuomo Hartonen / Andrea Ganna, University of Helsinki, “DelFI – Towards a foundation model for Finnish health and registry data”
Karl-Henrik Peterson, Member of Board at the Estonian Health Insurance Fund, “AI Vision in Estonian Health Care — system-level transformation and the promise of Delphi-2M–scale solutions”
Alar Irs, University of Tartu Hospital; Chief Medical Officer at Estonian Medicines Agency, “The Hunger for Data — the clinical demand for integrated, decision-supportive information”
Kerli Mooses, Research Fellow of Health Informatics at the University of Tartu, “The Est-Health-30 database: infrastructure, real-world collaboration with clinicians, and analytical capacity”
Marten Juurik, Head of the Unit of Research Integrity at the Estonian Research Council, “Ethics reform in Estonia — evolving approval frameworks for large-scale data research”
Triin Laisk, Professor of Genomic Epidemiology at the University of Tartu, “The Hidden Half — advancing women’s health research through national-scale data resources”
The academic organisers of this event are the Head of the Institute of Computer Science and Professor of Bioinformatics Jaak Vilo and Research Fellow of Health Informatics Kerli Mooses. The seminar takes place in English. 



This event is designed for researchers, clinicians, policymakers, funders, and innovators working at the intersection of health data, AI, and system transformation.
Join us to explore how Estonia—and its partners—are building the next generation of data-driven health systems.

This seminar is supported by:
TeamPerMed Centre for Data-Enriched Personalised Medicine
An interdisciplinary centre advancing personalised medicine through real-world data, AI, and clinical transformation.
EXAI – Estonian Centre of Excellence in Artificial Intelligence
A national centre fostering foundational and applied AI research for trustworthy, societally beneficial systems.
STACC – A data science company with a strong R&D focus
An Estonian data science company combining research excellence with real-world analytics solutions.

]]>
Kaarel Hänni's talk “AGI Safety” on January 7th https://exai.ee/2026/01/05/kaarel-hanni-presentation-about-agi-safety-on-january-7th/ Mon, 05 Jan 2026 12:54:31 +0000 https://exai.ee/?p=2396 This Wednesday, on January 7 at 14:00, Kaarel Hänni will give a talk entitled “AGI Safety” in room 1018.

The talk will be held in English.

Kaarel Hänni is an AI Safety Research Scientist at Mila – Quebec Artificial Intelligence Institute, focusing on the development of safe AI for the benefit of humanity.

Abstract (in English):

This talk is an introduction to AGI (artificial general intelligence) safety. In the first half of the talk, I will argue for the following three background claims:

  • “AI soon”: If AI progress continues, then in 50 years, there will very likely be AIs that are more capable than humans in basically every way. I will discuss certain quantitative empirical trends which suggest this happens before 2035. It might even happen in the next few years.
  • “AI fast”: Once there are AIs autonomously doing AI research at the level of top humans, by default, there will soon be AIs that are vastly smarter than humans (like how humans are vastly smarter than ants).
  • “AI big”: This would radically transform the world. AGI is a much bigger deal than cars or the internet or the industrial revolution — the advent of AGI is more in the same reference class as “evolution starts on Earth” and “human language and culture get started”.

In the second half of the talk, titled “AI bad?”, I will discuss the following questions:

  • Why might going down the AGI path be risky? Could it lead to human disempowerment or even extinction?
  • What are the main plans and hopes for how to avoid bad outcomes?
  • What technical research questions can one work on to mitigate the risks?

]]>
WUML2026 https://exai.ee/2025/12/03/wuml2026/ Wed, 03 Dec 2025 10:17:44 +0000 https://exai.ee/?p=2298 Workshop on Uncertainty in Machine Learning (February 2–4, 2026, in Tartu, Estonia)

The notion of uncertainty is of major importance in machine learning and constitutes a key element of modern machine learning methodology. In recent years, it has gained in importance due to the increasing relevance of machine learning for practical applications, many of which come with safety requirements. In this regard, machine learning scholars have identified new problems and challenges that call for new methodological developments. Indeed, while uncertainty has long been perceived as almost synonymous with standard probability and probabilistic predictions, recent research has gone beyond traditional approaches and also leverages more general formalisms and uncertainty calculi. For example, a distinction between different sources and types of uncertainty, such as aleatoric and epistemic uncertainty, turns out to be useful in many machine learning applications. The workshop will pay specific attention to recent developments of this kind.

The workshop will be hosted at the University of Tartu, Delta Centre, in Tartu, Estonia. The address is Narva maantee 18, Tartu, Estonia.

Registration is open here: https://forms.gle/y1RQbg8qr1rBJK7g8 and it will remain open until January 15, 2026. However, we kindly ask for early registration to facilitate planning of the workshop.

Monday, February 2
09.00 Registration opens

09.30 – 11.30 Pre-Workshop Tutorial on Uncertainty

11.30 – 11.45 Coffee Break

11.45 – 12.00 Welcome & Opening

12.00 – 13.00 Keynote:
Willem Waegeman: “Disentangling aleatoric and epistemic uncertainty in ML”

13.00 – 14.00 Lunch Break

14.00 – 15.30 Poster session 

15.30 – 16.00 Coffee Break

16.00 – 17.30 Presentations

Conference dinner

Tuesday, February 3

09.00 – 10.00 Invited talk
Arun Kumar Singh: “Leveraging Predictive Uncertainty for Reliable Model-Based Planning and Control”

10.00 – 11.30 Flash talks, Posters and Coffee

11.30 – 13.00 Presentations

13.00 – 14.00 Lunch break

14.00 – 15.00 Presentations

15.00 – 16.00 Keynote:
Ilja Kuzborskij: “Distinguishing Between Aleatoric and Epistemic Uncertainty in LLMs”

16.00 – 17.30 Flash talks, Posters and Coffee

Social mixer

Wednesday, February 4

09.00 – 10.30 Presentations

10.30 – 11.00 Coffee Break

11.00 – 13.00 Presentations

Closing

Optional program

The event is supported by the Ministry of Education and Research Centres of Excellence grant TK213 (Estonian Centre of Excellence in Artificial Intelligence (EXAI)).

The event will be organized by the Estonian Centre of Excellence in Artificial Intelligence (EXAI) and the Institute of Computer Science at the University of Tartu.


]]>
RBO13. AI in education https://exai.ee/2025/12/02/rbo13-ai-in-education/ Tue, 02 Dec 2025 10:25:23 +0000 https://exai.ee/?p=1734 Primary focus area: AI for e-governance
Secondary focus areas: adaptation of foundation models

Abstract

This project aims to develop AI tools that support self-regulated learning (SRL) in schools, aligning with how the human brain learns. These tools will be grounded in cognitive science to ensure reliability, trust, and real-world applicability. Small-scale classroom experiments will validate the tools and lay the foundation for future large-scale deployment in Estonia.

Research Gap

Most existing AI applications in education are driven by technological possibilities rather than pedagogical goals. They often overlook learner agency and SRL—key skills for lifelong learning emphasized by frameworks like OECD’s Learning Compass. Existing tools lack large-scale validation and are rarely co-designed with schools, limiting their classroom relevance and trustworthiness.

Objective

To develop trustworthy, neuroscience-informed AI tools that support SRL in classrooms. The project will focus on creating the first version of an AI learning assistant, co-developed with educators and students, and tested in authentic educational settings to evaluate practical fit and learning impact.

Approach

We will build the AI assistant in three main stages:

  1. Foundations: Literature review, ethics approval, and investigation into effective SRL methods.
  2. Tool development: Build prototypes based on cognitive principles and technological advances.
  3. Co-creation & validation: Work with teachers and students to refine and test tools in small-scale classroom pilots. No personal student data will be published.

Impact

This is the first AI-in-education research effort in Estonia directly responding to President Karis’ national call. The project will deliver working prototypes, educational research papers, and a tested model for integrating AI into schools. Long-term, this sets the stage for a national, large-scale AI-in-education initiative.

]]>
RBO10. Leveraging LLMs for Complete Life-cycle of Cyber Security Analytical Tasks https://exai.ee/2025/12/02/rbo10-leveraging-llms-for-complete-life-cycle-of-cyber-security-analytical-tasks/ Tue, 02 Dec 2025 10:24:11 +0000 https://exai.ee/?p=1833 Primary Focus Area: AI for Cybersecurity
Secondary Focus Areas:
Hybrid AI pipelines, Adaptation of foundation models, Safeguards and trust in AI, Privacy and security in AI, AI for e-governance, AI for healthcare, AI for business processes

Abstract
Cyber threats are growing in scale and complexity, producing massive volumes of security data that challenge timely analysis. This RBO studies the applicability of large language models (LLMs) for key cybersecurity analytical tasks such as real-time alert management, log analysis, and vulnerability detection from source code. The aim is to reduce workload for security analysts and improve efficiency in Security Operations Centres (SOCs) by introducing novel LLM-based data analysis methods.

Gap

While machine learning is well-established in cybersecurity (intrusion detection, malware analysis, alert prioritization), the application of LLMs is still nascent and under-explored. Existing research mostly covers anomaly detection and code analysis, but lacks maturity and breadth. Important tasks such as alert management and software vulnerability detection have been little studied with LLMs. This project addresses these gaps by investigating LLMs’ suitability for these tasks, combining cutting-edge prompting techniques and hybrid AI pipelines to enhance SOC operations.

Objective

To develop and evaluate LLM-based approaches that reduce security analysts’ workload by improving alert management, alert log summarization, and vulnerability detection in source code. The research will explore both LLM-centric and LLM-supported hybrid AI methods, with the goal to enhance cybersecurity monitoring capabilities in real-world SOC environments.

Approach

  • Alert management & log analysis: Use LLMs in black-box mode with advanced prompting on anonymized production alert logs from TalTech SOC and public datasets to extract patterns, create summaries, and identify critical alerts.
  • Vulnerability detection: Apply few-shot and chain-of-thought prompting on source code datasets classified by Common Weakness Enumeration (CWE). Benchmark against static code analyzers.
  • Additional tasks: Investigate LLM-based agentic approaches for automating incident handling and other SOC tasks. Also, identify further cybersecurity problems where LLMs add value.
  • Focus on privacy by leveraging local LLMs (e.g., Ollama) for sensitive data scenarios, supplemented by evaluation with commercial models (GPT-4, Claude 3).
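As a concrete illustration of the few-shot and chain-of-thought prompting named in the approach above, the sketch below assembles such a prompt for CWE-style triage of a code snippet. The example snippets, labels, and the local-endpoint comment are illustrative assumptions for exposition, not project deliverables.

```python
# Hypothetical sketch: a few-shot, chain-of-thought prompt for CWE-style
# vulnerability triage with a locally hosted model. Examples and labels
# below are illustrative, not a curated benchmark.

FEW_SHOT_EXAMPLES = [
    {
        "code": "strcpy(buf, user_input);",
        "label": "CWE-120 (buffer copy without checking size of input)",
        "reasoning": "user_input is copied into buf with no bounds check.",
    },
    {
        "code": 'query = "SELECT * FROM users WHERE id = " + user_id',
        "label": "CWE-89 (SQL injection)",
        "reasoning": "user_id is concatenated into the SQL string unescaped.",
    },
]

def build_prompt(snippet: str) -> str:
    """Assemble a few-shot prompt that asks the model to reason step by
    step (chain of thought) before naming the most likely CWE category."""
    parts = ["You are a security code reviewer. For each snippet, think "
             "step by step, then answer with the most likely CWE ID."]
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(f"Code:\n{ex['code']}\n"
                     f"Reasoning: {ex['reasoning']}\n"
                     f"Answer: {ex['label']}")
    parts.append(f"Code:\n{snippet}\nReasoning:")
    return "\n\n".join(parts)

# The prompt would then be sent to a local model in black-box mode, e.g.
# via Ollama's HTTP API:
#   requests.post("http://localhost:11434/api/generate",
#                 json={"model": "llama3", "prompt": build_prompt(code)})
```

Ending the prompt at "Reasoning:" nudges the model to emit its chain of thought before committing to a CWE label, which is the pattern the project proposes to benchmark against static code analysers.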

Impact

This RBO will enhance SOC productivity by automating labor-intensive cybersecurity tasks, enabling analysts to focus on critical incidents and speeding up resolution. Success will be measured by effective deployment and evaluation of methods on production SOC data, reductions in analyst workload, and improvements in incident detection accuracy and prioritization.

]]>
RBO9. Adaptive Data-Driven Optimisation of Business Processes https://exai.ee/2025/12/02/rbo9-adaptive-data-driven-optimisation-of-business-processes/ Tue, 02 Dec 2025 10:23:08 +0000 https://exai.ee/?p=1812 Primary focus area: AI for business processes
Secondary focus areas: safeguards and trust in AI; AI for e-governance

Abstract

This project develops methods for real-time, adaptive optimization of business processes using structured and unstructured data. By detecting performance degradations, diagnosing their causes, and recommending data-driven interventions, the approach shifts from static process redesign to dynamic, operational-level improvement. Simulation methods quantify intervention impacts, supporting decision-making in both public and private sectors.

Research Gap

Process mining has produced powerful methods for analyzing structured logs of business process executions to suggest optimizations that enhance the efficiency and quality of these processes. However, current approaches are limited by: (1) a fixed set of predefined interventions, (2) per-case rather than system-wide recommendations, and (3) exclusion of unstructured data such as emails or policy documents. Early efforts using LLMs focus narrowly on process discovery, as opposed to dynamic adaptation or real-time intervention. This RBO fills the gap by creating adaptive, explainable optimization techniques that use both structured and unstructured data, and that consider system-wide interactions and second-order effects.

Objective

To develop AI-driven techniques that monitor business processes in real time, diagnose performance drops, propose corrective interventions, and evaluate their effects using short-horizon simulations. The approach is novel in addressing unforeseen changes, modeling second-order effects, and integrating structured and unstructured data sources.

Approach

We will combine process mining, causal inference, reinforcement learning, and LLMs to:

  • Detect causes of process degradation from business process execution logs
  • Use LLMs to extract context and confirm causal links from unstructured data
  • Recommend interventions, guided by business process redesign knowledge
  • Simulate each intervention’s impact using online simulation from the current process state
  • Present recommendations through counterfactual explanations and tailored visualizations

Validation will include computational experiments with public datasets and 2–3 in-depth case studies with business and public sector partners. Feedback from these studies will inform iterative method development.

Impact

By combining adaptive analytics, LLMs, and simulation, this project will improve how organizations respond to changing process conditions. Use cases include smarter loan approval, tailored customer interactions, and faster response to public sector requests.

]]>
RB06. Methods for using AI to create a synthetic digital twin of the Estonian population https://exai.ee/2025/12/02/rb06-methods-for-using-ai-to-create-a-synthetic-digital-twin-of-the-estonian-population/ Tue, 02 Dec 2025 10:20:18 +0000 https://exai.ee/?p=1770 Primary focus area – F5: AI for e-governance
Secondary focus areas – F1: hybrid AI pipelines, F2: adaptation of foundation models, F6: AI for healthcare, F8: AI for cybersecurity

Abstract

This project develops AI-based methods for generating a realistic, privacy-preserving synthetic digital twin of the Estonian population. Initial efforts focus on synthesizing data from the population registry, education, healthcare, and tax systems. The output will enable safe testing, research, and development of public-sector digital services without using real personal data.

Research Gap

GDPR restricts the use of real population data for development and testing. Current methods cannot generate coherent, multi-table synthetic datasets that reflect complex societal interactions over time. Previous research focuses on single-table or healthcare data synthesis, but lacks methods for creating interconnected datasets across domains. Furthermore, utility and privacy evaluation methods for such synthetic data are still underdeveloped.

Objective

  1. Create a prototype framework for generating synthetic population data, reflecting individuals, organisations, and their interactions.
  2. Ensure compatibility with microsimulation models to enable policy testing.
  3. Build a utility and privacy assessment methodology to tune synthetic data generation and ensure GDPR compliance.

Approach

We will build a modular pipeline for synthetic data generation, deployable in public-sector institutions. The system will include:

  • Rule-based modules for structured identifiers (e.g. ID codes, bank accounts)
  • ML-based modules tailored for tabular, temporal, and image data
  • Hybrid synthesis combining rules and ML, based on real-life user pathways in e-government systems
  • Privacy-utility assessment tool to evaluate and fine-tune generated data for usability and compliance
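To make the rule-based identifier modules concrete, here is a minimal sketch of one such module: a generator of syntactically valid, purely synthetic Estonian personal ID codes. The two-round modulo-11 checksum follows the commonly documented scheme for these codes; treat this as an illustration under that assumption, not a reference implementation.

```python
import random

# Rule-based synthesis of a structured identifier: an 11-digit Estonian
# personal ID code (century/sex digit, YYMMDD birth date, 3-digit serial,
# checksum). All output is synthetic; no real person is encoded.

WEIGHTS_1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1]
WEIGHTS_2 = [3, 4, 5, 6, 7, 8, 9, 1, 2, 3]

def checksum(first10: str) -> int:
    """Two-round modulo-11 check digit over the first ten digits."""
    digits = [int(c) for c in first10]
    rem = sum(d * w for d, w in zip(digits, WEIGHTS_1)) % 11
    if rem < 10:
        return rem
    rem = sum(d * w for d, w in zip(digits, WEIGHTS_2)) % 11
    return rem if rem < 10 else 0

def synth_id_code(rng: random.Random) -> str:
    """Generate one syntactically valid synthetic ID code."""
    century_sex = rng.choice("3456")   # 1900s/2000s births, male/female
    year = rng.randint(0, 99)
    month = rng.randint(1, 12)
    day = rng.randint(1, 28)           # avoid month-length edge cases
    serial = rng.randint(0, 999)
    first10 = f"{century_sex}{year:02d}{month:02d}{day:02d}{serial:03d}"
    return first10 + str(checksum(first10))

rng = random.Random(42)
code = synth_id_code(rng)  # deterministic given the seed, purely synthetic
```

In the proposed pipeline, a module like this would sit alongside ML-based generators, so that structurally constrained fields stay valid while statistical fields are learned from data.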

Impact

Synthetic population data will enable safe, GDPR-compliant development and testing of digital services. It supports policy evaluation, R&D, and education by simulating realistic socio-economic dynamics. It also allows researchers and policymakers to conduct microsimulations without accessing sensitive data, boosting transparency and innovation. The privacy-utility tool ensures that the synthetic data balances usefulness with legal safeguards.

]]>
RBO5. Cloud-compatible, end-to-end encrypted AI service blueprint https://exai.ee/2025/12/02/rbo5-cloud-compatible-end-to-end-encrypted-ai-service-blueprint/ Tue, 02 Dec 2025 10:18:20 +0000 https://exai.ee/?p=1761 Primary Focus Area: Privacy and Security in AI
Secondary Focus Areas: Adaptation of Foundation Models; AI for Cybersecurity

Abstract:
This RBO aims to prototype secure, cloud-based AI workflows using privacy-preserving computation methods like secure multi-party computation (MPC), federated learning, and trusted execution environments (TEEs). These technologies can enable sensitive data to be used in training and inference while ensuring confidentiality across stakeholders. The project will benchmark the feasibility of these tools under real-world performance constraints and assess their security benefits, targeting high-impact use cases in e-governance, healthcare, and cybersecurity.

Gap:
While MPC and other privacy-enhancing technologies (PETs) offer strong security, they are rarely applied in large-scale AI systems due to performance, deployment, and integration challenges. TEEs are gaining traction but bring new attack surfaces. Federated learning helps distribute learning but may still leak sensitive information. Current research lacks practical guidance on when and how these tools can be effectively used in full AI pipelines, particularly in cloud-based settings with strict privacy and regulatory requirements.

Objective:
Design and prototype end-to-end secure AI workflows, integrating MPC, federated learning, and other secure computing methods. Evaluate their performance, scalability, and risk reduction compared to standard techniques. The outcome will be tested prototypes and a comparative risk analysis demonstrating improved privacy preservation.

Approach:

  • Task 1: MPC blueprint for model updating and inference; optimise protocols and investigate MPC-specific ML algorithms
  • Task 2: Federated learning with MPC, including encrypted backpropagation and model aggregation; test various deployment models
  • Prototypes: Implement, benchmark, and analyse deployments for feasibility and risk
  • Deployment focus: Support edge/cloud scenarios; develop full encrypted training/inference pipelines depending on security needs
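To ground Task 1, the snippet below sketches the core primitive that MPC protocols build on: additive secret sharing over a prime field. Real deployments add secure channels, authenticated shares, and multiplication protocols on top, so this is purely didactic; the field modulus and the three-hospital scenario are illustrative assumptions.

```python
import random

# Additive secret sharing: each party holds one share of every input;
# any sum of shares reveals only the aggregate, never an individual value.

P = 2**61 - 1  # prime field modulus (illustrative choice)

def share(secret: int, n_parties: int, rng: random.Random) -> list[int]:
    """Split `secret` into n_parties random shares that sum to it mod P."""
    shares = [rng.randrange(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares: list[int]) -> int:
    """Recombine shares into the underlying value."""
    return sum(shares) % P

rng = random.Random(0)
inputs = [120_000, 95_000, 133_000]      # e.g. three data holders' counts
all_shares = [share(x, 3, rng) for x in inputs]

# Each party locally adds the shares it holds; only the total is opened.
party_sums = [sum(col) % P for col in zip(*all_shares)]
total = reconstruct(party_sums)          # equals sum(inputs); inputs stay hidden
```

This additive homomorphism is what lets joint statistics be computed without exposing individual values, the property the impact section highlights for sectors with data-sharing restrictions.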

Impact
Secure computing (especially MPC) enables joint data analysis without exposing individual values—critical for sectors with data-sharing restrictions. This RBO will optimise secure deployments for ML, offering solutions that meet privacy, compliance, and IP protection needs.

]]>
RBO4. Domain-controlled dialog systems https://exai.ee/2025/12/02/rbo4-domain-controlled-dialog-systems/ Tue, 02 Dec 2025 10:16:35 +0000 https://exai.ee/?p=1755 Primary focus area: Hybrid AI pipelines
Secondary focus areas: AI for healthcare

Abstract:
This RBO aims to create a hybrid AI dialogue system that integrates large language models (LLMs) with domain-specific guidance to produce structured yet flexible interactions. Our prototype will target mental health self-help by supporting techniques like cognitive reframing and problem-solving. The system will dynamically adapt its responses using domain knowledge encoded in prompts, allowing meaningful user engagement while maintaining therapeutic integrity. The approach is also adaptable to domains like education and e-governance.

Gap:
Current dialogue systems are either rigidly task-based or open-ended and unstructured. Recent LLMs offer opportunities to combine these approaches but lack fine-grained domain control, especially in sensitive areas like mental health. Existing guardrails focus on general safety, not specific therapeutic scenarios. Most mental health chatbots offer advice or empathy but miss the collaborative element crucial for long-term coping. This RBO addresses the need for structured, goal-oriented dialogue rooted in clinical principles.

Objective:
Develop a domain-controlled chatbot prototype for mental health self-help, guided by clinical psychology principles, that allows open user input while adhering to structured therapeutic goals. It will detect when conversations drift beyond scope and support privacy-preserving use cases, though no real-user testing is planned at this stage.

Impact:

This work lays the foundation for domain-sensitive LLM-based systems in healthcare and beyond.
KPIs include:

  • Functional prototype of mental health support chatbot
  • Scenario adherence and scope detection
  • User satisfaction (non-clinical pilot)
  • Adaptability to other domains (e.g., education)

]]>
RBO3. Reporting confidence in sequence-to-sequence models https://exai.ee/2025/12/02/rbo3-reporting-confidence-in-sequence-to-sequence-models/ Tue, 02 Dec 2025 10:15:19 +0000 https://exai.ee/?p=1746 Primary focus area: Safeguards and trust in AI
Secondary focus areas: Adaptation of foundation models

Abstract:
Seq2seq models used in translation and speech recognition often produce errors like repetition or irrelevance. These are difficult to manage due to a lack of reliable uncertainty estimation. This RBO aims to distinguish and quantify two types of uncertainty: content uncertainty (what to say) and delivery uncertainty (how to say it). We propose formalizing this distinction, adapting model architectures, and evaluating our methods across domains, with the goal of enhancing the trustworthiness and applicability of AI-generated outputs.

Research Gap:
While uncertainty in AI outputs has been studied, current models conflate different types of uncertainty. Token-level probabilities offer limited interpretability and fail to separate content from delivery ambiguity. No existing approach provides meaningful estimates of these distinct uncertainties, especially over longer sequences. This limits the ability to build safeguards, calibrate confidence, or involve human reviewers effectively.
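For context, the token-level baseline this gap refers to can be sketched in a few lines. The probabilities below are made up, and the point is precisely that a single scalar like this conflates content uncertainty (what to say) with delivery uncertainty (how to say it).

```python
import math

# Token-level confidence baseline for a seq2seq output. A length-normalised
# log-probability is a common sequence "confidence" proxy, but it cannot say
# whether low confidence stems from content ambiguity (several valid meanings)
# or delivery ambiguity (several valid wordings).

token_probs = [0.91, 0.85, 0.40, 0.95]  # p(token | prefix) from the model

# Geometric mean of token probabilities via the average log-probability.
avg_logprob = sum(math.log(p) for p in token_probs) / len(token_probs)
confidence = math.exp(avg_logprob)

# Per-token surprisal flags uncertain positions (here, position 2) but
# still offers no decomposition into content vs. delivery uncertainty.
surprisal = [-math.log(p) for p in token_probs]
```

The RBO's proposed annotations would replace this single scalar with separate, interpretable estimates for the two uncertainty types.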

Objective:
To design methods that annotate seq2seq outputs with separate estimates of content and delivery uncertainty—numerical or distributional—tailored for translation and speech recognition tasks. These will enable better risk management and user trust in AI systems.

Impact:
This work aims to improve model transparency and reduce AI overconfidence, enabling safer deployment in language technologies.

]]>