Fine-Tuning BioBERT on Medical Text

A Text Mining project by Lucas de Wolff (s3672980) and Ruben Ahrens (s3677532)

Report: https://rubenahrens.com/docs/biobert.pdf

Project Overview

This project focuses on Named Entity Recognition (NER) in medical text, specifically using the CSIRO Adverse Drug Event Corpus (CADEC). We fine-tuned transformer-based models (BERT and BioBERT) to identify medical entities in patient-reported adverse drug event narratives.
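To illustrate what the NER task looks like at the token level, here is a minimal sketch of turning character-span annotations into BIO tags. The sentence, the offsets, and the `bio_tags` helper are made up for illustration and are not taken from the notebooks; `ADR` and `Drug` are two of the CADEC entity types.

```python
import re

def bio_tags(text, spans):
    """Tag whitespace-separated tokens with BIO labels.

    spans: list of (start, end, label) character offsets, end exclusive.
    """
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        tag = "O"  # default: token is outside any entity
        for start, end, label in spans:
            # A token belongs to an entity if the two ranges overlap.
            if tok_start < end and tok_end > start:
                # The first token of an entity gets B-, later ones get I-.
                tag = ("B-" if tok_start <= start else "I-") + label
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags

# Hypothetical patient-style sentence with two annotated spans.
text = "I felt a bit drowsy after taking Lipitor"
spans = [(9, 19, "ADR"), (33, 40, "Drug")]
tokens, tags = bio_tags(text, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

A token classification model is then trained to predict one such tag per token.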

Repository Structure

/cadec/: The CSIRO Adverse Drug Event Corpus dataset
- /meddra/: MedDRA annotations
- /original/: Original text data
- /processed/: Processed dataset
- /sct/: SNOMED CT annotations
- /text/: Raw text files
/Code/: Project source code
- /NER/: Named entity recognition implementation
  - NER_bert.ipynb: Implementation of BERT for NER
  - NER_biobert.ipynb: Implementation of BioBERT for NER
- /Entity Linking/: Code for entity linking tasks
- datastats.py: Script for dataset statistics
- test.py: Testing script
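
The annotation folders under /cadec/ can be read with a few lines of Python. The sketch below assumes the files follow the brat standoff convention (tab-separated `ID`, `label start end`, and covered text, with `;` separating the fragments of a discontinuous entity). The example lines and the `parse_ann_line` helper are hypothetical, so check them against the actual files before relying on this.

```python
from typing import NamedTuple

class Entity(NamedTuple):
    ident: str   # annotation ID, e.g. "T1"
    label: str   # entity type or concept code
    spans: list  # list of (start, end) character offsets
    text: str    # text covered by the annotation

def parse_ann_line(line):
    """Parse one text-bound annotation line; return None for other lines."""
    if not line.startswith("T"):
        return None  # e.g. '#' annotator notes
    ident, info, text = line.rstrip("\n").split("\t", 2)
    label, offsets = info.split(" ", 1)
    # Discontinuous entities list several 'start end' fragments separated by ';'.
    spans = [tuple(int(n) for n in frag.split()) for frag in offsets.split(";")]
    return Entity(ident, label, spans, text)

# Hypothetical annotation lines, not copied from the corpus.
entity = parse_ann_line("T1\tADR 9 19\tbit drowsy")
split_entity = parse_ann_line("T2\tADR 0 4;10 15\tpain ankle")
note = parse_ann_line("#1\tAnnotatorNotes T1\tDrowsiness")
print(entity)
```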

Technologies Used

- Python with HuggingFace Transformers
- BERT and BioBERT models
- Jupyter Notebooks
- scikit-learn for evaluation metrics
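
One step that usually comes up when fine-tuning BERT-style models for NER with HuggingFace Transformers is aligning word-level labels with subword tokens. The sketch below shows a common approach, assuming labels are kept on the first subword of each word and all other positions are set to -100 (the default `ignore_index` of PyTorch's cross-entropy loss). The `align_labels` helper and the sample inputs are illustrative; the notebooks may handle this differently.

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the loss

def align_labels(word_ids, word_labels):
    """Map word-level label IDs onto subword tokens.

    word_ids: one entry per subword token, giving the index of the word it
    came from, or None for special tokens. Fast tokenizers expose this via
    the word_ids() method of the encoding they return.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)          # special token such as [CLS]
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first subword of a word
        else:
            aligned.append(IGNORE_INDEX)          # continuation subword
        previous = word_id
    return aligned

# Hypothetical encoding of three words where the second word is split in two.
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [0, 1, 2]
aligned = align_labels(word_ids, word_labels)
print(aligned)
```

Labelling only the first subword keeps the evaluation at word level, so predictions can be compared directly with the original annotations.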

Getting Started

To run the notebooks:

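- Install the dependencies listed under Technologies Used (HuggingFace Transformers, scikit-learn, and Jupyter).
- Open and run the notebooks in the /Code/NER/ directory (NER_bert.ipynb and NER_biobert.ipynb).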

Citation

If you use the CADEC dataset, please cite:

Karimi et al. (2015). CADEC: A corpus of adverse drug event annotations. Journal of Biomedical Informatics, 55, 73-81.
Data: https://data.csiro.au/collection/csiro:10948

Contributors

Lucas de Wolff (s3672980)
Ruben Ahrens (s3677532)

January 2024
