rubenahrens/BioBERT-NER

Fine-Tuning BioBERT on Medical Text

A Text Mining project by Lucas de Wolff (s3672980) and Ruben Ahrens (s3677532)

Report: https://rubenahrens.com/docs/biobert.pdf

Project Overview

This project focuses on Named Entity Recognition (NER) in medical text, specifically using the CSIRO Adverse Drug Event Corpus (CADEC). We fine-tuned transformer-based models (BERT and BioBERT) to identify medical entities in patient-reported adverse drug event narratives.
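NER with BERT-style models is usually framed as token classification, where each token receives a BIO tag (B- for the first token of an entity, I- for later tokens, O for everything else). The following is a minimal sketch of that labelling step, not the notebooks' actual preprocessing: it assumes whitespace tokenization and character-offset annotations, and the example sentence, offsets, and the `bio_tags` helper are invented for illustration.

```python
import re


def bio_tags(text, spans):
    """Assign a BIO tag to each whitespace-delimited token.

    spans: list of (start, end, label) character offsets, end exclusive.
    (Hypothetical helper; the input format is an assumption.)
    """
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        tag = "O"
        for start, end, label in spans:
            # A token belongs to an entity if the two character ranges overlap.
            if tok_start < end and tok_end > start:
                # The first overlapping token gets B-, later ones get I-.
                tag = ("B-" if tok_start <= start else "I-") + label
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags


# Invented example sentence with one drug mention and one adverse-event mention.
text = "Lipitor gave me severe muscle pain"
spans = [(0, 7, "Drug"), (16, 34, "ADR")]
tokens, tags = bio_tags(text, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

In a real pipeline the tags would additionally have to be aligned with the model's subword tokens, which this sketch leaves out.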

Repository Structure

  • /cadec/: The CSIRO Adverse Drug Event Corpus dataset
    • /meddra/: MedDRA annotations
    • /original/: Original text data
    • /processed/: Processed dataset
    • /sct/: SNOMED CT annotations
    • /text/: Raw text files
  • /Code/: Project source code
    • /NER/: Named entity recognition implementation
      • NER_bert.ipynb: Implementation of BERT for NER
      • NER_biobert.ipynb: Implementation of BioBERT for NER
    • /Entity Linking/: Code for entity linking tasks
    • datastats.py: Script for dataset statistics
    • test.py: Testing script
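As a rough illustration of how annotation files in the dataset folders might be read, the sketch below parses brat-style standoff annotations. It assumes that the annotations use the brat format (tab-separated ID, label with character offsets, and surface text); the sample content and the `parse_ann` helper are made up for this example and are not taken from the repository.

```python
def parse_ann(ann_text):
    """Parse brat-style standoff annotations into a list of dicts.

    Assumes lines of the form "T1<TAB>ADR 9 19<TAB>bit drowsy".
    (Hypothetical helper; the file format is an assumption.)
    """
    entities = []
    for line in ann_text.splitlines():
        # Only text-bound annotations (IDs starting with "T") carry spans;
        # note lines such as "#1 ..." are skipped.
        if not line.startswith("T"):
            continue
        ann_id, info, surface = line.split("\t", 2)
        label, offsets = info.split(" ", 1)
        # Discontinuous mentions list several "start end" pairs separated by ";".
        spans = [tuple(int(n) for n in part.split()) for part in offsets.split(";")]
        entities.append({"id": ann_id, "label": label, "spans": spans, "text": surface})
    return entities


# Invented sample: one contiguous mention, one note line, one discontinuous mention.
sample = (
    "T1\tADR 9 19\tbit drowsy\n"
    "#1\tAnnotatorNotes T1\tDrowsy\n"
    "T2\tADR 25 29;40 44\tsore legs\n"
)
for entity in parse_ann(sample):
    print(entity)
```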

Technologies Used

  • Python with HuggingFace Transformers
  • BERT and BioBERT models
  • Jupyter Notebooks
  • scikit-learn for evaluation metrics
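To make the evaluation step concrete, here is a small sketch of token-level precision, recall, and F1 over entity tags. It is written in plain Python so that it runs without any dependencies. With scikit-learn installed, `sklearn.metrics.precision_recall_fscore_support` with `average="micro"` and `labels` restricted to the entity tags should yield the same numbers. How the notebooks actually compute their metrics is not stated here, so treat `token_prf` and the tag sequences as illustrative assumptions.

```python
def token_prf(gold, pred):
    """Micro-averaged precision, recall and F1 over non-"O" tokens.

    (Hypothetical helper; not taken from the project notebooks.)
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # True positives: tokens where the predicted entity tag matches the gold tag.
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != "O")
    pred_pos = sum(1 for p in pred if p != "O")
    gold_pos = sum(1 for g in gold if g != "O")
    precision = tp / pred_pos if pred_pos else 0.0
    recall = tp / gold_pos if gold_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented tag sequences: the prediction misses the last entity token.
gold = ["B-Drug", "O", "O", "B-ADR", "I-ADR", "I-ADR"]
pred = ["B-Drug", "O", "O", "B-ADR", "I-ADR", "O"]
precision, recall, f1 = token_prf(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Token-level scores are more lenient than entity-level scores, which only count a mention as correct when every one of its tokens is tagged correctly.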

Getting Started

To run the notebooks:

  1. Install the dependencies listed under Technologies Used (HuggingFace Transformers, scikit-learn, Jupyter)
  2. Run the Jupyter notebooks in the /Code/NER/ directory
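Before opening the notebooks, a quick check like the one below can confirm that the required packages are importable. This is only a sketch: the module names in `REQUIRED` are assumptions derived from the Technologies Used list (the repository does not specify an exact dependency set, and a deep learning backend is likely needed as well), and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Assumed import names for Transformers, scikit-learn, and Jupyter Notebook.
REQUIRED = ["transformers", "sklearn", "notebook"]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules are importable.")
```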

Citation

If you use the CADEC dataset:

Contributors

  • Lucas de Wolff (s3672980)
  • Ruben Ahrens (s3677532)

January 2024

About

This project involved fine-tuning BioBERT on the CSIRO Adverse Drug Event Corpus (CADEC) dataset to perform Named Entity Recognition (NER) and detect adverse drug responses through customer reviews.
