This is a team project created for Intelligencia as a graduation project for the Propulsion data science bootcamp. It aims to improve drug development decisions. Using open data on medical publications and drug trials, NLP techniques, and machine learning, this app:
- matches therapeutic targets of drugs to specific diseases based on aggregated data from scientific publications,
- estimates which diseases are most relevant for a given drug target,
- shows the current situation and trends in research and drug testing for a chosen disease,
- predicts future research interest.
See the app demo on this page.
|- bash
|- dash
   |- apps
   |- assets
|- data
|- modules
|- notebooks
|- scripts
bash: Bash scripts for the data processing pipeline.
dash: Code for the application. Run index.py to start the app (requires the files in ../modules/ and a SQLite database in ../data/).
data: Data required to run the application. A link to a truncated version of the database can be found in 20200729pubmed_mini.txt.
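Since the app depends on a SQLite database being present in the data directory, it can be useful to check the file before starting index.py. The snippet below is a minimal sketch of such a check using only the Python standard library. The function name list_tables is hypothetical and not part of this repository, and the database file name is not stated here, so pass in whatever file you placed in the data directory.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the table names in a SQLite file, or raise if the file is missing.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    path = Path(db_path)
    if not path.is_file():
        raise FileNotFoundError(f"No database found at {path}")
    # Open read-only through a file URI so the check cannot modify the database.
    conn = sqlite3.connect(path.resolve().as_uri() + "?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling list_tables with the path to the database file in the data directory should return a non-empty list if the download succeeded. An empty list or an exception suggests the file is missing or incomplete.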
modules: Helper files for the app and the data preprocessing pipeline.
notebooks: Jupyter notebooks with code for preparing and exploring the data. They give a general idea of the process and the data, and can be used to reproduce the data collection and cleaning steps.
scripts: Python scripts for parallel data preprocessing.
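To illustrate what parallel preprocessing of publication text can look like, here is a minimal sketch using multiprocessing.Pool from the standard library. It is not the code from the scripts directory: the function names clean_abstract and clean_all are hypothetical, and the cleaning step (lowercasing and stripping punctuation) is only an assumed stand-in for the project's actual preprocessing.

```python
import re
from multiprocessing import Pool


def clean_abstract(text):
    """Lowercase the text, replace non-alphanumeric characters, collapse whitespace.

    Illustrative cleaning step only; the real pipeline may differ.
    """
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def clean_all(texts, workers=4, chunksize=1000):
    """Clean many texts across worker processes, preserving input order."""
    with Pool(processes=workers) as pool:
        return pool.map(clean_abstract, texts, chunksize=chunksize)


if __name__ == "__main__":
    # The guard is needed so worker processes do not re-run this block on import.
    sample = ["EGFR inhibitors in Lung Cancer!", "  TNF-alpha &  arthritis "]
    print(clean_all(sample, workers=2, chunksize=1))
```

Pool.map returns results in the same order as the input, which keeps cleaned texts aligned with their source records. A larger chunksize reduces inter-process overhead when the per-item work is small.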