ctakes-pbj

A Python Bridge to Java (PBJ).

Problem Statement

Solutions start with identifying the problem. Our problem is the lack of a standardized path for moving information from cTAKES to a Python program (and back again). That ability matters because most modern Machine Learning is done in Python.

Solution

The information that we want to move is stored in an object called a CAS (Common Analysis System). All objects within the CAS are of a Type defined in an extensible Type System. For instance, a discovered mention of "cancer" is stored in the CAS as an object of Type "DiseaseDisorderMention".
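To make the idea of typed objects in a CAS concrete, here is a minimal sketch in plain Python. It is a toy model only: `ToyCas` and `Annotation` are hypothetical names invented for this illustration and are not the UIMA or dkpro-cassis API. It assumes a CAS can be pictured as the document text plus a list of objects, each carrying a Type name and character offsets.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical stand-off annotation: a Type name plus character offsets."""
    type_name: str
    begin: int
    end: int


@dataclass
class ToyCas:
    """Toy stand-in for a CAS: document text plus the typed objects found in it."""
    text: str
    annotations: list = field(default_factory=list)

    def add(self, annotation):
        # Store a typed object alongside the document text.
        self.annotations.append(annotation)

    def select(self, type_name):
        # Return every stored object of the requested Type.
        return [a for a in self.annotations if a.type_name == type_name]

    def covered_text(self, annotation):
        # Recover the span of text that the annotation points at.
        return self.text[annotation.begin:annotation.end]


cas = ToyCas("Patient has a history of cancer.")
start = cas.text.index("cancer")
cas.add(Annotation("DiseaseDisorderMention", start, start + len("cancer")))
mentions = cas.select("DiseaseDisorderMention")
```

Selecting by Type name is the point of the sketch: a consumer asks for all objects of Type "DiseaseDisorderMention" and reads the text each one covers, without caring how they were discovered.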

The next step was to choose a delivery method for that information. We wanted something that could handle multiple sub-pipelines, allow sub-pipelines to run in parallel, and be fast, reusable, and easy to use.

The Apache ActiveMQ message broker combined with dkpro-cassis emerged as the ideal solution to our problem, providing everything we hoped for above and more.

How it Works

Other Configurations

PBJ development has focused on integrating a simple Python pipeline with a cTAKES (Java) pipeline. The most common configuration in the PBJ pipers that come with cTAKES is the first below: Separate before/after cTAKES processes.

Introductory pipelines can be found in the ctakes-examples module.

An introductory "single-stream" pipeline, like the one described in the section How it Works, is PbjWordFinderInOne.piper. It spins up a single cTAKES Java pipeline that tokenizes a document, then sends the document information to a Python sub-pipeline named word_finder_pipeline. The Python pipeline searches the document for a preset list of words: 'breast', 'hernia', 'pain', 'migraines', 'allergies', 'thyroidectomy', 'exam'. It then sends information back to the main cTAKES pipeline, which writes the information in several different file formats.
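The word search step can be pictured with a short sketch. This is not the actual word_finder_pipeline code: `find_words` and `WORD_PATTERN` are hypothetical names, and whole-word, case-insensitive matching is an assumption made for this illustration. Only the word list comes from the description above.

```python
import re

# Preset word list from the description of the example pipeline.
WORDS = ['breast', 'hernia', 'pain', 'migraines', 'allergies', 'thyroidectomy', 'exam']

# Assumption: match whole words only, ignoring case.
WORD_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in WORDS) + r")\b",
    re.IGNORECASE,
)


def find_words(text):
    """Return (begin, end, word) for each preset word found in the text."""
    return [
        (match.start(), match.end(), match.group(0).lower())
        for match in WORD_PATTERN.finditer(text)
    ]


hits = find_words("Exam today: patient reports pain and seasonal allergies.")
for begin, end, word in hits:
    print(begin, end, word)
```

Returning character offsets with each word mirrors how findings are typically attached to a span of the document text, so they can be sent back and stored alongside the original document.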

The same example is also available as three separate communicating pipelines (cTAKES to Python to cTAKES) in PbjWordFinder.piper. It follows the Separate before/after cTAKES processes configuration from the section Other Configurations.
