Machine Learning tool for the hospital to detect pneumonia

This project builds a machine learning tool that automates pneumonia detection for a hospital. It uses Federated Learning techniques and protects sensitive data with the PySyft and PyGrid libraries in Python.

Three cutting-edge techniques are expected to have a large impact on the future of machine learning in healthcare:

  1. Federated Learning: allows us to train AI models on distributed datasets that you cannot directly access.
  2. Differential Privacy: allows us to make formal, mathematical guarantees around privacy preservation when publishing our results (either directly or through AI models).
  3. Encrypted Computation: allows machine learning to be done on data while it remains encrypted.
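As a rough illustration of the second technique, the Laplace mechanism is one standard way to get such a guarantee for a simple counting query. The sketch below is plain Python and is not taken from this project; the names `laplace_noise` and `private_count` and the example labels are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Sample from Laplace(0, scale).

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(labels, epsilon, rng):
    """Release the number of True labels with epsilon-differential privacy.

    Adding or removing one patient changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for label in labels if label)
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical data: 30 INFECTED and 70 NOT-INFECTED scans.
labels = [True] * 30 + [False] * 70
noisy_count = private_count(labels, epsilon=0.5, rng=random.Random())
```

A smaller `epsilon` adds more noise and gives a stronger privacy guarantee. This is a teaching sketch only; a production system would use a vetted differential privacy library.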

To emphasize: these privacy-preserving developments allow us to train our model on data from multiple institutions, hospitals, and clinics without sharing the patient data. They decouple the use of data from the governance (or control) over that data.
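One common way to train across institutions without moving the data is federated averaging: each site updates a copy of the model on its own data, and only the updated weights are combined. The sketch below is a toy version in plain Python with a one-variable linear model. It is not how PySyft or PyGrid implement training, and every name in it (`local_update`, `federated_average`, `train`, the two hospital datasets) is hypothetical.

```python
def local_update(weights, data, lr=0.1):
    """Run one pass of per-sample gradient descent for y = w*x + b on local data."""
    w, b = weights
    for x, y in data:
        error = (w * x + b) - y
        w -= lr * error * x
        b -= lr * error
    return [w, b]

def federated_average(client_weights, client_sizes):
    """Average client weights, weighting each client by its number of samples."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

def train(clients, rounds=200):
    """Each round: every client trains locally, then only weights are combined."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        global_weights = federated_average(updates, [len(d) for d in clients])
    return global_weights

# Hypothetical private datasets; the raw (x, y) pairs never leave their owner.
hospital_a = [(0.0, 1.0), (1.0, 3.0)]
hospital_b = [(2.0, 5.0), (0.5, 2.0), (1.5, 4.0)]
w, b = train([hospital_a, hospital_b])
```

The coordinator in this sketch sees only weight vectors, never the training examples. In practice the weights themselves can leak information, which is why federated learning is usually combined with the other two techniques.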

Problem Statement:

Build a machine learning tool for the hospital that automates pneumonia detection. Sample X-ray images labeled INFECTED and NOT-INFECTED are given. The patient data must stay protected: the X-ray images cannot be shared with any third party.

Approach:

  1. Create an initial model from the available dataset
  2. Connect to the hospital's data cluster node
  3. Manage access rules and permissions
  4. Prepare the tensor data to train and publish
  5. Create a training plan procedure
  6. Train the model
  7. Perform the computations and publish the private datasets on this node
  8. As a data owner, manage the node's accounts to identify and control who can access the node

Secure Multi-Party Computation:

Secure Multi-Party Computation (SMPC) is a different way to protect data: each value is split into shares that are held by different devices. Its main advantage over traditional cryptography is that SMPC allows us to perform logic and arithmetic operations on the data while it stays encrypted.


In this example, Andrew owns the number 5, his personal data. Andrew can anonymize his data by decomposing his number into 2 (or more) different numbers. In this case, he decomposes the number 5 into 2 and 3. That way, he can share his anonymized data with his friends Marianne and Bob.

Here, none of them really knows the real value of Andrew's data. Each holds only a part of it. None of them can perform any kind of operation without the agreement of all of them. But while these numbers are encrypted between them, we are still able to perform computations. That way, we can compute on a user's data using encrypted values without revealing any sensitive information.
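The idea can be sketched with additive secret sharing in plain Python. This is an illustration, not PySyft code: the modulus `Q` and the helper names `share`, `reconstruct`, and `add_shared` are assumptions. Unlike the 2 + 3 example, the shares here are random numbers modulo `Q`, so a single share reveals nothing about the secret.

```python
import secrets

Q = 2**61 - 1  # public modulus (assumed); all share arithmetic is done mod Q

def share(secret, n_parties=2):
    """Split `secret` into n random-looking shares that sum to it mod Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    """Recover the secret; this needs every share."""
    return sum(shares) % Q

def add_shared(a_shares, b_shares):
    """Each party adds its own two shares locally; no secret is revealed."""
    return [(a + b) % Q for a, b in zip(a_shares, b_shares)]

# Andrew splits 5 between Marianne and Bob; a second secret, 7, is shared too.
andrew = share(5)
other = share(7)
total_shares = add_shared(andrew, other)
total = reconstruct(total_shares)  # only the sum is revealed, not 5 or 7
```

Addition works share by share because the shares are additive. Multiplying two shared values needs extra protocol steps that this sketch leaves out.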


PySyft Technique:


PySyft is a Python library for secure and private Deep Learning. It aims to provide privacy-preserving tools within the main Deep Learning frameworks, such as PyTorch and TensorFlow. That way, data scientists can use these frameworks to handle any kind of sensitive data and apply privacy-preserving concepts without having to be privacy experts themselves.

PyGrid Platform:


PyGrid aims to be a peer-to-peer platform that uses the PySyft framework for Federated Learning and data science.

The architecture is composed of two components: Gateways and Nodes. The Gateway component works like a DNS, routing requests to the nodes that provide the desired datasets.
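The DNS analogy can be pictured with a toy in-memory registry: nodes announce which datasets they host, and a client asks the gateway which nodes to contact. This is not the PyGrid API. The `Gateway` class, its methods, the addresses, and the dataset tags are all hypothetical.

```python
class Gateway:
    """Toy lookup service mapping dataset tags to node addresses."""

    def __init__(self):
        self._nodes_by_tag = {}

    def register(self, node_address, dataset_tags):
        """Record that the node at `node_address` hosts the tagged datasets."""
        for tag in dataset_tags:
            self._nodes_by_tag.setdefault(tag, set()).add(node_address)

    def lookup(self, tag):
        """Return the addresses of all nodes hosting `tag`, sorted for stability."""
        return sorted(self._nodes_by_tag.get(tag, ()))

gateway = Gateway()
gateway.register("hospital-a.example:5000", ["#chest-xray", "#pneumonia"])
gateway.register("clinic-b.example:5000", ["#chest-xray"])
nodes = gateway.lookup("#chest-xray")
```

The gateway only answers "where is the data?". The datasets stay on the nodes, which decide for themselves who may use them.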

Authenticating using JWT token:

PyGrid supports authentication with JWT tokens (signed with HMAC or RSA) or with opaque tokens validated through a remote API, so that access to the model can be controlled for different workers.
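To show what the HMAC variant involves, here is a minimal HS256 sign-and-verify sketch that uses only the Python standard library. It is not PyGrid's implementation: the function names, the secret, and the claims are hypothetical, and it skips checks a real deployment needs, such as token expiry.

```python
import base64
import hashlib
import hmac
import json

def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def _b64url_decode(text: str) -> bytes:
    """Decode base64url text after restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def sign_hs256(payload: dict, secret: bytes) -> str:
    """Build header.payload.signature with an HMAC-SHA256 signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)

def verify_hs256(token: str, secret: bytes) -> dict:
    """Return the payload if the signature is valid, else raise ValueError."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    signing_input = header_b64 + "." + payload_b64
    expected = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing differences
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    return json.loads(_b64url_decode(payload_b64))

secret = b"hypothetical-shared-secret"
token = sign_hs256({"worker": "worker-1", "model": "pneumonia"}, secret)
claims = verify_hs256(token, secret)
```

With HMAC, the node and the token issuer share one secret. With RSA, the issuer signs with a private key and the node only needs the public key to verify.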


Implementation:

  1. Create an initial model from the available dataset

Alt text

Alt text

  2. Connect to the hospital's data cluster node

Alt text

  3. Manage access rules and permissions

Alt text

  4. Prepare the tensor data to train and publish

Alt text

  5. Create a training plan procedure

Alt text

  6. Train the model

Alt text

Alt text

  7. Perform the computations and publish the private datasets on this node

Alt text

  8. As a data owner, manage the node's accounts to identify and control who can access the node

The PyGrid interface for managing access control for the workers looks similar to this:

Alt text

Summary:

  • In medical imaging, necessary privacy protections keep us from fully realizing the benefits of AI in our research.

  • Fortunately, because other industries are also constrained by regulations on private data, three cutting-edge techniques have been developed that have huge potential for the future of machine learning in healthcare: federated learning, differential privacy, and encrypted computation.

  • These modern privacy techniques would allow us to train our models on encrypted data from multiple institutions, hospitals, and clinics without sharing the patient data.

  • Recently, these techniques have become increasingly easy for researchers to implement, thanks to the efforts of scientists across the AI community.