model_centric_FL_tool

Machine Learning tool for the hospital to detect pneumonia

This project is an approach to building a machine learning tool that lets a hospital automate pneumonia detection. It uses Federated Learning to train the model and the PySyft and PyGrid Python libraries to protect sensitive data.

Cutting edge techniques that will have a huge impact on the future of machine learning in healthcare:

Federated Learning: allows us to train AI models on distributed datasets that we cannot directly access.
Differential Privacy: allows us to make formal, mathematical guarantees around privacy preservation when publishing our results (either directly or through AI models).
Encrypted Computation: allows machine learning to be done on data while it remains encrypted.
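To make the differential privacy guarantee above more concrete, here is a minimal sketch of the Laplace mechanism applied to a counting query. It is not taken from this repository: the `private_count` helper, the `labels` list, and the chosen `epsilon` are illustrative assumptions, and a real deployment would rely on a vetted privacy library rather than Python's `random` module.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(flags, epsilon: float, rng: random.Random) -> float:
    """Release the number of True flags with epsilon-differential privacy."""
    # Adding or removing one patient changes the count by at most 1,
    # so the sensitivity of this query is 1.
    sensitivity = 1.0
    return sum(flags) + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical labels: True means the X-ray was labelled INFECTED.
labels = [True, False, True, True, False, False, True, False]
rng = random.Random(0)  # seeded only to make the demo repeatable
noisy = private_count(labels, epsilon=0.5, rng=rng)
print(f"true count = {sum(labels)}, released count = {noisy:.2f}")
```

A smaller `epsilon` adds more noise and gives stronger privacy; a larger one gives a more accurate but less private answer.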

To emphasize: these privacy-preserving developments can allow us to train our model on data from multiple institutions, hospitals, and clinics without sharing the patient data. This decouples the use of data from the governance (or control) over that data.
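The idea of training on data from several institutions without moving it can be sketched with federated averaging. The example below is an illustration under stated assumptions, not this project's training code: it uses a toy linear model in pure Python, and `hospital_a`, `hospital_b`, `local_step`, and `federated_average` are hypothetical names. Only model weights leave each site; the samples themselves never do.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair held by one site


def local_step(weights: Sequence[float], samples: Sequence[Sample],
               lr: float = 0.1) -> List[float]:
    """One gradient-descent step of y = w*x + b on data that stays on site."""
    w, b = weights
    n = len(samples)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
    return [w - lr * grad_w, b - lr * grad_b]


def federated_average(updates: Sequence[Tuple[Sequence[float], int]]) -> List[float]:
    """Average the sites' weights, weighting each site by its sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


# Two hypothetical sites whose private data follows y = 2x + 1.
hospital_a = [(0.0, 1.0), (1.0, 3.0)]
hospital_b = [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

global_weights = [0.0, 0.0]
for _ in range(200):
    # Each site trains locally and reports only its updated weights.
    updates = [(local_step(global_weights, data), len(data))
               for data in (hospital_a, hospital_b)]
    global_weights = federated_average(updates)

print("learned weights:", global_weights)
```

The server only ever sees weight vectors and sample counts, which is what lets the use of the data be separated from control over it.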

Problem Statement:

Build a machine learning tool for the hospital that automates pneumonia detection. Sample X-ray images labelled INFECTED and NOT-INFECTED are provided. The tool must protect patient data: the X-ray images must not be shared with any third party.

Approach:

Create an initial model from the available dataset
Connect to the hospital's data cluster node
Manage access rules and permissions
Prepare the tensor data to train and publish
Create a training plan procedure
Train the model
Perform the computations and publish the private datasets on this node
As a data owner, manage the node's accounts to identify and control who can access the node

Secure Multi-Party Computation:

Secure Multi-Party Computation (SMPC) is a different way to encrypt data: the data is split into shares that are distributed across different devices. Its main advantage over traditional cryptography is that SMPC allows us to perform logic and arithmetic operations on the data while it stays encrypted.

In this example, Andrew owns the number 5, which is his personal data. Andrew can anonymize his data by decomposing the number into 2 (or more) different numbers. In this case, he decomposes 5 into 2 and 3. That way, he can share his anonymized data with his friends Marianne and Bob.

Here, neither of them knows the real value of Andrew’s data; each holds only a part of it. None of them can perform any kind of operation without the agreement of all of them. But even while the number is split between them in encrypted form, we are still able to perform computations on it. That way, we can compute on a user’s data using encrypted values without revealing any sensitive information.
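The decomposition described above corresponds to additive secret sharing, which can be sketched in a few lines. This is an illustrative sketch rather than PySyft's implementation: the modulus `Q` and the helpers `share`, `reconstruct`, and `add_shared` are assumptions made for the example. Shares are drawn at random modulo `Q`, so a single share reveals nothing about the secret.

```python
import secrets

Q = 2**61 - 1  # illustrative modulus; all arithmetic on shares is done mod Q


def share(secret: int, n_parties: int = 2) -> list:
    """Split a secret into n_parties random shares that sum to it mod Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares


def reconstruct(shares: list) -> int:
    """Recover the secret; this needs every share, so all holders must agree."""
    return sum(shares) % Q


def add_shared(a: list, b: list) -> list:
    """Each party adds the two shares it holds, without seeing either secret."""
    return [(x + y) % Q for x, y in zip(a, b)]


# Andrew splits his number 5 between Marianne and Bob.
andrew = share(5)
# A second, hypothetical secret (7) is shared with the same two parties.
other = share(7)

# Marianne and Bob each add their own pair of shares locally.
total = add_shared(andrew, other)
print("sum of the two secrets:", reconstruct(total))
```

Addition works share by share because the sum of the shares equals the share of the sum; multiplication needs extra protocol steps that are not shown here.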

PySyft Technique:

PySyft is a Python library for secure and private Deep Learning. PySyft aims to provide privacy-preserving tools within the main Deep Learning frameworks, such as PyTorch and TensorFlow. That way, data scientists can use these frameworks to handle any kind of sensitive data and apply privacy-preserving concepts without having to be privacy experts themselves.

PyGrid Platform:

PyGrid aims to be a peer-to-peer platform that uses the PySyft framework for Federated Learning and data science.

The architecture is composed of two components: Gateways and Nodes. The Gateway component works like a DNS, routing requests to the nodes that provide the desired datasets.
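The DNS-like role of the Gateway can be pictured as a lookup table from dataset tags to node addresses. The sketch below is a toy model of that idea only; the `Gateway` class, its `register` and `lookup` methods, the tag names, and the addresses are all hypothetical and are not PyGrid's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class Gateway:
    """Toy registry mapping dataset tags to the nodes that host them."""

    def __init__(self) -> None:
        self._nodes_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def register(self, node_address: str, dataset_tags: Iterable[str]) -> None:
        """A node announces which dataset tags it can serve."""
        for tag in dataset_tags:
            self._nodes_by_tag[tag].add(node_address)

    def lookup(self, tag: str) -> List[str]:
        """Return the addresses of all nodes hosting the tag, like a DNS query."""
        return sorted(self._nodes_by_tag.get(tag, set()))


gateway = Gateway()
gateway.register("http://hospital-a.example:5000", ["#xray", "#pneumonia"])
gateway.register("http://hospital-b.example:5000", ["#xray"])

print(gateway.lookup("#xray"))
print(gateway.lookup("#mri"))
```

The gateway holds only addresses and tags, never the datasets themselves, which remain on the nodes.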

Authenticating using JWT token:

PyGrid supports authentication via JWT (signed with HMAC or RSA) or via an opaque token checked through a remote API, to protect the model when it is served to different workers.
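To show what an HMAC-signed JWT involves, the sketch below signs and verifies an HS256 token using only the Python standard library. It is a simplified illustration, not PyGrid's authentication code: the secret, the claim names, and the `sign_hs256` and `verify_hs256` helpers are assumptions, and a production verifier should also check the `alg` header and expiry claims, ideally through a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(payload: dict, secret: bytes) -> str:
    """Build header.payload.signature with an HMAC-SHA256 signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_hs256(token: str, secret: bytes) -> dict:
    """Return the payload if the signature matches, else raise ValueError."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    return json.loads(_b64url_decode(payload_b64))


secret = b"demo-secret-change-me"  # hypothetical shared secret
token = sign_hs256({"worker_id": "worker-1", "model": "pneumonia"}, secret)
print(verify_hs256(token, secret))
```

A worker that presents a token signed with a different secret, or with a modified payload, fails verification and would be refused the model.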

Implementation:

Create an initial model from the available dataset

Connect to the hospital's data cluster node

Manage access rules and permissions

Prepare the tensor data to train and publish

Create a training plan procedure

Train the model

Perform the computations and publish the private datasets on this node

As a data owner, manage the node's accounts to identify and control who can access the node

The PyGrid interface for managing access control for the workers looks similar to this.

Summary:

In medical imaging, necessary privacy safeguards keep us from fully realizing the benefits of AI in our research.
Fortunately, because other industries are also constrained by regulations on private data, three cutting-edge techniques have been developed that have huge potential for the future of machine learning in healthcare: federated learning, differential privacy, and encrypted computation.
These modern privacy techniques would allow us to train our models on encrypted data from multiple institutions, hospitals, and clinics without sharing the patient data.
Recently, these techniques have become increasingly easy for researchers to implement, thanks to the efforts of scientists from across the AI community.

