Explorer


What it does

Anonymized data from Diffix-protected datasets is inherently restricted. The analyst needs to be familiar with the imposed limitations and with possible workarounds. The aim of this project is to build tools that automatically extract a high-level picture of the shape of a given dataset while intelligently navigating the restrictions imposed by Diffix.

Getting started

Prerequisites

  • Aircloak API Key

    You will need an authorization token for the Aircloak API. This should be assigned to the AIRCLOAK_API_KEY variable in your environment.

  • Docker

    Not a strict requirement, but the easiest way to get started is with Docker.

Docker image from the GitHub registry

The latest release is published as a Docker image in the GitHub registry.

To pull from the GitHub registry, you need to authenticate with a GitHub access token:

  1. Create a new GitHub personal access token with the read:packages permission.
  2. Save it in a file, for example github_registry_token.txt
  3. Authenticate with docker login, using the generated token and your GitHub username:
    cat github_registry_token.txt | docker login docker.pkg.github.com -u $GITHUB_USERNAME --password-stdin
    

See here for further information.

With the above out of the way, you can download and run the latest image with a single docker command.

You will need to assign the Aircloak API endpoint to the AIRCLOAK_API_URL environment variable in the Docker container. For example, the following command exposes the Explorer API on port 5000 and targets the Aircloak API at https://attack.aircloak.com/api/:

docker run -it --rm \
    -e AIRCLOAK_API_URL="https://attack.aircloak.com/api/" \
    -p 5000:80 \
    docker.pkg.github.com/diffix/explorer/explorer-api:latest

Docker build

You can also build and run the Docker image locally.

As above, you need to assign the Aircloak API endpoint to the AIRCLOAK_API_URL environment variable in the container.

# 1. Clone this repo
git clone https://github.com/diffix/explorer.git 

# 2. Build the docker image
docker build -t explorer explorer

# 3. Run the application in a new container
docker run -it --rm -e AIRCLOAK_API_URL="https://attack.aircloak.com/api/" -p 5000:80 explorer

Usage

Note that you will need an access token for the Aircloak API. This token is passed as part of the request payload to the /explore endpoint.

Launching an exploration

The explorer exposes an /explore endpoint that expects a POST request containing the dataset, table and column to analyse. Assuming you are running the explorer on localhost:5000:

curl -k -X POST -H "Content-Type: application/json" http://localhost:5000/explore \
  -d "{
   \"ApiKey\":\"my_secret_key\", 
   \"DataSourceName\": \"gda_banking\", 
   \"TableName\":\"loans\",
   \"ColumnName\":\"amount\"
   }"

This launches the column exploration and, if all goes well, returns a successful response with a JSON payload containing a unique id:

{
  "status":"New",
  "metrics":[],
  "id":"204f47b4-9c9d-46d2-bdb0-95ef3d61f8cf"
}
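
The same request can be issued from a script. The following is a minimal sketch using only the Python standard library; it assumes the field names and response shape shown in the curl example above. The function names `build_explore_payload` and `launch_exploration` are hypothetical and are not part of the project.

```python
import json
import urllib.request


def build_explore_payload(api_key, data_source, table, column):
    """Return the JSON body for /explore, using the field names from the curl example."""
    return {
        "ApiKey": api_key,
        "DataSourceName": data_source,
        "TableName": table,
        "ColumnName": column,
    }


def launch_exploration(base_url, payload, opener=urllib.request.urlopen):
    """POST the payload to <base_url>/explore and return the exploration id.

    `opener` is injectable (an assumption of this sketch) so the function
    can be exercised without a running explorer instance.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/explore",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["id"]
```

With the explorer running on localhost:5000, a call might look like `launch_exploration("http://localhost:5000", build_explore_payload("my_secret_key", "gda_banking", "loans", "amount"))`.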

Polling for results

You can use the id to poll for results on the /result endpoint:

curl -k http://localhost:5000/result/204f47b4-9c9d-46d2-bdb0-95ef3d61f8cf

The body of the response should again contain a JSON payload, with an indication of the processing status as well as any metrics computed so far. For example, for a text column:

{
  "status":"Processing",
  "id":"204f47b4-9c9d-46d2-bdb0-95ef3d61f8cf",
  "metrics":[
    {"name": "text.length.distinct.suppressed_count", "value": 16},
    {"name": "text.length.distinct.values",
      "value": [
        {"value": 24, "count": 256},
        {"value": 25, "count": 254},
        {"value": 27, "count": 242},
        {"..."},
        {"value": 51, "count": 6},
        {"value": 9, "count": 4},
        {"value": 8, "count": 2}]},
    {"name": "text.length.naive_max", "value": 49},
    {"name": "text.length.naive_min", "value": 15}
  ]
}
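
Because the metrics arrive as a list of name/value pairs, it can be convenient to index them by name before use. The sketch below assumes only the payload shape shown above; `metrics_by_name` is a hypothetical helper, and the sample is an abridged copy of the example response.

```python
def metrics_by_name(result):
    """Map each metric name in a /result payload to its value."""
    return {metric["name"]: metric["value"] for metric in result.get("metrics", [])}


# Abridged copy of the example response above.
sample = {
    "status": "Processing",
    "id": "204f47b4-9c9d-46d2-bdb0-95ef3d61f8cf",
    "metrics": [
        {"name": "text.length.distinct.suppressed_count", "value": 16},
        {"name": "text.length.naive_max", "value": 49},
        {"name": "text.length.naive_min", "value": 15},
    ],
}

metrics = metrics_by_name(sample)
print(metrics["text.length.naive_max"])  # prints 49
```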

When exploration is complete, this is indicated with "status": "Complete".
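
A polling loop built on this status field might look like the following sketch. It treats only "Complete" as terminal, since that is the only terminal status described here; the interval, the attempt limit, and the function names are assumptions made for illustration.

```python
import json
import time
import urllib.request


def fetch_result(base_url, exploration_id, opener=urllib.request.urlopen):
    """GET <base_url>/result/<id> and return the decoded JSON payload."""
    url = base_url.rstrip("/") + "/result/" + exploration_id
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))


def poll_until_complete(fetch, interval_seconds=2.0, max_attempts=30, sleep=time.sleep):
    """Call `fetch()` until the payload reports "status": "Complete".

    Returns the final payload, or raises TimeoutError after `max_attempts` polls.
    Any other status values the service may return are not handled in this sketch.
    """
    for attempt in range(max_attempts):
        result = fetch()
        if result.get("status") == "Complete":
            return result
        # Skip the pause after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError("exploration did not complete within the polling budget")
```

For example: `poll_until_complete(lambda: fetch_result("http://localhost:5000", exploration_id))`. Passing `fetch` as a callable keeps the loop independent of the HTTP layer, so it can be tested with canned responses.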

Cancellation

You can cancel an ongoing exploration using the (you guessed it) /cancel endpoint:

curl -k http://localhost:5000/cancel/204f47b4-9c9d-46d2-bdb0-95ef3d61f8cf
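
The equivalent call from a script is a plain GET, mirroring the curl command. This is a minimal sketch: the function names are hypothetical, and because the shape of the cancel response is not documented here, only the HTTP status code is returned.

```python
import urllib.request


def cancel_url(base_url, exploration_id):
    """Build the /cancel/<id> URL for a given explorer base URL."""
    return base_url.rstrip("/") + "/cancel/" + exploration_id


def cancel_exploration(base_url, exploration_id, opener=urllib.request.urlopen):
    """Issue a GET to /cancel/<id> and return the HTTP status code."""
    with opener(cancel_url(base_url, exploration_id)) as response:
        return response.status
```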

More examples

For further examples, check out the basic client implementations.

Development

The simplest way to get a development environment up and running is with VS Code's remote containers feature.

Detailed information on setting up remote containers for VS Code can be found here.

The short version:

  1. Install Docker
  2. Install Visual Studio Code
  3. Add the remote development pack in VS Code
  4. Clone the repo: git clone https://github.com/diffix/explorer.git
  5. Start VS Code and from the command palette (F1) run Remote-Containers: Open Folder in Container and select the project root folder.

If you want to use an editor other than VS Code, you will need .NET Core 3.1 to compile the source files on your local machine.

Testing

Running the tests requires the AIRCLOAK_API_KEY environment variable to be set to a valid API key. If you are using VS Code remote containers, this environment variable is propagated from your local environment to the development container.

Additional reading
