Anonymized data from Diffix-protected datasets is inherently restricted. Analysts need to be familiar with the imposed limitations and know the possible workarounds. The aim of this project is to build tools that automatically extract a high-level picture of the shape of a given dataset whilst intelligently navigating the restrictions imposed by Diffix.
- [ ] Aircloak API Key
You will need an authorization token for the Aircloak API. This should be assigned to the `AIRCLOAK_API_KEY` variable in your environment.
- [ ] Docker
Docker is not a strict requirement, but it is the easiest way to get started. The latest release is published as a Docker image in the GitHub registry.
To pull from the GitHub registry you need to authenticate with a GitHub access token:
- Go here and create a new token with the `read:packages` permission.
- Save it in a file, for example `github_registry_token.txt`.
- Authenticate with `docker login` using the generated token and your GitHub username:
```shell
cat github_registry_token.txt | docker login docker.pkg.github.com -u "$GITHUB_USERNAME" --password-stdin
```
See here for further information.
With the above out of the way, you can download and run the latest image with a single `docker` command. You will need to assign the Aircloak API endpoint to the `AIRCLOAK_API_URL` environment variable in the Docker container. For example, the following exposes the explorer API on port 5000 and targets the Aircloak API at https://attack.aircloak.com/api/:
```shell
docker run -it --rm \
  -e AIRCLOAK_API_URL="https://attack.aircloak.com/api/" \
  -p 5000:80 \
  docker.pkg.github.com/diffix/explorer/explorer-api:latest
```
You can also build and run the Docker image locally. As above, you need to assign the Aircloak API endpoint to an environment variable in the container.
```shell
# 1. Clone this repo
git clone https://github.com/diffix/explorer.git
# 2. Build the Docker image (the build context is the cloned "explorer" directory)
docker build -t explorer explorer
# 3. Run the application in a new container
docker run -it --rm -e AIRCLOAK_API_URL="https://attack.aircloak.com/api/" -p 5000:80 explorer
```
Note that you will also need an access token for the Aircloak API. This token is passed as part of the request payload to the `/explore` endpoint.
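Rather than pasting the key into each request by hand, the `/explore` body can be assembled from the environment. The helper below is a hypothetical sketch, not part of the explorer: `explore_payload` takes the data source, table and column names, reads the key from `AIRCLOAK_API_KEY` (reusing that variable here is an assumption made for convenience), and emits the `ApiKey`, `DataSourceName`, `TableName` and `ColumnName` fields used by this endpoint. It does no JSON escaping, so it assumes the names contain no double quotes or backslashes.

```shell
# Hypothetical helper: prints the JSON body for POST /explore.
# The key comes from AIRCLOAK_API_KEY (an assumption made for convenience);
# names are inserted verbatim, so they must not contain " or \ characters.
explore_payload() {
    if [ "$#" -ne 3 ]; then
        echo "usage: explore_payload <data-source> <table> <column>" >&2
        return 2
    fi
    if [ -z "${AIRCLOAK_API_KEY:-}" ]; then
        echo "error: AIRCLOAK_API_KEY is not set" >&2
        return 1
    fi
    printf '{"ApiKey":"%s","DataSourceName":"%s","TableName":"%s","ColumnName":"%s"}\n' \
        "$AIRCLOAK_API_KEY" "$1" "$2" "$3"
}
```

For example, `explore_payload gda_banking loans amount | curl -s -X POST -H "Content-Type: application/json" --data @- http://localhost:5000/explore` sends the request with the key taken from the environment; `--data @-` makes curl read the body from stdin.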
The explorer exposes an `/explore` endpoint that expects a POST request containing the data source, table and column to analyse. Assuming you are running the explorer on `localhost:5000`:
```shell
curl -k -X POST -H "Content-Type: application/json" http://localhost:5000/explore \
  -d '{
    "ApiKey": "my_secret_key",
    "DataSourceName": "gda_banking",
    "TableName": "loans",
    "ColumnName": "amount"
  }'
```
This launches the column exploration and, if all goes well, returns a successful response with a JSON payload containing a unique id:
```json
{
  "status": "New",
  "metrics": [],
  "id": "204f47b4-9c9d-46d2-bdb0-95ef3d61f8cf"
}
```
You can use the id to poll for results on the `/result` endpoint:
```shell
curl -k http://localhost:5000/result/204f47b4-9c9d-46d2-bdb0-95ef3d61f8cf
```
The body of the response should again contain a JSON payload with the processing status as well as any metrics computed so far. For example, for a text column:
```json
{
  "status": "Processing",
  "id": "204f47b4-9c9d-46d2-bdb0-95ef3d61f8cf",
  "metrics": [
    {"name": "text.length.distinct.suppressed_count", "value": 16},
    {"name": "text.length.distinct.values",
     "value": [
       {"value": 24, "count": 256},
       {"value": 25, "count": 254},
       {"value": 27, "count": 242},
       ...
       {"value": 51, "count": 6},
       {"value": 9, "count": 4},
       {"value": 8, "count": 2}]},
    {"name": "text.length.naive_max", "value": 49},
    {"name": "text.length.naive_min", "value": 15}
  ]
}
```
When exploration is complete, this is indicated with `"status": "Complete"`.
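To pick a single scalar metric out of a `/result` payload without a JSON parser, a `sed` expression is often enough. The helper below is a hypothetical sketch: `metric_value` reads a response on stdin and prints the value of the named metric, assuming (as in the example payload) that `name` precedes `value` on the same line and that the value is a number or string rather than a list such as `text.length.distinct.values`. For anything more involved, a proper JSON parser is the safer choice.

```shell
# Hypothetical helper: prints the value of a scalar metric from a /result
# payload read on stdin, e.g. `metric_value text.length.naive_max`.
# Assumes "name" comes before "value" on the same line and that the value is
# a number or string (list-valued metrics are not handled).
metric_value() {
    # Escape the dots in the metric name so sed matches them literally.
    _metric_re=$(printf '%s' "$1" | sed 's/\./\\./g')
    sed -n "s/.*\"name\" *: *\"$_metric_re\" *, *\"value\" *: *\([^,}]*\).*/\1/p"
}
```

For the example payload, piping the response into `metric_value text.length.naive_max` prints `49`, and `metric_value text.length.naive_min` prints `15`.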
You can cancel an ongoing exploration using the (you guessed it) `/cancel` endpoint:
```shell
curl -k http://localhost:5000/cancel/204f47b4-9c9d-46d2-bdb0-95ef3d61f8cf
```
For further examples, check out the basic client implementations.
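Polling and cancelling combine naturally into a loop with a time limit. The following is a sketch, not one of the explorer's client implementations: `poll_exploration` keeps calling `/result` while the status is `New` or `Processing`, prints the payload once the status changes (for example to `Complete`), and calls `/cancel` if the exploration has not finished after `MAX_POLLS` attempts. `EXPLORER_URL`, `POLL_INTERVAL`, `MAX_POLLS` and all function names are hypothetical, and treating any other non-empty status as final is an assumption.

```shell
# Hypothetical settings; override them in the environment if needed.
EXPLORER_URL=${EXPLORER_URL:-http://localhost:5000}
POLL_INTERVAL=${POLL_INTERVAL:-2}   # seconds between /result requests
MAX_POLLS=${MAX_POLLS:-150}         # give up after this many requests

# Thin wrappers around the /result and /cancel endpoints.
fetch_result() { curl -s "$EXPLORER_URL/result/$1"; }
cancel_exploration() { curl -s "$EXPLORER_URL/cancel/$1" > /dev/null; }

# Prints the value of the first "status" field in a JSON payload read on stdin.
json_status() {
    sed -n 's/.*"status" *: *"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage: poll_exploration <exploration-id>
# Prints the final /result payload and returns 0 once the status is no longer
# New or Processing. Returns 1 on a failed request or missing status, and
# cancels the exploration and returns 1 after MAX_POLLS unfinished polls.
poll_exploration() {
    _attempt=0
    while [ "$_attempt" -lt "$MAX_POLLS" ]; do
        _payload=$(fetch_result "$1") || return 1
        case $(printf '%s\n' "$_payload" | json_status) in
            New|Processing) ;;   # still running: wait and try again
            '') echo "error: no status in /result response" >&2; return 1 ;;
            *) printf '%s\n' "$_payload"; return 0 ;;   # e.g. Complete
        esac
        _attempt=$((_attempt + 1))
        sleep "$POLL_INTERVAL"
    done
    cancel_exploration "$1"
    return 1
}
```

With the defaults, `poll_exploration 204f47b4-9c9d-46d2-bdb0-95ef3d61f8cf > result.json` waits roughly five minutes (150 polls, 2 seconds apart) before cancelling. Keeping the HTTP calls in `fetch_result` and `cancel_exploration` makes them easy to swap out, for instance to add `-k` or to stub the server in tests.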
The simplest way to get a development environment up and running is with VS Code's remote containers feature.
Detailed information on setting up remote containers for VS Code can be found here.
The short version:
- Install Docker
- Install Visual Studio Code
- Add the Remote Development extension pack in VS Code
- Clone the repo:
```shell
git clone https://github.com/diffix/explorer.git
```
- Start VS Code and, from the command palette (`F1`), run `Remote-Containers: Open Folder in Container` and select the project root folder.
If you want to use an editor other than VS Code, you will need .NET Core 3.1 to compile the source files on your local machine.
Running the tests requires the `AIRCLOAK_API_KEY` environment variable to be set to a valid API key. If you are using VS Code remote containers, this environment variable is propagated from your local environment to the development container.
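Because the key has to reach a child process such as the test runner, the variable must be exported, not merely set in the current shell. A small guard like the hypothetical `require_env` below can catch a missing key before the tests start; it relies on `printenv`, which only sees exported variables.

```shell
# Hypothetical helper: succeeds only if every named variable is exported with
# a non-empty value. printenv only sees exported variables, which is what a
# child process such as the test runner will see.
require_env() {
    for _var in "$@"; do
        if [ -z "$(printenv "$_var")" ]; then
            echo "error: $_var is not set (or not exported)" >&2
            return 1
        fi
    done
}
```

Running `require_env AIRCLOAK_API_KEY` before the test command prints an error and returns non-zero if the key is missing.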