Signature Verification with deep learning / transfer learning using Keras and KNIME

In the previous posts we applied traditional Machine Learning methods and Deep Learning  in Python and KNIME to detect credit card fraud, in this post we will see how to use a pretrained deep neural networks to classify images of offline signatures into genuine and forged signatures. A neural network like this could support experts to fight cheque fraud.  We will use the VGG16 network architecture pertained on ImageNet. The technique we are going to apply is called transfer learning and allows us to use a deep network even if we have only limited data, as in our case.  The idea for this post is based on the paper ‘Offline Signature Verification with Convolutional Neural Networks‘ by Gabe Alvarez, Blue Sheffer and Morgan Bryant. We combine native KNIME nodes for the data preparation and extend the workflow with Python code in some nodes using Keras for designing and the network and the transfer learning process.

We will use the same data source for our training set: The signature collection of the ICDAR 2011 Signature Verification Competition (SigComp2011) which contains offline and online signature samples. The offline dataset comprises PNG images, scanned at 400 dpi, RGB color. The data can be downloaded from http://www.iapr-tc11.org/mediawiki/index.php/ICDAR_2011_Signature_Verification_Competition_(SigComp2011).

Data Preparation

0206012_02.png

The offline signatures images have all different size, so in the first step we resize them to have all the same size 150×150 and divide all pixel by 255 to scale our features in the [0,1] space.

With the List Files node we filter all images (pngs) in our training set folder.

Screen Shot 2018-08-26 at 11.33.46

Screen Shot 2018-08-26 at 11.34.03

In the next step (Rule Node) we mark the first 123 images are forged or fraudulent and the remaining images are genuine signatures.

Screen Shot 2018-08-26 at 11.34.12

In the following Metanode we read all images from the files using the image reader (table) node and then standardise the features and resize all images.

Screen Shot 2018-08-26 at 11.35.03

Screen Shot 2018-08-26 at 11.37.24.png

Screen Shot 2018-08-26 at 11.34.43

Screen Shot 2018-08-26 at 11.34.51

In the next step we perform a 80/20 split into training and test data and apply a data augmentation step (using Keras ImageDataGenerator to create more training data) which finalise our data preparation. We will cover the data augmentation in one of the coming posts.

Deep Learning network

We will load the weights of a VGG16 network pertained on ImageNet classification tasks using Keras Applications and strip of the last layers and replace them with our new dense layers and then train and fine tune the parameters our data. Instead of training the complete network which would require a lot more data and training time we will use pretrained weights and leverage the previous experience of the network. We just learn the last layers on our classification task.

Screen Shot 2018-08-26 at 11.44.08.png

Step 1: Load the pretrained network

We use the DL Python Network Creator Node and need to write a few line of Python code to load the network. We are using Keras, which will automatically download the weights.

Screen Shot 2018-08-26 at 11.50.58.png

Step 2: Replace Top Layers and freeze weights

In the next step we modify the network, we add 3 Dense Layer with Dropouts and ReLu and a Sigmoid activation at the end and freeze the weights of the original VGG network and again we need to write a few line of Python code:

Screen Shot 2018-08-26 at 11.53.44.png

Step 3: Train Top Layers

Then we train the new layers for 5 epochs with the Keras Network Learner Node:

This slideshow requires JavaScript.

Step 4 and 5: Unfreeze and fine tune

We modify the resulting network and unfreeze the last layers of the VGG16 network to fine-tune the pre-learned weights (3 layers) and train the network for another 10 epochs.

Screen Shot 2018-08-26 at 12.00.17

Screen Shot 2018-08-26 at 12.00.43

Apply Network and Test Results

In the last step we apply the network to the test data, convert the predicted probability into a class (p>0.5 -> Forged) and calculate the Confusion Matrix and AUC curve.

Screen Shot 2018-08-26 at 12.03.06

Confusion Matrix:

Screen Shot 2018-08-26 at 12.03.20

AUC Curve:

Screen Shot 2018-08-26 at 12.03.35

The results look very impression, actually a bit to good. We are maybe overfitting the data, since the test data may contains signatures (genuine and forged) from the same reference authors (since there are only 10 reference authors in the complete training set). The author of the paper noted that the performance of their network is very good on signatures of persons whose signatures has been seen in the training phase but that on unseen data it’s only little bit better then a naive baseline. In one of the next posts we will to check the network performance on new unseen signatures and try to train and test the model also on the SigComp2009 data which have signatures of 100 authors and we will look into detail in the data augmentation and maybe we compare this network to some more other network architectures.

So long…

 

 

 

 

Fooling Around with KNIME cont’d: Deep Learning

In my previous post I wrote about my first experiences with KNIME and we implemented three classical supervised machine learning models to detect credit card fraud. In the meantime I found out that the newest version of KNIME (at this time 3.6) supports also the deep learning frameworks TensorFlow and Keras. So I thought lets revisit our deep learning model for the fraud detection and try to implement in KNIME using Keras without writing one line of Python code.

Install the required packages

The requirement is you have Python with TensorFlow and Keras (you can install it with pip or conda, if you using the anaconda distribution) on your machine. Then you need to install the Keras integration extensions in KNIME, you can follow the official tutorial on https://www.knime.com/deeplearning/keras.

Workflow in KNIME

The first part of the workflow is quite similar to the previous workflow

Screen Shot 2018-08-04 at 20.30.33.png

We load the data, remove the time column, split the data in train, validation and test sets and normalise the features.

The configuration of the column filter node is quite straight forward, we specify the columns we want to include and exclude (no big surprise in the configuration).

Screen Shot 2018-08-04 at 20.32.33.png

Design of the deep network

For building our very simple 3 layer network we need 3 different new nodes, the Keras Input-Layer-Node, the Dense-Layer-Node and the DropOut-Node:

Screen Shot 2018-08-04 at 20.38.09.png

We start with the input layer and we have to specify the dimensionality of our input, in our case we have 29 features, we can also specify here the batch size.

Screen Shot 2018-08-04 at 20.39.24

The next layer is a dense (fully connected) layer. We can specify we number of nodes, the activation function.

Screen Shot 2018-08-04 at 20.40.41.png

After the dense layer we apply a drop-out with an dropout-rate of 20%, also the configuration is here quite straightforward.

Screen Shot 2018-08-04 at 20.43.28.png

We add then another dense-layer with 50 node, another dropout and the final layer with one node and the sigmoid activation function (binary classification: fraud or non-fraud).

The last layer of the network, the training data and the validation set are input to the Keras-Network-Learner Node.

 

Screen Shot 2018-08-04 at 20.45.08

 

We set the input, the target variable choose the loss function, optimisation method and number of epochs.

Screen Shot 2018-08-04 at 20.46.29Screen Shot 2018-08-04 at 20.46.36

Screen Shot 2018-08-04 at 20.47.02

We can specify a own loss function if we want or need to.

Screen Shot 2018-08-04 at 20.46.50

We can select an early stop strategy as well:

Screen Shot 2018-08-04 at 20.48.24.png

With the setting above the training will be stopped if the validation loss will no decrease more than 0.001 for at least 5 epochs.

During the training we can monitor the training process:

Screen Shot 2018-08-04 at 20.49.06Screen Shot 2018-08-04 at 20.49.11Screen Shot 2018-08-04 at 20.49.44Screen Shot 2018-08-04 at 20.50.26Screen Shot 2018-08-04 at 20.51.14

The trained model and the test data are the input for the DL-Network-Executer-Node which will use the trained network to classify the test set.

Screen Shot 2018-08-04 at 21.00.16

The results are plugged in a ROC-Curve-Node to asses the model quality.

Screen Shot 2018-08-04 at 21.01.28

And here the complete workflow:

Screen Shot 2018-08-05 at 07.49.47.png

Conclusion

It was very easy and fast to implement our previous model in KNIME without writing any line of code. Nevertheless the user still need to understand the concepts of  deep learning in order to build the network and understand the node configurations. I really liked the feature of the real time training monitor. In my view KNIME is a tool which can help to democratize data science within an organisation. Analyst who can not code in Python or R can have access to very good deep learning libraries for their data analytics without the burden to learn a new programming language, they can focus on understanding the underlying concepts of deep learning, understand their data and choosing the right model for the data and understand the drawbacks and limitations of their approach instead of spending hours learning programming in a new language. Of course you need to spent time to learn using KNIME, which is maybe for some people easier than learning programming.

I plan to spent some more time with KNIME in the future and I want to find out how to reuse parts of the workflow in new workflow (like the data preparation, which is almost exactly the same as in the previous example) and how to move model into production. I will report about it in same later posts.

So long…

 

Fooling around with KNIME

At the moment I am quite busy with preparing a two training course about ‘Programming and Quantitative Finance in Python’  and ‘Programming and Machine Learning in Python’ for internal trainings at my work, so I haven’t had much free time for my blog. But I spent the last two nights fooling around with KNIME, an open-source tool for data analytics / mining (Peter, I took inspiration from your blog to name today’s post) and I want to share my experience. In the beginning I was quite sceptical and my first thought was ‘I can write code faster then drag-n-drop a model’ (and I still believe it). But I wanted to give it a try and I migrated my logistic regression fraud detection sample from my previous blog posts into a graphical workflow.

Screen Shot 2018-07-19 at 21.03.13

In the beginning it was bit frustrating since I didn’t know which node to use and where to find all the settings. But the interface and the node names are quite self-explaining so after exploring some examples and watching one or two youtube videos I was able build my first fraud detection model in KNIME.

To work with a classifier we need to transfer the numerical variable Class (1=Fraud, 0=NonFraud) into a string variable. This was not obvious for me and coming from Python and SkLearn it felt a bit wired and unnecessary. After fixing that, I split the data in a training and test set with the Partioning node. With a right-click on the node we can adjust the configuration and can change the it to the common 80-20 split.

Screen Shot 2018-07-19 at 21.34.39

In the next step the data will be standardized (using a normalizer node). We can select which features / variable and how we want to scale them. We have plenty of settings to choose from, a nice feature is the online help (node description) on the right side of the UI, which describes the different parameter.

Screen Shot 2018-07-19 at 21.37.38

I have capsuled the model fitting, prediction and scoring into a meta-node to make the workflow look cleaner and more understandable. With a double-click on the meta-node we can open the sub-workflow.

Screen Shot 2018-07-19 at 21.12.52

I am fitting three different models (again encapsulated into meta-nodes) and combine the results (the AUC Score) into one table and write it and export it as a csv-file.

Lets have a detailed look into the logistic regression model meta-node.

Screen Shot 2018-07-19 at 21.16.06

The first node is the so-called learner node. It fits the model on the training data. In the configuration we can select the input features, target column, the solver algorithm, and advanced settings like regularizations.

Screen Shot 2018-07-19 at 21.45.22

To make predictions we use the predictor node. The inputs are the fitted model (square input) and the standardized test set. Into the normaliser (apply)  node, we feed the test set and the fitted normalizer (as far as I understand that is equivalent to use the transform method of a Scaler in Sklearn after fitting it before, please correct me if I am wrong). The prediction output will be used to calculate the AUC curve (in the configuration setting of the prediction node we have to add the predicted probabilities as an additional output, in the default settings is to output only the predicted class). We export the plot as an SVG file and the auc score (as a table with an extra column for the model name) is the output of our meta-node.

We can always investigate the output/result of one step, e.g. of the last node:

Screen Shot 2018-07-19 at 21.48.41

Screen Shot 2018-07-19 at 21.56.12.png

Or the interactive plot of the AUC node:

Screen Shot 2018-07-19 at 22.18.33

Or the model parameter output of the learner node:

Screen Shot 2018-07-19 at 21.57.15.png

The workflow for the other two models is quite similar. With copy and pasting the Logistic Regression meta-node, it was just replacing the learner and predictor node and adjusting the configurations.

To execute the complete workflow we just need to press the run/play button in the menu.

There is still much to discover and explore and try. For example there are node for cross-validation and feature selection which I haven’t tried yet and so many other nodes (e.g plotting, descriptive statistics and the Python and R integration nodes). And I haven’t tried to move a model into production, but I read that it should not be that difficult with KNIME (they promote it as a platform to create data science applications). I spent just a couple hours with it, so please forgive me if I didn’t use the right name for some of the nodes, setting, menus or features in KNIME.

What is my impression after playing with it for a couple hours?

I still believe that writing code is the faster option for me, but I have to admit that I like it more and more. And its not really fair comparison (years of Python programming vs couple hours experimenting with a new tool).  Its a nice tool for prototyping models without writing a line of code. If you are not familiar with a ML library yet, its a good and fast way to build models. But here is no free lunch either, instead of learning a syntax and the architecture of a library you have to learn to use the UI and find all the settings.

It’s an open-source software and so far I haven’t encountered any limitation (e.g. other tools limit the numbers of rows you can use in a free version) but I’ve just scratched the surface.

In my opinion one big advantage is the visualization of the model. The model is easy to understand and can easily be handed over to some other developers or engineer. Everyone knows that working with other people’s code can be a sometimes a pain and having a visual workflow can eliminate that pain. But I believe the workflows can become messy as well. Its a tool which can be used by analysts and business user who want to explore and analyse their data, generate insights and use  the power of standard machine learning and data mining algorithm without being forced to learn programming first.

The first impression is surprisingly good and I will continue playing with it and I want to figure out how to run my own Python script in a node and maybe even more important how to move a model into production.

I will report about it in a later post.

So long…