Spherical Defense – https://sphericaldefence.com

Autonomous Protection: Comprehensive Cover for APIs
https://sphericaldefence.com/autonomous-protection/
Fri, 10 Jul 2020
by Jack Hopkins

Getting your first car is one of the great joys of freedom. No longer bound by the laws of public transport, you now have absolute control of where, when, and by which way you want to go. Of course, this is until you hear about car insurance and the legal requirement to have it! More often than not, these insurance policies – most especially the fully comprehensive ones – come at great expense, and can seem like money wasted.

However, as owners of any valuable car will know, this is not the case. The unpredictability of any road journey and the potential for an accident make car insurance – especially the fully comprehensive policies – a smart choice rather than an unnecessary cost. In short, car insurance helps drivers manage their risks, protecting their valuable assets and persons against the varied costs that may arise from unforeseen harm. What's more, drive an Audi R8 or an Aston Martin DB9 and the policy makes even more sense than when you're driving, say, a '95 Cavalier.

Clearly, owning a vehicle gives you flexibility and freedom that no other mode of transport can offer, but also requires the protective cover that car insurance provides. In fact, almost every physical valuable asset can be insured against unforeseen damage, whether criminal or not.

Just like cars, APIs carry risk, and the organizations that use them need comprehensive protection too.

In the software landscape, APIs give businesses and developers the added capabilities of automation, flexibility, reusability, seamless integration, efficiency, and the personalization of their web applications and services. APIs “take the shackles off” of the traditional software paradigms of siloed services and built-from-scratch software to allow for seamless and straightforward integration between different web services.

The downside – as there always is one – is that API security has yet to catch up with the explosion of vulnerabilities that has accompanied this growth, despite great investment in the field. Recent security reports provide damning assessments of the state of API security. One in particular [1] points out that:

1) The number of serious vulnerabilities continues to increase at a rate that makes remediation nearly impossible if teams continue to rely on traditional bug remediation methods.

2) Microservices are riddled with vulnerabilities. This is because microservice-based approaches typically involve exposing more of the system’s functionality directly to the network, and, in turn, to would-be attackers. Also, dealing with multiple small and replicable containers that function as one means the potential landscape is significantly expanded with an increased likelihood of one vulnerability in a microservice being replicated again and again. In fact, they average more vulnerabilities per line of code than traditional services do.

3) Nearly 70% of every application is composed of reusable software components, allowing vulnerabilities to be "inherited."

4) 85% of mobile apps violated one or more of the OWASP Mobile Top 10. This list details the top 10 most common threats to mobile application security. An astonishing number of mobile apps returned risk findings for insecure data storage and/or insecure communication, client code quality issues and vulnerabilities, risk exposure to reverse engineering, and/or extraneous functionality that could potentially be exploited.

Figure: OWASP Mobile Top 10 Violation Rates by NowSecure – https://www.nowsecure.com/blog/2018/07/11/a-decade-in-how-safe-are-your-ios-and-android-apps/

More generally, this report talks about the Window of Exposure (the period of exposure of applications to security risks) and the Time to Fix window (how long it takes to fix a vulnerability), and highlights the importance of organizations understanding and managing these critical metrics in order to protect their applications, services, and clients.

Whether you look at past breaches or the everyday threats that typical users face, it is clear that the constant threat from malicious users and cybercriminals means that these unremediated vulnerabilities will inevitably translate into exploitation, then misuse, then more data breaches, and, ultimately, serious losses for businesses and users alike.

Here at Spherical Defense, we've spent years investigating and developing the adaptive, comprehensive cover – powered by unsupervised deep learning techniques – that services, developers, and security specialists have long needed!

In a series of articles, we’ll break down what this “comprehensive cover” entails and explain how AI-powered security technologies can (1) automatically detect new vulnerabilities, (2) provide cover for an application’s Window of Exposure, and (3) shift the Time to Fix window so that software developers can remediate bugs sooner rather than later.

By the end, you should understand the need to rethink the management of APIs in the organization and to mount a comprehensive and adaptive security policy.
[1] NowSecure WhiteHat Report 2018: https://www.nowsecure.com/resource/2018-application-security-statistics-report/nowsecure-whitehat-2018-application-security-report-cover/

API Security Testing
https://sphericaldefence.com/api-security-testing/
Tue, 23 Jun 2020

How to test RESTful APIs

RESTful APIs have become a fundamental part of modern web application development in recent years. The RESTful approach is far simpler and more scalable than the legacy web API variants that preceded it, such as SOAP (Simple Object Access Protocol).

REST is almost always implemented on top of HTTP – the protocol that powers the web. This means that vulnerable REST APIs expose risks similar to those of traditional websites and applications, while being more challenging to test with automated web security scanners.

What is a REST API?

Before we discuss the challenges of effective security testing of REST APIs, we should clarify what we’re talking about. 

An API is a mechanism for transferring information between two computer systems. An API can be implemented either at the code level or at the network level, depending on whether or not the two systems are running on the same machine.

In a commercial context, an API almost always refers to an interface across the web, which is the most common way of connecting disparate computer systems. 

Modern web APIs are usually implemented using REST (REpresentational State Transfer). REST is an architectural style in which all of the information necessary to access or change the 'state' of a web service – such as getting a data record or updating a database – is carried in a single API call.

RESTful APIs offer a clean separation of concerns between the front-end (presentation layer) and the back-end (data-access layer). The RESTful style has become the de facto standard because a single REST API can be consumed simultaneously by mobile devices, web applications and IoT devices without any alterations, making it one of the cheapest and most flexible ways to build modern applications.
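To make the request/response cycle concrete, here is a minimal sketch of a REST endpoint and a client call using only Python's standard library (the `/users/1` resource and its fields are hypothetical; a real API would sit behind a proper framework):

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class UserHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A single call carries everything needed to read the resource's state.
        if self.path == "/users/1":
            body = json.dumps({"id": 1, "name": "Alice"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), UserHandler)  # port 0 = pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/users/1"
with urllib.request.urlopen(url) as resp:
    record = json.load(resp)
server.shutdown()
print(record)  # {'id': 1, 'name': 'Alice'}
```

The same resource could be consumed, unchanged, by a mobile app or an IoT device – which is the point of the RESTful separation described above.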

Principles of RESTful API Security Testing

There are five core principles to performing security tests on RESTful APIs. As is often the case, however, these principles can be difficult to put into practice.

The first two principles are simple, and can be implemented trivially in a web server:

1. Inputs of an incorrect type must be rejected.

a. Corollary: Inputs that are null (empty), when a null is unacceptable, must be rejected.

2. Inputs of an incorrect size must be rejected.

The more difficult principles require an intimate understanding of the range of acceptable values and users, which can be hard to infer without knowing how a REST API will be consumed.

3. For a given input value, the API must provide the expected output.

This is easy to test when the input domain and the output range are simple (e.g. integers or phone numbers), but becomes extremely difficult when building permissive RESTful APIs that let users submit their own content (e.g. in a chat application).

4. Input values outside the expected domain must be rejected.

Once again, this is easy when the domain is simple (e.g. input values should be integers above zero), but becomes complex when users can supply content (e.g. a file upload endpoint can present a significant challenge to secure).

5. For a given user, the API must provide only the data that they are authorized to access.

If permissions are already defined and resources are stratified according to their permission level, this can be easy to implement. In practice, however, authorization is a hard problem – with several multi-billion-dollar companies (like Okta) built around solving it.
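The first, second and fourth principles map directly onto request validation. A minimal sketch (the field names and rules below are hypothetical; principles 3 and 5 require application logic and an authorization layer, not just input checks):

```python
def validate(params, schema):
    """Return a list of validation errors for one request (empty list = accept)."""
    errors = []
    for name, rules in schema.items():
        value = params.get(name)
        if value is None:                         # Principle 1a: reject missing/null
            errors.append(f"{name}: required")
            continue
        if not isinstance(value, rules["type"]):  # Principle 1: reject wrong type
            errors.append(f"{name}: expected {rules['type'].__name__}")
            continue
        if "max_len" in rules and len(str(value)) > rules["max_len"]:
            errors.append(f"{name}: too long")    # Principle 2: reject wrong size
        if "domain" in rules and not rules["domain"](value):
            errors.append(f"{name}: out of domain")  # Principle 4: reject out-of-domain
    return errors

schema = {
    "age": {"type": int, "domain": lambda v: v > 0},
    "phone": {"type": str, "max_len": 15},
}
ok = validate({"age": 42, "phone": "+441234567890"}, schema)
bad = validate({"age": -1}, schema)
print(ok)   # []
print(bad)  # ['age: out of domain', 'phone: required']
```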

Most APIs aren't properly tested to ensure they meet these criteria. Because of this, breaches occur frequently, and entire industries exist to offer a protection layer on top of APIs.

A well-designed API should present the first line of defence against attack, so effective testing should be a top priority.

API Security Tests

There are three main types of testing that make up the security auditing process, each designed to secure an API against external threats.

Security Testing

Security testing validates whether basic security requirements have been met. These include the following questions:

1. What kind of authentication is necessary to consume the API, i.e. how do you evaluate the identity of an end user?

2. What sort of encryption is used on the stored data, and at which points are the data decrypted for transmission?

3. Under what conditions are users allowed to access resources? 

This stage of the audit process comes first, and helps prevent the most serious vulnerabilities.

Penetration Testing

Penetration testing enables you to harden the external surface of your application against vulnerabilities that may have crept in during development.

In this step, external aspects of the API are attacked deliberately in a controlled environment. This can be done using automated tools such as Netsparker or Acunetix.

When organising a Penetration Test, the following steps should be taken:

1. Identify a list of potential vulnerabilities applicable to the application (e.g. does it serve resources like images that could expose a directory traversal attack?)

2. Order the items according to their risk. The OWASP Top 10 website gives a good sense of the risk associated with each type of vulnerability.

3. Engineer requests and sessions that incorporate the attacks, and send them at the system – ideally from within the network as well as from outside.

4. If unauthorised access to the system is achieved, file a vulnerability report and go back and patch the issue.

Fuzz Testing

Fuzz testing is the final aspect of the security auditing process, in which an API is pushed to its limits: vast volumes of requests are sent at it, with the data varied in as many creative ways as possible, to surface vulnerabilities that only emerge at high volume and could compromise security.

Such vulnerabilities could be exploited by denial-of-service or overflow attacks.
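The heart of a fuzzer is an input mutator that varies the data systematically. A minimal sketch (the mutation strategies shown are illustrative, far from exhaustive; real fuzzers combine many more):

```python
import random

def mutations(value, n=5, seed=0):
    """Return n mutated variants of a seed input for fuzzing."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    strategies = [
        lambda s: s + "A" * 10_000,       # oversized input (length/buffer checks)
        lambda s: "",                     # empty input (null handling)
        lambda s: s.replace("e", "%00"),  # embedded encoded null byte
        lambda s: s[::-1],                # reversed (format confusion)
        lambda s: "".join(chr(rng.randint(1, 255)) for _ in range(len(s))),  # random bytes
    ]
    return [rng.choice(strategies)(value) for _ in range(n)]

variants = mutations("user@example.com", n=3)
for v in variants:
    print(repr(v)[:60])
```

Each variant would then be sent to the endpoint under test, with the response codes and timings recorded to spot crashes or slowdowns.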

How to perform a Security Test on an API 

Testing an API means submitting requests, using client software, to an endpoint of the application being evaluated. This is almost always an HTTP client, and there are many free options available.

The most popular clients are Postman and Insomnia. Insomnia is the best choice for smaller APIs, as it is easy to work with and requires little configuration. Postman is better for more complex APIs, as it stores authentication parameters and lets you create collections of requests. Postman can also automate testing through 'monitors', which are useful if the underlying application is constantly changing.

Automating parts of the security audit process can speed up the DevOps lifecycle. The two parts that are easiest to automate are the fuzz test and the security test discussed in the previous section.

Step 1: Determine Security Requirements. In order to plan a security test on an API, you must first understand the general requirements. This means asking questions like: 

a. Should the API use a TLS/SSL certificate, and be accessed over HTTPS? 

b. What permission groups exist for different resources in the application?

c. What is the authentication flow? Is an external OAUTH provider used? 

d. What is the attack surface of the API? Where could a malicious actor subvert the application? 

As part of asking the above questions, it is important to have a clear definition of what constitutes a pass versus a failure of your test.

Step 2: Set up a testing environment. Once the scope of the test has been developed, it is time to prepare an application environment for testing. For smaller applications it’s reasonable to use the standard staging environment. For larger applications with a lot of internal state, it is better to set up a separate environment for the test – either by replicating all resources in the staging environment, or by using a tool such as WireMock to mock them out.

Step 3: Sanity check your API. Send a few requests at the API to ensure that everything has been set up correctly. 

Step 4: Define the input domain. Before developing individual test cases, it is important to understand what each parameter does and the combinations of values each parameter may take. This lets you define edge cases (values that are barely valid) and identify the parameters most vulnerable to injection attacks (such as SQL injection).
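Once the input domain is defined, edge cases fall out mechanically. A sketch of classic boundary-value analysis for a numeric parameter (the 1–100 range below is a hypothetical 'quantity' field):

```python
def boundary_values(lo, hi):
    """Boundary-value analysis: the barely-valid and barely-invalid points
    around an inclusive [lo, hi] integer range."""
    return {
        "valid": [lo, lo + 1, (lo + hi) // 2, hi - 1, hi],
        "invalid": [lo - 1, hi + 1],
    }

cases = boundary_values(1, 100)
print(cases["valid"])    # [1, 2, 50, 99, 100]
print(cases["invalid"])  # [0, 101]
```

The invalid points are the interesting ones for security testing: a well-behaved API should reject both with a 4xx response rather than failing internally.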

Step 5: Develop and execute the test cases. Once you have prepared the test environment and understand the possible edge cases, you can create and execute tests, comparing the actual output with the expected output. As a matter of best practice, you should group these by the type of test being undertaken. Some examples are as follows:

a. Can resources be accessed using HTTP as well as HTTPS?  

b. Do all endpoints require authentication?

c. If you support file upload, what happens if you upload a potentially malicious file with the MIME type that the application expects?

d. If the web app that consumes the API embeds user-supplied information (e.g. a name) on the page, what happens if you supply an HTML/JS element instead?

e. Can you access resources that your token isn’t authorized to access?

If you follow these instructions, you should have a good understanding of the security posture of your application, and a toolkit for ensuring that no significant security issues end up in a production deployment.

Jack Hopkins, CEO

[email protected]
Promising Papers: October II
https://sphericaldefence.com/promising-papers-october-ii/
Mon, 22 Oct 2018

In this blog series, we summarise our favourite non-conventional machine learning and artificial intelligence papers.

Distillation as a Defence to Adversarial Perturbations against Deep Neural Networks

N. Papernot et al. 2016

Papernot et al. introduce defensive distillation as a methodology for making deep neural networks resistant to adversarial attacks. Distillation [see last week's post for a review] is a technique for transferring the predictive correlations and ambiguity contained in the logits of a network's final predictive layer. The defensive adaptation of distillation is argued to work by expanding the radius by which a data point x must be perturbed before the network misclassifies it; the metric used to evaluate this is referred to as robustness. Since adversarial attacks generally depend on small shifts applied to x, models that are more robust in this sense are naturally more resistant to such attacks.

The authors evaluate the ability of defensive distillation to improve resilience to adversarial attacks for models trained on the MNIST and CIFAR10 datasets. For MNIST, the success rate of adversarial attacks falls from 96% to 0.45% at a high distillation temperature of T=100; for CIFAR10, it falls from 88% to 5% at the same temperature. At this temperature, test accuracy drops by only 0.6 percentage points for MNIST and 1.4 for CIFAR10. The authors thus show a correlation between increasing distillation temperature and decreasing adversarial attack success rate, achieved without a significant reduction in accuracy.
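The temperature T rescales the logits inside the softmax, p_i = exp(z_i / T) / Σ_j exp(z_j / T); raising T flattens the output distribution, which is what lets distillation transfer inter-class ambiguity. A minimal sketch with made-up logits:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax: p_i = exp(z_i / T) / sum_j exp(z_j / T)."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [10.0, 5.0, 1.0]
sharp = softmax(logits, T=1)   # near one-hot: almost all mass on the top class
flat = softmax(logits, T=100)  # near uniform: inter-class ambiguity exposed
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
```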

Our opinion: The success of distillation at defending against adversarial attacks is a surprising and useful result. However, it is important to explore the kinds of adversarial attacks that this methodology is limited to defending against.

Generating Sentences from a Continuous Space

Bowman, SR, Vilnis L, et al (2016).

The authors explore the introduction of a variational latent space when training language models built upon RNN-based architectures (RNNLMs). The architecture under consideration is an LSTM cell acting as an encoder, coupled to a VAE-like hidden layer with a diagonal Gaussian prior, and then decoded using another LSTM cell. The authors introduce techniques such as KL cost annealing and word dropout to prevent the encoded representations of sentences from collapsing into the variational prior. A key insight is that global hidden variables, such as those captured in the latent space of autoencoders, need to be managed carefully when dealing with sequential inputs.

The authors test the performance of their LSTM-based VAE on the Penn Treebank dataset in order to measure the benefit of a global latent variable over a basic RNNLM. They show that the global latent variable helps when using beam search to impute missing words into sentences, and that the approach improves performance against an adversarial classifier. They also show that diverse samples can be generated from the imposed latent-space prior if word dropout is tuned properly, and that the posterior of the model is successful at identifying similar sentences in the corpus. Most interestingly, new contextual sentences can be generated by linear interpolation between the encodings of input sentences: for example, 'I want to be with you' is sampled from the interpolation between the encodings of 'I want to talk to you' and 'she didn't want to be with him'.

Our opinion: This paper delivers fundamental results for both the properties of structured latent spaces and the techniques required to train them. These insights more than make up for the lack of performance improvement generated from the presented approach.

Attend Before you Act: Leveraging human visual attention for continual learning

K. Khetarpal, D. Precup (2018).

In this paper, Khetarpal and Precup emulate the effects of visual attention on agents in DeepMind Lab's 3D static navigation maze task. Visual attention allows humans to selectively attend to certain parts of their visual input, gathering relevant information and ignoring distractions, yielding an efficient representation. The authors create foveated visual input by applying real-time saliency maps (using a spectral residual method) over the original image.

The authors make two significant contributions. First, they note that visually attentive agents take slightly longer to train than their unimpaired counterparts. Second, they note that the visually attentive models perform better on transfer learning tasks, where noise is added to an environment.

Our opinion: We question the claim that testing an agent in the same game, but with more noise, is a valid 'transfer learning problem'. While the results do show some promise, we believe better, dynamic saliency maps could have been chosen – in particular, maps that are reactive to the agent's current state.

Combined Reinforcement Learning via Abstract Representations

Francois-Lavet et al. (2018)

In this paper, the authors introduce a new architecture for combining model-free and model-based approaches to reinforcement learning. Dubbed 'Combined Reinforcement Learning via Abstract Representations' (CRAR), this modular architecture exploits a lower-dimensional space shared by an encoder (which learns the model) and a Q-network (which learns the policy). The authors visualise the abstract space by sampling transitions from random policies, which convincingly demonstrates that the agents learn the models.

To train the encoder to produce effective representations, auxiliary networks for the transition, reward and discount models are fed these representations and optimised at training time. These components can be considered the model-based element of the network, and they force the abstract state to represent important low-dimensional features of the space. At test time, only the encoder and the Q-network – the model-free components – operate.

Our opinion: This is a well-thought-out paper, which suggests a multitude of different applications and even includes an ablation study to validate the network. It is most certainly worth a read.

 

This post was written by Akbir Khan and Sean Billings — Research Engineers at the Spherical Defence Labs.

Why WAFs Won't Work
https://sphericaldefence.com/why-wafs-dont-work/
Mon, 15 Oct 2018

Web application firewalls (WAFs) have become a fixture of the computing landscape over the last twenty years or so. Indeed, some people probably can't imagine the world without them, so ubiquitous have they become in filtering, monitoring and blocking traffic. How could any system possibly be secure without a WAF?

However, the reality is that the WAF hasn't established itself as a security method because of unequalled qualities; it has simply been widely mandated. The WAF is, in fact, becoming somewhat antiquated, and the weaknesses of this approach to application security are increasingly obvious.

Zero-Day Exploits

The first reason is that WAFs struggle to deal with zero-day exploits, owing to the idiosyncrasies of WAF technology. To keep a WAF up to date against new zero-day attacks, the rules associated with a particular system must be exhaustively updated. In the meantime, zero-day vulnerabilities can exploit any attack vectors not covered by the WAF's rules [1].

Regular Expression Issues

Another problem for WAFs is that their signatures are often represented using regular expressions. These are fine in theory, but in practice leave WAFs open to malicious code injections expressed in a different language. To put some meat on these bones: regular expressions are only capable of describing Type-3 (regular) languages in Chomsky's hierarchy, whereas software code is a context-sensitive Type-1 language [2].
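This weakness is easy to demonstrate: a signature that matches a canonical injection string misses trivially transformed variants of the same attack. (The regex below is deliberately crude – production WAF rules are more elaborate, but they face the same fundamental limit.)

```python
import re

# A naive SQL-injection signature of the kind a WAF rule might encode.
signature = re.compile(r"union\s+select", re.IGNORECASE)

attacks = [
    "1 UNION SELECT password FROM users",      # canonical form: caught
    "1 UNION/**/SELECT password FROM users",   # SQL comment instead of whitespace: missed
    "1 %55NION %53ELECT password FROM users",  # partial URL-encoding: missed
]
hits = [bool(signature.search(a)) for a in attacks]
print(hits)  # [True, False, False]
```

All three payloads express the same attack, but only the first matches the signature – the attacker simply moves to a syntactic form the regular language cannot cover.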

Replay Attacks

WAFs also struggle to cope with replay attacks. Briefly, a replay attack is one in which a valid data transmission is maliciously or fraudulently repeated or delayed [3]. There are ways of preventing replay attacks and mitigating their effectiveness, but WAFs are not the ideal solution.

The reason is that WAFs are only capable of detecting attacks injected into isolated web requests, which leaves them impotent against attacks that unfold across multiple requests. This makes WAFs distinctly vulnerable to replay attacks.

Maintenance Issues

There is no doubt that maintaining a WAF deployment is labour-intensive. Web application firewalls need to be maintained once they have been set up, and this is a more time-consuming task than it might sound. Web applications are malleable: they are constantly changing. Consider how often updates occur, how often users demand new features, and how the general landscape of computing is constantly evolving. And then developers often add new features just to satisfy their own curiosity, creativity and perfectionism!

When you combine all of these factors, it’s quite evident that web applications can change on a virtually daily basis. And when you have deployed a WAF as part of the overall security framework in your organisation, this means that features of it cannot be designed and implemented in isolation. Everything has to be considered in tandem with the WAF. Rest assured, this can be a major headache.

Blocking Valid Traffic

Another unwanted symptom of WAF installation can be the inadvertent blocking of legitimate traffic. This is often referred to as ‘false positives’.[4] While this may sound relatively innocuous, it can be disastrous for any organisation. Visitors to your website can be blocked from uploading media, benefiting from the functionality of applications, or even from purchasing products and services. Needless to say, this can be extremely bad for business!

And the only way to really combat this with a WAF setup is to run the bare minimum number of rules possible. But this could then make the network more vulnerable to other types of attack. It’s a difficult balancing act, for which there is ultimately no ideal solution.

Tailoring Rules Appropriately

It can therefore be tricky to tailor rules appropriately in any WAF system [5]. Overly prescriptive rules result in a lot of false positives, but lax rule setting can leave a network open to attacks and abuse. Rules that seek out SQL keywords can end up being triggered by completely benign requests, causing all sorts of mayhem.

DDoS Difficulties

And, finally, DDoS attacks can also pose problems for WAF setups. This is particularly worrying, as there have been numerous high profile examples of successful DDoS attacks, and attackers do not require a huge amount of technical knowledge to carry them out. So we can expect to see DDoS attacks become more prevalent in the coming years.

WAFs Don’t Work

In conclusion, WAFs are struggling to cope with the highly complex contemporary computing environment. They increasingly look sluggish, inflexible, and ultimately unfit for purpose. It is only by acknowledging this, and seeking more sophisticated solutions, that we will create safer and slicker modern networks.

References

  1. Bobcares (2015). How we blocked zero-day malware attacks on websites using NAXSI firewall. [online]. Available at: https://bobcares.com/blog/how-we-blocked-zero-day-malware-attacks-on-websites-using-naxsi-firewall/

  2. Hopcroft, John E.; Motwani, Rajeev; Ullman, Jeffrey D. (2000). Introduction to Automata Theory, Languages, and Computation (2nd ed.). Addison-Wesley.

  3. PentaSecurity (2017). What Are Session Replay Attacks?. [online]. Available at: https://www.pentasecurity.com/blog/session-replay-attacks/

  4. Wickett, J. (2017). It’s Time to Break Up with Your WAF. DevOps.com. [online]. Available at: https://devops.com/time-break-waf/

  5. SECConsult (2017). Are Web Application Firewalls Useful? A Pentester’s View. [online]. Available at: https://www.sec-consult.com/en/blog/2012/10/are-web-application-firewalls-useful-a-pentesters-view/

Representation Learning: Word2Vec
https://sphericaldefence.com/representation-learning-word2vec/
Mon, 15 Oct 2018

In this blog series, we introduce important concepts within machine learning and AI.

As powerful as computers have become, they are still relatively "stupid". In fact, computing power normally only tells you how quickly they can perform simple calculations like adding and subtracting. What computers are really bad at are more general, abstract questions – things like "can you notice pedestrians in a photo?" or "can you see how the words king and queen are related?"

What computers really get are vectors – they love them. By treating a vector as an array (list) of numbers, computers can still perform their super-simple, quick calculations. Vectors also already have some hardcore fans, who became obsessed with them through a niche subject called "geometry". These fans, from as far afield as India and Greece, devised many ways to tell how similar two vectors are (mathematicians call these metrics).

We'll skip the work they did, because you probably learnt it in your maths class. It's the intuitive stuff, such as "the distance between two points" and "the angle between two vectors" – different ways to measure how similar two vectors are. Depending on the choice of the two vectors, these two measurements can differ greatly and, in the process, highlight different relationships between whatever the vectors represent.

So, the approach people settled on is using vectors to represent abstract concepts, such as "pedestrians" in photos or words such as "queen". That way we can still compute things quickly, but the geometric properties between vectors can encode information. If the mitochondrion is the powerhouse of the cell, then the vector is the powerhouse of representation learning.

Below we discuss one of our favourite examples of representation learning — Word2Vec.

Within natural language processing, the focus has been on producing robust representations of words. Telling a computer "Tom sat on the chair" and then asking "Where is Tom?" is relatively difficult, as it requires an understanding of Tom as an object/noun and of the relationship that the verb "sat" implies.

Computers need some way to represent a word. In particular, neural networks (which do most of the work) need continuous representations. Think of continuous as meaning "not in chunks": if something is continuous, there are no gaps between any two points (e.g. in the flow of water). Words, on the other hand, are discrete – they literally are separate tokens, and between things like "cat" and "dog" there is no intermediate concept.

Linguists have a theory dubbed the "distributional hypothesis" (Firth, 1957), which suggests that words are defined by the company they keep. If I remove a _____ from a sentence, you can guess what it is (in this case, "word"). A group of computer scientists, led by Tomas Mikolov, decided to train neural networks to play this game, while allowing the network to modify the vector representations of the words in order to succeed.

So, the neural network is fed a sentence from a text such as "The cat sat on the" and told the answer should be "mat". The network first changes each word into a one-hot vector. This type of vector is a form of index: it is filled with N zeros, where N is the number of words in the entire vocabulary (of the text), and the zero corresponding to the desired word is flipped to a one. So, if my entire vocabulary were 3 words long and the sentence were "The black cat", the one-hot vectors would be:

The = [1,0,0], Black = [0,1,0], Cat = [0,0,1]
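The mapping above can be sketched in a couple of lines (the vocabulary and function name are ours, for illustration):

```python
# A minimal sketch of one-hot encoding for a 3-word vocabulary.
vocab = ["the", "black", "cat"]

def one_hot(word, vocab):
    """Return a vector of len(vocab) zeros with a 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

print(one_hot("black", vocab))  # prints: [0, 1, 0]
```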

The biggest drawback is that for a large text these vectors are huge, and we’d like our representations to be more informationally dense, so we can use fewer numbers to express more information about the entity.

So, the first layer of the neural network takes the large one-hot vectors of a phrase and adds them together, creating an N-hot vector (one which has multiple ones in it). This is then fed into the first section of the neural network, which outputs a smaller vector representation (the hidden state). We call this the word embedding (and it’s the important part).

[Architecture diagram provided by the TensorFlow team]

The new single vector representation (the hidden state) is then taken by the rest of the network to predict what the next word should be. The network tells us its belief by outputting a vector which contains, in each dimension, the probability that the next word is the corresponding vocabulary word (we call this a softmax classifier):

[0.1,0.3,0.6]

This means the network believes the next word is 10% likely to be “The”, 30% likely to be “Black”, and 60% likely to be “Cat”.
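The softmax itself is simple to write down; a sketch in plain Python, with made-up logit values:

```python
import math

def softmax(logits):
    """Convert raw network outputs (logits) into probabilities summing to 1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Invented logits for a 3-word vocabulary ("The", "Black", "Cat").
probs = softmax([0.5, 1.6, 2.3])
# probs[i] is the network's belief that the next word is vocabulary entry i;
# the largest logit gets the largest probability.
```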

We train the model, allowing it to change the word embedding layer to try to improve its score. In the example above, we take the hidden-state representation to be the vector corresponding to “mat”. This is useful because the vector representation of “mat” then depends purely on the words around it, and the model must also train to fit all the other phrases which predict “mat”.

When we look at these vectors (we can represent them as scatter plots in 2 and 3 dimensions), we see rich relationships between them. We find that certain directions between vectors represent relationships between words. For example, gendered noun pairs are separated by roughly the same offset (king/queen, man/woman, mr/mrs), and verb tenses lie in the same direction (walking/walked, swimming/swam).

A list of relations presented by Mikolov et al. in “Efficient Estimation of Word Representations in Vector Space” (https://arxiv.org/pdf/1301.3781.pdf)

Finally, as vectors are well studied, we can do interesting things with them like addition. This means that those relationships can be quantified as follows:

Rome + France - Paris = Italy

king - man + woman = queen
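With a toy, hand-built embedding table (the numbers below are invented purely to illustrate the arithmetic; real word2vec vectors are learned and have hundreds of dimensions), the king/queen analogy can be computed directly:

```python
# Invented 2-D "embeddings", chosen so that the gender and royalty
# directions are consistent across word pairs.
emb = {
    "king":  [0.9, 0.8],
    "man":   [0.9, 0.1],
    "woman": [0.1, 0.1],
    "queen": [0.1, 0.8],
}

def add(a, b): return [x + y for x, y in zip(a, b)]
def sub(a, b): return [x - y for x, y in zip(a, b)]

def nearest(v, emb):
    """Return the word whose vector is closest to v (squared Euclidean)."""
    dist = lambda u: sum((x - y) ** 2 for x, y in zip(u, v))
    return min(emb, key=lambda w: dist(emb[w]))

# king - man + woman ...
result = add(sub(emb["king"], emb["man"]), emb["woman"])
print(nearest(result, emb))  # prints: queen
```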

These dense representations encode a lot of information, so we try to use these word representations for different tasks. We keep a dictionary of the word embeddings, recording which vector relates to which word. Then, for more difficult tasks (like translation), a neural network can exploit these geometric relationships to complete its task more quickly. For example, a network no longer needs to learn explicitly how to handle “queen”: it just needs to learn “king”, “man” and “woman”, so that’s one less input to stress about!

However, word2vec isn’t without limitations. The largest problem comes from how we build the phrase representation in the first place, which is done by addition. As addition is commutative, word order doesn’t matter: the phrases “Alice sat on the horse” and “the horse sat on Alice” have the same representation in the hidden state! Word ordering is a key component of syntax and language understanding. Secondly, it’s not clear in a text where a phrase stops: should we consider only “The cat sat on the table”, or the longer phrase “The cat sat on the table whilst the dog growled”?

Stay tuned for the next blog, where we discuss these problems.

This post was written by Akbir Khan, a Research Engineer at Spherical Defence Labs. For more information on Spherical Defence Labs, check out our website.

Hacking a Bank: 101 https://sphericaldefence.com/hacking-a-bank-101/ Sat, 13 Oct 2018 21:52:06 +0000

Advanced Banking Logic Vulnerabilities

Banks form the core of the economy. Today, cyber attacks take place daily, and banks worldwide are affected.

In the past year, we have examined the mobile application security of various banks. The security flaws described here stem from poor programming practices.

The flaws are as follows:

  1. Replay Attack
  2. Bypassing payment to a malicious beneficiary
  3. Bypass Challenge Response in 2FA

In order to describe the attacks, consider a fictional bank, “SPDL Bank”. The customers of the bank are Alice and Bob, and the attacker is Eve. We want to highlight flaws in application logic, and we use Charles Proxy to sniff the SSL traffic between the mobile banking application and the bank’s server.

A mobile banking application should allow users to perform a subset of the operations they can perform at the bank. Thus, we lay down our assumptions about how the mobile banking application should function. When making a payment, a payment request should be valid only once. Similarly, transfers should be possible only to approved and trusted beneficiaries. As for the challenge response: banks, as an added layer of security, may ask for certain digits of a password (say, the 2nd, 3rd and 7th digits), or a similar form of secondary authentication. Only upon responding with exactly what was asked for is the transaction processed.

1. Replay Attack

Suppose Alice is transferring money to Bob through the mobile banking application. The payment request should be valid only once; any attempt to submit the same request again should be treated as invalid. In a practical scenario, suppose Alice legitimately transfers $10 to Bob. Bob can pair up with the attacker Eve and have her replay the request nine more times. Thus $90 is siphoned off from Alice’s account without her authorisation.

The defence against replay attacks is a nonce: a unique, single-use value shared between the client and the server, often derived as a function of time.
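A minimal server-side sketch of nonce checking, assuming the client attaches a fresh nonce to every payment request (all names and the window length are ours, for illustration):

```python
import time

SEEN = {}      # nonce -> timestamp when first seen
WINDOW = 300   # seconds a request stays valid

def accept_request(nonce, now=None):
    """Accept a request only if its nonce has not been seen within WINDOW."""
    now = time.time() if now is None else now
    # Expire old nonces so the store doesn't grow without bound.
    for n in [n for n, t in SEEN.items() if now - t > WINDOW]:
        del SEEN[n]
    if nonce in SEEN:
        return False   # replay: same request submitted again
    SEEN[nonce] = now
    return True

assert accept_request("abc123", now=0.0) is True    # first use accepted
assert accept_request("abc123", now=1.0) is False   # replay rejected
```

Note that replaying Alice’s request verbatim now fails, because the nonce inside it has already been consumed.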

2. Bypass Payment Attack

As part of everyday business, Alice transfers $100 to Bob. This is a valid transaction, since Bob exists in the list of approved beneficiaries.

The steps in completing a transaction are as follows:

1. Alice -> Server: Transfer $100 to Bob

2. Server -> Alice: OK; give me authentication numbers 1, 5, 8

3. Alice -> Server: Transfer $100 to Bob; Authentication 1:22 5:45 8:12

Transfer successful

The authentication characters can be considered key-value pairs: there are 16 keys (1…16), and an authentication digit exists for each of them.

The bypass payment attack happens in step 3. Eve, the adversary, can tamper with the request as follows:

3. Alice -> Server: Transfer $10,000 to Eve; Authentication 1:22 5:45 8:12

The server accepts it, and the transfer is successful. The problems are:

  1. No check in step 3 that the recipient is an approved beneficiary
  2. No state maintained between steps 1 and 3

Thus money can be diverted to malicious entities.
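Both fixes can be sketched together: check the beneficiary list, and carry state from step 1 to step 3 by having the server bind its challenge to the transaction details with a MAC (the key, field names, and helper names here are ours, for illustration):

```python
import hmac, hashlib

SERVER_KEY = b"server-secret"                # hypothetical server-side key
BENEFICIARIES = {"alice": {"bob"}}           # approved beneficiary lists

def issue_challenge(sender, recipient, amount):
    """Step 2: return a MAC binding the challenge to the transaction."""
    payload = f"{sender}:{recipient}:{amount}".encode()
    return hmac.new(SERVER_KEY, payload, hashlib.sha256).hexdigest()

def accept_transfer(sender, recipient, amount, tag):
    """Step 3: reject unknown beneficiaries and tampered transactions."""
    if recipient not in BENEFICIARIES.get(sender, set()):
        return False   # fix 1: recipient must be an approved beneficiary
    payload = f"{sender}:{recipient}:{amount}".encode()
    expected = hmac.new(SERVER_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, expected)  # fix 2: state carried in the MAC

tag = issue_challenge("alice", "bob", 100)
assert accept_transfer("alice", "bob", 100, tag)        # legitimate transfer
assert not accept_transfer("alice", "eve", 10000, tag)  # tampered step 3 fails
```

Because the MAC covers the recipient and amount, Eve can no longer swap them out between steps 1 and 3.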

3. Two-Factor Authentication Bypass

As described in the transaction steps, authentication values need to be provided. The server asks for 3 of the 16 values at random, as a second factor of authentication.

One attack is to change the challenge-response questions themselves.

In step 2, the bank asks for the 2FA values:

2. Server -> Alice: OK; give me authentication numbers 1, 5, 8

Eve can tamper with this response, substituting the positions of the 3 valid key-value pairs she knows. Thus, irrespective of what the server originally asked for, Eve can provide the key-value pairs she knows, and the transaction still goes through. She effectively bypasses the security mechanism, since she can spoof the challenge for every transaction:

Alice -> Server: Transfer $100 to Bob; Authentication 1:22 2:99 3:10


This attack is an advanced one and requires Eve to possess the session key. However, once she has it (by sniffing a live transaction), she can combine vulnerabilities 2 and 3 to create malicious transactions.

These flaws are related to application logic and may not fall under a bank’s threat model, as banks assume the application to be in the trusted computing base. However, this assumption may not hold true, given how easy it is to poison the phone’s certificate store through an application with misleading permissions.

Public key pinning would solve the sniffing problem; however, an adversary may still sniff traffic on the first install and run of the banking application. In addition, these logic vulnerabilities would exist even in the web banking application.

At Spherical Defence, we are developing technology for banks to detect intrusion attempts in real time, using deep learning to learn the grammar and semantics of trusted communication.

This post was written by Dishant Shah, CEO of Spherical Defence.

Promising Papers: October I https://sphericaldefence.com/promising_papers_i/ Sat, 13 Oct 2018 21:28:32 +0000

In this blog series, we summarise our favourite non-conventional machine learning and artificial intelligence papers.

Distilling the Knowledge in a Neural Network

Hinton, Vinyals, & Dean (2015).

In 2015, Google Brain presented the machine learning technique of distillation, a novel approach to reducing model size and complexity. The general motivation was to distill the information contained in a cumbersome model into a lighter one. We could then reduce an ensemble of models, or a very deep neural network, into a lightweight model that could be used for evaluation tasks. The core insight is that the softmax logits of the network contain additional information (such as ambiguity and uncertainty) about the predicted labels. Thus, the predictive distribution (over labels) that complicated models learn contains valuable information that can be exploited by other models.
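The key mechanism is a temperature-scaled softmax: raising the temperature softens the teacher’s distribution so the student can see its relative uncertainty. A sketch in plain Python (the logit values are invented for illustration):

```python
import math

def softmax(logits, T=1.0):
    """Softmax with temperature T; higher T gives a softer distribution."""
    exps = [math.exp(x / T) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Invented teacher logits for a 3-class problem.
teacher_logits = [4.0, 1.0, 0.2]
hard = softmax(teacher_logits, T=1.0)   # near one-hot: mostly class 0
soft = softmax(teacher_logits, T=4.0)   # softened: class ranking preserved,
                                        # but non-argmax classes now visible
```

The student is then trained against `soft` rather than the raw labels, which is what exposes the teacher’s inter-class similarity structure.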

Hinton, Vinyals, and Dean evaluate the technique of distillation in the contexts of image classification, speech recognition, and ensemble learning. The authors also examined distillation as a regulariser. Applied to MNIST, training a smaller network on the softmax predictions of a larger network (rather than on the direct labels) was shown to approximately halve the number of misclassified images. In the context of speech recognition, the averaged predictions across an ensemble of models were used to train a single model. The performance of the distilled model was better than that of any individual model in the ensemble, and only slightly worse than the ensemble’s performance.

Our opinion: We often throw away the ambiguity captured in a model’s predictive distribution. Viewing this distribution as a source of information is a key insight.

The Mechanics of n-Player Differentiable Games

Balduzzi et al (2018).

Published in June 2018, this DeepMind-based team looks at a series of toy problems within game theory through the lens of optimisation. Their first key contribution is to note that the dynamics of a game (how players update their strategies) can be seen as travelling on the level set of a corresponding Hamiltonian (and are thus the equivalent of a conservation law in physics). Even more interestingly, they notice that finding the minimum of this Hamiltonian is equivalent to finding a Nash equilibrium in the loss space.

Motivated by these observations, the authors use the Helmholtz decomposition to resolve games into two types: potential games and Hamiltonian games. The first is well explored, whilst the second relates to the aforementioned contribution. With these new observations, the authors propose a new form of optimisation called symplectic gradient descent, which is able to find stable fixed points in general games via the aforementioned Hamiltonian.

Our opinion: Unlike most related work in this field, the authors produce an optimisation technique which is applicable to general games, not merely two-player games such as GANs.

Meta-Learning for Semi-Supervised Few-Shot Classification

Ren et al. (2018)

Few-shot classification looks to move machine learning away from its dependency on large datasets, instead asking models to correctly learn concepts from a handful of examples, the way humans do. In this paper, the authors focus on refining prototypical networks. These networks are presented with a small set of labelled training images and produce an embedding for each datapoint. These embedded datapoints are then clustered, and prediction is equivalent to finding the nearest cluster in this embedding space. A model initialisation, a sampling of training images, and the associated training together constitute a single episode. End-to-end training is performed over multiple episodes (with each new episode randomised); thus the model learns how to few-shot learn efficiently across all episodes, a meta-learning strategy.
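The nearest-cluster prediction step can be sketched as follows (the embeddings and labels below are invented; in the real model they come from a learned embedding network):

```python
# Invented 2-D "support" embeddings for two classes.
support = {
    "cat": [[1.0, 1.0], [1.2, 0.8]],
    "dog": [[-1.0, -1.0], [-0.8, -1.2]],
}

def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

# Each class prototype is the mean of its support embeddings.
prototypes = {label: mean(vs) for label, vs in support.items()}

def classify(query):
    """Assign the query to the class of the nearest prototype."""
    dist = lambda p: sum((a - b) ** 2 for a, b in zip(p, query))
    return min(prototypes, key=lambda label: dist(prototypes[label]))

print(classify([0.9, 1.1]))  # prints: cat
```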

The authors augment the training set with unlabelled images. These images may relate to the classes of the labelled images, or may be distractors: completely new images. This approach is comparable to semi-supervised learning. The authors deal with the unlabelled data through three techniques: (1) predicting the images’ classes and incorporating them into the current clusters via soft k-means, (2) clustering all distractors into a common class, and (3) introducing a single multi-layer perceptron as a mask over which to apply soft k-means.

Our opinion: This helps extend models to more realistic situations, where not all data may be labelled. However, when evaluating the performance, the improvements are only marginal.

Adversarially Augmented Adversarial Training

Erraqabi et al. (2018)

The authors describe a methodology to improve classifier robustness in the face of adversarial attacks designed to deliberately induce misclassifications. The authors suggest training an auxiliary deep learning network to recognise adversarial noise. Rather than assessing the original input, for instance a facial image, the discriminator learns to recognise and filter out adversarial noise in one of the hidden layers of the classifier.

In this article, the authors describe a simple experiment in which a classifier and a discriminator are co-trained, the input of the discriminator being a hidden layer of the classifier. The two networks are then supplied with real and perturbed MNIST data, whereupon the discriminator learns which noise to filter out. They show that the robustness of the classifier improves when coupled with an adversarial filter, resulting in fewer misclassifications.

Our opinion: This is a promising first step in solving the problem of high variance in inputs and potential adversarial attacks in online security, facial recognition and computer vision in general. In their proof of concept, the addition of an adversarial discriminator increased the accuracy from 25% to 96%.

This post was written by Akbir Khan, Sean Billings and Sean Hooper, Research Engineers at Spherical Defence Labs.
