Andrew Mulholland's blog – https://bash.sh – notes on web performance, scalability and reliability

Sum of Nothing
https://bash.sh/sum-of-nothing/ – Tue, 18 Aug 2015 17:09:41 +0000

It’s been 10 months since I last updated my blog, and it’s not been for lack of want – for much of that time I was working for a startup in “stealth mode”, meaning there wasn’t a lot I could talk about.

That startup was Sum (formerly Project Florida), which was a very ambitious startup focused on creating a meaningful purpose from wearable data – attempting to transform the way we think of our health from the reactive to the preventive.

Our hand was forced into creating our own wearable hardware, as the devices already on the market didn’t have (and still don’t have) the accurate, rich data that we required.

It’s not called easy-ware for a reason! It is incredibly difficult to make something beautiful, functional and revolutionary on a short timeframe. We definitely got the design right – it was simply stunning.
We had some big challenges along the way – getting the interface to work in a natural way frustrated us for a while, but the product and hardware engineers iterated on it until it was intuitive.
Similarly, the accuracy of data capture from the device posed some hurdles, leading the entire company to rally round and take the accuracy to market-leading levels in just a few weeks – a huge confidence boost that reaffirmed the capability of our amazing team.

Unfortunately, just as it seemed we were on the home stretch, planning the run-up to launch, the backers opted to pull out, meaning Sum had to close its doors.
The suddenness of the closing was a total shock – one week I had my six month review, the next I was job hunting.

Being in the US on a work visa (O-1A), this meant a scramble to find a new job and have the visa transfer applied for before my employment status at Sum expired. This led to a frankly ridiculous interview schedule: 12 phone interviews in one week, then a further 8 face-to-face interviews the following week (including a 36-hour trip to San Francisco). I talked to many amazing companies and was fortunate to receive a number of really exciting offers.

I duly accepted one! News on that soon – I can’t wait to start. I’m currently awaiting USCIS approval for my visa transfer, and will spend the meantime trying to reinvigorate my blog – expect some forthcoming posts related to my recent work on system security.

Shell Unshock?
https://bash.sh/shell-unshock/ – Thu, 02 Oct 2014 16:57:07 +0000

A few friends have suggested that, considering the domain on which I host my blog, I ought to share my thoughts on “shellshock” – the catchy name given to the Bash shell vulnerabilities detailed in CVE-2014-6271, CVE-2014-6277, CVE-2014-7169 and friends.

First of all, I should state that as with any vulnerability – particularly one with a potential remote vector – you should ensure your systems are patched, and keep an eye out for further patches coming down the line: the first few were rushed out of the door and proved only partial fixes.
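If you want to check whether a given machine’s bash has been patched, the classic probe is to smuggle a command after a function definition inside an environment variable. A minimal sketch (it assumes `bash` is on the PATH; Ruby is used here just to keep the environment handling explicit):

```ruby
# A patched bash prints only "ok"; an unpatched one would first print
# "vulnerable", having executed the command trailing the function body
# while importing the environment.
payload = { "x" => "() { :;}; echo vulnerable" }
out = IO.popen(payload, ["bash", "-c", "echo ok"], &:read)
puts out
```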

However, to read websites like BBC News, with headlines like “‘Deadly serious’ new vulnerability found” suggesting that half a billion computers could be affected, you would think the world was about to cave in.

In short it’s not. It’s a serious vulnerability, but to my mind at least it’s not “a bigger deal than Heartbleed”, as some researchers are saying.

Here is why:

A quick recap: the Heartbleed bug essentially rendered useless the encryption between end users and any website or VPN using OpenSSL-derived encryption. This included Google, Amazon, Apple, Twitter, Facebook, Yahoo and tens of thousands of others – the websites which store personal data, emails, credit card numbers, payment history and other information for hundreds of millions of users. Users on WiFi hotspots around the world were at risk of compromise, as was anyone subject to GCHQ, NSA or other wiretaps that may exist.

The bash “shellshock” vulnerability has 4 main attack vectors:

  1. Internet server based remote attack
  2. Rogue DHCP server based attack
  3. SSH “Forced Command” environment escalation
  4. Local privilege escalation

Taking each in turn, I will explain them, and what the impact would be.

Internet server based remote attack: this could be a web server using a “Common Gateway Interface” (CGI) to run scripts on a website, or a mail server running a spam checker. Additionally, the default shell on the server would need to be bash – which it is not on Solaris or the BSDs, nor on Debian- or Ubuntu-based Linux systems. Websites using CGI via libraries like FastCGI, or executing scripts via mod_php/mod_wsgi, are not vulnerable, as environment variables are not passed through to the end scripts. Typical websites that fit into this category are web forums and simple sites with an email submission form.
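To make this vector concrete, here is a simulation of what a vulnerable CGI setup does – the header value and “script” are my own invention, and it assumes a patched bash, so nothing untoward actually happens. The point is that the web server copies request headers such as User-Agent verbatim into environment variables before exec’ing the script:

```ruby
# Attacker-controlled header text becomes an environment variable of the
# spawned bash process -- exactly the path shellshock abused. On an
# unpatched bash, the trailing `echo owned` would run before the "script".
evil_header = "() { :;}; echo owned"
env = { "HTTP_USER_AGENT" => evil_header }
response = IO.popen(env, ["bash", "-c", "echo serving page"], &:read)
puts response
```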

For a web server that is vulnerable, it is trivially easy to exploit, and there are hundreds of thousands if not millions of vulnerable websites. However, the vast majority of these will be low-traffic sites. For example, according to Wikipedia only a handful of web forums have more than a million users, of which few if any will have been vulnerable to “shellshock”. So the impact of any single website being exploited would be low to moderate.

Rogue DHCP server based attack: the DHCP client commonly used for automatic IP configuration on Linux systems has proven vulnerable to shellshock. Because it shells out to configure the network, a rogue DHCP server can be used to gain remote root access to a client. An example of where this could happen is an open WiFi hotspot – however, as only Linux systems with bash as their default shell are vulnerable, only a small fraction of users would be at risk; OS X, Windows and Debian/Ubuntu users would all be safe in this case.

It could also happen in a data centre, if shared networks were not properly segregated between customers; however, the impact of such an exploit would be limited to that network, and the source could be traced relatively easily, so the significance of this vector is also low.

SSH “Forced Command” environment escalation and local privilege escalation: whilst there are differences, the net result of both of these vectors is an authorised user obtaining privileges beyond those they have been granted. These are serious, as they could result in data breaches, but since they require a user to have access in the first place, they are not of the same severity as a vector which can be exploited remotely.

In summary: whilst the “shellshock” vulnerability provides a trivial method to exploit a vulnerable web server, the overall impact is not high, as few websites still using CGI scripts in a vulnerable manner will have significant traffic. The other attack vectors are interesting, but do not place large parts of the internet at significant risk.

The more serious concern, however, is the hundreds of other exploits which never hit the media spotlight, as these are less likely to be patched so quickly. Examples include recent exploits in Tomcat and Apache httpd which, like “shellshock”, allow arbitrary code to be executed on remote systems, and which are more likely to be in use at large websites.

I think a balance needs to be found between raising awareness of serious issues and overhyping an issue. Some of the claims made about the “shellshock” exploit definitely fall into the sensationalist, overhyped camp for me.

Some simple steps to scale a Ruby app
https://bash.sh/some-simple-steps-to-scale-a-ruby-app/ – Fri, 12 Sep 2014 04:34:16 +0000

At Hacker School, Fridays are a little different to the rest of the week. Firstly, they’re optional, and secondly, rather than working on your usual projects, it’s typical to do interview preparation.

These consist of mock interviews, fun with recursion, and coding challenges. One such challenge, which has become quite popular recently, is to “create a URL shortener in under 2 hours”.

When faced with this challenge, I opted for Ruby with the Sinatra web framework backed by MySQL as I am pretty familiar with them, and was confident that I could complete the challenge within the time constraints.

Indeed, 1 hour 54 minutes into the challenge (including a break for lunch!) I had a functioning system, and spent the remaining few minutes tidying up my code. At the demo/review session it worked (phew!). What would normally have happened is we’d each return to our own projects and never look at that code again.

However, a conversation at the end of the review sparked my interest: how useful would our freshly written services be in the real world – would they scale? So I decided the following Friday to embark upon some performance testing and optimisation of the service I had built.

My goal was to focus on the low-hanging fruit available to make a web service perform significantly better, and I was able to take it from under 15 requests/second initially to over 1,000 requests/second on the same hardware!

I decided to focus on the URL lookup phase – taking the short code as an input and responding with a 302 – because for a URL shortening service this will make up the vast majority of requests. I preloaded the service with 10,000 short codes.
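For context, the lookup phase is just a map from short code to URL. A common way to generate the codes – an assumption on my part, as my actual scheme isn’t shown here – is to encode the row’s auto-increment ID in base 62:

```ruby
# 0-9, a-z and A-Z give 62 symbols, so a 6-character code covers ~56 billion IDs.
ALPHABET = [*"0".."9", *"a".."z", *"A".."Z"]

def encode(id)
  return ALPHABET[0] if id.zero?
  code = ""
  while id > 0
    code = ALPHABET[id % 62] + code
    id /= 62
  end
  code
end

def decode(code)
  code.chars.reduce(0) { |id, c| id * 62 + ALPHABET.index(c) }
end

puts encode(125)               # => "21"
puts decode(encode(123_456))   # => 123456
```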

For testing I used JMeter, as I was familiar with it and my requirements were pretty simple – I needed to hit the endpoint with data fed from some kind of source and ramp up load to find the point where response times begin to rise and request rate plateaus.

I started out testing on my laptop (a 13″ 2013 MacBook with Retina display), and the maximum request rate I could get out of it was 14.6 requests/second. This translates to around 50,000 requests per hour – not too bad, but really I’d been hoping for something better.

CPU usage seemed quite high, so I used `procsystime` – a DTrace frontend – to look into what was going on during a load test, and spotted that 99% of the program’s CPU time was spent forking:

CPU Times for PID 7834,
         SYSCALL          TIME (ns)
psynch_cvclrprepost               1262
         madvise              25283
  close_nocancel           39823946
            fork        17151822963
          TOTAL:        17335516977

I did some digging on this, and discovered that when Sinatra is run under Shotgun, the app is reloaded upon every request. This is great in a development environment, as it means your latest changes are always live; in an environment requiring performance, it’s not ideal.

Before making any changes, I moved the app to a pair of c3.2xlarge instances running Ubuntu 14.04 in Amazon AWS. The reason for this was twofold: I wanted to separate the load-injection and app-server duties onto individual servers, and I wanted to ensure that background processes on my laptop didn’t adversely affect results.

I re-ran the previous load test on the new host with no other changes and recorded 59 requests/second. This was somewhat reassuring: the c3.2xlarge has four times the cores of my laptop, so the app had scaled almost perfectly in line with the increase in available CPU capacity.

Switching the app to run with plain `ruby urlshortener.rb` and re-running the load test saw a significant performance improvement, with 515 requests/second being served – about 1.8M requests/hour.

This was somewhat satisfactory, but I wanted more. I thought I’d take a look at the datastore I was using.

mysql> SET profiling = 1;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> SELECT url FROM urls WHERE shortcode='3pxred';

+-------------------+
| url               |
+-------------------+
| http://bash.sh?96 |
+-------------------+
1 row in set (0.01 sec)
mysql> SHOW PROFILES;
+----------+------------+-------------------------------------------------+
| Query_ID | Duration   | Query                                           |
+----------+------------+-------------------------------------------------+
|        1 | 0.00870700 | select url from urls where shortcode='3pxred' |
+----------+------------+-------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

So, 8.7ms to return the URL. I did this profiling over a significant number of queries, and it proved fairly typical. ~9ms is quick, but I wondered if it could be improved upon.

Keeping in mind that my goal was to pick off low-hanging fruit rather than spend a lot of time optimising one area, I quickly reworked my service to use Redis as a backend. Redis is a key-value store rather than a general-purpose database like MySQL, and in this case I would simply be storing keys (shortcodes) and values (URLs), so it was an ideal use case.
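The reworked lookup path is tiny. Here is a sketch of its shape, with a plain Hash standing in for the Redis connection so it runs anywhere – with the redis-rb gem the calls would be `redis.set(code, url)` and `redis.get(code)`:

```ruby
# The Hash below stands in for Redis: one key write at shorten time,
# one O(1) key read per lookup -- no SQL parsing or query planning.
store = {}
store["3pxred"] = "http://bash.sh?96"   # done once, when the URL is shortened

def resolve(store, code)
  store[code]   # nil when the code is unknown -> respond 404 instead of 302
end

puts resolve(store, "3pxred")
```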

Re-running the load test and monitoring the Redis instance for latency, I saw that typical latency was now 0.3ms – better than an order of magnitude improvement. More importantly, the load test was now running at over 620 requests/second.

$ redis-cli --latency-history
min: 0, max: 3, avg: 0.30 (1369 samples) -- 15.01 seconds range
min: 0, max: 4, avg: 0.30 (1372 samples) -- 15.00 seconds range
min: 0, max: 5, avg: 0.28 (1376 samples) -- 15.01 seconds range

There was another reason for swapping to Redis: I was curious whether performance would differ running under JRuby as opposed to the native binary. With MySQL I would have needed to port my queries over to the MySQL JDBC driver, rather than the mysql2 gem I had been using, and it was definitely less effort to just use Redis.

In many ways, a simple web app which is just serving requests should be an ideal case for JRuby, as the JVM provides a very performant, well-threaded runtime. Thankfully it didn’t disappoint, with the initial run generating over 2,500 requests/second – a further 400% improvement in performance.

This would probably have been a satisfactory outcome, but one of the wonderful things about the JVM is the rich diagnostic information available at your fingertips.

Since JDK 1.7.0_40, Java Flight Recorder has been bundled with the Oracle HotSpot JVM. It exposes a huge number of performance metrics from within the JVM with minimal (<2%) performance overhead. Flight Recorder is an enhancement of a product formerly offered with BEA JRockit, and its availability for HotSpot is part of Oracle’s long-term strategy of one JVM built around HotSpot.

For production use you need to purchase a commercial JVM licence from Oracle, but as explained here, the features are “freely available for download for development and evaluation purposes as part of JDK 7u40, but require the Oracle Java SE Advanced license for production use under the Oracle Binary Code License Agreement”.

I tested my app with Flight Recorder, and then loaded the data into its sister application, Java Mission Control. It’s an incredibly powerful tool, so if you’ve not used it before I would recommend watching some of the YouTube videos that Oracle have made – here.

Jumping into the method profiler, I can see that a lot of the time is being spent in WEBrick – the default HTTP server that Sinatra runs under.

[Screenshot: Java Mission Control method profiler showing time spent in WEBrick]

From looking at the JRuby servers page I saw that many alternatives exist as drop-in replacements, and opted to try three of them:

  • trinidad – built around Apache Tomcat
  • mizuno – built around Jetty
  • puma – written in Ruby, but designed for high performance

For trinidad and puma, it was just a case of installing the gem and restarting the app. For mizuno I needed to generate a rack config file to spawn it. I ran each with its default configuration and made no attempt to optimise them, because the focus of this exercise was to make big steps with little effort.
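For reference, the rack config file amounts to a couple of lines – the filename and app class here are assumptions on my part, not the actual file:

```ruby
# config.ru (assumed name) -- start with `rackup` or the server's own launcher
require "./urlshortener"
run Sinatra::Application
```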

Each made an improvement: trinidad boosted performance from ~2,500 requests/second to just over 3,000 requests/second, and mizuno took this a bit further, up to 3,400 requests/second.

Puma, though, was a revelation for my use case, taking performance up to just shy of 4,000 requests/second, with mean response times in the order of 3ms.

For completeness I then re-ran the tests on my own laptop, and recorded 1,022 requests/second there.

So the end result – for now – is capacity improvement of over 6500% on both platform sizes, which I’m reasonably happy with. This was combined with a latency improvement from over 300ms to just 3ms per call, which is far more acceptable.

It’s rare to be able to make such big gains in so few steps, but this post serves to highlight the benefit of understanding the strengths and weaknesses of the tools you’re using and the environment you’re in. Tools which work well in a development environment – like Shotgun – can prove detrimental to aspirations of performance; similarly, JRuby works really well for a long-running service, but the Java initialisation time could make it frustrating for a rapidly changing development environment or for short-running scripts.

In a future post I will explore the more intensive way of optimising performance: taking an application which is already running in a fairly performant manner, and looking to eke out additional performance through profiling and identifying the right configuration tweaks and changes to make.

Thoughts on data theft and security
https://bash.sh/thoughts-on-data-theft-and-security/ – Wed, 03 Sep 2014 04:10:41 +0000

It seems not a week goes by without a breach of some sort affecting a large, generally reputable establishment being announced. I have been on the receiving end of just such a breach during my time at Betfair.

I don’t have inside knowledge of the ins and outs of each breach, but I can speak to my own experience. In the case of Betfair, we were audited against the PCI DSS regulations, we had a top team of InfoSec professionals, and we were on top of security patching – contributing useful fixes back to the community.

However, a number of config mishaps coincided, leaving a window of opportunity of just a few days. That was enough for someone skilled to get inside and gain access to data. Following that incident, processes and procedures were updated to prevent a recurrence of that type of breach, and action was taken to further improve security.

However as long as companies are home to interesting data they will be targets.

A typical online retailer holds the following information about a customer:

  • Name
  • One or more addresses
  • Payment details for one or more cards
  • Possibly date of birth, mother’s maiden name or other data used as extra security questions.

The question is: how much of the data that they hold do they actually need?

I spent some time thinking about this toward the end of last year, after attending a TTI/Vanguard regional meeting hosted by Peter Cochrane entitled “The cloud can be inherently more secure than anything that has gone before”. The title was a deliberate provocation, designed to inspire debate amongst the attendees.

The meeting was exciting and enthralling, with lively discussion throughout. An interesting highlight was Peter talking about one of his actions as CTO of British Telecom: hiring a team, based outside BT’s offices, whose sole purpose was to try to break into BT. They wouldn’t provide forewarning, wouldn’t pause during system maintenance, and wouldn’t share details of what they would be trying in advance. All they’d do is provide details of how they got in afterwards. The point being: to protect against the baddies out there, you need to think like them.

The major takeaway for me on the day though was that how we shop online today is utterly insecure by design.

To buy something online today requires entrusting to a third party all the details which are required to take money from us time and again.

We trust that the encrypted connection between them and us is secure, and cannot be breached. The Snowden leaks and Heartbleed, show us that this is not always the case.

We also trust that the party we are interacting with is careful with the data. The countless breaches over the years show that this is not always the case.

We rationalise it because we’ll get the money back if our cards are stolen – and yet we often forget it’s our identity being taken too.

The chart below shows that the number of accounts exposed through data breaches is on a definite upward trend, with the disclosures so far this year indicating that 2013 was unfortunately no anomaly.

[Chart: exposed records from data breaches over the years. Source: LinkedIn]

So what can be done? Do online retailers really need to know everything about me? Do they need to store data which thieves will target? Most of the time I have deliveries from online retailers sent to my office, so why do they need to know my home address? Equally do they really need to know my credit card number?

The answer most will give is: “Of course they need your home address – it’s your billing address, so they need it to prove you are who you say you are – and they need your credit card number to take payment.” However, that is just how the systems are designed today. To take a popular variant of a Grace Hopper quote:

The most damaging phrase in the language is: ‘We’ve always done it that way’ 

Her actual words were slightly different, but the intent is the same as the popularised variant. Humans are allergic to change, and so keep trying to refine an existing system rather than thinking about ways of changing it. After Peter Cochrane’s talk I spent some time thinking about how the system could be changed. The premise is fairly simple, although a lack of time since then hasn’t provided the opportunity to take it beyond the idea phase.

What online merchants require is a secure, definite way to verify the identity of a customer and receive payment. They might like more data for analytical and marketing purposes, but fundamentally, to conduct a transaction, that is all that is required.

There are some partial solutions to this problem, such as PayPal, which abstracts your billing address from the merchant and allows you to control the funds that get sent. However, it’s not ubiquitous, and when faced with an additional step in the checkout phase – and additional fees for using PayPal – many customers don’t bother. They want the convenience of saving their payment details with a vendor and having one-step checkouts.

Attempts at offering Virtual Debit Cards – one-time-use card numbers – have also not reached significant usage volumes. This is primarily down to falling at the ease-of-use hurdle once more, and it doesn’t address the concern of protecting the customer’s identity.

What is required is something which is both easy to use and ubiquitous. What I’ve been thinking about on this front is some form of distributed escrow, developed as a secure payment standard and built into web browsers as an extension to the HTML standard.

The idea is to have a number of global payment-services hosts acting as proxies to banks’ payment gateways. In my vision these would be hosted by a number of relatively trusted parties, such as Wikimedia, ISC, Apple, Microsoft, Google, Amazon, PayPal etc. They would receive a fraction of a penny per transaction, ensuring it would be worthwhile to host without pushing up the cost of transactions for the consumer.

Then, for an online transaction, rather than entering their credit card details into a webpage and submitting them to a single website, the customer would enter the details into a facility of their web browser, with the website providing a transaction ID and a merchant ID.

The browser would then encrypt and shard these details across transmissions to three or more of the global payment-services hosts. These hosts would each forward the shard they received to your bank, which would assemble, verify and process the data and send back an affirmative or negative response for the transaction.
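A toy sketch of the sharding step – emphatically not real cryptography (Base64 is mere encoding, and a production scheme would use proper encryption and secret sharing), but it shows the shape: no single payment-services host ever holds the whole value:

```ruby
require "base64"

card = "4111111111111111"                 # a well-known test card number
encoded = Base64.strict_encode64(card)

# Split into three shards, one per payment-services host, so a breach of
# any one host yields only a fragment.
shard_size = (encoded.length / 3.0).ceil
shards = encoded.chars.each_slice(shard_size).map(&:join)

# Only the bank, receiving every shard, reassembles and decodes:
puts Base64.strict_decode64(shards.join)  # => 4111111111111111
```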

The merchant would also be connected to a global payment-services host, and would receive a message with the transaction ID detailing whether to proceed with the transaction or not. For orders requiring shipping, a delivery address could then be submitted to the merchant.

With careful browser design and integration the above could be just as easy to use as current systems and would truly limit the impact of a security breach at your favourite online retailer.

This way, to conduct widespread theft of identity or payment details, either your bank needs to be hacked, multiple providers need to be concurrently hacked, or a widespread glut of malware needs to take hold. Of these, the most likely – malware – is a threat we already live with today. The others, whilst still possible, are significantly less likely, so the end result should be significantly improved online security for no additional user inconvenience.

It seems like a plausible solution to me. Could it work?

Edit to add: the Apple announcement on Sep 9th with regard to Apple Pay addresses many of the concerns I raised, taking a similar approach in ensuring that the merchant doesn’t need to store your name and payment details. No doubt an Apple solution will tick the box of being easy to use. The two caveats I see are that it’ll only work for Apple customers – a sizeable chunk of the population, for sure – and that it requires trusting Apple implicitly; whilst they will be taking the security of this very seriously, it will be quite a big basket of eggs…

First Post
https://bash.sh/first-post/ – Thu, 28 Aug 2014 18:58:35 +0000

Finally got round to setting this thing up.

The last time I had a go at maintaining a blog was back when I was at Joost – almost 7 years ago. I set something up on Tumblr and made a couple of posts, but never got into it.

This time I’ll try harder – I think I’ve got some interesting stories to share from the lessons I’ve learnt along the way at Expedia and Betfair, and my time at Hacker School this summer has made me realise the value in sharing what you know.

For those who don’t know, I left Expedia at the end of May and opted to take the summer off and go to Hacker School. My main goals for Hacker School are to learn, to write code, and to have some fun. I’m now half way through my time there, and it’s exceeding all expectations thus far.

My initial project was in computer vision – automating the categorisation of images by content – in particular, taking the motorsport photos I have and separating them by car manufacturer or other significant details (e.g. sponsor logos).

A video of the results from the end of my first week at Hacker School is available here:

https://www.youtube.com/watch?v=eek4ARqZr6s

Since then I’ve improved upon that significantly through pairing with Nava and learning more about how OpenCV works behind the scenes. I’ve now begun to move onto my next project: network monitoring. The purpose of this is to help visualise slowdowns within the Hacker School network, so we can quickly see the cause – oh, and to continue my other goals of learning, writing code and having fun!
