<table>
<tr><th>Range expression</th><th>1st Value</th><th>2nd Value</th></tr>
<tr><td>array or slice</td><td>index i</td><td>a[i]</td></tr>
<tr><td>map</td><td>key k</td><td>m[k]</td></tr>
<tr><td>string</td><td>index i of rune</td><td>rune int</td></tr>
<tr><td>channel</td><td>element</td><td>error</td></tr>
</table>
<p>What range does for arrays and maps seems consistent and not particularly surprising. Things get slightly odd with channels. A second variable arguably doesn't make much sense when ranging over a channel, so trying to use one results in a compile-time error. Not terribly consistent, but logical.</p>
<p>Weirder still is <strong>range</strong> over strings. When operating on a string, range returns runes (Unicode code points), not bytes. So, this code:</p>
<pre><code>s := "a\u00fcb"
for a, b := range s {
	fmt.Println(a, b)
}</code></pre>
<p>Prints this:</p>
<pre><code>0 97
1 252
3 98</code></pre>
<p>Notice the jump from 1 to 3 in the index, because the rune at offset 1 is two bytes wide in UTF-8. And look what happens when we now retrieve the value at that offset from the string. This:</p>
<pre><code>fmt.Println(s[1])</code></pre>
<p>Prints this:</p>
<pre><code>195</code></pre>
<p>What gives? At first glance, it's reasonable to expect this to print 252, as returned by <strong>range</strong>. That's wrong, though, because string access by index operates on bytes, so what we're given is the first byte of the UTF-8 encoding of the rune. This is bound to cause subtle bugs. Code that works perfectly on ASCII text, simply because UTF-8 encodes ASCII characters in a single byte, will fail mysteriously as soon as non-ASCII characters appear.</p>
<p>My argument here is that <strong>range</strong> is a very clear example of design directly from concrete use cases down, with little concern for consistency. In fact, the table of <strong>range</strong> return values above is really just a compendium of use cases: at each point the result is simply the one that is most directly useful. So, it makes total sense that ranging over strings returns runes.</p>
<p>In fact, doing anything else would arguably be incorrect. What's characteristic here is that no attempt was made to reconcile this interface with the core of the language. It serves the use case well, but feels jarring.</p>
<h2>Arrays are values, maps are references</h2>
<p>One final example along these lines. A core irregularity at the heart of Go is that arrays are values, while maps are references. So, this code will modify the <strong>s</strong> variable:</p>
<pre><code>package main

import "fmt"

func mod(x map[int]int) {
	x[0] = 2
}

func main() {
	s := map[int]int{}
	mod(s)
	fmt.Println(s)
}</code></pre>
<p>And print:</p>
<pre><code>map[0:2]</code></pre>
<p>While this code won't:</p>
<pre><code>package main

import "fmt"

func mod(x [1]int) {
	x[0] = 2
}

func main() {
	s := [1]int{}
	mod(s)
	fmt.Println(s)
}</code></pre>
<p>And will print:</p>
<pre><code>[0]</code></pre>
<p>This is undoubtedly inconsistent, but it turns out not to be an issue in practice, mostly because slices <em>are</em> references, and are passed around much more frequently than arrays. This issue has surprised enough people to make it into the Go FAQ, where the justification is as follows:</p>
<blockquote>
<p>There's a lot of history on that topic. Early on, maps and channels were syntactically pointers and it was impossible to declare or use a non-pointer instance. Also, we struggled with how arrays should work. Eventually we decided that the strict separation of pointers and values made the language harder to use.
This change added some regrettable complexity to the language but had a large effect on usability: Go became a more productive, comfortable language when it was introduced.</p>
</blockquote>
<p>This is not exactly the clearest explanation for a technical decision I've ever read, so allow me to paraphrase: "Things evolved this way for pragmatic reasons, and consistency was never important enough to force a reconciliation".</p>
<h2>The G Word</h2>
<p>Now we get to that perpetual bugbear of Go critiques: the lack of generics. This, I think, is the deepest example of the Go designers' willingness to sacrifice coherence for pragmatism. One gets the feeling that the Go devs are a tad weary of this argument by now, but the issue is substantive and worth facing squarely. The crux of the matter is this: Go's built-in container types are super special. They can be parameterized with the type of their contained values in a way that user-written data structures can't be.</p>
<p>The supported way to do generic data structures is to use empty interfaces. Let's look at an example of how this works in practice. First, here is a simple use of the built-in slice type.</p>
<pre><code>l := make([]string, 1)
l[0] = "foo"
str := l[0]</code></pre>
<p>In the first line we initialize the slice with the element type <strong>string</strong>. We then insert a value, and in the final line, we retrieve it. At this point, <strong>str</strong> has type <strong>string</strong> and is ready to use. The user-written analogue of this might be a modest data structure with <strong>put</strong> and <strong>get</strong> methods.
We can define this using interfaces like so:</p>
<pre><code>type gtype struct {
	data interface{}
}

// Pointer receivers, so that put modifies the caller's value rather than a copy.
func (t *gtype) put(v interface{}) {
	t.data = v
}

func (t *gtype) get() interface{} {
	return t.data
}</code></pre>
<p>To use this structure, we would say:</p>
<pre><code>v := gtype{}
v.put("foo")
str := v.get().(string)</code></pre>
<p>We can assign a string to a variable with the empty interface type without doing anything special, so <strong>put</strong> is simple. However, we need to use a type assertion on the way out, otherwise the <strong>str</strong> variable will have type <strong>interface{}</strong>, which is probably not what we want.</p>
<p>There are a number of issues here. It's cosmetically bothersome that we have to place the burden of type assertion on the caller of our data structure, making the interface just a little bit less nice to use. But the problems extend beyond syntactic inconvenience - there's a substantive difference between these two ways of doing things. Trying to insert a value of the wrong type into the built-in slice causes a compile-time error, but the type assertion acts at run-time and causes a panic on failure. The empty-interface paradigm sidesteps Go's compile-time type checking, negating any benefit we may have received from it.</p>
<p>The biggest issue for me, though, is the conceptual inconsistency. This is something that's difficult to put into words, so here's a picture:</p>
<p>The fact that the built-in containers magically do useful things that user-written code can't irks me.
It hasn't become less jarring over time, and still feels like a bit of grit in my eye that I can't get rid of. I might be an extreme case, but this is an aesthetic instinct that I think is shared by many programmers, and would have convinced many language designers to approach the problem differently.</p>
<p>The extent to which Go's lack of generics is a critical problem, however, is not the point here. The meat of the matter is <strong>why</strong> this design decision was taken, and what it reveals about the character of Go. Here's how the lack of generics is justified by the Go developers:</p>
<blockquote>
<p>Many proposals for generics-like features have been mooted both publicly and internally, but as yet we haven't found a proposal that is consistent with the rest of the language. We think that one of Go's key strengths is its simplicity, so we are wary of introducing new features that might make the language more difficult to understand.</p>
</blockquote>
<p>Instead of creating the atomic elements needed to support generic data structures and then adding a suite of them to the standard library, the Go team went the other way. There was a concrete use case for good data structures, and so they were added. Attempting a deep reconciliation with the rest of the language was a secondary requirement, so unimportant that it fell by the wayside for Go 1.x.</p>
<h1>A Pragmatic Beauty</h1>
<p>Let's over-simplify for a moment and divide languages into two extreme camps. On the one hand, you have languages that are highly consistent, with most higher-order functionality deriving from the atomic elements of the language. In this camp, we can find languages like Lisp. On the other hand are languages that are shamelessly eager to please. They tend to grow organically, sprouting syntax as needed to solve specific pragmatic problems. As a consequence, they tend to be large, syntactically diverse, not terribly coherent, and, occasionally, even unparseable.
In this camp, we find languages like Perl. It's tempting to think that there exists a language somewhere in the infinite multiverse of possibilities that unites perfect consistency and perfect usability, but if there is, we haven't found it. The reality is that all languages are a compromise, and that balancing these two forces against each other is really what makes language design so hard. Placing too much value on consistency constrains the human concessions we can make for mundane use cases. Making too many concessions results in a language that lacks coherence.</p>
<p>Like many programmers, I instinctively prefer purity and consistency and distrust "magic". In fact, I've never found a language with a strongly pragmatic bent that I really liked. Until now, that is. Because there's one thing I'm pretty clear on: Go is on the Perl end of this language design spectrum. It's designed firmly from concrete use cases down, and shows its willingness to sacrifice consistency for practicality again and again. The effects of this design philosophy permeate the language. This, then, is the source of my initial dissatisfaction with Go: I'm pre-disposed to dislike many of its core design decisions.</p>
<p>Why, then, has the language grown on me over time? Well, I've gradually become convinced that practically-motivated flaws like the ones I list in this post add up to create Go's unexpected nimbleness. There's a weird sort of alchemy going on here, because I think any one of these decisions in isolation makes Go a worse language (even if only slightly). Together, however, they jolt Go out of a local maximum many procedural languages are stuck in, and take it somewhere better. Look again at each of the cases above, and imagine what the cumulative effect on Go would have been if the consistent choice had been made each time. The language would have more syntax, more core concepts to deal with, and be more verbose to write.
Once you reason through the repercussions, you find that the result would have been a worse language overall. It's clear that Go is not the way it is because its designers didn't know better, or didn't care. Go is the result of a conscious pragmatism that is deep and audacious. Starting with this philosophy, but still managing to keep the language small and taut, with almost nothing dispensable or extraneous, took great discipline and insight, and is a remarkable achievement.</p>
<p>So, despite its flaws, Go remains graceful. It just took me a while to appreciate it, because I expected the grace of a ballet dancer, but found the grace of a battered but experienced bar-room brawler.</p>
<p>--</p>
<p>Edited to remove some inaccuracies about channels.</p>
<p><sup>1</sup> Pedant hedge: yes, the illusion isn't perfect, and there are in fact subtle ways in which Python dictionaries are not just objects like any other.</p>
<p><sup>2</sup> I don't mean mundane details like the syntax and core concepts of a language. In the case of Go, you can get a handle on these in an hour by reading the language specification.</p>
<p><sup>3</sup> Simplified from here.</p>
mitmproxy and pathod 0.9.2
2013-08-25T00:00:00+00:00
<p>I've just released v0.9.2 of both mitmproxy and pathod. This is a bugfix release, chiefly to address two crashing issues affecting mitmproxy when relaying SSL traffic. A range of other fixes and improvements are also included - if you use mitmproxy, you should upgrade.</p>
<h2>CHANGELOG</h2>
<ul>
<li>Improvements to the mitmproxywrapper.py helper script for OSX.</li>
<li>Don't take minor version into account when checking for serialized file compatibility.</li>
<li>Fix a bug causing resource exhaustion under some circumstances for SSL connections.</li>
<li>Revamp the way we store interception certificates. We used to store these on disk; they're now in-memory.
This fixes a race condition related to cert handling, and improves compatibility with Windows, where the rules governing permitted file names are weird, resulting in errors for some valid IDNA-encoded names.</li>
<li>Display transfer rates for responses in the flow list.</li>
<li>Many other small bugfixes and improvements.</li>
</ul>
Introducing choir.io
2013-08-16T00:00:00+00:00
<p>Today, I'm raising the veil (slightly) on a new project - choir.io. The most succinct description of choir.io is that it is a service that turns events into sound. Why would you want to do that? Well, I believe that there are compelling reasons to make sound part of your monitoring stack. Let's see if I can convince you.</p>
<h2>The soundscape</h2>
<p>When I walk into my study every morning, I'm surrounded by a rich, subtle soundscape that exists just beneath conscious perception. My air-conditioner, computers and monitors all emit hums and purrs. I can "tune in" to these if I focus, but they usually only draw my attention when something changes. When the power goes out there is a deathly silence; when a CPU fan noise changes pitch or texture, it bothers me immediately.</p>
<p>Layered over this background are more obtrusive sounds, closer to the threshold of awareness - the clacking of keyboards, faint noises of my family getting ready for their day upstairs, the front door opening and closing. Whether or not I pay attention to these is somewhat context dependent. Am I waiting, for instance, for my wife and kids to start trooping down the stairs so I can join them for my son's swimming lesson? If I am, I listen out for those sounds specifically. I get an enormous amount of information about my world from these more discrete, event-related noises.</p>
<p>Finally, there are the really obtrusive sounds, things that immediately get my attention. This might be someone saying my name, my phone ringing, a knock at the door, or a smoke alarm.
I'm very aware of these, and they usually signal something I have to deal with immediately.</p>
<p>These layers of more and less obtrusive sounds form a soundscape that is ever-present, and utterly necessary in our day-to-day lives. Notice how effortless this process of extracting meaning from our ambient sounds is. Our minds process this information stream without any mental exertion, filter out what we don't need to notice, and draw our attention to what we do. There's a lot of cognitive research (that I might delve into in future posts) that shows that our brains and auditory systems are specifically designed to make sense of the world in this way.</p>
<p>We have nothing like this rich texture of ambient awareness for the technology that surrounds us. Our monitoring mechanisms seem to be stuck at the ends of the intrusiveness spectrum. At one end, we have email notifications that demand our attention until we start to ignore them or silence them with a filter. At the other end we have passive status dashboards that require us to remember to switch context and visually consult a different interface. Choir.io doesn't aim to supplant either of these, but tries to fill in the blank portion of the awareness spectrum between them.</p>
<p>When I sit at my desk, I can hear our server architecture humming away. There's the subtle pitter-patter of hits to various webservers, the occasional clack of an SSH login. Occasionally there is a chime when @alexdong pushes to Github, followed shortly by the celebratory cheer of a server deploy. When I hear the jarring note of a 500 server error, I switch context to view logs or a dashboard, but otherwise my focus stays with my editor window. Choir is young, but it's already become an indispensable part of my life.</p>
<h2>Challenges and next steps</h2>
<p>There are a number of key questions that we'd like to answer with the help of our intrepid early adopters. First among these is the question of soundscape design. What makes a good sound pack?
What is the right mix of intrusive and non-intrusive sounds? How do we construct soundscapes that blend into the background like natural sounds do? Another set of questions surrounds the API and integration. What is the right blend of simplicity and power in the API? Which services should we integrate with next?</p>
<p>There are some obvious next steps in the works. We recognize that sound pack design is a deep problem with subjective solutions. So, letting users assemble, edit and eventually share their own sound packs is high on our list of priorities. Free-standing Choir.io player apps for Windows and OSX will also be on the way soon, so you won't need to remember to keep a browser tab open. Technical improvements to the API that are on the way include UDP and SSL support.</p>
<p>Choir is trying to do something new, and we want as much feedback as early in the process as possible. So, we've decided to start sending out invites today, even though Choir is far from the polished system that it will be in a few months. If you're brave, willing to give frank feedback, and want to help us explore this exciting idea, please request an invite.</p>
mitmproxy 0.9.1
2013-06-16T00:00:00+00:00
<p>I'm happy to announce the release of mitmproxy 0.9.1. This is a bugfix release, with no significant changes in behaviour.</p>
<p>As hinted in my previous release note, the project itself is also evolving. As of this release, mitmproxy and its sister projects (pathod and netlib) are housed under a separate organization on Github, rather than my own personal space:</p>
<p>github.com/mitmproxy</p>
<p>I'm also very happy to welcome the first external core developer to the mitmproxy project: Maximilian Hils. Max is the author of HoneyProxy, a web analysis front-end for mitmproxy. In the next few months, he'll be working on integrating and expanding his work to become mitmproxy's official web interface.
Max's efforts will be sponsored by Google under their Summer of Code program, and will be mentored by the HoneyNet Project.</p>
<h2>Changelog</h2>
<ul>
<li>Use "correct" case for Content-Type headers added by mitmproxy.</li>
<li>Make UTF environment detection more robust.</li>
<li>Improved MIME-type detection for viewers.</li>
<li>Always read files in binary mode (Windows compatibility fix).</li>
<li>Correct PyOpenSSL dependency declaration.</li>
<li>Some developer documentation.</li>
</ul>
Skout: a devastating privacy vulnerability
2013-05-31T00:00:00+00:00
<p>I've become a bit weary of the process of public vulnerability disclosure - I'm much more likely nowadays to just drop companies an anonymous notice and move on. Every so often, though, I come across an issue so egregious that talking about it publicly seems like an imperative. This is one of them.</p>
<p>First, some background. Skout is a location-based mobile social network. The idea is to allow people to meet others in their area, semi-anonymously, get to know them, and then perhaps line up a meeting in meatspace. As far as I can tell, a huge fraction of the userbase are singles, using Skout as an ad-hoc dating app. Skout's scale is significant - they don't release exact user numbers, but I've seen claims of more than 10 million users, and a growth rate of a million users per month.</p>
<p>In 2012, Skout went through a major PR catastrophe, when its service was linked to no fewer than 3 separate rapes of children by adult men posing as teenagers. Skout immediately suspended the service for teenagers and went through a security re-vamp. A month later, teens were allowed back, with Skout making much of its new safety system, "advanced, proprietary algorithms" to weed out stalkers, and its long-term commitment to community safety.</p>
<p>Given this background, the problem I found is simple but devastating. The Skout mobile application talks to Skout's servers through a simple API.
When a user's profile is viewed, an unencrypted, plain-HTTP request is made to a path like this:</p>
<pre><code>http://i22.skout.com/services/ServerService/getProfile</code></pre>
<p>What's returned is a blob of XML containing the user's complete profile data. In fact, the profile data is <em>too</em> complete, including some information that is never actually used by the app. For example, we can see the user's exact date of birth:</p>
<pre><code>&lt;ax213:birthdayDate&gt;xx/xx/1995&lt;/ax213:birthdayDate&gt;</code></pre>
<p>... but only the user's age in years is actually displayed. Most serious, however, is the high-precision location information that is returned in the ax213:homeLocation and ax213:location tags:</p>
<pre><code>&lt;ax213:latitude&gt;-xx.xxx&lt;/ax213:latitude&gt;
&lt;ax213:longitude&gt;xxx.xxx&lt;/ax213:longitude&gt;</code></pre>
<p>The three decimal places of precision in the co-ordinates are enough to locate a user to within about 110 meters north-south, and substantially less than that east-west depending on the distance from the equator. Here's what that looks like in a hypothetical example:</p>
<p>I used mitmproxy to observe Skout's traffic, but because the request is unencrypted any tool that allows you to inspect network traffic would be enough. The result is a stalker's wet dream - click on an anonymous profile, watch your network traffic, and find out exactly where the victim lives. I've also seen minors located at malls where they hang out, and at their schools... Given the scale of Skout's userbase and the ease with which the data can be obtained, I think there's a high likelihood that this issue has already been used for unsavoury purposes.</p>
<p>I reported the vulnerability to Skout on the 24th of May.
I'm happy to report that they immediately realised the seriousness of the situation, and their API stopped returning exact lat/long values a few hours later. Subsequent correspondence with Niklas Lindstrom, Skout's CTO, confirmed that they were taking steps to tighten security. I've encouraged Skout to speak about this publicly - their userbase needs to know about the issue, and needs to be reassured that action is being taken to ensure that this type of privacy breach won't ever recur.</p>
How mitmproxy works
2013-05-16T00:00:00+00:00
<p>I started work on mitmproxy because I was frustrated with the available interception tools. I had a long list of minor complaints - they were insufficiently flexible, not programmable enough, mostly written in Java (a language I don't enjoy), and so forth. My most serious problem, though, was opacity. The best tools were all closed source and commercial. SSL interception is a complicated and delicate process, and after a certain point, not understanding precisely what your proxy is doing just doesn't fly.</p>
<p>The text below is now part of the official documentation of mitmproxy. It's a detailed description of mitmproxy's interception process, and is more or less the overview document I wish I had when I first started the project. I proceed by example, starting with the simplest unencrypted explicit proxying, and working up to the most complicated interaction - transparent proxying of SSL-protected traffic<sup>1</sup> in the presence of SNI.</p>
<h2>Explicit HTTP</h2>
<p>Configuring the client to use mitmproxy as an explicit proxy is the simplest and most reliable way to intercept traffic. The proxy protocol is codified in the HTTP RFC, so the behaviour of both the client and the server is well defined, and usually reliable.
In the simplest possible interaction with mitmproxy, a client connects directly to the proxy and makes a request that looks like this:</p>
<pre><code>GET http://example.com/index.html HTTP/1.1</code></pre>
<p>This is a proxy GET request - an extended form of the vanilla HTTP GET request that includes a scheme and host specification, and it includes all the information mitmproxy needs to relay the request upstream.</p>
<table>
<tbody>
<tr><td><b>1</b></td><td>The client connects to the proxy and makes a request.</td></tr>
<tr><td><b>2</b></td><td>Mitmproxy connects to the upstream server and simply forwards the request on.</td></tr>
</tbody>
</table>
<h2>Explicit HTTPS</h2>
<p>The process for an explicitly proxied HTTPS connection is quite different. The client connects to the proxy and makes a request that looks like this:</p>
<pre><code>CONNECT example.com:443 HTTP/1.1</code></pre>
<p>A conventional proxy can neither view nor manipulate an SSL-encrypted data stream, so a CONNECT request simply asks the proxy to open a pipe between the client and server. The proxy here is just a facilitator - it blindly forwards data in both directions without knowing anything about the contents. The negotiation of the SSL connection happens over this pipe, and the subsequent flow of requests and responses is completely opaque to the proxy.</p>
<h3>The MITM in mitmproxy</h3>
<p>This is where mitmproxy's fundamental trick comes into play. The MITM in its name stands for Man-In-The-Middle - a reference to the process we use to intercept and interfere with these theoretically opaque data streams. The basic idea is to pretend to be the server to the client, and pretend to be the client to the server, while we sit in the middle decoding traffic from both sides. The tricky part is that the Certificate Authority system is designed to prevent exactly this attack, by allowing a trusted third-party to cryptographically sign a server's SSL certificates to verify that they are legit.
If this signature doesn't match or is from a non-trusted party, a secure client will simply drop the connection and refuse to proceed. Despite the many shortcomings of the CA system as it exists today, this is usually fatal to attempts to MITM an SSL connection for analysis. Our answer to this conundrum is to become a trusted Certificate Authority ourselves. Mitmproxy includes a full CA implementation that generates interception certificates on the fly. To get the client to trust these certificates, we register mitmproxy as a trusted CA with the device manually.</p>
<h3>Complication 1: What's the remote hostname?</h3>
<p>To proceed with this plan, we need to know the domain name to use in the interception certificate - the client will verify that the certificate is for the domain it's connecting to, and abort if this is not the case. At first blush, it seems that the CONNECT request above gives us all we need - in this example, both of these values are "example.com". But what if the client had initiated the connection as follows:</p>
<pre><code>CONNECT 10.1.1.1:443 HTTP/1.1</code></pre>
<p>Using the IP address is perfectly legitimate because it gives us enough information to initiate the pipe, even though it doesn't reveal the remote hostname.</p>
<p>Mitmproxy has a cunning mechanism that smooths this over - upstream certificate sniffing. As soon as we see the CONNECT request, we pause the client part of the conversation, and initiate a simultaneous connection to the server. We complete the SSL handshake with the server, and inspect the certificates it used. Now, we use the Common Name in the upstream SSL certificates to generate the dummy certificate for the client. Voila, we have the correct hostname to present to the client, even if it was never specified.</p>
<h3>Complication 2: Subject Alternative Name</h3>
<p>Enter the next complication. Sometimes, the certificate Common Name is not, in fact, the hostname that the client is connecting to.
This is because of the optional Subject Alternative Name field in the SSL certificate that allows an arbitrary number of alternative domains to be specified. If the expected domain matches any of these, the client will proceed, even though the domain doesn't match the certificate Common Name. The answer here is simple: when we extract the CN from the upstream cert, we also extract the SANs, and add them to the generated dummy certificate.</p>
<h3>Complication 3: Server Name Indication</h3>
<p>One of the big limitations of vanilla SSL is that each certificate requires its own IP address. This means that you couldn't do virtual hosting where multiple domains with independent certificates share the same IP address. In a world with a rapidly shrinking IPv4 address pool this is a problem, and we have a solution in the form of the Server Name Indication extension to the SSL and TLS protocols. This lets the client specify the remote server name at the start of the SSL handshake, which then lets the server select the right certificate to complete the process.</p>
<p>SNI breaks our upstream certificate sniffing process, because when we connect without using SNI, we get served a default certificate that may have nothing to do with the certificate expected by the client. The solution is another tricky complication to the client connection process. After the client connects, we allow the SSL handshake to continue until <em>just after</em> the SNI value has been passed to us. Now we can pause the conversation, and initiate an upstream connection using the correct SNI value, which then serves us the correct upstream certificate, from which we can extract the expected CN and SANs.</p>
<p>There's another wrinkle here. Due to a limitation of the SSL library mitmproxy uses, we can't detect that a connection <em>hasn't</em> sent an SNI request until it's too late for upstream certificate sniffing.
In practice, we therefore make a vanilla SSL connection upstream to sniff non-SNI certificates, and then discard the connection if the client sends an SNI notification. If you're watching your traffic with a packet sniffer, you'll see two connections to the server when an SNI request is made, the first of which is immediately closed after the SSL handshake. Luckily, this is almost never an issue in practice.</p>
<h3>Putting it all together</h3>
<p>Let's put all of this together into the complete explicitly proxied HTTPS flow.</p>
<table>
<tbody>
<tr><td><b>1</b></td><td>The client makes a connection to mitmproxy, and issues an HTTP CONNECT request.</td></tr>
<tr><td><b>2</b></td><td>Mitmproxy responds with a 200 Connection Established, as if it has set up the CONNECT pipe.</td></tr>
<tr><td><b>3</b></td><td>The client believes it's talking to the remote server, and initiates the SSL connection. It uses SNI to indicate the hostname it is connecting to.</td></tr>
<tr><td><b>4</b></td><td>Mitmproxy connects to the server, and establishes an SSL connection using the SNI hostname indicated by the client.</td></tr>
<tr><td><b>5</b></td><td>The server responds with the matching SSL certificate, which contains the CN and SAN values needed to generate the interception certificate.</td></tr>
<tr><td><b>6</b></td><td>Mitmproxy generates the interception cert, and continues the client SSL handshake paused in step 3.</td></tr>
<tr><td><b>7</b></td><td>The client sends the request over the established SSL connection.</td></tr>
<tr><td><b>8</b></td><td>Mitmproxy passes the request on to the server over the SSL connection initiated in step 4.</td></tr>
</tbody>
</table>
<h2>Transparent HTTP</h2>
<p>When a transparent proxy is used, the HTTP/S connection is redirected into a proxy at the network layer, without any client configuration being required. This makes transparent proxying ideal for those situations where you can't change client behaviour - proxy-oblivious Android applications being a common example.</p>
<p>To achieve this, we need to introduce two extra components.
The first is a redirection mechanism that transparently reroutes a TCP connection destined for a server on the Internet to a listening proxy server. This usually takes the form of a firewall on the same host as the proxy server - iptables on Linux or pf on OSX. Once the client has initiated the connection, it makes a vanilla HTTP request, which might look something like this:

    GET /index.html HTTP/1.1

Note that this request differs from the explicit proxy variation, in that it omits the scheme and hostname. How, then, do we know which upstream host to forward the request to? The routing mechanism that has performed the redirection keeps track of the original destination for us. Each routing mechanism has a different way of exposing this data, so this introduces the second component required for working transparent proxying: a host module that knows how to retrieve the original destination address from the router. In mitmproxy, this takes the form of a built-in set of modules that know how to talk to each platform's redirection mechanism. Once we have this information, the process is fairly straightforward.

1. The client makes a connection to the server.
2. The router redirects the connection to mitmproxy, which is typically listening on a local port of the same host. Mitmproxy then consults the routing mechanism to establish what the original destination was.
3. Now, we simply read the client's request...
4. ... and forward it upstream.

Transparent HTTPS

The first step is to determine whether we should treat an incoming connection as HTTPS. The mechanism for doing this is simple - we use the routing mechanism to find out what the original destination port is.
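On Linux, "consulting the routing mechanism" usually boils down to a single socket option. The sketch below is a minimal illustration, not mitmproxy's actual module: it assumes an iptables REDIRECT rule and IPv4 only, and the function names are made up for this example. SO_ORIGINAL_DST and its value of 80 come from the Linux netfilter headers.

```python
import socket
import struct

# Linux netfilter socket option (linux/netfilter_ipv4.h); the socket module
# does not export it, so the value is spelled out here.
SO_ORIGINAL_DST = 80
# socket.SOL_IP is missing on some platforms; on Linux its value is 0.
SOL_IP = getattr(socket, "SOL_IP", 0)


def parse_original_dst(raw):
    """Decode a 16-byte struct sockaddr_in into a (host, port) tuple."""
    # Layout: 2-byte address family, 2-byte port in network byte order,
    # 4-byte IPv4 address, 8 bytes of padding. The family is ignored.
    _family, port, packed_ip = struct.unpack("!HH4s8x", raw)
    return socket.inet_ntoa(packed_ip), port


def original_destination(client_sock):
    """Ask the kernel where a redirected connection was originally headed.

    Hypothetical helper: only meaningful on Linux, for a socket accepted
    after an iptables REDIRECT rule has rewritten its destination.
    """
    raw = client_sock.getsockopt(SOL_IP, SO_ORIGINAL_DST, 16)
    return parse_original_dst(raw)
```

The port half of the returned tuple is what the HTTPS decision is based on. Other platforms, such as pf on OSX, expose the same information through entirely different interfaces, which is why a per-platform module is needed.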
By default, we treat all traffic destined for ports 443 and 8443 as SSL.

From here, the process is a merger of the methods we've described for transparently proxying HTTP, and explicitly proxying HTTPS. We use the routing mechanism to establish the upstream server address, and then proceed as for explicit HTTPS connections to establish the CN and SANs, and cope with SNI.

1. The client makes a connection to the server.
2. The router redirects the connection to mitmproxy, which is typically listening on a local port of the same host. Mitmproxy then consults the routing mechanism to establish what the original destination was.
3. The client believes it's talking to the remote server, and initiates the SSL connection. It uses SNI to indicate the hostname it is connecting to.
4. Mitmproxy connects to the server, and establishes an SSL connection using the SNI hostname indicated by the client.
5. The server responds with the matching SSL certificate, which contains the CN and SAN values needed to generate the interception certificate.
6. Mitmproxy generates the interception cert, and continues the client SSL handshake paused in step 3.
7. The client sends the request over the established SSL connection.
8. Mitmproxy passes the request on to the server over the SSL connection initiated in step 4.

^1 I use "SSL" to refer to both SSL and TLS in the generic sense, unless otherwise specified.

pathod 0.9 2013-05-16T00:00:00+00:00

I've just released pathod 0.9, my toolset for crafting malicious and interesting HTTP traffic. Apart from the usual range of stability improvements and bugfixes, this release introduces a major new set of features: proxy support.
Pathoc, the client, has sprouted support for vanilla proxy connections, and is also able to tunnel through proxies using CONNECT. Pathod, the server, will now respond to proxy requests as well as straight HTTP, and will treat CONNECT requests as SSL with on-the-fly generation of dummy certificates.

The Pathod changes in particular open a whole new range of possibilities for fuzzing and other mischief. Any client with proxy support can be directed at Pathod, which can then impersonate the upstream server and return the creatively malicious response of your choice.

There have also been some organizational changes. This is the first release based on netlib, the gonzo networking library pathod now shares with mitmproxy. Over the next while, pathod and mitmproxy will move closer together. As a sign of this, the major version numbers between these projects are now synchronized.

mitmproxy 0.9 2013-05-15T00:00:00+00:00

I'm happy to announce the release of mitmproxy 0.9. This is a major release, with huge improvements to mitmproxy pretty much across the board. So much has happened in the year since the last release that it's difficult to pick out the headlines. Mitmproxy is now faster, more scalable, and works in more tricky corner cases than ever before. Full transparent mode support has landed for both Linux and OSX. Content decoding is much nicer, with a slew of new targets like AMF and Protocol Buffers. We now have a WSGI container that allows you to host web apps right in the proxy. In addition to this, there is a myriad of new features, bugfixes and other small improvements.

There are also changes afoot in the project itself. As a first step, I've moved mitmproxy from the GPLv3 to an MIT license. I hope that this will make it easier for people to use the project in more contexts.
Keep an eye out for more changes along these lines soon, geared to broadening participation in the project.

Changelog

- Upstream certs mode is now the default.
- Add a WSGI container that lets you host in-proxy web applications.
- Full transparent proxy support for Linux and OSX.
- Introduce netlib, a common codebase for mitmproxy and pathod.
- Full support for SNI.
- Color palettes for mitmproxy, tailored for light and dark terminal backgrounds.
- Stream flows to file as responses arrive with the "W" shortcut in mitmproxy.
- Extend the filter language, including ~d domain match operator, ~a to match asset flows (js, images, css).
- Follow mode in mitmproxy ("F" shortcut) to "tail" flows as they arrive.
- --dummy-certs option to specify and preserve the dummy certificate directory.
- Server replay from the current captured buffer.
- Huge improvements in content views. We now have viewers for AMF, HTML, JSON, Javascript, images, XML, URL-encoded forms, as well as hexadecimal and raw views.
- Add Set Headers, analogous to replacement hooks. Defines headers that are set on flows, based on a matching pattern.
- A graphical editor for path components in mitmproxy.
- A small set of standard user-agent strings, which can be used easily in the header editor.
- Proxy authentication to limit access to mitmproxy.

Google, destroyer of ecosystems 2013-03-14T00:00:00+00:00

Google has finally shut down a service I actually care about - Google Reader will die a graceless, undignified death on July 1, 2013. The only way Google could inconvenience me more would be to shut down search itself, and yet - I'm not angry that Google is shutting Reader down. I'm furious that they ever entered the RSS game at all. Consider this quote from a TechCrunch article in January 2006.
Here, Michael Arrington ends an article about the shutdown of a feed reader service with a statement that seems truly bizarre today:

"The RSS reader space is becoming hyper competitive, with dozens of different choices for readers."

A hyper competitive space with dozens of choices? Reader made its first public appearance a couple of months before this, in October 2005. I remember this period well - it was a time of immense excitement, when RSS seemed to be the future, the news ecosystem was vibrant, and this thing called the blogosphere, fueled by peer subscription, was doubling in size every six months. It was into this magic garden that Google wandered, like a giant toddler leaving destruction in its wake. Reader was undeniably a good product, but its best quality was also its worst: it was free. Subsidized by Google's immense search profits, it never had to earn its keep, and its competitors started to die. Over time, the "hyper competitive" RSS reader market turned into a monoculture. Today, on the eve of its shutdown, RSS more or less means "Google Reader" to a large fraction of readers, to the extent where even the best feed readers on iOS are just Google Reader clients^1.

The sudden shock of Reader's closure will harm a news ecosystem that I already believe to be deeply ill. Google Reader is not just a core part of my information diet - it's also the most direct channel I have to readers of this blog. As of today, the Reader subscriber count for corte.si stands at about 3 times the total number of other subscribers combined. Some of these readers will migrate to other services and stay in touch, but many will inevitably abandon the idea of direct subscription to blogs entirely.
In the next few months, tens of thousands of small blogs will lose direct contact with a large fraction of their readers.

The truth is this: Google destroyed the RSS feed reader ecosystem with a subsidized product, stifling its competitors and killing innovation. It then neglected Google Reader itself for years, after it had effectively become the only player. Today it does further damage by buggering up the already beleaguered links between publishers and readers. It would have been better for the Internet if Reader had never been at all.

^1 Yes, I'm aware that there are a few hardy outliers still playing in this place. My own logs show that their reach is insignificant, though, and when I tried to shift my subscriptions about a year ago, there was nothing as good as Reader itself. Once NewsBlur's servers have recovered, I definitely plan to give it another shot.

Things I found on GitHub: aspell custom dictionary entries 2013-02-26T00:00:00+00:00

I've been doing a series of posts looking at data gathered with ghrabber, a simple tool I wrote that lets you grab files matching a search specification from GitHub. Last week, I looked at shell history in the broad, and then specifically at pipe chains. Today, I move on to something different - custom aspell dictionaries. When aspell finds a word it doesn't recognize, the user is prompted to correct it, ignore it, or add it to a custom dictionary so that it will be recognized as correct in future. These words are written to the user's custom dictionary - for English, a file named .aspell.en.pws that lives in the user's home directory. It turns out that 30 people have checked aspell dictionaries into GitHub, containing a total of 9501 custom words.
The chart below shows the top 50 words, with the X-axis showing the percentage of files the word appeared in.

There were a few requests for the raw data behind the previous two posts, so this time round you can also download a CSV file with the occurrence totals for each word in the dataset.

Things I found on GitHub: pipe chains 2013-02-22T00:00:00+00:00

Earlier this week I published ghrabber, a simple tool that lets you grab files matching an arbitrary search specification from GitHub. I used ghrabber to retrieve all the bash_history and zsh_history files accidentally checked in to repos, and took a light look at the dataset with some simple graphs. In total, I obtained 234 shell history files with 165k individual command entries. This is a very rare opportunity to "shoulder-surf", to actually see what people do at the command prompt, and perhaps get some insights into how to improve things.

Along those lines, today's post looks at pipe chains - that is, compound commands that pipe the output of one command to another. The pipe operator lies at the core of the Unix command-line philosophy. The fact that we can easily compose complex operations is the reason why we are able to write small tools that "do one thing well" without losing generality. The shell history data on GitHub can give us some real data about what people do with composed commands, and how they do it.

It turns out that about 2% of all commands issued on the command-line use pipes. The graph above shows the prevalence of the most common pipe chains - that is, what percentage of the users in my sample used each chain. There's a lot of fascinating stuff we can read straight from this image.

Starting at the top, the first thing we notice is how widely used the ps | grep chain is. About 17% of users in my sample used this chain - given the type of data we have, the real-world prevalence would surely be higher still.
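A per-user prevalence figure like this can be computed along the following lines. This is a minimal sketch rather than the code behind the graph: it assumes each history has already been cleaned into a list of plain command strings (so no zsh timestamp prefixes), the function names are invented for the example, and the parsing is deliberately naive about `;`, `&&` and quoted pipe characters standing alone.

```python
import shlex
from collections import Counter


def pipe_chain(command):
    """Return the programs in a pipeline, e.g. 'ps aux | grep x' -> ('ps', 'grep').

    Commands without a pipe, or that shlex cannot parse, yield an empty tuple.
    """
    try:
        # punctuation_chars=True makes shlex emit '|' as its own token,
        # even when it is not surrounded by whitespace.
        tokens = list(shlex.shlex(command, posix=True, punctuation_chars=True))
    except ValueError:  # unbalanced quotes and similar noise in real histories
        return ()
    chain, expect_program = [], True
    for token in tokens:
        if token == "|":
            expect_program = True
        elif expect_program:
            chain.append(token)  # first word of each pipeline segment
            expect_program = False
    return tuple(chain) if len(chain) > 1 else ()


def chain_prevalence(histories):
    """Map each pipe chain to the percentage of users who used it at least once."""
    users_per_chain = Counter()
    for commands in histories.values():
        # A set, so each user counts once per chain however often they ran it.
        users_per_chain.update({pipe_chain(c) for c in commands} - {()})
    return {chain: 100.0 * n / len(histories) for chain, n in users_per_chain.items()}
```

Counting users rather than invocations is the design choice that matters here: one person running the same chain five hundred times should not outweigh a chain that a fifth of the sample reaches for occasionally.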
I've just been extolling the virtues of small tools and composability, but in this case practicality should beat purity. I suggest that everyone should have a command-alias similar to this in their shell configuration:

    alias pg="ps aux | grep"

I've added this to my .zshrc today, and I've already used it twice.

Next up, we have the ls | grep pipes. The vast majority of uses here could actually be accomplished using the shell's filename generation mechanism. This ranges from simple redundancies like grepping for file extensions, to performing quite complex matching operations that could be done using the shell's advanced glob operations. I'm guilty of this myself - I rarely use features like recursive globbing, expansions using character ranges, case insensitive globbing, and so forth. I've brushed up on filename expansion for my chosen shell, and perhaps you should too.

The last thing I want to point out is a pattern that's genuinely dangerous - curl | bash, along with its cousins curl | sh and wget | sh. Unfortunately, this has become the recommended installation pattern for some tools - the vast majority of invocations here are for RVM and Yeoman. I don't think it's a good idea to pipe anything from the web straight into a local shell, but the situation is made particularly dire by the fact that almost half of these invocations are either over plain HTTP or explicitly turn certificate validation off.

I'll stop here, although there are interesting things to say about nearly every entry in the graph above.
Next week, I'll move on from the shell history sample, and look at some other juicy datasets extracted using ghrabber.

Things I found on GitHub: shell history 2013-02-19T00:00:00+00:00

GitHub recently introduced hugely improved code search, one of those rare moments when a service I use adds a feature that directly and measurably improves my life. Predictably, there was soon a flurry of breathless stories about the security implications. This shouldn't have been news to anyone - by now, it should be clear that better search in almost any context has security or privacy implications, a law of the universe almost as solid as the second law of thermodynamics. We saw this with Google's own code search, as well as Google proper, Facebook's Graph Search and even Bing. A certain fraction of people will always make mistakes, and any sufficiently powerful search will allow bad guys to find and take advantage of the outliers.

After the dust had settled a bit I started wondering what else we could do with GitHub's search - other than snookering schmucks who checked in their private keys. I'm always enticed by data, and the combination of search and the ability to download raw checked-in files seemed like a promising avenue to explore. Let's see what we can come up with.

ghrabber - grab files from GitHub

First, some tooling. I've just released ghrabber, a simple tool that lets you grab all files matching a search specification from GitHub. Here, for instance, is an obvious wheeze - fetching all files with the extension ".key":

    ./ghrabber.py "extension:key"

Downloaded files are saved locally to files named user.repository. Existing files with the same name are skipped, which means that you can reasonably efficiently stop and resume a ghrab.

Shell history files

I've been having a lot of fun exploring GitHub with ghrabber.
I'll return to this in future posts - today I'll start with a quick illustration of what can be done. One type of difficult-to-find information that is sometimes checked in to repos is shell history. Two simple ghrabber commands for the two most popular shells are all we need:

    ./ghrabber.py "path:.bash_history"

and

    ./ghrabber.py "path:.zsh_history"

After cleaning the data a bit, I had 234 history files varying in length from 1 line to just over 10 thousand, containing a total of 165k entries. I fed this into Pandas for analysis, parsing each command using a combination of hand-hacked heuristics and the built-in shlex module. The remainder of this post is a light exploration of some approaches to this dataset, steering clear of the obvious and tediously well-covered security implications.

One way to slice the data is to look at the percentage of history files a given command appears in. This gives us a nice listing of the top commands by user prevalence, which you can see in the graph on the left above. On the right, I've taken the same list of commands, and checked how many invocations are preceded by a man lookup for the command. This gives us an idea of which commonly-used commands have difficult or unintuitive interfaces. It's interesting that ln is right at the top of the list, considering how simple the command syntax is. My theory is that everyone forgets the order of the source and target files.

Since we have a list of the most widely used commands, it's also trivial to do silly popularity comparisons. Above is the obvious look at the state of the editor wars (vim is winning, folks), and a check on how tmux is doing in supplanting screen (the faster the better).

Another interesting thing to do is to look at the most commonly used flags to commands.
I think having "real data" of command use may well guide us to design better command-line interfaces. I'd love to know the most common invocation flags for some of the tools I write.

I'll stop there. The data pool in this case is very deep, and there is a huge range of interesting bits of command-line ethnography that could be done. Stay posted for more in the coming weeks.

The trouble with social news 2013-01-24T00:00:00+00:00

There is something terribly awry with the social news ecosystem. This is a feeling that's been growing on me over the last few years, and is the reason why I've cut both Reddit and Hacker News (which together constitute pretty much all of "social news") out of my information diet. Although I've mulled over things in various conversations, I've never actually tried to put my feeling of unease in writing, until today. What's spurring me into action is a proposal by Yann LeCun that a model similar to social news be adopted for scientific peer review - self-assembled Reviewing Entities voting on streams of submitted papers, regulated by a reputation system for authors and reviewers. Basically, this is science a la Reddit: complete with subreddits, karma and upboats. I find the idea frankly terrifying.

I guess it's time, then, to put finger to keyboard and lay out what disquiets me about social news.

Karma Corrupts

You start by introducing a reputation mechanism like karma to improve some outcome - say, to increase the quality of comments, or to apply a threshold to restrict voting to trustworthy community members. This seems like a plausible and even elegant mechanism at first, until you discover the terrible side-effects.

Humans are fundamentally status-seeking social apes, and you've now introduced a visible measure of social worth that people will be driven to maximize. In the real world, we have a word for those who spend their lives accumulating karma - we call them politicians.
And so, within karma communities, we see the rise of a political class - persuasive centrists who cater (perhaps unconsciously) to a constituency, and who express (perhaps eloquently) opinions calculated to appeal to the masses and avoid controversy. Hacker News and many subreddits are dominated by people like this, whose comments are largely predictable and rarely add anything new or unexpected to the conversation.

At the bottom end of the food chain, we have a different class of creature with the same basic aim as the politicians, but without the persuasive charm needed to pull off the political approach. These are the karma whores, who use a mixture of frank pandering, provocation and calculated outrage to achieve the same aims.

The karma maximization game often acts contrary to the goals we aimed to achieve by introducing karma in the first place: the tenor of the community suffers, the diversity of opinion declines, and the karma whores post pictures of their cats everywhere.

The Lossy Sieve

Go and have a look at the new story submission queue on Hacker News. Scroll through a few pages, and pay attention to the stories stuck at one vote - they will most likely never receive another upvote and will die in obscurity. Now, go look at the front page. When I do this exercise I'm struck by the fact that there's plenty of crap on the front page, and quite a bit of good stuff in the submission queue languishing in obscurity. So, quality can't be the sole metric here - what determines what gets onto the front page and what doesn't?

Let's try a thought experiment. First, set up a small number of voting accounts - say, 10 or so. Now, in the new submission queue, pick 5 random stories every hour, and give them a small number of upvotes soon after they are submitted. I predict that you will find that stories that received this small initial boost are vastly more likely to end up on the front page.
If I'm right, then chance dominates story selection - as long as an article exceeds some basic quality threshold, it all depends on who happens to see the story soon after it is submitted, and whether the spirit moves them to vote. Note that this is not the case at the extremes - frankly bad content won't be upvoted, and really important stories will usually find their way to the top. The lossy sieve phenomenon affects everything in between.

What this boils down to is that social news doesn't provide an effective filter - good content gets lost, and mediocre content finds its way onto our screens.

The Pinhole Effect

In social news, the front page is king. Most users never go beyond the first or second page of top stories. However, front-page real estate is incredibly limited compared to the volume of submissions on most popular subreddits and on Hacker News. The effect of this is that we're looking at a fast-flowing river of information through a pinhole. Even assuming that the selection mechanism works flawlessly, what you see on the front page is a small sliver of the total, chosen through a consensus mechanism that takes no account of individual variation in tastes and interests. The news you see is not tailored to you - it's tailored to some abstract, average participant, with all the rough edges of individuality smoothed away. The effect of this is that even at its best, the stories that emerge from the social news system feel like a predictable pablum dished up by the hivemind. The subreddit system tries to improve this by allowing communities to self-assemble around interests, but the pinhole effect still dominates in busy subreddits like /r/programming.

Gaming The System

Social news systems are eminently gameable, and cheating is rife. Part of the reason for this is that a story's destiny depends on a relatively small number of votes.
If your story has any merit at all, you significantly increase the likelihood that it will end up on the front page by giving it a small nudge at the beginning of its life. If it has no merit whatsoever, you can still force it onto people's screens with a few tens or hundreds of votes. Conversely, you can use the same effect to censor and oppress views you disagree with if your social news site has downvotes. Anyone who's kept an eye on these things can rattle off examples of gaming in action: the voting rings, the "social media consultants", the vigilante thought-polizei, the political operators, and dozens of other types of manipulation and villainy. What's more - these visible scandals are just the tip of the iceberg. Eyeballs are valuable, and there's an active arms race with social news sites on the one side, and a dark army of spammers, scammers and true believers on the other. How much of what we see is affected by this type of cheating? We just don't know, but my suspicion is that the effect is significant.

The point here is broader than any particular instance of gaming. It's that social news sites are structurally susceptible to manipulation in ways that can't be fixed without changing the core of their operation. A system like this might be good enough to deliver rage comics, but I feel queasy trusting it any further.

Community Collapse Disorder

My final beef with social news is a problem that it shares with pretty much all online communities, especially technical ones. We're all familiar with the life-cycle of technical forums. They start with a small community of insiders who create value, which then attracts more people to participate, which then dilutes the quality of the contributions (and often introduces a few pathological bad actors), which then causes the good contributors to move on, which causes the magic well to dry up. Everyone then takes their toys and moves to the next community, and the cycle repeats.
We saw this with Usenet and the original C2 wiki, and we are seeing it now with Hacker News and many technical subreddits all at various points in this life-cycle.

I believe that Community Collapse Disorder is one of the Big Problems online that we don't yet have a satisfactory solution to. People are trying, though. Hacker News, for instance, seems to be rather poignantly aware of its own decline, with some of the best of the old-timers calling for an alternative. Paul Graham himself recognizes the issue, and has been tweaking things in various ways to combat the phenomenon, without much success.

At the moment, we just don't know how to build online communities that are both inclusive and stable. Democracy, here, seems to lead inevitably to decline, and social news sites are no exception.

A better way forward?

A big part of the reason I don't use social news anymore is that my existing social networks have become so much more effective at turning up good content. The absolute best source of news for me is simply the set of links shared by the folks I follow on Twitter. I follow people who post interesting content, and whom I trust to act as information filters for me. Most of them share my technical interests, but some are interesting because they are from my home town, or because they share some more esoteric pursuit with me. So, the news stream I see is exactly tailored to me. At the same time, there is also room for idiosyncrasy - if someone I follow shares something left-field that tickles their fancy, I'll see it. In turn, I try to be a responsible information filter for those who follow me - I find a link or two worth tweeting on most days.

There are still things I miss - Twitter is great for sharing links, but is an awful medium for technical discussion. Google+ could be a better alternative, but just doesn't seem to have achieved liftoff for me.
I would also love better tools for aggregating and harvesting links from my social network. At the moment I use Flipboard and Prismatic, but I have issues with both. On the whole, though, these are quibbles. It seems to me that using social networks to filter news is a better way forward - if I was tackling the social news problem, I'd be building tools to support this process.

Go: a nice language with an annoying personality 2013-01-18T00:00:00+00:00

Last week, I had the pleasure of attending Dropbox's annual company hack fest. It was a great opportunity to get a look at how Dropbox works internally, and mingle with the smart and driven folks who make one of my favourite products. In the spirit of hack week, my friend @alexdong and I decided to do our project in Go. We'd both wanted to explore the language, but had never quite been able to make time - a week-long code holiday seemed to be the perfect opportunity. I was hopeful that Go would turn out to hit a magical sweet spot: a light set of abstractions hugging close to the machine, while still providing the indoor plumbing and civilized conveniences of life that I had grown used to with languages like Python. Five days of furious hacking later, I can report that Go might well deliver on this promise, but has enough annoying personality quirks that I will think twice about basing any more projects on it.

My main beef with Go has nothing to do with fundamental language design, and may seem almost inconsequential at first glance. The Go compiler treats unused module imports and declared variables as compile errors. This is great in theory and is something you might well want to enforce before code can be committed, but during the actual process of producing code it's nothing but an irksome, unnecessary pain in the ass.
Let's look at a concrete example, starting with a snippet of code as follows^1:

    import (
        "io/ioutil"
    )
    ...
    ...
    m, err := ioutil.ReadFile(path)
    if err != nil {
        return nil, err
    }
    ...
    ...
    DoSomething(m)

I'm a firm believer that printing stuff to screen is a programmer's best debugging tool, so say we're hacking away and want to print the value of m while running our unit tests. We change the code as follows, adding an import for the "fmt" module and a call to Print:

    import (
        "io/ioutil"
        "fmt"
    )
    ...
    ...
    m, err := ioutil.ReadFile(path)
    if err != nil {
        return nil, err
    }
    fmt.Print(m)
    ...
    ...
    DoSomething(m)

Now we keep hacking, and want to comment out the print statement for a moment like so:

    import (
        "io/ioutil"
        "fmt"
    )
    ...
    ...
    m, err := ioutil.ReadFile(path)
    if err != nil {
        return nil, err
    }
    //fmt.Print(m)
    ...
    ...
    DoSomething(m)

This is a compile error.
We have to switch contexts, move to the top of the module, also comment out the import, and then move back to the spot we're really hacking on:

    import (
        "io/ioutil"
        //"fmt"
    )
    ...
    ...
    m, err := ioutil.ReadFile(path)
    if err != nil {
        return nil, err
    }
    //fmt.Print(m)
    ...
    ...
    DoSomething(m)

A few seconds later, we want to re-enable the Print statement - so up we go again to the top of the module to re-enable the import. This is even worse when we want to, say, comment out the DoSomething call while hacking:

    import (
        "io/ioutil"
    )
    ...
    ...
    m, err := ioutil.ReadFile(path)
    if err != nil {
        return nil, err
    }
    ...
    ...
    //DoSomething(m)

This is also a compile error because now m is unused. We have to hunt up in our code to find the declaration, which could be explicit or implicit using an := assignment. So, in this case we find the declaration, and use the magic underscore name to throw the offending value away:

    import (
        "io/ioutil"
    )
    ...
    ...
    _, err := ioutil.ReadFile(path)
    if err != nil {
        return nil, err
    }
    ...
    ...
    //DoSomething(m)

That should fix it, right? Well, no.
It turns out we've previously declared and used err</strong> (a very common idiom), so this is still a compile error. We're using the "declare and assign" syntax, but have no new variables on the left-hand side of the ":=". So we need to make another tweak:</p> import</span> (</span></span> "</span>io/ioutil</span>"</span></span> )</span></span> ...</span></span> ...</span></span> _, err</span> =</span> ioutil.</span>ReadFile</span>(path)</span></span> if</span> err</span> !=</span> nil</span> {</span></span> return</span> nil</span>, err</span></span> }</span></span> ...</span></span> ...</span></span> //DoSomething(m)</span></span></code></pre> Five seconds later, we want to re-enable DoSomething</strong>, and now we have to unwind the entire process.</p> The cumulative effect of all this is like trying to write code while someone next to you randomly knocks your hands off the keyboard every few seconds. It's a pointlessly pedantic approach that adds constant friction to your write-compile-test cycle, breaks your flow, and just generally makes life a little harder for very little benefit. There's no way to turn this mis-feature off, no flag we can pass to the compiler to temporarily make this a warning rather than an error while hacking2</a></sup>.</p> The irony of the situation is that I agree with the sentiment behind this. I don't want dangling variables or imports in my codebase. And I agree that if something is worth warning about it's worth making it an error. The mistake is to confuse the state we want at the conclusion of a unit of hacking3</a></sup>, with what we need at every point in between, during the write-compile-test cycle. This cycle is the core of the process of actually producing code, and the exhilarating sense of weightlessness</a> that you get when hacking in Python is largely due to the fact that the language works really, really hard to optimize this process. 
Go has given away this feeling of exhilaration, basically for nothing.</p> Despite all this, it's still possible that the benefits of Go do outweigh its irritating personality. Interfaces, memory management, first-class concurrency and static type checking are a knockout combination, and the language in general has something of the taut practicality that I love in C. So, despite the rantiness of this post, I'll keep hacking on our project and make sure I produce a few thousand more lines of code before making a final call on the language. Look for a project release and a blog post along these lines in the coming months.</p> 1</sup> Ellipses indicate "an arbitrary amount of intervening code"</p> </div> 2</sup> I edited this paragraph a bit for tone. I originally accused the Go documentation of being faintly smug about all of this - which is not fair, and doesn't add anything to the argument.</p> </div> 3</sup> Why don't we have a word for this? By "unit of hacking", I mean the work that goes on between starting to hack on a change-set and doing a commit. At the beginning and at the end, the code is in a clean state, but in between there are many periods of transition where cleanliness requirements are relaxed.</p> </div> Released: pathod 0.3 2012-11-16T00:00:00+00:00 I've just released pathod 0.3</a>, which beefs up pathoc</a>'s fuzzing capabilities, improves the spec language and includes lots of bugfixes and other small tweaks. Get it while it's hot!</p> Better fuzzing</h2> A major focus of this release is to improve pathoc</a>'s capabilities as a basic fuzzing tool. I've had fun breaking webservers</a> with pathoc, and it's even come in handy in my Day Job. Here's a quick summary of how things have changed.</p> The -x</strong> flag tells pathoc to explain its requests. This prints out an expanded pathoc query specification, with all randomly generated content and query modifications resolved. 
If you trigger an exception, you can precisely replay the offending query using this explanation.</li> The options for outputting requests and responses have been expanded hugely. First, the -q</strong> and -r</strong> flags tell pathoc to dump complete records of requests and responses respectively. This data is sniffed by instrumenting the socket, so is canonical regardless of our ability to interpret returned data. The -x</strong> option makes pathoc dump this data in hexdump format (otherwise unprintable characters are escaped to preserve your terminal).</li> A number of options have been added to let you ignore expected responses. -C</strong> takes a comma-separated list of response codes to ignore. -T</strong> ignores server timeouts. This lets you home in on the exceptional responses that you care about, and ignore the rest.</li> </ul> Language improvements</h2> I've simplified response specifications by making the response message a standard component with the "r" mnemonic.</li> I've added the "u" mnemonic to request specifications, as a shortcut for specifying the User-Agent header:</li> </ul> get:/:u"My Weird User-Agent"</span></span></code></pre> We also have a small library of representative User-Agent strings that can be used instead of specifying your own. For example, this specifies the GoogleBot User-Agent string:</p> get:/:ug</span></span></code></pre> The list of available shortcuts is in the docs, and can be printed from the command line using the --show-uas</strong> flag to pathoc:</p> > ./pathoc --show-uas</span></span> User agent strings:</span></span> a android</span></span> l blackberry</span></span> b bingbot</span></span> c chrome</span></span> f firefox</span></span> g googlebot</span></span> i ie9</span></span> p ipad</span></span> h iphone</span></span> s safari</span></span></code></pre> pathoc: break all the Python webservers! 2012-09-27T00:00:00+00:00 A few months ago, I announced pathod</a>, a pathological HTTP daemon. 
The project started as a testing tool to let me craft standards-violating HTTP responses while working on mitmproxy</a>. It soon became a free-standing project, and has turned out to be incredibly useful in security testing, exploit delivery and general creative mischief. In the last release, I added pathoc - pathod's malicious client-side twin. It does for HTTP requests what pathod does for HTTP responses, and uses the same hyper-terse specification language</a>.</p> In this post, I show how pathoc can be used as a very simple fuzzer, by finding issues in a number of major pure-Python webservers. None of the tested servers failed catastrophically - they all caught the unexpected exception and continued serving requests. Nonetheless, I think it's reasonable to say that we've triggered a bug if a) the server returns a 500 Internal Server Error response or terminates the connection abnormally, and b) we see a traceback in our logs. In fact, by this definition, I found bugs in every</em> pure-Python server I tested.</p> All of the problems I list below are simple failures of validation - what they have in common is that somewhere in the project, code is called with input that it doesn't expect and can't handle. This matters - in fact, I'd argue that the majority of security problems fall in this category. It's interesting to ponder why this type of issue is so ubiquitous in Python servers. I have no doubt that part of the answer lies in Python's use of exceptions - errors that would be explicit in other languages can be implicit in Python, and code that seems clean and intuitive might in fact be buggy. I think this is especially relevant right now, given the recent flurry of discussion surrounding the Go language</a> and its error handling. It's pretty instructive to read Russ Cox's recent riposte</a> to this post</a> criticizing Go's explicit approach, while looking at the bugs below. 
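To make the implicit-versus-explicit point concrete, here is a minimal sketch. It is not taken from any of the servers tested: the function names and the `(value, error)` return convention are hypothetical, chosen only to contrast a bare `int()` conversion (the shape of several failing lines in the tracebacks) with a version that turns bad input into an explicit result.

```python
def content_length_implicit(headers):
    # Looks clean, but int() raises ValueError for a value like "x",
    # and nothing here signals that to the reader or the caller.
    return int(headers.get("Content-Length", 0))


def content_length_explicit(headers):
    """Return (length, error); exactly one of the two is None.

    Hypothetical helper: the failure path is part of the return value,
    so the caller has to decide what to do with malformed input.
    """
    raw = headers.get("Content-Length", "0")
    try:
        length = int(raw)
    except ValueError:
        return None, "400 Bad Request: invalid Content-Length %r" % (raw,)
    if length < 0:
        return None, "400 Bad Request: negative Content-Length %r" % (raw,)
    return length, None


if __name__ == "__main__":
    print(content_length_explicit({"Content-Length": "10"}))
    print(content_length_explicit({"Content-Length": "x"}))
```

The second version is longer and arguably uglier, which is roughly the trade-off at the heart of the error-handling debate.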
I love Python</a> and I think it's a fine language, but I also think the designers of Go probably made the right choice.</p> Basic fuzzing with pathoc</h2> My methodology for these tests was very simple indeed. I launched each server in turn, and used pathoc to fire corrupted GET requests at the daemon until I saw an error. I then looked at the logs, and boiled the distinct cases down to a minimal pathoc specification by hand. This exercises a rather shallow set of features in the server software - mostly parsing of the HTTP lead-in and request headers. It's possible to give software a much, much deeper workout with pathoc, but I'll leave that for a future post.</p> My pathoc fuzzing command looked something like this:</p> pathoc</span> -n 1000 -p 8080 -t 1</span> localhost 'get:/:b@10:ir,"\x00"'</span></span></code></pre> The most important flags here are -n</b>, which tells pathoc to make 1000 consecutive requests, and -t</b>, which tells pathoc to time out after one second (necessary to prevent hangs when daemons terminate improperly). The request specification itself breaks down as follows:</p> get</td> Issue a GET request</td> </tr> /</td> ... to the path / </td> </tr> b@10</td> ... with a body consisting of 10 random bytes </td> </tr> ir,"\x00"</td> ... and inject a NULL byte at a random location.</td> </tr> </table> It's that last clause - the random injection - that makes the difference between simply crafting requests and basic fuzzing. Every time a new request is issued, the injection occurs at a different location. I varied the injected character between a NULL byte, a carriage return and a random letter of the alphabet. Each exposed different errors in different servers. For a complete description of the specification language, see the online docs</a>.</p> Results</h2> For each bug, I've given a traceback and a minimal pathoc call to trigger the issue. 
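As a brief aside on the random injection described above, the idea can be sketched in a few lines. This is an illustration of the technique only, not pathoc's implementation; `inject_random` and the sample request bytes are hypothetical.

```python
import random


def inject_random(data, payload, rng=random):
    """Return a copy of data with payload inserted at a random offset.

    The offset may be anywhere from 0 (before the first byte) to
    len(data) (after the last byte), so repeated calls hit different
    parts of the request.
    """
    offset = rng.randint(0, len(data))  # randint is inclusive at both ends
    return data[:offset] + payload + data[offset:]


if __name__ == "__main__":
    request = b"GET / HTTP/1.1\r\nContent-Length: 10\r\n\r\n0123456789"
    for _ in range(3):
        print(repr(inject_random(request, b"\x00")))
```

Passing a seeded `random.Random` instance as `rng` makes a given run repeatable, which helps when boiling a failure down to a minimal case.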
The tracebacks have been edited lightly to shorten file paths and remove irrelevances like timestamps.</p> CherryPy</h3> pathoc</span> -p 8080</span> localhost 'get:/:b@10:h"Content-Length"="x"'</span></span></code></pre>ENGINE ValueError("invalid literal for int() with base 10: 'x'",)</span></span> Traceback (most recent call last):</span></span> File "cherrypy/wsgiserver/wsgiserver2.py", line 1292, in communicate</span></span> req.parse_request()</span></span> File "cherrypy/wsgiserver/wsgiserver2.py", line 591, in parse_request</span></span> success = self.read_request_headers()</span></span> File "cherrypy/wsgiserver/wsgiserver2.py", line 711, in read_request_headers</span></span> if mrbs and int(self.inheaders.get("Content-Length", 0)) > mrbs:</span></span> ValueError: invalid literal for int() with base 10: 'x'</span></span></code></pre>pathoc</span> -p 8080</span> localhost 'get:/:i4,"\r"</span></span></code></pre>ENGINE TypeError("argument of type 'NoneType' is not iterable",)</span></span> Traceback (most recent call last):</span></span> File "cherrypy/wsgiserver/wsgiserver2.py", line 1292, in communicate</span></span> req.parse_request()</span></span> File "cherrypy/wsgiserver/wsgiserver2.py", line 580, in parse_request</span></span> success = self.read_request_line()</span></span> File "cherrypy/wsgiserver/wsgiserver2.py", line 644, in read_request_line</span></span> if NUMBER_SIGN in path:</span></span> TypeError: argument of type 'NoneType' is not iterable</span></span></code></pre>Tornado</h3> pathoc</span> -p 8080</span> localhost 'get:/:b@10:h"Content-Length"="x"'</span></span></code></pre>[E 120927 11:42:26 iostream:307] Uncaught exception, closing connection.</span></span> Traceback (most recent call last):</span></span> File "tornado/iostream.py", line 304, in wrapper</span></span> callback(args)</span></span> File "tornado/httpserver.py", line 254, in _on_headers</span></span> content_length = int(content_length)</span></span> ValueError: 
invalid literal for int() with base 10: 'x'</span></span> [E 120927 11:42:26 ioloop:435] Exception in callback <tornado.stack_context._StackContextWrapper object at 0x1012e28e8></span></span> Traceback (most recent call last):</span></span> File "tornado/ioloop.py", line 421, in _run_callback</span></span> callback()</span></span> File "tornado/iostream.py", line 304, in wrapper</span></span> callback(args)</span></span> File "tornado/httpserver.py", line 254, in _on_headers</span></span> content_length = int(content_length)</span></span> ValueError: invalid literal for int() with base 10: 'x'</span></span></code></pre>pathoc</span> -p 8080</span> localhost 'get:/:h"h\r\n"="x"'</span></span></code></pre>[E iostream:307] Uncaught exception, closing connection.</span></span> Traceback (most recent call last):</span></span> File "tornado/iostream.py", line 304, in wrapper</span></span> callback(args)</span></span> File "tornado/httpserver.py", line 236, in _on_headers</span></span> headers = httputil.HTTPHeaders.parse(data[eol:])</span></span> File "tornado/httputil.py", line 127, in parse</span></span> h.parse_line(line)</span></span> File "tornado/httputil.py", line 113, in parse_line</span></span> name, value = line.split(":", 1)</span></span> ValueError: need more than 1 value to unpack</span></span> [E ioloop:435] Exception in callback <tornado.stack_context._StackContextWrapper object at 0x1012bd7e0></span></span> Traceback (most recent call last):</span></span> File "tornado/ioloop.py", line 421, in _run_callback</span></span> callback()</span></span> File "tornado/iostream.py", line 304, in wrapper</span></span> callback(args)</span></span> File "tornado/httpserver.py", line 236, in _on_headers</span></span> headers = httputil.HTTPHeaders.parse(data[eol:])</span></span> File "tornado/httputil.py", line 127, in parse</span></span> h.parse_line(line)</span></span> File "tornado/httputil.py", line 113, in parse_line</span></span> name, value = line.split(":", 
1)</span></span> ValueError: need more than 1 value to unpack</span></span></code></pre>Twisted</h2> pathoc</span> -p 8080</span> localhost 'get:/:b@10:h"Content-Length"="x"'</span></span></code></pre>[HTTPChannel,4,127.0.0.1] Unhandled Error</span></span> Traceback (most recent call last):</span></span> File "twisted/python/log.py", line 84, in callWithLogger</span></span> return callWithContext({"system": lp}, func, args, kw)</span></span> File "twisted/python/log.py", line 69, in callWithContext</span></span> return context.call({ILogContext: newCtx}, func, args, *kw)</span></span> File "twisted/python/context.py", line 118, in callWithContext</span></span> return self.currentContext().callWithContext(ctx, func, args, *kw)</span></span> File "twisted/python/context.py", line 81, in callWithContext</span></span> return func(args,**kw)</span></span> --- <exception caught here> ---</span></span> File "twisted/internet/selectreactor.py", line 150, in _doReadOrWrite</span></span> why = getattr(selectable, method)()</span></span> File "twisted/internet/tcp.py", line 199, in doRead</span></span> rval = self.protocol.dataReceived(data)</span></span> File "twisted/protocols/basic.py", line 564, in dataReceived</span></span> why = self.lineReceived(line)</span></span> File "twisted/web/http.py", line 1558, in lineReceived</span></span> self.headerReceived(self.__header)</span></span> File "twisted/web/http.py", line 1580, in headerReceived</span></span> self.length = int(data)</span></span> exceptions.ValueError: invalid literal for int() with base 10: 'x'</span></span></code></pre>SimpleHTTP</h2> pathoc</span> -p 8080</span> localhost 'get:"/\0"'</span></span></code></pre>Exception happened during processing of request from ('127.0.0.1', 54029)</span></span> Traceback (most recent call last):</span></span> File "lib/python2.7/SocketServer.py", line 284, in _handle_request_noblock</span></span> self.process_request(request, client_address)</span></span> File 
"lib/python2.7/SocketServer.py", line 310, in process_request</span></span> self.finish_request(request, client_address)</span></span> File "lib/python2.7/SocketServer.py", line 323, in finish_request</span></span> self.RequestHandlerClass(request, client_address, self)</span></span> File "lib/python2.7/SocketServer.py", line 638, in __init__</span></span> self.handle()</span></span> File "python2.7/BaseHTTPServer.py", line 340, in handle</span></span> self.handle_one_request()</span></span> File "lib/python2.7/BaseHTTPServer.py", line 328, in handle_one_request</span></span> method()</span></span> File "lib/python2.7/SimpleHTTPServer.py", line 44, in do_GET</span></span> f = self.send_head()</span></span> File "lib/python2.7/SimpleHTTPServer.py", line 68, in send_head</span></span> if os.path.isdir(path):</span></span> File "lib/python2.7/genericpath.py", line 41, in isdir</span></span> st = os.stat(s)</span></span> TypeError: must be encoded string without NULL bytes, not str</span></span></code></pre>Waitress</h3> pathoc</span> -p 8080</span> localhost 'get:/:i16," "'</span></span></code></pre>ERROR:waitress:uncaptured python exception, closing channel</span></span> <waitress.channel.HTTPChannel connected 127.0.0.1:62330 at 0x1007ca310></span></span> (</span></span> <type 'exceptions.IndexError'>:list index out of range</span></span> [lib/python2.7/asyncore.py\|read\|83]</span></span> [lib/python2.7/asyncore.py\|handle_read_event\|444]</span></span> [lib/python2.7/site-packages/waitress/channel.py\|handle_read\|169]</span></span> [lib/python2.7/site-packages/waitress/channel.py\|received\|186]</span></span> [lib/python2.7/site-packages/waitress/parser.py\|received\|99]</span></span> [lib/python2.7/site-packages/waitress/parser.py\|parse_header\|158]</span></span> [lib/python2.7/site-packages/waitress/parser.py\|get_header_lines\|247]</span></span> )</span></span></code></pre> Edit: The first version of this post had examples that were due to the test WSGI 
application, not waitress. I've replaced them with the traceback above, which has been reformatted for clarity.</strong></p> Werkzeug</h3> pathoc</span> -p 8080</span> localhost 'get:/:h"Host"="n\r\0"'</span></span></code></pre>Traceback (most recent call last):</span></span> File "flask/app.py", line 1518, in __call__</span></span> return self.wsgi_app(environ, start_response)</span></span> File "flask/app.py", line 1507, in wsgi_app</span></span> return response(environ, start_response)</span></span> File "/usr/local/lib/python2.7/site-packages/werkzeug/wrappers.py", line 1082, in __call__</span></span> app_iter, status, headers = self.get_wsgi_response(environ)</span></span> File "werkzeug/wrappers.py", line 1070, in get_wsgi_response</span></span> headers = self.get_wsgi_headers(environ)</span></span> File "werkzeug/wrappers.py", line 986, in get_wsgi_headers</span></span> headers['Location'] = location</span></span> File "werkzeug/datastructures.py", line 1132, in __setitem__</span></span> self.set(key, value)</span></span> File "werkzeug/datastructures.py", line 1097, in set</span></span> self._validate_value(_value)</span></span> File "werkzeug/datastructures.py", line 1065, in _validate_value</span></span> raise ValueError('Detected newline in header value. This is '</span></span> ValueError: Detected newline in header value. This is a potential security problem</span></span></code></pre> Limits of data visualization with space filling curves 2012-09-20T00:00:00+00:00 I recently wrote a series</a> of posts</a> using the Hilbert curve</a> to visualize binaries, culminating in a gallery showing regions of high entropy in malware</a>.</p> </a> </div> The fact that the Hilbert curve has excellent locality preservation means that one dimensional features are preserved (as much as they can be) in the two-dimensional layout. 
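The locality property can be checked directly. What follows is a minimal sketch of the standard index-to-coordinate conversion for a Hilbert curve; the names `d2xy`, `order` and `max_step` are illustrative and are not taken from the code behind the visualizations in these posts.

```python
def d2xy(order, d):
    """Map offset d along a Hilbert curve to (x, y).

    The curve fills a square grid with 2**order cells per side, so d
    must lie in the range 0 .. 4**order - 1.
    """
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the sub-square before rotating it.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def max_step(order):
    """Largest Manhattan distance between consecutive offsets on the curve."""
    points = [d2xy(order, d) for d in range(4 ** order)]
    return max(
        abs(ax - bx) + abs(ay - by)
        for (ax, ay), (bx, by) in zip(points, points[1:])
    )


if __name__ == "__main__":
    print([d2xy(2, d) for d in range(16)])
    print(max_step(4))
```

A `max_step` of 1 means that bytes which are neighbours in the file are always neighbours in the image, which is what the locality claim amounts to. The converse does not hold: cells that touch in the image can be far apart in the file.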
This lets us visually pick out features of interest, and makes it possible, for instance, to quickly identify different malware packers just based on their layout characteristics.</p> An obvious next step is to ask if it's possible to extend this idea to let us visually compare binaries, creating a sort of visual diff. Unfortunately, we now bump our heads against the limitations of space-filling curve visualization. I made the animation below after a recent conversation along these lines, and I think it illustrates the main issues nicely. It shows a single contiguous stretch of data (the black area) being shifted progressively through a binary. At each timestep, the only thing that changes is the starting location of the data block:</p> </a> </div> Two things are immediately clear:</p> The block of data doesn't retain its shape at different offsets - identical stretches of data can look totally different depending on their locations.</li> There's no way to quickly see where</em> in the binary a piece of information lies. Unless you are very familiar with the particular curve and know its exact orientation, you can't say, for instance, when the data block lies a third of the way through the binary.</li> </ul> It's often worthwhile to trade off these things for locality preservation, but it definitely scotches certain use cases. I do wonder if it might be possible to tune the trade-off somewhat - sacrificing some locality preservation for better shape retention and offset estimation. I've toyed with some ideas along these lines (see the unrolled layouts in the binary visualization post</a>), but I still don't have a satisfying solution. If anyone out there knows of one, drop me a line.</p> Finding the UDID leak: a guessing game 2012-09-07T00:00:00+00:00 It's become quite a popular parlor game to guess who is responsible for the recent Antisec UDID leak. I've now seen no fewer than six separate apps named as the probable source (two of which came from Marco Arment</a>). 
Before we pick the next culprit, I think it's worth taking a step back to consider the list of things we don't</em> know:</p> We don't know that we're dealing with just one source. The Antisec dump may well be an amalgam of data from various sources.</li> We don't know that we're looking for just one app, or even a set of apps by one developer. The leak may well come from one of the myriad of 3rd party services which could be included in thousands of apps.</li> We don't know that Antisec is being truthful about the scale of the database, or the additional data they claim is associated with the UDID/APNS records.</li> We certainly don't know that the data was filched from an FBI laptop or that the NCFTA was in any way involved.</li> </ul> Given all of these unknowns, I think a simple process-of-elimination approach to tracking down the leak will probably be fruitless, or worse, result in the finger being pointed at even more innocent parties. The one entity that may already have the answer to this question is Apple. They have a list of a million affected UDIDs, and they presumably have records of all apps that have ever used the associated push tokens. Given a large and precise sample like this, it should be possible to find the origin(s) of the leak reasonably easily. Indeed, if Apple is on the ball they may already have done this.</p> Now for some frank speculation of my own. Let's assume for a moment that Antisec has been entirely truthful about the data, and that we're dealing with a single source. In that case, we're looking for:</p> ... an app or third-party service integrated into multiple apps</li> ... with 12 million or more users</li> ... that is APNS-enabled</li> ... which also gathers user data like real names and zip codes.</li> </ul> I'll throw my hat in the ring and say that my money is on a third-party service, not a single app. 
If my hunch is right, the list of possible culprits is actually rather short.</p> The UDID leak is a privacy catastrophe 2012-09-04T00:00:00+00:00 Something I've been worrying about for a long time has just happened: Antisec has leaked a database with more than a million UDIDs</a>. The UDID issue has been a bit of a white whale of mine - I've written many blog posts about it and spent more hours than I care to think about negotiating responsible disclosure with companies misusing UDIDs. Let's recap some of the posts I've written about this:</p> In May 2011</a>, just before its sale to Gree was announced, I showed that OpenFeint</a> was misusing UDIDs in a way that allowed you to link a UDID to a user's identity, geolocation and Facebook and Twitter accounts. Although I didn't discuss it openly at the time, you could also completely take over an OpenFeint account, and access chat, forums, friends lists, and more using just a UDID. This resulted in a class-action lawsuit against OpenFeint, which has since petered out.</li> Later that month</a>, I published a survey looking at how UDIDs are used in practice. The data is now slightly out of date, but shows just how widely UDIDs are used and misused.</li> In September 2011</a>, I published the most troubling news so far, which paradoxically also got the least coverage in the press. I looked at all</em> the gaming social networks on iOS - basically OpenFeint and its competitors - and found catastrophic mismanagement by nearly everyone. The vulnerabilities ranged from de-anonymization, to takeover of the user's gaming social network account, to the ability to completely take over the user's Facebook and Twitter accounts using just a UDID.</li> </ul> As serious as these problems are, I'm afraid they're just the tip of the iceberg. Negotiating disclosure and trying to convince companies to fix their problems has taken literally months of my time, so I've stopped publishing on this issue for the moment. 
It's disheartening to say it, but some of the companies mentioned in my posts still</em> have unfixed problems (they were all notified well in advance of any publication). I will also note ominously that I know of a number of similar vulnerabilities elsewhere in the IOS app ecosystem that I've just not had the time to pursue.</p> When speaking to people about this, I've often been asked "What's the worst that can happen?". My response was always that the worst case scenario would be if a large database of UDIDs leaked... and here we are.</p> Defiler 2012-08-26T00:00:00+00:00 I've been living out of a bag for the last 3 weeks, working hard on a series of intense but fun audits. After running in high gear for a while I find that I need a mental palate cleanser - something to help me refocus and stop me from getting snowblind. I then grab my camera, strap on my macro rig, and walk out the door to try to catch the local wildlife in the act. It's become a bit of a game - the aim is to catch creatures in their natural setting and leave them completely undisturbed when I go, with no posing, prodding or other disturbances. Getting a usable shot of a 5mm target sitting on a twig swaying in the wind is a fun challenge.</p> Today I find myself in Sydney, working in a part of the town that is shot through with unreasonably beautiful walking tracks. The place is also blessed with a huge diversity of invertebrate life that makes my adopted home town</a> seem barren by comparison. I walked along a nearby track until I found a quiet, leafy spot, geared up, and leopard-crawled through the underbrush. Not long after, I came face-to-face with this imposing little chap sitting on the tip of a fern frond.</p> </a> </div> This is a Lymantriid</a> caterpillar of some variety, probably one of the tussock moths native to Australia. "Lymantria" means "defiler" - some species of this family can cause huge damage to foliage, and are considered to be destructive pests. 
So much so, that when a single male Gypsy Moth</a> (Lymantria dispar) was discovered in Hamilton, New Zealand, they sprayed the entire city with a caterpillar-specific bacterial insecticide</a>.</p> No need for drastic measures with this particular fellow, though - he's native to this ecosystem, and the only pest is me and my camera. He was head down munching away when I found him, and paid absolutely no attention to me when I moved in close to get these shots. He's got reason to be cocksure, too - those tufts of hair on his back contain hollow, poison-filled spines that can cause a pretty unpleasant reaction when touched.</p> </a> </div> A few hours exploring and photographing is a very effective brain-cleaner, leaving me ready to deal with spiny, venomous defilers of the digital variety.</p> pathod 0.2: the daemon gets an evil twin 2012-08-22T00:00:00+00:00 I've just pushed pathod 0.2 out the door. This is a huge release, with many new features:</p> pathoc</a>, pathod's evil client-side twin.</li> libpathod.test</a>, a framework for using pathod in your unit tests.</li> Improved mini language</a>, including many new abilities and improvements.</li> A rewrite of the networking core.</li> </ul> The project also has a new website at pathod.net</a>. Yes, pathod is now self-hosting, so you can try out both pathod and pathoc specifications right on the website. There's also a new public pathod instance</a>, which I'm sure everyone will use entirely responsibly.</p> Introducing pathod: a pathological HTTP server 2012-05-01T00:00:00+00:00 I've just released pathod</a>, a pathological HTTP/S daemon useful for testing and torturing HTTP clients. At its core is a tiny, terse language for crafting HTTP responses. It also has a built-in web interface that lets you play with the response spec language, inspect logs, and access pathod's full help document.</p> The rest of this post is a quick teaser showing some of pathod's abilities. 
See the detailed documentation on the pathod site</a> if you want more.</p> The simplest possible response</h2> The easiest way to craft a response is to specify it directly in the request URL. Let's start with the simplest possible example. Start pathod, and then visit this URL:</p> http://localhost:9999/p/200</span></span></code></pre> The "/p/" path is the location of the response generator in pathod's default configuration - everything after that is a response specification in pathod's mini-language. The general form of a response spec is as follows:</p> code[MESSAGE]:[colon-separated list of features]</span></span></code></pre> In this case, we're specifying only the HTTP response code - that is, an HTTP 200 OK with no headers and no content, resulting in a response like this:</p> HTTP/1.1 200 OK</span></span></code></pre>Specifying features</h2> One example of a "feature" is a response header. Let's embellish our response by adding one:</p> 200:h"Etag"="foo"</span></span></code></pre> The first letter of the feature - "h", in this case - is a mnemonic indicating the type of feature we're adding. The full response to this spec looks like this:</p> HTTP/1.1 200 OK</span></span> Etag: foo</span></span></code></pre> Both "Etag" and "foo" are Value Specifiers, a syntax used throughout the response specification language. In this case they are literal values, as indicated by the fact that they are quoted strings. The Value Specification syntax also lets us load values from files or generate random data. For instance, here is a specification that generates 100k of random binary data for the header value:</p> 200:h"Etag"=@100k</span></span></code></pre> Now, binary data in the header value will probably break things in interesting ways, but is unlikely to be read by the client as a valid (but over-long) value. To see if the client really drops off its perch if we feed it a single 100k header, we have to constrain the random data. 
Here's the same response, but with data generated only from ASCII letters:</p> 200:h"Etag"=@100k,ascii_letters</span></span></code></pre> pathod has a large number of built-in character classes from which random data can be generated.</p> Pauses and Disconnects</h2> Next, we can disrupt the communications in various ways. At the moment, this means adding pauses and disconnects to a response. Let's start with an HTTP 404 response with a body consisting of 100k of random binary data:</p> 404:b@100k</span></span></code></pre> Here's the same response, but with a 120 second pause after sending 100 bytes:</p> 404:b@100k:p120,100</span></span></code></pre> And, the same response again, but with a hard disconnect after sending 100 bytes:</p> 404:b@100k:d100</span></span></code></pre> Instead of specifying a time explicitly, we can ask pathod to just randomly disconnect at a time of its choosing:</p> 404:b@100k:dr</span></span></code></pre> That's it for the teaser - hopefully it's enough to entice you into looking at pathod</a>'s full documentation.</p> What's next?</h2> pathod is an "airport project" - the first draft was written in its entirety during a 40-hour trip back home from New York (I drew a bad lot in stopovers). I've now firmed it up a bit, but there's still work to be done. In the next month, mitmproxy's test suite will move to pathod, after which there will be a simple, well-documented way to unit test. I also plan to build out the JSON API (which is used to drive pathod in test suites), and expand the mini-language with convenient ways to generate pathological cookies, authentication headers, SSL errors, and cache control.</p> mitmproxy 0.8 2012-04-09T00:00:00+00:00 </a> </div> I'm happy to announce the release of mitmproxy 0.8</a>. This release has a few major new features, big speedups, and many, many small bugfixes and improvements. 
Here are the headlines:

Android interception

The most prominent new feature is that we now have a supported way to intercept Android traffic. What's more, we can do this without a cumbersome transparent proxying rig - see the Android section in the documentation for the details. Special thanks go to Jim Cheetham for lending me an Android device and helping to get this feature off the ground.

Replacement patterns

Another exceedingly useful new feature is replacement patterns. These consist of a filter, a regular expression and a replacement string, and run continuously while mitmproxy processes requests and responses. You can pass these either on the command-line, or using a built-in replacement pattern editor.
I'm sure you can immediately think of many uses for this flexible feature, but my favourite is to use it during testing as a way to conveniently inject complicated exploits into web traffic. I do this by setting a replacement pattern that swaps a short but likely unique string (say MYXSS) for a long exploit, and then I use simple interaction and front-end tools like Firebug to inject exploits into requests manually based on the short string marker.

Improved pretty-printing of request and response contents

This release of mitmproxy has a completely redesigned subsystem for pretty-printing request and response bodies. For instance, we now extract EXIF tags and other basic information to give you something better than a hex dump when looking at an image.
We also have much improved HTML indenting (using lxml), and a built-in JavaScript beautifier (thanks to JSBeautifier) that teases out compressed and obfuscated scripts into something readable.

Changelog

Detailed tutorial for Android interception. Some features that land in this release have finally made reliable Android interception possible.
Upstream-cert mode, which uses information from the upstream server to generate interception certificates.
Replacement patterns that let you easily do global replacements in flows matching filter patterns. Can be specified on the command-line, or edited interactively.
Much more sophisticated and usable pretty printing of request bodies. Support for auto-indentation of JavaScript, inspection of image EXIF data, and more.
Details view for flows, showing connection and SSL cert information (X keyboard shortcut).
Server certificates are now stored and serialized in saved traffic for later analysis. This means that the 0.8 serialization format is NOT compatible with 0.7.
Add a shortcut key ("f") to load the remainder of a request or response body, if it is abbreviated.
Many other improvements, including bugfixes, an expanded scripting API, and more sophisticated certificate handling.

mitmproxy 0.7

2012-02-27T00:00:00+00:00

I'm happy to announce the release of mitmproxy 0.7. The biggest visible change is a new structured editor for headers, query strings and form fields. Other new features include a reverse proxy mode, an extended script API that makes many common tasks much easier, and a myriad of improvements to the interface (including a massive increase in speed). Everybody still on 0.6 should upgrade - get it here:
mitmproxy-0.7.tar.gz (docs)
You can also now install mitmproxy using pip, like so:
`pip install mitmproxy`
In other news, the project has had an amazing month, after a rash of high-profile results obtained using mitmproxy were published. It started with Arun Thampi's discovery that Path uploads users' address books to their servers. Things snowballed from there, and for a few days mitmproxy seemed to be everywhere.
Similar findings were made for Hipster, The Verge did a mitmproxy-driven AddressbookGate exposé (including vaguely threatening background shots of mitmproxy doing its dastardly work), and lots of people said nice things on Twitter.
To see the impact of all this on the mitmproxy project, you need only look at the GitHub page - watchers of the repo went from about 200 a month ago to 950 at the time of this post.

Changelog

New built-in key/value editor. This lets you interactively edit URL query strings, headers and URL-encoded form data.
Extend script API to allow duplication and replay of flows.
API for easy manipulation of URL-encoded forms and query strings.
Add "D" shortcut in mitmproxy to duplicate a flow.
Reverse proxy mode. In this mode mitmproxy acts as an HTTP server, forwarding all traffic to a specified upstream server.
UI improvements - use Unicode characters to make GUI more compact, improve spacing and layout throughout.
Add support for filtering by HTTP method.
Add the ability to specify an HTTP body size limit.
Move to typed netstrings for serialization format - this makes 0.7 backwards-incompatible with serialized data from 0.6!
Significant improvements in speed and responsiveness of UI.
Many minor bugfixes and improvements.

OpenBSD in decline?

2012-02-26T00:00:00+00:00

My leisurely Sunday activity today is to set up a new OpenBSD firewall for my mobile app testing lab. I haven't done a from-scratch OpenBSD install for years, so I spent some time reading through the change logs for the last few versions to catch up with what's changed. Although the project is clearly still making steady, well-engineered progress, I had the nagging feeling that the rate of change wasn't what it used to be. So, I pulled some numbers from the CVS commit message list archives, and graphed them. Here is the number of commits per month from January 2001 to January 2012. The orange line is a simple 12-month moving average:
[Graph: OpenBSD commits per month, January 2001 to January 2012, with a 12-month moving average]
Now, we should be cautious about interpreting this - the number of commits doesn't tell us anything about the quality, importance or magnitude of code change. Even if it did all of these things, there are other and perhaps better measures of a project's health. Still, the trend is clear, and suggests a sustained decline in activity.
I just bought some T-shirts to help support one of my favourite open source projects. You should too.

Malware

2012-01-05T00:00:00+00:00

Edit: Since this post, I've created an interactive tool for binary visualisation - see it at binvis.io
Hover and click for more.

corte.si

Spacecurve

2026-01-28T00:00:00+00:00

In 2024, I noticed that I'd let my blog languish. Since the issue was urgent, I made a firm new year's resolution to address the situation in 2025. Which is why, today, in January 2026, I'm writing this post.

I've just released spacecurve, a new just-for-fun space-filling curve project. It is the latest symptom of a long preoccupation with these beautiful mathematical objects. Over the years, this preoccupation yielded blog posts like malware visualisations and a portrait of the Hilbert curve, and tools like binvis.io. I have a long list of related ideas I never got to but have wanted to explore, and the first step is naturally to... rewrite it in Rust. This is just a starting point, a base for exploring ideas I have about visualisation, color spaces, and the qualities of the curves themselves.

As part of the rewrite we now have fast base implementations of the curves themselves in the spacecurve library, and a visual exploration tool for 2D and 3D curves in the scurve command-line tool. Thanks to egui, the visualiser runs both natively and in the browser.
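For readers who haven't met a space-filling curve in code, here is a minimal sketch of the classic index-to-coordinate mapping for a 2D Hilbert curve. It is written in Python purely for illustration and is not taken from the spacecurve library; the function name `hilbert_d2xy` and its parameters are assumptions made for this example.

```python
from typing import Tuple


def hilbert_d2xy(order: int, d: int) -> Tuple[int, int]:
    """Map a distance d along a Hilbert curve to (x, y).

    The curve fills a square grid of side 2**order, so d must lie in
    the range 0 .. 4**order - 1. (Illustrative helper, not a library API.)
    """
    n = 1 << order
    if not 0 <= d < n * n:
        raise ValueError("d is outside the curve")
    x = y = 0
    t = d
    s = 1
    while s < n:
        # Two bits of the index select the quadrant at this scale.
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the sub-square so adjacent segments join up.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# Walk a first-order curve: four cells visited in a "U" shape.
path = [hilbert_d2xy(1, d) for d in range(4)]
```

The property that makes these curves useful for visualisation is locality: successive indices always land on neighbouring cells, so data that is close together in a file stays close together in the picture.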

Click through on the images below to see the web version.


Installation

spacecurve is a Rust library for generating a variety of space-filling curves, including Hilbert, Peano, Sierpinski, Moore, and Z-order curves.
`cargo add spacecurve`
scurve is a command-line tool for generating and visualizing space-filling curves.
`cargo install scurve`
It includes an egui interface for exploring the curves in 2D and 3D, which you can run like this:
`scurve gui`

spacecurve web

Because egui supports WebAssembly, I've also deployed the egui app to the web. Access it by clicking below, or on any of the images above.
Web Viewer

Generative zoology with neural networks

2020-06-30T00:00:00+00:00

A couple of years ago a paper titled Progressive Growing of GANs for Improved Quality, Stability, and Variation cropped up on my reading list. It describes growing generative adversarial networks progressively, starting with low-resolution images, and then building up more detail as training goes on. It got quite a bit of press at the time because the authors used their idea to generate realistic, unique images of human faces.

[Figure: Representative images from the Progressive GANs repo]
Looking at these images, it seems like the neural net would have to learn a vast number of things to be able to do what these networks were doing. Some of this seems relatively simple and factual - say, that eye colours should match. But other aspects are fantastically complex and hard to articulate. For instance, what nuances are needed to link the configuration of eyes, mouth and skin creases into a coherent facial expression? Of course, I'm anthropomorphising a statistical machine here, and we may be fooled by our intuition - it could turn out that there are relatively few working variations, and that the solution space is more constrained than we imagine. Maybe the most interesting thing is not the images themselves, but rather the uncanny effect they have on us.
Some time later, a favourite podcast of mine mentioned PhyloPic, a database of silhouette images of animals, plants and other lifeforms. Musing along the lines above, I wondered what would result if you trained a system like the one in the Progressive GANs paper on a very diverse dataset of this sort. Would you just generate many variations of a few known animal types, or would there be enough variation to do neural-network driven speculative zoology? However things played out, I was pretty sure I would get a few good prints for my study wall out of it, so I set out to satisfy my curiosity with an attitude of open-minded experimentation.
[Video: Training from random noise to competence]
I adapted the code from the progressive GANs paper, and trained a model over the complete PhyloPic dataset for 12,000 iterations, using a Google Cloud instance with 8 NVIDIA K80 GPUs. Total training time, including some false starts and experiments, was 4 days. I used the final trained model to produce 50k individual images, and then spent hours poring over the results, categorising, filtering and collating images. I also did some light editing by flipping images to orient creatures in the same direction, because I found this a bit more visually satisfying. This hands-on approach means that what you see below is a sort of collaboration between me and the neural net - it did the creative work, and I edited.
[Figure: Flying insects]
The first surprising thing to me was how aesthetically pleasing the results were. Much of this is certainly a reflection of the good taste of the artists who produced the original data. However, there were also some happy accidents. For instance, it seems that whenever the neural net enters uncertain territory - whether it be fiddly bits that it hasn't quite mastered yet or complete flights of vaguely biological fantasy - chromatic aberrations begin to enter the picture. This is curious, because the input set is entirely in black and white, so colour cannot be a learned solution to some generative problem. Any colour must necessarily be a pure artefact of the mind of the machine. Delightfully, one of the things that consistently triggers chromatic aberrations is the wings of flying insects. This means that it generated hundreds and hundreds of variations of evocatively-coloured "butterflies" like the ones above. I wonder if this could be a useful observation - if you train using only black-and-white images, but demand output in full colour, splotches of colour might be a useful way to see where the model is still not able to accurately represent the training set.
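One way to make that last idea concrete is to measure how far each output pixel strays from grey. The sketch below is only an illustration of the idea, not something used to produce the images in this post: the function names, the plain-list image representation and the threshold of 16 are all assumptions.

```python
def chroma_map(image):
    """Per-pixel colourfulness of an RGB image.

    `image` is a list of rows, each row a list of (r, g, b) tuples with
    channel values in 0..255. A pure grey pixel scores 0; the score grows
    as the channels diverge. (Hypothetical helper for illustration.)
    """
    return [[max(px) - min(px) for px in row] for row in image]


def coloured_fraction(image, threshold=16):
    """Fraction of pixels whose colourfulness exceeds `threshold`.

    The threshold is an arbitrary assumption; it would need tuning
    against real generator output.
    """
    scores = [c for row in chroma_map(image) for c in row]
    if not scores:
        return 0.0
    return sum(1 for c in scores if c > threshold) / len(scores)


# A tiny 2x2 example: three near-grey pixels and one strongly red one.
sample = [
    [(0, 0, 0), (255, 255, 255)],
    [(200, 40, 40), (10, 12, 11)],
]
scores = chroma_map(sample)
fraction = coloured_fraction(sample)
```

Under these assumptions, an image with a high coloured fraction would be a candidate for a region where the model has not yet settled on a clean black-and-white answer.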
The bulk of the output is a huge variety of entirely recognisable silhouettes - birds, various quadrupeds, reams of little gracile theropod dinosaurs, sauropods, fish, bugs, arachnids and humanoids.
[Figure: Birds]
[Figure: Quadrupeds]
[Figure: Dinosaurs]
[Figure: Fish]
[Figure: Bugs]
[Figure: Hominids]

Stranger things

Once the known critters have been weeded out, we get to stranger things. One of the questions I had going into this was whether plausible animal body plans that don't exist in nature would emerge - perhaps hybrids of the creatures in the input set. Well, with careful search and a helpful touch of pareidolia, I found hundreds of quadrupedal birds, snake-headed deer and other fantastical monstrosities.
[Figure: Monstrosities]
Straying even further into the unknown, the model produced weird abstract patterns and unidentifiable entities, all with a vaguely biological, "life-ish" feel to them.
[Figure: Abstract]
[Figure: Unidentifiable]

A random sample

What doesn't come through in the images above is the sheer abundance of variation in the results. I'm having a number of these image sets printed and framed, and the effect of hundreds of small, detailed images side by side at scale is quite striking. To give some idea of the scope of the full dataset, I'm including one of these prints below - this one is a random sample from the unfiltered corpus of images.
[Figure: a random sample from the unfiltered corpus of images]

Some personal thoughts on our national tragedy

2019-03-19T00:00:00+00:00

[Photo: Outside the Al Huda Mosque near my home (by Mark McGuire)]
A year ago, my wife and I decided to become citizens of New Zealand. Both of our sons were born here and are full, native Kiwis. It felt odd for our family not to have this in common, and besides, our own connection with New Zealand had grown strong over the happy decade we'd lived here. It was time to take the plunge. Forms were filled in, interviews were held, and we were notified that our citizenship ceremony would be on the 8th of February, 2018.
On the day, we were ushered into a hall with a podium and rows of slightly uncomfortable stackable chairs. By the time we arrived it was already full of our fellow soon-to-be Kiwis, along with their friends and family. Boisterous children resisted the shushing of their parents, and there was a bit of raucous running up and down the aisles. Nobody minded. The mood was friendly, expectant, and happy. We took our seats next to a young Chinese couple, and behind a family from the UK. Many were wearing splendid traditional dress from their countries of origin - Tongan, Chinese, Thai, Indian. I myself wore a business suit, something I only do under duress. The stiff posture and occasional collar-stretching finger of the man in front of me showed I wasn't alone. We were all there with common purpose - because we felt the need for a deeper commitment to our home, and perhaps a deeper sense of acceptance in turn.
A dapper, splendid-mustached gentleman took his place at the podium, and the hall became silent. He began the kind of speech you would expect: a speech of welcome, about the rights and duties of citizenship, about the solemnity of the moment. It was at this point, in that stuffy hall, in the middle of a somewhat monotonous civil ceremony, that I was suddenly aware of a profound connection with the people around me. I felt, with complete clarity, a golden thread linking me to my wife, to the couple next to us, to the gent running the ceremony, extending outwards to everyone in the room. I felt the presence of generations of parents, stretching back in time, working to better the lives of their families, all their individual journeys leading us here, to this hall at this time. Most of all, I felt the presence of our children - all our children, the children in the room and my children, and their children, and their children's children, all joined, facing the unknowable future. This built to a sort of vision: a great, thronging, thrusting, golden river of humanity, meandering over a dark background. All of us together, everyone that has ever lived and everyone that ever will, shining ties binding us together each to each, all pushing ever forward in humanity's common project. For a moment between breaths, I was in touch with something transcendent, cosmically larger than me, yet something of which my own small fleck of personhood was a necessary part.
Afterwards, people congregated in happy, smiling groups, shaking hands and hugging, having their first conversations as full citizens. I slipped out the door at the back of the hall. My wife, who knows me best, followed, holding my hand and laughing with kind-hearted amusement at how moist-eyed and emotional I was.
That moment in the hall came back to me when I first read about the atrocity in Christchurch. I saw again the open, friendly, hopeful faces of my freshly-minted fellow citizens. I felt again the web of love that connects us all in fundamental unity. And I was suffused with an aching and overwhelming grief. Grief for the victims and their families, my countrymen and countrywomen. But grief also that anyone could have a conception of humanity so small, so narrow, and so mean as to lead to an act like this.
In the coming weeks I'll be doing my part in the business of reckoning with our national tragedy, using the tools I have - code, data, and technology. We can do much with these, but we can't go all the way. The real work will be to look again at the human aspect of our online communities, which, it has become terrifyingly clear, have become an obstacle to recognising our common purpose.

mitmproxy v1.0.0: Christmas Edition

2016-12-26T00:00:00+00:00

Six years after mitmproxy's first checkin, we've finally released version 1.0.0 of the project. Our version numbering persisted below 1.0 well into the project's maturity, for reasons that are a tad difficult to explain. My mental model of software development is of an eternal pilgrimage - the roadmap of possible improvements stretches on forever, and we never quite reach a point where we look back and feel that we've arrived. From this perspective, it makes sense for 1.0 to always be out of reach. Rather than adopting more transcendental options, I've stuck with simply incrementing the minor version with each release. This release sees two changes in our process. First, we're committing to a much more regular cadence, aiming for a new release every two months or so (with minor bugfix and patch releases in between). Second, each of these releases will see a major version number increment - this is v1.0, we'll release v2.0 by the end of February, and so forth. This retains something of the flavor of our previous eccentric version numbering strategy by de-emphasizing major version increments as flagfall events, without being as restrictive. Let the pilgrimage continue.
The project's momentum continues to be excellent - since the last release, we've had 459 commits by 10 contributors, resulting in 104 closed issues and 172 closed PRs, all in just over 70 days. All this activity has resulted in a number of very significant developments.
Over the last year, we've done a huge amount of work converting the project from Python 2 to Python 3. Our previous release straddled the two versions, retaining compatibility with Python 2.7. This release is strictly Python3-only. We are now well positioned to take full advantage of things like optional type checking, the new asyncio module and the many small and large interface improvements that Python 3 brings.
Our user interfaces continue to improve by leaps and bounds. The console interface now has a much cleaner core, sports a number of new features like flow ordering, and has seen significant speed improvements. We're also finally releasing something we've been cooking up for quite a while - mitmweb, a web interface to mitmproxy. It doesn't have feature parity with the console tool yet, but we feel it's ready to step onto the stage as one of our primary interfaces. Since mitmproxy console doesn't run on Windows (yet), mitmweb is the best GUI option for our Windows users for now. We're also improving our distribution mechanisms on Windows, with a new installer package kindly provided by BitRock. These two developments together mean much better support for our Windows users.
At a protocol level, we're happy to announce that our support for Websockets is now mature, and enabled by default. For the moment, the best way to interact with Websockets traffic is to use our scripting mechanism - we will have support in the GUIs very soon. On the HTTP/2 front, the news is mixed. We're very happy with the quality of our own implementation of the protocol, but we've discovered that some server implementations still have problems with certain protocol edge cases. Over the last few months we found multiple bugs affecting some very prominent websites and CDNs. We are working closely with the affected companies to get these issues fixed - but big wheels turn slowly, especially when it comes to business-critical infrastructure, and all the needed repairs haven't been rolled out yet. This has left us in a bit of a quandary - we know that fixes for these issues are imminent, and we believe that the particular problems are idiosyncratic and shouldn't prompt a redevelopment of our core to make us bug-for-bug compatible. Nonetheless, the effect is that mitmproxy's HTTP/2 implementation will currently do unexpected things when talking to large sites like Twitter and Reddit. We've decided to disable HTTP/2 by default for this release - you can explicitly re-enable it using the --http2 flag.
Finally, if you're interested in hacking on mitmproxy, now is an excellent time to join us. Contributing is simple - pick one of the issues that we've tagged as good first contributions, join us on Slack to discuss your approach, and then send a PR.

Changelog

All mitmproxy tools are now Python 3 only! We plan to support Python 3.5 and higher.
Web-Based User Interface: Mitmproxy now officially has a web-based user interface called mitmweb. We consider it stable for all features currently exposed in the UI, but it still misses a lot of mitmproxy's options.
Windows Compatibility: With mitmweb, mitmproxy is now usable on Windows. We are also introducing an installer (kindly sponsored by BitRock) that simplifies setup.
Configuration: The config file format is now a single YAML file. In most cases, converting to the new format should be trivial - please see the docs for more information.
Console: Significant UI improvements - including sorting of flows by size, type and URL, status bar improvements, much faster indentation for HTTP views, and more.
HTTP/2: Significant improvements, but temporarily disabled by default due to widespread protocol implementation errors on some large websites.
WebSocket: The protocol implementation is now mature, and is enabled by default. Complete UI support is coming in the next release. Hooks for message interception and manipulation are available.
A myriad of other small improvements throughout the project.

mitmproxy v0.18

2016-10-17T00:00:00+00:00

We've just released mitmproxy v0.18! Since the last release, the project has had 1399 commits by 40 contributors, resulting in 217 closed issues and 305 closed PRs, all of this in just over 189 days.
This release is notable for a number of reasons.
First, it contains significant contributions from our three excellent GSoC students this year. Shadab Zafar worked on Python 3 compatibility and a number of aspects of mitmproxy's core. Clemens Brunner and Jason Hao made major improvements to mitmweb, the upcoming web-based interface to mitmproxy. We loved working with these guys, and hope that they will continue to hack on mitmproxy.
Second, the project has seen some significant internal reorganisation. Previously, we were split over three separate repositories (mitmproxy, netlib and pathod). Over time, the practical headaches of keeping everything synchronised started taking a toll, and we decided to amalgamate it all in a single repo. The most immediate external effect is that installing mitmproxy (through, say, "pip install mitmproxy") now gets you all of the associated tools and libraries, including pathod and pathoc.
Finally, 0.18 will be the last major version of mitmproxy compatible with Python 2. The next release will target Python 3.5 only, with all of the 2/3 compatibility cruft stripped out. This is not a decision we took lightly - we have a significant community of developers that have tools based on mitmproxy, and we realise this might be painful for some of them. We feel that being able to use the full features of Python 3.5 will make the transition worth it. If you have a library or tool based on mitmproxy, you should start planning for a conversion now. We'd be very happy to help you navigate the transition, so feel free to drop by the Slack channel to chat to the dev team.

Changelog

Python 3 Compatibility for mitmproxy and pathod (Shadab Zafar, GSoC 2016)
Major improvements to mitmweb (Clemens Brunner & Jason Hao, GSoC 2016)
Internal Core Refactor: Separation of most features into isolated Addons
Initial Support for WebSockets
Improved HTTP/2 Support
Reverse Proxy Mode now automatically adjusts host headers and TLS Server Name Indication
Improved HAR export
Improved export functionality for curl, Python code, raw HTTP etc.
Flow URLs are now truncated in the console for better visibility
New filters for TCP, HTTP and marked flows.
Mitmproxy now handles comma-separated Cookie headers
Merge mitmproxy and pathod documentation
Mitmdump now sanitizes its console output to not include control characters
Improved message body handling for HTTP messages:
    .raw_content provides the message body as seen on the wire
    .content provides the decompressed body (e.g. un-gzipped)
    .text provides the decompressed and decoded body
New HTTP Message getters/setters for cookies and form contents.
Add ability to view only marked flows in mitmproxy
Improved Script Reloader (Always use polling, watch for whole directory)
Use tox for testing
Unicode support for tnetstrings
Add dumpfile converters for mitmproxy versions 0.11 and 0.12
Numerous bugfixes
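
The three body attributes in the changelog are easiest to tell apart with a concrete message. The sketch below is a self-contained stand-in built on Python's standard library; `SketchMessage`, its constructor arguments and its gzip-only decoding are assumptions made for illustration, not mitmproxy's actual classes or behaviour.

```python
import gzip


class SketchMessage:
    """Hypothetical stand-in for an HTTP message (not mitmproxy's class).

    It mimics the three views of a body named in the changelog:
    raw_content, content and text.
    """

    def __init__(self, raw_content, content_encoding="identity", charset="utf-8"):
        self.raw_content = raw_content            # bytes exactly as seen on the wire
        self.content_encoding = content_encoding  # assumed: value of Content-Encoding
        self.charset = charset                    # assumed: charset from Content-Type

    @property
    def content(self):
        """Body bytes with any gzip transfer compression removed."""
        if self.content_encoding == "gzip":
            return gzip.decompress(self.raw_content)
        return self.raw_content

    @property
    def text(self):
        """Decompressed body decoded to a str using the declared charset."""
        return self.content.decode(self.charset)


# Build a gzip-compressed body and look at it three ways.
wire_bytes = gzip.compress("caf\u00e9".encode("utf-8"))
message = SketchMessage(wire_bytes, content_encoding="gzip")
```

In this sketch `message.raw_content` is the compressed byte string, `message.content` is the UTF-8 bytes of the original body, and `message.text` is the decoded string.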

Contributors for this release

Aldo Cortesi
Angelo Agatino Nicolosi
BSalita
Brett Randall
Christian Frichot
Clemens Brunner
Cory Benfield
Doug Freed
Drake Caraker
Felix Yan
Israel Blancas
Jason
Jason Pepas
Jonathan Jones
Kostya Esmukov
Linmiao Xu
Manish Kumar
Maximilian Hils
Ryan Laughlin
Sachin Kelkar
Sanchit Sokhey
Schamper
Shadab Zafar
Steven Noble
Steven Van Acker
Tai Dickerson
Thomas Kriechbaumer
Tyler St. Onge
Vincent Haupert
Wes Turner
Yoginski
Zohar Lorberbaum
arjun
chhsiao
jpkrause
phackt
redfast
smill
strohu
vulnminer

Hobbes

2016-03-22T00:00:00+00:00

Eight years ago my wife and I walked into the Cat Protection Society near our house in Sydney on a whim - just to look, we assured each other, and most definitely not to get another cat. Thirty minutes later we emerged with a box containing a tiny ball of scraggly orange fluff, a wee kitten we immediately named Hobbes. Circumstances had taken Hobbes away from his mother far too early, and since I was able to work from home at the time the job of playing surrogate largely fell to me. I fed him, let him perch on my shoulder like a fluffy little malodorous parrot while I worked, and cleaned him with a cotton bud after his inept attempts to use the litter tray. He grew from a tiny scrap to a mischievous and energetic kitten, and then to a somewhat slothful but very handsome boy. Perhaps because he came to us so young, Hobbes never got on with other cats. He preferred the company of humans, and considered himself to be as much of a person as anyone else. The photo above is him in his natural habitat: draped bonelessly over my lap like a purring orange throw-rug, just being part of whatever conversation his humans are having.
About a year ago, Hobbes started losing weight. Truth be told shedding a few pounds would probably have done him good, but this was unexplained by any change in his diet. After a series of X-rays and a biopsy we got bad news: he had lymphoma. With chemotherapy he would have a year or so of high-quality life left, but likely not much more. Apart from giving him his daily pills, there was not much we could do. We treated him to his favorite food as often as seemed sensible, and watched carefully for the moment when the scales tipped and discomfort outweighed the joy in his life.
This morning Zoe and I took Hobbes to the vet one last time. He always hated being in the cat carrier, and would pace, tense and wide-eyed, ready to spring out like a jack-in-the-box when we opened the door. Today, he just seemed tired and sore, huddled motionlessly in an uncomfortable-looking crouch. We held him together as the vet gave him two injections - one to send him gently to sleep, and shortly after, another to stop his heart. Afterwards we brought him home and buried him under a cherry tree in our garden. Perhaps when spring comes, it will flower orange.
Goodbye, Hobbesy. Your family will miss you. You were a good, good boy.

modd: a flexible tool for responding to filesystem change

2016-02-11T00:00:00+00:00

I've just released modd, a new project of mine. Like its sister project devd, it's distributed as a single, self-contained binary for all major platforms - get it while it's fresh.
Modd is a simple tool that's hard to explain pithily. It triggers commands and manages daemons in response to filesystem changes - but that is a technically-correct mouthful that doesn't really convey how it is used. Part of the problem is that it is extremely flexible. In my projects it runs linters, does live code compiles, manages infrastructure daemons like databases, runs test instances of projects and is even rendering and live-reloading this blog post as I type. Modd replaces parts of tools like Gulp, Grunt, Foreman and make, but it can also augment them. For instance, one of my projects is entirely driven by a Makefile, with tasks invoked by modd on change.
At modd's core is a file change detection library that tries to get things right for most developer work patterns. It handles temporary files, VCS directories and many pathological behaviors shown by common editors correctly (or at least tries really hard to). The change detection algorithm waits for a lull in activity, so that jobs aren't triggered in the middle of progressive processes like renders and compiles that may touch many files. The result is change detection that is less surprising and more consistent than similar projects out there. The output of the change detection algorithm is then hooked up to a very flexible way to specify commands and manage daemons, letting you specify shell scripts that trigger on file match patterns in a single config file. Finally, there are a few mod-cons. A custom terminal logging module lets modd sensibly interleave the output of possibly concurrent daemons and commands, with headings showing which command was responsible for what. Modd also has support for desktop notifications (Growl on OSX, libnotify on Linux), letting you see things like linter output and compile errors immediately.
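The "wait for a lull" idea is easy to show in isolation. The following sketch groups a stream of timestamped change events into batches, closing a batch only once nothing has happened for a quiet period. It is written in Python for illustration and is not modd's implementation; the function name `batch_on_lull`, the event format and the 0.1 second lull are all assumptions.

```python
def batch_on_lull(events, lull=0.1):
    """Group (timestamp, path) events into batches separated by quiet gaps.

    A batch is closed when the gap since the previous event is at least
    `lull` seconds. Each batch is returned as a sorted list of the unique
    paths touched. (Illustrative helper; the lull value is an assumption.)
    """
    batches = []
    current = []
    last_time = None
    for timestamp, path in sorted(events):
        # A long enough silence ends the current burst of activity.
        if current and timestamp - last_time >= lull:
            batches.append(sorted(set(current)))
            current = []
        current.append(path)
        last_time = timestamp
    if current:
        batches.append(sorted(set(current)))
    return batches


# A compile touches two files in quick succession; a later edit stands alone.
events = [
    (0.00, "a.go"),
    (0.02, "b.go"),
    (0.05, "a.go"),
    (1.00, "c.go"),
]
batches = batch_on_lull(events, lull=0.1)
```

With these inputs the first three events collapse into one batch, so a job triggered per batch would run once for the burst rather than three times.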
Below, I'm going to show one quick example of how I use modd to do a live build/compile cycle for devd</a>, a pretty standard Go project. In a future post, I'll show how I've replaced Gulp entirely for a Javascript-heavy front-end project.
Please see the modd documentation</a> for a complete explanation of the syntax and for more examples.
Test-compile cycle for Go</h2>
On startup, modd looks for a file called modd.conf in the current directory. This file has a simple but powerful syntax - one or more blocks of commands, each of which can be triggered on changes to files matching a set of file patterns. Commands have two flavors: prep commands that run and terminate (e.g. compiling, running test suites or running linters), and daemon commands that run and keep running (e.g. databases or webservers). Daemons are restarted when their block is triggered, after all prep commands have run successfully. Commands are embedded shell scripts, so shell features like redirection work, and compound, multi-step commands are common.
Here is the simple modd.conf I use to drive the test cycle for devd</a>:
**/*.go {
    prep: go test @dirmods
}

**/*.go !**/*_test.go {
    prep: go install ./cmd/devd
    daemon +sigterm: devd -ml ./tmp
}</code></pre>
When modd is started, the commands execute for the first time, and modd is then ready to respond to changes. The initial output looks like this:
</a> </div>
The config file does three things:

When any .go file changes, it runs "go test" on the affected module.</li>
When a non-test file changes, it compiles and installs devd.</li>
It keeps a test instance of the devd daemon running, and restarts it with a SIGTERM when needed.</li> </ul>
The one subtlety here is the @dirmods tag, which is replaced with a shell-escaped list of all directories that contain modified files. There's a similar tag - @mods - that is replaced with all matching modified files. When first run, both of these tags are replaced by all possible matches - that is, all directories containing matching files, and all matching files respectively. This means that the test suite for all the Go modules in the project is run on startup, and only for modified modules after that.
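As a rough illustration of what the two tags expand to, here is a small Python sketch. It is an approximation for explanation only, not modd's code: the function names `expand_mods` and `expand_dirmods` are hypothetical, and the `./` prefix on directories is an assumption made so that a tool like "go test" would treat them as relative paths.

```python
import posixpath
import shlex
from typing import List


def expand_mods(paths: List[str]) -> str:
    """Sketch of @mods: a shell-escaped list of the modified files."""
    return " ".join(shlex.quote(p) for p in sorted(set(paths)))


def expand_dirmods(paths: List[str]) -> str:
    """Sketch of @dirmods: a shell-escaped list of directories containing modified files."""
    dirs = set()
    for p in paths:
        d = posixpath.dirname(p)
        # Assumption for this sketch: directories are written relative to "."
        dirs.add("./" + d if d else ".")
    return " ".join(shlex.quote(d) for d in sorted(dirs))


modified = ["cmd/devd/main.go", "cmd/devd/flags.go", "my dir/x.go", "main.go"]
print("go test " + expand_dirmods(modified))
print("gofmt -l " + expand_mods(modified))
```

Two files in the same directory collapse to a single directory entry, and the path containing a space is quoted so that it survives being pasted into a shell command.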
^1In fact, this is release v0.2</a>, which slipped in before I had time to announce v0.1 on my blog. </div>

mitmproxy v0.15

2015-12-04T00:00:00+00:00

</a> </div>
We've just released mitmproxy 0.15</a>. This is primarily a bugfix release, but with a few really juicy long-demanded features thrown in:

Support for loading and converting older dumpfile formats (0.13 and up)</li>
Content views for inline script (@chrisczub)</li>
Better handling of empty header values (Benjamin Lee/@bltb)</li>
Fix a gnarly memory leak in mitmdump</li>
A number of bugfixes and small improvements</li> </ul>
Behind the scenes, there have been a bunch of other exciting developments. The effort to port mitmproxy and its underlying libraries to Python3 continues apace. Our automated build and testing infrastructure has improved hugely - we now have up-to-date binary snapshots built for each commit</a>.
Thanks to all the contributors who helped get this release out the door, and, as usual, special thanks to my invaluable co-maintainer Max</a>, who's been steering things while I've been kept busy with other things.

Trawling Github for cookies, bookmarks and browsing history

2015-11-26T00:00:00+00:00

It's a universal rule that search over a sufficiently large body of user data poses security challenges. This follows naturally from the fact that humans - even smart, informed, careful humans - occasionally slip up. Given enough data, and the ability to pick out slip-ups with search, there will always be rich pickings for a malefactor. I wrote a short series of posts a while ago about interesting things I found on Github - commands from shell history files</a>, common pipe chains</a>, and words from custom spell-check dictionaries</a>. While shell history files could definitely contain very sensitive information, in practice there were only a handful of really damaging issues in the dataset. Trawling around people's dotfile directories, I found that something much more damaging often made it into repos: browser state. It's easy to see how this could happen - it takes just one injudicious add of a hidden directory to expose cookies, browser history, bookmarks and more. I decided to return to this issue later, and it slipped off my radar until recently.
When I wrote the first series of posts, I also released a tiny tool called ghrabber</a> (just a hack, really) that lets you grab files from Github en masse using a Github code search query. The first thing I noticed when I picked it up again is that it no longer worked as expected. I used to be able to retrieve all files matching a path, like so:
ghrabber.py "path:.bash_history"</code></pre>
Today, this returns an error - Github now requires you to specify both a search term and a path1</a>. There are all sorts of possible explanations for this change, but I like to think that it's meant to prevent (or at least impede) exactly the kind of trawling I've been amusing myself with.
Let's say we want to search for Firefox browser profile cookies. These are stored in a SQLite file called "cookies.sqlite". Github doesn't index binary files for search, so we can't search for characteristic content in the file. Path specification is broken, so we can't search for the filename. Stumped, right? Not so fast - the cookie files live in a directory with a large number of associated non-binary files. If we could come up with a signature for one of these accompanying files, then we could download a path relative to the match to retrieve the cookie storage file itself. I quickly added a flag to do exactly this to ghrabber</a>, and cooked up appropriate query strings to detect Firefox and Chrome browser profiles. I'll elide those here, for obvious reasons.
A look at the data</h2>
The result was 708 distinct browser profiles that included 33 364 bookmarks, and 88 013 cookies. Many of these profiles are actually intentional checkins - testing trusses, blank profiles and so forth. However, some totally unscientific manual sampling indicates that just less than half of these are probably genuine accidental checkins, containing private information.
Let's take a light, high-level look at the data. The figure below shows the percentage of profiles with cookies from each domain:
Percentage of profiles with cookies from domain</figcaption> </figure>
As expected, the stats here are dominated by the mega-trackers that infest almost every site on the internet - a familiar cast of rogues including DoubleClick, Scorecard Research, Quantserve and so forth. It's sad to see how few domains here are genuine destinations - apparently the top sites for this sample are Google, YouTube, Github (not unexpectedly), and Twitter.
Next up is the percentage of profiles with bookmarks for a given domain:

Percentage of profiles with bookmarks for domain</figcaption> </figure>
Here, the top domains are those pre-seeded on install, particularly with Firefox. This explains the Mozilla domains as well as ubuntu.com, debian.org and launchpad.net. Once we're outside of this list, the "genuine destinations" match the cookie dataset quite well - YouTube, Github, Wikipedia, and so forth.
A difficult situation</h2>
The surprise here is not that people accidentally check sensitive information into git repos. The real surprise is just how much of a pain in the butt it was to responsibly address the issue. At the end of this little experiment, I had more than 700 repositories that potentially contained sensitive, accidentally exposed user information. It beggars belief, but it's 2015 and the most popular repository hosting service in the world has no way to privately report a bug against a repo</a>. One could create a public bug report for each repository in question - but that would be like hanging out a neon sign saying "privacy issue here" for others to find, particularly since bug reports are published in a user's activity stream.
In the end, I decided to directly notify as many people as I could by email. So, I wrote a script that checked each affected user's profile for an email address. That left me with 120-odd users with contact details. I manually whittled these down to repositories that were obviously accidental checkins and sent them each an email, resulting in a dozen or so responses with variations on "oops, thanks for letting me know".
Hey Github!</h2>
I have two recommendations for Github that would make this situation vastly, vastly better:

Add a mechanism that lets users report private bugs, visible only to the repo owners. There's just no excuse for the lack of a feature like this. </li>

Consider restricting search functionality somewhat. One option would be not to index dotfiles (.*) by default, and perhaps let users opt in to dotfile indexing on a per-repo basis. The vast majority of accidental checkins are either within dotfiles (shell history, for example), or within directories that start with leading dots (browser history, ssh config) </li> </ul>
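To show how simple the second recommendation is in principle, here is a minimal Python sketch of a "skip dotfiles unless the repo opts in" filter. This is purely illustrative and makes no claim about how Github's indexer works. The names `is_dot_path` and `should_index`, and the `opt_in` flag, are assumptions invented for this example.

```python
def is_dot_path(path: str) -> bool:
    """True if any component of a repo-relative path starts with a dot."""
    parts = [p for p in path.split("/") if p not in ("", ".", "..")]
    return any(p.startswith(".") for p in parts)


def should_index(path: str, opt_in: bool = False) -> bool:
    """Hypothetical policy: index dotfiles only when the repo owner has opted in."""
    return opt_in or not is_dot_path(path)


for path in [".bash_history", ".mozilla/firefox/profile/cookies.sqlite", "src/main.go"]:
    print(path, should_index(path))
```

The same check works just as well on the user's side, for example in a pre-commit hook that warns before a hidden directory is added to a repository.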
^1In fact, Github search path specifications seem to be broken now in a more general way, but that's beside the point for this post. </div>

devd v0.3

2015-11-12T00:00:00+00:00

</a> </div>
I've just released devd 0.3</a> - a measured increment, with a modest set of bugfixes and new features. This is in line with my broad plan to keep devd a small, dependable, and focused tool.</a> Everyone should update.

-s (--tls) Generate a self-signed certificate, and enable TLS. The cert bundle is stored in ~/.devd.cert</li>
Add the X-Forwarded-Host header to reverse proxied traffic.</li>
Disable upstream cert validation for reverse proxied traffic. This makes using self-signed certs for development easy. Devd shouldn't be used in contexts where this might pose a security risk.</li>
Bugfix: make CSS livereload work in Firefox</li>
Bugfix: make sure the Host header and SNI host match for reverse proxied traffic.</li> </ul>

mitmproxy: release v0.14

2015-11-07T00:00:00+00:00

</a> </div>
We've just released mitmproxy 0.14</a>! Since the last release, the project has had 399 commits by 13 contributors, resulting in 79 closed issues and 37 closed PRs, all of this in just over 100 days.

Docs: Greatly updated docs now hosted on ReadTheDocs</a></li>
Docs: Fixed Typos, updated URLs etc. (Nick Badger, Ben Lerner, Choongwoo Han, onlywade, Jurriaan Bremer)</li>
mitmdump: Colorized TTY output</li>
mitmdump: Use mitmproxy's content views for human-readable output (Chris Czub)</li>
mitmproxy and mitmdump: Support for displaying UTF8 contents</li>
mitmproxy: add command line switch to disable mouse interaction (Timothy Elliott)</li>
mitmproxy: bug fixes (Choongwoo Han, sethp-jive, FreeArtMan)</li>
mitmweb: bug fixes (Colin Bendell)</li>
libmproxy: Add ability to fall back to TCP passthrough for non-HTTP connections.</li>
libmproxy: Avoid double-connect in case of TLS Server Name Indication. This yields a massive speedup for TLS handshakes.</li>
libmproxy: Prevent unnecessary upstream connections (macmantrl)</li>
Inline Scripts: New API for HTTP Headers</a></li>
Inline Scripts: Properly handle exceptions in done</code> hook</li>
Inline Scripts: Allow relative imports, provide __file__</code></li>
Examples: Add probabilistic TLS passthrough as an inline script</li>
netlib: Refactored HTTP protocol handling code</li>
netlib: ALPN support</li>
netlib: fixed a bug in the optional certificate verification.</li>
netlib: Initial Python 3.5 support (this is the first prerequisite for 3.x support in mitmproxy)</li> </ul>
I had very little time to spend on mitmproxy this cycle due to an extraordinarily busy patch at work - so, all of the above was shepherded into being by my hyper-efficient co-maintainer, Maximilian Hils</a>. Having a steady pair of hands to keep things on track while I've been "absent" has been great. As a project, we'd also like to thank Google, who sponsored the work of Thomas Kriechbaumer</a> under the Google Summer of Code</a> program, and the Honeynet Project</a> under whose aegis the GSoC work was done. The excellent work Thomas has done on HTTP2 support and many, many other aspects of mitmproxy has been invaluable. Look for new releases building on this soon.
devd v0.2 (and some thoughts on small tools)

2015-11-05T00:00:00+00:00

I've just released version 0.2 of devd</a>, a local webserver for developers. This release contains a number of small improvements, and a few new features.

-x (--exclude) flag to exclude files from livereload.</li>
-P (--password) flag for quick HTTP Basic password protection.</li>
-q (--quiet) flag to suppress all output from devd.</li>
Humanize file sizes in console logs.</li>
Improve directory indexes - better formatting, they now also livereload.</li>
Devd's built-in livereload URLs are now less likely to clash with user URLs.</li>
Internal 404 pages are now included in logs, timing measurement, and filtering.</li>
Improved heuristics for livereload file change detection. We now handle things like transient files created by editors better.</li>
A Linux ARM build will now be distributed with each release.</li> </ul>
Thanks to Barret Rennie</a>, Bill Mill</a> and Judson Mitchell ([email protected]</a>) for contributing to this release.
Some thoughts on small tools</h1>
I love small, modest tools that do one thing well. I wrote devd partly out of nostalgia for thttpd</a>, a tiny web daemon that used to be my rough-and-ready, just-serve-files-now webserver for many years. It was a single, small binary that I could cross-compile for all the platforms I used, and it did its humble job well. Back in the day, it was one of the first things I put on every new box, along with my shell configuration and ssh keys. When it started showing its age, I moved on to the usual combination of built-in interpreter daemons (e.g. "python -m SimpleHTTPServer") and more heavy-handed tools, but not without a touch of sadness. Looking back on it now, it's clear that the thttpd I remember is a somewhat rose-tinted version of the real thing: thttpd actually did both more and less than I really needed.
Devd strives to be a tool in the same spirit, that matches more closely what I want in my EDC</a> http daemon. If people think of it as a small, dependable and unobtrusive part of their daily toolset, I'll have done my job well. This release includes a few new features for devd, and the next release will add a few more. Not long after that, I expect it to be more or less feature complete. It will continue to improve internally, and bugs will always be fixed, but it will never sprout the ability to run PHP or render Less on the fly (both feature requests I've had since the first release). Instead, it will focus on doing the few things it does as well as it can: serve files, act as a reverse proxy tying development servers together, and live reload when files change.

devd: a web daemon for developers

2015-10-23T00:00:00+00:00

I've just released devd</a>, a small, self-contained, command-line-only HTTP server for developers. It started as a weekend stress-relief hack (that's a thing where I'm from), but has now become my preferred "daily driver" for most web-ish things. It's simple, direct and does more or less exactly what I need. This isn't terribly surprising, since I wrote it to scratch my own idiosyncratic itch - hopefully other, similarly itchy hackers will find it useful too.
Quick start</h2>
Serve the current directory, open it in the browser (-o), and livereload when files change (-l):
devd -ol .</code></pre>
Reverse proxy to http://localhost:8080, and livereload when any file in the src directory changes:
devd -w ./src http://localhost:8080</code></pre>
Features</h2>
Cross-platform and self-contained</h3>
Devd is a single statically compiled binary with no external dependencies, and is released for OSX, Linux and Windows. Don't want to install Node or Python in that light-weight Docker instance you're hacking in? Just copy over the devd binary and be done with it.
Designed for the terminal</h3> This means no config file, no daemonization, and logs that are designed to be read in the terminal by a developer. Logs are colorized and log entries span multiple lines. Devd's logs are detailed, warn about corner cases that other daemons ignore, and can optionally include things like detailed timing information and full headers. </a> </div> To make quickly firing up an instance as simple as possible, devd automatically chooses an open port to run on (unless it's specified), and can open a browser window pointing to the daemon root for you (the -o flag in the example above). Livereload</h3> When livereload is enabled, devd injects a small script into HTML pages, just before the closing head tag. The script listens for change notifications over a websocket connection, and reloads resources as needed. No browser addon is required, and livereload works even for reverse proxied apps. If only changes to CSS files are seen, devd will only reload external CSS resources, otherwise a full page reload is done. This serves the current directory with livereload enabled: devd -l .</code></pre> You can also trigger livereload for files that are not being served, letting you reload reverse proxied applications when source files change. So, this command watches the src directory tree, and reverse proxies to a locally running application: devd -w ./src http://localhost:8888</code></pre>Reverse proxy + static file server + flexible routing</h3> Modern apps tend to be collections of web servers, and devd caters for this with flexible reverse proxying. You can use devd to overlay a set of services on a single domain, add livereload to services that don't natively support it, add throttling and latency simulation to existing services, and so forth. Here's a more complicated example showing how all this ties together - it overlays two applications and a tree of static files. 
Livereload is enabled for the static files (-l) and also triggered whenever source files for reverse proxied apps change:
devd -l \
    -w ./src/ \
    /=http://localhost:8888 \
    /api/=http://localhost:8889 \
    /static/=./assets</code></pre>
Light-weight virtual hosting</h3>
Devd uses a dedicated domain - devd.io - to do simple virtual hosting. This domain and all its subdomains resolve to 127.0.0.1, which we use to set up virtual hosting without any changes to /etc/hosts or other local configuration. Route specifications that don't start with a leading / are taken to be subdomains of devd.io. So, the following command serves a static site from devd.io, and reverse proxies a locally running app on api.devd.io:
devd ./static api=http://localhost:8888</code></pre>
Check out the docs at the Github repo</a> for the full route specification syntax.
Latency and bandwidth simulation</h3>
Want to know what it's like to use your fancy 5mb HTML5 app from a mobile phone in Botswana? Look up the bandwidth and latency here</a>, and invoke devd like so (making sure to convert from kilobits per second to kilobytes per second):
devd -d 114 -u 51 -l 75 .</code></pre>
Devd tries to be reasonably accurate in simulating bandwidth and latency - it uses a token bucket implementation for throttling, properly handles concurrent requests, and chunks traffic up so data flow is smooth.

mitmproxy: release v0.13

2015-07-26T00:00:00+00:00

</a> </div>
This is a slightly late announcement of the release of mitmproxy v0.13</a>, which was pushed out the door earlier this week by my esteemed compatriots while I was tied up with other things. We have a number of big new features this time round. First, mitmproxy now has upstream certificate validation, thanks to the hard work of Kyle Morton</a>. Mitmproxy is increasingly being used in user-oriented roles where upstream cert validation is crucial, so this is a welcome improvement.
We also have a new transparent proxy mode, which uses the HTTP Host headers to detect the upstream server to connect to, rather than the OS NAT tables. This isn't accurate 100% of the time, but it's so convenient that having it in the base makes sense. Thanks to Ijiro123</a>. Other improvements include marking of flows in mitmproxy console (thanks to Jake Drahos</a>) and an addition to the filter language allowing better matching of source and destination addresses (thanks to Israel Halle</a>).
This release also features something a bit more unusual: a removed feature. We added the ability to forward server certificates through to the client verbatim to allow mitmproxy to exploit the infamous #gotofail</a> bug on IOS and OSX. We were one of the first (and perhaps THE first) publicly available mechanisms to exploit this issue, and pen testers, app reversers and curious folks everywhere rejoiced. Unfortunately, cert forwarding has become a support burden - for fiddly technical reasons, it adds a lot of complication to the way mitmproxy is distributed and installed. Since #gotofail is no longer so current, we've decided to remove support from mitmproxy. If you still have some vulnerable devices out there you need to muck with, the official answer at the moment is to install v0.12.

mitmproxy v0.12.1

2015-06-04T00:00:00+00:00

</a> </div>
I've just released mitmproxy v0.12.1</a>. This release fixes a few crashing bugs that slipped through in the previous iteration, so everyone should upgrade. Also included are a number of small improvements. The most noticeable of these is mouse interaction for mitmproxy console - the screen capture above shows me scrolling with my mouse, clicking to view a flow and switch tabs. We pay a small price for this - users now have to hold down a modifier key (shift on some systems, alt on others) to select text in the terminal for copying and pasting.
To ease users into this, we've added a warning if we detect an attempt to select text without the right modifier key.

mitmproxy: release v0.12 and some project news

2015-05-26T00:00:00+00:00

Project News</h2>
Before we get to the new release, I'd like to give a quick update on some internal project developments. First up, after a somewhat involved process that included a couple of rounds of community voting and much discussion, we have a new logo:
</a> </div>
This will be rolled out in all the places where it makes sense along with the 0.12 release. Second, the long-dormant @mitmproxy</a> Twitter account is finally waking up. Please follow us there for mitmproxy project updates and related news. Third, we'd like to welcome Thomas Kriechbaumer</a> to the project. Thomas is being sponsored to work on mitmproxy under the Google Summer of Code</a> program, and will be adding HTTP2 support - one of our most anticipated features. Special thanks go to the Honeynet Project</a> under whose aegis the GSoC work will be done.
Lastly, a peek into the project's immediate future. We have websockets support on the way, thanks to a protocol contribution by Chandler Abraham</a>. We have HTTP2 on the way, thanks to Thomas. The mitmproxy web interface is gradually maturing behind the scenes, and should be ready to be unleashed on the world soon. And, of course, the project continues to improve quickly in almost every other respect. It's an exciting time, and there's a lot of interesting work to do - if you'd like to be involved, please get in touch.
mitmproxy v0.12</h2>
</a> </div>
The most immediately visible change in v0.12 is a thorough overhaul of the console interface, which has been improved in almost every respect. Performance and responsiveness are better, keybindings have been consolidated, and options have been collected in a dedicated options screen (shortcut "o").
Palettes have been overhauled entirely, with improvements to the palettes themselves, the ability to change palettes on the fly, and support for non-transparent (mitmproxy sets the console background) and transparent (your emulator sets the console background) modes. The console application has also sprouted a powerful new cookie editor that will make tampering with cookie names and values more convenient. Other major features include official support for transparent mode on FreeBSD (thanks to Mike C</a>), the ability to log TLS master keys for use with other tools like WireShark, and support for creating flows from scratch in the console app (thanks Marcelo Glezer</a>). A thorough overhaul of the documentation is also under way - thanks to Jim Shaver</a> for his work there.
pathod v0.12</h2>
I'm also releasing pathod v0.12. The primary change here is the first phase of full support for websockets. At the moment, this is client-only - server support will follow in the next release. Here's a taster - the pathoc command below initiates a websocket connection to echo.websockets.org, then sends 10 websocket frames, each with a body of 100 random bytes.
> ./pathoc echo.websockets.org ws:/ wf:b@100:x10
>> ws:/
<< 200 OK: 225 bytes
>> wf:b@100:ir,@1</code></pre>
The usual range of injections and stream manipulations are available, and every aspect of the websocket frames can be manipulated in ways that creatively violate the specs. See the pathod documentation for the language definition.

binvis.io - a browser-based tool for visualising binary data

2015-03-04T00:00:00+00:00

Over the years, I've written a number of posts on this blog on the topic of binary data visualisation. I looked at using space-filling curves to understand the structure of binary data</a>, I've shown how entropy visualisation lets you trivially pick out compressed and encrypted sections</a>, and I've drawn pretty pictures of malware</a>.
Unfortunately the tools I wrote (code here</a>) all produced static images, which made practical use a pain. You really need interactivity to be able to combine visual exploration with inspection of the actual underlying data, and to let you easily export interesting sections.
binvis.io</a></h2>
I recently started toying with the idea of using web technologies to build an interactive visualiser of this sort. One thing led to another... and today, I'm happy to announce a first draft of the idea: binvis.io
</a> </div>
With binvis.io you can:
Visually explore binary data</li>
Cluster bytes to pick out fine structural features with space-filling curves</li>
Use the simple scan layout to navigate and select data intuitively</li>
Flip between a number of useful byte color mappings, including an entropy visualiser that lets you pick out compressed or encrypted sections</li>
Export data segments for analysis</li> </ul>
Next steps</h2>
Right now, Binvis is local only - that is, when you open a file, all analysis is done in your browser and nothing is sent to the server. In the longer term, I'd like to add the ability to upload, share and annotate binaries, both publicly and privately. There is probably a market of... oh, at least a dozen people out there who would have use for an imgur-like sharing system for binaries. Fame and riches surely await. Of course, there are also an immense number of other improvements to be made to almost every aspect of binvis, ranging from speed, to better colour schemes, to improvements in interaction and UX. The todo list is long, and time is short, so I'm looking for serious collaborators. If you're interested, drop me a line!
Thanks</h2>
Binvis isn't the first interactive binary visualisation tool of this sort. A few others that spring to mind are ..cantor.dust</a>, bininspect</a> and binglide</a>.
I'm trying to learn from these precursors, and I'm delighted to see that they all also drew, to a greater or lesser extent, on my earlier work. Thus the eternal cycle of code rolls on. I'd like to particularly thank Greg Conti</a> for letting me re-use the name of his own, much earlier visualisation tool</a>, for publishing a fascinating series of papers</a> and talks</a> on the topic, and for providing feedback both on this particular incarnation of the idea as well as my earlier dabblings.

mitmproxy 0.11.2

2014-12-29T00:00:00+00:00

</a> </div>
I've just pushed mitmproxy v0.11.2</a> out the door. This is primarily a bugfix release, but does have one very useful new feature: configuration files. All options available through command-line flags can now be set persistently in config files, for all the tools - see the documentation for more</a>. Adding this was made much easier by ConfigArgParse</a>, one of those small Python project gems that you feel more people should know about. Check it out. This release also features the usual array of bugfixes and small improvements. In particular, we now handle upstream servers that knock back connections without SNI better, and the onboarding app now works in the OSX binary builds. Everyone should update.

mitmproxy and pathod 0.11

2014-11-07T00:00:00+00:00

</a> </div>
I'm happy to announce that we've just released v0.11 of both mitmproxy</a> and pathod</a>. This release features a huge revamp of mitmproxy's internals and a long list of important features. Pathod has much improved SSL support and fuzzing. Our thanks to the many testers and [contributors](https://github.com/mitmproxy/mitmproxy/blob/master/CONTRIBUTORS) that helped get this out the door. Please lodge bug reports and feature requests here</a>.
Mitmproxy Changelog</h2>
Performance improvements for mitmproxy console</li>
SOCKS5 proxy mode allows mitmproxy to act as a SOCKS5 proxy server</li>
Data streaming for response bodies exceeding a threshold ([email protected])</li>
Ignore hosts or IP addresses, forwarding both HTTP and HTTPS traffic untouched</li>
Finer-grained control of traffic replay, including options to ignore contents or parameters when matching flows ([email protected])</li>
Pass arguments to inline scripts</li>
Configurable size limit on HTTP request and response bodies</li>
Per-domain specification of interception certificates and keys (see --cert option)</li>
Certificate forwarding, relaying upstream SSL certificates verbatim (see --cert-forward)</li>
Search and highlighting for HTTP request and response bodies in mitmproxy console ([email protected])</li>
Transparent proxy support on Windows</li>
Improved error messages and logging</li>
Support for FreeBSD in transparent mode, using pf ([email protected])</li>
Content view mode for WBXML ([email protected])</li>
Better documentation, with a new section on proxy modes</li>
Generic TCP proxy mode</li>
Countless bugfixes and other small improvements</li> </ul>
Pathod Changelog</h2>
Hugely improved SSL support, including dynamic generation of certificates using the mitmproxy cacert</li>
pathoc -S dumps information on the remote SSL certificate chain</li>
Big improvements to fuzzing, including random spec selection and memoization to avoid repeating randomly generated patterns</li>
Reflected patterns, allowing you to embed a pathod server response specification in a pathoc request, resolving both on client side. This makes fuzzing proxies and other intermediate systems much better.</li> </ul>

mitmproxy now supports #gotofail

2014-03-11T00:00:00+00:00

A few weeks ago, I posted that I had hacked up a version of mitmproxy that exploited CVE-2014-1266</a>, giving unrestricted access to nearly all HTTPS traffic on affected IOS and OSX devices.
I chose not to release working code at the time, but a number of POCs</a> have been floating about publicly almost since the issue was first discovered. So, the time has come to publish - as of yesterday, mitmproxy's master branch</a> supports #gotofail. To see the exploit in action, invoke mitmproxy as follows: mitmproxy --ciphers="DHE-RSA-AES256-SHA" --cert-forward</code></pre> After configuring your device proxy, you should see something like this screenshot, which shows off interception of miscellaneous iTunes traffic: </a> </div> Note that the client device here has no mitmproxy CA certificate installed, and we get circumvention of certificate pinning "for free". Two new options make the magic work. The --ciphers option specifies which SSL ciphers we should expose to connecting clients. In this case, we force the client to use a DHE cipher, which is required to trigger the issue. The --cert-forward option tells mitmproxy to pass upstream SSL certificates down to the client unmodified. Usually we'd expect this to fail, since the upstream certs won't match mitmproxy's private key. In this case #gotofail means the client fails to properly execute the check, letting us pass certificates through to the client verbatim as if we owned them. There's one additional wrinkle that mitmproxy smooths over - before we can get the mismatching certificate and key to the client, OpenSSL itself has to be coaxed into accepting them. The first version of my exploit involved a patch to OpenSSL to remove the library's own consistency check, but this is inconvenient. Luckily it turns out that we can munge an obscure flag</a> in the RSA data-structures to circumvent this, which allows us to exploit #gotofail in pure Python. The moment I got this exploit working, I marched upstairs and confiscated my wife's un-updated iPhone 5 to add it to my pool of test devices (never fear - it's been replaced with a nice new 5S). 
Devices running IOS of the right vintage have suddenly become the gold standard for analysis and pen testing. This beautiful vulnerability lets us circumvent SSL effortlessly, completely sidestepping certificate pinning for all the applications I've tried, without any cumbersome and invasive interference with the device</a>. Combine this with the fact that these same devices also have an un-tethered jailbreak, and I think it's unlikely that we'll ever have an analysis platform this nice again. So, stockpile your IOS 7.0.6 devices now, and intercept all the things. Exploiting CVE-2014-1266 with mitmproxy 2014-02-25T00:00:00+00:00 This post is a quick recap of work I've been discussing on Twitter in the last few hours. I've just finished putting together a version of mitmproxy</a> that takes advantage of CVE-2014-1266</a>, Apple's critical SSL/TLS bug</a>. We knew in theory that the issue should give access to all SSL traffic using Apple's broken implementation - I can now report that this is also true in practice. I've confirmed full transparent interception of HTTPS traffic on both IOS (prior to 7.0.6) and OSX Mavericks. Nearly all encrypted traffic, including usernames, passwords, and even Apple app updates can be captured. This includes: App store and software update traffic</li> iCloud data, including KeyChain enrollment and updates</li> Data from the Calendar and Reminders</li> Find My Mac updates</li> Traffic for applications that use certificate pinning, like Twitter</li> </ul> It's difficult to over-state the seriousness of this issue. With a tool like mitmproxy in the right position, an attacker can intercept, view and modify nearly all sensitive traffic. This extends to the software update mechanism itself, which uses HTTPS for deployment. At the time of writing, Apple still doesn't have a fix deployed for OSX. It took less than a day to get the patched version of mitmproxy and its supporting libraries up and running. 
I won't be releasing my patches until well after Apple's pending update, but it's safe to assume that this is now being exploited in the wild. Of course, intelligence agencies have no doubt been on top of this for some time - perhaps some of the inflammatory Sochi security horror stories</a> were plausible after all. mitmproxy and pathod 0.10 2014-01-29T00:00:00+00:00 </a> </div> I've just released v0.10 of both mitmproxy</a> and pathod</a>. This is chiefly a bugfix release, with a few nice additional features to sweeten the pot. </a> </div> Perhaps the most visible change has been a huge improvement in the recommended method for installing the mitmproxy certificates. Certs are now served straight from the web application hosted in mitmproxy, which means that in most cases cert installation is as simple as typing the mitmproxy URL into the device browser. See the docs</a> for more. In other, minor news - I see that the mitmproxy project</a> has just passed 2000 stars on GitHub. Between PyPI and the files we serve from mitmproxy.org</a>, the project has also seen nearly 100k downloads in the last year (after removing obvious bots). I know, I know - figures like these don't mean much, but it's still nice to see that people are using and enjoying mitmproxy. 
Changelog</h2> Support for multiple scripts and multiple script arguments</li> Easy certificate install through the in-proxy web app, which is now enabled by default</li> Forward proxy mode</a>, which forwards proxy requests to an upstream HTTP server</li> Reverse proxy now works with SSL</li> Search within a request/response using the "/" and "n" shortcut keys</li> A view that beautifies CSS files if cssutils is available</li> Many bug fixes, documentation improvements, and more.</li> </ul> How I Learned to Stop Worrying and Love Golang 2013-11-21T00:00:00+00:00 Here's a riff on Malcolm Gladwell's rule of thumb about mastery</a>: you don't really know a programming language until you've written 10,000 lines of production-quality code in it. Like the original, this is a generalization that is undoubtedly false in many cases - still, it broadly matches my intuition for most languages and most programmers1</a>. At the beginning of this year, I wrote a sniffy post about Go</a> when I was about 20% of the way to knowing the language by this measure. Today's post is an update from further along the curve - about 80% - following a recent set of adventures that included entirely rewriting choir.io</a>'s core dispatcher in Go. My opinion of Go has changed significantly in the meantime. Despite my initial exasperation, I found that the experience of actually writing Go was not unpleasant. The shallow issues became less annoying over time (perhaps just due to habituation), and the deep issues turned out to be less problematic in practice than in theory. Most of all, though, I found Go was just a fun and productive language to work in. Go has colonized more and more use cases for me, to the point where it is now seriously eroding my use of both Python and C. After my rather slow Road to Damascus experience, I noticed something odd: I found it difficult to explain why Go worked so well in practice. 
Sure, Go has a triad of really smashing ideas (interfaces, channels and goroutines), but my list of warts and annoyances is long enough that it's not clear on paper that the upsides outweigh the downsides. So, my experience of actually cutting code in Go was at odds with my rational analysis of the language, which bugged me. I've thought about this a lot over the last few months, and eventually came up with an explanation that sounds like nonsense at first sight: Go's weaknesses are also its strengths. In particular, many design choices that seem to reduce coherence and maintainability at first glance actually combine to give the language a practical character that's very usable and compelling. Let's see if I can convince you that this isn't as crazy as it sounds. Maps and magic</h2> Let's pretend that we're the designers of Go, and see if we can follow the thinking that went into a seemingly simple part of the language - the value retrieval syntax for maps. We begin with the simplest possible case - direct, obvious, and familiar from a number of other languages: v := mymap["foo"]</code></pre> It would be nice if we could keep it this simple, but there's a complication - what if "foo" doesn't exist in the map? The fact that Go doesn't have exceptions limits the possibilities. We can discard some gross options out of hand - for instance, making this a runtime error or returning a magic value flagging non-existence are both pretty horrible. A more plausible route is to pass an existence flag back as a second return value: v, ok := mymap["foo"]</code></pre> So far, so logical, and if consistency were the primary goal, we would stop here. However, having two return arguments would make many common patterns of use inconvenient. You would constantly be discarding the ok flag in situations where it wasn't needed. Another repercussion is that you couldn't directly use the results in an if clause. 
Instead of a clean phrasing like this (relying on the zero value returned by default): if mymap["foo"] { // Do something }</code></pre> ... you would have to do this: if _, ok := mymap["foo"]; ok { // Do something }</code></pre> Ugh. What we really want is the best of both worlds: the ease of the first signature, plus the flexibility of the second. In fact, Go does exactly that, in a surprising way: it discards some basic conceptual constraints, and makes the data returned by the map accessor depend on how many variables it's assigned to. When it's assigned to one variable, it just returns the value. When it's assigned to two variables, it also returns an existence flag. Compare this with Python. The dictionary access syntax is identical: v = mymap["foo"]</code></pre> Python does have exceptions, so non-existence is signaled through a KeyError, and the dictionary interface includes a get method that allows the user to specify a default return when this is too cumbersome. This is certainly consistent on the surface, but there's also a deeper structure that helps the user understand what's going on. The square bracket accessor syntax is just syntactic sugar, because the call above is equivalent to this: v = mymap.__getitem__("foo")</code></pre> In a sense, then, the value access is just a method call. The coder can write a dictionary of their own that acts just like a built-in dictionary2</a>, and can also build a clear mental model of what's going on underneath. Python dictionaries are conceptually built up from more primitive language elements, where Go maps are designed down from concrete use cases. Range: a compendium of use cases</h2> An even stranger beast is the range clause of Go's for loops. Like map accessors, range will return either one value or two, depending on the number of variables assigned to. What's particularly revealing about range is the way these results differ depending on the data type being ranged over. 
Consider this piece of code, for example: for x, y := range v { }</code></pre> To figure out what this does, we need to know the type of v, and then consult a table like this:3</a> Range expression</th> 1st Value</th> 2nd Value</th> </tr> array or slice</td> index i</td> a[i]</td> </tr> map</td> key k</td> m[k]</td> </tr> string</td> index i of rune</td> rune int</td> </tr> channel</td> element</td> error</td> </tr> </table> What range does for arrays and maps seems consistent and not particularly surprising. Things get a tad odd with channels. A second variable arguably doesn't make much sense when ranging over a channel, so trying to do this results in a compile time error. Not terribly consistent, but logical. Weirder still is range over strings. When operating on a string, range returns runes</a> (Unicode code points) not bytes. So, this code: s := "a\u00fcb" for a, b := range s { fmt.Println(a, b) }</code></pre> Prints this: 0 97 1 252 3 98</code></pre> Notice the jump from 1 to 3 in the index, because the rune at offset 1 is two bytes wide in UTF-8. And look what happens when we now retrieve the value at that offset from the string. This: fmt.Println(s[1])</code></pre> Prints this: 195</code></pre> What gives? At first glance, it's reasonable to expect this to print 252, as returned by range. That's wrong, though, because string access by index operates on bytes, so what we're given is the first byte of the UTF-8 encoding of the rune. This is bound to cause subtle bugs. Code that works perfectly on ASCII text, simply because UTF-8 encodes ASCII characters in a single byte, will fail mysteriously as soon as non-ASCII characters appear. My argument here is that range is a very clear example of design directly from concrete use cases down, with little concern for consistency. In fact, the table of range return values above is really just a compendium of use cases: at each point the result is simply the one that is most directly useful. 
So, it makes total sense that ranging over strings returns runes. In fact, doing anything else would arguably be incorrect. What's characteristic here is that no attempt was made to reconcile this interface with the core of the language. It serves the use case well, but feels jarring. Arrays are values, maps are references</h2> One final example along these lines. A core irregularity at the heart of Go is that arrays are values, while maps are references. So, this code will modify the s variable: func mod(x map[int] int){ x[0] = 2 } func main() { s := map[int]int{} mod(s) fmt.Println(s) }</code></pre> And print: map[0:2]</code></pre> While this code won't: func mod(x [1]int){ x[0] = 2 } func main() { s := [1]int{} mod(s) fmt.Println(s) }</code></pre> And will print: [0]</code></pre> This is undoubtedly inconsistent, but it turns out not to be an issue in practice, mostly because slices are references, and are passed around much more frequently than arrays. This issue has surprised enough people to make it into the Go FAQ, where the justification is as follows</a>: There's a lot of history on that topic. Early on, maps and channels were syntactically pointers and it was impossible to declare or use a non-pointer instance. Also, we struggled with how arrays should work. Eventually we decided that the strict separation of pointers and values made the language harder to use. This change added some regrettable complexity to the language but had a large effect on usability: Go became a more productive, comfortable language when it was introduced. </blockquote> This is not exactly the clearest explanation for a technical decision I've ever read, so allow me to paraphrase: "Things evolved this way for pragmatic reasons, and consistency was never important enough to force a reconciliation". The G Word</h2> Now we get to that perpetual bugbear of Go critiques: the lack of generics. 
This, I think, is the deepest example of the Go designers' willingness to sacrifice coherence for pragmatism. One gets the feeling that the Go devs are a tad weary of this argument by now, but the issue is substantive and worth facing squarely. The crux of the matter is this: Go's built-in container types are super special. They can be parameterized with the type of their contained values in a way that user-written data structures can't be. The supported way to do generic data structures is to use blank interfaces. Let's look at an example of how this works in practice. First, here is a simple use of the built-in slice type. l := make([]string, 1) l[0] = "foo" str := l[0]</code></pre> In the first line we initialize the slice with the element type string. We then insert a value, and in the final line, we retrieve it. At this point, str has type string and is ready to use. The user-written analogue of this might be a modest data structure with put and get methods. We can define this using interfaces like so: type gtype struct { data interface{} } func (t *gtype) put(v interface{}) { t.data = v } func (t *gtype) get() interface{} { return t.data }</code></pre> To use this structure, we would say: v := gtype{} v.put("foo") str := v.get().(string)</code></pre> We can assign a string to a variable with the empty interface type without doing anything special, so put is simple. However, we need to use a type assertion on the way out, otherwise the str variable will have type interface{}, which is probably not what we want. There are a number of issues here. It's cosmetically bothersome that we have to place the burden of type assertion on the caller of our data structure, making the interface just a little bit less nice to use. But the problems extend beyond syntactic inconvenience - there's a substantive difference between these two ways of doing things. 
Trying to insert a value of the wrong type into the built-in slice causes a compile-time error, but the type assertion acts at run-time and causes a panic on failure. The blank-interface paradigm sidesteps Go's compile time type checking, negating any benefit we may have received from it. The biggest issue for me, though, is the conceptual inconsistency. This is something that's difficult to put into words, so here's a picture: </a> </div> The fact that the built-in containers magically do useful things that user-written code can't irks me. It hasn't become less jarring over time, and still feels like a bit of grit in my eye that I can't get rid of. I might be an extreme case, but this is an aesthetic instinct that I think is shared by many programmers, and would have convinced many language designers to approach the problem differently. The extent to which Go's lack of generics is a critical problem, however, is not the point here. The meat of the matter is why this design decision was taken, and what it reveals about the character of Go. Here's how the lack of generics is justified by the Go developers</a>: Many proposals for generics-like features have been mooted both publicly and internally, but as yet we haven't found a proposal that is consistent with the rest of the language. We think that one of Go's key strengths is its simplicity, so we are wary of introducing new features that might make the language more difficult to understand. </blockquote> Instead of creating the atomic elements needed to support generic data structures then adding a suite of them to the standard library, the Go team went the other way. There was a concrete use case for good data structures, and so they were added. Attempting a deep reconciliation with the rest of the language was a secondary requirement that was so unimportant that it fell by the wayside for Go 1.x. A Pragmatic Beauty</h1> Let's over-simplify for a moment and divide languages into two extreme camps. 
On the one hand, you have languages that are highly consistent, with most higher order functionality deriving from the atomic elements of the language. In this camp, we can find languages like Lisp. On the other hand are languages that are shamelessly eager to please. They tend to grow organically, sprouting syntax as needed to solve specific pragmatic problems. As a consequence, they tend to be large, syntactically diverse, not terribly coherent, and occasionally even unparseable</a>. In this camp, we find languages like Perl. It's tempting to think that there exists a language somewhere in the infinite multiverse of possibilities that unites perfect consistency and perfect usability, but if there is, we haven't found it. The reality is that all languages are a compromise, and that balancing these two forces against each other is really what makes language design so hard. Placing too much value on consistency constrains the human concessions we can make for mundane use cases. Making too many concessions results in a language that lacks coherence. Like many programmers, I instinctively prefer purity and consistency and distrust "magic". In fact, I've never found a language with a strongly pragmatic bent that I really liked. Until now, that is. Because there's one thing I'm pretty clear on: Go is on the Perl end of this language design spectrum. It's designed firmly from concrete use cases down, and shows its willingness to sacrifice consistency for practicality again and again. The effects of this design philosophy permeate the language. This, then, is the source of my initial dissatisfaction with Go: I'm predisposed to dislike many of its core design decisions. Why, then, has the language grown on me over time? Well, I've gradually become convinced that practically-motivated flaws like the ones I list in this post add up to create Go's unexpected nimbleness. 
There's a weird sort of alchemy going on here, because I think any one of these decisions in isolation makes Go a worse language (even if only slightly). Together, however, they jolt Go out of a local maximum many procedural languages are stuck in, and take it somewhere better. Look again at each of the cases above, and imagine what the cumulative effect on Go would have been if the consistent choice had been made each time. The language would have more syntax, more core concepts to deal with, and be more verbose to write. Once you reason through the repercussions, you find that the result would have been a worse language overall. It's clear that Go is not the way it is because its designers didn't know better, or didn't care. Go is the result of a conscious pragmatism that is deep and audacious. Starting with this philosophy, but still managing to keep the language small and taut, with almost nothing dispensable or extraneous, took great discipline and insight, and is a remarkable achievement. So, despite its flaws, Go remains graceful. It just took me a while to appreciate it, because I expected the grace of a ballet dancer, but found the grace of a battered but experienced bar-room brawler. -- Edited to remove some inaccuracies about channels. ^3Simplified from here</a>. </div> ^{1 I don't mean mundane details like the syntax and core concepts of a language. In the case of Go, you can get a handle on these in an hour by reading the language specification. </div>}^{2 Pedant hedge: yes, the illusion isn't perfect, and there are in fact subtle ways in which Python dictionaries are not just objects like any other. </div>} mitmproxy and pathod 0.9.2 2013-08-25T00:00:00+00:00 </a> </div> I've just released v0.9.2 of both mitmproxy</a> and pathod</a>. This is a bugfix release, chiefly to address two crashing issues affecting mitmproxy when relaying SSL traffic. A range of other fixes and improvements are also included - if you use mitmproxy, you should upgrade. 
CHANGELOG</h2> Improvements to the mitmproxywrapper.py helper script for OSX.</li> Don't take minor version into account when checking for serialized file compatibility.</li> Fix a bug causing resource exhaustion under some circumstances for SSL connections.</li> Revamp the way we store interception certificates. We used to store these on disk; they're now in memory. This fixes a race condition related to cert handling, and improves compatibility with Windows, where the rules governing permitted file names are weird, resulting in errors for some valid IDNA-encoded names.</li> Display transfer rates for responses in the flow list.</li> Many other small bugfixes and improvements.</li> </ul> Introducing choir.io 2013-08-16T00:00:00+00:00 </a> choir.io </div> </div> Today, I'm raising the veil (slightly) on a new project - choir.io</a>. The most succinct description of choir.io is that it is a service that turns events into sound. Why would you want to do that? Well, I believe that there are compelling reasons to make sound part of your monitoring stack. Let's see if I can convince you. The soundscape</h2> When I walk into my study every morning, I'm surrounded by a rich, subtle soundscape that exists just beneath conscious perception. My air-conditioner, computers and monitors all emit hums and purrs. I can "tune in" to these if I focus, but they usually only draw my attention when something changes. When the power goes out there is a deathly silence; when a CPU fan noise changes pitch or texture, it bothers me immediately. Layered over this background are more obtrusive sounds, closer to the threshold of awareness - the clacking of keyboards, faint noises of my family getting ready for their day upstairs, the front door opening and closing. Whether or not I pay attention to these is somewhat context dependent. Am I waiting, for instance, for my wife and kids to start trooping down the stairs so I can join them for my son's swimming lesson? 
If I am, I listen out for those sounds specifically. I get an enormous amount of information about my world from these more discrete, event-related noises. Finally, there are the really obtrusive sounds, things that immediately get my attention. This might be someone saying my name, my phone ringing, a knock at the door, or a smoke alarm. I'm very aware of these, and they usually signal something I have to deal with immediately. These layers of more and less obtrusive sounds form a soundscape that is ever-present, and utterly necessary in our day-to-day lives. Notice how effortless this process of extracting meaning from our ambient sounds is. Our minds process this information stream without any mental exertion, filter out what we don't need to notice, and draw our attention to what we do. There's a lot of cognitive research (that I might delve into in future posts) showing that our brains and auditory systems are specifically designed to make sense of the world in this way. We have nothing like this rich texture of ambient awareness for the technology that surrounds us. Our monitoring mechanisms seem to be stuck at the ends of the intrusiveness spectrum. At one end, we have email notifications that demand our attention until we start to ignore them or silence them with a filter. At the other end we have passive status dashboards that require us to remember to switch context and visually consult a different interface. Choir.io doesn't aim to supplant either of these, but tries to fill in the blank portion of the awareness spectrum between them. When I sit at my desk, I can hear our server architecture humming away. There's the subtle pitter-patter of hits to various webservers, the occasional clack of an SSH login. Occasionally there is a chime when @alexdong pushes to GitHub, followed shortly by the celebratory cheer of a server deploy. 
When I hear the jarring note of a 500 server error, I switch context to view logs or a dashboard, but otherwise my focus stays with my editor window. Choir is young, but it's already become an indispensable part of my life. Challenges and next steps</h2> There are a number of key questions that we'd like to answer with the help of our intrepid early adopters. First among these is the question of soundscape design. What makes a good sound pack? What is the right mix of intrusive and non-intrusive sounds? How do we construct soundscapes that blend into the background like natural sounds do? Another set of questions surrounds the API and integration. What is the right blend of simplicity and power in the API? Which services should we integrate with next? There are some obvious next steps in the works. We recognize that sound pack design is a deep problem with subjective solutions. So, letting users assemble, edit and eventually share their own sound packs is high on our list of priorities. Free-standing Choir.io player apps for Windows and OSX will also be on the way soon, so you won't need to remember to keep a browser tab open. Technical improvements to the API that are on the way include UDP and SSL support. Choir is trying to do something new, and we want as much feedback as early in the process as possible. So, we've decided to start sending out invites today, even though Choir is far from the polished system that it will be in a few months. If you're brave, willing to give frank feedback, and want to help us explore this exciting idea, please request an invite</a>. mitmproxy 0.9.1 2013-06-16T00:00:00+00:00 </a> </div> I'm happy to announce the release of mitmproxy 0.9.1</a>. This is a bugfix release, with no significant changes in behaviour. As hinted in my previous release note, the project itself is also evolving. 
As of this release, mitmproxy and its sister projects (pathod</a> and netlib</a>) are housed under a separate organization on GitHub, rather than my own personal space: github.com/mitmproxy</a> I'm also very happy to welcome the first external core developer to the mitmproxy project: Maximilian Hils</a>. Max is the author of HoneyProxy</a>, a web analysis front-end for mitmproxy. In the next few months, he'll be working on integrating and expanding his work to become mitmproxy's official web interface. Max's efforts will be sponsored by Google under their Summer of Code</a> program, and will be mentored by the HoneyNet Project</a>. Changelog</h2> Use "correct" case for Content-Type headers added by mitmproxy.</li> Make UTF environment detection more robust.</li> Improved MIME-type detection for viewers.</li> Always read files in binary mode (Windows compatibility fix).</li> Correct PyOpenSSL dependency declaration.</li> Some developer documentation.</li> </ul> Skout: a devastating privacy vulnerability 2013-05-31T00:00:00+00:00 I've become a bit weary of the process of public vulnerability disclosure - I'm much more likely nowadays to just drop companies an anonymous notice and move on. Every so often, though, I come across an issue so egregious that talking about it publicly seems like an imperative. This is one of them. First, some background. Skout is a location-based mobile social network. The idea is to allow people to meet others in their area, semi-anonymously, get to know them, and then perhaps line up a meeting in meatspace. As far as I can tell, a huge fraction of the userbase are singles, using Skout as an ad-hoc dating app. Skout's scale is significant - they don't release exact user numbers, but I've seen claims of more than 10 million users, and a growth rate of a million users per month. In 2012, Skout went through a major PR catastrophe, when its service was linked to no fewer than 3 separate rapes of children</a> by adult men posing as teenagers. 
Skout immediately suspended the service for teenagers and went through a security revamp. A month later, teens were allowed back</a>, with Skout making much of its new safety system, "advanced, proprietary algorithms" to weed out stalkers, and its long-term commitment to community safety. Given this background, the problem I found is simple but devastating. The Skout mobile application talks to Skout's servers through a simple API. When a user's profile is viewed, an unencrypted, plain-HTTP request is made to a path like this: http://i22.skout.com/services/ServerService/getProfile</code></pre> What's returned is a blob of XML containing the user's complete profile data. In fact, the profile data is too complete, including some information that is never actually used by the app. For example, we can see the user's exact date of birth: <ax213:birthdayDate>xx/xx/1995</ax213:birthdayDate></code></pre> ... but only the user's age in years is actually displayed. Most serious, however, is the high-precision location information that is returned in the ax213:homeLocation and ax213:location tags: <ax213:latitude>-xx.xxx</ax213:latitude> <ax213:longitude>xxx.xxx</ax213:longitude></code></pre> The three decimal places of precision in the co-ordinates are enough to locate a user to within about 110 meters north-south, and substantially less than that east-west depending on the distance from the equator. Here's what that looks like in a hypothetical example: </a> </div> I used mitmproxy</a> to observe Skout's traffic, but because the request is unencrypted any tool that allows you to inspect network traffic would be enough. The result is a stalker's wet dream - click on an anonymous profile, watch your network traffic, and find out exactly where the victim lives. I've also seen minors located at malls where they hang out, and at their schools... 
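As a rough check on the precision figures quoted above, here is a small sketch of the arithmetic. It is only an illustration: it assumes a spherical Earth with a mean radius of 6,371 km, and the function name `degree_step_in_meters` is made up for this example rather than taken from any Skout or mitmproxy code.

```python
import math

# Assumption: spherical Earth with the conventional mean radius.
EARTH_RADIUS_M = 6_371_000


def degree_step_in_meters(step_deg, latitude_deg):
    """Return the (north_south, east_west) size in meters of a coordinate step.

    step_deg is the smallest coordinate increment (0.001 for three decimals).
    """
    meters_per_degree = math.pi * EARTH_RADIUS_M / 180
    north_south = step_deg * meters_per_degree
    # Lines of longitude converge towards the poles, so the east-west
    # distance shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


if __name__ == "__main__":
    for latitude in (0, 40, 60):
        ns, ew = degree_step_in_meters(0.001, latitude)
        print(f"latitude {latitude}: {ns:.0f} m north-south, {ew:.0f} m east-west")
```

Under these assumptions a step of 0.001 degrees comes out at roughly 111 meters north-south at any latitude, and about half that east-west at 60 degrees from the equator.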
Given the scale of Skout's userbase and the ease with which the data can be obtained, I think there's a high likelihood that this issue has already been used for unsavoury purposes. I reported the vulnerability to Skout on the 24th of May. I'm happy to report that they immediately realised the seriousness of the situation, and their API stopped returning exact lat/long values a few hours later. Subsequent correspondence with Niklas Lindstrom, Skout's CTO, confirmed that they were taking steps to tighten security. I've encouraged Skout to speak about this publicly - their userbase needs to know about the issue, and needs to be reassured that action is being taken to ensure that this type of privacy breach won't ever recur. How mitmproxy works 2013-05-16T00:00:00+00:00 I started work on mitmproxy</a> because I was frustrated with the available interception tools. I had a long list of minor complaints - they were insufficiently flexible, not programmable enough, mostly written in Java (a language I don't enjoy), and so forth. My most serious problem, though, was opacity. The best tools were all closed source and commercial. SSL interception is a complicated and delicate process, and after a certain point, not understanding precisely what your proxy is doing just doesn't fly. The text below is now part of the official documentation</a> of mitmproxy. It's a detailed description of mitmproxy's interception process, and is more or less the overview document I wish I had when I first started the project. I proceed by example, starting with the simplest unencrypted explicit proxying, and working up to the most complicated interaction - transparent proxying of SSL-protected traffic1</a> in the presence of SNI</a>. Explicit HTTP</h2> Configuring the client to use mitmproxy as an explicit proxy is the simplest and most reliable way to intercept traffic. 
The proxy protocol is codified in the HTTP RFC</a>, so the behaviour of both the client and the server is well defined, and usually reliable. In the simplest possible interaction with mitmproxy, a client connects directly to the proxy and makes a request that looks like this: GET http://example.com/index.html HTTP/1.1</code></pre> This is a proxy GET request - an extended form of the vanilla HTTP GET request that includes a scheme and host specification, and it includes all the information mitmproxy needs to relay the request upstream. </a> </div> 1</td> The client connects to the proxy and makes a request.</td> </tr> 2</td> Mitmproxy connects to the upstream server and simply forwards the request on.</td> </tr> </tbody> </table> Explicit HTTPS</h2> The process for an explicitly proxied HTTPS connection is quite different. The client connects to the proxy and makes a request that looks like this: CONNECT example.com:443 HTTP/1.1</code></pre> A conventional proxy can neither view nor manipulate an SSL-encrypted data stream, so a CONNECT request simply asks the proxy to open a pipe between the client and server. The proxy here is just a facilitator - it blindly forwards data in both directions without knowing anything about the contents. The negotiation of the SSL connection happens over this pipe, and the subsequent flow of requests and responses is completely opaque to the proxy. The MITM in mitmproxy</h3> This is where mitmproxy's fundamental trick comes into play. The MITM in its name stands for Man-In-The-Middle - a reference to the process we use to intercept and interfere with these theoretically opaque data streams. The basic idea is to pretend to be the server to the client, and pretend to be the client to the server, while we sit in the middle decoding traffic from both sides. 
The tricky part is that the Certificate Authority</a> system is designed to prevent exactly this attack, by allowing a trusted third-party to cryptographically sign a server's SSL certificates to verify that they are legit. If this signature doesn't match or is from a non-trusted party, a secure client will simply drop the connection and refuse to proceed. Despite the many shortcomings of the CA system as it exists today, this is usually fatal to attempts to MITM an SSL connection for analysis. Our answer to this conundrum is to become a trusted Certificate Authority ourselves. Mitmproxy includes a full CA implementation that generates interception certificates on the fly. To get the client to trust these certificates, we register mitmproxy as a trusted CA with the device manually</a>. Complication 1: What's the remote hostname?</h3> To proceed with this plan, we need to know the domain name to use in the interception certificate - the client will verify that the certificate is for the domain it's connecting to, and abort if this is not the case. At first blush, it seems that the CONNECT request above gives us all we need - in this example, both of these values are "example.com". But what if the client had initiated the connection as follows: CONNECT 10.1.1.1:443 HTTP/1.1</code></pre> Using the IP address is perfectly legitimate because it gives us enough information to initiate the pipe, even though it doesn't reveal the remote hostname. Mitmproxy has a cunning mechanism that smooths this over - upstream certificate sniffing</a>. As soon as we see the CONNECT request, we pause the client part of the conversation, and initiate a simultaneous connection to the server. We complete the SSL handshake with the server, and inspect the certificates it used. Now, we use the Common Name in the upstream SSL certificates to generate the dummy certificate for the client. Voila, we have the correct hostname to present to the client, even if it was never specified. 
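The sniffing step above can be sketched with Python's standard ssl module. This is an illustration only, not mitmproxy's actual implementation: the function names `common_name` and `sniff_upstream_cn` are invented here, and the certificate dict at the bottom is hand-written to stand in for a real upstream response.

```python
import socket
import ssl


def common_name(cert):
    """Pull the Common Name out of a certificate dict in the format
    returned by ssl.SSLSocket.getpeercert()."""
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


def sniff_upstream_cn(address, port=443, timeout=5.0):
    """Connect upstream, complete the handshake, and return the Common
    Name of the certificate the server presents."""
    context = ssl.create_default_context()
    # We may only know an IP address, so skip hostname matching. The
    # chain is still validated, which getpeercert() needs in order to
    # return a populated dict.
    context.check_hostname = False
    with socket.create_connection((address, port), timeout=timeout) as sock:
        with context.wrap_socket(sock) as tls:
            return common_name(tls.getpeercert())


# A hand-written certificate dict in place of a live connection.
example_cert = {
    "subject": ((("countryName", "US"),), (("commonName", "example.com"),)),
}
print(common_name(example_cert))
```

Calling `sniff_upstream_cn("10.1.1.1")` would attempt the live version of the same lookup. A real interception proxy also has to cope with servers whose certificates don't validate, which this sketch simply treats as an error.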
Complication 2: Subject Alternative Name</h3> Enter the next complication. Sometimes, the certificate Common Name is not, in fact, the hostname that the client is connecting to. This is because of the optional Subject Alternative Name</a> field in the SSL certificate that allows an arbitrary number of alternative domains to be specified. If the expected domain matches any of these, the client will proceed, even though the domain doesn't match the certificate Common Name. The answer here is simple: when we extract the CN from the upstream cert, we also extract the SANs, and add them to the generated dummy certificate. Complication 3: Server Name Indication</h3> One of the big limitations of vanilla SSL is that each certificate requires its own IP address. This means that you couldn't do virtual hosting where multiple domains with independent certificates share the same IP address. In a world with a rapidly shrinking IPv4 address pool this is a problem, and we have a solution in the form of the Server Name Indication</a> extension to the SSL and TLS protocols. This lets the client specify the remote server name at the start of the SSL handshake, which then lets the server select the right certificate to complete the process. SNI breaks our upstream certificate sniffing process, because when we connect without using SNI, we get served a default certificate that may have nothing to do with the certificate expected by the client. The solution is another tricky complication to the client connection process. After the client connects, we allow the SSL handshake to continue until just after the SNI value has been passed to us. Now we can pause the conversation, and initiate an upstream connection using the correct SNI value, which then serves us the correct upstream certificate, from which we can extract the expected CN and SANs. There's another wrinkle here. 
Due to a limitation of the SSL library mitmproxy uses, we can't detect that a connection hasn't sent an SNI request until it's too late for upstream certificate sniffing. In practice, we therefore make a vanilla SSL connection upstream to sniff non-SNI certificates, and then discard the connection if the client sends an SNI notification. If you're watching your traffic with a packet sniffer, you'll see two connections to the server when an SNI request is made, the first of which is immediately closed after the SSL handshake. Luckily, this is almost never an issue in practice. Putting it all together</h3> Let's put all of this together into the complete explicitly proxied HTTPS flow. </a> </div> 1</td> The client makes a connection to mitmproxy, and issues an HTTP CONNECT request.</td> </tr> 2</td> Mitmproxy responds with a 200 Connection Established, as if it has set up the CONNECT pipe.</td> </tr> 3</td> The client believes it's talking to the remote server, and initiates the SSL connection. It uses SNI to indicate the hostname it is connecting to.</td> </tr> 4</td> Mitmproxy connects to the server, and establishes an SSL connection using the SNI hostname indicated by the client.</td> </tr> 5</td> The server responds with the matching SSL certificate, which contains the CN and SAN values needed to generate the interception certificate.</td> </tr> 6</td> Mitmproxy generates the interception cert, and continues the client SSL handshake paused in step 3.</td> </tr> 7</td> The client sends the request over the established SSL connection.</td> </tr> 8</td> Mitmproxy passes the request on to the server over the SSL connection initiated in step 4.</td> </tr> </tbody> </table> Transparent HTTP</h2> When a transparent proxy is used, the HTTP/S connection is redirected into a proxy at the network layer, without any client configuration being required. 
This makes transparent proxying ideal for those situations where you can't change client behaviour - proxy-oblivious Android applications being a common example. To achieve this, we need to introduce two extra components. The first is a redirection mechanism that transparently reroutes a TCP connection destined for a server on the Internet to a listening proxy server. This usually takes the form of a firewall on the same host as the proxy server - iptables</a> on Linux or pf</a> on OSX. Once the client has initiated the connection, it makes a vanilla HTTP request, which might look something like this: GET /index.html HTTP/1.1</code></pre> Note that this request differs from the explicit proxy variation, in that it omits the scheme and hostname. How, then, do we know which upstream host to forward the request to? The routing mechanism that has performed the redirection keeps track of the original destination for us. Each routing mechanism has a different way of exposing this data, so this introduces the second component required for working transparent proxying: a host module that knows how to retrieve the original destination address from the router. In mitmproxy, this takes the form of a built-in set of modules</a> that know how to talk to each platform's redirection mechanism. Once we have this information, the process is fairly straight-forward. </a> </div> 1</td> The client makes a connection to the server.</td> </tr> 2</td> The router redirects the connection to mitmproxy, which is typically listening on a local port of the same host. Mitmproxy then consults the routing mechanism to establish what the original destination was.</td> </tr> 3</td> Now, we simply read the client's request...</td> </tr> 4</td> ... and forward it upstream.</td> </tr> </tbody> </table> Transparent HTTPS</h2> The first step is to determine whether we should treat an incoming connection as HTTPS. 
The mechanism for doing this is simple - we use the routing mechanism to find out what the original destination port is. By default, we treat all traffic destined for ports 443 and 8443 as SSL. From here, the process is a merger of the methods we've described for transparently proxying HTTP, and explicitly proxying HTTPS. We use the routing mechanism to establish the upstream server address, and then proceed as for explicit HTTPS connections to establish the CN and SANs, and cope with SNI. </a> </div> 1</td> The client makes a connection to the server.</td> </tr> 2</td> The router redirects the connection to mitmproxy, which is typically listening on a local port of the same host. Mitmproxy then consults the routing mechanism to establish what the original destination was.</td> </tr> 3</td> The client believes it's talking to the remote server, and initiates the SSL connection. It uses SNI to indicate the hostname it is connecting to.</td> </tr> 4</td> Mitmproxy connects to the server, and establishes an SSL connection using the SNI hostname indicated by the client.</td> </tr> 5</td> The server responds with the matching SSL certificate, which contains the CN and SAN values needed to generate the interception certificate.</td> </tr> 6</td> Mitmproxy generates the interception cert, and continues the client SSL handshake paused in step 3.</td> </tr> 7</td> The client sends the request over the established SSL connection.</td> </tr> 8</td> Mitmproxy passes the request on to the server over the SSL connection initiated in step 4.</td> </tr> </tbody> </table> ^1 I use "SSL" to refer to both SSL and TLS in the generic sense, unless otherwise specified. </div> pathod 0.9 2013-05-16T00:00:00+00:00 I've just released pathod 0.9</a>, my toolset for crafting malicious and interesting HTTP traffic. Apart from the usual range of stability improvements and bugfixes, this release introduces a major new set of features: proxy support. 
Pathoc</a>, the client, has sprouted support for vanilla proxy connections, and is also able to tunnel through proxies using CONNECT. Pathod</a>, the server, will now respond to proxy requests as well as straight HTTP, and will treat CONNECT requests as SSL with on-the-fly generation of dummy certificates. The Pathod changes in particular open a whole new range of possibilities for fuzzing and other mischief. Any client with proxy support can be directed at Pathod, which can then impersonate the upstream server and return the creatively malicious response of your choice. There have also been some organizational changes. This is the first release based on netlib</a>, the gonzo networking library pathod now shares with mitmproxy</a>. Over the next while, pathod and mitmproxy will move closer together. As a sign of this, the major version numbers between these projects are now synchronized. mitmproxy 0.9 2013-05-15T00:00:00+00:00 </a> </div> I'm happy to announce the release of mitmproxy 0.9</a>. This is a major release, with huge improvements to mitmproxy pretty much across the board. So much has happened in the year since the last release that it's difficult to pick out the headlines. Mitmproxy is now faster, more scalable, and works in more tricky corner cases than ever before. Full transparent mode support has landed for both Linux and OSX. Content decoding is much nicer, with a slew of new targets like AMF</a> and Protocol Buffers</a>. We now have a WSGI container that allows you to host web apps right in the proxy. In addition to this, there is a myriad of new features, bugfixes and other small improvements. There are also changes afoot in the project itself. As a first step, I've moved mitmproxy from the GPLv3 to an MIT license. I hope that this will make it easier for people to use the project in more contexts. Keep an eye out for more changes along these lines soon, geared to broadening participation in the project. 
Changelog</h2> Upstream certs mode is now the default.</li> Add a WSGI container that lets you host in-proxy web applications.</li> Full transparent proxy support for Linux and OSX.</li> Introduce netlib, a common codebase for mitmproxy and pathod</a>.</li> Full support for SNI.</li> Color palettes for mitmproxy, tailored for light and dark terminal backgrounds.</li> Stream flows to file as responses arrive with the "W" shortcut in mitmproxy.</li> Extend the filter language, including ~d domain match operator, ~a to match asset flows (js, images, css).</li> Follow mode in mitmproxy ("F" shortcut) to "tail" flows as they arrive.</li> --dummy-certs option to specify and preserve the dummy certificate directory.</li> Server replay from the current captured buffer.</li> Huge improvements in content views. We now have viewers for AMF, HTML, JSON, Javascript, images, XML, URL-encoded forms, as well as hexadecimal and raw views.</li> Add Set Headers, analogous to replacement hooks. Defines headers that are set on flows, based on a matching pattern.</li> A graphical editor for path components in mitmproxy.</li> A small set of standard user-agent strings, which can be used easily in the header editor.</li> Proxy authentication to limit access to mitmproxy</li> </ul> Google, destroyer of ecosystems 2013-03-14T00:00:00+00:00 Google has finally shut down a service I actually care about - Google Reader will die a graceless, undignified death on July 1, 2013</a>. The only way Google could inconvenience me more would be to shut down search itself, and yet - I'm not angry that Google is shutting Reader down. I'm furious that they ever entered the RSS game at all. Consider this quote from a TechCrunch article in January 2006</a>. Here, Michael Arrington ends an article about the shutdown of a feed reader service with a statement that seems truly bizarre today: The RSS reader space is becoming hyper competitive, with dozens of different choices for readers. 
</blockquote> A hyper competitive space with dozens of choices? Reader made its first public appearance a couple of months before this, in October 2005. I remember this period well - it was a time of immense excitement, when RSS seemed to be the future, the news ecosystem was vibrant, and this thing called the blogosphere, fueled by peer subscription, was doubling in size every six months. It was into this magic garden that Google wandered, like a giant toddler leaving destruction in its wake. Reader was undeniably a good product, but its best quality was also its worst: it was free. Subsidized by Google's immense search profits, it never had to earn its keep, and its competitors started to die. Over time, the "hyper competitive" RSS reader market turned into a monoculture. Today, on the eve of its shutdown, RSS more or less means "Google Reader" to a large fraction of readers, to the extent where even the best feed readers on iOS are just Google Reader clients1</a>. The sudden shock of Reader's closure will harm a news ecosystem that I already believe to be deeply ill</a>. Google Reader is not just a core part of my information diet - it's also the most direct channel I have to readers of this blog. As of today, the Reader subscriber count for corte.si</a> stands at about 3 times the total number of other subscribers combined. Some of these readers will migrate to other services and stay in touch, but many will inevitably abandon the idea of direct subscription to blogs entirely. In the next few months, tens of thousands of small blogs will lose direct contact with a large fraction of their readers. The truth is this: Google destroyed the RSS feed reader ecosystem with a subsidized product, stifling its competitors and killing innovation. It then neglected Google Reader itself for years, after it had effectively become the only player. Today it does further damage by buggering up the already beleaguered links between publishers and readers. 
It would have been better for the Internet if Reader had never been at all. ^1Yes, I'm aware that there are a few hardy outliers still playing in this place. My own logs show that their reach is insignificant, though, and when I tried to shift my subscriptions about a year ago, there was nothing as good as Reader itself. Once NewsBlur's</a> servers have recovered, I definitely plan to give it another shot. </div> Things I found on GitHub: aspell custom dictionary entries 2013-02-26T00:00:00+00:00 I've been doing a series of posts looking at data gathered with ghrabber</a>, a simple tool I wrote that lets you grab files matching a search specification from GitHub. Last week, I looked at shell history</a> in the broad, and then specifically at pipe chains</a>. Today, I move on to something different - custom aspell</a> dictionaries. When aspell finds a word it doesn't recognize, the user is prompted to correct it, ignore it, or add it to a custom dictionary so that it will be recognized as correct in future. These words are written to the user's custom dictionary - a file named .aspell_en_pw that lives in the user's home directory. It turns out that 30 people have checked aspell dictionaries into GitHub, containing a total of 9501 custom words. The chart below shows the top 50 words, with the X-axis showing the percentage of files the word appeared in. </a> </div> There were a few requests for the raw data behind the previous two posts, so this time round you can also download a CSV file</a> with the occurrence totals for each word in the dataset. Things I found on GitHub: pipe chains 2013-02-22T00:00:00+00:00 Earlier this week I published ghrabber</a>, a simple tool that lets you grab files matching an arbitrary search specification from GitHub. I used ghrabber to retrieve all the bash_history and zsh_history files accidentally checked in to repos, and took a light look at the dataset with some simple graphs</a>. 
In total, I obtained 234 shell history files with 165k individual command entries. This is a very rare opportunity to "shoulder-surf", to actually see what people do at the command prompt, and perhaps get some insights into how to improve things. Along those lines, today's post looks at pipe chains - that is, compound commands that pipe the output of one command to another. The pipe operator lies at the core of the Unix command-line philosophy. The fact that we can easily compose complex operations is the reason why we are able to write small tools that "do one thing well" without losing generality. The shell history data on Github can give us some real data about what people do with composed commands, and how they do it. </a> </div> It turns out that about 2% of all commands issued on the command-line use pipes. The graph above shows the prevalence of the most common pipe chains - that is, what percentage of the users in my sample used each chain. There's a lot of fascinating stuff we can read straight from this image. Starting at the top, the first thing we notice is how widely used the ps | grep chain is. About 17% of users in my sample used this chain - given the type of data we have, the real-world prevalence would surely be higher still. I've just been extolling the virtues of small tools and composability, but in this case practicality should beat purity. I suggest that everyone should have a command-alias similar to this in their shell configuration: alias pg="ps aux | grep"</code></pre> I've added this to my .zshrc today, and I've already used it twice. Next up, we have the ls | grep pipes. The vast majority of uses here could actually be accomplished using the shell's filename generation mechanism. This ranges from simple redundancies like grepping for file extensions, to performing quite complex matching operations that could be done using the shell's advanced glob operations. 
I'm guilty of this myself - I rarely use features like recursive globbing, expansions using character ranges, case insensitive globbing, and so forth. I've brushed up on filename expansion for my chosen shell</a>, and perhaps you should too. The last thing I want to point out is a pattern that's genuinely dangerous - curl | bash, along with its cousins curl | sh and wget | sh. Unfortunately, this has become the recommended installation pattern for some tools - the vast majority of invocations here are for RVM</a> and Yeoman</a>. I don't think it's a good idea to pipe anything from the web straight into a local shell, but the situation is made particularly dire by the fact that almost half of these invocations are either over plain HTTP or explicitly turn certificate validation off. I'll stop here, although there are interesting things to say about nearly every entry in the graph above. Next week, I'll move on from the shell history sample, and look at some other juicy datasets extracted using ghrabber. Things I found on GitHub: shell history 2013-02-19T00:00:00+00:00 Github recently introduced hugely improved code search</a>, one of those rare moments when a service I use adds a feature that directly and measurably improves my life. Predictably, there was soon a flurry</a> of</a> breathless</a> stories about the security implications. This shouldn't have been news to anyone - by now, it should be clear that better search in almost any context has security or privacy implications, a law of the universe almost as solid as the second law of thermodynamics. We saw this with Google's own code search</a>, as well as Google proper</a>, Facebook's Graph Search</a> and even Bing</a>. A certain fraction of people will always make mistakes, and any sufficiently powerful search will allow bad guys to find and take advantage of the outliers. 
After the dust had settled a bit I started wondering what else we could do with Github's search - other than snookering schmucks who checked in their private keys. I'm always enticed by data, and the combination of search and the ability to download raw checked-in files seemed like a promising avenue to explore. Let's see what we can come up with. ghrabber</a> - grab files from GitHub</h2> First, some tooling. I've just released ghrabber, a simple tool that lets you grab all files matching a search specification from GitHub. Here, for instance, is an obvious wheeze - fetching all files with the extension ".key": ./ghrabber.py "extension:key"</code></pre> Downloaded files are saved locally to files named user.repository. Existing files with the same name are skipped, which means that you can reasonably efficiently stop and resume a ghrab. Shell history files</h2> I've been having a lot of fun exploring Github with ghrabber. I'll return to this in future posts - today I'll start with a quick illustration of what can be done. One type of difficult-to-find information that is sometimes checked in to repos is shell history. Two simple ghrabber commands for the two most popular shells are all we need: ./ghrabber.py "path:.bash_history"</code></pre> and ./ghrabber.py "path:.zsh_history"</code></pre> After cleaning the data a bit, I had 234 history files varying in length from 1 line to just over 10 thousand, containing a total of 165k entries. I fed this into Pandas</a> for analysis, parsing each command using a combination of hand-hacked heuristics and the built-in shlex</a> module. The remainder of this post is a light exploration of some approaches to this dataset, steering clear of the obvious and tediously well-covered security implications. </a> </div> One way to slice the data is to look at the percentage of history files a given command appears in. This gives us a nice listing of the top commands by user prevalence, which you can see in the graph on the left above. 
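For the curious, the per-file prevalence measure can be sketched in a few lines. This is a simplified stand-in that uses only the standard library rather than Pandas, and it is not the analysis code behind the graphs: the function names, the single zsh heuristic, and the tiny sample are all made up for illustration.

```python
import shlex
from collections import Counter


def first_word(line):
    """Return the command name from one history line, or None if the
    line can't be parsed."""
    # zsh extended history lines look like ": 1361234567:0;ls -l".
    if line.startswith(": ") and ";" in line:
        line = line.split(";", 1)[1]
    try:
        words = shlex.split(line)
    except ValueError:  # unbalanced quotes and similar
        return None
    return words[0] if words else None


def prevalence(histories):
    """Map each command to the percentage of history files it appears in.

    `histories` is a dict of {file name: list of history lines}."""
    counts = Counter()
    for lines in histories.values():
        # A set, so a command counts once per file however often it is used.
        commands = {first_word(line) for line in lines}
        commands.discard(None)
        counts.update(commands)
    return {cmd: 100.0 * n / len(histories) for cmd, n in counts.items()}


# Tiny made-up sample in place of the real dataset.
sample = {
    "alice.dotfiles": ["ls -l", "git status", "vim 'my file.txt'"],
    "bob.config": [": 1361234567:0;git push", "ps aux | grep python"],
}
for cmd, pct in sorted(prevalence(sample).items()):
    print(f"{cmd}: {pct:.0f}%")
```

Counting each command once per file is what turns raw frequency into prevalence: one user typing ls ten thousand times shouldn't outweigh a hundred users who each typed it once.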
On the right, I've taken the same list of commands, and checked how many invocations are preceded by a man lookup for the command. This gives us an idea of which commonly-used commands have difficult or unintuitive interfaces. It's interesting that ln is right at the top of the list, considering how simple the command syntax is. My theory is that everyone forgets the order of the source and target files. </a> </div> </a> </div> Since we have a list of the most widely used commands, it's also trivial to do silly popularity comparisons. Above is the obvious look at the state of the editor wars (vim is winning, folks), and a check on how tmux</a> is doing in supplanting screen (the faster the better). </a> </div> </a> </div> </a> </div> </a> </div> Another interesting thing to do is to look at the most commonly used flags to commands. I think having "real data" of command use may well guide us to design better command-line interfaces. I'd love to know the most common invocation flags for some of the tools I write. I'll stop there. The data pool in this case is very deep, and there are a huge range of interesting bits of command-line ethnography that could be done. Stay posted for more in the coming weeks. The trouble with social news 2013-01-24T00:00:00+00:00 There is something terribly awry with the social news ecosystem. This is a feeling that's been growing on me over the last few years, and is the reason why I've cut both Reddit</a> and Hacker News</a> (who together constitute pretty much all of "social news") out of my information diet. Although I've mulled over things in various conversations, I've never actually tried to put my feeling of unease in writing, until today. What's spurring me into action is a proposal by Yann LeCun</a> that a model similar to social news be adopted for scientific peer review - self-assembled Reviewing Entities voting on streams of submitted papers, regulated by a reputation system for authors and reviewers. 
Basically, this is science a la Reddit: complete with subreddits, karma and upboats. I find the idea frankly terrifying. I guess it's time, then, to put finger to keyboard and lay out what disquiets me about social news. Karma Corrupts</h2> You start by introducing a reputation mechanism like karma</a> to improve some outcome - say, to increase the quality of comments, or to apply a threshold to restrict voting to trustworthy community members. This seems like a plausible and even elegant mechanism at first, until you discover the terrible side-effects. Humans are fundamentally status-seeking social apes, and you've now introduced a visible measure of social worth that people will be driven to maximize. In the real world, we have a word for those who spend their lives accumulating karma - we call them politicians. And so, within karma communities, we see the rise of a political class - persuasive centrists who cater (perhaps unconsciously) to a constituency, and who express (perhaps eloquently) opinions calculated to appeal to the masses and avoid controversy. Hacker News and many subreddits are dominated by people like this, whose comments are largely predictable and rarely add anything new or unexpected to the conversation. At the bottom end of the food chain, we have a different class of creature with the same basic aim as the politicians, but without the persuasive charm needed to pull off the political approach. These are the karma whores, who use a mixture of frank pandering, provocation and calculated outrage to achieve the same aims. The karma maximization game often acts contrary to the goals we aimed to achieve by introducing karma in the first place: the tenor of the community suffers, the diversity of opinion declines, and the karma whores post pictures of their cats everywhere. The Lossy Sieve</h2> Go and have a look at the new story submission queue</a> on Hacker News. 
Scroll through a few pages, and pay attention to the stories stuck at one vote - they will most likely never receive another upvote and will die in obscurity. Now, go look at the front page</a>. When I do this exercise I'm struck by the fact that there's plenty of crap on the front page, and quite a bit of good stuff in the submission queue languishing in obscurity. So, quality can't be the sole metric here - what determines what gets onto the front page and what doesn't? Let's try a thought experiment. First, set up a small number of voting accounts - say, 10 or so. Now, in the new submission queue, pick 5 random stories every hour, and give them a small number of upvotes soon after they are submitted. I predict that you will find that stories that received this small initial boost are vastly more likely to end up on the front page. If I'm right, then chance dominates story selection - as long as an article exceeds some basic quality threshold, it all depends on who happens to see the story soon after it is submitted, and whether the spirit moves them to vote. Note that this is not the case at the extremes - frankly bad content won't be upvoted, and really important stories will usually find their way to the top. The lossy sieve phenomenon affects everything in between. What this boils down to is that social news doesn't provide an effective filter - good content gets lost, and mediocre content finds its way onto our screens. The Pinhole Effect</h2> In social news, the front page is king. Most users never go beyond the first or second page of top stories. However, front-page real estate is incredibly limited compared to the volume of submissions on most popular subreddits and on Hacker News. The effect of this is that we're looking at a fast-flowing river of information through a pinhole. 
Even assuming that the selection mechanism works flawlessly, what you see on the front page is a small sliver of the total, chosen through a consensus mechanism that takes no account of individual variation in tastes and interests. The news you see is not tailored to you - it's tailored to some abstract, average participant, with all the rough edges of individuality smoothed away. The effect of this is that even at its best, the stories that emerge from the social news system feel like a predictable pablum dished up by the hivemind. The subreddit system tries to improve this by allowing communities to self-assemble around interests, but the pinhole effect still dominates in busy subreddits like /r/programming</a>. Gaming The System</h2> Social news systems are eminently gameable, and cheating is rife. Part of the reason for this is that a story's destiny depends on a relatively small number of votes. If your story has any merit at all, you significantly increase the likelihood that it will end up on the front page by giving it a small nudge at the beginning of its life. If it has no merit whatsoever, you can still force it onto people's screens with a few tens or hundreds of votes. Conversely, you can use the same effect to censor and oppress views you disagree with if your social news site has downvotes. Anyone who's kept an eye on these things can rattle off examples of gaming in action: the voting rings</a>, the "social media consultants"</a>, the vigilante thought-polizei</a>, the political operators</a>, and dozens of other types of manipulation and villainy. What's more - these visible scandals are just the tip of the iceberg. Eyeballs are valuable, and there's an active arms race with social news sites on the one side, and a dark army of spammers, scammers and true believers on the other. How much of what we see is affected by this type of cheating? We just don't know, but my suspicion is that the effect is significant. 
The point here is broader than any particular instance of gaming. It's that social news sites are structurally susceptible to manipulation in ways that can't be fixed without changing the core of their operation. A system like this might be good enough to deliver rage comics</a>, but I feel queasy trusting it any further. Community Collapse Disorder</h2> My final beef with social news is a problem that it shares with pretty much all online communities, especially technical ones. We're all familiar with the life-cycle of technical forums. They start with a small community of insiders who create value, which then attracts more people to participate, which then dilutes the quality of the contributions (and often introduces a few pathological bad actors), which then causes the good contributors to move on, which causes the magic well to dry up. Everyone then takes their toys and moves to the next community, and the cycle repeats. We saw this with Usenet and the original C2 wiki, and we are seeing it now with Hacker News and many technical subreddits, all at various points in this life-cycle. I believe that Community Collapse Disorder is one of the Big Problems online that we don't yet have a satisfactory solution to. People are trying, though. Hacker News, for instance, seems to be rather poignantly aware of its own decline</a>, with some of the best of the old-timers calling for an alternative</a>. Paul Graham himself recognizes the issue, and has been tweaking things in various ways to combat the phenomenon, without much success. At the moment, we just don't know how to build online communities that are both inclusive and stable. Democracy, here, seems to lead inevitably to decline, and social news sites are no exception. A better way forward?</h2> A big part of the reason I don't use social news anymore is that my existing social networks have become so much more effective at turning up good content. 
The absolute best source of news for me is simply the set of links shared by the folks I follow on Twitter</a>. I follow people who post interesting content, and whom I trust to act as information filters for me. Most of them share my technical interests, but some are interesting because they are from my home town, or because they share some more esoteric pursuit with me. So, the news stream I see is exactly tailored to me. At the same time, there is also room for idiosyncrasy - if someone I follow shares something left-field that tickles their fancy, I'll see it. In turn, I try to be a responsible information filter for those who follow me - I find a link or two worth tweeting on most days. There are still things I miss - Twitter is great for sharing links, but is an awful medium for technical discussion. Google+</a> could be a better alternative, but just doesn't seem to have achieved liftoff for me. I would also love better tools for aggregating and harvesting links from my social network. At the moment I use Flipboard</a> and Prismatic</a>, but I have issues with both. On the whole, though, these are quibbles. It seems to me that using social networks to filter news is a better way forward - if I was tackling the social news problem, I'd be building tools to support this process. Go: a nice language with an annoying personality 2013-01-18T00:00:00+00:00 Last week, I had the pleasure of attending Dropbox</a>'s annual company hack fest</a>. It was a great opportunity to get a look at how Dropbox works internally, and mingle with the smart and driven folks who make one of my favourite products. In the spirit of hack week, my friend @alexdong</a> and I decided to do our project in Go. We'd both wanted to explore the language, but had never quite been able to make time - a week-long code holiday seemed to be the perfect opportunity. 
I was hopeful that Go would turn out to hit a magical sweet spot: a light set of abstractions hugging close to the machine, while still providing the indoor plumbing and civilized conveniences of life that I had grown used to with languages like Python. Five days of furious hacking later, I can report that Go might well deliver on this promise, but has enough annoying personality quirks that I will think twice about basing any more projects on it. My main beef with Go has nothing to do with fundamental language design, and may seem almost inconsequential at first glance. The Go compiler treats unused module imports and declared variables as compile errors. This is great in theory and is something you might well want to enforce before code can be committed, but during the actual process of producing code it's nothing but an irksome, unnecessary pain in the ass. Let's look at a concrete example, starting with a snippet of code as follows 1</a> import ( "io/ioutil" ) ... ... m, err := ioutil.ReadFile(path) if err != nil { return nil, err } ... ... DoSomething(m)</code></pre> I'm a firm believer that printing stuff to screen is a programmer's best debugging tool, so say we're hacking away and want to print the value of m while running our unit tests. We change the code as follows, adding an import for the "fmt" module and a call to Print: import ( "io/ioutil" "fmt" ) ... ... m, err := ioutil.ReadFile(path) if err != nil { return nil, err } fmt.Print(m) ... ... DoSomething(m)</code></pre> Now we keep hacking, and want to comment out the print statement for a moment like so: import ( "io/ioutil" "fmt" ) ... ... m, err := ioutil.ReadFile(path) if err != nil { return nil, err } //fmt.Print(m) ... ... DoSomething(m)</code></pre> This is a compile error. We have to switch contexts, move to the top of the module, also comment out the import, and then move back to the spot we're really hacking on: import ( "io/ioutil" //"fmt" ) ... ... 
m, err := ioutil.ReadFile(path) if err != nil { return nil, err } //fmt.Print(m) ... ... DoSomething(m)</code></pre> A few seconds later, we want to re-enable the Print statement - so up we go again to the top of the module to re-enable the import. This is even worse when we want to, say, comment out the DoSomething call while hacking: import ( "io/ioutil" ) ... ... m, err := ioutil.ReadFile(path) if err != nil { return nil, err } ... ... //DoSomething(m)</code></pre> This is also a compile error because now m is unused. We have to hunt up in our code to find the declaration, which could be explicit or implicit using an := assignment. So, in this case we find the declaration, and use the magic underscore name to throw the offending value away: import ( "io/ioutil" ) ... ... _, err := ioutil.ReadFile(path) if err != nil { return nil, err } ... ... //DoSomething(m)</code></pre> That should fix it, right? Well, no. It turns out we've previously declared and used err (a very common idiom), so this is still a compile error. We're using the "declare and assign" syntax, but have no new variables on the left-hand side of the ":=". So we need to make another tweak: import ( "io/ioutil" ) ... ... _, err = ioutil.ReadFile(path) if err != nil { return nil, err } ... ... //DoSomething(m)</code></pre> Five seconds later, we want to re-enable DoSomething, and now we have to unwind the entire process. The cumulative effect of all this is like trying to write code while someone next to you randomly knocks your hands off the keyboard every few seconds. It's a pointlessly pedantic approach that adds constant friction to your write-compile-test cycle, breaks your flow, and just generally makes life a little harder for very little benefit. There's no way to turn this mis-feature off, no flag we can pass to the compiler to temporarily make this a warning rather than an error while hacking2</a>. The irony of the situation is that I agree with the sentiment behind this. 
I don't want dangling variables or imports in my codebase. And I agree that if something is worth warning about it's worth making it an error. The mistake is to confuse the state we want at the conclusion of a unit of hacking3</a>, with what we need at every point in between, during the write-compile-test cycle. This cycle is the core of the process of actually producing code, and the exhilarating sense of weightlessness</a> that you get when hacking in Python is largely due to the fact that the language works really, really hard to optimize this process. Go has given away this feeling of exhilaration, basically for nothing. Despite all this, it's still possible that the benefits of Go do outweigh its irritating personality. Interfaces, memory management, first-class concurrency and static type checking are a knockout combination, and the language in general has something of the taut practicality that I love in C. So, despite the rantiness of this post, I'll keep hacking on our project and make sure I produce a few thousand more lines of code before making a final call on the language. Look for a project release and a blog post along these lines in the coming months. 1: Ellipses indicate "an arbitrary amount of intervening code". </div> 2: I edited this paragraph a bit for tone. I originally accused the Go documentation of being faintly smug about all of this - which is not fair, and doesn't add anything to the argument. </div> 3: Why don't we have a word for this? By "unit of hacking", I mean the work that goes on between starting to hack on a change-set and doing a commit. At the beginning and at the end, the code is in a clean state, but in between there are many periods of transition where cleanliness requirements are relaxed. </div> Released: pathod 0.3 2012-11-16T00:00:00+00:00 I've just released pathod 0.3</a>, which beefs up pathoc</a>'s fuzzing capabilities, improves the spec language and includes lots of bugfixes and other small tweaks. 
Get it while it's hot! Better fuzzing</h2> A major focus of this release is to improve pathoc</a>'s capabilities as a basic fuzzing tool. I've had fun breaking webservers</a> with pathoc, and it's even come in handy in my Day Job. Here's a quick summary of how things have changed. The -e flag tells pathoc to explain its requests. This prints out an expanded pathoc query specification, with all randomly generated content and query modifications resolved. If you trigger an exception, you can precisely replay the offending query using this explanation.</li> The options for outputting requests and responses have been expanded hugely. First, the -q and -r flags tell pathoc to dump complete records of requests and responses respectively. This data is sniffed by instrumenting the socket, so is canonical regardless of our ability to interpret returned data. The -x option makes pathoc dump this data in hexdump format (otherwise unprintable characters are escaped to preserve your terminal).</li> A number of options have been added to let you ignore expected responses. -C takes a comma-separated list of response codes to ignore. -T ignores server timeouts. This lets you home in on the exceptional responses that you care about, and ignore the rest.</li> </ul> Language improvements</h2> I've simplified response specifications by making the response message a standard component with the "r" mnemonic.</li> I've added the "u" mnemonic to request specifications, as a shortcut for specifying the User-Agent header:</li> </ul> get:/:u"My Weird User-Agent"</code></pre> We also have a small library of representative User-Agent strings that can be used instead of specifying your own. 
For example, this specifies the GoogleBot User-Agent string: get:/:ug</code></pre> The list of available shortcuts is in the docs, and can be listed from the commandline using the --show-uas flag to pathoc: > ./pathoc --show-uas User agent strings: a android l blackberry b bingbot c chrome f firefox g googlebot i ie9 p ipad h iphone s safari</pre></code></pre> pathoc: break all the Python webservers! 2012-09-27T00:00:00+00:00 A few months ago, I announced pathod</a>, a pathological HTTP daemon. The project started as a testing tool to let me craft standards-violating HTTP responses while working on mitmproxy</a>. It soon became a free-standing project, and has turned out to be incredibly useful in security testing, exploit delivery and general creative mischief. In the last release, I added pathoc - pathod's malicious client-side twin. It does for HTTP requests what pathod does for HTTP responses, and uses the same hyper-terse specification language</a>. In this post, I show how pathoc can be used as a very simple fuzzer, by finding issues in a number of major pure-Python webservers. None of the tested servers failed catastrophically - they all caught the unexpected exception and continued serving requests. Nonetheless, I think it's reasonable to say that we've triggered a bug if a) the server returns a 500 Internal Server Error response or terminates the connection abnormally, and b) we see a traceback in our logs. In fact, by this definition, I found bugs in every pure-Python server I tested. All of the problems I list below are simple failures of validation - what they have in common is that somewhere in the project code is called with input that it doesn't expect and can't handle. This matters - in fact, I'd argue that the majority of security problems fall in this category. It's interesting to ponder why this type of issue is so ubiquitous in Python servers. 
I have no doubt that part of the answer lies in Python's use of exceptions - errors that would be explicit in other languages can be implicit in Python, and code that seems clean and intuitive might in fact be buggy. I think this is especially relevant right now, given the recent flurry of discussion surrounding the Go language</a> and its error handling. It's pretty instructive to read Russ Cox's recent riposte</a> to this post</a> criticizing Go's explicit approach, while looking at the bugs below. I love Python</a> and I think it's a fine language, but I also think the designers of Go probably made the right choice. Basic fuzzing with pathoc</h2> My methodology for these tests was very simple indeed. I launched each server in turn, and used pathoc to fire corrupted GET requests at the daemon until I saw an error. I then looked at the logs, and boiled the distinct cases down to a minimal pathoc specification by hand. This exercises a rather shallow set of features in the server software - mostly parsing of the HTTP lead-in and request headers. It's possible to give software a much, much deeper workout with pathoc, but I'll leave that for a future post. My pathoc fuzzing command looked something like this: pathoc -n 1000 -p 8080 -t 1 localhost 'get:/:b@10:ir,"\x00"'</code></pre> The most important flags here are -n, which tells pathoc to make 1000 consecutive requests, and -t, which tells pathoc to time out after one second (necessary to prevent hangs when daemons terminate improperly). The request specification itself breaks down as follows: get</td> Issue a GET request</td> </tr> /</td> ... to the path / </td> </tr> b@10</td> ... with a body consisting of 10 random bytes </td> </tr> ir,"\x00"</td> ... and inject a NULL byte at a random location.</td> </tr> </table> It's that last clause - the random injection - that makes the difference between simply crafting requests and basic fuzzing. 
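To make the idea of random injection concrete, here is a minimal Python sketch. It is not pathoc's implementation: the function name `inject_at_random` and the sample request bytes are made up for illustration, and it only shows the general technique of splicing a payload into a message at a randomly chosen offset.

```python
import random


def inject_at_random(message: bytes, payload: bytes, rng: random.Random) -> bytes:
    """Return a copy of message with payload spliced in at a random offset.

    Hypothetical helper for illustration only; pathoc's real logic may differ.
    """
    # randrange(len + 1) also allows injecting at the very end of the message.
    offset = rng.randrange(len(message) + 1)
    return message[:offset] + payload + message[offset:]


# An illustrative, hand-written request; not captured from pathoc.
base_request = b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n"

# A seeded generator makes a run reproducible, which helps when replaying a crash.
rng = random.Random(0)
variants = [inject_at_random(base_request, b"\x00", rng) for _ in range(5)]

for variant in variants:
    print(repr(variant))
```

Seeding the generator is a design choice made here for repeatability; a fuzzer could equally log each generated request so that a failing case can be replayed.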
Every time a new request is issued, the injection occurs at a different location. I varied the injected character between a NULL byte, a carriage return and a random alphabet letter. Each exposed different errors in different servers. For a complete description of the specification language, see the online docs</a>. Results</h2> For each bug, I've given a traceback and a minimal pathoc call to trigger the issue. The tracebacks have been edited lightly to shorten file paths and remove irrelevances like timestamps. CherryPy</h3> pathoc -p 8080 localhost 'get:/:b@10:h"Content-Length"="x"'</code></pre>ENGINE ValueError("invalid literal for int() with base 10: 'x'",) Traceback (most recent call last): File "cherrypy/wsgiserver/wsgiserver2.py", line 1292, in communicate req.parse_request() File "cherrypy/wsgiserver/wsgiserver2.py", line 591, in parse_request success = self.read_request_headers() File "cherrypy/wsgiserver/wsgiserver2.py", line 711, in read_request_headers if mrbs and int(self.inheaders.get("Content-Length", 0)) > mrbs: ValueError: invalid literal for int() with base 10: 'x'</code></pre>pathoc -p 8080 localhost 'get:/:i4,"\r"'</code></pre>ENGINE TypeError("argument of type 'NoneType' is not iterable",) Traceback (most recent call last): File "cherrypy/wsgiserver/wsgiserver2.py", line 1292, in communicate req.parse_request() File "cherrypy/wsgiserver/wsgiserver2.py", line 580, in parse_request success = self.read_request_line() File "cherrypy/wsgiserver/wsgiserver2.py", line 644, in read_request_line if NUMBER_SIGN in path: TypeError: argument of type 'NoneType' is not iterable</code></pre>Tornado</h3> pathoc -p 8080 localhost 'get:/:b@10:h"Content-Length"="x"'</code></pre>[E 120927 11:42:26 iostream:307] Uncaught exception, closing connection. 
Traceback (most recent call last): File "tornado/iostream.py", line 304, in wrapper callback(*args) File "tornado/httpserver.py", line 254, in _on_headers content_length = int(content_length) ValueError: invalid literal for int() with base 10: 'x' [E 120927 11:42:26 ioloop:435] Exception in callback <tornado.stack_context._StackContextWrapper object at 0x1012e28e8> Traceback (most recent call last): File "tornado/ioloop.py", line 421, in _run_callback callback() File "tornado/iostream.py", line 304, in wrapper callback(*args) File "tornado/httpserver.py", line 254, in _on_headers content_length = int(content_length) ValueError: invalid literal for int() with base 10: 'x'</code></pre>pathoc -p 8080 localhost 'get:/:h"h\r\n"="x"'</code></pre>[E iostream:307] Uncaught exception, closing connection. Traceback (most recent call last): File "tornado/iostream.py", line 304, in wrapper callback(*args) File "tornado/httpserver.py", line 236, in _on_headers headers = httputil.HTTPHeaders.parse(data[eol:]) File "tornado/httputil.py", line 127, in parse h.parse_line(line) File "tornado/httputil.py", line 113, in parse_line name, value = line.split(":", 1) ValueError: need more than 1 value to unpack [E ioloop:435] Exception in callback <tornado.stack_context._StackContextWrapper object at 0x1012bd7e0> Traceback (most recent call last): File "tornado/ioloop.py", line 421, in _run_callback callback() File "tornado/iostream.py", line 304, in wrapper callback(*args) File "tornado/httpserver.py", line 236, in _on_headers headers = httputil.HTTPHeaders.parse(data[eol:]) File "tornado/httputil.py", line 127, in parse h.parse_line(line) File "tornado/httputil.py", line 113, in parse_line name, value = line.split(":", 1) ValueError: need more than 1 value to unpack</code></pre>Twisted</h2> pathoc -p 8080 localhost 'get:/:b@10:h"Content-Length"="x"'</code></pre>[HTTPChannel,4,127.0.0.1] Unhandled Error Traceback (most recent call last): File "twisted/python/log.py", line 84, in 
callWithLogger return callWithContext({"system": lp}, func, *args, **kw) File "twisted/python/log.py", line 69, in callWithContext return context.call({ILogContext: newCtx}, func, *args, **kw) File "twisted/python/context.py", line 118, in callWithContext return self.currentContext().callWithContext(ctx, func, *args, **kw) File "twisted/python/context.py", line 81, in callWithContext return func(*args,**kw) --- <exception caught here> --- File "twisted/internet/selectreactor.py", line 150, in _doReadOrWrite why = getattr(selectable, method)() File "twisted/internet/tcp.py", line 199, in doRead rval = self.protocol.dataReceived(data) File "twisted/protocols/basic.py", line 564, in dataReceived why = self.lineReceived(line) File "twisted/web/http.py", line 1558, in lineReceived self.headerReceived(self.__header) File "twisted/web/http.py", line 1580, in headerReceived self.length = int(data) exceptions.ValueError: invalid literal for int() with base 10: 'x'</code></pre>SimpleHTTP</h2> pathoc -p 8080 localhost 'get:"/\0"'</code></pre>Exception happened during processing of request from ('127.0.0.1', 54029) Traceback (most recent call last): File "lib/python2.7/SocketServer.py", line 284, in _handle_request_noblock self.process_request(request, client_address) File "lib/python2.7/SocketServer.py", line 310, in process_request self.finish_request(request, client_address) File "lib/python2.7/SocketServer.py", line 323, in finish_request self.RequestHandlerClass(request, client_address, self) File "lib/python2.7/SocketServer.py", line 638, in __init__ self.handle() File "python2.7/BaseHTTPServer.py", line 340, in handle self.handle_one_request() File "lib/python2.7/BaseHTTPServer.py", line 328, in handle_one_request method() File "lib/python2.7/SimpleHTTPServer.py", line 44, in do_GET f = self.send_head() File "lib/python2.7/SimpleHTTPServer.py", line 68, in send_head if os.path.isdir(path): File "lib/python2.7/genericpath.py", line 41, in isdir st = os.stat(s) TypeError: 
must be encoded string without NULL bytes, not str</code></pre>Waitress</h3> pathoc -p 8080 localhost 'get:/:i16," "'</code></pre>ERROR:waitress:uncaptured python exception, closing channel <waitress.channel.HTTPChannel connected 127.0.0.1:62330 at 0x1007ca310> ( <type 'exceptions.IndexError'>:list index out of range [lib/python2.7/asyncore.py|read|83] [lib/python2.7/asyncore.py|handle_read_event|444] [lib/python2.7/site-packages/waitress/channel.py|handle_read|169] [lib/python2.7/site-packages/waitress/channel.py|received|186] [lib/python2.7/site-packages/waitress/parser.py|received|99] [lib/python2.7/site-packages/waitress/parser.py|parse_header|158] [lib/python2.7/site-packages/waitress/parser.py|get_header_lines|247] )</code></pre> Edit: The first version of this post had examples that were due to the test WSGI application, not waitress. I've replaced them with the traceback above, which has been reformatted for clarity. Werkzeug</h3> pathoc -p 8080 localhost 'get:/:h"Host"="n\r\0"'</code></pre>Traceback (most recent call last): File "flask/app.py", line 1518, in __call__ return self.wsgi_app(environ, start_response) File "flask/app.py", line 1507, in wsgi_app return response(environ, start_response) File "/usr/local/lib/python2.7/site-packages/werkzeug/wrappers.py", line 1082, in __call__ app_iter, status, headers = self.get_wsgi_response(environ) File "werkzeug/wrappers.py", line 1070, in get_wsgi_response headers = self.get_wsgi_headers(environ) File "werkzeug/wrappers.py", line 986, in get_wsgi_headers headers['Location'] = location File "werkzeug/datastructures.py", line 1132, in __setitem__ self.set(key, value) File "werkzeug/datastructures.py", line 1097, in set self._validate_value(_value) File "werkzeug/datastructures.py", line 1065, in _validate_value raise ValueError('Detected newline in header value. This is ' ValueError: Detected newline in header value. 
This is a potential security problem</code></pre> Limits of data visualization with space filling curves 2012-09-20T00:00:00+00:00 I recently wrote a series</a> of posts</a> using the Hilbert curve</a> to visualize binaries, culminating in a gallery showing regions of high entropy in malware</a>. </a> </div> The fact that the Hilbert curve has excellent locality preservation means that one dimensional features are preserved (as much as they can be) in the two-dimensional layout. This lets us visually pick out features of interest, and makes it possible, for instance, to quickly identify different malware packers just based on their layout characteristics. An obvious next step is to ask if it's possible to extend this idea to let us visually compare binaries, creating a sort of visual diff. Unfortunately, we now bump our heads against the limitations of space-filling curve visualization. I made the animation below after a recent conversation along these lines, and I think it illustrates the main issues nicely. It shows a single contiguous stretch of data (the black area) being shifted progressively through a binary. At each timestep, the only thing that changes is the starting location of the data block: </a> </div> Two things are immediately clear: The block of data doesn't retain its shape at different offsets - identical stretches of data can look totally different depending on their locations.</li> There's no way to quickly see where in the binary a piece of information lies. Unless you are very familiar with the particular curve and know its exact orientation, you can't say, for instance, when the data block lies a third of the way through the binary.</li> </ul> It's often worthwhile to trade off these things for locality preservation, but it definitely scotches certain use cases. I do wonder if it might be possible to tune the trade-off somewhat - sacrificing some locality preservation for better shape retention and offset estimation. 
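The shape-instability point can be checked numerically. The sketch below uses the widely published iterative distance-to-coordinate mapping for the Hilbert curve; the 8x8 grid, the block length of 16 and the helper names are assumptions chosen for illustration, not taken from the visualization code behind these posts.

```python
def d2xy(n, d):
    """Map distance d along a Hilbert curve to (x, y) on an n x n grid.

    n must be a power of two. This is the standard iterative conversion.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the sub-square before transposing it.
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def bounding_box(n, start, length):
    """Width and height of the region covered by a contiguous run of data."""
    cells = [d2xy(n, d) for d in range(start, start + length)]
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    return max(xs) - min(xs) + 1, max(ys) - min(ys) + 1


# The same 16-cell block, placed at two different offsets on an 8x8 curve.
print(bounding_box(8, 0, 16))  # aligned block: a compact 4 x 4 square
print(bounding_box(8, 8, 16))  # shifted by 8: an L-shaped region, 4 x 6
```

An aligned run fills a square exactly, while the same run shifted by half its length straddles two quadrants and takes on a different outline, which is the effect the animation shows.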
I've toyed with some ideas along these lines (see the unrolled layouts in the binary visualization post</a>), but I still don't have a satisfying solution. If anyone out there knows of one, drop me a line. Finding the UDID leak: a guessing game 2012-09-07T00:00:00+00:00 It's become quite a popular parlor game to guess who is responsible for the recent Antisec UDID leak. I've now seen no fewer than six separate apps named as the probable source (two of which came from Marco Arment</a>). Before we pick the next culprit, I think it's worth taking a step back to consider the list of things we don't know: We don't know that we're dealing with just one source. The Antisec dump may well be an amalgam of data from various sources.</li> We don't know that we're looking for just one app, or even a set of apps by one developer. The leak may well come from one of the myriad of 3rd party services which could be included in thousands of apps.</li> We don't know that Antisec is being truthful about the scale of the database, or the additional data they claim is associated with the UDID/APNS records.</li> We certainly don't know that the data was filched from an FBI laptop or that the NCFTA was in any way involved.</li> </ul> Given all of these unknowns, I think a simple process-of-elimination approach to tracking down the leak will probably be fruitless, or worse, result in the finger being pointed at even more innocent parties. The one entity that may already have the answer to this question is Apple. They have a list of a million affected UDIDs, and they presumably have records of all apps that have ever used the associated push tokens. Given a large and precise sample like this, it should be possible to find the origin(s) of the leak reasonably easily. Indeed, if Apple is on the ball they may already have done this. Now for some frank speculation of my own. 
Let's assume for a moment that Antisec has been entirely truthful about the data, and that we're dealing with a single source. In that case, we're looking for: ... an app or third-party service integrated into multiple apps</li> ... with 12 million or more users</li> ... that is APNS-enabled</li> ... which also gathers user data like real names and zip codes.</li> </ul> I'll throw my hat in the ring and say that my money is on a third-party service, not a single app. If my hunch is right, the list of possible culprits is actually rather short. The UDID leak is a privacy catastrophe 2012-09-04T00:00:00+00:00 Something I've been worrying about for a long time has just happened: Antisec has leaked a database with more than a million UDIDs</a>. The UDID issue has been a bit of a white whale of mine - I've written many blog posts about it and spent more hours than I care to think about negotiating responsible disclosure with companies misusing UDIDs. Let's recap some of the posts I've written about this: In May 2011</a>, just before its sale to Gree was announced, I showed that OpenFeint</a> was misusing UDIDs in a way that allowed you to link a UDID to a user's identity, geolocation and Facebook and Twitter accounts. Although I didn't discuss it openly at the time, you could also completely take over an OpenFeint account, and access chat, forums, friends lists, and more using just a UDID. This resulted in a class-action lawsuit against OpenFeint, which has since petered out.</li> Later that month</a>, I published a survey looking at how UDIDs are used in practice. The data is now slightly out of date, but shows just how widely UDIDs are used and misused.</li> In September 2011</a>, I published the most troubling news so far, which paradoxically also got the least coverage in the press. I looked at all the gaming social networks on iOS - basically OpenFeint and its competitors - and found catastrophic mismanagement by nearly everyone. 
The vulnerabilities ranged from de-anonymization, to takeover of the user's gaming social network account, to the ability to completely take over the user's Facebook and Twitter accounts using just a UDID.</li> </ul> As serious as these problems are, I'm afraid they're just the tip of the iceberg. Negotiating disclosure and trying to convince companies to fix their problems has taken literally months of my time, so I've stopped publishing on this issue for the moment. It's disheartening to say it, but some of the companies mentioned in my posts still have unfixed problems (they were all notified well in advance of any publication). I will also note ominously that I know of a number of similar vulnerabilities elsewhere in the iOS app ecosystem that I've just not had the time to pursue. When speaking to people about this, I've often been asked "What's the worst that can happen?". My response was always that the worst case scenario would be if a large database of UDIDs leaked... and here we are. Defiler 2012-08-26T00:00:00+00:00 I've been living out of a bag for the last 3 weeks, working hard on a series of intense but fun audits. After running in high gear for a while I find that I need a mental palate cleanser - something to help me refocus and stop me from getting snowblind. I then grab my camera, strap on my macro rig, and walk out the door to try to catch the local wildlife in the act. It's become a bit of a game - the aim is to catch creatures in their natural setting and leave them completely undisturbed when I go, with no posing, prodding or other disturbances. Getting a usable shot of a 5mm target sitting on a twig swaying in the wind is a fun challenge. Today I find myself in Sydney, working in a part of the town that is shot through with unreasonably beautiful walking tracks. The place is also blessed with a huge diversity of invertebrate life that makes my adopted home town</a> seem barren by comparison. 
I walked along a nearby track until I found a quiet, leafy spot, geared up, and leopard-crawled through the underbrush. Not long after, I came face-to-face with this imposing little chap sitting on the tip of a fern frond. </a> </div> This is a Lymantriid</a> caterpillar of some variety, probably one of the tussock moths native to Australia. "Lymantria" means "defiler" - some species of this family can cause huge damage to foliage, and are considered to be destructive pests. So much so, that when a single male Gypsy Moth</a> (Lymantria dispar) was discovered in Hamilton, New Zealand, they sprayed the entire city with a caterpillar-specific bacterial insecticide</a>. No need for drastic measures with this particular fellow, though - he's native to this ecosystem, and the only pest is me and my camera. He was head down munching away when I found him, and paid absolutely no attention to me when I moved in close to get these shots. He's got reason to be cocksure, too - those tufts of hair on his back contain hollow, poison-filled spines that can cause a pretty unpleasant reaction when touched. </a> </div> A few hours exploring and photographing is a very effective brain-cleaner, leaving me ready to deal with spiny, venomous defilers of the digital variety. pathod 0.2: the daemon gets an evil twin 2012-08-22T00:00:00+00:00 I've just pushed pathod 0.2 out the door. This is a huge release, with many new features: pathoc</a>, pathod's evil client-side twin.</li> libpathod.test</a>, a framework for using pathod in your unit tests.</li> Improved mini language</a>, including many new abilities and improvements.</li> A rewrite of the networking core.</li> </ul> The project also has a new website at pathod.net</a>. Yes, pathod is now self-hosting, so you can try out both pathod and pathoc specifications right on the website. There's also a new public pathod instance</a>, which I'm sure everyone will use entirely responsibly. 
Introducing pathod: a pathological HTTP server 2012-05-01T00:00:00+00:00

I've just released pathod</a>, a pathological HTTP/S daemon useful for testing and torturing HTTP clients. At its core is a tiny, terse language for crafting HTTP responses. It also has a built-in web interface that lets you play with the response spec language, inspect logs, and access pathod's full help document.

The rest of this post is a quick teaser showing some of pathod's abilities. See the detailed documentation on the pathod site</a> if you want more.

The simplest possible response</h2>

The easiest way to craft a response is to specify it directly in the request URL. Let's start with the simplest possible example. Start pathod, and then visit this URL:

http://localhost:9999/p/200</code></pre>

The "/p/" path is the location of the response generator in pathod's default configuration - everything after that is a response specification in pathod's mini-language. The general form of a response spec is as follows:

code[MESSAGE]:[colon-separated list of features]</code></pre>

In this case, we're specifying only the HTTP response code - that is, an HTTP 200 OK with no headers and no content, resulting in a response like this:

HTTP/1.1 200 OK</code></pre>

Specifying features</h2>

One example of a "feature" is a response header. Let's embellish our response by adding one:

200:h"Etag"="foo"</code></pre>

The first letter of the feature - "h", in this case - is a mnemonic indicating the type of feature we're adding. The full response to this spec looks like this:

HTTP/1.1 200 OK
Etag: foo</code></pre>

Both "Etag" and "foo" are Value Specifiers, a syntax used throughout the response specification language. In this case they are literal values, as indicated by the fact that they are quoted strings. The Value Specification syntax also lets us load values from files or generate random data.
For instance, here is a specification that generates 100k of random binary data for the header value:

200:h"Etag"=@100k</code></pre>

Now, binary data in the header value will probably break things in interesting ways, but is unlikely to be read by the client as a valid (but over-long) value. To see if the client really drops off its perch if we feed it a single 100k header, we have to constrain the random data. Here's the same response, but with data generated only from ASCII letters:

200:h"Etag"=@100k,ascii_letters</code></pre>

pathod has a large number of built-in character classes from which random data can be generated.

Pauses and Disconnects</h2>

Next, we can disrupt the communications in various ways. At the moment, this means adding pauses and disconnects to a response. Let's start with an HTTP 404 response with a body consisting of 100k of random binary data:

404:b@100k</code></pre>

Here's the same response, but with a 120 second pause after sending 100 bytes:

404:b@100k:p120,100</code></pre>

And, the same response again, but with a hard disconnect after sending 100 bytes:

404:b@100k:d100</code></pre>

Instead of specifying a time explicitly, we can ask pathod to just randomly disconnect at a time of its choosing:

404:b@100k:dr</code></pre>

That's it for the teaser - hopefully it's enough to entice you into looking at pathod</a>'s full documentation.

What's next?</h2>

pathod is an "airport project" - the first draft was written in its entirety during a 40-hour trip back home from New York (I drew a bad lot in stopovers). I've now firmed it up a bit, but there's still work to be done. In the next month, mitmproxy's test suite will move to pathod, after which there will be a simple, well-documented way to unit test. I also plan to build out the JSON API (which is used to drive pathod in test suites), and expand the mini-language with convenient ways to generate pathological cookies, authentication headers, SSL errors, and cache control.
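As a rough illustration of how the response specs in this post fit together, here is a minimal Python sketch that assembles them into request URLs. It assumes a pathod instance running locally with the default "/p/" response generator on port 9999, as in the examples above. The helper names `response_spec` and `spec_url` are hypothetical and are not part of pathod; the sketch only builds strings and does not contact a server.

```python
# Assumption: pathod runs locally in its default configuration, so the
# response generator is reachable under /p/ on port 9999 (as in the post).
PATHOD_BASE = "http://localhost:9999/p/"


def response_spec(code, *features):
    """Join a status code and feature strings with colons.

    Hypothetical helper: it mirrors the general form
    code:[colon-separated list of features] and does no validation.
    """
    return ":".join([str(code), *features])


def spec_url(spec, base=PATHOD_BASE):
    """Append a spec to the response generator path.

    No escaping is applied here; how pathod treats percent-encoded
    specs is not covered in the post, so the string is passed as-is.
    """
    return base + spec


specs = [
    response_spec(200),                                 # bare 200 OK
    response_spec(200, 'h"Etag"="foo"'),                # literal header
    response_spec(200, 'h"Etag"=@100k,ascii_letters'),  # 100k random letters
    response_spec(404, "b@100k", "p120,100"),           # pause after 100 bytes
    response_spec(404, "b@100k", "d100"),               # disconnect after 100 bytes
    response_spec(404, "b@100k", "dr"),                 # random disconnect
]

for spec in specs:
    print(spec_url(spec))
```

Keeping the spec as a plain string makes it easy to paste the same value into a browser or a test fixture.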
mitmproxy 0.8 2012-04-09T00:00:00+00:00

</a> </div>

I'm happy to announce the release of mitmproxy 0.8</a>. This release has a few major new features, big speedups, and many, many small bugfixes and improvements. Here are the headlines:

Android interception</h2>

The most prominent new feature is that we now have a supported way to intercept Android traffic. What's more, we can do this without a cumbersome transparent proxying rig - see the Android section in the documentation</a> for the details. Special thanks go to Jim Cheetham</a> for lending me an Android device and helping to get this feature off the ground.

Replacement patterns</h2>

Another exceedingly useful new feature is replacement patterns</a>. These consist of a filter, a regular expression and a replacement string, and run continuously while mitmproxy processes requests and responses. You can pass these either on the command-line, or using a built-in replacement pattern editor.

</a> </div>

I'm sure you can immediately think of many uses for this flexible feature, but my favourite is to use it during testing as a way to conveniently inject complicated exploits into web traffic. I do this by setting a replacement pattern that swaps a short but likely unique string (say MYXSS) for a long exploit, and then I use simple interaction and front-end tools like Firebug to inject exploits into requests manually based on the short string marker.

Improved pretty-printing of request and response contents</h2>

This release of mitmproxy has a completely redesigned subsystem for pretty-printing request and response bodies. For instance, we now extract EXIF tags and other basic information to give you something better than a hex dump when looking at an image:

</a> </div>

We also have much improved HTML indenting (using lxml</a>), and a built-in JavaScript beautifier (thanks to JSBeautifier</a>) that teases out compressed and obfuscated scripts into something readable.
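To make the replacement pattern idea from earlier in this post concrete, here is a small Python sketch of the filter / regular expression / replacement triple, using only the standard `re` module. It is not mitmproxy's implementation or API: the `apply_pattern` function, the dictionary used as a message, and the payload string are all hypothetical stand-ins chosen for illustration.

```python
import re


def apply_pattern(message, applies, regex, replacement):
    """Replace regex matches in message["body"] if the filter accepts it.

    Hypothetical model of a replacement pattern: `applies` plays the role
    of the filter, and a plain dict stands in for a request or response.
    """
    if not applies(message):
        return message
    # A function is used as the replacement so that any backslashes in
    # the replacement text are inserted literally.
    new_body = re.sub(regex, lambda _match: replacement, message["body"])
    return {**message, "body": new_body}


# Placeholder standing in for a long test string.
LONG_PAYLOAD = "<img src=x onerror=alert(document.domain)>"

request = {"method": "POST", "body": "comment=MYXSS&id=7"}
patched = apply_pattern(
    request,
    lambda m: m["method"] == "POST",  # filter: only touch POST requests
    r"MYXSS",                         # short, likely unique marker
    LONG_PAYLOAD,
)
print(patched["body"])
```

The short marker is what gets typed into the page by hand; the swap for the long string happens as the message passes through.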
Changelog</h2>

Detailed tutorial for Android interception. Some features that land in this release have finally made reliable Android interception possible.</li>
Upstream-cert mode, which uses information from the upstream server to generate interception certificates.</li>
Replacement patterns that let you easily do global replacements in flows matching filter patterns. Can be specified on the command-line, or edited interactively.</li>
Much more sophisticated and usable pretty printing of request bodies. Support for auto-indentation of JavaScript, inspection of image EXIF data, and more.</li>
Details view for flows, showing connection and SSL cert information (X keyboard shortcut).</li>
Server certificates are now stored and serialized in saved traffic for later analysis. This means that the 0.8 serialization format is NOT compatible with 0.7.</li>
Add a shortcut key ("f") to load the remainder of a request or response body, if it is abbreviated.</li>
Many other improvements, including bugfixes, an expanded scripting API, and more sophisticated certificate handling.</li> </ul>

mitmproxy 0.7 2012-02-27T00:00:00+00:00

</a> </div>

I'm happy to announce the release of mitmproxy 0.7</a>. The biggest visible change is a new structured editor for headers, query strings and form fields. Other new features include a reverse proxy mode, an extended script API that makes many common tasks much easier, and a myriad of improvements to the interface (including a massive increase in speed).

Everybody still on 0.6 should upgrade - get it here: mitmproxy-0.7.tar.gz</a> (docs)</a></h2>

You can also now install mitmproxy using pip</a>, like so:

pip install mitmproxy</code></pre>

In other news, the project has had an amazing month, after a rash of high-profile results obtained using mitmproxy were published. It started with Arun Thampi's discovery</a> that Path uploads users' address books to their servers.
Things snowballed from there, and for a few days mitmproxy seemed to be everywhere. Similar findings were made for Hipster</a>, The Verge</a> did a mitmproxy-driven AddressbookGate exposé (including vaguely threatening background shots of mitmproxy doing its dastardly work), and lots of people said nice things on Twitter. To see the impact of all this on the mitmproxy project, you need only look at the GitHub page</a> - watchers of the repo went from about 200 a month ago to 950 at the time of this post.

Changelog</h2>

New built-in key/value editor. This lets you interactively edit URL query strings, headers and URL-encoded form data.</li>
Extend script API to allow duplication and replay of flows.</li>
API for easy manipulation of URL-encoded forms and query strings.</li>
Add "D" shortcut in mitmproxy to duplicate a flow.</li>
Reverse proxy mode. In this mode mitmproxy acts as an HTTP server, forwarding all traffic to a specified upstream server.</li>
UI improvements - use Unicode characters to make GUI more compact, improve spacing and layout throughout.</li>
Add support for filtering by HTTP method.</li>
Add the ability to specify an HTTP body size limit.</li>
Move to typed netstrings for serialization format - this makes 0.7 backwards-incompatible with serialized data from 0.6!</li>
Significant improvements in speed and responsiveness of UI.</li>
Many minor bugfixes and improvements.</li> </ul>

OpenBSD in decline? 2012-02-26T00:00:00+00:00

My leisurely Sunday activity today is to set up a new OpenBSD</a> firewall for my mobile app testing lab. I haven't done a from-scratch OpenBSD install for years, so I spent some time reading through the change logs for the last few versions to catch up with what's changed. Although the project is clearly still making steady, well-engineered progress, I had the nagging feeling that the rate of change wasn't what it used to be. So, I pulled some numbers from CVS commit message list archives</a>, and graphed them.
Here is the number of commits per month from January 2001 to January 2012. The orange line is a simple 12-month moving average:

</a> </div>

Now, we should be cautious about interpreting this - the number of commits doesn't tell us anything about the quality, importance or magnitude of code change. Even if it did tell us all of these things, there are other and perhaps better measures of a project's health. Still, the trend is clear, and suggests a sustained decline in activity.

I just bought some T-shirts</a> to help support one of my favourite open source projects. You should too.

Malware 2012-01-05T00:00:00+00:00

Edit: Since this post, I've created an interactive tool for binary visualisation - see it at binvis.io</a>

Hover and click for more.

corte.si

Spacecurve

Installation</h2> spacecurve</strong> is a Rust library for generating a variety of space-filling curves, including Hilbert, Peano, Sierpinski, Moore, and Z-order curves.</p>

spacecurve web</h2> Because egui supports webassembly, I've also deployed the egui app to the web. Access it by clicking below, or on any of the images above.</p>

Generative zoology with neural networks

Some personal thoughts on our national tragedy

mitmproxy v1.0.0: Christmas Edition

mitmproxy v0.18

Hobbes

modd: a flexible tool for responding to filesystem change

mitmproxy v0.15

Trawling Github for cookies, bookmarks and browsing history

devd v0.3

mitmproxy: release v0.14

devd v0.2 (and some thoughts on small tools)

devd: a web daemon for developers

Features</h2>

mitmproxy: release v0.13

mitmproxy v0.12.1

mitmproxy: release v0.12 and some project news

Project News</h2> Before we get to the new release, I'd like to give a quick update on some internal project developments.</p> First up, after a somewhat involved process that included a couple of rounds of community voting and much discussion, we have a new logo:</p>

binvis.io - a browser-based tool for visualising binary data

mitmproxy 0.11.2

mitmproxy and pathod 0.11

Mitmproxy Changelog</h2> Performance improvements for mitmproxy console</li> SOCKS5 proxy mode allows mitmproxy to act as a SOCKS5 proxy server</li>

mitmproxy now supports #gotofail

Exploiting CVE-2014-1266 with mitmproxy

mitmproxy and pathod 0.10

Changelog</h2> Support for multiple scripts and multiple script arguments</li> Easy certificate install through the in-proxy web app, which is now enabled by default</li>

How I Learned to Stop Worrying and Love Golang

mitmproxy and pathod 0.9.2

Introducing choir.io

mitmproxy 0.9.1

Skout: a devastating privacy vulnerability

How mitmproxy works

pathod 0.9

mitmproxy 0.9

Changelog</h2> Upstream certs mode is now the default.</li> Add a WSGI container that lets you host in-proxy web applications.</li> Full transparent proxy support for Linux and OSX.</li>

Google, destroyer of ecosystems

Things I found on GitHub: aspell custom dictionary entries

Things I found on GitHub: pipe chains

Things I found on GitHub: shell history

The trouble with social news

Go: a nice language with an annoying personality

Released: pathod 0.3

pathoc: break all the Python webservers!

Limits of data visualization with space filling curves

Finding the UDID leak: a guessing game

The UDID leak is a privacy catastrophe

Defiler

pathod 0.2: the daemon gets an evil twin

Introducing pathod: a pathological HTTP server

Specifying features</h2> One example of a "feature" is a response header. Let's embellish our response by adding one:</p>

mitmproxy 0.8

mitmproxy 0.7

OpenBSD in decline?

Malware
