Twitterslurp open source release
https://bafford.com/2009/06/30/twitterslurp-open-source-release/
Tue, 30 Jun 2009 22:44:35 +0000

Last month, I wrote about Twitterslurp, the Twitter searching tool I developed at The Bivings Group, which displays a constantly updating stream of tweets, as well as a leaderboard and stats graphs.

Today, we are very happy to release it as open source. You can download Twitterslurp from its Google Code project page at http://twitterslurp.googlecode.com/.

Since last month, I’ve made a lot of changes to improve the quality (and ease of configuration) of the Twitterslurp code. Twitterslurp’s error handling has been improved, and I added the ability to start and stop the tweet stream and show more than the most recent 20 tweets. Our graphics team also created a spiffy logo.

Yesterday and today, Twitterslurp has been driving a video wall of tweets at the Personal Democracy Forum conference in NYC. The conference, which just ended, had over 17,000 tweets in the last two days.

Previously, we ran test versions of Twitterslurp during mysqlconf and php|tek, and officially on behalf of the Dutch PHP Conference. Twitterslurp started as a project for a client to allow them to track tweets, and give members of their website rewards for tweeting with a particular hashtag.

We’ve also set up a copy of Twitterslurp tracking itself.

We’d love for you to check out Twitterslurp, and we’re open to any and all feedback.

Tracking php|tek Tweets With Twitterslurp
https://bafford.com/2009/05/20/tracking-phptek-tweets-with-twitterslurp/
Wed, 20 May 2009 17:03:02 +0000

For a client at work a few months ago, I created a Twitter search tool, Twitterslurp, that put all the tweets related to the client’s project on their webpage, updated in (close to) real time via AJAX.

I’ve since added a lot of features, including a set of graphs, and we’ve set up a version of Twitterslurp for php|tek 2009.

Twitterslurp consists of two parts. The first is a PHP cron job that queries Twitter for new tweets matching a search string and adds them to a database. The second is JavaScript code that interfaces with a PHP script that searches the database for new tweets. In a nutshell, the JavaScript examines the web page for placeholders that represent different types of data (tweets, the leaderboard, and the graphs), and then posts to the PHP script with a JSON object detailing the data it wants. Once PHP responds, also with a JSON object, the JavaScript takes that data, formats it, and updates the page with the new content. The web page updates automatically based on an interval specified in the response object, so in the unlikely event the server came under heavy load, we would be able to easily throttle requests.
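The update cycle described above can be sketched roughly as follows. This is a hypothetical reconstruction, not Twitterslurp's actual code: the field names (`sections`, `since_id`, `interval`, `last_id`) and the endpoint wiring are assumptions, and the 2009-era original would have used XMLHttpRequest (or jQuery's AJAX helpers) rather than `fetch`.

```javascript
// Build the JSON request body from the placeholder sections found on the page.
// (Field names here are assumptions, not Twitterslurp's actual wire format.)
function buildRequest(sections, sinceId) {
  return { sections: sections, since_id: sinceId };
}

// The server controls the polling rate: the next delay comes from the
// response's interval field (in seconds), falling back to a client default.
// This is what makes it easy to throttle requests under heavy load.
function nextDelayMs(response, fallbackMs) {
  return response && response.interval ? response.interval * 1000 : fallbackMs;
}

// Browser wiring (needs a DOM and a server to talk to; sketched only):
function poll(url, sections, sinceId) {
  fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildRequest(sections, sinceId)),
  })
    .then(function (res) { return res.json(); })
    .then(function (data) {
      // ...format data.tweets / data.leaderboard and update the page...
      setTimeout(function () {
        poll(url, sections, data.last_id);
      }, nextDelayMs(data, 30000));
    });
}
```

The key design point is that the reschedule interval lives in the response object, so the server, not the client, decides how often clients come back.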

The new feature I’m really happy with is the stats page I’ve added, using jQuery and a graphs library called flot.

Flot uses the canvas tag introduced in Safari, and later supported by Firefox and Opera (and Chrome), to draw 2D images (graphs, in this case) within the browser. IE is supported with a library that implements canvas support on top of VML.

Flot is pretty simple to learn and use, thanks to its detailed API documentation and helpful set of examples. It actually has relatively few API functions. For the most part, you implement graphs simply by building a JavaScript object with the graph data and configuration options you want, and tell flot where to put the graph. (Twitterslurp builds this object in PHP and gives it to the JavaScript code via a JSON object.) With a few hours of work, I was able to put together the four graphs shown on the stats page, all of which have a rollover that shows the value of individual data points.
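As a rough illustration of that pattern: flot's entry point is `$.plot(placeholder, data, options)`, and most of the work is shaping your data into the `[[x, y], ...]` series format it expects. The option names below follow flot's documented API; the input data shape is an assumption about how the PHP side might deliver per-day tweet counts, not Twitterslurp's actual format.

```javascript
// Convert per-day tweet counts into the [[x, y], ...] series format flot
// expects. In flot's time mode, x values are JavaScript timestamps (ms).
function toFlotSeries(countsByDay) {
  return countsByDay.map(function (row) {
    return [Date.parse(row.date), row.count];
  });
}

var series = toFlotSeries([
  { date: "2009-05-18", count: 120 },
  { date: "2009-05-19", count: 245 }
]);

var options = {
  xaxis: { mode: "time" },   // flot's time-axis mode
  grid: { hoverable: true }  // enables the rollover that shows point values
};

// In the browser (requires jQuery and flot):
// $.plot($("#tweets-per-day"), [series], options);
```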

To lessen the server load, the stats page updates less frequently, and the graph data is cached on the server via APC, so the graph data doesn’t have to be regenerated on every request.

Please feel free to take a look, and let me know if you have any comments or suggestions, either here, or via Twitter (@jbafford).

We’re planning to take what we’ve learned with Twitterslurp for php|tek, clean up the code (which wasn’t originally intended to provide graph data), add a few more features, and release it under an open source license in June. Stay tuned for more details…

Aggregate Map Tools, Part 1: GlobalMapTiles
https://bafford.com/2009/02/26/aggregate-map-tools-part-1-globalmaptiles/
Thu, 26 Feb 2009 20:53:29 +0000

My fourth open source release this week comes from work I’ve done for my employer, The Bivings Group. Today, we are releasing a set of code that assists with aggregating markers on a Google Map. Our clients wanted to display markers on a map reflecting the locations of people who provided their location (city, state, ZIP code, and in some cases, street address), but with tens of thousands of expected sign-ups, it’s not feasible to display all the points on the map at once.

One of the challenges with displaying a map with lots of markers is that if there are too many markers in one location, the placemarks quickly become an unmanageable sea that obscures the map. In some cases, this is the desired effect, but in our case, we wanted to present a cleaner map that showed a single marker indicating the number of people present in an area. (Fewer markers also have the advantage of loading faster.)

One solution to this problem is to group markers that are close together into a single marker, instead of showing them all at once. But how do you determine whether two locations are “close together” so that you can do that aggregation?

Geohash

One way is to use a geohash, which takes a latitude and longitude coordinate and, using a hashing algorithm, mixes the two numbers together into a string that represents the position. More accurately, it represents a box that contains the coordinate in question. For example, the coordinates of the TBG office (38.9193540°N, 77.0705180°W) are represented by the geohash dqcjqjncwv2t9. This string has an interesting property: as you remove characters from the end of the string, you get a less accurate representation of the location – the box gets larger. With a list of addresses stored in geohash format, you can search for all of the addresses whose geohashes start with the same characters.
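The standard geohash encoding (not Twitterslurp's actual code) can be sketched in a few lines: longitude and latitude bits are interleaved and packed five at a time into a base-32 alphabet, which is exactly why truncating the string just enlarges the box.

```javascript
// Standard geohash base-32 alphabet (note: no a, i, l, or o).
const BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

function geohashEncode(lat, lng, precision) {
  const latRange = [-90, 90];
  const lngRange = [-180, 180];
  let hash = "";
  let bits = 0;
  let bitCount = 0;
  let evenBit = true; // bits alternate, starting with longitude
  while (hash.length < precision) {
    const range = evenBit ? lngRange : latRange;
    const value = evenBit ? lng : lat;
    const mid = (range[0] + range[1]) / 2;
    const bit = value >= mid ? 1 : 0;
    bits = bits * 2 + bit;              // append bit (MSB first)
    if (bit) range[0] = mid; else range[1] = mid; // halve the box
    evenBit = !evenBit;
    if (++bitCount === 5) {             // 5 bits = one base-32 character
      hash += BASE32[bits];
      bits = 0;
      bitCount = 0;
    }
  }
  return hash;
}

// The TBG office from the example above (west longitude is negative):
const office = geohashEncode(38.9193540, -77.0705180, 13);
```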

However, because of the way geohashes are encoded, different geohashes have different shapes, there is some overlap between them, and in some cases removing a character from a geohash yields a larger shape on the map that doesn’t cover the entirety of the more specific geohash. As a result, we determined that geohashes weren’t suitable for our needs: because of the overlap, one location could appear to fall within the space covered by multiple geohashes, potentially leading to unreliable counts.

Quadtree

Some further searching led me to maptiler.org, which provided an alternate answer. Since the Earth is a sphere, it can be broken up into quadrants (corresponding to the four hemispheres: north west, north east, south west, and south east), which (with some stretching) are square. Each of those quadrants can be subdivided into subquadrants, and those into more subquadrants, each of which contains one quarter of the total area covered by the quadrant that contains it.

In fact, this is how Google Maps map tiles work. When zoomed all the way out, the entire earth is represented as a single 256 × 256 pixel map tile image. Each successive zoom level contains four times the number of map tiles as the previous zoom level, with no overlap. At the maximum zoom level (20), the entire Earth is represented by 4^19 = 274,877,906,944 of these 256 × 256 pixel images.

How does that help us? Well, if we number each of the quadrants (0 = NW, 1 = NE, 2 = SW, 3 = SE), and add a digit for each sub-quadrant we “zoom in” to, we get a string that represents a square on the map at a zoom level equivalent to the length of the string. For example, the coordinates of the TBG office can be represented as 0320100322312003121. If we remove a character from the hash, it’s guaranteed that the larger space contains all the possible locations in the smaller space. There’s no overlap between hashes of the same length. This means we’re guaranteed the accurate counting we wanted. And we can very conveniently scale our searches of the database of addresses with the zoom level of the map window we’re viewing the results in.
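A minimal sketch of that encoding follows (globalmaptiles.py, discussed below, does the equivalent in Python; function and variable names here are my own). Longitude maps linearly onto x; latitude goes through the Web Mercator projection, matching how Google Maps lays out its tiles.

```javascript
// Encode a lat/lng into a quadtree string of the given depth.
// Digits follow the numbering above: 0 = NW, 1 = NE, 2 = SW, 3 = SE.
function latLngToQuadKey(lat, lng, levels) {
  let x = (lng + 180) / 360; // longitude, normalized to [0, 1)
  const sinLat = Math.sin((lat * Math.PI) / 180);
  // Web Mercator: latitude, normalized to [0, 1) with y increasing southward
  let y = 0.5 - Math.log((1 + sinLat) / (1 - sinLat)) / (4 * Math.PI);
  let key = "";
  for (let i = 0; i < levels; i++) {
    x *= 2;
    y *= 2;
    const xBit = Math.floor(x); // 0 = western half, 1 = eastern half
    const yBit = Math.floor(y); // 0 = northern half, 1 = southern half
    x -= xBit;
    y -= yBit;
    key += xBit + 2 * yBit;
  }
  return key;
}

// The TBG office from the example above (west longitude is negative):
const officeKey = latLngToQuadKey(38.9193540, -77.0705180, 19);
```

Because each digit just halves the box again, truncating the string is guaranteed to give the enclosing quadrant, with no overlap between codes of the same length.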

Pulling it all Together: GlobalMapTiles

On the client’s map page, whenever the map position or zoom level changes, we send a few AJAX queries to the web server with quadtree codes covering the map window. The server obligingly searches the database for the number of people in the area represented by the code, and returns a result list containing the number of people in the quadtree a level or two deeper. We then take the results, convert the quadtree code back into a coordinate on the map, and add a marker at its center.
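The reverse conversion can be sketched like this (again my own illustration, not the released code): walk the quadtree string back down to a normalized box, then invert the Web Mercator projection to recover latitude/longitude bounds and the box's center, where the aggregate marker goes.

```javascript
// Decode a quadtree string (digits 0 = NW, 1 = NE, 2 = SW, 3 = SE) back
// into its lat/lng bounding box and center point.
function quadKeyToBounds(key) {
  let x = 0;
  let y = 0;
  let size = 1; // box edge length in normalized [0, 1] coordinates
  for (let i = 0; i < key.length; i++) {
    const digit = key.charCodeAt(i) - 48; // '0'..'3'
    size /= 2;
    x += (digit % 2) * size;              // digits 1 and 3 are eastern
    y += Math.floor(digit / 2) * size;    // digits 2 and 3 are southern
  }
  // Inverse Web Mercator: normalized y back to latitude in degrees.
  function yToLat(yNorm) {
    return (Math.atan(Math.sinh(Math.PI * (1 - 2 * yNorm))) * 180) / Math.PI;
  }
  return {
    west: x * 360 - 180,
    east: (x + size) * 360 - 180,
    north: yToLat(y),
    south: yToLat(y + size),
    center: {
      lat: yToLat(y + size / 2),
      lng: (x + size / 2) * 360 - 180,
    },
  };
}
```

For example, the one-character code "0" decodes to the north-west quadrant: longitudes −180° to 0°, latitudes 0° up to the Mercator cutoff of about 85.05°N.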

As luck would have it, there’s a Python script, globalmaptiles.py, on the maptiler.org page that handles the conversion from latitude and longitude to these quadtree coordinates. Although Python is a fine language (and some of the most fun code I’ve written has been in Python), we needed this code in PHP and JavaScript. But that’s OK. The code is almost entirely simple mathematical operations, which are very easy to port from Python to JavaScript and PHP. This port, plus some additions of my own that facilitate turning a quadtree string back into its bounding coordinates on the map, is what we’re releasing today.

This is, of course, only half of the solution. We now have a system by which we can identify locations on the map which are nearby and aggregate them together, but we still need a way to pull that information out of the database and send it on-demand to the web browser to be displayed. That’s a more interesting problem (and solution). I’m still refining that code and adding additional functionality, and we’ll release that soon, when we’re satisfied with it.

As the original implementation was licensed with the X11 license, GlobalMapTiles is also licensed under the X11 license. For more details on how to use it, and to download, see the Aggregate Map Tools page.
