-
What do you have in your filesystem? D3.js
I have been playing quite a lot with D3.js recently. I couldn’t resist trying some of the cool visualizations in the sample gallery. This script collects the files, directories and sizes under a given path and outputs them in the format expected by a particular D3.js template. The visualization shows blocks for directories…
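The script itself isn’t shown in this excerpt, so here is a minimal sketch of the idea. It assumes the template expects the nested `name`/`children`/`size` JSON used by the classic flare.json D3.js examples; the function name `build_tree` is hypothetical.

```python
import json
import os


def build_tree(path):
    """Return a nested dict in the flare.json-style shape (name/children/size).

    Assumption: the D3.js template wants directories as nodes with a
    "children" list and files as leaves with a "size" (in bytes).
    """
    node = {"name": os.path.basename(os.path.normpath(path)) or path}
    if os.path.isdir(path) and not os.path.islink(path):
        try:
            entries = sorted(os.listdir(path))
        except PermissionError:
            entries = []  # skip directories we are not allowed to read
        node["children"] = [build_tree(os.path.join(path, e)) for e in entries]
    else:
        # Files (and symlinks, which are not followed) become sized leaves.
        node["size"] = os.lstat(path).st_size
    return node


def tree_as_json(path):
    """Serialize the tree so it can be saved next to the D3.js template."""
    return json.dumps(build_tree(path), indent=2)
```

For example, `open("flare.json", "w").write(tree_as_json("/home/me"))` would produce a file the template can load. Symlinks are not followed, which avoids infinite loops on cyclic links.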
-
Stay updated on your favourite TV shows
I wrote a script that fetches the K newest magnet links for your favourite show from thepiratebay. I wanted to load these links into a BitTorrent client somehow, or even implement a torrent downloader in Python, but I wasn’t able to compile the library. Meh, I’m not spending more than an afternoon on this :P.
-
What linux command do you use the most?
I wrote a Python script that plots horizontal bar charts to answer this. Here are my results, grouped by command name, and again by full command line. The name of a certain host has been blurred to protect its privacy 😉 Just a quick practical application of the groupby() function recently reviewed on this blog!
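A minimal sketch of the counting step, assuming the input is a list of shell history lines (for example read from `~/.bash_history`; the post doesn’t say where the data comes from). The helper names are hypothetical. Note that `itertools.groupby` only groups *consecutive* equal keys, so the lines are sorted by the same key first.

```python
from itertools import groupby


def count_by(lines, key):
    """Count history lines per key with sort + itertools.groupby."""
    cleaned = [line.strip() for line in lines if line.strip()]
    cleaned.sort(key=key)  # groupby needs equal keys to be adjacent
    counts = [(k, sum(1 for _ in grp)) for k, grp in groupby(cleaned, key=key)]
    # Most frequent first; ties broken alphabetically.
    return sorted(counts, key=lambda kv: (-kv[1], kv[0]))


def command_name(line):
    """Group by the command itself, e.g. 'git' for 'git status'."""
    return line.split()[0]


def full_command(line):
    """Group by the whole command line."""
    return line


def plot_counts(counts, top=10):
    """Draw a horizontal bar chart of the top entries (needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed to plot
    labels, values = zip(*counts[:top])
    plt.barh(labels[::-1], values[::-1])  # most used command on top
    plt.tight_layout()
    plt.show()
```

Calling `count_by(history, command_name)` and `count_by(history, full_command)` gives the two groupings; either result can be passed to `plot_counts`.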
-
Hadoop-free map reduce in Python
Python provides some native functions that let us do map reduce without any framework or infrastructure. This is especially useful if you are used to thinking in terms of MR, you want to learn MR, and your data is not huge. Recently I had to aggregate a list of JSON files. I…
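As an illustration of the idea (not the code from the post), here is a sketch using only `map()` and `functools.reduce()`. The document shape `{"category": ..., "value": ...}` is a hypothetical example; the aggregation sums `value` per `category`.

```python
import json
from collections import Counter
from functools import reduce


def mapper(raw_json):
    """Map: one JSON document -> Counter {category: value}.

    Assumes hypothetical documents shaped like {"category": ..., "value": ...}.
    """
    doc = json.loads(raw_json)
    return Counter({doc["category"]: doc["value"]})


def reducer(left, right):
    """Reduce: merge two partial aggregates by adding values per key."""
    merged = Counter(left)
    merged.update(right)  # update() adds counts and keeps zero/negative values
    return merged


def map_reduce(raw_documents):
    # map() is lazy, so documents are parsed one at a time as reduce consumes them.
    return reduce(reducer, map(mapper, raw_documents), Counter())
```

To run it over files, feed it their contents, e.g. `map_reduce(open(p).read() for p in paths)`. `Counter.update` is used instead of `+` because `+` silently drops keys whose total is zero or negative.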
-
SheilaDB – A NoSQL REST interface on top of MySQL (2/2)
In a previous post I showed results on how SheilaDB performs better than MongoDB on insertions of a certain type of JSON document under specific DB conditions. In this post I am going to show similar results for query operations. The exact same dataset has been loaded into both MongoDB and Sheila. This…
-
SheilaDB – A NoSQL REST interface on top of MySQL (1/2)
Recently I have been working on a personal project. I wanted to develop a Mongo-like REST interface for full CRUD database operations on top of a MySQL server. So I did. Please check out the project on GitHub. At first it was just some fun with Flask, but when I decided to test the…
-
Monapi: RESTful interface for MongoDB built on Flask
I have started a new project! I thought it might be useful to expose some MongoDB functionality as RESTful services. For the moment it’s just basic system administration, data retrieval and storage, and map reduce job submission, but I plan to extend it a bit more! Feel free to check it out, comment, etc… https://github.com/jkklapp/monapi
-
Bash: Looping over strings with whitespace… $IFS
A little Linux shell tip that I still find very useful (and that is somehow not widely known). If you ever need to loop over the output of ‘ls -lha’ line by line, you may find that Bash splits every string on whitespace, but you can change this by setting the $IFS shell variable. IFS…
-
Chaining MapReduce jobs to retrieve and link germplasm and genetic data. (2/3)
In the previous post I showed how we can ‘map’ GET requests to GBIF datasets and ‘reduce’ species names. In this post I shall explain how we can use these species names to retrieve genetic data available in genebanks. The input in this step will be unique species name strings such as: This is the…
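The excerpt doesn’t show how the genebank lookup is done, so the following is only a sketch under an assumption: that the map step turns each unique species name into a search request against NCBI’s E-utilities `esearch` endpoint. The function names and the `nucleotide`/`retmax` defaults are illustrative choices, not taken from the post.

```python
from urllib.parse import urlencode

# Assumption: genetic data is looked up through NCBI E-utilities' esearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def species_to_query(species_name, db="nucleotide", retmax=20):
    """Map step: turn one species name into a (name, search URL) pair."""
    name = " ".join(species_name.split())  # normalise stray whitespace
    params = {"db": db, "term": f"{name}[Organism]", "retmax": retmax}
    return name, f"{ESEARCH_URL}?{urlencode(params)}"


def map_species(species_names):
    """Build one query per unique (normalised) species name."""
    return dict(map(species_to_query, species_names))
```

Each URL could then be fetched (e.g. with `urllib.request.urlopen`) in a later step; building the requests separately keeps the map step pure and easy to test without network access.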
-
Chaining MapReduce jobs to retrieve and link germplasm and genetic data. (1/3)
In this post I will explain how to chain several map reduce jobs to do some bioinformatics. The idea is to download the entire Eurisco germplasm database from GBIF, match species names with genetic data from NCBI and store the resulting documents in a MongoDB database. The latter choice will allow us to use the…
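To make the idea of chaining concrete, here is a small in-memory sketch (not the series’ actual Hadoop code): each job is a mapper/reducer pair, and the output records of one job become the input of the next. The record field `scientificName` and the genus-grouping second job are hypothetical choices for illustration.

```python
from collections import defaultdict
from itertools import chain


def run_job(records, mapper, reducer):
    """Run one MapReduce job: map, shuffle (group by key), then reduce."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(map(mapper, records)):
        groups[key].append(value)  # shuffle: collect all values per key
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]


def chain_jobs(records, *jobs):
    """Feed each job's output into the next, like chained Hadoop jobs."""
    for mapper, reducer in jobs:
        records = run_job(records, mapper, reducer)
    return records


# Job 1 (hypothetical record shape): germplasm records -> (species, count).
def species_mapper(record):
    yield record["scientificName"].strip(), 1


def species_reducer(species, ones):
    yield species, sum(ones)


# Job 2: (species, count) pairs -> one document per genus.
def genus_mapper(pair):
    species, count = pair
    yield species.split()[0], {"species": species, "accessions": count}


def genus_reducer(genus, docs):
    yield {"genus": genus, "species": docs}
```

The final list of dicts has the shape of documents that could be stored with something like pymongo’s `insert_many`.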