A few months ago, while the New York State Senate was debating a solution to the MTA budget shortfall, The Open Planning Project (TOPP) helped parse MTA budget data into a machine-searchable format. The MTA originally published the budget as a PDF. To extract the data, I used a utility called pdftohtml to convert it into an XML document. I then used the Python library lxml to convert that document into a set of CSV files. The results of this labor can be seen on TOPP’s data site.
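The extraction script itself isn’t part of this post, so what follows is only a minimal sketch of the XML-to-CSV step. It assumes pdftohtml was run with its -xml option, which emits one “text” element per run of text, carrying “top” and “left” pixel coordinates. The sketch treats runs on the same page whose “top” values are within a few points of each other as one table row, and orders each row by “left”. The function names, the tolerance, and the sample input (made-up values) are illustrative assumptions. If lxml isn’t installed, it falls back to the standard library’s ElementTree, which supports the same calls used here.

```python
import csv
import io

try:
    from lxml import etree  # the library used in the post
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback; same calls used below


def pdf_xml_to_rows(xml_bytes, tolerance=3):
    """Turn pdftohtml -xml output into a list of table rows.

    Text runs on the same page whose "top" coordinates differ by at most
    `tolerance` are treated as one row; cells are ordered left to right.
    """
    root = etree.fromstring(xml_bytes)
    items = []
    for page in root.iter("page"):
        page_no = int(page.get("number"))
        for text in page.iter("text"):
            # itertext() also picks up text nested in <b>/<i> tags
            content = "".join(text.itertext()).strip()
            if content:
                items.append((page_no, int(text.get("top")), int(text.get("left")), content))
    items.sort()  # by page, then top, then left

    rows, current, row_start = [], [], None
    for page_no, top, left, content in items:
        new_row = (row_start is None or page_no != row_start[0]
                   or top - row_start[1] > tolerance)
        if new_row:
            if current:
                rows.append([cell for _, cell in sorted(current)])
            current, row_start = [], (page_no, top)
        current.append((left, content))
    if current:
        rows.append([cell for _, cell in sorted(current)])
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


# Made-up sample in the shape pdftohtml -xml produces
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<text top="100" left="50" width="60" height="12" font="0">Category</text>
<text top="100" left="300" width="30" height="12" font="0">2009</text>
<text top="101" left="400" width="30" height="12" font="0">2010</text>
<text top="120" left="50" width="60" height="12" font="0"><b>Payroll</b></text>
<text top="120" left="300" width="30" height="12" font="0">10</text>
<text top="120" left="400" width="30" height="12" font="0">11</text>
</page>
</pdf2xml>"""

print(rows_to_csv(pdf_xml_to_rows(SAMPLE)))
```

Grouping by a tolerance rather than an exact “top” value is a guard against cells in the same row being positioned a pixel or two apart, as the “2010” cell is in the sample.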
Soon after I published this data, a number of people told me it would be more useful in other formats. At first I just wrote a bunch of command-line Python scripts that would suck in these CSV files and spit them out in different formats. I quickly realized that I could collect these scripts into a quick-and-dirty web application.
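DataIO’s source isn’t reproduced here, so this is a hypothetical sketch of the general idea of folding one-off conversion scripts into a single web application. Each former script becomes a function in a dictionary keyed by the “format” query string value, and one dispatcher parses the query string and picks the converter. The names CONVERTERS, to_json, to_csv, and convert are invented for illustration.

```python
import csv
import io
import json
from urllib.parse import parse_qs


def to_json(rows, params):
    """Hypothetical converter: rows as a JSON array of arrays."""
    return json.dumps(rows)


def to_csv(rows, params):
    """Hypothetical converter: rows back out as CSV."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Each former command-line script becomes one entry, keyed by ?format=
CONVERTERS = {"json": to_json, "csv": to_csv}


def convert(csv_text, query_string):
    """Parse the CSV, then hand it to the converter named by 'format'."""
    # Keep only the first value of each query string argument
    params = {key: values[0] for key, values in parse_qs(query_string).items()}
    rows = list(csv.reader(io.StringIO(csv_text)))
    fmt = params.get("format", "csv")
    if fmt not in CONVERTERS:
        raise ValueError("unsupported format: %s" % fmt)
    return CONVERTERS[fmt](rows, params)


print(convert("year,total\n2009,10\n", "format=json"))
```

Passing the parsed parameters through to every converter lets format-specific arguments, such as the graph options described below, reach only the converters that use them.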
Over a few train rides I built an application called DataIO, and this week I finally got a chance to upload it to Google App Engine. I had received three specific requests for data in different formats; I’ll give an example of each using the data set containing the MTA’s annual labor expenses.
To get the data as JSON for Flot, a JavaScript plotting library:
http://www.dataio.org/data/Wfb?format=flot&base_column=0&base_row=0
The “base_column” query string parameter is the column in the CSV file whose values are used for the graph’s legend. The “base_row” parameter is the row in the CSV file that contains the values for the graph’s x-axis.
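Flot takes a list of series, each an object with a “label” and a “data” list of [x, y] pairs. Here is a minimal sketch of how base_column and base_row could map a CSV onto that shape. It assumes the base row holds numeric x values (such as years) and every other row holds a label followed by numbers. The function name to_flot and the sample rows (made-up values) are hypothetical, not DataIO’s code.

```python
import json


def to_flot(rows, base_column=0, base_row=0):
    """Convert CSV rows into Flot's series format, returned as JSON.

    rows[base_row] supplies the x values; every other row becomes one
    series whose label is the cell in base_column.
    """
    header = rows[base_row]
    series = []
    for r, row in enumerate(rows):
        if r == base_row:
            continue
        points = [[float(header[c]), float(cell)]
                  for c, cell in enumerate(row) if c != base_column]
        series.append({"label": row[base_column], "data": points})
    return json.dumps(series)


# Made-up rows: x values (years) in row 0, labels in column 0
rows = [["", "2008", "2009"], ["Payroll", "10", "12"], ["Overtime", "3", "4"]]
print(to_flot(rows))
```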
It’s not obvious from the JSON alone what the graph will look like, so DataIO lets you preview it by adding a “preview” query string argument:
http://www.dataio.org/data/Wfb?format=flot&base_column=0&base_row=0&preview=true
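As a hypothetical sketch of what a preview response could contain: an HTML page that loads jQuery and Flot and passes the same JSON to Flot’s $.plot(placeholder, data) function. The script paths, page layout, and function name are assumptions; DataIO’s actual preview page may differ.

```python
import html


def flot_preview_page(series_json, title="DataIO preview"):
    """Wrap Flot series JSON in a minimal HTML page that plots it.

    The two script paths are placeholders for wherever jQuery and Flot
    are actually served.
    """
    # Keep a "</script>" inside a label from closing the inline script early
    safe_json = series_json.replace("</", "<\\/")
    return """<!DOCTYPE html>
<html>
<head>
<title>%s</title>
<script src="jquery.js"></script>
<script src="jquery.flot.js"></script>
</head>
<body>
<div id="placeholder" style="width:600px;height:300px"></div>
<script>
$(function () {
  $.plot($("#placeholder"), %s);
});
</script>
</body>
</html>
""" % (html.escape(title), safe_json)


print(flot_preview_page('[{"label": "Payroll", "data": [[2008, 10], [2009, 12]]}]'))
```

Flot draws into an existing element and needs that element to have an explicit width and height, which is why the placeholder div is sized inline.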
To get a line graph from Google Charts instead:
http://www.dataio.org/data/Wfb?format=gchart_line&base_column=0&base_row=0
which returns the URL of a line-graph image generated by Google Charts.
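That image URL presumably targets Google’s Image Charts API, which Google has since deprecated. Below is a minimal sketch of building such a URL from the same base_column/base_row layout. It uses the API’s line-chart parameters: cht=lc (line chart), chs (size), chd=t: (text-encoded data, series separated by “|”), chds (data scaling), chdl (legend), and chxt/chxl/chxr (axes). The function name, chart size, and sample rows (made-up values) are assumptions; this isn’t necessarily how DataIO builds its URLs.

```python
from urllib.parse import urlencode, urlparse, parse_qs

CHART_BASE = "https://chart.googleapis.com/chart"  # Image Charts endpoint


def gchart_line_url(rows, base_column=0, base_row=0, size="600x300"):
    """Build a Google Image Charts line-chart URL from CSV rows.

    rows[base_row] holds the x-axis labels; every other row is one series
    whose legend label sits in base_column.
    """
    x_labels = [h for c, h in enumerate(rows[base_row]) if c != base_column]
    legend, series = [], []
    for r, row in enumerate(rows):
        if r == base_row:
            continue
        legend.append(row[base_column])
        series.append([float(v) for c, v in enumerate(row) if c != base_column])
    low = min(v for s in series for v in s)
    high = max(v for s in series for v in s)
    params = {
        "cht": "lc",                    # line chart
        "chs": size,                    # width x height in pixels
        # text encoding: comma-separated values, "|" between series
        "chd": "t:" + "|".join(",".join("%g" % v for v in s) for s in series),
        "chds": "%g,%g" % (low, high),  # scale the data to this min,max
        "chdl": "|".join(legend),       # one legend entry per series
        "chxt": "x,y",                  # draw x and y axes
        "chxl": "0:|" + "|".join(x_labels),  # labels for axis 0 (x)
        "chxr": "1,%g,%g" % (low, high),     # numeric range for axis 1 (y)
    }
    return CHART_BASE + "?" + urlencode(params)


rows = [["", "2008", "2009"], ["Payroll", "10", "12"], ["Overtime", "3", "4"]]
url = gchart_line_url(rows)
print(url)
print(parse_qs(urlparse(url).query)["chd"])  # ['t:10,12|3,4']
```

Without chds, text-encoded values are read on a fixed 0 to 100 scale, so passing the data’s own minimum and maximum keeps the lines from being clipped or flattened.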
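DataIO can also multiply every value by a constant before displaying the data as an HTML table, for example to show the figures in dollars rather than millions of dollars:
http://www.dataio.org/data/Wfb?format=html&multiplication_factor=1000000&multiplication_start_row=1
or in millions of euros: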
http://www.dataio.org/data/Wfb?format=html&multiplication_factor=0.734&multiplication_start_row=1
The multiplication_factor argument sets the number to multiply by, and multiplication_start_row=1 tells DataIO to start multiplying at the second row, leaving the first row (row 0) unchanged.
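A minimal sketch of that behavior, assuming rows are zero-indexed (as base_row=0 suggests), that rows before the start row are copied unchanged, and that cells which don’t parse as numbers, such as row labels, are left alone. The function names multiply_rows and to_html_table and the sample figures (made-up values) are hypothetical.

```python
import html


def multiply_rows(rows, factor, start_row=1):
    """Multiply each numeric cell in rows[start_row:] by factor.

    Rows before start_row are copied unchanged, and cells that don't parse
    as numbers (such as row labels) are kept as-is.
    """
    result = []
    for r, row in enumerate(rows):
        if r < start_row:
            result.append(list(row))
            continue
        new_row = []
        for cell in row:
            try:
                new_row.append(float(cell) * factor)
            except ValueError:
                new_row.append(cell)
        result.append(new_row)
    return result


def to_html_table(rows):
    """Render rows as a bare HTML table, escaping every cell."""
    lines = ["<table>"]
    for row in rows:
        cells = "".join("<td>%s</td>" % html.escape(str(cell)) for cell in row)
        lines.append("<tr>%s</tr>" % cells)
    lines.append("</table>")
    return "\n".join(lines)


# Made-up figures in millions, converted to whole units
rows = [["", "2008", "2009"], ["Payroll", "1.5", "2"]]
print(to_html_table(multiply_rows(rows, 1000000)))
```

Skipping the first row matters because, in a layout like this sample, it holds the years; with a start row of 0, “2008” would become 2008000000.0.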
A complete list of the query string arguments DataIO accepts is on its front page. The code for the application is hosted on Bitbucket.