OSS GIS

[Geek moment. Tune out now.] I’m pleased to report that I believe I’ve finally succeeded in setting my computer up for open source GIS and more (fingers still crossed though!). For my ACSP presentation, I need to generate some maps illustrating the shifting location of employment in the US over the last fifty years. I’d decided to use PUMS data, since it has more detailed information.* And the PUMS files are huge! Since I’m now all open source, in the last week I’ve had to learn how to use a database, hook it up to GIS, and then also connect it to a statistical program.

The gory details of my saga follow. I started off with MySQL, which seemed like the most widely respected open source database program. I learned enough SQL to import and process the fixed-width flat file. Meanwhile I set up QGIS as the frontend to GRASS GIS, which was simple. However, when I went to connect the census data to the geographic data, I discovered that GRASS doesn’t link up to MySQL well and that I needed to use PostgreSQL with PostGIS instead. Reviews suggest that PostgreSQL is somewhat slower than MySQL but more common in enterprise uses. It took me a day and a half to figure out which packages I needed and how to use them (in part due to my own belief that I’d already installed everything I needed). This involved learning a slight variant of SQL, which is supposedly closer to or completely compliant with the SQL standard, learning how to export and import database tables between database programs, configuring a new database that can handle geographic files (via PostGIS), and connecting QGIS to PostgreSQL.
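For anyone curious what "processing a fixed-width flat file" involves, here is a minimal Python sketch of the idea. It is not what I actually ran (I did this step in SQL), and the field names and column positions in `LAYOUT` are made up for illustration; they are not the real PUMS record layout, which you would need to take from the data dictionary.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout: (field name, start column, end column),
# 0-based with the end column exclusive. Illustrative only; these
# are NOT the real PUMS column positions.
LAYOUT: List[Tuple[str, int, int]] = [
    ("state", 0, 2),
    ("industry", 2, 5),
    ("weight", 5, 9),
]


def parse_record(line: str, layout: List[Tuple[str, int, int]] = LAYOUT) -> Dict[str, str]:
    """Slice one fixed-width record into named fields, trimming padding."""
    return {name: line[start:end].strip() for name, start, end in layout}


def parse_lines(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse every non-blank line of a fixed-width file."""
    return [parse_record(line.rstrip("\n")) for line in lines if line.strip()]
```

Each parsed record is a plain dictionary, so the rows could then be handed to whatever database driver you use for the actual import.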

Then, suddenly, I could make maps! It had all come together. While pretty amazing and ultimately supplying what I need, QGIS does not have all of the features of a commercial product like ArcGIS. I started making graduated layers only to run into the difficulty of not being able to customize ranges from within the program. Instead you have to export the style and edit it in a text editor. This is not really hard to do and probably simplifies many things, but it was off-putting at first. But then I discovered that I have to come up with appropriate ranges for the maps that I can apply across decades. The best way to do this is a histogram (or optionally a scatterplot), but QGIS doesn’t possess that ability. (I don’t know if ArcGIS does.) So now I had to get the data into R. I could have done this by simply exporting the files from the database, but this is less than optimal for two reasons. First, it’s an unnecessary duplication of data, which eats up space. Second, it opens additional possibilities that data will get corrupted or lost in the process. Thus, I decided that the solution was to set up an ODBC (Open Database Connectivity) connection that would allow R to access the PostgreSQL database. This took some doing, since I was unable to find clear directions and had to find my solution through trial and error. (The key lay in setting up /etc/odbc.ini and /etc/odbcinst.ini appropriately and rebooting to test the new settings, since they seem to be a part of starting the PostgreSQL server during system start.) Of course, much of my progress was hampered by inexperience and an aversion to reading manuals!
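To make the "ranges I can apply across decades" problem concrete, here is a rough Python sketch. I did this step in R, so this is not my actual code, just one simple way to do it. It assumes equal-width classes computed from the values of all decades pooled together; the function names and sample numbers are invented for illustration.

```python
from bisect import bisect_right
from typing import Dict, List


def shared_breaks(values_by_decade: Dict[int, List[float]], n_classes: int) -> List[float]:
    """Equal-width class breaks computed from all decades pooled together."""
    pooled = [v for values in values_by_decade.values() for v in values]
    lo, hi = min(pooled), max(pooled)
    width = (hi - lo) / n_classes  # assumes the pooled values are not all identical
    return [lo + i * width for i in range(n_classes + 1)]


def histogram(values: List[float], breaks: List[float]) -> List[int]:
    """Count values per class; the maximum value falls in the last class."""
    counts = [0] * (len(breaks) - 1)
    for v in values:
        # A value equal to an interior break is counted in the upper class.
        idx = min(bisect_right(breaks, v) - 1, len(counts) - 1)
        counts[idx] += 1
    return counts


# Invented sample data, keyed by census year.
data = {1950: [0, 10, 20], 2000: [30, 40]}
breaks = shared_breaks(data, 4)
per_decade = {year: histogram(vals, breaks) for year, vals in data.items()}
```

Because every decade is counted against the same breaks, the per-decade histograms are directly comparable, and the same breaks can be typed into the exported map style. Quantile or hand-picked breaks may suit skewed employment data better than equal widths.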

Ultimately, however, it was done. I am done. And now I can make maps from and analyze data from the same database. While most people who have read this far are asking why I don’t just use ArcGIS or settle for simpler methods, I view this as a major step forward in my ability to harness the computer for future projects. Understanding databases and database connections opens up all sorts of potentially more efficient ways of storing and processing data. It also builds flexibility in combining other programs and processes later. Rather than remaining dependent on a single, self-contained program, I now have the crude tools necessary to do far more complex work.

Now if I could only learn Python, Perl, and PHP…

[Update: Note that all of these programs are available in Windows and MacOS versions as well.]

* It turns out that for some of the employment categories in which I’m interested sample sizes are too small for the level of geography I’d like to use (MSA), so I’ve had to turn to broader measures.
