Digital Geography

28. November 2012

Python for Geospatial Data Analysis (Part I)

For my first post on digital-geography.com, I wanted to open a discussion on a topic I feel strongly about: Python.

A common question I get from students and experienced colleagues alike is, “What analysis environment do you use?” Where I work, most people use either IDL/ENVI or MATLAB for raster data analysis. These are good packages with many advanced capabilities, but they can be restrictive. IDL and MATLAB code can be difficult to move from system to system or to share with others, because both environments require licenses that can be quite expensive. This is a particular problem when you collaborate with colleagues in organizations that do not have large software budgets.

Python is a free, open-source, high-level programming language that is easy to write. I perform most of my analyses on a MacBook Pro, but the same code can run on a Unix/Linux machine or a Windows PC.

I am planning a series of posts on this topic that can serve as a basic tutorial on the capabilities of Python for geospatial data analysis. I hope that you, the reader, will follow along and perhaps try some of these examples on your own computer. Before we can do that, however, you need to install the software used throughout the series.

Python is installed by default on most Linux and OS X machines. The instructions below should get you started. To begin, you will need a working Python installation with the GDAL and NumPy bindings installed. GDAL is the Geospatial Data Abstraction Library, and we will use it extensively in these examples.

Mac OS X:
I use MacPorts (formerly known as DarwinPorts) to install these packages, but Fink works as well. Once MacPorts is installed, you can install the packages by running the following commands in a terminal window:

sudo port install python26
sudo port install py26-numpy
sudo port install py26-gdal

My system currently runs Python 2.6, which is the version used in these examples; however, any recent version of Python 2 should be fine.
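If you are unsure which Python a given machine runs, a quick check like the one below can tell you. It is a minimal sketch: the function name `is_supported` is hypothetical, and treating 2.6 as the minimum is an assumption based on the version used in this series. It uses only `sys.version_info` from the standard library and sticks to single-argument `print(...)` calls so it behaves the same under Python 2 and 3.

```python
import sys

def is_supported(version_info=None, minimum=(2, 6)):
    """Return True for a Python 2 interpreter at or above `minimum`.

    version_info defaults to the running interpreter's sys.version_info.
    The (2, 6) floor is an assumption based on the version used in these posts.
    """
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    # Require a 2.x release, then compare (major, minor) against the floor.
    return major_minor[0] == 2 and major_minor >= tuple(minimum)

if __name__ == "__main__":
    print("Running Python {0}.{1}.{2}".format(*sys.version_info[:3]))
    print("Suitable for these examples: {0}".format(is_supported()))
```

Passing an explicit tuple, such as `is_supported((2, 7, 3))`, lets you check a version you plan to install rather than the one currently running.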

Windows:
Installers for 32-bit and 64-bit Windows are available at www.python.org. Install the Python 2.7.3 binary appropriate for your system.
NumPy and GDAL can be downloaded and installed as binary packages. For Windows, I prefer Christoph Gohlke’s pre-compiled libraries. Choose and install the version of each library that matches the version of Python you installed, including whether it is a 32-bit or 64-bit build. If your system supports it, install the numpy-MKL version.
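Because the NumPy and GDAL binaries must match both your Python version and whether that Python is a 32-bit or 64-bit build, it can help to ask the interpreter directly. A minimal sketch, assuming only the standard library; `interpreter_bits` is a hypothetical helper name:

```python
import platform
import struct

def interpreter_bits():
    """Return 32 or 64: the pointer width, in bits, of the running Python build."""
    # struct.calcsize("P") is the size of a C pointer in bytes for this
    # interpreter. It reflects how Python itself was built, which is what
    # binary extensions such as NumPy and GDAL must match, rather than the OS.
    return struct.calcsize("P") * 8

if __name__ == "__main__":
    print("Python {0}, {1}-bit build".format(platform.python_version(),
                                            interpreter_bits()))
```

Checking the interpreter rather than the operating system matters because a 32-bit Python can be installed on 64-bit Windows, in which case the 32-bit libraries are the ones you need.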

Linux:
Install the necessary modules using your distribution's package manager, or build them from source. Both NumPy and GDAL are distributed as source code and are reasonably easy to build.

Verification:
After installing the necessary software, verify that the modules are available. In a terminal (Mac/Linux) or command prompt (Windows), type python and press Return. Once the Python prompt appears, type the commands shown below; you should get no errors.

Python 2.6.8 (unknown, Oct 7 2012, 22:44:17)
[GCC 4.2.1 Compatible Apple Clang 3.0 (tags/Apple/clang-211.10.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> from osgeo import gdal
>>> quit()
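The same check can be saved as a small script and rerun whenever you set up a new machine. The sketch below is an illustration rather than part of the original instructions: `check_modules` is a hypothetical name, and it reports versions via `numpy.__version__` and `gdal.VersionInfo("RELEASE_NAME")` instead of stopping at the first failed import.

```python
def check_modules():
    """Try the imports used in this series and report what was found.

    Returns a dict mapping each module label to its version string,
    or to None if the import failed.
    """
    found = {}
    try:
        import numpy
        found["numpy"] = numpy.__version__
    except ImportError:
        found["numpy"] = None
    try:
        from osgeo import gdal
        # VersionInfo("RELEASE_NAME") returns the GDAL release as a string.
        found["gdal"] = gdal.VersionInfo("RELEASE_NAME")
    except ImportError:
        found["gdal"] = None
    return found

if __name__ == "__main__":
    for name, version in sorted(check_modules().items()):
        if version is None:
            print("{0}: NOT FOUND".format(name))
        else:
            print("{0}: {1}".format(name, version))
```

Catching `ImportError` for each module separately means a missing GDAL binding is reported alongside a working NumPy, which makes it easier to see which package still needs attention.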

The next tutorial will cover basic Python and reading raster files.

References:
For additional information on Python, GDAL, or NumPy, please visit the following:
Python: http://www.python.org
GDAL: http://www.gdal.org
NumPy: http://www.numpy.org/