Digital Geography

29. November 2012

Python for Geospatial Data Analysis (Part II)

Reading Geospatial Files

In the last post in this thread I provided a bit of background and some simple instructions for installing python and the necessary modules for geospatial analysis. In this post I will cover some basic python syntax and reading raster data from a geospatial file. Let’s get started.

Python files typically end in the extension .py. In a OSX/Linux environment, the first line in your file should be the interpreter you want to use. On my macbook, my python executable is /opt/local/bin/python, so the first line of my file is

#!/opt/local/bin/python

After this, you need to load the appropriate libraries. For this example you will need Numpy and GDAL.

import numpy as np
from osgeo import gdal
import sys

In this case, we have loaded gdal from osgeo and will reference it as gdal, and numpy will be referenced as np. This is done simply to ease writing. We have also loaded the sys library, which will give us access to sys.argv for command line parameters.

We’re going to start by creating a subroutine to read a file and print some information about that file. Python uses the def keyword to indicate a subroutine. Python uses indents to define code blocks. This may seem a bit odd if you are used to code blocks and variable scope being defined by curly braces, brackets or keywords. However, it helps to make python very readable. Let’s say that our subroutine will be called readFile, then we have:

def readFile(filename):

Python subroutine definitions always end in a colon (:). This subroutine requires a single variable be passed to it containing the name of the file we want to read, referred as filename. The first thing we need to do is open the file using a gdal function and this will return a filehandle through which we can access the file. The start of our function looks like the following:

def readFile(filename):
    filehandle = gdal.Open(filename)
    band1 = filehandle.GetRasterBand(1)
    geotransform = filehandle.GetGeoTransform()
    geoproj = filehandle.GetProjection()

In this block we have opened the file and gathered the filehandle. Then, using the filehandle to access the file we have returned the first band, signified by band 1, and the geotransform block and projection information. This is all you need to write a geospatial aware file. However, the variable band1 is not in an array yet and so not accesible in a useful manner. For that, we will use the function ReadAsArray. This will place the data in an array of appropriate size.

band1data = band1.ReadAsArray()

Called in this way, ReadAsArray will read in the entire band at the dimensions given in the file metadata and return it as a numpy array. Now that we have the data as a Numpy array we have access to all the power of Numpy. As far as mathematical calculations go, the world is our oyster, so to speak. You can access the file dimensions directly through RasterXSize and RasterYSize as follows:

xsize = filehandle.RasterXSize
ysize = filehandle.RasterYSize
return xsize,ysize

So, now our initial subroutine is complete. It doesn’t do a whole lot yet, but give it time. For now we will read a file, get the first band, convert it to a numpy array, retrieve the file dimensions and return those to the main program. So, our entire subroutine looks like:

def readFile(filename):
    filehandle = gdal.Open(filename)
    band1 = filehandle.GetRasterBand(1)
    geotransform = filehandle.GetGeoTransform()
    geoproj = filehandle.GetProjection()
    band1data = band1.ReadAsArray()
    xsize = filehandle.RasterXSize
    ysize = filehandle.RasterYSize
    return xsize,ysize

If we provide the file name we want to read as a command line parameter, we can access it through sys.argv. Python arrays start at 0 and sys.argv[0] is the script name. The filename will be sys.argv[1] and we can assign it to a variable that we will pass to our subroutine. We will then collect the returned sizes and print them, as follows:

file = sys.argv[1]

(x,y) = readFile(file)

print 'X Size is ',x
print 'Y Size is ',y

So, I think this post has covered enough ground for now. If you want to download the file generated in this example, you can find it here. In the next post, we will continue looking at information we can extract from our files and how we can write the data back to a new file. Thanks for reading.