Digital Geography

3. December 2012

Python for Geospatial Data Analysis (Part III)

Writing Geospatial Files

In the last post in this thread I began discussing basic syntax and how to open and read a geospatial raster file. This installment in the series will demonstrate how to take the data we read from the file and write it out to a new file. In this case, we won’t change anything in the data, just use it as a means to demonstrate writing a file.

In order to write a file, there needs to be a small addition to the subroutine for reading data. In the previous post the subroutine didn’t return the data or any geoinformation, only the x- and y-sizes. The subroutine looked like this:

def readFile(filename):
    filehandle = gdal.Open(filename)
    band1 = filehandle.GetRasterBand(1)
    geotransform = filehandle.GetGeoTransform()
    geoproj = filehandle.GetProjection()
    band1data = band1.ReadAsArray()
    xsize = filehandle.RasterXSize
    ysize = filehandle.RasterYSize
    return xsize,ysize

In order to return the data and geoinformation, we just need to add it to our return line. So, it will now read:

def readFile(filename):
    filehandle = gdal.Open(filename)
    band1 = filehandle.GetRasterBand(1)
    geotransform = filehandle.GetGeoTransform()
    geoproj = filehandle.GetProjection()
    band1data = band1.ReadAsArray()
    xsize = filehandle.RasterXSize
    ysize = filehandle.RasterYSize
    return xsize,ysize,geotransform,geoproj,band1data

Of course the calling statement needs to be modified slightly as well, as follows:

(x,y,transform,proj,data) = readFile(file)

Now the data and other information have been returned and collected. That sure was easy! To write the data back to a file, we will simply make a subroutine similar to that for reading a file. It will be called writeFile and in this case we will need to pass it the name of the file we want to write, the geotransform block, the projection and the data. We are going to hard code a couple things that we will return to later in the series to make them easier to work with right now. The function will look something like this:

def writeFile(filename,geotransform,geoprojection,data):
    (x,y) = data.shape
    format = "GTiff"
    driver = gdal.GetDriverByName(format)
    dst_datatype = gdal.GDT_Byte
    dst_ds = driver.Create(filename,y,x,1,dst_datatype)
    dst_ds.GetRasterBand(1).WriteArray(data)
    dst_ds.SetGeoTransform(geotransform)
    dst_ds.SetProjection(geoprojection)
    return 1

Now let’s break this down a bit The first thing the subroutine does is get the x and y dimension of the array by calling the shape Numpy method on the array. This brings up a very interesting point, though, and one that is likely to confuse some people. Python, in general, refers to its arrays in row-major order. This means that the x dimension is the number of rows (vertical) in the array and the y dimension is the number of columns (horizontal) in the array. For anyone familiar with Matlab, this is backwards from the way Matlab handles arrays. Additionally, it is backwards from the way we normally think about raster imagery, where x refers to horizontal/longitude/easting and y refers to vertical/latitude/northing. That is why four lines later, when the call to driver.Create is made it use the values in reverse order, because, driver.Create expects column-major order. If you ever try to write a file and get an error that the x,y bounds have been overrun, come back and check to make sure you did not forget the proper order.

In this case, we have hard coded the output format to be Geotiff. This is the default GDAL format and the easiest to work in. Following that, we create a destination dataset of type Byte with one band of proper dimension. After this dataset is created we can act on it using GetRasterBand, SetGeoTransform, and SetProjection. We’ve just handed it the same information that was in our initial test file without alteration.

The subroutine is complete for now and all that is left to do is call it. A hard-coded filename, testwrite.tif, is created and then subroutine is called.

filename = 'testwrite.tif'
writeFile(filename,transform,projection,data)

This will write a one-band geotiff with the same geospatial data as the source file. You can download the python script here. In the next post we’ll begin discussion of the geotransform and projection information in the file and using Numpy. Thank you for reading.