CSV to SHP with Python

Python is a well-established scripting language in the GIS/geodata world. When a Facebook friend asked how to read CSVs with Python, I started thinking about how to convert a CSV to a shapefile with Python. Most GPS solutions and many internet tools offer a CSV export, and the format is common in any stats or spreadsheet program, so this can be a handy solution for everyday work. Here is my solution.

Reading A CSV With Python

As a first task, we need to read a csv. This is accomplished using the csv module in Python. We will access a file and will read line after line:

import csv
with open('/home/ricckli/Desktop/example.tsv', newline='') as csvfile: #Python 3: text mode with newline='' for the csv module
	reader = csv.reader(csvfile, delimiter='\t') #my example uses the tab as delimiter
	for line in reader:
		print('; '.join(line))

Reading line by line lets us do whatever we like with the content. A convenient next step is to turn each line into a dictionary with the column names as keys and the “cells” as values. For that we use DictReader from the csv module instead of the reader function 😉 :
import csv
with open('/home/ricckli/Desktop/example.tsv', newline='') as csvfile: #Python 3: text mode with newline='' for the csv module
	reader = csv.DictReader(csvfile, delimiter='\t') #my example uses the tab as delimiter
	for row in reader:
		print(row['LAT'], row['LON']) #these are my geometry columns
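
To see what DictReader yields without touching the file system, here is a small sketch that parses an in-memory tab-separated string. The sample data and the NAME column are made up for illustration; only LAT and LON come from the example above. It assumes Python 3.

```python
import csv
import io

# Hypothetical sample data: a header line plus two tab-separated records.
SAMPLE_TSV = "NAME\tLAT\tLON\nBerlin\t52.52\t13.405\nHamburg\t53.55\t9.993\n"


def read_rows(text, delimiter="\t"):
    """Return the header names and a list of row dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = list(reader)  # each row maps column name -> cell text
    return reader.fieldnames, rows


fieldnames, rows = read_rows(SAMPLE_TSV)
for row in rows:
    # Every value is still a string; convert before doing arithmetic.
    print(row["NAME"], float(row["LAT"]), float(row["LON"]))
```

The header names stay available through `fieldnames`, which is what the field creation further down relies on.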

The Tricky Part: Designing The Shapefile

The ogr module enables us to build a shapefile from scratch. Yet it is not easy:
import osgeo.ogr, osgeo.osr #we will need some packages
from osgeo import ogr #and one more for the creation of a new field
spatialReference = osgeo.osr.SpatialReference() #will create a spatial reference locally to tell the system what the reference will be
spatialReference.ImportFromEPSG(4326) #here we define this reference to be wgs84..
driver = osgeo.ogr.GetDriverByName('ESRI Shapefile') # will select the driver for our shp-file creation.
shapeData = driver.CreateDataSource('/home/ricckli/Desktop/example_points.shp') #so there we will store our data
layer = shapeData.CreateLayer('Example', spatialReference, osgeo.ogr.wkbPoint) #this will create a corresponding layer for our data with given spatial information.
layer_defn = layer.GetLayerDefn()
As you might have seen, we have already defined the reference system for our coordinates. If your file has coordinates in another system, use the CRS of your source. Furthermore, our shapefile does not have any fields yet. How do we get the field names in a generic way? We inspect the DictReader object:
with open('/home/ricckli/Desktop/example.tsv', newline='') as csvfile: #Python 3: text mode with newline='' for the csv module
	readerDict = csv.DictReader(csvfile, delimiter='\t')
	for field in readerDict.fieldnames:
		new_field = ogr.FieldDefn(field, ogr.OFTString) #we will create a new field for each header element
		layer.CreateField(new_field)
We still have a problem here: we assume that all the information in the CSV is text, although some columns actually contain numbers. If you would like to take this into account, you need to define each field yourself (is there another, generic way?).
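
One possible generic approach, sketched here under my own assumptions rather than taken from the original script, is to look at the cell values and guess a type per column. The helper names `infer_field_types` and `_parses` and the sample data are hypothetical. The function returns the names of the OGR field type constants as strings, so the sketch runs without GDAL installed.

```python
import csv
import io


def _parses(converter, text):
    """Return True if converter(text) succeeds."""
    try:
        converter(text)
        return True
    except ValueError:
        return False


def infer_field_types(rows, fieldnames):
    """Guess 'OFTInteger', 'OFTReal' or 'OFTString' for every column.

    A column counts as integer only if every non-empty cell parses as int,
    as real if every non-empty cell parses as float, and as string otherwise.
    """
    types = {}
    for name in fieldnames:
        values = [row[name] for row in rows if row[name] not in (None, "")]
        if values and all(_parses(int, v) for v in values):
            types[name] = "OFTInteger"
        elif values and all(_parses(float, v) for v in values):
            types[name] = "OFTReal"
        else:
            types[name] = "OFTString"  # fallback, also used for empty columns
    return types


# Hypothetical sample data for illustration.
sample = "NAME\tCOUNT\tLAT\nA\t3\t52.5\nB\t7\t48.1\n"
reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
rows = list(reader)
print(infer_field_types(rows, reader.fieldnames))
```

With GDAL available, each returned name could be turned into the real constant with `getattr(ogr, type_name)` and passed to `ogr.FieldDefn`. The cell values would then need converting with `int()` or `float()` before they are written, and the rows have to be buffered (as done here with `list(reader)`) because the file is inspected before any feature is created.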

Bringing It All Together

Now back to the lines, i.e. the points, of the delimited file. For each line in the CSV we need to add a feature with the coordinates from the columns LAT and LON and write the attributes to the fields. Furthermore, let’s make the script callable with four input parameters (input CSV file, EPSG code, delimiter and output shapefile):
from sys import argv
script, input_file, EPSG_code, delimiter, export_shp = argv
import csv
import osgeo.ogr, osgeo.osr #we will need some packages
from osgeo import ogr #and one more for the creation of a new field
spatialReference = osgeo.osr.SpatialReference() #will create a spatial reference locally to tell the system what the reference will be
spatialReference.ImportFromEPSG(int(EPSG_code)) #here we define this reference to be the EPSG code
driver = osgeo.ogr.GetDriverByName('ESRI Shapefile') # will select the driver for our shp-file creation.
shapeData = driver.CreateDataSource(export_shp) #so there we will store our data
layer = shapeData.CreateLayer('layer', spatialReference, osgeo.ogr.wkbPoint) #this will create a corresponding layer for our data with given spatial information.
layer_defn = layer.GetLayerDefn() # gets parameters of the current shapefile
index = 0

with open(input_file, newline='') as csvfile: #Python 3: text mode with newline='' for the csv module
	readerDict = csv.DictReader(csvfile, delimiter=delimiter)
	for field in readerDict.fieldnames:
		new_field = ogr.FieldDefn(field, ogr.OFTString) #we will create a new field with the content of our header
		layer.CreateField(new_field)
	for row in readerDict:
		print(row['LAT'], row['LON'])
		point = osgeo.ogr.Geometry(osgeo.ogr.wkbPoint)
		point.AddPoint(float(row['LON']), float(row['LAT'])) #we do have LATs and LONs as Strings, so we convert them
		feature = osgeo.ogr.Feature(layer_defn)
		feature.SetGeometry(point) #set the coordinates
		feature.SetFID(index)
		for field in readerDict.fieldnames:
			i = feature.GetFieldIndex(field)
			feature.SetField(i, row[field])
		layer.CreateFeature(feature)
		index += 1
shapeData.Destroy() #lets close the shapefile
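
Unpacking `argv` directly fails with a rather cryptic error when a parameter is missing. As an alternative, here is a minimal sketch that reads the same four parameters with the standard argparse module. The function name `parse_args` and the file names in the example call are placeholders of mine, not part of the original script.

```python
import argparse


def parse_args(argv=None):
    """Parse the four positional parameters used by the script above."""
    parser = argparse.ArgumentParser(
        description="Convert a delimited text file with LAT/LON columns "
                    "to a point shapefile."
    )
    parser.add_argument("input_file", help="path to the csv/tsv file")
    parser.add_argument("epsg_code", type=int,
                        help="EPSG code of the coordinates, e.g. 4326")
    parser.add_argument("delimiter", help="single delimiter character, e.g. ';'")
    parser.add_argument("export_shp", help="path of the shapefile to create")
    args = parser.parse_args(argv)
    if len(args.delimiter) != 1:
        # The csv module only accepts one-character delimiters.
        parser.error("delimiter must be exactly one character")
    return args


# Example with an explicit argument list instead of sys.argv.
args = parse_args(["example.tsv", "4326", "\t", "points.shp"])
print(args.input_file, args.epsg_code, repr(args.delimiter), args.export_shp)
```

Calling `parse_args()` without arguments reads from the command line, and a wrong call then prints a usage message instead of a traceback.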

 
[Figure: Attributes in QGIS after creating the shapefile from the CSV]
In the end you can call the whole script from your terminal/cmd console as follows. You could enhance it further by making the LAT/LON column names generic. The first line works for tab-separated files, the second for “;”-separated files:
python /home/ricckli/Desktop/csv_to_shp.py /home/ricckli/Desktop/example2.tsv 4326 $'\t' /home/ricckli/Desktop/test2.shp
python /home/ricckli/Desktop/csv_to_shp.py /home/ricckli/Desktop/example.tsv 4326 ";" /home/ricckli/Desktop/test.shp
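
Making the LAT/LON names generic could look roughly like the following sketch. The candidate name lists and the function `find_coordinate_columns` are assumptions for illustration; adjust them to whatever headers your own exports use.

```python
# Candidate header names, compared case-insensitively. These lists are an
# assumption for illustration; extend them to match your own exports.
LAT_NAMES = ("lat", "latitude", "y")
LON_NAMES = ("lon", "lng", "long", "longitude", "x")


def find_coordinate_columns(fieldnames):
    """Return (lat_column, lon_column) as they are spelled in the header.

    Raises ValueError if either column cannot be identified.
    """
    lookup = {name.strip().lower(): name for name in fieldnames}
    lat = next((lookup[c] for c in LAT_NAMES if c in lookup), None)
    lon = next((lookup[c] for c in LON_NAMES if c in lookup), None)
    if lat is None or lon is None:
        raise ValueError("could not find coordinate columns in %r" % (fieldnames,))
    return lat, lon


print(find_coordinate_columns(["Name", "Latitude", "Longitude"]))
```

In the script, the result would replace the hard-coded keys, for example `lat_col, lon_col = find_coordinate_columns(readerDict.fieldnames)` followed by `row[lat_col]` and `row[lon_col]` inside the loop.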
You can download the Python script here and also the example files. I’d appreciate any comments!

4 Comments
annalisapg
8 years ago

good job! the only spare reflection when I read the title was exactly this: you have to know the CRS of your data in advance, in order to assign them one at a specific moment. BTW, (a typo in the comment I suppose) 4326 is the EPSG of WGS84 Lat-Long, not projected.

Raquel
8 years ago

Thanks for this! I have one question though, why not use pyshp for creating the shapefile? It looks more simple to use than ogr. See https://code.google.com/p/pyshp/

Josef Fritzer
8 years ago

Awesome stuff, THANKS, THANKS

Silke
7 years ago

Thank you so much, it works excellently! I’m only starting out with osgeo and python. Do you maybe have any pointers on how to use this with projected/national coordinate systems (specifically the Dutch grid RD New, EPSG:28992)? Just exchanging “LAT” and “LON” with the respective X- and Y-coordinate headers didn’t seem to do the trick 🙂

Thank you!