22. December 2016
Over the last couple of years, we have seen an increase in demand for, and offerings of, geospatial solutions in the cloud for tasks like web map creation, centralized data storage and management, data visualization and so on. There are a lot of Software-as-a-Service (SaaS) options to explore, but in this article I want to briefly discuss whether SaaS is the right thing for you, or whether you should look into rolling your own.
Depending on your field of work, you may have already heard of or worked with services like Carto (formerly CartoDB), Mapbox, GIS Cloud, Azimap and similar. There is a wide range of these SaaS vendors that try to make your life as a GIS analyst, data journalist or developer easier. But if you have worked with some of them, you will soon realize that it gets pricey if you rely on them in the long term or use multiple services for different use cases in parallel. Also, you might be worried about vendor lock-in, and your company or customers might have strict policies on using external services for critical data. So if one of these concerns applies to your case, you might consider rolling your own. In this article, I want to outline the possibilities of combining existing open source tools to build your own central geospatial database, data visualization platform or web-based GIS. The article is based on our experience from a wide range of projects at Geolicious.
To start off, let's get an overview of the typical workloads that you might want to run either at a Software-as-a-Service provider or on your own servers.
First of all, there is the basis of every professional geospatial workflow, namely data storage and management. As soon as more than one collaborator is involved, it probably makes sense to centralize your spatial data storage on a file server (if you are still working with Shapefiles, for example) or, even better, in a spatially enabled database. If you are looking for SaaS solutions that store your spatial data in the cloud, you will find a wide range of them, each with a different focus and feature set. Prominent examples for geospatial data storage are Carto and Mapbox.
But as soon as you run into the limits of their pricing models or feature sets, you may want to look into custom solutions, which take developer time to build but may save you money and headaches in the long run.
In such cases, my first alternative is always a self-hosted PostgreSQL database. If you are working with spatial data and have never heard of PostGIS, you should definitely check it out now. Long story short: PostGIS is an extension to PostgreSQL that allows you to store geometries in columns of regular tables, which also enables you to enforce constraints like “make sure the geometry is valid”. In addition, PostGIS gives you full GIS functionality through its built-in functions that operate on the geometries stored in your spatial tables.
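To make the “make sure the geometry is valid” idea concrete, here is a minimal Python sketch that builds the SQL for such a table. The table name `parcels`, its columns and the choice of MultiPolygon in EPSG:4326 are assumptions for illustration; the typed `geometry(MultiPolygon, 4326)` column, the `ST_IsValid` check and `ST_Area` are standard PostGIS features. You would run the generated statement with any PostgreSQL client once PostGIS is enabled in the database.

```python
# Hypothetical table layout; geometry(<type>, <srid>), ST_IsValid and
# ST_Area are standard PostGIS features.

def spatial_table_ddl(table, geometry_type="MultiPolygon", srid=4326):
    """Return CREATE TABLE SQL for a table with a typed, validated geometry column.

    The typed column rejects other geometry types and SRIDs; the CHECK
    constraint rejects invalid geometries such as self-intersecting polygons.
    """
    # Identifiers are interpolated into SQL, so only allow plain names.
    for name in (table, geometry_type):
        if not name.isidentifier():
            raise ValueError(f"unsafe SQL identifier: {name!r}")
    return (
        f"CREATE TABLE {table} (\n"
        "    id serial PRIMARY KEY,\n"
        "    name text NOT NULL,\n"
        f"    geom geometry({geometry_type}, {int(srid)}) NOT NULL\n"
        "        CHECK (ST_IsValid(geom))\n"
        ");"
    )


# One of the many built-in GIS functions: area in square metres, computed on
# the spheroid by casting the EPSG:4326 geometry to geography.
AREA_QUERY = "SELECT name, ST_Area(geom::geography) AS area_m2 FROM parcels;"

if __name__ == "__main__":
    print(spatial_table_ddl("parcels"))
    print(AREA_QUERY)
```

With the constraint in place, an `INSERT` of a self-intersecting polygon fails inside the database itself, no matter which desktop GIS or web application sent it.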
Depending on your needs, you can build a lot of functionality around PostGIS with more or less effort. If you just need a central data store for your geodata, you could simply spin up a server in your office or at your hosting provider of choice, install PostgreSQL and PostGIS (which is distributed as a separate package on most platforms), enable the extension in your database, configure everything to your needs, and let your staff access the database directly from their desktop GIS. QGIS, for example, supports PostGIS out of the box: you can load tables with geometry columns, or saved queries on these tables, into your map canvas and work with them like any other layer. PostgreSQL's sophisticated user and role management allows for very flexible, fine-grained authorization and privilege management.
While building a setup like this is not exactly straightforward if you don't have any prior experience, it is well worth the effort to learn, because it opens up a whole universe of benefits and further possibilities to build upon.
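A first-pass setup for such a shared database might look like the statements below, wrapped in a small Python helper so they can be reviewed before being fed to a client such as psql or psycopg2. The group roles `gis_readers` and `gis_editors` and the user name `alice` are hypothetical; `CREATE EXTENSION`, `GRANT ... ON ALL TABLES IN SCHEMA` and `ALTER DEFAULT PRIVILEGES` are standard PostgreSQL commands. Treat it as a sketch, not a hardened security configuration.

```python
from __future__ import annotations

# Hypothetical group roles: one read-only, one allowed to edit data.
READERS = "gis_readers"
EDITORS = "gis_editors"


def _check_identifier(name: str) -> str:
    # Conservative guard, since identifiers are interpolated into SQL.
    if not name.isidentifier():
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return name


def role_setup_sql(schema: str = "public") -> list[str]:
    """SQL to enable PostGIS and create read-only and editor group roles."""
    s = _check_identifier(schema)
    return [
        "CREATE EXTENSION IF NOT EXISTS postgis;",
        f"CREATE ROLE {READERS} NOLOGIN;",
        f"CREATE ROLE {EDITORS} NOLOGIN;",
        f"GRANT USAGE ON SCHEMA {s} TO {READERS}, {EDITORS};",
        # Privileges on tables that already exist ...
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {s} TO {READERS};",
        f"GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA {s} TO {EDITORS};",
        # ... editors also need the sequences behind serial primary keys ...
        f"GRANT USAGE ON ALL SEQUENCES IN SCHEMA {s} TO {EDITORS};",
        # ... and the same privileges on tables created later by the role
        # that runs these statements.
        f"ALTER DEFAULT PRIVILEGES IN SCHEMA {s} GRANT SELECT ON TABLES TO {READERS};",
        f"ALTER DEFAULT PRIVILEGES IN SCHEMA {s} "
        f"GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO {EDITORS};",
    ]


def staff_login_sql(username: str, group: str) -> str:
    """SQL for a staff login that inherits the privileges of a group role.

    Set the password afterwards, e.g. with psql's \\password command.
    """
    return f"CREATE ROLE {_check_identifier(username)} LOGIN IN ROLE {_check_identifier(group)};"


if __name__ == "__main__":
    for statement in role_setup_sql() + [staff_login_sql("alice", EDITORS)]:
        print(statement)
```

Granting privileges to group roles rather than to individual users keeps the setup manageable: adding a colleague is a single `CREATE ROLE ... IN ROLE` statement.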
But if you are looking for a solution where your non-GIS staff can work with the data, even if only with the metadata, you might be better off with an easy and appealing user interface like a web application instead of a highly technical desktop GIS or database client. In these cases, you will find a lot of open source tools that you can connect to your PostGIS-enabled database to provide a limited set of functionality tailored to your specific use cases.
Take, for example, a small or medium-sized enterprise with a couple of people working with spatial data. Instead of using shared directories to sync Shapefiles between your staff's computers, you could set up such a central PostGIS database and attach one of the myriad frontend applications that allow your staff or customers to access and manage the geometries and metadata in a simpler way.
First of all, we have to distinguish between native and web applications. Native applications are, for example, standalone GIS systems like ArcGIS or QGIS, which are installed on your staff's computers and access the geodatabase directly via the native database protocols. Native applications could also be mobile apps, although I don't have hands-on experience with, or examples of, native Android or iOS apps that connect to PostgreSQL. One major downside of native applications is that they are often targeted at specific operating systems (like ArcGIS Desktop, which is Windows only), and it's often a hassle to keep them updated on all your staff's computers, which isn't a problem at all if you're using web apps.
If your target audience has a more technical background, open source solutions like GeoServer, QGIS Server and others provide very powerful web frontends as well as application programming interfaces (APIs) for accessing the geodata programmatically, for example from web maps.
Another option to consider is developing fully customized web applications tailored to your use case. As an example, I would like to describe how we at Geolicious use the Python web framework Flask to provide customers with highly customized web applications, but your options are by no means limited to that.
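For a feel of what programmatic access means here: OGC services like the WMS offered by GeoServer are plain HTTP requests with standardized query parameters. The sketch below builds a WMS 1.1.1 GetMap URL in Python. The endpoint `http://localhost:8080/geoserver/wms` and the layer `demo:parcels` are assumptions, while the parameter names come from the WMS specification. Version 1.1.1 is chosen deliberately because WMS 1.3.0 swaps the axis order for EPSG:4326, a common source of confusion.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width=800, height=600,
                   srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap request URL.

    bbox is (minx, miny, maxx, maxy) in the units of `srs`; for EPSG:4326
    in WMS 1.1.1 that means (min lon, min lat, max lon, max lat).
    """
    minx, miny, maxx, maxy = bbox
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value = the layer's default style
        "SRS": srs,
        "BBOX": f"{minx},{miny},{maxx},{maxy}",
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local GeoServer and layer name.
    print(wms_getmap_url("http://localhost:8080/geoserver/wms",
                         "demo:parcels", (13.0, 52.3, 13.8, 52.7)))
```

The resulting URL can be opened in a browser or used as an image source; web mapping libraries such as Leaflet or OpenLayers assemble these requests for you.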
Flask is an extensible microframework that enables rapid prototyping of web frontends and APIs in the Python programming language. On top of that, we use some other great libraries to build the functionality requested by our customers. We use SQLAlchemy to connect our web applications to PostGIS databases and run queries. For a simple frontend, we use Flask-Admin to easily generate tabular data overviews and data management forms. Flask-Admin also ships with a geo extension, which makes it a no-brainer to make the geographic data models in your database editable right inside the form view of your geospatial features. (If you want to try out a demo, go to http://geocms.geolicious.de and log in with firstname.lastname@example.org, password=password)
With just a few hours of work, you can have a basic spatially enabled content management system that allows your staff to create, read and edit features in a simple database frontend. On top of that, we develop custom features such as versatile web maps showing the features of your geographic data models, statistical dashboards and interactive data visualizations, just to name a few. Our solutions are mostly used in a B2B context, providing spatial intelligence or decision support tools. But such a system does not have to stand alone: you can easily attach it to existing digital business infrastructure, using APIs and open source tools for deep integration. To give you an example: if you need to generate arbitrary web maps from data that already exists in your spatial database, you can use this setup to generate web maps that can be embedded in any other HTML page simply by using iframes.
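As a rough illustration of the data-access side, here is a minimal sketch of what a map or API view in such an application might do: ask PostGIS for the features inside a bounding box, serialized with `ST_AsGeoJSON`, and wrap the rows in a GeoJSON FeatureCollection. The table `parcels` and its columns are hypothetical, the `%(name)s` placeholders follow the psycopg2 parameter style (SQLAlchemy's `text()` uses `:name` instead), and the Flask routing itself is left out so the snippet only needs the standard library.

```python
import json

# Hypothetical query; ST_AsGeoJSON, ST_Intersects and ST_MakeEnvelope are
# PostGIS functions. Parameters use psycopg2's %(name)s placeholder style.
FEATURES_IN_BBOX_SQL = """
SELECT id, name, ST_AsGeoJSON(geom) AS geometry
FROM parcels
WHERE ST_Intersects(geom, ST_MakeEnvelope(%(minx)s, %(miny)s, %(maxx)s, %(maxy)s, 4326));
"""


def rows_to_feature_collection(rows):
    """Turn (id, name, geojson_text) rows into a GeoJSON FeatureCollection dict."""
    features = [
        {
            "type": "Feature",
            "id": row_id,
            "geometry": json.loads(geometry_text),  # PostGIS returns JSON text
            "properties": {"name": name},
        }
        for row_id, name, geometry_text in rows
    ]
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    # Stand-in for cursor.fetchall() after executing FEATURES_IN_BBOX_SQL.
    sample_rows = [(1, "Parcel A", '{"type":"Point","coordinates":[13.4,52.5]}')]
    print(json.dumps(rows_to_feature_collection(sample_rows)))
```

In a Flask view, the resulting dictionary could be returned with `flask.jsonify`, and a Leaflet map could display it with `L.geoJSON`.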
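The iframe approach needs nothing more than a URL that renders a single map. Below is a minimal Python sketch of generating the embed snippet, assuming a hypothetical map page at `https://maps.example.com/embed` that reads its configuration (layer, zoom) from query parameters.

```python
from html import escape
from urllib.parse import urlencode


def iframe_embed(map_url, params=None, width=600, height=400):
    """Return an HTML <iframe> snippet that embeds a web map page.

    Query parameters are URL-encoded, and the src attribute is HTML-escaped
    so that '&' becomes '&amp;' inside the attribute value.
    """
    src = f"{map_url}?{urlencode(params)}" if params else map_url
    return (
        f'<iframe src="{escape(src, quote=True)}" '
        f'width="{int(width)}" height="{int(height)}" style="border:0"></iframe>'
    )


if __name__ == "__main__":
    # Hypothetical embed endpoint and parameters.
    print(iframe_embed("https://maps.example.com/embed",
                       {"layer": "parcels", "zoom": 12}))
```

Because the map page lives on your own server, the host page needs no GIS libraries at all; it only has to paste the snippet.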
But of course, if you look into roll-your-own solutions for your spatial workloads and business needs, there are a lot of other possibilities out there. Below I list some open source tools that you can take as a starting point. Feel free to comment if you are missing something.
Example components of “roll your own” spatial cloud solutions
- PostgreSQL: the database foundation that we at Geolicious build most of our solutions upon. Very reliable, actively maintained, and with an ever-growing feature list (streaming replication, JSON data types, first-class spatial support through PostGIS, just to name a few) (https://www.postgresql.org/)
- PostGIS: PostgreSQL extension that allows for storage and processing of spatial data. Also supports raster data. A fully functional GIS on its own. (http://postgis.net/)
- GDAL: “translator library for raster and vector geospatial data formats.” The Swiss Army knife of geospatial data conversion. Works with raster and vector data and offers some spatial processing tools as well. Its command-line utilities (such as gdal_translate and ogr2ogr) are mainly used for one-off conversions, but you can also integrate them into your extract-transform-load (ETL) workflow by writing shell scripts. (http://gdal.org/)
- GeoServer: map server written in Java, compliant with the standards of the Open Geospatial Consortium (OGC), offering Web Map Service (WMS), Web Coverage Service (WCS), Web Feature Service (WFS, including transactional WFS) and Web Processing Service (WPS). (http://geoserver.org/)
- QGIS Server: web map server similar to GeoServer. We have not had a use case for it yet, but want to try it out some time in the future. You might consider QGIS Server if you are working with QGIS anyway, as it should integrate well.
- OSRM (Open Source Routing Machine): “Modern C++ routing engine for shortest paths in road networks.” If you need routing capabilities in your use case, this is probably your go-to library, although there are other open source competitors. (http://project-osrm.org/)
- SQLAlchemy/GeoAlchemy: Object-relational mapper for Python and its PostGIS extension. A framework-agnostic library for accessing databases such as PostgreSQL without writing a single line of SQL, using all the power of Python. Similar libraries exist for most programming languages. (http://www.sqlalchemy.org/ and https://geoalchemy-2.readthedocs.io/en/latest/)
- Python Flask: A microframework for developing web applications. In contrast to the more prominent Django, it doesn't come with “batteries included”, but that isn't much of a downside, as you can extend it with lots of other libraries for database access, API design, user authentication and so on. In my experience, Flask is a solid foundation for developing custom geospatial cloud solutions. (http://flask.pocoo.org/)
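As an example of the ETL integration mentioned for GDAL, the sketch below assembles an `ogr2ogr` call that loads a Shapefile into a PostGIS table, driven from Python here rather than from a shell script. The file name, table name and connection string are hypothetical; `-f PostgreSQL`, `-nln`, `-nlt PROMOTE_TO_MULTI`, `-lco GEOMETRY_NAME=geom`, `-t_srs` and `-overwrite` are documented ogr2ogr options, but it is worth checking them against the GDAL version you have installed.

```python
import shlex


def ogr2ogr_to_postgis(source, table, dsn, target_srs="EPSG:4326"):
    """Return the argument list for loading a vector file into PostGIS."""
    return [
        "ogr2ogr",
        "-f", "PostgreSQL",            # output driver
        "-nln", table,                 # name of the new layer (table)
        "-nlt", "PROMOTE_TO_MULTI",    # Polygon -> MultiPolygon etc., one column type
        "-lco", "GEOMETRY_NAME=geom",  # geometry column name
        "-t_srs", target_srs,          # reproject on the fly
        "-overwrite",                  # replace the table if it already exists
        f"PG:{dsn}",                   # destination: PostgreSQL connection string
        source,                        # source file, e.g. a Shapefile
    ]


if __name__ == "__main__":
    cmd = ogr2ogr_to_postgis("parcels.shp", "parcels", "dbname=gis user=gis")
    print(shlex.join(cmd))  # shell-quoted, for logging or a shell script
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)`, which sidesteps the quoting problems that the space-separated `PG:` connection string tends to cause in hand-written shell scripts.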
Of course there are tons of other open source tools to build upon, but I only wanted to include the ones I am familiar with and that are stable enough to base your next project upon. Depending on your use case, you might want to add more libraries and software for (geographic) data visualization, number crunching and statistical analysis, interactive data exploration, sensor-based data input, export functionality, service integration and so on. The sky is the limit. If you have a project in mind but don't know how to get your head around it, just drop me a line, and we will see how we can help.
I hope you find this article helpful.