Some thoughts on Ecoinvent geographies

Spatial datasets each have their own version of the world

Figure 1: The Louisiana shoreline, as given by different datasets - red is state borders from NationalAtlas.gov, green is county borders from the same source, and black is Natural Earth 10m cultural.

Ecoinvent version 3 uses spatial data to define the locations of inventory datasets (see the data quality guidelines, spreadsheet of geographies, list of geographies and spatial coordinates in Ecospold2 XML, and individual geographies in KMZ format). There is also a website for creating new geographies (source code).

These locations are called geographies because their spatial coordinates are stored as geographic coordinates, i.e. longitude and latitude, and are not projected to a plane.

However, the current implementation is a pain to deal with, both for existing users and for editors. The ideal solution would have, at a minimum, the following qualities:

  • The spatial information should produce the same calculation results on different spatial databases, including SQL Server, the database used by Ecoinvent.
  • It should be relatively easy for everyone to see and download individual geographies, and to create new geographies and import them into the database.
  • It should be easy to build the geographies in Ecoinvent using software and spatial datasets that are free and open source. This software should be well documented and tested.

In my opinion, the best way forward from what we have right now is:

  • The current software at geography.ecoinvent.org should be improved to include current geographies, and to automatically notify the Ecoinvent centre when a geography should be imported into the Ecoinvent database.
  • New software should be written to import datasets like Natural Earth and recreate the geographies.xml file. This software should be as easy as possible to install and use (i.e. it should probably not require a spatial database). It should also round coordinates to 8 digits after the decimal point to make the math easier, reduce spurious errors, and reduce file sizes.
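
The rounding step in the last point could look something like the following. This is only a minimal sketch: the function name round_coordinates and the sample coordinates are made up for illustration, and the input is assumed to be a nested list of longitude/latitude numbers, as in a GeoJSON-style ring.

```python
def round_coordinates(coords, ndigits=8):
    """Recursively round every number in a nested list of coordinates."""
    if isinstance(coords, (int, float)):
        return round(coords, ndigits)
    return [round_coordinates(c, ndigits) for c in coords]


# Two hypothetical renderings of the same vertex that differ only in
# digits far below any meaningful precision.
vertex_a = [-91.1234567891, 29.9876543211]
vertex_b = [-91.1234567893, 29.9876543214]

# After rounding, the two renderings compare equal.
print(round_coordinates(vertex_a) == round_coordinates(vertex_b))
```

The function walks arbitrarily nested lists, so the same call should work for a single point, a ring, or a list of rings.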

A more revolutionary thought is whether the current system of using spatial coordinates to define spatial relations is needed for Ecoinvent at all. The current system uses the OGC Simple Feature relation contains to determine if an inventory dataset can be linked to another dataset, e.g. the state of Louisiana is contained in the USA, and therefore contributes to its market mixes. However, if you look closely at Figure 1, you can see that the borders of the state and country don't match up. What this means practically is that each state has to be manually modified to fit inside the borders of the country, and even then a state that is contained on one machine with one set of geospatial libraries may not be contained on another machine. Moreover, there are areas inside the country borders of the USA, along the Louisiana coast, that are not inside the borders of the state of Louisiana, meaning that the union of all states would not equal the country borders.
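
To make the mismatch concrete, here is a small sketch that lists the vertices of an inner polygon that fall outside an outer polygon. It is not the full OGC contains relation, only a necessary condition for it (if any vertex is outside, contains must fail), and points exactly on a border are not handled robustly. The function names and the toy coordinates are invented for illustration; they are not real border data.

```python
def point_in_ring(point, ring):
    """Ray-casting test: True if the point lies inside the closed ring."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def vertices_outside(inner, outer):
    """Return the vertices of `inner` that are not inside `outer`."""
    return [p for p in inner if not point_in_ring(p, outer)]


# Toy "country" and a toy "state" whose shoreline was digitized slightly
# further out to sea than the country's shoreline.
country = [(0, 0), (10, 0), (10, 10), (0, 10)]
state = [(2, 1), (6, 1), (6, -0.5), (2, -0.5)]

print(vertices_outside(state, country))
```

Any non-empty result means the state would have to be edited by hand before a contains test against the country could succeed.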

As an alternative, we could consider moving away from using the actual spatial data to define spatial relationships, and instead do this manually in a separate file. At first, this sounds a little crazy - if we know where things are, with bleeding GIS coordinates, then of course we should use this data! The answer to this objection is threefold:

  1. We only actually use the spatial data to build a tree of spatial relationships - and we already know what the spatial relationships should be. Manually specifying them would save some time and a lot of frustration.
  2. The actual spatial data is also used for regionalized life cycle assessment, but for regionalized LCA there is no contains requirement, meaning we could just use the native spatial datasets like Natural Earth without modification. Using the native datasets is more flexible, as we just follow their updates, and much easier for Ecoinvent.
  3. Providing a list of which regions lie within which other regions is much more transparent than a file of more than 100 megabytes of spatial data that can't be easily loaded into e.g. QGIS or Google Earth.
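
As a rough sketch of what a manually specified relationship file could boil down to, the snippet below stores one parent per region and answers "does A contain B?" by walking up the tree. The region codes, the PARENT mapping, and the contains function are illustrative assumptions, not the actual Ecoinvent data or software.

```python
# Hypothetical child -> parent table; in practice this could be loaded
# from a small text file kept under version control.
PARENT = {
    "US-LA": "US",
    "US-TX": "US",
    "US": "GLO",
    "CA": "GLO",
}


def contains(outer, inner, parents=PARENT):
    """True if `inner` is `outer` or lies somewhere beneath it in the tree."""
    seen = set()
    current = inner
    # Walk upwards from `inner`; `seen` guards against accidental cycles.
    while current is not None and current not in seen:
        if current == outer:
            return True
        seen.add(current)
        current = parents.get(current)
    return False


print(contains("US", "US-LA"))   # Louisiana contributes to US market mixes
print(contains("US-LA", "US"))   # but not the other way around
```

Because the answer depends only on the table, it would be the same on every machine, whatever geospatial libraries happen to be installed.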