Hi, I am Amisha. I have been working as a data engineer at Mapbox for 2 years and have an interest in open data projects such as OpenStreetMap, Wikidata and Wikipedia. As part of the Outreachy program, I have contributed to gnome-maps in the past. Apart from that, I like spending time learning about machine learning algorithms.

In this talk, I will discuss how we at Mapbox protect map data from vandalism, touching upon the relationship between GNOME and Mapbox and how this project benefits GNOME users.

Abstract: We at Mapbox source geospatial data from a crowdsourced open data project, OpenStreetMap (OSM), to display on our map tiles. OSM is also referred to as the Wikipedia of maps and is considered one of the largest living maps of the world. It receives nearly 2.6M changes from around 4.5K volunteers on a daily basis. Validating crowdsourced geographic data at this scale, and ensuring there are no cases of vandalism or intentional destruction of data that could affect downstream data users, is a massive challenge.

Vandalism in Maps

Zero tolerance for Vandalism

At Mapbox, we have tried different strategies and ideas to validate data from OSM, since Mapbox maps rely heavily on OSM data, which makes its way to billions of users around the world. The quality of the data therefore plays a crucial role, and we cannot afford any vandalism on the maps.

Strategies

In the past, we worked a lot on a rule-based approach to catching vandalism. But that could not give us 100% protection against vandalism, so at one point Mapbox went through a map freeze: we blocked all upstream data and built an entirely new pipeline. In this talk, I would like to share a bit more about our past approaches, the failures, and the lessons we learned in building a new system that helps us validate each and every change on the map. I will talk in detail about our new system, which focuses on geospatial clustering and an efficient tasking system.
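To give a rough idea of what geospatial clustering combined with tasking can look like, here is a minimal sketch. It is only an illustration and not the actual Mapbox pipeline: the function names, the input format (change id, longitude, latitude) and the 0.5 degree grid size are all assumptions made for this example. Changes that fall in the same grid cell are grouped together, and each group becomes one review task.

```python
from collections import defaultdict
from math import floor


def cluster_changes(changes, cell_deg=0.5):
    """Group changes into square grid cells (hypothetical helper).

    `changes` is an iterable of (change_id, lon, lat) tuples.
    `cell_deg` is an assumed cell size in degrees, not a real setting.
    """
    clusters = defaultdict(list)
    for change_id, lon, lat in changes:
        # Changes whose coordinates fall in the same cell share a key.
        cell = (floor(lon / cell_deg), floor(lat / cell_deg))
        clusters[cell].append(change_id)
    return dict(clusters)


def to_tasks(clusters):
    """Turn each cluster into one review task, busiest cells first."""
    return sorted(clusters.values(), key=len, reverse=True)


if __name__ == "__main__":
    # Made-up sample changes: two close together, one far away.
    sample = [(1, 77.59, 12.97), (2, 77.60, 12.98), (3, -122.42, 37.77)]
    for task in to_tasks(cluster_changes(sample)):
        print(task)
```

A fixed grid is the simplest possible grouping; a real system would more likely use a proper clustering method and take the kind of change into account, but the idea of reviewing nearby changes together as one task stays the same.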

GNOME-Maps <-> Mapbox

In the past, the tile service used by GNOME-Maps was shut down, and since then Mapbox has offered its tiles to GNOME-Maps. The data shown in GNOME-Maps earlier also came from OSM, but with the new data validation system, GNOME users will now also get a validated version of OSM.

Conclusion

OpenStreetMap data is built by the collective efforts and values of a huge community. Continuous effort to preserve data quality is therefore essential to keeping the map data healthy. More generally, this methodology will help us understand the processes around validating crowdsourced data projects.

2018 September 8 - 11:15
45 min
Libre Application Summit