{"id":571,"date":"2018-08-05T22:39:03","date_gmt":"2018-08-05T22:39:03","guid":{"rendered":"https:\/\/www.diggernaut.com\/blog\/?p=571"},"modified":"2019-01-12T08:36:29","modified_gmt":"2019-01-12T08:36:29","slug":"improving-the-functionality-for-working-with-geospatial-data","status":"publish","type":"post","link":"https:\/\/www.diggernaut.com\/blog\/improving-the-functionality-for-working-with-geospatial-data\/","title":{"rendered":"Improving the functionality for working with geospatial data"},"content":{"rendered":"<p>We are continually working on improving the functionality of our web scraping and data extraction platform Diggernaut. This time, the enhancement package includes functions for working with geospatial data.<\/p>\n<p>First, we would like to inform you that we have completely redesigned the function that extracts multi-polygons by OSM relation ID. Previously, we used a third-party <a href=\"http:\/\/polygons.openstreetmap.fr\/index.py\">service<\/a>. However, as it turned out during intensive usage, not all relations can be converted into WKT using this service. The service also does not work correctly if the multi-polygon contains inner rings (holes): it simply turns all inner rings into outer ones. That\u2019s why we coded our own routine to extract multi-polygons in WKT format. It handles inner rings correctly and can return, in WKT format, any relation that has at least one closed ring (polygon). As before, you use the <strong>wkt<\/strong> command to do this. Please note that the changes only affect the WKT format; for the GeoJSON format, everything remains the same.<\/p>\n<p>The second major improvement is the addition of address parsing functions for almost any country in the world. We connected the <a href=\"https:\/\/github.com\/openvenues\/libpostal\">libpostal<\/a> library as a microservice, and its functionality is now available to scrapers through the package for working with geospatial data. 
The library is written in C and uses statistical NLP to parse and normalize postal addresses, relying on models pre-trained on data from OpenStreetMap, OpenAddresses, and other sources. We added two functions: <strong>address_parse<\/strong> for parsing (splitting an address into its elementary parts) and <strong>address_expand<\/strong> for address normalization. Since addresses on websites are very often represented as a single block of text, splitting them into parts (street, city, ZIP, etc.) can be problematic; we believe this functionality, together with the geocoding command, can be extremely useful for solving your tasks. If you want to learn more about how libpostal works, we recommend the following article: <a href=\"https:\/\/medium.com\/@albarrentine\/statistical-nlp-on-openstreetmap-part-2-80405b988718\">Statistical NLP based on OpenStreetMap data, part two<\/a>.<\/p>\n<p>More information about these and other functions of the geospatial data package can be found in our documentation: <a href=\"https:\/\/www.diggernaut.com\/dev\/meta-language-methods-geospatial-data-working-with-addresses.html\">Working with addresses<\/a> and <a href=\"https:\/\/www.diggernaut.com\/dev\/meta-language-methods-geospatial-data-working-with-geospatial-data.html\">Working with geodata<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>We are continually working on improving the functionality of our web scraping and data extraction platform Diggernaut. This time, the enhancement package includes functions for working with geospatial data. First, we would like to inform you that we have completely redesigned the function that extracts multi-polygons by OSM relation ID. 
Previously, we [&hellip;]<\/p>","protected":false},"author":4,"featured_media":573,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[27,35],"tags":[],"class_list":["post-571","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-diggernaut-engine","category-integrations"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/571","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/comments?post=571"}],"version-history":[{"count":2,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/571\/revisions"}],"predecessor-version":[{"id":619,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/571\/revisions\/619"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/media\/573"}],"wp:attachment":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/media?parent=571"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/categories?post=571"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/tags?post=571"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}