{"id":179,"date":"2016-11-25T17:02:11","date_gmt":"2016-11-25T17:02:11","guid":{"rendered":"https:\/\/blog.diggernaut.com\/?p=179"},"modified":"2019-01-12T18:08:38","modified_gmt":"2019-01-12T18:08:38","slug":"json-to-xml-or-transform-in-6-seconds","status":"publish","type":"post","link":"https:\/\/www.diggernaut.com\/blog\/json-to-xml-or-transform-in-6-seconds\/","title":{"rendered":"JSON to XML, or &#8220;transform in 6 seconds.&#8221;"},"content":{"rendered":"<p>Hi folks. I want to share with you some details about our engine. As you know, it is written in Go. We use a lot of libraries there, and one of them \u2013 <code>mxj<\/code> \u2013 an outstanding library to work with <code>XML<\/code>.<\/p>\n<p>Now I am going to briefly tell you how our engine\u2019s <code>json2xml<\/code> routine works. First, we convert <code>JSON<\/code> to a <code>map[string]interface{}<\/code>, and then feed this object to mxj as follows: <code>xmlValue, err := mxj.AnyXmlIndent(data, &quot;&quot;, &quot;&quot;, &quot;body&quot;)<\/code>. After that, we fix the <code>self-closing<\/code> tags and pass the object on. We used this logic for 3 months, and everything was just fine, but then we suddenly needed to parse larger volumes of <code>JSON<\/code> than usual, and that turned out to be a problem. One of the diggers ran for 8 hours instead of 15 minutes, so we did the necessary research. Processing a single page took 16 minutes, which, for obvious reasons, is unacceptable. It turned out that the page contained 2.5 MB of JSON. Processing it with the <code>mxj<\/code> library took about 3 minutes, and then some magic happened: the engine went crazy and took another 13 minutes to process the <code>XML<\/code>. Of course, we were not happy with that, and we decided to improve <code>mxj<\/code> first.<\/p>\n<p>The problem with the <code>mxj<\/code> library was that it used string concatenation. 
Strings in Go are immutable, so each concatenation allocates memory for a new string and copies the old contents into it. We worked around this by writing a few new functions that use <code>bytes.Buffer<\/code> instead of strings. With this simple change alone, we were able to speed up <code>XML<\/code> processing in the <code>mxj<\/code> library by about 180 times. It now takes less than 1 second to process the same data set, so we went from 3 min to 1 sec.<\/p>\n<p>During further research we found where we had made a mistake. Our engine expects <code>HTML<\/code>, and when we work with <code>JSON<\/code>, some tags that are self-closing in <code>HTML<\/code> (like <code>img<\/code> or <code>area<\/code>) may be used in <code>XML<\/code> as standard tags, which caused problems. So we made another change to the library that lets us replace such tags with safe versions. It solved all the issues we had, and the page that previously took 15 min to process now takes just 6 sec.<\/p>\n<p>The repository with the library we modified can be found <a href=\"https:\/\/github.com\/Diggernaut\/mxj\">here<\/a>.<\/p>\n<p>As a bonus, we wrote a simple converter that allows you to load data from <code>MongoDB<\/code> and convert it to <code>XML<\/code>. You can get it <a href=\"https:\/\/github.com\/Diggernaut\/xmlconverter\">here<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Hi folks. I want to share with you some details about our engine. As you know, it is written in Go. We use a lot of libraries there, and one of them \u2013 mxj \u2013 an outstanding library to work with XML. 
Now I am going to briefly tell you how our engine\u2019s json2xml routine [&hellip;]<\/p>","protected":false},"author":5,"featured_media":197,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[32,27,25,24,26,2],"tags":[],"class_list":["post-179","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-codeproject","category-diggernaut-engine","category-go","category-golang","category-programming","category-web-scraping"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/179","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/comments?post=179"}],"version-history":[{"count":11,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/179\/revisions"}],"predecessor-version":[{"id":670,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/179\/revisions\/670"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/media\/197"}],"wp:attachment":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/media?parent=179"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/categories?post=179"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/tags?post=179"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}