JSON vs XML
Data serialization is a very common task, and JSON and XML are probably the most popular formats for it. Both can store essentially arbitrary data. However, that doesn’t mean they are interchangeable.
For example, consider this XML document:
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="fiction">
    <title>Dragon Flames</title>
    <author>John Doe</author>
    <year>2012</year>
    <price>35</price>
  </book>
  <book category="non-fiction">
    <title>Localisation &amp; Globalisation</title>
    <author>Smith White</author>
    <year>2020</year>
    <price>120</price>
  </book>
</bookstore>
The same data can be encoded in JSON much more succinctly:
{
  "books": [
    {
      "category": "fiction",
      "title": "Dragon Flames",
      "author": "John Doe",
      "year": 2012,
      "price": 35
    }, {
      "category": "non-fiction",
      "title": "Localisation & Globalisation",
      "author": "Smith White",
      "year": 2020,
      "price": 120
    }
  ]
}
This looks much better. However, it’s not actually the same data. It may be the same for the application, but note the differences:
- year and price are numbers in JSON. XML doesn’t know numbers, they’re all strings.
- category is an attribute in XML. In JSON it’s a field like everything else.
- In XML each entry is labelled as a book separately. In JSON only the array is labelled (with a field name).
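The first two differences can be seen directly in code. The following is a minimal sketch using Python's standard json and xml.etree.ElementTree modules on a shortened copy of the first example; the variable names are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Shortened copies of the bookstore example (one book each).
XML_DOC = """<bookstore>
  <book category="fiction">
    <title>Dragon Flames</title>
    <year>2012</year>
    <price>35</price>
  </book>
</bookstore>"""

JSON_DOC = """{"books": [{"category": "fiction", "title": "Dragon Flames",
                        "year": 2012, "price": 35}]}"""

xml_book = ET.fromstring(XML_DOC).find("book")
json_book = json.loads(JSON_DOC)["books"][0]

# Element content is always text; the application has to convert it itself.
xml_year = xml_book.findtext("year")
# JSON numbers are decoded into Python numbers by the parser.
json_year = json_book["year"]

# In XML, attributes and child elements are reached through different calls ...
xml_category = xml_book.get("category")
# ... while in JSON the category is just another key.
json_category = json_book["category"]

print(type(xml_year).__name__, type(json_year).__name__)  # str int
```

A generic converter has no way of knowing that the text "2012" should become a number, or that category should become an attribute rather than a child element, unless it is given a schema or some convention on top of the data.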
For a specific application that probably doesn’t matter (though it would be a big pain for a generic converter). But consider a slightly modified example:
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="fiction">
    <title>Dragon Flames</title>
    <author>John Doe</author>
    <year>2012</year>
    <price>35</price>
  </book>
  <magazine category="non-fiction">
    <title>Localisation &amp; Globalisation</title>
    <editor>Smith White</editor>
    <year>2020</year>
    <price>120</price>
  </magazine>
</bookstore>
The change is minuscule, but how do you reflect it in JSON? One way is to use separate arrays for books and magazines, but if the relative order matters, you have to tag each item with its type, for example with a special key field:
{
  "items": [
    {
      "$type": "book",
      "category": "fiction",
      "title": "Dragon Flames",
      "author": "John Doe",
      "year": 2012,
      "price": 35
    }, {
      "$type": "magazine",
      "category": "non-fiction",
      "title": "Localisation & Globalisation",
      "editor": "Smith White",
      "year": 2020,
      "price": 120
    }
  ]
}
This works, but it’s not all that clean anymore, especially given that the field order can be arbitrary, so $type may end up somewhere in the middle.
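In practice this means a reader has to look the type up by key rather than rely on its position. Here is a small sketch of that in Python, next to the XML counterpart where the element name itself carries the type. The helper item_type and the shortened documents are assumptions made for illustration, not part of any library.

```python
import json
import xml.etree.ElementTree as ET

# "$type" deliberately sits in the middle or at the end of each object.
JSON_DOC = """{"items": [
  {"title": "Dragon Flames", "$type": "book", "author": "John Doe"},
  {"editor": "Smith White", "title": "Localisation & Globalisation", "$type": "magazine"}
]}"""

XML_DOC = """<bookstore>
  <book><title>Dragon Flames</title></book>
  <magazine><title>Localisation &amp; Globalisation</title></magazine>
</bookstore>"""

def item_type(item):
    # Hypothetical helper: look the tag up by key, never by position.
    try:
        return item["$type"]
    except KeyError:
        raise ValueError("item has no $type field") from None

json_types = [item_type(item) for item in json.loads(JSON_DOC)["items"]]
# In XML the element name is the type, and children come back in document order.
xml_types = [child.tag for child in ET.fromstring(XML_DOC)]

print(json_types)
print(xml_types)
```

Both lists come out as book followed by magazine, but on the JSON side the "$type" convention exists only in the application: nothing in the format itself marks that key as special.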
There is more, however. XML isn’t just for data serialization; it can also handle text markup. In fact, that’s its original purpose. Consider:
<review user="Willy Smith" rating="5">
  <p>I’m <em>so</em> glad I’ve read it! It’s great!</p>
  <p>Totally recommend!</p>
</review>
It is certainly possible to store anything in JSON, but...
{
  "user": "Willy Smith",
  "rating": 5,
  "text": [
    {
      "type": "paragraph",
      "content": [
        "I’m ",
        { "type": "emphasis", "content": "so" },
        " glad I’ve read it! It’s great!"
      ]
    },
    {
      "type": "paragraph",
      "content": "Totally recommend!"
    }
  ]
}
...is it really desirable?
And if you think the example is exaggerated: maybe a bit, but the real issue is mixing text and markup. JSON just isn’t designed to handle this.
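To make the difference concrete, here is a rough sketch of extracting the plain text of the first paragraph from both representations in Python. The XML side uses the standard xml.etree.ElementTree module; the flatten function on the JSON side is a hypothetical helper that only works for the ad hoc "type"/"content" layout shown above.

```python
import xml.etree.ElementTree as ET

p = ET.fromstring("<p>I’m <em>so</em> glad I’ve read it! It’s great!</p>")

# The XML data model knows about mixed content: itertext() walks the
# text and the nested elements in document order.
xml_plain = "".join(p.itertext())

# The pieces stay individually addressable as well.
em = p.find("em")
before = p.text   # text before the first child element
after = em.tail   # text following </em>, still inside <p>

# The JSON encoding from the article, as already-decoded Python data.
json_p = {
    "type": "paragraph",
    "content": [
        "I’m ",
        {"type": "emphasis", "content": "so"},
        " glad I’ve read it! It’s great!",
    ],
}

def flatten(node):
    # Hypothetical helper: "content" may be a string, a list or a nested
    # node, so every consumer of this layout needs a recursion like this.
    if isinstance(node, str):
        return node
    if isinstance(node, list):
        return "".join(flatten(child) for child in node)
    return flatten(node["content"])

json_plain = flatten(json_p)
print(xml_plain == json_plain)
```

The result is the same either way, but on the JSON side the structure and the code that walks it are conventions the application has to invent and document for itself.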
There is a reason, however, why XML is often considered “legacy”. While it is actually much more flexible than JSON, most real-world data doesn’t need that flexibility, and it gets in the way instead. Just look at the first example again: if all you need is a bunch of items with specific fields, JSON is perfectly good for it! Only if your data has to be a mixed bag of everything, like an AST or “just” marked-up text, does XML start to gain an advantage, and it may end up being the better choice.