In a previous guide, I wrote a short overview of how to document JSON. If you haven't read it yet, I highly recommend that you do so, because this guide builds on it. Here, we will revisit the delivery robot API used in my previous technical writing guides.
Note: This guide makes references to HTML. You do not need to know HTML to follow along, but it helps to understand how HTML elements work.
XML stands for eXtensible Markup Language. It is very much like HTML, but it is meant to describe data. Unlike HTML, XML tags are not predefined, so you can define your own tags as you see fit. Because of this customizable nature, XML can be used for any kind of structured data. Also, unlike JSON, XML on its own has only one data type: string.
With that, let's jump into the structure of XML and learn a little about XML in general.
Structured Data
XML allows for the transfer of data in a structured format. Think of this structure as a tree: from the root, you have several branches. Each branch in turn has its own branches, and so on, until you finally reach the end with a leaf.
XML can contain a dictionary of lists, a list of dictionaries, and a dictionary of dictionaries. A dictionary, in terms of XML, is similar in nature to JSON's key/value pairs. But, unlike JSON, an XML file usually begins with a line that declares a few things about the document. This line is called the XML declaration, and it is sometimes referred to as the header. If it is present, it must be the very first thing in the file.
The declaration usually includes the XML version, the character encoding, and so forth. From an API writer's standpoint, you can basically ignore it.
Example of a declaration:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
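To show how little the declaration affects the data itself, here is a minimal sketch using Python's standard-library xml.etree.ElementTree module. Python is my choice for illustration only, and the one-tag robot document is a made-up sample. The sketch parses the same tag with and without a declaration.

```python
import xml.etree.ElementTree as ET

# The same (hypothetical) element, once with an XML declaration and once without.
with_declaration = b'<?xml version="1.0" encoding="UTF-8" standalone="no" ?><robot/>'
without_declaration = b"<robot/>"

root_a = ET.fromstring(with_declaration)
root_b = ET.fromstring(without_declaration)

# The declaration is not part of the element tree, so both roots look the same.
print(root_a.tag, root_b.tag)
```

In both cases the parser hands back the robot tag, which is why you can usually skip over the declaration when documenting the data.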
Tags and Content
Tags work like HTML tags and are enclosed in angle brackets (< >). The start and end tags must match exactly, including letter case. Tag names can contain letters, numbers, underscores, hyphens, and periods, and they must start with a letter or an underscore.
Example:
<type>roller</type>
A tag with no content can be written as a single self-closing tag, like this one:
<space_travel/>
Content is what is found between the opening and closing tags. If the content has no tags, it's treated like a string. In the type (tag) example above, roller is the content.
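If you want to see how a parser reads tags and content, the following sketch may help. It uses Python's standard-library xml.etree.ElementTree module (my choice for illustration) on the type and space_travel tags from above, and it also shows what happens when the start and end tags do not match.

```python
import xml.etree.ElementTree as ET

# A tag with content: the text between the start and end tags.
robot_type = ET.fromstring("<type>roller</type>")
print(robot_type.tag)   # type
print(robot_type.text)  # roller

# A self-closing tag has no content, so its text is None.
space_travel = ET.fromstring("<space_travel/>")
print(space_travel.text)  # None

# Start and end tags must match exactly, including letter case.
try:
    ET.fromstring("<type>roller</Type>")
    matched = True
except ET.ParseError:
    matched = False
print(matched)  # False
```

The last case fails because Type and type are different names as far as XML is concerned.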
Nesting
Just like JSON objects, nested data can sit inside a set of tags.
Example of nested tags:
<robot>
  <type>roller</type>
  <weight>20</weight>
  <sensors>optical</sensors>
</robot>
In the example above, we have a robot tag with three tags nested within it (type, weight, and sensors).
Here is an example of a full XML file that describes a robot object with type, weight, and sensors data:
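As a rough sketch of how nesting looks from the reading side, the Python snippet below (standard-library xml.etree.ElementTree, chosen by me for illustration) walks the tags nested in robot. It also shows the "one type" point from earlier: the weight comes back as a string, and converting it to a number is up to whoever consumes the data.

```python
import xml.etree.ElementTree as ET

xml_text = """
<robot>
  <type>roller</type>
  <weight>20</weight>
  <sensors>optical</sensors>
</robot>
"""

robot = ET.fromstring(xml_text)

# Iterating over a tag yields the tags nested directly inside it, in order.
child_names = [child.tag for child in robot]
print(child_names)

# Content is always text; the conversion to a number is done by the reader.
weight_text = robot.find("weight").text
weight_value = int(weight_text)
print(type(weight_text).__name__, weight_value)
```

When you document a tag like weight, it is worth stating the intended type (for example, integer) because the XML itself will not say so.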
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<robot>
  <type>roller</type>
  <weight>2</weight>
  <sensors>
    <sensor>optical</sensor>
    <sensor>distance</sensor>
    <sensor>vibration</sensor>
    <sensor>fuel</sensor>
  </sensors>
</robot>
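To illustrate how the repeated sensor tags behave like a list, here is a small sketch that parses a well-formed copy of the robot file with Python's standard-library xml.etree.ElementTree module. The variable names are mine and purely illustrative.

```python
import xml.etree.ElementTree as ET

xml_bytes = b"""<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<robot>
  <type>roller</type>
  <weight>2</weight>
  <sensors>
    <sensor>optical</sensor>
    <sensor>distance</sensor>
    <sensor>vibration</sensor>
    <sensor>fuel</sensor>
  </sensors>
</robot>
"""

robot = ET.fromstring(xml_bytes)

# Every sensor tag inside sensors becomes one item in the list.
sensor_names = [sensor.text for sensor in robot.findall("./sensors/sensor")]
print(sensor_names)
```

Note that every closing tag carries a forward slash (for example, </sensors>). Without it, the parser treats the tag as a new opening tag and rejects the file.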
Attributes
In addition to content, tags can have attributes that contain simple data (such as a string or number). Attributes act like key/value pairs when you access the data within a tag.
The key is written inside the start tag, and you do not put quotes around it. The value must be enclosed in quotation marks (single or double). Key names must start with a letter or an underscore and can use any combination of letters, numbers, underscores, hyphens, and periods. Spaces and other punctuation characters are not allowed in key names.
A common design in XML files is to use attributes for properties about the data (that is, metadata).
Example of attributes:
<robot>
  <weight unit="kilograms">20</weight>
  <velocity decimals="2" unit="km/h">32.22</velocity>
  <battery_life unit="hours" active="true">10</battery_life>
</robot>
In this particular case, the robot has a weight of 20kg, is moving at 32.22km/h, and has 10 hours of active battery life left.
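Here is a short sketch of how those attributes might be read, again using Python's standard-library xml.etree.ElementTree module as an illustrative choice. Attribute values, like content, arrive as strings.

```python
import xml.etree.ElementTree as ET

xml_text = """
<robot>
  <weight unit="kilograms">20</weight>
  <velocity decimals="2" unit="km/h">32.22</velocity>
  <battery_life unit="hours" active="true">10</battery_life>
</robot>
"""

robot = ET.fromstring(xml_text)

# attrib exposes all attributes of a tag as a dictionary of strings.
velocity = robot.find("velocity")
velocity_attributes = dict(velocity.attrib)
print(velocity_attributes)

# get() reads a single attribute by its key.
weight = robot.find("weight")
weight_summary = f"{weight.text} {weight.get('unit')}"
print(weight_summary)

# "true" is still just a string; the reader decides how to interpret it.
battery_active = robot.find("battery_life").get("active") == "true"
print(battery_active)
```

This is one reason to document each attribute's allowed values: the parser will not tell you whether active expects true/false, yes/no, or 1/0.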
Here is an example of attributes and an array:
<robot type="roller" weight="20" weight_unit="kilograms">
  <sensors>
    <sensor name="optical"/>
    <sensor name="distance"/>
    <sensor name="vibration"/>
  </sensors>
</robot>
From this example, we can tell that the robot is a roller type, weighs 20 kg, and has three sensors: optical, distance, and vibration.
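To make the dictionary-and-list comparison with JSON concrete, this sketch turns a well-formed copy of the robot document into a plain Python dictionary. It uses the standard-library xml.etree.ElementTree module, and the summary layout is my own assumption, not part of the robot API.

```python
import xml.etree.ElementTree as ET

xml_text = """
<robot type="roller" weight="20" weight_unit="kilograms">
  <sensors>
    <sensor name="optical"/>
    <sensor name="distance"/>
    <sensor name="vibration"/>
  </sensors>
</robot>
"""

robot = ET.fromstring(xml_text)

# Attributes on robot become keys; the repeated sensor tags become a list.
summary = {
    "type": robot.get("type"),
    "weight": int(robot.get("weight")),
    "weight_unit": robot.get("weight_unit"),
    "sensors": [sensor.get("name") for sensor in robot.iter("sensor")],
}
print(summary)
```

The resulting dictionary is close to what the equivalent JSON object would look like, which can be a handy way to explain an XML structure to readers who already know JSON.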
Comments
You can use comments in XML files just like HTML files by using the opening and closing comment tags of <!-- and -->. Everything in between these comment tags will be ignored.
Consider the following:
<robot type="roller" weight="20" weight_unit="kilograms">
  <sensors>
    <!-- This element needs at least one sensor to be listed in order to work properly. If this is empty, please review the source file. -->
    <sensor name="optical"/>
  </sensors>
</robot>
As noted, the comment nested in the sensors tag provides some information about the data, but most parsers and auto-documentation systems will ignore it rather than read or display it.
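As a quick check of that behavior, the sketch below parses a well-formed copy of the example with Python's standard-library xml.etree.ElementTree module, using its default settings. Other parsers may behave differently, so treat this as one data point rather than a rule.

```python
import xml.etree.ElementTree as ET

xml_text = """
<robot type="roller" weight="20" weight_unit="kilograms">
  <sensors>
    <!-- This element needs at least one sensor to be listed. -->
    <sensor name="optical"/>
  </sensors>
</robot>
"""

robot = ET.fromstring(xml_text)
sensors = robot.find("sensors")

# With the default parser settings, the comment is dropped from the tree.
children = [child.tag for child in sensors]
print(children)
```

Only the sensor tag survives parsing, so anything your readers must know belongs in the documentation itself, not only in an XML comment.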
Namespaces
Namespaces are commonly used in structured hierarchies, like XML documents, to allow the same name to be reused in different contexts. A namespace usually appears as a prefix on a tag name, separated from the name by a colon (:).
For example, we have a tag called wheels and another called front:wheels. The difference is that wheels handles data about the wheels in a general sense, while front:wheels handles the specifics of the wheels located on the front side. In a complete document, the front prefix must also be declared with an xmlns:front attribute that maps it to a URI; that declaration is omitted here for brevity.
<wheels>
  <tire_count count="4"/>
</wheels>
<front:wheels>
  <tire_pressure_left unit="pounds" value="20"/>
  <tire_pressure_right unit="pounds" value="20"/>
</front:wheels>
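Here is a sketch of how a parser sees the two wheels tags. To make the fragment parseable, I wrapped it in a robot root tag and declared the front prefix with a made-up URI (http://example.com/robot/front); both are assumptions for illustration and not part of the robot API. The code uses Python's standard-library xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

FRONT_NS = "http://example.com/robot/front"  # hypothetical namespace URI

xml_text = f"""
<robot xmlns:front="{FRONT_NS}">
  <wheels>
    <tire_count count="4"/>
  </wheels>
  <front:wheels>
    <tire_pressure_left unit="pounds" value="20"/>
    <tire_pressure_right unit="pounds" value="20"/>
  </front:wheels>
</robot>
"""

robot = ET.fromstring(xml_text)

# The parser replaces the prefix with the full URI in curly braces.
tags = [child.tag for child in robot]
print(tags)

# Lookups can use the prefix again through a prefix-to-URI mapping.
front_wheels = robot.find("front:wheels", {"front": FRONT_NS})
left_pressure = front_wheels.find("tire_pressure_left").get("value")
print(left_pressure)

# A prefix that was never declared is rejected by the parser.
try:
    ET.fromstring("<front:wheels/>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)
```

Because the two wheels tags end up with different full names, a reader can tell them apart even though the local name is the same.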
Indentation
Indentation is typically used to indicate the nesting of information. It is handled via whitespace, which means spaces, new lines, and so on. The examples in this guide follow a rigid spacing policy of two spaces per level of nesting.
While some think that spacing doesn't matter in an XML file unless it is inside quotation marks, I tend to disagree. I believe that one should have clear and consistent usage of whitespace throughout the entire documentation set. Consistent spacing makes the XML document easier to read and just looks more professional.
Properly formatted XML includes:
- An indent for every new level of nesting
- 2 to 4 spaces per indent level (depending on your team's/company's coding policy)
- Tags that do not contain other tags can have start and end tags on the same line.
- Tags that are nested should be on their own lines.
There is an ongoing argument about spaces versus tabs. Depending on the options of your favorite XML editor, the Tab key can be configured to insert spaces instead, or to use any other spacing configuration that the development team has blessed.
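If your XML arrives as one long line, you do not have to indent it by hand. The sketch below applies a two-space policy with the indent function from Python's standard-library xml.etree.ElementTree module. Note that this function requires Python 3.9 or later, and the compact robot string is a made-up sample.

```python
import xml.etree.ElementTree as ET

# A compact, single-line document (hypothetical sample data).
compact = "<robot><type>roller</type><sensors><sensor>optical</sensor></sensors></robot>"

robot = ET.fromstring(compact)

# Rewrite the whitespace in place: two spaces per level of nesting.
ET.indent(robot, space="  ")

pretty = ET.tostring(robot, encoding="unicode")
print(pretty)
```

Changing the space argument (for example, to four spaces or a tab character) is one way to apply whatever policy your team has agreed on across every sample in a documentation set.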