Beginner's Guide to YAML

Written by Alex Lee

Published on 22 February 2020

Estimated reading time of 9 minutes

YAML, a recursive acronym of "YAML Ain't Markup Language", is billed as a human-friendly data serialization language that works well for common use cases such as configuration files, log files, and cross-language data sharing (yaml.org | Introduction).

This guide aims to introduce you to some of the main features and concepts of the language to help you get started using it. It is built around an example in which we create an employee directory in YAML. This will provide us with a central place to store all important information about our employees, such as their name, job title, and skills. We will update the directory as we progress through the guide, applying what we learn at each step.

YAML is a superset of JSON, a data serialization language many of you may be more familiar with. As such, I will provide the JSON equivalent next to some of the YAML examples to hopefully aid in your understanding and demonstrate how YAML can be a simpler syntax to work with.

Data Structures

YAML is built around three primitive representations of data (yaml.org | Introduction):

mappings (dictionaries/hashes)
scalars (strings/numbers)
sequences (lists/arrays)

Each of these will be covered in this guide.

Mappings

Mappings use a colon : followed by a space to denote a key-value pair e.g. key: value. The key provides a way to reference a piece of information and the value is the information being referenced. It might help to replace the : with an = in your mind e.g. key: value could be read as key = value.

The following code block contains our first example of the employee directory we are going to be working with in this guide. In it, we have started recording some data about a single employee, Joe Blogs, who works as a software engineer.

name: Joe Blogs
jobTitle: Software Engineer

Mappings are equivalent to objects in JSON. The following JSON is equivalent to the above YAML.

{
  "name": "Joe Blogs",
  "jobTitle": "Software Engineer"
}
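To make the "colon followed by a space" rule a little more concrete, here is a small sketch. The website key and its URL are hypothetical and are not part of the employee directory. It shows that a colon which isn't followed by a space is treated as ordinary text inside a plain value.

```yaml
# "website" is a hypothetical key used only for this illustration.
name: Joe Blogs               # colon + space: key is name, value is Joe Blogs
website: http://example.com   # the colon after http is followed by "/", so it stays in the value
```

Text after a # that is preceded by a space is a comment and is ignored when the document is read.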

Whitespace

Whitespace is an important entity in YAML as it is one of the mechanisms used to provide structure to the document i.e. by indenting a mapping, you can indicate that the mapping belongs to the preceding key. For example, we might want to store the employee's name split into their first and last names as in the following example.

name:
  first: Joe
  last: Blogs
jobTitle: Software Engineer

In this example, the document contains a mapping with the keys name and jobTitle. This is the base structure in the document. The value of the name key in the base mapping is itself a mapping containing the keys first and last.

This is equivalent to the following JSON.

{
  "name": { "first": "Joe", "last": "Blogs" },
  "jobTitle": "Software Engineer"
}

The YAML specification states that only spaces may be used for indentation, not tab characters (yaml.org | Indentation Spaces). This shouldn't pose a problem when writing YAML, as most modern text editors can be configured to enter spaces rather than a tab character when the tab key is pressed. A common choice is two spaces per level of indentation, as used in these examples.
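The same rule applies at any depth: each extra level of indentation nests a mapping inside the key above it, and returning to a shallower level closes the nested mapping. The following is a minimal sketch of that idea. The contact, phone and work keys (and the number) are made up for illustration and don't appear elsewhere in this guide.

```yaml
name:
  first: Joe
  last: Blogs
# "contact", "phone" and "work" are hypothetical keys, and the number is invented.
contact:
  phone:
    work: "01234 567890"      # four spaces deep: belongs to phone, which belongs to contact
jobTitle: Software Engineer   # no indentation: back in the base mapping
```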

Scalars

Scalars contain the data that isn't structurally important to the document. We have already seen some examples of scalars in the start of our employee directory (repeated below for convenience).

name:
  first: Joe
  last: Blogs
jobTitle: Software Engineer

In this example, the scalars are the values in the shown mappings i.e. Joe, Blogs and Software Engineer.

Flow Scalars

When the value is defined on the same line as the key, as in the above example, these scalars are given the name flow scalars. There are three different styles of flow scalar: plain, which aren't wrapped in anything; single quoted, which are wrapped with '; and double quoted, which are wrapped with " and enable the inclusion of special characters such as the new line character \n.

plain: Plain flow scalars look like this
single: 'Single quoted flow scalars look like this'
double: "Double quoted flow scalars look like this"
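The difference between the two quoted styles is easiest to see side by side. The sketch below uses hypothetical keys that only label each case. It shows how escape sequences are handled in each style, how to include an apostrophe in a single quoted scalar, and one situation where a plain scalar wouldn't work.

```yaml
# All keys here are hypothetical; they only label each case.
double: "First line\nSecond line"   # \n is read as a real new line
single: 'First line\nSecond line'   # the backslash and the n are kept as two ordinary characters
apostrophe: 'It''s Joe''s desk'     # '' inside single quotes stands for one '
needsQuotes: "Role: Engineer"       # quoted because the value contains a colon followed by a space
```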

Block Scalars

There are times when it is useful to contain more than a single line of information in a scalar. This is where block scalars come in. There are two types of block scalar: literal which preserves all white space contained in the scalar (apart from leading indentation) and folded which replaces new line characters with spaces (new lines can be included by leaving a blank line).

To use a block scalar, you have to specify a special character on the same line as the key the block scalar will belong to. The content of the block scalar follows on the following lines. A pipe character | is used to introduce literal blocks and a greater than character > is used to introduce folded blocks.

literal: |
  This is a literal block scalar.
  New lines are preserved so this will appear on a new line.
folded: >
  This is a folded block scalar.
  New lines aren't preserved so this line will appear in the same
  paragraph as the first, separated by only a space.

  Leaving a blank line will mean this line will be the start of a
  new paragraph.

The following is how this would be represented in JSON (note the new line characters \n in the strings, including the single trailing new line that a block scalar keeps by default).

{
  "literal": "This is a literal block scalar.\nNew lines are preserved so this will appear on a new line.\n",
  "folded": "This is a folded block scalar. New lines aren't preserved so this line will appear in the same paragraph as the first, separated by only a space.\nLeaving a blank line will mean this line will be the start of a new paragraph.\n"
}
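By default, a block scalar keeps a single new line at the very end of its content. If that trailing new line isn't wanted, a hyphen can be added after the | or > to strip it. The following is a small sketch of the difference. The clipped and stripped keys are hypothetical names chosen for this illustration.

```yaml
# Values as they would be loaded, shown in JSON string notation:
#   clipped:  "Keeps one trailing new line.\n"
#   stripped: "Ends without a trailing new line."
clipped: >
  Keeps one trailing new line.
stripped: >-
  Ends without a trailing new line.
```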

Let's use the folded style block scalar to provide our employees with a short bio. Something like the following.

name:
  first: Joe
  last: Blogs
jobTitle: Software Engineer
bio: >
  Joe loves writing programs to solve any problem he comes across,
  whether it be simple day to day problems or full blown
  enterprise solutions.

  He has been a software engineer for roughly 15 years and has
  loved every second of it.

Scalar Data Types

The scalars we have seen so far are all text based and as such are interpreted as strings. However, when a plain (unquoted) scalar looks like a value of another data type, for example an integer, YAML interprets it as that type instead.

Below are some data types that YAML recognizes:

integers e.g. 1, 100, 5000
floating points e.g. 1.2, 40.6, 1e+10
boolean e.g. true, false
null e.g. null

In the following example we have added an age to our employee entry. YAML can see that age contains a number and as such will interpret it as the number 35 rather than as a string containing the characters 3 and 5.

name:
  first: Joe
  last: Blogs
jobTitle: Software Engineer
bio: >
  Joe loves writing programs to solve any problem he comes across,
  whether it be simple day to day problems or full blown
  enterprise solutions.

  He has been a software engineer for roughly 15 years and has
  loved every second of it.
age: 35
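Sometimes a value that looks like a number or a boolean should stay a string. Quoting it is the usual way to do that. The sketch below contrasts quoted and unquoted values. Apart from age, the keys are hypothetical and aren't used in the rest of the directory.

```yaml
# Apart from age, these keys are hypothetical and only contrast quoted with unquoted values.
age: 35                # integer
ageAsText: "35"        # string, because it is quoted
remote: true           # boolean
remoteAsText: "true"   # string
middleName: null       # null (no value)
postcode: "01234"      # string; quoting keeps the leading zero
```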

Sequences

YAML uses a hyphen - followed by a space to denote a new node in a sequence e.g. - new node. A sequence is like an array or list and a node is like an element in the array/list. Consider the following example where we have added a sequence of programming languages that our employee is an expert in.

name:
  first: Joe
  last: Blogs
jobTitle: Software Engineer
bio: >
  Joe loves writing programs to solve any problem he comes across,
  whether it be simple day to day problems or full blown
  enterprise solutions.

  He has been a software engineer for roughly 15 years and has
  loved every second of it.
age: 35
programmingLanguages:
  - Python
  - Java
  - C++

In this example, programmingLanguages is a sequence of three nodes, each containing a plain scalar (see Flow Scalars for a reminder of what a plain scalar is). Considering only the programmingLanguages key, the following is the equivalent JSON.

{ "programmingLanguages": ["Python", "Java", "C++"] }
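Because YAML is a superset of JSON, sequences and mappings can also be written inline using JSON-like brackets and braces, known as flow style. The following sketch shows the name mapping and the programmingLanguages sequence from above written that way. It loads to the same data as the indented form.

```yaml
# Same data as the indented (block style) versions above.
name: { first: Joe, last: Blogs }
programmingLanguages: [Python, Java, C++]
```

Flow style can be handy for short collections, while the indented style tends to be easier to read as the data grows.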

Sequences aren't limited to scalar values, though: the nodes can be mappings or even other sequences. In the following example, programmingLanguages has been updated to contain a bit more information about each of the languages. This has been done by changing the node value from a plain scalar to a mapping. When looking at the example, it is important to remember that the - character denotes a new item in a sequence and hence a new mapping, meaning each mapping in the sequence contains the keys name and yearsExperience. Don't worry if it isn't immediately obvious. It can be a little hard to read when you aren't familiar with YAML, but you will soon get the hang of it!

name: Joe Blogs
jobTitle: Software Engineer
bio: >
  Joe loves writing programs to solve any problem he comes across,
  whether it be simple day to day problems or full blown
  enterprise solutions.

  He has been a software engineer for roughly 15 years and has
  loved every second of it.
age: 35
programmingLanguages:
  - name: Python
    yearsExperience: 7
  - name: Java
    yearsExperience: 4
  - name: C++
    yearsExperience: 15

Again, only considering the programmingLanguages key, the following is the equivalent JSON.

{
  "programmingLanguages": [
    { "name": "Python", "yearsExperience": 7 },
    { "name": "Java", "yearsExperience": 4 },
    { "name": "C++", "yearsExperience": 15 }
  ]
}
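For completeness, here is a minimal sketch of the other case mentioned above, a sequence whose nodes are themselves sequences. The pairingGroups key is hypothetical and isn't part of the employee directory. Each outer - starts a new inner sequence, and the inner - characters list its items.

```yaml
# "pairingGroups" is a hypothetical key; each item of the outer sequence is itself a sequence.
pairingGroups:
  - - Python
    - Java
  - - C++
# JSON equivalent: { "pairingGroups": [["Python", "Java"], ["C++"]] }
```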

Our employee directory isn't particularly useful with only a single employee in it. It would be better to have a sequence of the employees that work for the company. In the following example, we turn the employee directory into a sequence with one node containing Joe Blogs's information and another containing the information of a second employee.

- name: Joe Blogs
  jobTitle: Software Engineer
  bio: >
    Joe loves writing programs to solve any problem he comes
    across, whether it be simple day to day problems or full blown
    enterprise solutions.

    He has been a software engineer for roughly 15 years and has
    loved every second of it.
  age: 35
  programmingLanguages:
    - name: Python
      yearsExperience: 7
    - name: Java
      yearsExperience: 4
    - name: C++
      yearsExperience: 15
- name: Jane Smith
  jobTitle: Junior Web Developer
  bio: >
    Jane is new to her job and is really excited to get stuck into
    her first project so she can develop some new skills.
  age: 21
  programmingLanguages:
    - name: JavaScript
      yearsExperience: 0

The following block is the equivalent in JSON (each bio ends with the single trailing new line that a block scalar keeps by default).

[
  {
    "name": "Joe Blogs",
    "jobTitle": "Software Engineer",
    "bio": "Joe loves writing programs to solve any problem he comes across, whether it be simple day to day problems or full blown enterprise solutions.\nHe has been a software engineer for roughly 15 years and has loved every second of it.\n",
    "age": 35,
    "programmingLanguages": [
      { "name": "Python", "yearsExperience": 7 },
      { "name": "Java", "yearsExperience": 4 },
      { "name": "C++", "yearsExperience": 15 }
    ]
  },
  {
    "name": "Jane Smith",
    "jobTitle": "Junior Web Developer",
    "bio": "Jane is new to her job and is really excited to get stuck into her first project so she can develop some new skills.\n",
    "age": 21,
    "programmingLanguages": [
      { "name": "JavaScript", "yearsExperience": 0 }
    ]
  }
]

Concluding Thoughts

That concludes our guide, which has hopefully given you enough information to get started working in YAML. Most data can be represented by the structures and scalars that have been covered in this guide, but there is more to learn about YAML. I am aiming to write a guide covering some slightly more advanced topics in the near future, but until then Learn YAML in Y Minutes provides a good reference once you are familiar with the YAML basics.