Beginner's Guide to YAML
Written by Alex Lee
Published on 22 February 2020
Estimated reading time of 9 minutes
YAML, a recursive acronym for "YAML Ain't Markup Language", is billed as a human-friendly data serialization language that works well for common use cases such as configuration files, log files, and cross-language data sharing (yaml.org | Introduction).
This guide introduces some of the main features and concepts of the language to help you get started using it. It is built around an example in which we create an employee directory in YAML: a central place to store important information about our employees, such as their name, job title and skills. We will update the directory as we progress through the guide, applying what we learn at each step.
YAML (as of version 1.2) is a superset of JSON, a data serialization language many of you may already be familiar with. I will therefore show the JSON equivalent next to some of the YAML examples, both to aid your understanding and to demonstrate how YAML's syntax can be simpler to work with.
Data Structures
YAML is built around three primitive representations of data (yaml.org | Introduction):
- mappings (hashes/dictionaries)
- scalars (strings/numbers)
- sequences (arrays/lists)
Each of these is covered in its own section of this guide.
Mappings
Mappings use a colon : followed by a space to denote a key-value pair, e.g. key: value. The key provides a way to reference a piece of information, and the value is the information being referenced. It might help to replace the : with an = in your mind, e.g. key: value could be read as key = value.
The following code block contains our first example of the employee directory we are going to be working with in this guide. In it, we have started recording some data about a single employee, Joe Blogs, who works as a software engineer.
name: Joe Blogs
jobTitle: Software Engineer
Mappings are equivalent to objects in JSON. The following JSON is equivalent to the above YAML.
{
  "name": "Joe Blogs",
  "jobTitle": "Software Engineer"
}
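To make the key-value idea concrete, here is a minimal Python sketch. The parse_flat_mapping helper is hypothetical and handles only a single level of unquoted key: value lines, so it is nothing like a full YAML parser (in practice you would use a real library, such as PyYAML for Python). It shows that the YAML above and its JSON equivalent describe the same mapping, which Python represents as a dict.

```python
import json


def parse_flat_mapping(text):
    """Hypothetical helper: parse single-level 'key: value' lines into a dict.

    Only handles unquoted string values with no nesting or comments.
    """
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first colon followed by a space.
        key, sep, value = line.partition(": ")
        if not sep:
            raise ValueError(f"not a key-value pair: {line!r}")
        mapping[key] = value
    return mapping


yaml_text = """\
name: Joe Blogs
jobTitle: Software Engineer
"""
json_text = '{"name": "Joe Blogs", "jobTitle": "Software Engineer"}'

# Both documents describe the same mapping (a dict in Python).
print(parse_flat_mapping(yaml_text) == json.loads(json_text))  # True
```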
Whitespace
Whitespace is important in YAML because it is one of the mechanisms used to give a document its structure: by indenting a mapping, you indicate that the mapping belongs to the preceding key. For example, we might want to store the employee's name split into first and last names, as in the following example.
name:
  first: Joe
  last: Blogs
jobTitle: Software Engineer
In this example, the document contains a mapping with the keys name and jobTitle. This is the base structure of the document. The value of the name key in the base mapping is itself a mapping containing the keys first and last.
This is equivalent to the following JSON.
{
  "name": { "first": "Joe", "last": "Blogs" },
  "jobTitle": "Software Engineer"
}
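The following Python sketch, built around a hypothetical parse_nested_mapping helper rather than a real YAML library, shows how indentation alone can decide which mapping a key belongs to. It keeps a stack of the mappings that are currently open and pops back out whenever a line is indented no deeper than the current parent.

```python
import json


def parse_nested_mapping(text):
    """Hypothetical helper: build nested dicts from indented 'key: value' lines.

    A key with nothing after the colon opens a child mapping (a real parser
    would treat such a key as null if nothing indented follows it). Values
    are kept as strings and must not contain a colon.
    """
    root = {}
    # Stack of (indentation, mapping) pairs; the last entry is the open parent.
    stack = [(-1, root)]
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        indent = len(line) - len(line.lstrip(" "))
        key, _, value = line.strip().partition(":")
        value = value.strip()
        # Close every mapping whose indentation is not shallower than this line.
        while indent <= stack[-1][0]:
            stack.pop()
        parent = stack[-1][1]
        if value:
            parent[key] = value
        else:
            child = {}
            parent[key] = child
            stack.append((indent, child))
    return root


yaml_text = """\
name:
  first: Joe
  last: Blogs
jobTitle: Software Engineer
"""
json_text = '{"name": {"first": "Joe", "last": "Blogs"}, "jobTitle": "Software Engineer"}'

print(parse_nested_mapping(yaml_text) == json.loads(json_text))  # True
```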
The YAML specification states that only spaces may be used for indentation, not tab characters (yaml.org | Indentation Spaces). This shouldn't pose a problem when writing YAML, as most modern text editors can be configured to insert spaces rather than a tab character when the tab key is pressed. Two spaces is a common choice of indentation and is used in these examples.
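As an illustration of the no-tabs rule, the hypothetical find_tab_indentation function below is a simple, conservative Python check that reports which lines contain a tab anywhere in their leading whitespace. A real YAML parser will report its own error for such input; this is only a sketch.

```python
def find_tab_indentation(text):
    """Hypothetical check: return the 1-based numbers of lines whose
    leading whitespace contains a tab character."""
    bad_lines = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Leading whitespace: everything before the first non-space, non-tab character.
        leading = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in leading:
            bad_lines.append(number)
    return bad_lines


spaces = "name:\n  first: Joe\n  last: Blogs\n"
tabs = "name:\n\tfirst: Joe\n  last: Blogs\n"
print(find_tab_indentation(spaces))  # []
print(find_tab_indentation(tabs))  # [2]
```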
Scalars
Scalars hold the actual data in a document, as opposed to the mappings and sequences that give it structure. We have already seen some examples of scalars at the start of our employee directory (repeated below for convenience).
name:
  first: Joe
  last: Blogs
jobTitle: Software Engineer
In this example, the scalars are the values in the mappings, i.e. Joe, Blogs and Software Engineer.
Flow Scalars
When the value is defined on the same line as the key, as in the above example, the scalar is known as a flow scalar. There are three styles of flow scalar: plain, which isn't wrapped in anything; single quoted, which is wrapped with '; and double quoted, which is wrapped with " and allows special characters, such as a new line, to be included using escape sequences like \n.
plain: Plain flow scalars look like this
single: 'Single quoted flow scalars look like this'
double: "Double quoted flow scalars look like this"
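The difference between the three styles matters most for special characters. The Python sketch below uses a hypothetical unquote_flow_scalar helper that implements only a small subset of the quoting rules for one-line scalars: single-quoted scalars process no escape sequences (a doubled '' stands for one quote), while double-quoted scalars turn a backslash sequence such as \n into the character it represents. Only \n, \t, \" and \\ are supported here; the specification defines more.

```python
# Escape sequences supported by this sketch (YAML defines many more).
DOUBLE_QUOTED_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def unquote_flow_scalar(text):
    """Hypothetical helper: return the string value of a one-line flow scalar."""
    if len(text) >= 2 and text[0] == text[-1] == "'":
        # Single quoted: no escape sequences; '' stands for a literal quote.
        return text[1:-1].replace("''", "'")
    if len(text) >= 2 and text[0] == text[-1] == '"':
        # Double quoted: a backslash starts an escape sequence.
        body, result, i = text[1:-1], [], 0
        while i < len(body):
            if body[i] == "\\":
                escape = body[i + 1 : i + 2]  # empty if the backslash is last
                if escape not in DOUBLE_QUOTED_ESCAPES:
                    raise ValueError(f"unsupported escape sequence: \\{escape}")
                result.append(DOUBLE_QUOTED_ESCAPES[escape])
                i += 2
            else:
                result.append(body[i])
                i += 1
        return "".join(result)
    return text  # plain: the text is used as written


single = unquote_flow_scalar(r"'Joe''s \n stays as two characters'")
double = unquote_flow_scalar(r'"Line one\nLine two"')
print(single)  # Joe's \n stays as two characters
print(double)  # "Line one" and "Line two" on separate lines
```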
Block Scalars
There are times when it is useful to include more than a single line of information in a scalar. This is where block scalars come in. There are two types of block scalar: literal, which preserves all whitespace contained in the scalar (apart from the leading indentation), and folded, which replaces new line characters with spaces (a new line can be included by leaving a blank line).
To use a block scalar, you place a special indicator character on the same line as the key the block scalar belongs to; the content of the block scalar follows on the subsequent, indented lines. A pipe character | introduces a literal block and a greater-than character > introduces a folded block.
literal: |
  This is a literal block scalar.
  New lines are preserved so this will appear on a new line.
folded: >
  This is a folded block scalar.
  New lines aren't preserved so this line will appear in the same
  paragraph as the first, separated by only a space.

  Leaving a blank line will mean this line will be the start of a
  new paragraph.
The following is how this would be represented in JSON (note the new line characters \n in the strings, including the single trailing new line that YAML keeps at the end of a block scalar by default).
{
  "literal": "This is a literal block scalar.\nNew lines are preserved so this will appear on a new line.\n",
  "folded": "This is a folded block scalar. New lines aren't preserved so this line will appear in the same paragraph as the first, separated by only a space.\nLeaving a blank line will mean this line will be the start of a new paragraph.\n"
}
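The two folding rules can be sketched in Python as follows. The literal_block and folded_block helpers are hypothetical and assume the lines have already had their indentation removed, contain no extra-indented lines, and do not start with blank lines. Both end with a single new line, matching YAML's default "clip" behavior of keeping the final line break and dropping any trailing blank lines.

```python
import json


def literal_block(lines):
    """Hypothetical helper for a '|' block: keep every line break.

    Default ("clip") chomping drops trailing blank lines but keeps one final newline.
    """
    return "\n".join(lines).rstrip("\n") + "\n"


def folded_block(lines):
    """Hypothetical helper for a '>' block: a single line break becomes a
    space, and each blank line between two lines becomes one newline."""
    text = ""
    blank_run = 0
    for line in lines:
        if line == "":
            blank_run += 1
            continue
        if text:
            separator = "\n" * blank_run if blank_run else " "
            text += separator + line
        else:
            text = line
        blank_run = 0
    return text + "\n"  # clip chomping: trailing blank lines dropped, one newline kept


literal_lines = [
    "This is a literal block scalar.",
    "New lines are preserved so this will appear on a new line.",
]
folded_lines = [
    "This is a folded block scalar.",
    "New lines aren't preserved so this line will appear in the same",
    "paragraph as the first, separated by only a space.",
    "",
    "Leaving a blank line will mean this line will be the start of a",
    "new paragraph.",
]
print(json.dumps({"literal": literal_block(literal_lines),
                  "folded": folded_block(folded_lines)}, indent=2))
```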
Let's use a folded block scalar to give our employees a short bio, something like the following.
name:
  first: Joe
  last: Blogs
jobTitle: Software Engineer
bio: >
  Joe loves writing programs to solve any problem he comes across,
  whether it be simple day to day problems or full blown
  enterprise solutions.

  He has been a software engineer for roughly 15 years and has
  loved every second of it.
Scalar Data Types
The scalars we have seen so far are all text, so they are interpreted as strings. However, when a scalar is plain (not wrapped in quotes), YAML checks whether it would be better represented as another data type, for example an integer. Quoted scalars are always treated as strings.
Below are some data types that YAML recognizes:
- integers, e.g. 1, 100, 5000
- floating points, e.g. 1.2, 40.6, 1e+10
- booleans, e.g. true, false
- null, e.g. null
In the following example we have added an age to our employee entry. YAML can see that age contains a number and so will interpret it as the number 35 rather than as a string containing the characters 3 and 5.
name:
  first: Joe
  last: Blogs
jobTitle: Software Engineer
bio: >
  Joe loves writing programs to solve any problem he comes across,
  whether it be simple day to day problems or full blown
  enterprise solutions.

  He has been a software engineer for roughly 15 years and has
  loved every second of it.
age: 35
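Here is a minimal Python sketch of how a parser might decide the type of a plain scalar. The resolve_plain_scalar helper is hypothetical; its regular expressions are a simplified subset of the YAML 1.2 core schema (octal, hexadecimal, infinity and NaN forms are left out). Parsers that follow the older YAML 1.1 rules, such as PyYAML, resolve some values differently, so check the documentation of the library you use.

```python
import re

# Simplified patterns adapted from the YAML 1.2 core schema (decimal forms only).
NULL_PATTERN = re.compile(r"null|Null|NULL|~|")
BOOL_PATTERN = re.compile(r"true|True|TRUE|false|False|FALSE")
INT_PATTERN = re.compile(r"[-+]?[0-9]+")
FLOAT_PATTERN = re.compile(r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?")


def resolve_plain_scalar(text):
    """Hypothetical helper: guess the type of a plain (unquoted) scalar."""
    if NULL_PATTERN.fullmatch(text):
        return None
    if BOOL_PATTERN.fullmatch(text):
        return text.lower() == "true"
    if INT_PATTERN.fullmatch(text):
        return int(text)
    if FLOAT_PATTERN.fullmatch(text):
        return float(text)
    return text  # anything else stays a string


for raw in ["35", "40.6", "1e+10", "true", "null", "Software Engineer"]:
    value = resolve_plain_scalar(raw)
    print(f"{raw} -> {value!r} ({type(value).__name__})")

# The age is a number, so arithmetic works on it.
print(resolve_plain_scalar("35") + 1)  # 36
```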
Sequences
YAML uses a hyphen - followed by a space to denote a new node in a sequence, e.g. - new node. A sequence is like an array or list, and a node is like an element in the array/list. Consider the following example, where we have added a sequence of programming languages that our employee is an expert in.
name:
  first: Joe
  last: Blogs
jobTitle: Software Engineer
bio: >
  Joe loves writing programs to solve any problem he comes across,
  whether it be simple day to day problems or full blown
  enterprise solutions.

  He has been a software engineer for roughly 15 years and has
  loved every second of it.
age: 35
programmingLanguages:
  - Python
  - Java
  - C++
In this example, programmingLanguages is a sequence of three nodes, each containing a plain scalar (see Flow Scalars for a reminder of what a plain scalar is). Considering only the programmingLanguages key, the following is the equivalent JSON.
{ "programmingLanguages": ["Python", "Java", "C++"] }
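As a rough Python sketch, the hypothetical parse_scalar_sequence helper below turns "- value" lines into a Python list; passing the result to the standard json module reproduces the JSON above. It only handles sequences of plain scalars.

```python
import json


def parse_scalar_sequence(lines):
    """Hypothetical helper: turn '- value' lines into a list of strings."""
    items = []
    for line in lines:
        entry = line.strip()
        if not entry.startswith("- "):
            raise ValueError(f"expected a sequence entry, got {line!r}")
        items.append(entry[2:])  # drop the '- ' indicator
    return items


languages = parse_scalar_sequence(["  - Python", "  - Java", "  - C++"])
print(json.dumps({"programmingLanguages": languages}))
# {"programmingLanguages": ["Python", "Java", "C++"]}
```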
Sequences aren't limited to scalar values, though; their nodes can also be mappings or even other sequences. In the following example, programmingLanguages has been updated to hold a bit more information about each language by changing each node's value from a plain scalar to a mapping. When reading the example, remember that the - character denotes a new node in the sequence, and hence a new mapping, so each mapping in the sequence contains the keys name and yearsExperience. Don't worry if this isn't immediately obvious; it can be a little hard to read when you aren't familiar with YAML, but you will soon get the hang of it!
name: Joe Blogs
jobTitle: Software Engineer
bio: >
  Joe loves writing programs to solve any problem he comes across,
  whether it be simple day to day problems or full blown
  enterprise solutions.

  He has been a software engineer for roughly 15 years and has
  loved every second of it.
age: 35
programmingLanguages:
  - name: Python
    yearsExperience: 7
  - name: Java
    yearsExperience: 4
  - name: C++
    yearsExperience: 15
Again, considering only the programmingLanguages key, the following is the equivalent JSON.
{
  "programmingLanguages": [
    { "name": "Python", "yearsExperience": 7 },
    { "name": "Java", "yearsExperience": 4 },
    { "name": "C++", "yearsExperience": 15 }
  ]
}
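The rule that each - starts a new mapping can be sketched in Python with a hypothetical parse_sequence_of_mappings helper. It assumes the first line starts with "- ", that values contain no ": ", and that digit-only values should become integers; a real YAML library handles far more cases.

```python
def parse_sequence_of_mappings(lines):
    """Hypothetical helper: each '- ' line starts a new mapping in the
    sequence, and the indented 'key: value' lines that follow add keys to it.
    Digit-only values become ints; everything else stays a string."""
    sequence = []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        if entry.startswith("- "):
            sequence.append({})  # the hyphen opens a new node: a new mapping
            entry = entry[2:]
        key, _, value = entry.partition(": ")
        sequence[-1][key] = int(value) if value.isdigit() else value
    return sequence


yaml_lines = [
    "- name: Python",
    "  yearsExperience: 7",
    "- name: Java",
    "  yearsExperience: 4",
    "- name: C++",
    "  yearsExperience: 15",
]
languages = parse_sequence_of_mappings(yaml_lines)
print(languages[2])  # {'name': 'C++', 'yearsExperience': 15}
```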
Our employee directory isn't particularly useful with only a single employee in it. It would be better to have a sequence of all the employees who work for the company. In the following example, we turn the employee directory into a sequence with one node containing Joe Blogs's information and another containing the information of a second employee.
- name: Joe Blogs
  jobTitle: Software Engineer
  bio: >
    Joe loves writing programs to solve any problem he comes
    across, whether it be simple day to day problems or full blown
    enterprise solutions.

    He has been a software engineer for roughly 15 years and has
    loved every second of it.
  age: 35
  programmingLanguages:
    - name: Python
      yearsExperience: 7
    - name: Java
      yearsExperience: 4
    - name: C++
      yearsExperience: 15
- name: Jane Smith
  jobTitle: Junior Web Developer
  bio: >
    Jane is new to her job and is really excited to get stuck into
    her first project so she can develop some new skills.
  age: 21
  programmingLanguages:
    - name: JavaScript
      yearsExperience: 0
The following block is the equivalent in JSON.
[
  {
    "name": "Joe Blogs",
    "jobTitle": "Software Engineer",
    "bio": "Joe loves writing programs to solve any problem he comes across, whether it be simple day to day problems or full blown enterprise solutions.\nHe has been a software engineer for roughly 15 years and has loved every second of it.\n",
    "age": 35,
    "programmingLanguages": [
      { "name": "Python", "yearsExperience": 7 },
      { "name": "Java", "yearsExperience": 4 },
      { "name": "C++", "yearsExperience": 15 }
    ]
  },
  {
    "name": "Jane Smith",
    "jobTitle": "Junior Web Developer",
    "bio": "Jane is new to her job and is really excited to get stuck into her first project so she can develop some new skills.\n",
    "age": 21,
    "programmingLanguages": [
      { "name": "JavaScript", "yearsExperience": 0 }
    ]
  }
]
Concluding Thoughts
That concludes our guide, which has hopefully given you enough information to get started working in YAML. Most data can be represented by the structures and scalars covered in this guide, but there is more to learn about YAML. I am aiming to write a guide covering some slightly more advanced topics in the near future; until then, Learn YAML in Y Minutes provides a good reference once you are familiar with the basics.