Docker swarm on AWS with Ansible
Introduction
In this tutorial we'll be using Ansible to deploy a swarm cluster of 3 nodes on AWS.
Ansible
From the official documentation: Ansible is an IT automation tool. It can configure systems, deploy software, and orchestrate more advanced IT tasks such as continuous deployments or zero downtime rolling updates. Ansible’s main goals are simplicity and ease-of-use.
Ansible scripts describe a desired state rather than a list of commands to run. If you wish to install curl on a server, Ansible will install it ONLY IF it isn't installed yet.
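For instance, a task like the following (a minimal sketch, assuming an apt-based distribution such as the Debian instances we use later) reports changed the first time it installs the package and ok on every subsequent run:

- name: Ensure curl is installed
  # https://docs.ansible.com/ansible/latest/modules/apt_module.html
  # Sketch: apt only acts if curl is missing, otherwise nothing happens
  apt:
    name: curl
    state: present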
Here is another example: an Ansible task which downloads a file to a specific dest with specific user permissions. If the file already exists at the desired dest, it will only be replaced if its content has changed.
- name: Fetching super_saiyan.py
  get_url:
    url: https://vegeta.world/super_saiyan.py
    dest: /my/path/super_saiyan.py
    owner: goku
    group: dragonballz
    mode: u=rwx,g=rwx,o=rx
    force: yes
When running an ansible script you'll see statuses such as ok or changed for each task:
- ok: nothing has been done, either the state is already met or we just set a fact.
- changed: a command has been run to meet the desired state.
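For example, running the task above twice typically produces output along these lines (illustrative, annotated output; the host name is hypothetical):

TASK [Fetching super_saiyan.py] ************************************************
changed: [localhost]   # <- first run: the file was downloaded

TASK [Fetching super_saiyan.py] ************************************************
ok: [localhost]        # <- second run: the file is already up to date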
Docker swarm
Docker swarm is an orchestration service. Easy to use, it allows us to run our webapps at scale in production. You can see a summary list of its features here.
Some resources from the official Docker documentation on how to set up and use Docker swarm:
Setup and deploy ec2
For this project I'm using Python 3.7.4 and pip 19.0.3, running under Fedora 30. In this tutorial I assume:
- You have an AWS account
- You are familiar with ec2 and security groups from AWS
- You have an IAM user with Programmatic access (AWS documentation)
Dependencies
For this project we will need:
- ansible
- boto3
- boto
- botocore
# install dependencies
pip install ansible boto boto3 botocore
Here are the versions I'm using at the time of writing:
[goku@vegeta tutorial]$ pip freeze | grep -E "boto3|boto|botocore|ansible"
ansible==2.8.5
boto==2.49.0
boto3==1.9.233
botocore==1.12.233
Export the AWS_SECRET_ACCESS_KEY, AWS_ACCESS_KEY_ID and AWS_REGION variables in your shell.
export AWS_ACCESS_KEY_ID='YOUR_AWS_API_KEY'
export AWS_SECRET_ACCESS_KEY='YOUR_AWS_API_SECRET_KEY'
# I'm currently on eu-west-1
export AWS_REGION='YOUR_AWS_REGION'
Let's test that everything works. Create a file named test_ec2_playbook.yml.
# test_ec2_playbook.yml
---
# we don't connect to any servers, we use the AWS api
- hosts: localhost
  # Do not use ssh
  connection: local
  # Do not fetch variables from the server (there is none)
  gather_facts: False
  tasks:
    - name: "Gathering facts about all subnets and set fact: 'vpc_subnets_facts'"
      # AWS module to gather facts (variables) about subnets
      # https://docs.ansible.com/ansible/latest/modules/ec2_vpc_subnet_facts_module.html
      ec2_vpc_subnet_facts:
      # Create a fact "vpc_subnets_facts" from the result
      register: vpc_subnets_facts

    - name: "Printing 'vpc_subnets_facts' fact"
      # https://docs.ansible.com/ansible/latest/modules/debug_module.html
      debug:
        var: vpc_subnets_facts
Now run it. It will print all your subnets in the region.
ansible-playbook test_ec2_playbook.yml
If you get output like this, you're all good (if you see warnings, you can ignore them).
[...]
                "id": "subnet-e6bd14bc",
                "ipv6_cidr_block_association_set": [],
                "map_public_ip_on_launch": true,
                "owner_id": "859761867273",
                "state": "available",
                "subnet_arn": "arn:aws:ec2:eu-west-1:859761867273:subnet/subnet-e6bd14bc",
                "subnet_id": "subnet-e6bd14bc",
                "tags": {},
                "vpc_id": "vpc-453c0e23"
            },
            {
                "assign_ipv6_address_on_creation": false,
                "availability_zone": "eu-west-1c",
                "availability_zone_id": "euw1-az1",
                "available_ip_address_count": 4091,
                "cidr_block": "172.31.0.0/20",
                "default_for_az": true,
                "id": "subnet-94c0acf2",
                "ipv6_cidr_block_association_set": [],
                "map_public_ip_on_launch": true,
                "owner_id": "859761867273",
                "state": "available",
                "subnet_arn": "arn:aws:ec2:eu-west-1:859761867273:subnet/subnet-94c0acf2",
                "subnet_id": "subnet-94c0acf2",
                "tags": {},
                "vpc_id": "vpc-453c0e23"
            }
        ]
    }
}
PLAY RECAP *******************************************************************************************************
localhost : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Choose one subnet and write down its vpc_id and subnet_id. Here they are "vpc_id": "vpc-453c0e23" and "subnet_id": "subnet-94c0acf2". If you have multiple subnets, just pick one. We will need them later.
Setup architecture
Now we need to deploy our ec2 instances. In our script we will have 4 steps:
- Create security_groups
- Create a key pair (or use pre-existing one)
- Find an AMI, in this tutorial we'll use Debian 9 (Stretch)
- Create 3 ec2
Because we have 4 clear steps, we will write one file per step containing its tasks.
Create security groups
In AWS, security groups allow us to manage which ports are open; they act like a firewall around the server.
Create a folder tasks. Within it, create a file named create_security_groups.yml.
Do not worry about "{{ VPC_ID }}": this is a variable and we will see later how to set its value.
# tasks/create_security_groups.yml
---
- name: create Swarm security groups
  # https://docs.ansible.com/ansible/latest/modules/ec2_group_module.html
  ec2_group:
    # The id of the security group
    name: swarm
    description: >-
      Allow swarm members to communicate between themselves,
      allow ssh and open ports for the example voting app.
    # The shown name
    tags:
      Name: swarm
    # Left as it is, we will set the variable later
    vpc_id: "{{ VPC_ID }}"
    rules:
      - proto: tcp
        from_port: 22
        to_port: 22
        rule_desc: "Allow ssh from everywhere."
        cidr_ip: 0.0.0.0/0
        cidr_ipv6: ::/0
      - proto: tcp
        from_port: 2377
        to_port: 2377
        group_name: swarm
        rule_desc: "Cluster management communications."
      - proto: tcp
        from_port: 7946
        to_port: 7946
        group_name: swarm
        rule_desc: "Communication among nodes."
      - proto: udp
        from_port: 7946
        to_port: 7946
        group_name: swarm
        rule_desc: "Communication among nodes."
      - proto: udp
        from_port: 4789
        to_port: 4789
        group_name: swarm
        rule_desc: "Overlay network traffic."
      - proto: tcp
        from_port: 8080
        to_port: 8080
        rule_desc: "Open ports for visualizer."
        cidr_ip: 0.0.0.0/0
        cidr_ipv6: ::/0
      - proto: tcp
        from_port: 5000
        to_port: 5001
        rule_desc: "Open ports for the vote and result apps."
        cidr_ip: 0.0.0.0/0
        cidr_ipv6: ::/0
Here we create our security_group to allow ssh, open the ports for the services we will deploy, and open the ports needed by docker swarm, the latter limited to the members of the swarm security group.
Create key pair (or use pre-existing one)
In order to connect to our servers by ssh, we will need a key pair.
# tasks/create_key_pair.yml
---
- name: "Trying to create a key pair"
  # https://docs.ansible.com/ansible/latest/modules/ec2_key_module.html
  ec2_key:
    name: "{{ KEY_PAIR_NAME }}"
    # If it exists do not overwrite
    force: false
  register: ansible_tutorial_private_key
  # Do not stop the playbook if it fails
  ignore_errors: yes

###############################################################
# ALL the code below will only run if the key pair is created #
# look at the "when" keys which control when to run a task    #
###############################################################

- name: "Saving private key to {{ KEY_PAIR_LOCAL_PATH }}"
  # https://docs.ansible.com/ansible/latest/modules/copy_module.html
  copy:
    dest: "{{ KEY_PAIR_LOCAL_PATH }}"
    content: "{{ ansible_tutorial_private_key['key']['private_key'] }}"
    # Permissions for private key
    mode: '0600'
    # We do not want to overwrite a pre-existing key
    force: no
  # We wish to run another task in case of error
  register: save_ansible_tutorial_private_key
  # ONLY run if "ansible_tutorial_private_key" was a success
  when: ansible_tutorial_private_key.changed == true
  # Do not stop the playbook if it fails
  ignore_errors: yes

- name: "Deleting {{ KEY_PAIR_NAME }} since it couldn't be saved"
  # https://docs.ansible.com/ansible/latest/modules/ec2_key_module.html
  ec2_key:
    name: "{{ KEY_PAIR_NAME }}"
    # Delete it
    state: absent
  # ONLY run if the previous task didn't write to the file
  when: ansible_tutorial_private_key.changed == true and save_ansible_tutorial_private_key.changed == false
  register: delete_key_pair

# Stop the playbook and mark it as failed
# https://docs.ansible.com/ansible/latest/modules/fail_module.html
- fail:
    msg: >
      The file {{ KEY_PAIR_LOCAL_PATH }} already exists but the
      key pair {{ KEY_PAIR_NAME }} doesn't exist on AWS.
      Change either the KEY_PAIR_NAME or the KEY_PAIR_LOCAL_PATH
  # ONLY run if the key pair was deleted because it couldn't be saved
  when: delete_key_pair.changed == true
When you create a key pair, AWS gives you the private key.
:warning: This is the only time you have access to it! If you don't download it, or if you lose it, the key pair becomes useless.
There are 4 ways these tasks can play out:
- Neither the key pair nor the file exist. The script will create the key pair and create a file with its private key.
- The key pair exists and the file too. Nothing is done since this is what we already want.
- The key pair doesn't exist but the file does. We won't overwrite the file as we do not know what it contains; we delete the key pair, since we couldn't save its private key, and raise an error asking the user to check their variables.
- The key pair exists but the file doesn't. We delete the key pair, as we can't recover its private key, and raise an error asking the user to check their variables.
Find an AMI Debian 9 (Stretch)
# tasks/set_fact_ec2_ami.yml
---
- name: "set fact: Debian 9 (Stretch) AMI"
  # https://docs.ansible.com/ansible/latest/modules/ec2_ami_facts_module.html
  ec2_ami_facts:
    # see the full list of filtered elements:
    # https://eu-west-1.console.aws.amazon.com/ec2/home?region=eu-west-1#Images:visibility=public-images;search=379101102735,debian-stretch-hvm-x86_64-gp2-*;sort=desc:name
    owners: 379101102735
    filters:
      name: "debian-stretch-hvm-x86_64-gp2-*"
  register: debian_ami

- name: "set fact: get id from Debian 9 (Stretch) AMI"
  # https://docs.ansible.com/ansible/latest/modules/set_fact_module.html
  set_fact:
    # There are many Debian AMIs, let's take the first
    debian_ami_id: "{{ debian_ami.images[0].image_id }}"
Those tasks are straightforward. We look for a Debian 9 (Stretch) AMI made by Debian (379101102735), take the first and save the id in a fact.
Create ec2
We wish to create 3 ec2 instances: 1 manager and 2 workers. We will iterate over a variable with the loop keyword.
# tasks/create_ec2.yml
---
- name: Creating ec2 instances
  # https://docs.ansible.com/ansible/latest/modules/ec2_module.html
  ec2:
    key_name: "{{ KEY_PAIR_NAME }}"
    # The security group
    group: "{{ item.group }}"
    instance_type: t2.nano
    image: "{{ debian_ami_id }}"
    # Wait until the ec2 is created (not running yet)
    wait: yes
    # Guarantee only 1 ec2 with those count_tags
    exact_count: 1
    # Tags to identify an ec2 uniquely
    count_tag:
      Name: "{{ item.tags.Name }}"
    # Other tags to attach to the ec2
    instance_tags: "{{ item.tags }}"
    # The subnet where to create the ec2
    vpc_subnet_id: "{{ SUBNET_ID }}"
    assign_public_ip: yes
  # Run this task for all elements in EC2_INSTANCES
  loop: "{{ EC2_INSTANCES }}"
  register: create_ec2
Notice the loop: "{{ EC2_INSTANCES }}" at the end of the task. It iterates the same module over the content of the EC2_INSTANCES var.
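If loop is new to you, here is a tiny standalone sketch (with hypothetical values): the debug task below runs once per element of the list, so it prints three messages.

- name: Show how loop repeats a module
  # https://docs.ansible.com/ansible/latest/modules/debug_module.html
  debug:
    msg: "Creating instance named {{ item.tags.Name }}"
  loop:
    - tags: { Name: "swarm-manager" }
    - tags: { Name: "swarm-worker-a" }
    - tags: { Name: "swarm-worker-b" }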
Now let's create our EC2_INSTANCES variable in a file. Create a folder vars and within it a file named ec2.yml.
# vars/ec2.yml
---
- EC2_INSTANCES:
    - tags:
        Name: "swarm-manager"
        SwarmType: manager
        Swarm: yes
      group:
        - swarm
    - tags:
        Name: "swarm-worker-a"
        SwarmType: worker
        Swarm: yes
      group:
        - swarm
    - tags:
        Name: "swarm-worker-b"
        SwarmType: worker
        Swarm: yes
      group:
        - swarm
We store the tags and the security_group properties.
Create the playbook
In order to use our new tasks we will create a playbook which includes all 4 of them. Just like the playbook we wrote for the test. Create a create_ec2_playbook.yml
file at the root of your project.
# create_ec2_playbook.yml
---
- hosts: localhost
  connection: local
  gather_facts: False
  vars:
    # The vpc_id and subnet_id chosen earlier
    VPC_ID: "YOUR_VPC_ID"
    SUBNET_ID: "YOUR_SUBNET_ID"
    # The key pair name on AWS
    KEY_PAIR_NAME: ansible_tutorial
    # The path to the private key on your machine
    KEY_PAIR_LOCAL_PATH: "~/.ssh/ansible_tutorial.pem"
  # Load variables from file
  vars_files:
    - vars/ec2.yml
  tasks:
    # Include all tasks within each file
    - include_tasks: tasks/create_security_groups.yml
    - include_tasks: tasks/create_key_pair.yml
    - include_tasks: tasks/set_fact_ec2_ami.yml
    - include_tasks: tasks/create_ec2.yml
Please fill the VPC_ID and SUBNET_ID vars with the vpc_id and subnet_id we chose after running ansible-playbook test_ec2_playbook.yml.
(Optional) If you wish to use a pre-existing key_pair, change KEY_PAIR_NAME to its name on AWS and KEY_PAIR_LOCAL_PATH to the local path of the private key on your machine.
Your current project structure should look like this now.
[goku@vegeta tutorial]$ tree
.
├── create_ec2_playbook.yml
├── tasks
│ ├── create_ec2.yml
│ ├── create_key_pair.yml
│ ├── create_security_groups.yml
│ └── set_fact_ec2_ami.yml
├── test_ec2_playbook.yml
└── vars
└── ec2.yml
2 directories, 7 files
Now run it !
ansible-playbook create_ec2_playbook.yml
Setup Docker swarm cluster
Now that we have our architecture running we wish to deploy a docker swarm cluster.
Inventory
In order to run our scripts against the desired servers, Ansible uses an inventory. In short, an inventory is where you store your servers' ip addresses. You can name them, create groups, and set variables by host or group; it has a lot of features. It can be a folder or a file.
But in our case we create our servers dynamically, which means their ip addresses change each time we deploy new servers. So we will use the AWS ec2 plugin, which creates dynamic hosts.
If you wish to know more about inventories you can check the documentation here.
Ansible and Dynamic Amazon EC2 Inventory Management
Create an inventory folder at the project root. Within it, we will create two files:
- ec2.py: the AWS core plugin, you can download it here
- ec2.ini: the configuration file, I've adjusted one for the tutorial here, or you can download the default one here
Make ec2.py executable to avoid any issues with ansible.
chmod +x inventory/ec2.py
Test that the plugin works as expected. It may take a few moments to get the result, as the plugin makes multiple requests to AWS.
inventory/ec2.py --list
You should see something like this:
[...]
        "ec2_tag_Name": "swarm-worker-b",
        "ec2_tag_Swarm": "True",
        "ec2_tag_SwarmType": "worker",
        "ec2_virtualization_type": "hvm",
        "ec2_vpc_id": "vpc-453c0e23"
      }
    }
  },
  "ec2": [
    "34.254.184.38",
    "34.244.247.140",
    "34.255.136.188"
  ],
  "tag_Name_swarm_manager": [
    "34.254.184.38"
  ],
  "tag_Name_swarm_worker_a": [
    "34.244.247.140"
  ],
  "tag_Name_swarm_worker_b": [
    "34.255.136.188"
  ],
  "tag_SwarmType_manager": [
    "34.254.184.38"
  ],
  "tag_SwarmType_worker": [
    "34.244.247.140",
    "34.255.136.188"
  ],
  "tag_Swarm_True": [
    "34.254.184.38",
    "34.244.247.140",
    "34.255.136.188"
  ]
}
Here you can see all the different servers, groups and variables available.
If you have any trouble or wish to know more about it, here is the documentation.
Project setup
Now create an ansible.cfg file at the root of our project to specify a few default options.
# ansible.cfg
[defaults]
# (optional) ssh host key checking
# is enabled by default and may cause
# issues while following the tutorial
host_key_checking=false
# Tells ansible where the inventory files are
inventory=inventory/
Here we disable host_key_checking to avoid issues while following this tutorial (note that it is security related). We also define where the inventory files are.
We have set the KEY_PAIR_NAME and KEY_PAIR_LOCAL_PATH variables in the create_ec2_playbook.yml playbook, but we will need them in our next playbooks, so we'll move them into a variable file.
Create a folder group_vars and a file all.yml in it. Move your KEY_PAIR_NAME and KEY_PAIR_LOCAL_PATH variables into it.
# group_vars/all.yml
---
KEY_PAIR_NAME: ansible_tutorial
KEY_PAIR_LOCAL_PATH: "~/.ssh/ansible_tutorial.pem"
All files within group_vars are automatically read by Ansible depending on the group or host currently being targeted. Here all.yml will be used for all hosts and groups. This way we ensure those variables are accessible from everywhere.
By default our ec2 instances can be accessed with the user admin (this is related to the AMI we're using). To tell Ansible which user and which private key to use, we will create another file, ec2.yml, in group_vars. All dynamic hosts from the AWS ec2 plugin are part of the ec2 group.
# group_vars/ec2.yml
---
ansible_user: admin
ansible_ssh_private_key_file: "{{ KEY_PAIR_LOCAL_PATH }}"
Those variables will be used ONLY by the hosts that belong to the ec2 group.
To check that everything is set up correctly we'll write a little playbook. Note this test will only work if your servers are online and running.
# test_dynamic_aws_host_playbook.yml
---
# Run the playbook for all ec2 instances sharing the tag Swarm=True
- hosts: "tag_Swarm_True"
  tasks:
    - name: "Ping the ec2 with Swarm=True tag"
      # https://docs.ansible.com/ansible/latest/modules/ping_module.html
      ping:
Our current project structure should look like this:
[goku@vegeta tutorial]$ tree
.
├── ansible.cfg
├── create_ec2_playbook.yml
├── group_vars
│ ├── all.yml
│ └── ec2.yml
├── inventory
│ ├── ec2.ini
│ └── ec2.py
├── tasks
│ ├── create_ec2.yml
│ ├── create_key_pair.yml
│ ├── create_security_groups.yml
│ └── set_fact_ec2_ami.yml
├── test_dynamic_aws_host_playbook.yml
├── test_ec2_playbook.yml
└── vars
└── ec2.yml
4 directories, 13 files
Run it (it may take a few moments because of the dynamic AWS inventory).
ansible-playbook test_dynamic_aws_host_playbook.yml
If you see something like this everything is working as expected :ok_hand:
PLAY [tag_Swarm_True] *********************************************************************************************************************
TASK [Gathering Facts] ********************************************************************************************************************
ok: [34.240.55.248]
ok: [34.242.96.200]
ok: [18.203.135.17]
TASK [Ping the ec2 with Swarm=True tag] ***************************************************************************************************
ok: [18.203.135.17]
ok: [34.240.55.248]
ok: [34.242.96.200]
PLAY RECAP ********************************************************************************************************************************
18.203.135.17 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
34.240.55.248 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
34.242.96.200 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Something is wrong if you see something like this :collision:
PLAY [tag_Swarm_true] *********************************************************************************************************************
skipping: no hosts matched
It means either your ec2 instances aren't online and running, or they do not have the tag Swarm=True (case sensitive).
Install Docker
We will install Docker on Debian 9 (Stretch). I've followed the instructions from the official Docker documentation and turned them into tasks.
# tasks/install_docker.yml
---
- name: Ensure dependencies are installed
  # https://docs.ansible.com/ansible/latest/modules/apt_module.html
  apt:
    name:
      - apt-transport-https
      - ca-certificates
      - curl
      - gnupg-agent
      - software-properties-common
    state: present

- name: Add Docker apt key
  # https://docs.ansible.com/ansible/latest/modules/apt_key_module.html
  apt_key:
    url: https://download.docker.com/linux/debian/gpg
    id: 9DC858229FC7DD38854AE2D88D81803C0EBFCD88
    state: present
  register: add_repository_key

- name: Get debian version
  # https://docs.ansible.com/ansible/latest/modules/command_module.html
  command: lsb_release -cs
  register: debian_version_name

- name: Add Docker repository
  # https://docs.ansible.com/ansible/latest/modules/apt_repository_module.html
  apt_repository:
    repo: "deb [arch=amd64] https://download.docker.com/linux/debian {{ debian_version_name.stdout }} stable"
    state: present
    update_cache: true

- name: Ensure docker is installed
  # https://docs.ansible.com/ansible/latest/modules/apt_module.html
  apt:
    name:
      - docker-ce
      - docker-ce-cli
      - containerd.io
    state: present
    update_cache: true

- name: Ensure Docker is started and enabled at boot
  # https://docs.ansible.com/ansible/latest/modules/service_module.html
  service:
    name: docker
    state: started
    enabled: true

- name: Ensure handlers are notified now to avoid firewall conflicts
  # https://docs.ansible.com/ansible/latest/modules/meta_module.html
  meta: flush_handlers

- name: "Ensure the user {{ ansible_user }} is part of the docker group"
  # https://docs.ansible.com/ansible/latest/modules/user_module.html
  user:
    name: "{{ ansible_user }}"
    groups: docker
    append: yes

#########################################################################################
# We install the docker python module as it is recommended                              #
# for the docker_swarm module that we will use later                                    #
# https://docs.ansible.com/ansible/latest/modules/docker_swarm_module.html#requirements #
#########################################################################################
- name: Ensure docker python module and jsondiff are installed
  # https://docs.ansible.com/ansible/latest/modules/pip_module.html
  pip:
    name:
      - docker
      # jsondiff and pyyaml are needed by the docker_stack module
      - jsondiff
      - pyyaml
  register: pip_install_docker
  ignore_errors: yes

- name: Fetching pip
  # https://docs.ansible.com/ansible/latest/modules/get_url_module.html
  get_url:
    url: https://bootstrap.pypa.io/get-pip.py
    dest: "/home/{{ ansible_user }}/get-pip.py"
    mode: u=rwx,g=rwx,o=rx
  when: pip_install_docker is failed

- name: Installing pip
  # https://docs.ansible.com/ansible/latest/modules/command_module.html
  command: "python /home/{{ ansible_user }}/get-pip.py"
  when: pip_install_docker is failed

- name: Installing docker python module and jsondiff
  # https://docs.ansible.com/ansible/latest/modules/pip_module.html
  pip:
    name:
      - docker
      # jsondiff and pyyaml are needed by the docker_stack module
      - jsondiff
      - pyyaml
  when: pip_install_docker is failed
Init Docker Swarm
Docker is installed, so we can now init the Swarm on a manager node. Docker Swarm is easy to set up:
- Init the Swarm on a server and save the join token worker and the join token manager
- On the other managers, make them join the Swarm thanks to the join token manager
- On the workers, use the join token worker
We will ask Ansible to use the first Swarm manager node to init the swarm. We could point at one specific server, but I prefer to just take the first one: in a large swarm cluster you have multiple manager and worker nodes, and by taking the first, our script keeps working even if the particular server we would have hard-coded is down.
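For reference, these tasks automate what you would otherwise type by hand on the servers, roughly like this (a sketch of the underlying docker CLI, with a placeholder private ip):

# On the first manager (172.31.0.10 is a placeholder address)
docker swarm init --advertise-addr 172.31.0.10:2377
docker swarm join-token manager -q   # prints the manager join token
docker swarm join-token worker -q    # prints the worker join token

# On each worker, using the worker token printed above
docker swarm join --token <worker-token> 172.31.0.10:2377

The tasks below do the equivalent with the docker_swarm module, falling back to the docker CLI to fetch the tokens if the Swarm already exists.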
# tasks/init_swarm.yml
---
- name: Setting up Swarm
  # Similar to a Try/Except
  # https://docs.ansible.com/ansible/latest/user_guide/playbooks_blocks.html#blocks-error-handling
  block:
    - name: Initiating the swarm with default parameters
      # https://docs.ansible.com/ansible/latest/modules/docker_swarm_module.html
      docker_swarm:
        state: present
      register: init_swarm

    - name: "set fact: join token worker"
      set_fact:
        join_token_worker: "{{ init_swarm.swarm_facts.JoinTokens.Worker }}"

    - name: "set fact: join token manager"
      set_fact:
        join_token_manager: "{{ init_swarm.swarm_facts.JoinTokens.Manager }}"
  rescue:
    - name: Getting join manager token from existing Swarm
      # https://docs.ansible.com/ansible/latest/modules/command_module.html
      command: docker swarm join-token manager -q
      register: join_token_manager_command

    - name: Getting join worker token from existing Swarm
      # https://docs.ansible.com/ansible/latest/modules/command_module.html
      command: docker swarm join-token worker -q
      register: join_token_worker_command

    - name: "set fact: join_token_manager"
      set_fact:
        join_token_manager: "{{ join_token_manager_command['stdout'] }}"

    - name: "set fact: join_token_worker"
      set_fact:
        join_token_worker: "{{ join_token_worker_command['stdout'] }}"
Join the worker nodes
Our Swarm is initiated. From the worker nodes, we use the join token worker and a manager node's address to make them part of the Swarm.
# tasks/join_worker_node.yml
---
- name: "set fact: first Swarm manager host"
  set_fact:
    first_swarm_manager_host: "{{ groups['tag_SwarmType_manager'][0] }}"

- name: 'set fact: list remote_addrs'
  set_fact:
    # Create a list of all managers' private ip addresses
    list_remote_addrs: >-
      {{ list_remote_addrs | default([]) }} + [ '{{ item }}:2377' ]
  loop: >-
    {{ groups['tag_SwarmType_manager'] | map('extract', hostvars, 'ec2_private_ip_address') | list }}

- name: Joining worker to the swarm
  # https://docs.ansible.com/ansible/latest/modules/docker_swarm_module.html
  docker_swarm:
    state: join
    timeout: 60
    # Using PRIVATE IP ADDRESS as they are in the same VPC
    advertise_addr: >-
      {{ ec2_private_ip_address }}:2377
    # hostvars contains all variables related to a host
    # https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#accessing-information-about-other-hosts-with-magic-variables
    join_token: >-
      {{ hostvars[first_swarm_manager_host].join_token_worker }}
    # Using PRIVATE IP ADDRESS as they are in the same VPC
    remote_addrs: "{{ list_remote_addrs }}"
Here we use hostvars. hostvars allows us to access variables from other hosts; here we take two of them:
- join_token_worker: the token given by the first swarm manager, which allows us to join the Swarm
- ec2_private_ip_address: the private ip address. Since our servers share the same subnet they can communicate over private ip addresses, and our security group ONLY opens the Docker ports to members of the swarm security group, which implies communicating over private ip addresses.
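As a quick illustration (a hypothetical debug task, not part of our playbooks), this is how hostvars lets one host read another host's variables:

# Hypothetical example: print the worker join token stored on the first manager
- name: Show a variable that belongs to another host
  debug:
    msg: "{{ hostvars[groups['tag_SwarmType_manager'][0]].join_token_worker }}"

Finally, here is the playbook that ties it all together: it installs Docker everywhere, inits the Swarm on the first manager, then joins the workers.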
# deploy_swarm_playbook.yml
---
# Take the first ec2 whose tag SwarmType is equal to "manager"
- hosts: tag_SwarmType_manager[0]
  # Use sudo for all tasks
  become: yes
  tasks:
    - include_tasks: tasks/install_docker.yml
    - include_tasks: tasks/init_swarm.yml

- hosts: tag_SwarmType_worker
  # Use sudo for all tasks
  become: yes
  tasks:
    - include_tasks: tasks/install_docker.yml
    - include_tasks: tasks/join_worker_node.yml
Deploy the cluster :rocket: (if a host is unreachable, wait a bit until the ec2 instances reach the running state).
ansible-playbook deploy_swarm_playbook.yml
Check cluster
Our Swarm cluster should be ready. To check it, let's connect to it by ssh.
(Optional) If you're too lazy to look up your swarm manager node's ip from the output above or from the AWS console, you can run this command:
[goku@vegeta tutorial]$ inventory/ec2.py --list | grep tag_SwarmType_manager -A 2
# output
"tag_SwarmType_manager": [
"34.242.96.200" # <- here
],
# adjust ~/.ssh/ansible_tutorial.pem if needed
[goku@vegeta tutorial]$ ssh -i ~/.ssh/ansible_tutorial.pem admin@34.242.96.200
admin@ip-172-31-8-57:~$ docker node ls
# Tada ! Your swarm is running
ID                            HOSTNAME           STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
scqe6rb9h0016g8wmeohfkv32     ip-172-31-5-233    Ready     Active                          19.03.2
yhokk5ae4l22ssmpkslqbz4p7 *   ip-172-31-8-57     Ready     Active         Leader           19.03.2
td9vjq0myikqhr7dzwky8enut     ip-172-31-15-91    Ready     Active                          19.03.2
Deploy service
We have our swarm cluster running. It is time to deploy our first service.
I'm using a stack file from the https://github.com/dockersamples repo. Their projects are complex enough for development and testing. We will be using the docker-stack.yml from the example-voting-app. It uses redis, postgres, an app in python, an app in nodejs, a worker in .NET and even a visualizer!
I've created an example-voting-app folder and a docker-compose.yml file in it.
# example-voting-app/docker-compose.yml
version: "3"
services:
  redis:
    image: redis:alpine
    networks:
      - frontend
    deploy:
      replicas: 1
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure

  db:
    image: postgres:9.4
    volumes:
      - db-data:/var/lib/postgresql/data
    networks:
      - backend
    deploy:
      placement:
        constraints: [node.role == manager]

  vote:
    image: dockersamples/examplevotingapp_vote:before
    ports:
      - 5000:80
    networks:
      - frontend
    depends_on:
      - redis
    deploy:
      replicas: 2
      update_config:
        parallelism: 2
      restart_policy:
        condition: on-failure

  result:
    image: dockersamples/examplevotingapp_result:before
    ports:
      - 5001:80
    networks:
      - backend
    depends_on:
      - db
    deploy:
      replicas: 1
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure

  worker:
    image: dockersamples/examplevotingapp_worker
    networks:
      - frontend
      - backend
    depends_on:
      - db
      - redis
    deploy:
      mode: replicated
      replicas: 1
      labels: [APP=VOTING]
      restart_policy:
        condition: on-failure
        delay: 10s
        max_attempts: 3
        window: 120s
      placement:
        constraints: [node.role == manager]

  visualizer:
    image: dockersamples/visualizer:stable
    ports:
      - "8080:8080"
    stop_grace_period: 1m30s
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
    deploy:
      placement:
        constraints: [node.role == manager]

networks:
  frontend:
  backend:

volumes:
  db-data:
Create a new task file deploy_example_voting_app_stack.yml under tasks:
# tasks/deploy_example_voting_app_stack.yml
---
- name: Copying example voting app's docker-compose.yml file on the server
  # https://docs.ansible.com/ansible/latest/modules/copy_module.html
  copy:
    src: "example-voting-app/docker-compose.yml"
    dest: "/home/{{ ansible_user }}/docker-compose.yml"
    owner: "{{ ansible_user }}"
    group: "{{ ansible_user }}"
    mode: u=rw,g=rw,o=r
    force: yes

- name: Deploying example voting app's stack
  # https://docs.ansible.com/ansible/latest/modules/docker_stack_module.html
  docker_stack:
    state: present
    name: "{{ STACK_NAME | default('example_voting_app') }}"
    compose:
      - "/home/{{ ansible_user }}/docker-compose.yml"
And the playbook, deploy_stack_playbook.yml:
# deploy_stack_playbook.yml
---
- hosts: tag_SwarmType_manager[0]
  tasks:
    - include_tasks: tasks/deploy_example_voting_app_stack.yml
Deploy ! This is the very last step. :rocket:
ansible-playbook deploy_stack_playbook.yml
Once it's done, you may need to wait 10~20 seconds for all the services to get deployed.
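If you want to watch the services come up, you can ssh to the manager node (as shown earlier) and list them with standard docker commands; REPLICAS should converge to the requested counts.

# run on the swarm manager node
docker stack services example_voting_app
# or, for all services
docker service ls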
Now you can access your apps from any node :tada:!
(Optional) If you want the list of all your servers' public IP addresses, you can run this:
inventory/ec2.py --list | grep tag_Swarm_True -A 3
Pick any one of them; it doesn't matter whether it is a manager or a worker, the routing mesh does it all for you. We have these apps running:
- vote - port: 5000
- result - port: 5001
- visualizer - port: 8080
On any server you can access vote, result and visualizer!
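For example, with curl from your machine (<public-ip> is a placeholder, use any of your nodes' public IP addresses):

# <public-ip> is a placeholder
curl http://<public-ip>:5000   # vote
curl http://<public-ip>:5001   # result
curl http://<public-ip>:8080   # visualizer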
Clean the tutorial
Now that we have accomplished what we wanted, let's clean up our servers and security_groups. Here are the tasks and the playbook to delete our ec2 instances and security_group.
Remove the ec2:
---
# tasks/remove_ec2.yml
- name: "set fact: Swarm ec2 instances"
  # https://docs.ansible.com/ansible/latest/modules/ec2_instance_facts_module.html
  ec2_instance_facts:
    filters:
      "tag:Swarm": "True"
  register: swarm_ec2

- name: "Removing ec2 instances"
  # https://docs.ansible.com/ansible/latest/modules/ec2_module.html
  ec2:
    instance_ids: "{{ item.instance_id }}"
    wait: yes
    state: absent
  loop: "{{ swarm_ec2.instances }}"
  when: swarm_ec2.instances != []
Remove the security_group:
---
# tasks/remove_security_groups.yml
- name: "set fact: security_groups"
  # https://docs.ansible.com/ansible/latest/modules/ec2_group_facts_module.html
  ec2_group_facts:
    filters:
      group-name:
        - swarm
  register: security_groups

- name: "Removing security groups"
  # https://docs.ansible.com/ansible/latest/modules/ec2_group_module.html
  ec2_group:
    group_id: "{{ item.group_id }}"
    state: absent
  loop: "{{ security_groups.security_groups }}"
The playbook.
# remove_ec2_playbook.yml
---
- hosts: localhost
  connection: local
  gather_facts: False
  tasks:
    - include_tasks: tasks/remove_ec2.yml
    - include_tasks: tasks/remove_security_groups.yml
Clean it. You wouldn't want to forget you have 3 servers running somewhere on AWS.
ansible-playbook remove_ec2_playbook.yml
Source project
You can find the final version of this project on my repo
Last thoughts 😊
There are many ways to achieve this. This article and this project were written to share knowledge, not to showcase a production-level architecture. If you're looking for more complex projects you can check Ansible Galaxy for roles. In short, roles are reusable sets of tasks meant to be used in other playbooks. I hope this tutorial helps you understand how Ansible works. It is a very powerful tool that is self-documenting by nature: ideal to build your architecture while keeping a written record of how it works.
Ansible galaxy
Ansible galaxy is a hub for finding and sharing Ansible content. You can find Docker Swarm roles/examples here.
Docker swarm
At Seelk we use Docker swarm for our architecture. It provides us with a resilient and stable architecture that is also easy to maintain.
If you're interested in learning more about Docker swarm, you can find a very good tutorial on their website.

Boris Laporte