I recently moved my personal server over to a VPS and decided to automate and refine the process with Ansible. The end result turned out much better than I expected.
In this post I won't go into the details of setting up the entire Ansible project or running the plays. I'm mainly going to show the core plays/roles I used to achieve a really seamless backup and restore experience.
Treat this post as high-level inspiration that also contains snippets you can plug into your own Ansible repo.
Overview
Each of my services is deployed as a Docker container with a mounted volume, and the server is set up with Ansible. If a service's volume doesn't exist yet, the Restic role creates it and attempts to restore it from a snapshot.
Each service then mounts its volume. Airflow is set up to mount all the volumes and back them up periodically by sending a snapshot to S3 via Restic.
I'll start off by showing my play so you can get a high-level idea of what is going to happen and what the end result is. I'll then dive into the details of each role relevant to the backup/restore functionality. Finally, I'll show the role for my Ghost service as an example of a service using a volume that gets backed up.
In each section I'll show you a play/role and its dependent files (defaults, templates, etc.) and briefly explain the important bits.
Each container also has a bunch of Traefik labels that I won't get into, as they'd be a bit distracting here.
Server Setup Play
- hosts:
    - myhost.com
  vars:
    # these assume the init role has been applied
    ansible_port: 22222
    ansible_user: zac
    # host is for traefik settings in each service to specify subdomain
    host: myhost.com
    # Both restic and airflow use this
    restic_repo: s3:https://myresticrepo.com
  roles:
    - role: roles/docker
      become: true
      docker_users:
        - zac
    # just installs helpful packages like curl
    - role: roles/common
      become: true
    - role: roles/restic
      become: true
      restore_volumes:
        - volume: ghost
          path: /toback/personal_server/docker/ghost
        - volume: postgres
          path: /toback/personal_server/docker/postgres
    - role: roles/postgres
      become: true
      volumes:
        - postgres:/var/lib/postgresql/data
    - role: roles/airflow
      become: true
      volumes:
        - /opt/airflow/dags:/usr/local/airflow/dags
        # everything in /toback will be backed up and restored
        - postgres:/toback/personal_server/docker/postgres
        - ghost:/toback/personal_server/docker/ghost
    - role: roles/ghost
      become: true
      volumes:
        - ghost:/var/lib/ghost/content
I won't dive into the first two roles. I may write another post about installing Docker, as it's a bit tricky, but roles/docker is pretty generic and there's loads of info online about it. The roles/common role is tiny and just installs some packages I want on the machine, like curl.
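If you're curious, roles/common essentially boils down to a single package task. Here's a minimal sketch; the exact package list is illustrative, not my actual list:

---
# Minimal sketch of a "common" role; package list is illustrative
- name: Install common packages
  apt:
    name:
      - curl
    state: present
    update_cache: true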
After that you can see the roles/restic role being run to set up and restore the volumes: Postgres starts up using its volume, Airflow mounts both volumes so they can be backed up, and Ghost starts up with its volume mounted.
roles/restic
This role handles creating our Docker volumes (meaning it needs to run before the service roles) and restoring them.
---
- name: Check volume info
  docker_volume_info:
    name: "{{ item.volume }}"
  register: volume_infos
  with_items: "{{ restore_volumes }}"

- name: Setup volumes to restore
  docker_volume:
    name: "{{ item.1.volume }}"
    state: present
  when: not volume_infos.results[item.0].exists
  with_indexed_items: "{{ restore_volumes }}"

- name: Start Restic Restore
  docker_container:
    name: restic-restore
    image: "{{ image }}"
    volumes:
      - "{{ item.1.volume }}:/data{{ item.1.path }}"
    env:
      RESTIC_REPOSITORY: "{{ restic_repo }}"
      RESTIC_PASSWORD: "{{ lookup('env', 'RESTIC_PASSWORD') }}"
      AWS_ACCESS_KEY_ID: "{{ lookup('env', 'RESTIC_AWS_ACCESS_KEY_ID') }}"
      AWS_SECRET_ACCESS_KEY: "{{ lookup('env', 'RESTIC_AWS_SECRET_ACCESS_KEY') }}"
    command: restore latest --target /data --include "{{ item.1.path }}"
    detach: false
  when: not volume_infos.results[item.0].exists
  with_indexed_items: "{{ restore_volumes }}"
In order, it:
- Collects info on each of the restore_volumes Docker volumes
- Loops over the volumes and creates any volume that doesn't exist
- Loops over the volumes again and, for each volume that didn't already exist, restores data into it from the latest snapshot
One thing to keep in mind here is that I've chosen to ingest secrets from my local environment, so this role relies on the following environment variables being set when it runs:
RESTIC_PASSWORD=
RESTIC_AWS_ACCESS_KEY_ID=
RESTIC_AWS_SECRET_ACCESS_KEY=
If you ever want to restore a volume to the latest snapshot, just remove the Docker volume and re-run the play!
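For example, forcing a restore of the ghost volume could look like this hypothetical one-off task (running docker volume rm ghost on the host works just as well):

# Hypothetical one-off task: removing the volume makes the restic role
# recreate it and restore the latest snapshot on the next play run.
- name: Remove the ghost volume to force a restore
  docker_volume:
    name: ghost
    state: absent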
roles/postgres
Airflow relies on a running SQL database (and I use it for other services like Nextcloud), so it comes next. It's also one of the simpler roles.
---
- name: Ensure volume exists
  docker_volume:
    name: postgres
    state: present

- name: Start Postgres
  docker_container:
    name: "{{ container_name }}"
    image: "{{ image }}"
    restart_policy: unless-stopped
    volumes: "{{ volumes }}"
    networks_cli_compatible: true
    networks: "{{ networks }}"
    ports: "{{ ports }}"
    env: "{{ env }}"
    labels: "{{ labels }}"
---
image: postgres:12
container_name: "postgres"
volumes: []
ports: []
networks:
  - name: main-net
env:
  POSTGRES_PASSWORD: docker
host: REPLACE.ME
labels: {}
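One note on the defaults above: POSTGRES_PASSWORD is hard-coded. If you'd rather not commit that, you could override env in the play and pull the password from your local environment, the same way the Restic role does. A hypothetical example:

- role: roles/postgres
  become: true
  volumes:
    - postgres:/var/lib/postgresql/data
  env:
    # hypothetical: pull the password from the control machine's
    # environment, like the Restic role does with its secrets
    POSTGRES_PASSWORD: "{{ lookup('env', 'POSTGRES_PASSWORD') }}"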
roles/airflow
Airflow might be overkill here, but I like the UI and the ability to explore previous runs and view logs.
---
- name: Ensure volume exists
  docker_volume:
    name: airflow
    state: present

- name: Create directories for config
  file:
    path: /opt/airflow/dags/scripts
    state: directory
    mode: u=rwx,g=r,o=r
    recurse: yes

- name: Add restic backup dag
  template:
    src: dags/restic-backup.py.j2
    dest: /opt/airflow/dags/restic-backup.py
    mode: u=rwx,g=r,o=r

- name: Add restic backup script
  template:
    src: scripts/restic-backup.sh.j2
    dest: /opt/airflow/dags/scripts/restic-backup.sh
    mode: u=rwx,g=r,o=r

- name: Start Airflow
  docker_container:
    name: airflow
    command: webserver
    # not good, but so we can apt-get install restic in the dag
    # would love a better way to do this...
    user: root
    image: "{{ image }}"
    restart_policy: unless-stopped
    volumes: "{{ volumes }}"
    networks_cli_compatible: true
    networks: "{{ networks }}"
    ports: "{{ ports }}"
    exposed_ports: "{{ exposed_ports }}"
    env: "{{ env }}"
    labels: "{{ labels }}"
In order, it:
- Ensures the Docker volume exists, in case this role is run independently
- Makes sure our DAG directory exists on the remote machine
- Adds the DAG
- Adds the backup script
- Starts Airflow
Here are the templates. If you aren't familiar with Airflow, don't worry about the details of the DAG; all it does is run the bash script once a week.
from datetime import timedelta, datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.utils.dates import days_ago

default_args = {
    'owner': 'zac',
    'depends_on_past': False,
    'start_date': datetime(2020, 7, 1),
    'email': ['zac@mydomain.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG(
    'restic-backup',
    default_args=default_args,
    description='Backs up home directory and docker volumes mounted to /toback in this container',
    schedule_interval='@weekly',
    catchup=False,
)

t1 = BashOperator(
    task_id='install_restic',
    bash_command='apt-get update; apt-get -y install restic',
    dag=dag,
)

t2 = BashOperator(
    task_id='backup',
    depends_on_past=False,
    # https://stackoverflow.com/questions/42147514/templatenotfound-error-when-running-simple-airflow-bashoperator
    bash_command='/usr/local/airflow/dags/scripts/restic-backup.sh ',
    retries=3,
    dag=dag,
)

dag.doc_md = __doc__

t1 >> t2
AWS_ACCESS_KEY_ID="{{ lookup('env', 'RESTIC_AWS_ACCESS_KEY_ID') }}" \
AWS_SECRET_ACCESS_KEY="{{ lookup('env', 'RESTIC_AWS_SECRET_ACCESS_KEY') }}" \
RESTIC_PASSWORD="{{ lookup('env', 'RESTIC_PASSWORD') }}" \
restic -r {{ restic_repo }} --verbose backup /toback
Note that this script template relies on the same environment variables I mentioned in the Restic role.
Adding other services!
These services look very similar to our Postgres service. In fact, I've been thinking of creating a generic Docker container role for them, since they rarely need anything else.
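If I ever do build that generic role, its tasks file would probably look something like this hypothetical sketch (the real Ghost role, shown next, is nearly identical, just with hard-coded names):

---
# Hypothetical generic "docker service" role; variable names are illustrative
- name: Ensure volume exists
  docker_volume:
    name: "{{ volume_name }}"
    state: present

- name: Start service container
  docker_container:
    name: "{{ container_name }}"
    image: "{{ image }}"
    restart_policy: unless-stopped
    volumes: "{{ volumes }}"
    networks_cli_compatible: true
    networks: "{{ networks }}"
    ports: "{{ ports }}"
    exposed_ports: "{{ exposed_ports }}"
    env: "{{ env }}"
    labels: "{{ labels }}"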
---
# tasks file for ghost
- name: Ensure volume exists
  docker_volume:
    name: ghost
    state: present

- name: Start Ghost
  docker_container:
    name: ghost
    image: "{{ image }}"
    restart_policy: unless-stopped
    volumes: "{{ volumes }}"
    networks_cli_compatible: true
    networks: "{{ networks }}"
    ports: "{{ ports }}"
    exposed_ports: "{{ exposed_ports }}"
    env: "{{ env }}"
    labels: "{{ labels }}"
---
# defaults file for ghost
image: ghost:3-alpine
volumes: []
networks:
  - name: main-net
env:
  database__client: sqlite3
  url: https://{{ host }}
ports: []
exposed_ports: []
host: REPLACE.ME
labels:
  traefik.enable: "true"
  traefik.http.routers.ghost.rule: "Host(`{{ host }}`) || Host(`www.{{ host }}`)"
  traefik.http.routers.ghost.entrypoints: "websecure"
  traefik.http.routers.ghost.tls.certresolver: "myresolver"
  traefik.http.services.ghost.loadbalancer.server.port: "2368"
I left the Traefik config in just for fun here, but as you can see, the role simply mounts the volume that Ansible creates and restores via Restic. Anything stored there will be backed up by Airflow every week!
All of my other services follow this same pattern, and it's super nice.
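To make the pattern concrete, adding a new backed-up service means touching three places in the play: add its volume and path to restore_volumes in the restic role, mount that volume under /toback in the airflow role, and mount it wherever the service expects its data in the service's own role. A hypothetical myservice entry might look like this (names and paths are illustrative):

- role: roles/restic
  become: true
  restore_volumes:
    # ...existing volumes...
    - volume: myservice
      path: /toback/personal_server/docker/myservice
- role: roles/airflow
  become: true
  volumes:
    # ...existing mounts...
    - myservice:/toback/personal_server/docker/myservice
- role: roles/myservice
  become: true
  volumes:
    - myservice:/path/the/service/expects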