Flyway Tutorial – Managing Database Migrations

16.1.2017 | 10 minutes reading time

Many software projects still use relational databases as an important part of their technology stack. This typically requires handling database migrations, often also called schema migrations. The reasons for performing migrations on a database are manifold. A few examples:

New features require new sets of database tables, views and indexes.
Bugfixes require changes to existing database objects.
Performance problems require new indexes for certain database tables.

The problem – or let’s better call it challenge – is to ensure that a certain software release is always delivered with the matching state of the database. Especially if there are several installations of an application, this requires:

Supporting new installations of a software release from scratch and
upgrading from any existing release in the field to the latest one.

In addition, (update) installations of different releases on different test stages must be handled as well, and these potentially impose further requirements on the way we handle database migrations.

All this can of course be “solved” using a bunch of shell and SQL scripts. But that is not really a sustainable solution, as these scripts tend to become overly complex and hard to maintain quite quickly.

By now it is probably not hard to guess what the better alternative is :).


Installation

Flyway is implemented in Java and the installation is really straightforward. Download the latest package from the Flyway homepage and unpack it to a suitable location on the target machine. Make sure to either create a soft-link to the flyway executable or add the installation directory to your PATH. Keep in mind that the installation is required on developer machines as well as on all servers where the application – or more precisely the database of your application – is running. Typically this means test and production environments.

When talking about tools for schema migrations, typically two contenders are mentioned: Flyway and Liquibase. While we are discussing Flyway in this post, you might also want to take a look at our article on Liquibase here.

On a Mac, you can alternatively use Homebrew to install Flyway. This is more convenient and creates the required soft-link to the flyway executable under /usr/local/bin/ right away.

brew install flyway

As Flyway had already been installed on my machine – due to some earlier evaluation – I used the opportunity to upgrade to the latest release, 4.0.3.

Thomass-MacBook-cc:bin thomasjaspers$ brew upgrade flyway
==> Upgrading 1 outdated package, with result:
flyway 4.0.3
==> Upgrading flyway
==> Using the sandbox
==> Downloading https://search.maven.org/remotecontent?filepath=org/flywaydb/flyway-commandline/4.0.3/flyway-commandline-4.0.3.tar.gz
######################################################################## 100,0%
🍺  /usr/local/Cellar/flyway/4.0.3: 21 files, 12.0M, built in 8 seconds

We are good to go if we can execute flyway from anywhere on the command line. This should print a fairly extensive usage message, showing that the command has been found and executed successfully.

Database Migrations

Database changes in Flyway are bundled in so-called Migrations. In this blog post we will only consider Migrations written as plain SQL. There are more advanced concepts like writing Migrations in Java, but those will be left for a forthcoming blog post. For all Migrations there is one fundamental concept in Flyway:

A Migration must not change once it has been applied to the database!

The reason for this is simple: if this were not the case, one could not be sure that two installations – with the “same” set of Migrations – really result in the same database state. The result would be chaos, and no one likes chaos!

Flyway enforces this by keeping track of a checksum for each Migration that has been executed (together with other information). This checksum is stored in the database in the schema_version table. On first execution Flyway automatically creates this table in the same schema as the other database objects created with Flyway for that project. Thus there are potentially multiple instances of this table, one per schema under the control of a Flyway project.

If the checksum of an already installed Migration has changed, Flyway reports an error and stops the installation:

Thomass-MacBook-cc:db_flyway_sample thomasjaspers$ flyway migrate
Flyway 4.0.3 by Boxfuse

Database: jdbc:postgresql://localhost:5432/flywaydemo (PostgreSQL 9.4)
ERROR: Validate failed: Migration checksum mismatch for migration 2.1
-> Applied to database : -122752047
-> Resolved locally    : -609203476
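The validation behind this error can be sketched in a few lines of Python. The sketch assumes that the checksum is a CRC32 computed line by line over the UTF-8 text and stored as a signed 32-bit integer, which resembles what Flyway 4 does (note the negative values above), but it is an illustration, not Flyway's code. The names `checksum` and `validate` are hypothetical, and only the mismatch message mimics the output shown.

```python
import zlib


def checksum(sql_text: str) -> int:
    """Assumed scheme: CRC32 over each line's UTF-8 bytes, line endings excluded."""
    crc = 0
    for line in sql_text.splitlines():
        crc = zlib.crc32(line.encode("utf-8"), crc)
    # zlib returns an unsigned 32-bit value; convert it to a signed int as Java would.
    return crc - (1 << 32) if crc >= (1 << 31) else crc


def validate(applied: dict, resolved: dict) -> list:
    """Compare stored checksums (version -> int) with local files (version -> SQL text)."""
    errors = []
    for version, stored in applied.items():
        if version not in resolved:
            errors.append(f"Migration {version} is applied but missing locally")
            continue
        local = checksum(resolved[version])
        if local != stored:
            errors.append(
                f"Migration checksum mismatch for migration {version}\n"
                f"-> Applied to database : {stored}\n"
                f"-> Resolved locally    : {local}"
            )
    return errors


if __name__ == "__main__":
    original = "CREATE TABLE flyway_test (key VARCHAR(64));\n"
    edited = "CREATE TABLE flyway_test (key VARCHAR(128));\n"
    stored = {"1.1": checksum(original)}         # what schema_version would hold
    print(validate(stored, {"1.1": original}))   # [] -> validation passes
    print(validate(stored, {"1.1": edited})[0])  # the checksum mismatch report
```

Because the checksum covers the file content rather than its timestamp, even a harmless-looking edit such as widening a column is detected.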

Of course there are ways around this, by manipulating checksum values in the schema_version table or by applying certain configuration values to Flyway. But obviously this should only be the last resort in case some really bad things have happened. Therefore, when discussing best practices later in this blog post, we will take a closer look at how to prevent bad things from happening.

For the time being let’s take a look at an example of a Flyway project.

Flyway Sample Project

This is a simple example of a database migration file named V1_1__create_test_table.sql:


CREATE TABLE flyway_test (
  key VARCHAR(64),
  value VARCHAR(255),
  PRIMARY KEY(key)
);

ALTER TABLE flyway_test OWNER TO flywaydemo;

Well, one might think “Hey, this looks exactly like a plain SQL file”. The reason for this is probably that it is a plain SQL file. And this is part of the beauty of the whole approach: there is no need to learn anything new here besides a bit of configuration and how to run the tool. The SQL statements from the Migration files can easily be tested by executing them directly in any SQL tool or shell.

Now how does Flyway know which SQL files, aka Migrations, must be executed? And how does it know which database to connect to? You could pass all this as a large number of command-line parameters, but the better solution is certainly to use the flyway.conf configuration file:

flyway.driver=org.postgresql.Driver
flyway.url=jdbc:postgresql://localhost:5432/flywaydemo
flyway.user=flywaydemo
flyway.password=flywaydemo
flyway.locations=filesystem:src/main/resources/flyway/migrations
flyway.sqlMigrationPrefix=V
flyway.sqlMigrationSeparator=__
flyway.sqlMigrationSuffix=.sql
flyway.validateOnMigrate=true

This configuration file should be stored in the project directory of the database migration project. Flyway is then executed from within that directory, and relative paths in the configuration, such as the filesystem location above, are resolved against it. This way the project can easily be executed on different environments. It would be possible to define a list of directories in flyway.locations, but I think it is better to use one location with sub-directories per release, as Flyway also picks up Migrations from sub-directories. This can also be seen in the sample project. The directory used might remind you of the Maven directory structure. That is right :-). There is a separate blog post showing how to execute Flyway from Maven, and the directory structure used in the example is a preparation for this.
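Because a location is searched including its sub-directories, per-release folders need no extra configuration. The following Python sketch mimics that lookup under a few assumptions: a single filesystem location, the prefix and suffix from the flyway.conf above, and hypothetical folder names release_1 and release_2. It is an illustration, not Flyway's own resolver.

```python
import tempfile
from pathlib import Path


def find_migrations(location: str, prefix: str = "V", suffix: str = ".sql") -> list:
    """Collect migration files below `location`, descending into all sub-directories."""
    root = Path(location)
    found = (p for p in root.rglob(f"{prefix}*{suffix}") if p.is_file())
    # Sorted by path only for stable output; the execution order is decided by version.
    return sorted(p.relative_to(root).as_posix() for p in found)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical per-release layout below the configured location.
        for rel in ("release_1/V1_1__create_test_table.sql",
                    "release_1/V1_2__create_test_view.sql",
                    "release_2/V2_1__create_test_index.sql",
                    "release_2/README.txt"):
            path = Path(tmp, rel)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text("-- SQL goes here\n")
        for name in find_migrations(tmp):
            print(name)  # README.txt is skipped, the three .sql files are listed
```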

The complete sample project shown in this blog post can also be found on GitHub. It uses PostgreSQL as a database, but that can easily be changed to almost any other relational database.

The first few entries describe the database and how to access it. Note: Flyway comes with a set of pre-installed JDBC drivers. PostgreSQL is one of them, so no additional action is required here. Otherwise you would need to add the proper driver JAR to the libexec/drivers directory of your Flyway installation.

The next entries define where to find Migration files and which naming conventions they follow. Setting flyway.validateOnMigrate to false would disable the check whether existing Migrations have been changed. In any real project this should always be set to true. A comprehensive list of available configuration parameters, with their descriptions, can be found here.
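How the three naming settings split a file name into a version and a description can be sketched as follows. The Python function is a hypothetical illustration that assumes the values from the flyway.conf above; it reproduces the descriptions that appear in the migrate log (for example "create test table"), but it is not Flyway's parser.

```python
def parse_migration_name(filename: str, prefix: str = "V",
                         separator: str = "__", suffix: str = ".sql") -> tuple:
    """Split e.g. 'V1_1__create_test_table.sql' into ('1.1', 'create test table')."""
    if not (filename.startswith(prefix) and filename.endswith(suffix)):
        raise ValueError(f"not a versioned SQL migration: {filename}")
    stem = filename[len(prefix):len(filename) - len(suffix)]
    version, found, description = stem.partition(separator)
    if not found or not version:
        raise ValueError(f"no version or separator in: {filename}")
    # Underscores inside the version separate its parts; in the description they become spaces.
    return version.replace("_", "."), description.replace("_", " ")


if __name__ == "__main__":
    for name in ("V1_1__create_test_table.sql", "V2_1__create_test_index.sql"):
        print(parse_migration_name(name))
    # ('1.1', 'create test table')
    # ('2.1', 'create test index')
```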

The Flyway configuration approach is two-tiered. Global settings can be configured in the flyway.conf configuration file in the Flyway installation directory. Additional configuration entries, or overrides of global ones, can then be placed in the project-specific flyway.conf configuration file.
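The override rule can be pictured as a dictionary merge in which project entries win. This Python sketch is a simplification under stated assumptions: it only understands plain key=value lines and # comments (real Java properties files allow more syntax), and the file contents in the usage example are hypothetical. It illustrates the precedence only, not how Flyway actually reads its files.

```python
def parse_conf(text: str) -> dict:
    """Simplified parser: key=value lines only; blank lines and '#' comments are skipped."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def effective_conf(global_text: str, project_text: str) -> dict:
    """Project-specific entries override global ones; everything else is inherited."""
    return {**parse_conf(global_text), **parse_conf(project_text)}


if __name__ == "__main__":
    global_conf = "flyway.validateOnMigrate=true\nflyway.user=admin\n"  # hypothetical
    project_conf = "# project settings\nflyway.user=flywaydemo\n"        # hypothetical
    print(effective_conf(global_conf, project_conf))
    # {'flyway.validateOnMigrate': 'true', 'flyway.user': 'flywaydemo'}
```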

Then we can simply run Flyway from the project directory by executing flyway migrate as follows.

Thomass-MacBook-cc:db_flyway_sample thomasjaspers$ flyway migrate
Flyway 4.0.3 by Boxfuse

Database: jdbc:postgresql://localhost:5432/flywaydemo (PostgreSQL 9.4)
Successfully validated 3 migrations (execution time 00:00.017s)
Creating Metadata table: "public"."schema_version"
Current version of schema "public": << Empty Schema >>
Migrating schema "public" to version 1.1 - create test table
Migrating schema "public" to version 1.2 - create test view
Migrating schema "public" to version 2.1 - create test index
Successfully applied 3 migrations to schema "public" (execution time 00:00.066s).

The log messages are helpful for checking installations, as they show whether a Migration had already been applied to the system or is newly applied. In the above example all Migrations are new. Simply re-executing the same command shows the following output.

Thomass-MacBook-cc:db_flyway_sample thomasjaspers$ flyway migrate
Flyway 4.0.3 by Boxfuse

Database: jdbc:postgresql://localhost:5432/flywaydemo (PostgreSQL 9.4)
Successfully validated 3 migrations (execution time 00:00.018s)
Current version of schema "public": 2.1
Schema "public" is up to date. No migration necessary.

That is basically it, apart from executing the Migrations in the proper order. This order is determined by the version at the start of each file name, which is separated from the rest of the file name by the sqlMigrationSeparator defined in the flyway.conf configuration file. In our example two underscores are used. Given the following three files, distributed over two directories, they are executed in this order:

V1_1__create_test_table.sql
V1_2__create_test_view.sql
V2_1__create_test_index.sql
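Flyway compares versions numerically part by part, which starts to matter once a version such as 1.10 appears next to 1.9. The following Python sketch, assuming the V prefix and __ separator configured above, orders the three sample files the same way; the function names are hypothetical and the code only illustrates the ordering rule.

```python
def version_key(version: str) -> tuple:
    """Turn '1_10' or '1.10' into (1, 10) so versions compare numerically per part."""
    return tuple(int(part) for part in version.replace("_", ".").split("."))


def execution_order(filenames: list, prefix: str = "V", separator: str = "__") -> list:
    """Sort migration file names by the version between prefix and separator."""
    def key(name: str) -> tuple:
        version = name[len(prefix):].split(separator, 1)[0]
        return version_key(version)
    return sorted(filenames, key=key)


if __name__ == "__main__":
    files = ["V2_1__create_test_index.sql",
             "V1_2__create_test_view.sql",
             "V1_1__create_test_table.sql"]
    print(execution_order(files))
    # A plain string sort would put 1.10 before 1.9; the numeric key does not.
    print(sorted(["1.10", "1.9"]))                   # ['1.10', '1.9']
    print(sorted(["1.10", "1.9"], key=version_key))  # ['1.9', '1.10']
```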

This can also be seen in the output of the flyway execution above. Using the sample project from GitHub as a blueprint, this is hopefully enough to get started with Flyway. Nevertheless there are some best practices that should be considered to use the tool most effectively.

Best Practices

Migrations must be kept stable already during development time
From my own experience I can tell that this is a great topic for lots of discussions inside the development team :-). Of course one can argue that Migrations only need to be stable – read “unchanged” – for production releases, so new Migrations added for a not yet delivered release could still be changed. In theory this is true, but going down that road, a lot of time will be spent fixing broken installations on development and test environments. The reason is that data is often set up locally or on some test environment to test certain scenarios. In such cases an upgrade installation is clearly preferable to a re-installation, which would lose all manually added or changed data on that environment. Such an upgrade installation is not (easily) possible if there are “broken” Migrations.

Thus if there is a bug in a Migration, such as a wrong data type, a wrong name or the like: simply write an additional Migration to fix it, and do not change already committed ones.

Migrations must be grouped in a meaningful way
Basically there are two possibilities to achieve this:

Grouping the files by release.
Grouping the files by feature.

The concept is the same in both cases: the files are grouped in release- or feature-specific sub-directories of the directories scanned for Migrations. What must always be kept in mind is that the execution order is determined by the version prefix. This is quite easy when grouping by release, but can become really hard when grouping by feature, because features touching the same set of database objects might be developed in parallel. In that case the required order might get mixed up. Therefore it might be better to develop the database migrations per release, even if the features are developed independently in branches.
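The following Python sketch illustrates the mixed-up order with hypothetical versions: a feature branch that reserved version 2.1.1 is merged only after a test environment has already been migrated to 2.2 by another feature. By default Flyway does not simply apply such a migration (it offers an outOfOrder setting for this case, which the sample configuration does not use). The function names and the two-way classification are assumptions made for illustration.

```python
def version_key(version: str) -> tuple:
    """Compare versions part by part as integers (2.1.1 < 2.2 < 2.10)."""
    return tuple(int(part) for part in version.split("."))


def classify(resolved: list, applied: list) -> dict:
    """Split unapplied versions into pending ones and ones older than the schema."""
    current = max(applied, key=version_key) if applied else None
    result = {"pending": [], "out_of_order": []}
    for version in sorted(set(resolved) - set(applied), key=version_key):
        if current is None or version_key(version) > version_key(current):
            result["pending"].append(version)
        else:
            # Lower than the current schema version: another feature got there first.
            result["out_of_order"].append(version)
    return result


if __name__ == "__main__":
    # Hypothetical test environment, already migrated to 2.2 by feature A.
    applied = ["1.1", "1.2", "2.1", "2.2"]
    # Feature B, developed in parallel, reserved 2.1.1 but was merged later.
    resolved = applied + ["2.1.1", "2.3"]
    print(classify(resolved, applied))
    # {'pending': ['2.3'], 'out_of_order': ['2.1.1']}
    print(classify(applied, applied))
    # {'pending': [], 'out_of_order': []} -> nothing to do, schema is up to date
```

Grouping by release avoids this situation because versions are then handed out in one place and in a single sequence.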

Conclusion and Outlook

Flyway is a great tool for handling database migrations. With tools like Flyway – or likewise Liquibase – there is no excuse to keep messing around with custom-made “solutions” for database migrations. Having the tool keep track of what has already been executed on a system, and what still needs to be executed, is extremely helpful. This comes with some nice logging during the installation and more features that will be covered in forthcoming blog posts on Flyway, including the integration with Maven and writing complex Migrations in Java. Stay tuned :-).


Blog author

Thomas Jaspers

Senior Software Engineer & AI Enthusiast
