Note: Amazon has not compensated me or SQLpipe in any way for this article.
I have worked with many databases and Amazon Aurora is one of my favorites.
This article explains Aurora's benefits in plain English and explores how Amazon achieved those benefits.
Then we will discuss Albatross, a SQL Server to PostgreSQL migration tool that works well with Amazon Aurora. It can even help you migrate with zero downtime!
Check out the video below to see Albatross perform a live migration!
My Favorite Aurora Features
The three things I appreciate most about Aurora are:
- Its high query throughput.
- Its exceptional fault tolerance.
- The ability to use Postgres-, MySQL-, and even SQL Server-compatible clients to interact with your data.
You can learn a lot about Aurora and its benefits by visiting the documentation and marketing pages for the product. However, to really understand what makes Aurora special, I recommend reading the white paper Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases.
Query Throughput And Fault Tolerance
The paper's opening paragraph states: “We believe the central constraint in high throughput data processing has moved from compute and storage to the network.” In other words, the Aurora team analyzed the data streams required to create a distributed database and built a custom storage backend that takes advantage of the unique networking environment inside modern cloud data centers.
This is an example of a picture being worth a thousand words, so let's look at two diagrams from the white paper. First up is a diagram showing the independent data streams needed to create a mirrored MySQL deployment.
Each arrow in the diagram represents an independent stream of data. As you can see, MySQL requires many independent data streams to accomplish high availability. This represents an opportunity for consolidation.
Next up is a diagram showing the data streams responsible for creating an Aurora deployment.
Aurora’s diagram shows fewer arrows, which means fewer data streams that need to be serialized, sent across the network, and then deserialized. This is because Aurora distributes data and achieves quorum more efficiently. As the paper argues, the network is now the bottleneck, and this design is how Aurora mitigates it.
In addition to needing fewer data streams, the Aurora team built a consensus mechanism that allows you to minimize query latency while still achieving impressive fault tolerance. The diagram above actually includes a visualization of the consensus mechanism, but I'm not going to talk about it here. Feel free to read the paper if you'd like to understand how it works.
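To make the consensus mechanism a little more concrete without retelling the whole paper: Aurora keeps six copies of each data segment spread across three Availability Zones, with a write quorum of four and a read quorum of three. The sketch below is my own illustration of the two classic quorum overlap conditions those numbers satisfy; it is not code from the paper.

```python
def quorum_ok(v: int, v_w: int, v_r: int) -> bool:
    """Check the two classic quorum conditions:
    1. Read/write overlap: v_r + v_w > v  (every read sees the latest write)
    2. Write/write overlap: 2 * v_w > v   (no two conflicting writes both succeed)
    """
    return (v_r + v_w > v) and (2 * v_w > v)

# Aurora's parameters from the paper: 6 copies, write quorum 4, read quorum 3.
print(quorum_ok(v=6, v_w=4, v_r=3))  # True

# Losing an entire AZ costs 2 of the 6 copies, yet 4 remain, so the
# write quorum is still reachable; reads survive one further node loss.
print(6 - 2 >= 4)  # True: writes survive a full AZ failure
```

The payoff of these parameters is that Aurora stays writable through an AZ outage and readable through an AZ outage plus one additional node failure, without waiting on all six copies for every operation.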
If I had to put absolutely mission critical relational data anywhere, I would probably put it in Aurora. This is not only because of its impressive technology - other databases offer fantastic reliability at a technical level as well. I just think that Aurora, and AWS as a whole, are mature, stable, and reliable, which you can’t always say about the latest database startup.
Now that we’ve covered how Aurora manages to be highly performant and reliable, let’s discuss its most distinctive feature, and the one I find most interesting: its client flexibility.
Babelfish And Aurora's Client Flexibility
I don’t know of any other database that lets you talk to it as if it were a MySQL, Postgres, or SQL Server database. Being able to take any application that expects a certain kind of database and consolidate them all onto a single data source is a truly exceptional value proposition. The possibilities are limited only by how creatively you want to integrate your data systems.
For example, if you have a marketing site running on PostgreSQL and a web application running on SQL Server, you could combine those data sources in Aurora for analytical or even operational needs.
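Concretely, a Babelfish-enabled Aurora cluster speaks two wire protocols at once: the native PostgreSQL protocol on its usual port (5432) and SQL Server's TDS protocol on Babelfish's port (1433). The hostnames and credentials below are hypothetical placeholders, but the shape is what each application's existing driver would use:

```text
# PostgreSQL-native clients (e.g. the marketing site) connect as usual:
postgresql://app_user@my-cluster.cluster-abc123.us-east-1.rds.amazonaws.com:5432/marketing

# SQL Server clients (e.g. the web application) connect via Babelfish's TDS port:
Server=my-cluster.cluster-abc123.us-east-1.rds.amazonaws.com,1433;Database=webapp;User Id=app_user;
```

Both applications keep their existing drivers and query dialects while reading and writing the same underlying cluster.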
To give a non-technical metaphor, let’s say you’re taking a guided trip across Europe. Do you want a travel guide who only speaks English? Or do you want a travel guide who speaks English, French, and German? I would prefer the one who speaks multiple languages, and that’s exactly what Aurora gives you.
At this point we’ve covered what I think are the highlights of Aurora. Let’s move on to a demo of Albatross, our new migration tool, to see how you can migrate SQL Server to Aurora.
SQL Server to Aurora PostgreSQL Migration Tool
Albatross works hand in hand with Babelfish, a PostgreSQL extension from Amazon that allows PostgreSQL to understand SQL Server queries. Since Babelfish is open source, you can run it anywhere, not just on AWS. However, as I think I’ve made clear, you should definitely consider Aurora as a top migration destination.
Albatross has three key benefits to make it easier and safer to migrate to Aurora:
- First, Albatross automatically catches up your Aurora database to your source SQL Server database without any clunky, error-prone manual transfers.
- Second, once Aurora has caught up to SQL Server, Albatross can cut your clients over from SQL Server to Aurora with zero downtime.
- Third, Albatross can detect and log any differences in query results between SQL Server and Aurora. We encourage you to run this feature for an extended period before making the final switch so that you have confidence that Aurora is behaving in the way your clients expect.
The benefit of migrating in this way is that you get the immediate financial and technical benefits of adopting Aurora without having to change all of your other code first. Then, once you've made that initial migration, you can reap the benefits of entering the vast PostgreSQL ecosystem, plus the technical benefits of Aurora.
If you would like help with this migration, we also offer professional services to help you get it right the first time.
Now let’s talk about how Albatross works at a high level.
First, you have to deploy a machine running Albatross. It’s best to run it on a Windows host, but it can run on any operating system. You can also deploy it anywhere, so it’s possible to run it on-prem to migrate to the cloud, or run it in a cloud VPC to migrate between providers.
Schema Generation And DDL Migration
Next, you need to generate your schema’s DDL using SQL Server Management Studio. Then Albatross runs Compass, a tool from Amazon, against that DDL; Compass scans your code and reports what needs to change for it to be compatible with Babelfish.
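For reference, the Babelfish Compass assessment can also be run by hand from the command line. The report name and script file below are placeholders, and you should check the Compass documentation for the exact launcher names on your platform; this is a rough sketch, not a runnable recipe:

```text
# Windows (Compass ships as a Java tool with launcher scripts):
BabelfishCompass.bat AssessmentReport exported_schema.sql

# Linux/macOS:
./BabelfishCompass.sh AssessmentReport exported_schema.sql
```

The generated report lists each unsupported or partially supported SQL Server feature it found in your DDL.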
Once you have that report you can break the work out into stories or tickets and start assigning them to engineers or DBAs. Don’t worry, you get as many tries as you need to get those changes right.
Next up is where the magic of Albatross starts. Albatross publishes an endpoint and instructs you to point all of your SQL Server clients at that endpoint.
It then proxies all traffic sent to that endpoint directly to SQL Server. If you perform a rolling update, this will result in zero downtime.
Automatic Data Catchup
Once all of your clients are pointing at Albatross’ proxy endpoint, the process of automatically catching up Aurora to SQL Server begins. This catchup process uses a specialized version of the technology behind our original product, SQLpipe. It uses a creative combination of transactions and concurrency control to ensure that your databases are exactly equal at the end of the process.
Once the catchup is complete, Albatross begins sending all incoming queries to both databases. The two databases are identical at this point, and are being sent the same queries, at the same time, in the same order. For most intents and purposes, they can be considered logically equivalent. However, Albatross continues to respond to clients with the answer supplied by the original SQL Server database, just to be on the safe side.
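The mirroring step can be sketched in a few lines. This is my own illustration with stubbed-out database calls, not Albatross’ actual implementation: every query goes to both backends, divergences are recorded, and the client always receives the original database’s answer.

```python
from dataclasses import dataclass, field
from typing import Callable, Any

@dataclass
class MirroringProxy:
    """Send every query to both backends; always answer from the primary."""
    primary_exec: Callable[[str], Any]   # stands in for running the query on SQL Server
    shadow_exec: Callable[[str], Any]    # stands in for running the query on Aurora
    mismatches: list = field(default_factory=list)

    def execute(self, query: str):
        primary_result = self.primary_exec(query)
        shadow_result = self.shadow_exec(query)
        if primary_result != shadow_result:
            # Record the divergence for later analysis; never surface it to the client.
            self.mismatches.append((query, primary_result, shadow_result))
        return primary_result  # clients always see the original database's answer

# Demo with stub executors standing in for real database connections.
proxy = MirroringProxy(
    primary_exec=lambda q: {"rows": 3},
    shadow_exec=lambda q: {"rows": 3} if "users" in q else {"rows": 2},
)
print(proxy.execute("SELECT COUNT(*) FROM users"))   # {'rows': 3}
print(proxy.execute("SELECT COUNT(*) FROM orders"))  # {'rows': 3}, mismatch recorded
print(len(proxy.mismatches))                         # 1
```

The key design point is that the shadow database’s answers are observed but never trusted until you decide to cut over, so a Babelfish incompatibility can never affect production traffic.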
And that brings us to maybe my favorite part about Albatross - the fact that it allows you to ensure that Aurora is giving you the same answers and behaving in the same way as SQL Server.
To do this, once the catchup process is complete, Albatross logs all incoming queries to a CSV file. It also logs the responses to each query from both SQL Server and Aurora and whether those responses were identical. We highly recommend letting this process run for at least a day and inspecting the CSV in Excel, or with a Python script, to see if there are any meaningful differences between the two databases' responses.
You can let this process run in Albatross for as long as you like and analyze the CSV file however you want to make sure that your new Aurora database is reliable and safe to use.
Once you have verified that Aurora is a safe and valid replacement for SQL Server, you can perform one final rolling update of your clients to point at your new Aurora endpoint, instead of Albatross.
If you'd like to see this process demoed, contact us or view the video at the top of the page!
I hope you have found this article informative. To summarize, we are big fans of the Aurora managed database from Amazon and have built a tool that eases the migration process.
Our tool can help you migrate with zero downtime and verify that your new database is a valid replacement for the old one.
Please reach out if you have any questions, comments, or would like a live demo!