Jan 2, 2023

Migrate SQL Server to PostgreSQL

In this article we explore the specifics around the migration process

Microsoft SQL Server has established itself as one of the most popular databases worldwide. According to data from Statista, in 2022, SQL Server is the third most widely used DBMS after Oracle and MySQL.

And yet many companies are looking to transition away from SQL Server and to its open-source competitor PostgreSQL. Such decisions to transition are usually driven by the need to reduce the high cost of ownership that may occur with SQL Server or to avoid its restrictive licensing.  

In this article, we provide a high-level overview of the steps an organization would need to follow during a database migration project. We then focus on the specifics around migrating SQL Server to PostgreSQL such as: 

  • How to back up your SQL Server source database
  • What you need to know about stored procedures and indexes
  • Possible changes you would need to make to your application code
  • A few thoughts on the actual migration strategy 
  • Data transfer tools: SQLpipe, Pgloader & sqlserver2pgsql, and AWS tools and services
  • Things to keep in mind post data transfer 

Database migration steps

Regardless of the source and target systems, the main steps in the database migration process can be summarized as follows:

1. Assess and analyze the data source: Define the need to migrate - is the migration necessary and what benefits would it bring? Determine how much data you have in your source system, where exactly it is stored, and how it fits within your application.

2. Create the architecture of the solution and outline the migration plan: Decide what database migration approach to take. There are generally two main types of migration strategies you can go for:

  • A Big Bang migration transfers all of your data from a source to a target within a single operation and it usually involves application downtime. 
  • A Trickle migration breaks the process into smaller phases each having its own scope and deadline. It takes longer, but it’s also less prone to unexpected failures. 

3. Build the migration solution: Ensure your data is backed up. Collect, normalize and validate the data. Choose a migration tool. Prepare migration scripts and test plans.

4. Migrate

5. Test: Ensure the accuracy of the implementation, data integrity and completeness, and application performance. 

SQL Server to PostgreSQL migration specifics

Back up your SQL Server source database

First things first, you need to ensure your source database is fully backed-up. An easy way to do that is through SQL Server Management Studio (SSMS). After you have connected to your SQL Server instance, expand the Databases tab under the Object Explorer, to display all of your available databases. Right-click on the database you want to back up, and chose Tasks -> Back Up. You can choose between Full, Differential, or Transaction Log backups. The database can be backed up to disk or to URL. 

Once you are happy with your backup options, click OK, and SSMS would create a backup file in your preferred destination. Keep in mind any Data security and Compliance regulations in your industry, such as GDPR and HIPAA, as they may impose specific requirements for storing sensitive data. 

Stored Procedures

Stored procedures (prepared SQL code that can be reused) are a common practice in MSSQL used to encapsulate the logic needed to retrieve data. SQL Server’s T-SQL makes this very easy as SQL Server can dynamically determine the result sets returned by the procedure. However, this is not the case for Postgres, where the natural choice for encapsulating the logic to retrieve something is a function.

If your application uses stored procedures, you would have to re-write those as PostgreSQL functions. If your application uses proprietary features of SQL Server, you would have to rewrite the parts of the application that use SQL Server. You should research best practices for translating stored procedures from MSSQL to PostgreSQL functions. Ensure you understand both systems, as well as the problem the respective procedure/function solves, instead of simply converting the syntax. 

Another difference to be considered is that in the SQL Server stored procedures variables can be declared anywhere. However, in PostgreSQL functions local variables have to be declared at the beginning of the function. It is simply a syntax difference but it requires a change of habit.

Indexes

There are a couple of main differences between SQL Server and PostgreSQL in terms of indexes: 

  • There are no clustered indexes in PostgreSQL, rather all indexes are non-clustered by default. 
  • In SQL Server you can extend the functionality of a non-clustered index by including non-key columns. This greatly improves query performance because the data from the included columns is added at the lowest/leaf level, rather than in the index tree, which makes the index smaller. In PostgreSQL there are no indexes with included columns.  

This gives you two choices when migrating indexes to PostgreSQL - you can either add included columns to the index key column list or exclude included columns from the index completely. Generally, in PostgreSQL attention should be paid to column order because an index is most efficient when there are constraints on the leading (leftmost) columns.

Application Side Changes

If your business layer uses ORM, you would need to make changes to your connection strings, driver and packages. In places where plain SQL is used, you would have to make the SQL statements PostgreSQL compatible. Here are some of the most common query changes you may encounter:

  • Boolean values in SQL Server are 0 and 1, while In PostgreSQL they are denoted with true and false.
  • Multiple SQL statements in one query must be separated with semi-colons. In SQL Server 2014 T-SQL statement terminator is still optional, even though Microsoft highly recommends the use of a semi-colon as it will become required in future versions.
  • String concatenation in SQL Server is “SELECT FirstName + LastName FROM...”. In PostgreSQL, the equivalent is “SELECT FirstName || LastName FROM...”.
  • There is no ISNULL function in PostgreSQL, you need to use COALESCE instead.
  • In PostgreSQL, to select a limited number N rows, you add LIMIT N to the end of the query. In SQL Server, you would say SELECT TOP N … in the beginning of the query.

Besides query syntax, you would also need to take into consideration behavior discrepancies such as case sensitivity and Order by with NULLs.

Migration strategy 

Determining the most suitable migration strategy is among the most challenging steps of the migration process, as it requires extensive analysis and planning. Generally, if your organization can provide you with a maintenance window, while applications are offline and tuning performance differences immediately is not a concern, then migration in a single operation may be feasible. 

If your organization cannot afford to be offline for too long but the database is relatively small you could have the application double-write (SQL Server & Postgres) and/or have a background process that syncs the differences. This is useful in case you need to roll back due to missed features or bad performance.  You can also look at using views and Foreign Data Wrappers - an extension that allows you to access a table or schema in one database from another. That way you can quickly switch from a Foreign Data Wrapper to a local table, as an alternative to full in-app syncing. 

In cases when your SQL Server source is really big, some developers opt to shard the data at the application layer. That would entail creating a new database that holds metadata about what shards exist and which shard a "client" lives on. However, we do not recommend sharding at the application layer if you can avoid it. It can lead to schema discrepancies and confusion around where data for a given shard key lives. If you need a database that fits enormous datasets, contact us for a review on NewSQL databases.

Data Migration tools

SQLpipe

SQLpipe is an open-source data movement tool that allows you to transfer the result of a SQL query from a source to a target system. The software currently has two versions - V1, which uses Golang drivers, and V2, which utilizes ODBC database drivers. 

If you find V1 to be more suitable for your needs, you can follow the instructions outlined in our previous article How to transfer data from SQL Server to PostgreSQL

In this article, we will use V2 of SQLpipe to transfer data from a SQL Server source to a PostgreSQL target.

Step 1: Install SQLpipe

You can refer to the Installation guide and get started with the official SQLpipe Docker Image. The image comes with pre-installed unixODBC and ODBC drivers for SQL Server and PostgreSQL. 

Step 2: Gather connection information for both systems

  • Drivername
  • Hostname
  • Port
  • Database name
  • Username
  • Password
  • Schema name

Step 3: Run a transfer

Use SQLpipe’s v2/transfer endpoint to run a query on your SQL Server source system. The command creates a new table in the target PostgreSQL system and transfers the query result by running insert commands. It can automatically generate create table commands, with the proper column names and types.

Here are the endpoint’s required payload keys:

{ "source": { "odbc_dsn": "Driver=<driver_name>;Server=<hostname>;Port=<port_number>;Database=<db_name>;Uid=<username>;Pwd=<password>;" 
 },
 "target": { "system_type": "<postgresql, mysql, mssql, or snowflake>", "odbc_dsn": "Driver=<driver_name>;Server=<hostname>;Port=<port_number>;Database=<db_name>;Uid=<username>;Pwd=<password>;",
   "table": "<target_table_name>", "schema""<OPTIONAL_schema" 
 },  "query": "select * from my_table"
}

An example of the curl request:

curl -d '{"source": {"odbc_dsn": "DRIVER=MSSQL;SERVER=my-sql-server-host.com;PORT=1433;UID=sa;PWD=MyTopS3cretpass193;TDS_Version=7.0"}, "target": {"system_type": "postgresql", "odbc_dsn": "Driver=PostgreSQL;Server=postgresql-host.com;Port=5432;Database=postgres;Uid=postgres;Pwd=dontHackMe!;", "schema": "public", "table": "my_new_table"}, "query": "select * from source_table"}' localhost:9000/v2/transfer

Note: Many drivers allow you to specify additional fields that change the behavior of the driver. When using FreeTDS, a SQL Server driver, you may have to specify TDS_Version according to which version of SQL Server you are trying to connect to. 

Note: If you are using custom Database names, you would need to specify them in the curl request. Otherwise, the DB names can be omitted. 

Pgloader & sqlserver2pgsql

Sqlserver2pgsql is an open-source tool specifically designed to migrate data from Microsoft SQL Server to PostgreSQL. It’s written in Perl and can convert a Microsoft SQL Server schema to a PostgreSQL schema. It utilizes Pentaho Kettle (an open-source ETL tool) to migrate data between these two databases. Kettle, on the other hand, is built on Java and uses JDBC to connect to the source and target systems. 

Sqlserver2pgsql may be good for one-off database migrations, but it is not suitable for use cases that require continuous data synchronization between the source and target databases for an extended period of time. For such scenarios, you should consider using tools like Debezium or Kafka Connectors.

Pgloader is another open-source tool for Postgres that utilizes the COPY PostgreSQL protocol to stream data into the server. Postgres’ transactional behavior causes the bulk load for a table to stop if there is an erroneous line in the input data. Pgloader allows you to implement Continuous Migration from your current database by keeping a separate file of rejected data. 

AWS Tools and Services

AWS provides several tools and services that can be combined to migrate your SQL Server Database to a PostgreSQL instance. We take a look at Babelfish, AWS Schema conversion tool (SCT), and AWS Database migration service (DMS).

Babelfish is a service that allows you to run Microsoft SQL Server applications directly on Aurora Postgresql with little to no code changes. It provides a translation layer for SQL Server’s proprietary SQL dialect (T-SQL) which enables PostgreSQL to understand database requests, both the command and the protocol, without changing libraries, schemas, or SQL statements.Besides translations for the dialect, Babelfish provides cursors, catalog views, data types, triggers, stored procedures, and functions.

Before using Babelfish you would need to assess the compatibility of your SQL Server with Babelfish for Postgres, and consider the PostgreSQL version Babelfish supports. You can make use of the Babelfish Compass tool to analyze the SQL/DDL code for your database and determine which features are supported and what kind of tweaks you might need to make to your existing database (such as changing certain procedures or functions). Further work on your database may be required as it can be challenging to achieve full compatibility with Babelfish for Postgres, especially for large and complex databases. 

In a standard database migration project, you can use the AWS Schema Conversion Tool (SCT) to convert your existing SQL Server database schema to an Amazon Aurora PostgreSQL schema. It is a free feature that comes with AWS Data Migration Service (DMS) and its main purpose is to automate the entire conversion from one database to another. However, you need to ensure that your server has full access to your database source. We have not yet tested SCT ourselves, but users have reported that it may not be the right solution for database-driven applications and that it is geared more toward Oracle products. 

AWS Database Migration Service (DMS) automates the migration process to AWS database services, in this case, Amazon Aurora PostgreSQL. It works by connecting DMS to your source SQL Server database, reading the data, preparing it for compatibility with the target database, and then transferring the data according to predefined migration tasks. It acts as a client to your database and constantly writes to the destination server. It can be used in full copy mode, where it just takes a copy of the data, or in Change Data Capture mode. 

Database Migration Service offers many automated features; however, it is not a fully-automated service. You do need to set up the process, and that requires an understanding of DMS components and processes. It is generally good for situations where downtime is not a real possibility for your application. 

Schema Conversion Tool and Data Migration Service work together. You can use SCT to author the conversion scripts required to change your database, then use it as part of your DMS job. DMS can only do basic creations and doesn't support all the triggers, procedures, and foreign key relationships, for which you would need SCT. You can also choose not to use SCT prior to using DMT, but that would require you to manually convert the source schema and code objects. Some users have reported that Schema Conversion Tool can be finicky to configure and may corrupt your data when transforming, which is why thorough testing is recommended. 

Post Data Transfer

Just like every database migration, switching from SQL Server to PostgresSQL requires a significant amount of post-transfer testing. You need to consider that your indexes that were previously tuned to SQL Server would not match one-to-one on Postgres, so you should keep an eye on any slow-performing queries and adjust the indexes to catch up with their old performance levels. Moreover, Postgres does not utilize the same amount of memory, so some tuning on your configuration might be needed to achieve better performance. 

Conclusion 

Database migration is a complex, multiphase process, that includes assessment, database schema conversion, script conversion, application code changes, data migration, functional testing, performance tuning, and many other steps. 

Just like every other migration, moving from SQL Server to PostgreSQL is not a trivial task. So before you undertake this endeavour, make sure the migration is justified rather than doing it for the sake of following the latest industry trend. 

Get Started

Ready to get started? Download SQLpipe or browse our tutorials