AWS DMS with source PostgreSQL - ERROR: column "lsn" does not exist


I recently ran into this error, while trying to replicate data from a PostgreSQL RDS database to a Redshift cluster via AWS DMS:

[SOURCE_CAPTURE  ]E:  RetCode: SQL_ERROR  SqlState: 42703 NativeError: 1 Message: ERROR: column "lsn" does not exist; No query has been executed with that handle [1022502]  (ar_odbc_stmt.c:3932)

I searched all over the internet for documentation to understand why this is happening - hopefully this will help others who may run into the issue.

Using pglogical for replication, I first made sure all the appropriate parameters were set on the source database - which was running on AWS RDS, engine postgresql v10.18. AWS documentation is pretty clear on the topic: link, and StackOverflow has lots of info on replication slots, and fine-tuning the supported parameters.

Still, the same error kept occuring, over and over:

[SOURCE_CAPTURE  ]E:  RetCode: SQL_ERROR  SqlState: 42703 NativeError: 1 Message: ERROR: column "lsn" does not exist; No query has been executed with that handle [1022502]  (ar_odbc_stmt.c:3932)

[SOURCE_CAPTURE  ]E:  Unable to create slot 'sctxw6eqafdnvey4_00016521_b0c97c3c_bbd5_4a57_8691_7163a4c585a5' (on create_statement(...) phase) [1020101]  (postgres_pglogical.c:498)

[SOURCE_CAPTURE  ]E:  RetCode: SQL_ERROR  SqlState: 42P01 NativeError: 1 Message: ERROR: relation "pglogical.replication_set" does not exist; No query has been executed with that handle [1022502]  (ar_odbc_stmt.c:3932)

[SOURCE_CAPTURE  ]E:  Could not find any supported plugins available on source or there was an error with the existing plugin.  (postgres_plugin.c:271)

[SOURCE_CAPTURE  ]E:  Could not resolve default plugin [1020490]  (postgres_plugin.c:67)

[SOURCE_CAPTURE  ]E:  Could not assign a postgres plugin to use for replication [1020490]  (postgres_endpoint_util.c:411)

Weird. Very weird. Tried resetting all the parameters values, rebooted the database a few times, while also running some SQL commands directly to verify that everything is set correctly. Still, pglogical was in a zombie state.

The answer finally came from an equally weird-looking pdf document: link. This seems to be written in Spanish (I don’t speak Spanish…) but Google Translate can help.

Anyways, in page 140 (page 133 of the document) we read:

After you have upgraded your AWS DMS version to 3.3.0 or later, follow these steps:
1. Remove the reference `fnRenames` of the statement `ALTER USER` which you use to set your path source PostgreSQL 10.x configuration lookup.
2. Delete the schema `fnRenames` of the PostgreSQL database.

If you don't follow the steps below after upgrade, you will see the following message error in the registry when the `fnRenames` schema is accessed:

---
RetCode: SQL_ERROR SqlState: 42703 NativeError: 1 Message: ERROR: column "lsn" does not exist;
---

If AWS DMS is using a non-master user account with the database, you must also set certain permissions to access these wrapper functions with PostgreSQL databases 10.x source. To set these permissions, run the following grants:

---
GRANT USAGE ON SCHEMA fnRenames TO dms_superuser;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA fnRenames TO dms_superuser;
---

Right. So, basically the only resource I found to help says to drop this fnRenames schema. But why? I tried searching for this document in its English version, since it looks like an official AWS document. The closest I found was this one: link. But still it did not contain this same text.

However, what it does contain is this:

PostgreSQL version 10.x and later databases have numerous changes in function names and folder names from previous PostgreSQL versions. These changes make certain migration actions not backward compatible.

Because most of the name changes are superficial, AWS DMS has created wrapper functions that let AWS DMS work with PostgreSQL version 10.x and later. The wrapper functions are prioritized higher than functions in pg_catalog. In addition, we ensure that schema visibility of existing schemas isn't changed so that we don't override any other system catalog functions such as user-defined functions.

To use these wrapper functions before you perform any migration tasks, run the following SQL code on the source PostgreSQL database. Use the same AWS DMS user account as you are using for the target database.

{...SQL statements...}

Note

If you do not invoke this preparatory code on a source PostgreSQL 10.x database, an error is raised like this.
---
2018-10-29T02:57:50 [SOURCE_CAPTURE ]E: RetCode: SQL_ERROR SqlState: 42703
NativeError: 1 Message: ERROR: column "xlog_position" does not exist;, No query has been executed with that handle [1022502] (ar_odbc_stmt.c:3647)
---

Promising, but not exactly what I get. But still, our version is 10.18 so this makes sense.

So, let’s try running the SQL commands as they are pasted in this document (page 118). Then let’s run the two commented commands that also the Spanish document mentioned:

DROP SCHEMA fnRenames CASCADE;
ALTER USER PG_User SET search_path TO DEFAULT;

Restarted the DMS task, praying for an end to this 3-hour long hunt, and amazingly the replication started working!

If this helped you, please feel free to give me a tip! https://ko-fi.com/kostavro