libpq shipped with PostgreSQL 10 supports multiple hostnames and IP addresses in the connection string and iterates over them until it finds an endpoint it can connect to. It also supports resolving a single hostname backed by multiple address records and iterating over those. However, target_session_attrs needs to be passed in the connection parameters so that libpq always connects to the master instance, not the slaves.
# without target_session_attrs
python -c "import psycopg2; psycopg2.connect('user=postgres password=postgres host=pg1,pg2,pg2andpg3 port=5432,5432,5432 dbname=maas connect_timeout=3')"

# with target_session_attrs
python -c "import psycopg2; psycopg2.connect('user=postgres password=postgres host=pg1,pg2,pg2andpg3 port=5432,5432,5432 dbname=maas connect_timeout=3 target_session_attrs=read-write')"
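The same multi-host parameters can also be expressed as a libpq connection URI (supported since PostgreSQL 10). A minimal sketch, assuming the same placeholder hostnames and credentials as the example above; make_multihost_uri is a hypothetical helper, not part of psycopg2:

```python
def make_multihost_uri(hosts, ports, dbname, user, password,
                       target_session_attrs="read-write"):
    """Build a libpq multi-host connection URI (hypothetical helper)."""
    # libpq accepts a comma-separated list of host:port pairs in the URI.
    hostports = ",".join(f"{h}:{p}" for h, p in zip(hosts, ports))
    return (f"postgresql://{user}:{password}@{hostports}/{dbname}"
            f"?target_session_attrs={target_session_attrs}&connect_timeout=3")

uri = make_multihost_uri(["pg1", "pg2", "pg2andpg3"], [5432, 5432, 5432],
                         "maas", "postgres", "postgres")
# psycopg2.connect(uri) would then let libpq try pg1, pg2, pg2andpg3 in order.
```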
The documentation explains that target_session_attrs=read-write needs to be passed in order to avoid connecting to read-only slaves:
“any”, meaning that any kind of servers can be accepted. This is as well the default value.
“read-write”, to disallow connections to read-only servers, hot standbys for example.
“If a failover happens and a standby is promoted and switches to be a primary, target_session_attrs can be used in read-write mode with the addresses of all the nodes of the cluster to allow the application to connect to a primary for read-write actions or any nodes for read-only actions.”
MAAS relies on Django and its psycopg2 backend to connect to PostgreSQL.
However, MAAS does not allow passing OPTIONS, which could contain “target_session_attrs=read-write”.
The Django documentation describes OPTIONS as: “Extra parameters to use when connecting to the database. Available parameters vary depending on your database backend.”
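If MAAS did expose OPTIONS, the setting would look roughly like the sketch below. Django passes the OPTIONS dict as extra keyword arguments to psycopg2.connect(), which hands unrecognized parameters such as target_session_attrs to libpq; the hostnames and credentials are the same placeholders used earlier:

```python
# Hypothetical Django settings fragment; MAAS does not currently expose this.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "maas",
        "USER": "postgres",
        "PASSWORD": "postgres",
        "HOST": "pg1,pg2,pg2andpg3",
        "PORT": "5432,5432,5432",
        "OPTIONS": {
            # Forwarded to psycopg2.connect(), which passes it to libpq.
            "target_session_attrs": "read-write",
            "connect_timeout": 3,
        },
    }
}
```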
Using the default setting target_session_attrs=any should also work, but if we know that there is only one master in the cluster it is better to be explicit.
“This simplifies the logic at application level: there is no need for it to know exactly which node is the primary and which ones are the standbys. The cost though, is an increase in connection failures when using the read-write mode, but that may be acceptable if the cluster is in a low-latency environment.”
On failover handling:
From the perspective of a VIP-and-gratuitous-ARP type of failover on a single subnet (with Pacemaker managing the VIP and GARP), there does not seem to be a big difference from the multi-endpoint setup:
We do not have any TCP connection state synchronization between PostgreSQL nodes, as LVS does (http://www.linuxvirtualserver.org/docs/sync.html), or any other connection state replication relevant to PostgreSQL; there is only data replication;
on failover, a client re-creates a TCP connection to the same VIP endpoint.
With multiple endpoints, the client library simply tries several of them (when creating a new connection, or during failover handling) before declaring the connection failed.
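The application-side half of this failover behaviour can be sketched as a plain reconnect loop; libpq itself iterates over the listed endpoints within each attempt. This is an illustrative sketch, not MAAS code: connect stands in for a psycopg2.connect call with the multi-host DSN, and connect_with_retry is a hypothetical helper.

```python
import time

def connect_with_retry(connect, retries=5, delay=1.0):
    """Retry a fresh connection attempt after a failure.

    Each call to connect() lets libpq walk the full host list again,
    so a promoted standby becomes reachable on a later attempt.
    """
    last_error = None
    for attempt in range(retries):
        try:
            return connect()
        except Exception as exc:  # psycopg2.OperationalError in practice
            last_error = exc
            time.sleep(delay)
    raise last_error
```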
The bulk of the client-side logic is in libpq's PQconnectPoll, and the rest is handled by the client library (psycopg2, which uses libpq).
https://docs.djangoproject.com/en/2.1/ref/databases/#connection-management (in Django, which MAAS uses, unrecoverable errors seem to affect only a single request).