However, this becomes a problem when the database solution has to be distributed across multiple computers, because a standard PostgreSQL server runs on a single machine. In this article we will look at different scaling schemes and how to implement them with PostgreSQL.
Replication can be used in many scaling scenarios. Its primary purpose is to create and maintain a standby database that takes over when the system fails; this is especially true for physical replication. However, replication can also be used to improve the performance of a PostgreSQL-based solution, and sometimes third-party tools are needed to implement more complex scaling schemes.
PgBouncer is a connection pooler for Postgres. It is a lightweight tool, known for its small footprint and minimal overhead. It can pool connections to several database servers, and by managing these pools it limits the actual number of connections to each Postgres instance, which improves overall performance under heavy traffic.
PgBouncer is a lightweight tool that does its one job well, but only that one job. This is in stark contrast with other tools provided by the community, such as pgpool-II, which can handle replication and load balancing in addition to pooling, but is heavier and less efficient; it is typically deployed as full middleware in front of the cluster, though it can also be used just for pooling.
Pgpool is now in its second major iteration, and Pgpool-II is more of a Swiss army knife of Postgres middleware. In addition to connection pooling, it performs load balancing, parallel query processing, replication and failover handling, the basic building blocks of any scalable web application database. All client access to the PostgreSQL instance (or instances) is routed through the Pgpool middleware.
Replication and load balancing are essential for any web application that receives large amounts of traffic. Load balancing allows Pgpool-II to seamlessly distribute SELECT statements among multiple replicated database servers, which improves the throughput of the whole system. As middleware it mimics the Postgres protocol, so client applications connect to Pgpool-II in the same way they would connect to any PostgreSQL server.
A fault-tolerant system does not deploy just a single PostgreSQL server, but a PostgreSQL cluster.
A PostgreSQL cluster consists of a master PostgreSQL server and one or more replication slaves. Writes are sent to the master server, while all slave servers can be used to serve read requests. If the master server fails, a slave can be promoted to become the new master.
Software such as Pgpool can be deployed to take advantage of the cluster:
- Send write operations (create, update, delete) to the master.
- Load-balance reads across all servers.
PostgreSQL 9.0 introduced built-in streaming replication (SR). SR ships changes from the WAL to one or more slave PostgreSQL servers. Each slave applies the stream to its own data, keeping an exact copy in "hot standby" mode. If the master server fails, a slave can be promoted to master almost instantly.
In addition, because replication lag is relatively low, the slave servers can also handle read-only queries, so they can be used for load balancing.
To use it with a PostgreSQL cluster, the Pgpool server sits between the clients and the master and slave servers. Clients connect to Pgpool instead of the database servers, and Pgpool forwards their requests to the servers in the cluster; this is the "connection pooling" part of the configuration. Pgpool sends all data-mutating requests (update, create, delete, etc.) to the master, and read requests (selects) to any available server, master or slave; this is the "load balancing" part. When Pgpool detects that the master is down, it can issue a command to promote one of the slaves to be the new master (Repmgr has better features for managing this); this is the "failover" part.
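As an illustration, a minimal pgpool.conf sketch for this master/slave streaming-replication setup could look as follows. The backend addresses are hypothetical, and the parameter names follow the pgpool-II 3.x/4.0 sample configuration (newer releases replace master_slave_mode with backend_clustering_mode):

listen_addresses = '*'
port = 5433                          # clients connect to pgpool here instead of 5432
# connection pooling
num_init_children = 32               # concurrent client sessions handled by pgpool
max_pool = 4                         # cached backend connections per pgpool child
# streaming-replication mode: writes go to the master, reads are load balanced
master_slave_mode = on
master_slave_sub_mode = 'stream'
load_balance_mode = on
# backend 0 = master, backend 1 = slave (hypothetical addresses)
backend_hostname0 = '10.0.0.1'
backend_port0 = 5432
backend_weight0 = 1
backend_hostname1 = '10.0.0.2'
backend_port1 = 5432
backend_weight1 = 1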
PgPool replication sends all write requests to all servers in the cluster. This is an alternative to streaming replication and can be used without SR. However, it has higher overhead, and larger write transactions will reduce performance.
With PgPool replication, all servers in the cluster operate in normal mode, and if a server fails, no failover events are required. Other servers bear the load.
For high-volume applications, streaming replication is still recommended.
PgBouncer
PgBouncer is an alternative connection-pooling middleware to Pgpool. It has a small footprint and only does pooling, so it saves resources and is very efficient. It can pool connections to different databases, servers or clusters (fronted by Pgpool or a load balancer).
To use PgBouncer, you configure it as the PostgreSQL endpoint your application connects to. It maps the database name the client connects to locally onto the real database, which can live on any host in the system.
The database connection configuration is:
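A minimal sketch of such a mapping in pgbouncer.ini, assuming a hypothetical alias app_db pointing at a real database appdb on host 10.0.0.1:

[databases]
; clients connect to "app_db"; PgBouncer proxies them to the real database
app_db = host=10.0.0.1 port=5432 dbname=appdb pool_size=20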
When a client connects to "app_db", PgBouncer acts as a proxy and forwards the request to the database's actual location. After the client closes its connection, the server connection is kept open for other clients to reuse, or until it times out. PgBouncer also maintains an actual pool of connections for each database entry, which limits the number of outbound connections to that database.
If you want to pool connections to multiple PostgreSQL servers, PgBouncer is an excellent choice. If you still need Pgpool's load balancing, replication or failover features, you can run both middlewares in series:
- The application client connects to PgBouncer
- PgBouncer forwards the request to the PgPool of the cluster
- PgPool forwards the request to the PostgreSQL server (master server or slave server)
- PostgreSQL responds to requests
How to load-balance queries between several servers?
PgBouncer does not have an internal multi-host configuration. It is possible via external tools:
DNS round-robin. Use several IPs behind one DNS name. PgBouncer does not look up DNS each time a new connection is launched. Instead, it caches all IPs and does round-robin internally. Note: if there are more than 8 IPs behind one name, the DNS backend must support the EDNS0 protocol. See README for details.
Use a TCP connection load-balancer. Either LVS or HAProxy seem to be good choices. On the PgBouncer side it may be a good idea to make server_lifetime smaller and also turn server_round_robin on: by default, idle connections are reused by a LIFO algorithm, which may work not so well when load-balancing is needed.
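A sketch of the relevant [pgbouncer] settings mentioned above; the values here are only illustrative:

[pgbouncer]
; reuse idle server connections round-robin instead of LIFO
server_round_robin = 1
; close unused server connections sooner so they get re-balanced across the IPs
server_lifetime = 300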
Use PgBouncer in the following situations:
You only need the connection pool.
Your resources are limited. Pgbouncer does not spawn new processes (it is event-based).
You want to connect to a PostgreSQL cluster and also connect to other PostgreSQL servers or clusters
You want to move the database connection credentials from the application configuration to the middleware
You want to be able to move databases transparently, without changing each application's configuration
You want to query its internal statistics for monitoring (see the example below).
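For the statistics point, you connect to PgBouncer's admin console and query it, for example like this (assuming PgBouncer listens on its default port 6432 and the connecting user is listed in admin_users or stats_users):

psql -p 6432 -U pgbouncer pgbouncer
pgbouncer=# SHOW POOLS;
pgbouncer=# SHOW STATS;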
Use Pgpool in the following situations:
You want to use the cluster for load balancing and failover
You want Pgpool's replication or parallel query capabilities
You also need connection pooling on top of that.
Use both at the same time when:
You have deployed a cluster, but also want to manage access to multiple databases
You have many connections to pool (use PgBouncer to reduce resource usage) and want them to reach the cluster behind Pgpool.
Repmgr
Repmgr sets up cluster replication and provides a daemon that monitors the nodes in the cluster for failure.
First, it creates a replica from the original master node: it takes a base backup of that server and copies it to another server, which then runs as a slave. You designate which node is the master and which are the standbys (slaves).
When a failure occurs, it can promote a slave to become the new master, remove the old master from the cluster until it can be repaired, and tell the other slaves to follow the newly promoted master.
You can later re-provision the old master (or a new node) and reintroduce it into the cluster.
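A rough sketch of the corresponding repmgr commands (hostnames are placeholders; the config path matches the one used later in this article):

# on the new standby: clone the master's data over the network
repmgr -h master_host -U repmgr -d repmgr -f /etc/postgresql-common/repmgr.conf standby clone
# after a failure: promote this standby to be the new master
repmgr -f /etc/postgresql-common/repmgr.conf standby promote
# tell the remaining standbys to follow the newly promoted master
repmgr -f /etc/postgresql-common/repmgr.conf standby follow
# later, rejoin the repaired old master to the cluster as a standby
repmgr -d 'host=new_master_host user=repmgr dbname=repmgr' -f /etc/postgresql-common/repmgr.conf node rejoin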
HAProxy
HAProxy stands for High Availability Proxy and is a great software-based TCP/HTTP load balancer. It distributes a workload across a set of servers to maximize performance and optimize resource usage. HAProxy is built with sophisticated and customizable health checks methods, allowing a number of services to be load balanced in a single running instance.
One possible setup is to install an HAProxy on each web server (or application server making requests on the database). This works fine if there are only a few web servers, so the load introduced by the health checks is kept in check. The web server would connect to the local HAProxy (e.g. making a psql connection on 127.0.0.1:5432), and can access all the database servers. The Web and HAProxy together form a working unit, so the web server will not work if the HAProxy is not available.
With HAProxy in the load balancer tier, you will have the following advantages:
All applications access the cluster via one single IP address or hostname. The topology of the database cluster is masked behind HAProxy.
PostgreSQL connections are load-balanced between available DB nodes.
It is possible to add or remove database nodes without any changes to the applications.
Once the maximum number of database connections (in PostgreSQL) is reached, HAProxy queues additional new connections. This is a neat way of throttling database connection requests and achieves overload protection.
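A minimal haproxy.cfg sketch for TCP load balancing of PostgreSQL could look like this (backend addresses are hypothetical; a production setup would usually add a smarter health check, e.g. an external script, to tell the primary apart from the replicas):

listen postgresql
    bind *:5432
    mode tcp
    balance roundrobin
    option tcp-check
    default-server inter 3s fall 3 rise 2
    server pg1 10.0.0.1:5432 check
    server pg2 10.0.0.2:5432 check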
Cons of Using HAProxy for PostgreSQL
HAProxy does not provide query filtering, nor does it analyze queries to identify the type of statement being requested. It lacks the ability to perform a read/write split on a single port. Setting up a load balancer with HAProxy requires that you at least configure different ports for your writes and for your reads, which in turn requires application changes to suit your needs.
So replication and load balancing are not really one single solution, but several different ones, each valid for its own use case.
Installations in practice
All of our PostgreSQL instances run in a master-replica setup using streaming replication, and we use Tartarus plus plain SQL dumps to take frequent backups of our systems.
To connect to our databases from our app servers, we made an early decision that had a huge impact on performance: using PgBouncer to pool our connections to PostgreSQL. We found Christophe Pettus's blog to be a great resource for Django, PostgreSQL and PgBouncer tips.
https://instagram-engineering.com/what-powers-instagram-hundreds-of-instances-dozens-of-technologies-adf2e22da2ad
https://archive.org/details/pyvideo_418___disqus-serving-400-million-people-with-python
We will configure everything with pgpool, but without watchdog and automatic failover, because we only have two instances. With three instances we could run fully automatically via a virtual IP; with two there is too great a risk that both servers end up acting as primary and the data gets ruined. So it is better to run everything on one new server, copy to the other when necessary, and load-balance reads across the two: the local server only handles read queries.
I found quite a few people complaining that pgpool replication does not always work properly, so I will just use standard streaming replication, manage it with Repmgr and see how it goes. I describe Pgpool replication only as one possible option.
Further study of PgBouncer showed in its FAQ that a multi-host configuration is not possible, so I have to use pgpool for pooling via a virtual IP.
Not all Ubuntu releases ship pgpool, and we need it here for the streaming-replication setup.
At the time of writing I recommend Ubuntu 20.10 minimal.
Update packaging system
apt-get update
Install postgresql
apt-get install postgresql
Install Pgpool
apt-get install pgpool2
Transfer a backup to the server.
scp -r backup_fr.sql.gz root@159.69.45.65:/home/backup
gzip -d vindazo_fr.sql.gz
Create a user
sudo -u postgres createuser -D -P username
Create a database
sudo -u postgres createdb -O username databasename
Restore the data
psql -U username -W -h localhost -p 5433 -d databasename -f backup.sql
If you have problems with a full backup of the database (it sometimes happens that some tables are corrupt), you can export the tables separately and restore them manually.
For example
pg_dump -h localhost -p 5432 -U vindazo_de -d vindazo_de -t spontaneous_spontaneousapplication > spontaneousapplication.sql
scp spontaneousapplication.sql root@138.201.8.118:/home/backup
This way you can move several tables at the same time as a zip file.
psql 'host=localhost user=vindazo_de dbname=vindazo_de connect_timeout=2'
Compare the table contents: if the counts are equal, do nothing; if they differ, export the table to a separate file.
select count(*) from auth_group_permissions;
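A small shell sketch of that comparison, assuming the old server is reachable as old_host and using the database and table names from the examples above:

# compare row counts on the old and new server; export the table only if they differ
OLD=$(psql -h old_host -U vindazo_de -d vindazo_de -t -A -c "select count(*) from auth_group_permissions;")
NEW=$(psql -h localhost -U vindazo_de -d vindazo_de -t -A -c "select count(*) from auth_group_permissions;")
if [ "$OLD" != "$NEW" ]; then
    pg_dump -h old_host -U vindazo_de -d vindazo_de -t auth_group_permissions > auth_group_permissions.sql
fi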
Another way to back up a server, when it has to keep running and cannot be stopped for synchronization, is pg_basebackup:
https://www.postgresql.org/docs/10/app-pgbasebackup.html
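A sketch of such a base backup taken over the network from the running primary (the IP and user follow the repmgr setup later in this article; the target data directory must be empty and owned by postgres, and the user needs replication access in pg_hba.conf):

sudo -u postgres pg_basebackup -h 159.69.65.213 -U repmgr -D /var/lib/postgresql/12/main -P -R --wal-method=stream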
To test whether everything has been copied correctly, we can connect a dev app to the database via pgpool and check that everything works properly. You can also add something to the primary master database and verify that the object shows up on the slave server, confirming that everything is properly synchronized.
You can generate a PostgreSQL configuration file from this site:
https://www.pgconfig.org/
Only provide correct settings for your hardware.
For example
vim /etc/postgresql/12/main/postgresql.conf
# Memory Configuration
shared_buffers = 30GB
effective_cache_size = 90GB
work_mem = 1GB
maintenance_work_mem = 2GB
# Checkpoint Related Configuration
min_wal_size = 512MB
max_wal_size = 2GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
# Network Related Configuration
listen_addresses = '*'
max_connections = 100
# Storage Configuration
random_page_cost = 1.1
effective_io_concurrency = 200
# Worker Processes
max_worker_processes = 12
max_parallel_workers_per_gather = 6
max_parallel_workers = 12
# Logging configuration for pgbadger
logging_collector = on
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_temp_files = 0
lc_messages = 'C'
# Adjust the minimum time to collect data
log_min_duration_statement = '10s'
log_autovacuum_min_duration = 0
# 'csvlog' format configuration
log_destination = 'csvlog'
Pgpool configuration
There are two practical configurations in common use: with two or with three PostgreSQL and pgpool services.
Example replica configuration
https://github.com/rdio/pgpool2/blob/master/pgpool.conf.sample-stream
https://www.pgpool.net/pgpool-web/contrib_docs/watchdog_master_slave_3.3/en.html
On Ubuntu the standard configuration location is:
vim /etc/pgpool2/pgpool.conf
There is a YouTube video with a configuration that splits reads and writes; this is essentially the example I need:
https://www.youtube.com/watch?v=oc_FQHq2dPI
If this error appears:
no pg_hba.conf entry for host
vim /etc/postgresql/9.5/main/pg_hba.conf
host all all 159.69.XX.XX/24 trust
The IP mask is mandatory, otherwise the server will not be able to start.
unable to read data from DB node 0
In my case this error meant that slave authentication was not working. I removed all replication configuration, after which pgpool started without issues and everything worked again.
ERROR: failed to authenticate
Feb 23 10:26:51 vindazo pgpool[28865]: 2021-02-23 10:26:51: pid 28865: DETAIL: invalid authentication message response type, Expecting 'R' and received 'E'
Feb 23 10:26:51 vindazo pgpool[28865]: 2021-02-23 10:26:51: pid 28865: LOG: find_primary_node: checking backend no 1
Note that there are two similar files that influence completely different things: one configures the PostgreSQL server and the other configures pgpool.
vim /etc/postgresql/12/main/pg_hba.conf
vim /etc/pgpool2/pool_hba.conf
So if you see an authentication error, you have to add the IP with trust to the PostgreSQL pg_hba.conf, not to pool_hba.conf.
ERROR: MD5 authentication is unsupported in replication, master-slave and parallel modes.
HINT: check pg_hba.conf
Unable to get password, password file descriptor is NULL
Add or remove users and test at database level.
su - postgres
postgres@vindazo:~$ psql -U postgres -p 5432
test=# show pool_nodes;
\copy (Select ip, job_id, title, raw_text, city, cv, motivation, firstname, lastname, email, phone, added, user_agent, country, geo_data, activation_ip, has_activation, unsubscribe, has_person, has_profile From spontaneous_spontaneousapplication where added >= '2021-02-25';) To '/tmp/spontaneousapplication.csv' With CSV;
Select ip, job_id, title, raw_text, city, cv, motivation, firstname, lastname, email, phone, added, user_agent, country, geo_data, activation_ip, has_activation, unsubscribe, has_person, has_profile From spontaneous_spontaneousapplication where added >= '2021-02-25';
If the import is not working, use this one:
\copy (Select * From spontaneous_spontaneousapplication where added >= '2021-02-25') To '/tmp/spontaneousapplication.csv' With CSV;
\COPY spontaneous_spontaneousapplication FROM '/home/backup/spontaneousapplication.csv' WITH CSV HEADER;
ERROR: duplicate key value violates unique constraint "spontaneous_spontaneousapplication_pkey"
DETAIL: Key (id)=(117466) already exists.
CONTEXT: COPY spontaneous_spontaneousapplication, line 2
\d+ tablename
So next time, turn off the web server until all data is in sync.
Install PostgreSQL 12 on the old system.
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
echo "deb http://apt.postgresql.org/pub/repos/apt/ `lsb_release -cs`-pgdg main" |sudo tee /etc/apt/sources.list.d/pgdg.list
apt update
apt -y install postgresql-12 postgresql-client-12
vim /etc/postgresql/12/main/postgresql.conf
Test connection from Standby to Primary server
psql 'host=159.69.65.2 user=repmgr dbname=repmgr connect_timeout=2'
chown -R postgres:postgres /mnt/extradrive/postgresql/12/main/
chown postgres /mnt/extradrive/postgresql/12/main/
su - postgres
repmgr -h 159.69.65.213 -U repmgr -d repmgr -f /etc/postgresql-common/repmgr.conf standby clone --dry-run
If PostgreSQL fails to start on an old Ubuntu, you can try starting it manually.
systemctl list-dependencies postgresql
systemctl start postgresql@12-main.service
journalctl -xe
/usr/lib/postgresql/12/bin/pg_ctl start -D /mnt/extradrive/postgresql/12/main/ -l /var/log/postgresql/postgresql-12-main.log -s -o '-c config_file=/etc/postgresql/12/main/postgresql.conf'
https://www.postgresql.org/docs/11/hot-standby.html
Failed to start PostgreSQL Cluster 12-main.
Feb 26 15:42:13 vindazo systemd[1]: postgresql@12-main.service: Unit entered failed state.
Feb 26 15:42:13 vindazo systemd[1]: postgresql@12-main.service: Failed with result 'resources'.
hot standby is not possible because max_worker_processes = 8 is a lower setting than on the master server
LOG: aborting startup due to startup process failure
Higher values can then be supplied and the server restarted to begin recovery again. These parameters are:
- max_connections
- max_prepared_transactions
- max_locks_per_transaction
- max_worker_processes
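A sketch of the fix on the standby: raise the listed parameters in its postgresql.conf to at least the primary's values (the 12 comes from the example configuration above; the other values shown are PostgreSQL's defaults and only need raising if the primary raised them), then restart the service:

# /etc/postgresql/12/main/postgresql.conf on the standby
max_worker_processes = 12
max_connections = 100
max_prepared_transactions = 0
max_locks_per_transaction = 64

systemctl restart postgresql@12-main.service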
psql "host=159.69.65.213 user=repmgr dbname=repmgr connect_timeout=2"
su - postgres
repmgr -f /etc/postgresql-common/repmgr.conf standby register
FATAL: no pg_hba.conf entry for host "159.69.3.7", user "repmgr", database "repmgr", SSL on
FATAL: no pg_hba.conf entry for host "159.69.3.7", user "repmgr", database "repmgr", SSL off
HINT: to register a standby which is not running, provide primary connection parameters and use option -F/--force
repmgr -f /etc/postgresql-common/repmgr.conf standby register
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 1)
WARNING: node "VindazoStandBy" not found in "pg_stat_replication"
ERROR: local node not attached to primary node 1
HINT: specify the actual upstream node id with --upstream-node-id, or use -F/--force to continue anyway
repmgr -F --upstream-node-id=1 -f /etc/postgresql-common/repmgr.conf standby register
WARNING: node "VindazoStandBy" not found in "pg_stat_replication"
WARNING: this node does not appear to be attached to upstream node "primeryVindazoFr" (ID: 1)
INFO: standby registration complete
repmgr -f /etc/postgresql-common/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------------------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1 | primeryVindazoFr | primary | * running | | default | 100 | 1 | host=159.69.65.213 user=repmgr dbname=repmgr connect_timeout=2
After days of study and experiments, I found a comparison table:
https://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling
Very interesting information.
Because automatic tools such as repmgr do not always work as I expect, I have to try everything manually myself and see what I actually need.
https://wiki.postgresql.org/wiki/Streaming_Replication
You have to try everything, preferably somewhere nothing can break: buy a few small servers and experiment there.
Keep a backup on the primary server so I can always go back; that way everything is more or less secured the old-fashioned way. A plain SQL backup is the best backup I can have.
If your replication is not working as described above, delete everything on the standby and start over from the first installation step.
Setting up backups the old-fashioned way.
vim pg_backup.config
BACKUP_DIR=/home/backup/database/
./pg_backup_rotated.sh
vim /etc/environment
PGPASSFILE="/home/backup/scripts/pgpass.conf"
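The password file uses the standard .pgpass format, one line per entry (hostname:port:database:username:password), and must be readable only by its owner (chmod 600). A sketch with a placeholder password:

localhost:5432:*:postgres:secret_password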
- Install Tartarus
http://wiki.hetzner.de/index.php/Tartarus_Backup-Konfiguration
wget http://wertarbyte.de/apt/tartarus/tartarus_0.9.8-1_all.deb
sudo dpkg -i tartarus_0.9.8-1_all.deb
- Tartarus configuration: filesbackup.bash
-----------------------------------------------
Example configuration files:
vim /etc/tartarus/generic.inc
vim /home/backup/tartarus/database.conf
# /etc/tartarus/root.conf
#
# Read main config
source /etc/tartarus/generic.inc
# Profile name
# !!!!! CHANGE NAME !!!!
NAME="database-carzando-propenda"
# Directory / Backup
DIRECTORY="/media/sdc/backups/database"
#Backup no temporary files
# separate several folders with a space
EXCLUDE="/tmp/"
# No LVM snapshot
CREATE_LVM_SNAPSHOT="no"
# incremental backup
# !!!! CREATE PATH HERE !!!!
INCREMENTAL_TIMESTAMP_FILE="/var/spool/tartarus/timestamps/database"
Also back up config files (/etc/, /var/), the crontab, and a list of installed packages and Python libraries.
Automation and upload
====================
Crontab
-----------------------------
0 0 * * * /etc/cron.daily/logrotate
0 0 * * * /home/backup/scripts/pg_backup_rotated.sh
0 4 * * mon-sat /home/backup/scripts/filesbackup.bash -i
0 4 * * sun /home/backup/scripts/filesbackup.bash
#0 12 * * * /media/sdc/backups/scripts/checkbackup.bash
Mount ftp curlftpfs
curlftpfs ftp://u54976:OUv2TXQGeX8lBxdQ@u54976.your-backup.de/ /media/ftp
Install and configure repmgr on Ubuntu
apt-get install repmgr
vim /etc/postgresql/12/main/postgresql.conf
max_wal_senders = 10
max_replication_slots = 10
wal_level = 'hot_standby'
hot_standby = on
archive_mode = on
archive_command = '/bin/true'
su - postgres
createuser -s repmgr
createdb repmgr -O repmgr
ALTER USER repmgr SET search_path TO repmgr, "$user", public;
vim /etc/postgresql/12/main/pg_hba.conf
local replication repmgr trust
host replication repmgr 127.0.0.1/32 trust
host replication repmgr 192.168.1.0/24 trust
local repmgr repmgr trust
host repmgr repmgr 127.0.0.1/32 trust
host repmgr repmgr 192.168.1.0/24 trust
Test connection
psql 'host= user=repmgr dbname=repmgr connect_timeout=2'
vim /etc/postgresql-common/repmgr.conf
node_id=1
node_name='node1'
conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/data'
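The standby needs its own repmgr.conf with its own node id, for example (the node name and data directory are illustrative; on Ubuntu the data directory is usually /var/lib/postgresql/12/main):

# /etc/postgresql-common/repmgr.conf on the standby
node_id=2
node_name='node2'
conninfo='host=node2 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/12/main'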
su - postgres
repmgr -f /etc/postgresql-common/repmgr.conf primary register
repmgr -f /etc/postgresql-common/repmgr.conf cluster show
psql "host=159.69.65.213 user=repmgr dbname=repmgr connect_timeout=2"
SELECT * FROM repmgr.nodes;
Monitoring
https://graphiteapp.org/#overview
Links to official documentation and related blog posts
https://docs.hetzner.com/robot/dedicated-server/ip/failover/
https://www.pgpool.net/docs/pgpool-II-3.2.0/wd-en.html
http://dbasys.blogspot.com/2014/01/scaling-postgresql-with-pgpool-and.html
https://repmgr.org/docs/current/index.html
https://www.postgresql.org/docs/current/warm-standby.html#STREAMING-REPLICATION
https://www.pgpool.net/docs/latest/en/html/tutorial-replication.html
https://www.pgpool.net/docs/latest/en/html/admin.html
https://www.pgpool.net/docs/latest/en/html/example-cluster.html