Set up Production-Grade HA PostgreSQL with CloudNativePG

Ship HA Postgres that doesn’t call you at 3AM.

In this hands-on guide, we’ll walk through setting up a PostgreSQL High Availability (HA) cluster using CloudNativePG and enabling automated backups and Point-in-Time Recovery (PITR) using MinIO as the S3-compatible storage backend.

Interested in understanding Kubernetes Operators before diving into backups and HA setups? Check out our earlier post on Kubernetes Operators first.

What We're Aiming to Build

  • A 3-node PostgreSQL HA cluster (1 Primary + 2 Replicas)

  • MinIO running inside the same Kubernetes cluster as the S3-compatible backup target

  • Automated backups and WAL archiving via Barman

  • PITR (Point-in-Time Recovery) support

Prerequisites for Setup

  • Kubernetes Cluster Environment (Minikube / Killercoda)

  • Kubernetes Fundamentals

  • 📂 Looking for the full manifest?
    All Kubernetes YAMLs, Cluster configs, and CloudNativePG examples used in this post are available in our open-source repo. 👉 View on GitHub
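This guide also assumes the CloudNativePG operator is already installed in your cluster — the Cluster resource we deploy below is its CRD. If it isn't, a typical install looks like the following; the exact version and manifest URL here are assumptions, so check the CloudNativePG releases page for the current one:

kubectl apply --server-side -f \
  https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.22/releases/cnpg-1.22.1.yaml

# Wait for the operator to become ready before creating clusters
kubectl rollout status deployment -n cnpg-system cnpg-controller-manager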


Step 1: Create Required Secrets

App User Secret

# appuser-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: appuser-secret
  namespace: default
type: kubernetes.io/basic-auth  # CloudNativePG expects basic-auth secrets with username/password keys
stringData:
  username: appuser
  password: appuser123

Apply it:

kubectl apply -f appuser-secret.yaml
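As an optional sanity check, confirm the secret landed with the expected keys (decoding one of them):

kubectl get secret appuser-secret -o jsonpath='{.data.username}' | base64 -d; echo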

Step 2: Deploy the PostgreSQL HA Cluster

Now we’ll create the PostgreSQL cluster with 3 instances, integrated with MinIO for automated backups and WAL archiving.

Create cluster-ha.yaml

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example
spec:
  instances: 3  # 1 Primary, 2 Replicas

  storage:
    size: 1Gi

  bootstrap:
    initdb:
      database: appdb
      owner: appuser
      secret:
        name: appuser-secret

  superuserSecret:
    name: appuser-secret

  monitoring:
    enablePodMonitor: true

  logLevel: info

  resources:
    requests:
      memory: "512Mi"
      cpu: "250m"
    limits:
      memory: "1Gi"
      cpu: "500m"


  # Optional and safe PostgreSQL parameters
  postgresql:
    parameters:
      log_min_duration_statement: "1000"  # Log queries taking > 1s
      idle_in_transaction_session_timeout: "60000"  # 60s idle timeout

Deploy the Cluster

kubectl apply -f cluster-ha.yaml

Step 3: Verify Cluster Health

kubectl get cluster
kubectl get pods

You should see three instance pods (CloudNativePG numbers instances starting from 1):

NAME                READY   STATUS    RESTARTS   AGE
cluster-example-1   1/1     Running   0          2m
cluster-example-2   1/1     Running   0          2m
cluster-example-3   1/1     Running   0          2m
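CloudNativePG also creates Services for you: cluster-example-rw always points at the primary and cluster-example-ro at the replicas. As a quick smoke test (the pg-client pod name and postgres:16 image are just illustrative choices here), you can connect through the read-write Service with a throwaway client pod:

# One-off psql client; credentials come from the appuser-secret created earlier
kubectl run pg-client --rm -it --image=postgres:16 --restart=Never -- \
  psql "postgresql://appuser:appuser123@cluster-example-rw.default.svc:5432/appdb" -c "SELECT version();"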

Step 4: Identify Primary and Replica Pods

To simulate automatic failover (High Availability mode) later, we first need to identify which pod is the primary and which are replicas.

kubectl describe pod <pod-name>

kubectl exec -it cluster-example-1 -c postgres -- psql -U postgres -d appdb -c "SELECT pg_is_in_recovery();"


The result tells us the role of that instance:

  • If it returns f (false), the pod is the Primary

  • If it returns t (true), the pod is a Replica
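If you have the cnpg kubectl plugin installed (optional, not required for this guide), it reports the current primary directly:

# Shows topology, current primary, replication and backup status
kubectl cnpg status cluster-example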

Create Cluster Status Script [Optional for Simulation]

#!/bin/bash
# current_cluster_status.sh -- print the recovery state of each instance

for i in 1 2 3; do
  echo "cluster-example-$i:"
  kubectl exec cluster-example-$i -c postgres -- psql -U postgres -d appdb -c "SELECT pg_is_in_recovery();"
  echo "---"
done

Make the script executable and run it:

chmod +x current_cluster_status.sh
./current_cluster_status.sh

Exactly one instance should return f (the current primary); the other two should return t.
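For the primary, for example, the psql output looks roughly like this:

 pg_is_in_recovery
-------------------
 f
(1 row)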


Step 5: Simulate Failover

# Delete the current primary pod (cluster-example-1 here; use whichever pod Step 4 showed as primary)
kubectl delete pod cluster-example-1

CloudNativePG will:

  • Detect the failure

  • Automatically promote one of the replicas to Primary

  • Recreate the deleted pod, which rejoins the cluster as a new replica

Voilà, we have simulated automatic failover in CloudNativePG!
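To confirm the promotion, re-run the status script (or kubectl cnpg status): a different pod should now report f.

./current_cluster_status.sh
kubectl get pods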


Step 6: Install MinIO (S3-Compatible Backup Target)

We want to establish automated backups using MinIO (a high-performance, S3-compatible object store that will serve as our backup destination) and Barman (a disaster-recovery tool for PostgreSQL that CloudNativePG uses under the hood for backups and WAL archiving).

Apply MinIO Operator

kubectl apply -k "github.com/minio/operator?ref=v5.0.18"

Check MinIO Operator Status

kubectl get pods -n minio-operator
kubectl get all -n minio-operator
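The operator alone doesn't give us a running MinIO endpoint. The rest of this guide assumes a MinIO instance reachable at minio.default.svc.cluster.local:9000 with a pg-backups bucket; the full Tenant manifest we use is in the GitHub repo linked above. Purely as an illustrative stand-in for local testing (not production-grade), a minimal single-node MinIO could look like this, with credentials matching the aws-creds secret created below:

# minio-dev.yaml -- illustrative single-node MinIO for testing only
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
        - name: minio
          image: minio/minio
          args: ["server", "/data"]
          env:
            - name: MINIO_ROOT_USER
              value: minio
            - name: MINIO_ROOT_PASSWORD
              value: minio123
          ports:
            - containerPort: 9000
---
apiVersion: v1
kind: Service
metadata:
  name: minio
  namespace: default
spec:
  selector:
    app: minio
  ports:
    - port: 9000
      targetPort: 9000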

AWS-Style Secret for MinIO

Even though MinIO is not AWS, it implements the same API as Amazon S3. Tools like Barman and CloudNativePG expect AWS-style credentials to communicate with S3-compatible storage.

# aws-creds.yaml
apiVersion: v1
kind: Secret
metadata:
  name: aws-creds
  namespace: default
stringData:
  AWS_ACCESS_KEY_ID: minio
  AWS_SECRET_ACCESS_KEY: minio123

Apply it:

kubectl apply -f aws-creds.yaml
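Make sure the pg-backups bucket exists before pointing the cluster at it (Barman Cloud may create it automatically, but creating it explicitly avoids surprises). One way, assuming the MinIO endpoint and credentials above, is a one-off mc pod (the minio-mc pod name is just illustrative):

# Create the pg-backups bucket used as destinationPath below
kubectl run minio-mc --rm -i --image=minio/mc --restart=Never --command -- /bin/sh -c \
  "mc alias set local http://minio.default.svc.cluster.local:9000 minio minio123 && mc mb --ignore-existing local/pg-backups"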


Now update cluster-ha.yaml to add a backup section pointing at MinIO:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example
spec:
  instances: 3
  storage:
    size: 1Gi

  bootstrap:
    initdb:
      database: appdb
      owner: appuser
      secret:
        name: appuser-secret

  superuserSecret:
    name: appuser-secret

  backup:
    barmanObjectStore:
      destinationPath: s3://pg-backups/
      endpointURL: http://minio.default.svc.cluster.local:9000
      s3Credentials:
        accessKeyId:
          name: aws-creds
          key: AWS_ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds
          key: AWS_SECRET_ACCESS_KEY
      wal:
        compression: gzip
        maxParallel: 2

  monitoring:
    enablePodMonitor: true

  logLevel: info

  resources:
    requests:
      memory: "512Mi"
      cpu: "250m"
    limits:
      memory: "1Gi"
      cpu: "500m"

  postgresql:
    parameters:
      log_min_duration_statement: "1000"
      idle_in_transaction_session_timeout: "60000"
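Re-apply the manifest so the operator picks up the new backup configuration and starts archiving WAL to MinIO:

kubectl apply -f cluster-ha.yaml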

Why do we need AWS credentials in MinIO ?!

MinIO is S3-compatible but not S3: it implements the Amazon S3 API, so clients can interact with it exactly as they would with AWS S3.

That is why backup systems like CloudNativePG with Barman can be configured against it using the same credential structure:

s3Credentials:
  accessKeyId:
    name: aws-creds
    key: AWS_ACCESS_KEY_ID
  secretAccessKey:
    name: aws-creds
    key: AWS_SECRET_ACCESS_KEY

Backup tools like Barman speak the S3 API and expect standard AWS-style credentials.

MinIO accepts and processes these credentials itself; it never validates them against AWS.

In other words, you're not using real AWS credentials, but you must present credentials in the same shape, because the client (like Barman) doesn't care whether the backend is AWS or MinIO as long as the API matches.


Step 7: Verify Write Ahead Logs (WAL) Archiving & Backup

In PostgreSQL, WAL (Write-Ahead Logging) makes sure changes are written to a log before touching the actual data, which keeps your data safe even if things crash. It also powers Point-In-Time Recovery, letting you rewind your database to a specific moment. Only the primary node archives WAL because it’s the only one generating it; replicas just follow along by replaying those logs.

What is WAL Archiving?!

WAL (Write-Ahead Logging) generates log records for every change in the database.
WAL archiving means saving those log files (WAL segments) to an external location (like S3 or MinIO) for:

  • Point-In-Time Recovery (PITR)

  • Disaster recovery

  • Bootstrapping new replicas

But why not from the Replicas ?!

Technically, replicas:

  • Stream WAL from the primary.

  • Do not have complete or authoritative WAL logs.

  • Might not receive logs in time (due to replication lag or network issues).

So if a replica tried to archive WAL:

  • You risk missing WAL segments.

  • Or duplicating incomplete ones.

  • Or archiving stale/inconsistent data.

To summarize the WAL flow:

  • Only the primary generates new WAL records

  • These WALs are shipped to:

    • Replicas (for streaming)

    • Barman (for backups)

How can we verify that WAL archiving is actually working ?!

Check Backup/Archiving Condition

kubectl describe cluster cluster-example | grep -A5 "Conditions"

It should show something like this:

Type:    ContinuousArchiving
Status:  True
Reason:  ContinuousArchivingSuccess
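You can also ask PostgreSQL itself: the pg_stat_archiver view counts archived and failed WAL segments (archived_count should keep increasing and failed_count should stay at 0). Run it against whichever pod is currently the primary — cluster-example-1 here is just an example:

kubectl exec -it cluster-example-1 -c postgres -- \
  psql -U postgres -d appdb -c "SELECT archived_count, failed_count, last_archived_wal FROM pg_stat_archiver;"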

Step 8: Perform Point-In-Time Recovery (PITR)

PITR works by restoring a base backup and then replaying archived WAL up to a chosen point in time, so it relies on both the backups and the WAL archiving we configured above.

Let’s say we want to recover the DB to 2025-06-24T08:12:00Z. (Z=UTC Timezone)
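PITR needs at least one base backup in the object store that predates the target time. If you haven't taken one yet, an on-demand Backup resource (and optionally a ScheduledBackup for regular backups) will create it. A minimal sketch, with the resource and file names assumed for illustration:

# backup-ondemand.yaml -- one-off base backup of cluster-example
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: cluster-example-ondemand
spec:
  cluster:
    name: cluster-example
---
# Optional: daily base backup at midnight (CloudNativePG schedules use six-field cron, including seconds)
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: cluster-example-daily
spec:
  schedule: "0 0 0 * * *"
  cluster:
    name: cluster-example

Apply it and wait for the backup to complete:

kubectl apply -f backup-ondemand.yaml
kubectl get backup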

Create Recovery Cluster YAML

# recovery-cluster.yaml

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: recovery-cluster
spec:
  instances: 2
  storage:
    size: 1Gi

  bootstrap:
    recovery:
      source: cluster-example
      recoveryTarget:
        targetTime: "2025-06-24T08:12:00Z"

  # The recovery source must be declared as an external cluster that points
  # at the object store holding cluster-example's backups and WAL archive
  externalClusters:
    - name: cluster-example
      barmanObjectStore:
        destinationPath: s3://pg-backups/
        endpointURL: http://minio.default.svc.cluster.local:9000
        s3Credentials:
          accessKeyId:
            name: aws-creds
            key: AWS_ACCESS_KEY_ID
          secretAccessKey:
            name: aws-creds
            key: AWS_SECRET_ACCESS_KEY
        wal:
          maxParallel: 2

  superuserSecret:
    name: appuser-secret

  # Backups for the recovered cluster itself; the bucket is shared, but
  # Barman stores each cluster under its own server-name prefix
  backup:
    barmanObjectStore:
      destinationPath: s3://pg-backups/
      endpointURL: http://minio.default.svc.cluster.local:9000
      s3Credentials:
        accessKeyId:
          name: aws-creds
          key: AWS_ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds
          key: AWS_SECRET_ACCESS_KEY
      wal:
        compression: gzip
        maxParallel: 2

Deploy Recovery Cluster

kubectl apply -f recovery-cluster.yaml

Check Logs for Recovery Completion

kubectl logs -l cnpg.io/cluster=recovery-cluster

You should see log messages indicating that recovery stopped at (or just before) the requested target time, after which the new cluster is promoted and starts accepting connections.

Voilà, you have now completed PITR successfully!
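As a final sanity check, connect to the recovered cluster and confirm it is writable and contains the data you expect as of the target time (query your own application tables here):

kubectl exec -it recovery-cluster-1 -c postgres -- \
  psql -U postgres -d appdb -c "SELECT pg_is_in_recovery(), current_timestamp;"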

Links for Reference:

How we saved around 65% in AWS costs

Check out the whole story to learn more!

EzyInfra.dev – Expert DevOps & Infrastructure consulting! We help you set up, optimize, and manage cloud (AWS, GCP) and Kubernetes infrastructure—efficiently and cost-effectively. Need a strategy? Get a free consultation now!




Want to discuss DevOps practices, infrastructure audits, or free consulting for your AWS cloud?

Prasanna would be glad to jump on a call.