Reference architecture for self-hosting Infisical on Kubernetes (HA)
Deploying Infisical on-premise with high availability requires expertise in networking, container orchestration, and database management.
This guide serves as a reference architecture and a starting point. Actual deployments may vary depending on your organization’s existing infrastructure and capabilities.
The architecture above makes use of Kubernetes for orchestrating both stateless and stateful components.
The architecture spans multiple data centers in an active-passive configuration, providing increased redundancy, availability, and disaster recovery capabilities.
While managing databases within Kubernetes has typically been complex, modern operators like CloudNativePG simplify this process by handling storage provisioning, persistent volume management, and backup/recovery processes.
However, if you lack deep expertise in Kubernetes operators or database management, we recommend a hybrid approach for production deployments, in which the database runs on a managed service.
Managing stateful components like databases can be challenging without deep expertise or a dedicated in-house database management team.
To simplify operations and reduce complexity, we recommend offloading databases to managed services from AWS/GCP.
These managed services automatically handle provisioning, scaling, failover, backups and rollbacks.
Infisical is deployed on a Kubernetes cluster, which allows for container management, auto-scaling, and self-healing capabilities.
A load balancer sits in front of the Kubernetes cluster, directing traffic and distributing load evenly across the application nodes.
This is the entry point through which all other services interact with Infisical.
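To make "even load distribution" concrete, the following Python sketch shows round-robin selection that skips nodes failing a health check. It is an illustration only: round-robin is one assumed strategy among several that production load balancers offer, and the pod names and the `is_healthy` callback are hypothetical.

```python
from itertools import cycle
from typing import Callable, Iterable


class RoundRobinBalancer:
    """Hand out application nodes in turn, skipping any that fail a health check."""

    def __init__(self, nodes: Iterable[str], is_healthy: Callable[[str], bool]) -> None:
        self._nodes = list(nodes)
        if not self._nodes:
            raise ValueError("at least one application node is required")
        self._ring = cycle(self._nodes)
        self._is_healthy = is_healthy  # hypothetical health-check callback

    def next_node(self) -> str:
        # Visit each node at most once per request so an all-unhealthy pool fails fast.
        for _ in range(len(self._nodes)):
            node = next(self._ring)
            if self._is_healthy(node):
                return node
        raise RuntimeError("no healthy application nodes available")


if __name__ == "__main__":
    unhealthy = {"infisical-1"}  # hypothetical pod names
    lb = RoundRobinBalancer(
        ["infisical-0", "infisical-1", "infisical-2"],
        is_healthy=lambda node: node not in unhealthy,
    )
    print([lb.next_node() for _ in range(4)])
    # ['infisical-0', 'infisical-2', 'infisical-0', 'infisical-2']
```

In a real deployment this logic lives inside the load balancer or Kubernetes Service, where you configure it rather than write it.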
Redis is deployed using the Bitnami Helm chart in a simple primary configuration:
Single Redis instance per cluster without streaming replication
Regular backups to object storage
Restore from backup during failover
Infisical does not support Redis cluster mode, and since this is an active-passive setup, we use a simple Redis deployment with backup/restore for failover.
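Restoring "the latest backup" assumes you can tell which backup is newest. The Python sketch below assumes backups are written to object storage under a hypothetical `redis/dump-<UTC timestamp>.rdb` naming scheme and picks the most recent key from a bucket listing. Listing the bucket and copying the file into the Redis data directory are left to your own tooling.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional

# Hypothetical naming scheme, e.g. "redis/dump-20240101T120000Z.rdb".
PREFIX = "redis/dump-"
SUFFIX = ".rdb"
TS_FORMAT = "%Y%m%dT%H%M%SZ"


def backup_time(key: str) -> Optional[datetime]:
    """Parse the UTC timestamp embedded in a backup key, or None if the key doesn't match."""
    if not (key.startswith(PREFIX) and key.endswith(SUFFIX)):
        return None
    stamp = key[len(PREFIX):-len(SUFFIX)]
    try:
        return datetime.strptime(stamp, TS_FORMAT).replace(tzinfo=timezone.utc)
    except ValueError:
        return None


def latest_backup(keys: Iterable[str]) -> Optional[str]:
    """Return the most recent backup key from an object-storage listing, ignoring other objects."""
    dated = [(t, k) for k in keys if (t := backup_time(k)) is not None]
    if not dated:
        return None
    return max(dated)[1]


if __name__ == "__main__":
    listing = [
        "redis/dump-20240101T120000Z.rdb",
        "redis/dump-20240102T000000Z.rdb",
        "redis/README.txt",
    ]
    print(latest_backup(listing))  # redis/dump-20240102T000000Z.rdb
```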
PostgreSQL is the single source of truth for nearly all application data in Infisical. CloudNativePG provides well-defined backup and restore capabilities:
Continuous Backup: The operator continuously archives WAL files to object storage
Point-in-Time Recovery: Supports restoring to any point in time using WAL archiving
Regular Testing: Periodically test backup restoration to exercise the full lifecycle of this process
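CloudNativePG performs point-in-time recovery itself; the Python sketch below only illustrates the rule it relies on, under the simplifying assumption of a gap-free WAL archive. Recovering to a target time needs the latest base backup completed at or before that time, plus archived WAL reaching at least the target. The `BaseBackup` type and both function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Sequence


@dataclass(frozen=True)
class BaseBackup:
    name: str
    completed_at: datetime  # when the physical base backup finished


def pick_base_backup(backups: Sequence[BaseBackup], target: datetime) -> Optional[BaseBackup]:
    """Latest base backup that completed at or before the recovery target, if any."""
    candidates = [b for b in backups if b.completed_at <= target]
    return max(candidates, key=lambda b: b.completed_at, default=None)


def can_recover_to(backups: Sequence[BaseBackup], wal_archived_until: datetime, target: datetime) -> bool:
    """PITR needs a base backup before the target and WAL archived at least up to it.

    Assumes the WAL archive is continuous (no missing segments) after each base backup.
    """
    return pick_base_backup(backups, target) is not None and wal_archived_until >= target


if __name__ == "__main__":
    utc = timezone.utc
    backups = [
        BaseBackup("nightly-1", datetime(2024, 1, 1, 2, tzinfo=utc)),
        BaseBackup("nightly-2", datetime(2024, 1, 2, 2, tzinfo=utc)),
    ]
    target = datetime(2024, 1, 2, 14, 30, tzinfo=utc)  # e.g. just before a bad change
    print(pick_base_backup(backups, target).name)  # nightly-2
    print(can_recover_to(backups, datetime(2024, 1, 2, 15, tzinfo=utc), target))  # True
```

Periodic restore tests exercise exactly this chain: if either the base backup or the WAL archive is missing, recovery to the chosen point is not possible.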
During failover, the latest Redis backup is restored from object storage to the passive data center. This process is manual and requires operator intervention.
Infisical can be deployed across multiple data centers in an active-passive configuration for disaster recovery. In this setup, one data center serves as the active site while others remain as passive standbys.
A global load balancer handles traffic management across data centers. For on-premises deployments, it can be implemented using:
HAProxy or NGINX configured as a global load balancer
Any enterprise network routing solution you already have in place
Each data center should have its own ingress or load balancer
The global load balancer should be deployed in a highly available configuration across multiple locations so that it does not become a single point of failure.
During normal operation:
The global load balancer routes all traffic to the active data center
Replica PostgreSQL clusters continuously replicate from the primary cluster
Redis backups are regularly created and stored in object storage
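The passive data center is only as useful as its copies are fresh. The hypothetical monitoring check below assumes you already collect the replica PostgreSQL cluster's lag and the timestamp of the newest Redis backup (how you gather them is outside the sketch), and flags when failing over right now would lose more data than intended. The thresholds are placeholders, not recommendations.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

# Hypothetical thresholds; derive real values from your recovery point objective.
MAX_REPLICA_LAG = timedelta(seconds=30)
MAX_REDIS_BACKUP_AGE = timedelta(hours=1)


@dataclass(frozen=True)
class StandbyStatus:
    replica_lag: timedelta        # how far the replica PostgreSQL cluster trails the primary
    last_redis_backup: datetime   # UTC time of the newest Redis backup in object storage


def standby_problems(status: StandbyStatus, now: datetime) -> List[str]:
    """Return reasons why failing over right now would lose more data than intended."""
    problems = []
    if status.replica_lag > MAX_REPLICA_LAG:
        problems.append(f"PostgreSQL replica lag {status.replica_lag} exceeds {MAX_REPLICA_LAG}")
    backup_age = now - status.last_redis_backup
    if backup_age > MAX_REDIS_BACKUP_AGE:
        problems.append(f"newest Redis backup is {backup_age} old (limit {MAX_REDIS_BACKUP_AGE})")
    return problems


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    status = StandbyStatus(replica_lag=timedelta(seconds=4), last_redis_backup=now - timedelta(hours=2))
    for problem in standby_problems(status, now):
        print("WARN:", problem)
```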
During failover:
A human operator must initiate the failover process
The operator promotes a replica PostgreSQL cluster in the target passive data center to become primary using CloudNativePG’s promotion process
The latest Redis backup is restored from object storage to the passive data center’s Redis instance
Once database failover is complete, the global load balancer is updated to direct traffic to the new active data center
This is an active-passive setup where failover must be initiated manually by an operator. Automatic failover between data centers is not recommended as it can lead to split-brain scenarios. The operator should verify the state of both data centers before initiating failover.
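The failover steps and the verification requirement can be expressed as an ordered, operator-gated runbook. This Python sketch is illustrative only: each step is a hypothetical callable that would wrap your own tooling (for example, CloudNativePG's replica promotion or a change to the global load balancer), and `old_active_is_fenced` is an assumed check that the old active data center is stopped or isolated so it cannot accept writes.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FailoverPlan:
    """Operator-driven failover from the active data center to a passive one.

    The callables are hypothetical hooks into your own tooling; the plan only
    enforces the manual trigger, the split-brain guard, and the step order.
    """
    old_active_is_fenced: Callable[[], bool]
    promote_postgres_replica: Callable[[], None]
    restore_latest_redis_backup: Callable[[], None]
    point_global_lb_at_target: Callable[[], None]
    log: List[str] = field(default_factory=list)

    def run(self, operator_confirmed: bool) -> None:
        # Failover is never automatic: a human must explicitly confirm it.
        if not operator_confirmed:
            raise PermissionError("failover must be initiated by a human operator")
        # Split-brain guard: do not promote while the old primary might still accept writes.
        if not self.old_active_is_fenced():
            raise RuntimeError("old active data center is not confirmed stopped or isolated")
        self.promote_postgres_replica()
        self.log.append("postgres promoted")
        self.restore_latest_redis_backup()
        self.log.append("redis restored")
        # Traffic moves only after the databases are ready in the target data center.
        self.point_global_lb_at_target()
        self.log.append("traffic switched")


if __name__ == "__main__":
    plan = FailoverPlan(
        old_active_is_fenced=lambda: True,
        promote_postgres_replica=lambda: None,
        restore_latest_redis_backup=lambda: None,
        point_global_lb_at_target=lambda: None,
    )
    plan.run(operator_confirmed=True)
    print(plan.log)  # ['postgres promoted', 'redis restored', 'traffic switched']
```

Keeping the load balancer switch as the last step means clients never reach a data center whose databases are still being promoted or restored.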