Kubernetes Secrets: Architecture & Best Practices

See how to build a Kubernetes secrets management architecture with rotation, auditing, and multi-cluster governance.

Transcript
Hey everyone, I'm Jake and I'm a developer advocate at Infisical. I spend a lot of my time talking to platform teams about how they manage secrets. How they store them, how they distribute them, how they rotate them. And honestly, how they end up in situations where nobody's quite sure how any of that works anymore.
So this talk is not a demo or a tool comparison. What I want to do is give you a framework, a way of thinking about secrets management as a platform architecture decision. Because in my experience, most teams don't make that decision deliberately. They make a dozen tactical decisions and then 18 months later, this is hardened into an architecture that nobody's really happy with. So let's talk about how to avoid that.
The Accumulation Problem
Okay, so here's a scenario I think a lot of you have lived through. You join a platform team, or maybe you've been on one for a while, and you take a look at how secrets are managed across your clusters. What you find is a little bit of everything:
  • There are sealed secrets checked into a GitOps repo.
  • There's a Vault integration that one team set up, but only their namespace uses it.
  • Another team has credentials baked into Helm values that get injected at deploy time.
  • There's a shared namespace with a Kubernetes secret that was created manually by somebody who left the company and four different services are reading from it.
Nobody's sure when it was last rotated, and nobody's sure whether its value matches the one in staging.
And here's the thing, nobody chose this. Nobody sat down and said our secret strategy is going to be five different solutions duct taped together. It just accumulated. Every team solved the immediate problem that they had at the time. And the platform team either wasn't involved or didn't have a strong opinion yet.
This happens because secrets feel like a deployment detail. You need a database password, you put it somewhere the pod can read it and you move on with your day. And on day one, that's totally fine. Every approach works on day one.
The problem is that secrets aren't just a deployment detail. They sit at the intersection of security, compliance, developer experience, incident response, CI/CD and multi-cluster governance. A decision about where a database credential lives determines who can rotate it. It determines whether you can audit who has access to it. It determines what happens during an incident when you need to revoke it across 12 clusters quickly.
And those consequences don't show up on day one, they show up on month 12 or month 18. When you're onboarding your fourth team or an auditor asks for evidence. When a credential is leaked and you need to answer the question, where is this secret, who or what has access to it, and can I revoke it without taking down production?
That's where your secrets pattern becomes your secrets architecture, whether you planned for it or not.
Two Paradigms for Secrets Management
So let's take a look at the landscape. Now I want to be deliberate about this because I think one of the mistakes people make is treating secrets management like a tool decision. Should we use sealed secrets or external secrets operator or CSI driver? The architectural decision sits one level above that, and it answers the question: where should the source of truth live?
And there are two fundamental paradigms here.
Paradigm 1: Kubernetes as the Source of Truth
The secret lives in Kubernetes. It's stored in etcd, it gets there through Helm, Kustomize, sealed secrets, a GitOps pipeline, or maybe someone just running kubectl create secret. However it arrives, Kubernetes is the system of record and the cluster holds the canonical value.
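To make that concrete, here's a minimal sketch of the native pattern. The names and values are placeholders, not anything from a real system:

```yaml
# A native Kubernetes Secret: the cluster itself is the system of record.
# Name, namespace, and value below are illustrative placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: payments-db        # hypothetical secret name
  namespace: payments
type: Opaque
stringData:
  DB_PASSWORD: placeholder-value   # stored base64-encoded in etcd
---
# The equivalent imperative creation would be something like:
#   kubectl create secret generic payments-db \
#     --namespace payments --from-literal=DB_PASSWORD=placeholder-value
```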
This is the most common approach and for a lot of teams, it's completely appropriate. It's native, it's simple, there are no external dependencies to manage. If you're a small team running a single cluster, this may be all you need for a while.
But it carries structural limitations that compound over time:
  • Rotation is manual unless you build automation around it yourself.
  • Audit visibility is limited to Kubernetes API logs, which will tell you when someone read a secret, but it won't give you the type of access level auditing that a lot of compliance teams ask for.
  • The moment you're running more than one cluster, you have the same secret defined in multiple places with no centralized view of where it exists, what its value is, or whether it's consistent across environments.
None of these are deal breakers on their own, but they do add up.
Paradigm 2: An External System as the Source of Truth
Here, the canonical secret lives outside of Kubernetes in a centralized secrets manager. AWS Secrets Manager, Infisical, HashiCorp Vault, GCP Secrets Manager, whatever your organization has standardized on. The secret is managed, versioned, rotated, and audited in that external system; Kubernetes just consumes it.
Now, not all external secrets managers are created equal in terms of what you get out of the box. Cloud native options like AWS Secrets Manager or Azure Key Vault give you secure storage. But a lot of the platform level capabilities, fine grained RBAC, audit logging across environments, versioning, automatic rotation, these require significant additional configuration.
Purpose-built secrets managers like Infisical tend to ship with those things built in. Audit trails, access controls, versioning, approval workflows. So there's meaningfully less assembly and overhead required. That distinction matters when you're evaluating total cost of adoption, not just can this store a secret.
How Secrets Get Delivered
Once you've chosen an external source of truth, there's a second important decision: how do those secrets actually get delivered to your workloads?
Syncing: The most common pattern in Kubernetes is syncing. An operator like the External Secrets Operator pulls secrets from that external secrets manager and writes them to native Kubernetes secrets. Pods consume them through environment variables or mounted files, the same as always. The secret does end up in etcd, but it's not the source of truth, it's just a synced copy.
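As a rough sketch of the syncing pattern with the External Secrets Operator, where the store name, secret names, and key path are all illustrative:

```yaml
# ESO sync sketch: the operator reads from the external manager (via a
# SecretStore) and maintains a native Kubernetes Secret as a synced copy.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: payments-db
  namespace: payments
spec:
  refreshInterval: 1h              # how often the copy is re-synced
  secretStoreRef:
    name: central-secrets          # hypothetical SecretStore for your manager
    kind: SecretStore
  target:
    name: payments-db              # the Kubernetes Secret that pods consume
  data:
    - secretKey: DB_PASSWORD
      remoteRef:
        key: prod/payments/db-password   # path in the external system
```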
Direct Injection: The alternative is direct injection, using something like a Secret Store CSI driver or a sidecar agent to mount secrets directly into the pod at runtime without ever writing to a Kubernetes secret object. The secret value goes from your external secrets manager directly into the pod's file system or memory without ever touching etcd.
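Here's a sketch of the CSI pattern. Provider-specific parameters vary a lot, so they're omitted, and every name here is illustrative:

```yaml
# Direct-injection sketch with the Secrets Store CSI Driver: the secret is
# mounted as a tmpfs file in the pod, never written to a Kubernetes Secret.
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: payments-db
  namespace: payments
spec:
  provider: vault            # or aws, azure, gcp, matching your manager
  parameters: {}             # provider-specific config goes here (omitted)
---
apiVersion: v1
kind: Pod
metadata:
  name: payments-api
  namespace: payments
spec:
  containers:
    - name: app
      image: payments-api:latest   # placeholder image
      volumeMounts:
        - name: secrets
          mountPath: /mnt/secrets  # secret files appear here at runtime
          readOnly: true
  volumes:
    - name: secrets
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: payments-db
```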
Now, which of these approaches you choose will depend partly on your security posture and partly on how your apps actually consume secrets. But either way, direct injection comes with operational tradeoffs. Debugging is harder, and disaster recovery workflows won't capture those secrets. And in my experience, it demands a more mature platform team to run reliably.
Either way, the paradigm is the same: the external system is the authority. You're making the same decision about where the truth lives, and the delivery mechanism is a decision within that.
So those are our two paradigms. Same basic outcome, the pod gets the secret. But very different answers to the questions that matter 12 months from now.
Six Dimensions for Evaluating Your Approach
Okay, so let's now shift from describing these approaches to evaluating them. Because the question is never which one's best. It's which one's best for us now, given where we're at and given where we're planning to be in 12 to 24 months.
I want to walk through six dimensions that I think matter for platform teams making these decisions.
1. Rotation
How do secrets get rotated and who initiates it? If your current strategy requires a human to update a YAML file and push a deployment, that's not really a rotation strategy. You want to understand: can the source system rotate the credential and have it propagate to clusters without a redeployment? And what's the exposure window? How long does a compromised credential stay live?
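One illustrative way to shorten that propagation path, assuming you sync from an external manager with the External Secrets Operator and run Stakater Reloader, is to pair a sync refresh interval with an auto-reload annotation. Everything named here is a placeholder:

```yaml
# Rotation propagation sketch: the external manager rotates the credential,
# the sync operator refreshes the Kubernetes Secret on its interval, and
# Reloader restarts the Deployment so pods pick up the new value without a
# manual redeploy.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
  namespace: payments
  annotations:
    reloader.stakater.com/auto: "true"   # restart when referenced Secrets change
spec:
  replicas: 1
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
    spec:
      containers:
        - name: app
          image: payments-api:latest     # placeholder image
          envFrom:
            - secretRef:
                name: payments-db        # the synced Secret
```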
2. Auditability
Can you answer right now who accessed this secret or what accessed this secret, when and from where? Can you tell me what humans and machines currently have access to it? Kubernetes audit logs capture API level access, but that's the consumption layer. If your compliance posture requires you to demonstrate access controls and audit trails at the source, the audit boundary needs to live wherever the source of truth lives.
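For the Kubernetes consumption layer specifically, a minimal audit policy like this logs metadata for every secret read through the API server. Treat it as a starting sketch, not a complete policy:

```yaml
# API server audit policy sketch (--audit-policy-file): records who read
# which Secret, when, and from where. Metadata level never logs the value.
# This covers the consumption layer only, not the external source of truth.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata
    verbs: ["get", "list", "watch"]
    resources:
      - group: ""               # core API group
        resources: ["secrets"]
  - level: None                 # ignore everything else to keep logs manageable
```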
3. Incident Response
A credential leaks at 2:00 a.m. Walk through it. How do you know what's affected? How do you identify every cluster and workload that's using it? How fast can you revoke it? If your secrets are defined per cluster with no central registry, revocation becomes a scavenger hunt across namespaces and repos. If they live in a central system, you can revoke once and let propagation handle the rest, assuming your sync layer is reliable and fast enough.
4. Multi-Cluster Governance
Many of the platform teams I talk to are not running one cluster. They're running several, sometimes dozens. So how does your secrets pattern scale across those? Can you enforce naming conventions, access policies, and rotation requirements centrally, or does every cluster become its own island with its own secret sprawl?
5. Developer Experience
One that gets underestimated consistently. If your secrets experience requires devs to understand multiple different tools, or if they need to file a ticket and wait multiple days for secrets to get provisioned, they're going to find other ways. They may hard-code values or share secrets in Slack DMs. The golden path has to be the path of least resistance, and if it's not, it's not really a golden path, it's just a suggestion.
6. Blast Radius
Every approach has a failure mode. If your secrets manager goes down, what breaks? If etcd gets compromised, what's exposed? If a pod's temp FS is readable, what can be extracted? The question isn't can this fail because everything can fail. The question is whether you've mapped those failure modes and whether the blast radius is appropriate given your risk tolerance.
Practical Guidance
So let me offer some practical directional guidance based on these tradeoffs. I'm not going to tell you what to pick because your context matters way more than any general recommendation, but I can share patterns that we see work.
If you're a small team early in your platform journey, running a single cluster or maybe a small handful, then Kubernetes native secrets with encryption at rest enabled, or sealed secrets in your GitOps repo, is a reasonable foundation. You don't have to over-engineer for a scale you haven't reached.
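If you go the native route, it's worth enabling encryption at rest explicitly rather than assuming your distribution does it for you. A sketch of the API server's EncryptionConfiguration, with a placeholder key:

```yaml
# Encryption-at-rest sketch: passed to the API server via
# --encryption-provider-config so Secrets are encrypted before hitting etcd.
# The key below is a placeholder; generate your own, e.g.:
#   head -c 32 /dev/urandom | base64
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources: ["secrets"]
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded-32-byte-key>   # placeholder
      - identity: {}   # fallback so pre-existing plaintext Secrets stay readable
```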
If you're operating multiple clusters, onboarding multiple teams or facing tight compliance requirements, an external secrets manager as a source of truth starts to earn its complexity cost. This is where centralized platforms provide real value, whether that's Infisical, Vault, a cloud native secrets manager. Because the governance and auditability capabilities are genuinely difficult to replicate with cluster native patterns alone.
And at this stage, high availability becomes a real consideration. Your secrets manager is now a dependency for deployments across multiple clusters. So you need to think about redundancy, failover, what happens when that system's not reachable. Some platforms handle HA natively, others require you to configure it yourself. And that's also worth factoring into your evaluation.
And regardless of which approach you choose, if there's one thing I want to land from this talk, it's this: the worst secrets pattern isn't necessarily the wrong one, it's having five of them. Inconsistency is the real risk. When multiple teams use multiple different approaches, you can't audit reliably, you can't rotate reliably, and every extra pattern multiplies your operational surface and, with it, your cognitive load.
Platform teams don't just pick tools, they pick defaults. Your job is to find a golden path that's good enough, consistently applied, and appropriate for your current maturity, and then make that the easiest thing to do.
Six Questions to Take Back to Your Team
All right, so let me close with something concrete that you can take back to your team. Six questions. If we can answer all six clearly, we have a secrets architecture. If we can't, we have a secrets problem, and now might be a good time to fix that.
1. What are we optimizing for?
What's our burning pain point right now? Is it that rotation is completely manual? Is it that we can't produce an audit trail when compliance comes knocking? Are developers waiting days for secrets to get provisioned? Start with what's hurting you today because that'll shape what you prioritize first and how you evaluate options.
2. What does our environment currently look like?
Are we running multiple clouds, multiple clusters? Is our platform highly sensitive to downtime? If you're spanning clouds or clusters, you probably need unified observability and centralized control over secrets. If uptime is critical, you need to consider whether you want a managed dependency or something you operate yourself.
3. What is our current platform maturity?
Here we have to be honest, because a pattern that works beautifully for a team with dedicated platform SREs and mature GitOps pipelines is probably not going to work as well for a team that's still getting its Helm charts under control.
4. Where should secrets originate?
Is the cluster the source of truth or an external system? Your answers to the first three questions, your pain points, your infrastructure complexity, your maturity, are really going to inform this one. This is the most consequential architectural decision in the space, and we want it to be the result of deliberate reasoning, not something that we just fell into.
5. What is our rotation model?
Can we describe it in one sentence?
6. What does incident response look like?
A credential gets leaked, who gets paged, what's the run book? How long until we've identified every affected workload and revoked access? If your secrets architecture can't support that workflow, it isn't finished. This is the question that stress tests everything else.
Closing Thoughts
Look, I think the reason secrets management ends up being so messy in many organizations is that it never feels urgent enough to get ahead of. Nobody files a P1 that says, we need a secrets architecture. The pressure always comes from somewhere else. A compliance deadline, an incident, a team that's blocked. And in that moment of pressure, the temptation is to solve the immediate problem. Add another tool, create another exception, ship it and move on.
But every exception becomes a precedent and every precedent becomes a pattern. And that's how you end up with five mechanisms and no strategy.
So the invitation is simple. Treat secrets management with the same architectural seriousness you give your service mesh, your observability pipeline, your cluster provisioning strategy. Bring it to your architecture review, write it down, make it a platform default. And then revisit it as your maturity evolves.
Because the best secrets architecture isn't the most sophisticated one. It's the one your team can operate consistently, reason about clearly, and evolve deliberately over time.
Thanks for watching. Again, I'm Jake. You can find me and the Infisical team online if you want to keep the conversation going.