Secure Kubernetes Secrets & Disaster Recovery with SOPS, GitOps & FluxCD
In my previous blog post, we explored how to securely expose services using the Cloudflare Operator and Cloudflared tunnels (previously known as Argo Tunnels). While external security is critical, securing and managing secrets inside your cluster is just as essential—and often overlooked.
Kubernetes secrets store sensitive information like API keys, tokens, passwords, and certificates. If mismanaged or exposed, these secrets can become entry points for attackers, leading to system compromise. To safeguard your infrastructure, secrets should be securely stored, encrypted, and recoverable in the event of disaster.
Let’s dive into how to achieve this, the old challenges, and how tools like SOPS and FluxCD solve them to create a resilient, disaster-proof Kubernetes setup.
The Old Way: Manually Creating Kubernetes Secrets
Previously, secrets were manually created using a command like this:
kubectl -n cloudflare-operator-system create secret generic cloudflare-secrets \
--from-literal CLOUDFLARE_API_TOKEN=<api-token> \
--from-literal CLOUDFLARE_API_KEY=<api-key>
This command creates a secret in the cloudflare-operator-system namespace, enabling the Cloudflare Operator to authenticate with your account. However, managing secrets manually is risky and error-prone. If secrets are lost during cluster failure, redeploying workloads becomes difficult, especially when manual configurations are scattered.
The Nightmare Scenario: What If Disaster Strikes?
Imagine a careless command like this being executed:
# WARNING: NEVER RUN THIS COMMAND
sudo rm -rf /*
This infamous command recursively deletes the entire Linux file system, potentially wiping out Kubernetes nodes and critical files, leading to cluster failures. Without a recovery mechanism, you could be facing service downtime, lost configurations, and lost secrets.
But disaster recovery doesn’t have to be a nightmare—thanks to GitOps and FluxCD.
Chaos Recovery: Enter GitOps and FluxCD
When disaster strikes, GitOps principles combined with FluxCD can get you back on track quickly. FluxCD keeps Kubernetes configurations version-controlled in Git, making recovery as simple as:
- Setting up a clean Kubernetes environment.
- Reapplying configurations from the FluxCD Git repo.
- Letting FluxCD automatically restore services and deployments.
However, it’s important to note that re-provisioning your cluster is a prerequisite to this recovery process. This means you’ll need to reinstall the operating system on your nodes and ensure a clean Kubernetes setup is in place.
To make this process easier, we rely on an Ansible playbook that helps automate the cluster re-provisioning process—except for reinstalling the OS itself. Once the nodes have a fresh OS installed, the playbook can handle:
- Installing Kubernetes components.
- Configuring networking and node roles.
- Bootstrapping the cluster, making it ready for FluxCD to take over.
The Secrets Problem: Solved with SOPS
This is where SOPS (Secrets Operations) shines. SOPS allows you to encrypt secrets directly in Git repositories, ensuring they’re safe and recoverable during disaster recovery. Here’s why it’s a game-changer:
- Strong Encryption: Encrypts each value with AES-256-GCM and protects the data key with age, PGP, AWS KMS, GCP KMS, or Azure Key Vault.
- Automated Decryption: FluxCD decrypts secrets automatically at deployment.
- Easy Cluster Migration: Secrets are stored in Git, making migration or scaling a breeze.
With SOPS, you’ll eliminate manual secret creation and streamline recovery, minimizing downtime during a cluster failure.
Persistent Data: The Final Piece of the Puzzle
While SOPS handles secrets, persistent data—like databases, logs, or locally stored files—requires its own protection. If persistent data is lost, even restored services may not function properly. Here’s my plan to address this:
- Integrate a Synology NAS as a Container Storage Interface (CSI): This will provide reliable external storage for workloads.
- Cloud Backups: Use services like Azure Storage Accounts, AWS S3, or Google Cloud Storage to back up critical data.
- Kubernetes Backup Tools: Tools like Velero can back up cluster states and persistent volumes.
By combining these approaches, you ensure that secrets and persistent data are both protected and recoverable.
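As a rough idea of what the Velero part of that plan could look like, here is a minimal sketch. It assumes Velero is already installed in the cluster with a backup storage location configured, which this post does not cover. The function name, the schedule name, the 03:00 cron expression, and the seven-day retention are all illustrative choices, not settings from my setup.

```shell
# Hypothetical helper: schedule a daily Velero backup of one namespace.
# Assumes the velero CLI is installed and a backup location is configured.
schedule_daily_backup() {
  namespace=$1
  velero schedule create "daily-$namespace" \
    --schedule="0 3 * * *" \
    --include-namespaces "$namespace" \
    --ttl 168h0m0s
}

# Usage (the namespace name is only an example):
#   schedule_daily_backup linkding
```

Keeping one schedule per namespace makes it easier to restore a single workload without touching the rest of the cluster.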
Putting It All Together: A Resilient Kubernetes Environment
By combining GitOps, SOPS, and persistent storage solutions, we can create a robust Kubernetes setup that recovers quickly from failures and is portable across different environments. Here’s the workflow:
- Use FluxCD to reapply configurations and automatically redeploy workloads.
- Encrypt secrets with SOPS and restore them seamlessly during deployment.
- Protect persistent data using a Synology NAS, cloud storage, or backup tools like Velero.
This approach ensures that your cluster can recover with minimal manual intervention and that secrets and critical data are safeguarded against even the worst-case scenarios.
In future iterations, integrating network-attached storage and regular backups will further harden the setup, ensuring that your cluster is not only secure but also resilient to disaster and easily migratable to new environments.
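The recovery workflow above can be sketched as a small helper to run against a freshly provisioned cluster. This is a sketch under assumptions: it supposes the Flux repository is hosted on GitHub (so `flux bootstrap github` applies and a GITHUB_TOKEN is exported), that the branch is main, and that the age private key has been restored from wherever you keep it. The function name and its arguments are hypothetical. The one thing Git cannot give back is the private key, so it has to be re-created in the cluster before Flux can decrypt anything.

```shell
# Hypothetical recovery helper for a freshly provisioned cluster.
#   $1  path to the age private key restored from your password manager
#   $2  GitHub owner of the Flux repository (placeholder)
#   $3  name of the Flux repository (placeholder)
restore_cluster() {
  key_file=$1
  github_owner=$2
  github_repo=$3

  # 1. The namespace must exist before the decryption key can be stored in it.
  kubectl create namespace flux-system || return 1

  # 2. Re-create the age key under the secret name the Kustomization refers to.
  #    Flux expects the entry name to end in ".agekey".
  kubectl create secret generic sops-age \
    --namespace=flux-system \
    --from-file=age.agekey="$key_file" || return 1

  # 3. Reinstall Flux and point it at the existing cluster path in the repo.
  flux bootstrap github \
    --owner="$github_owner" \
    --repository="$github_repo" \
    --branch=main \
    --path=clusters/kthulu
}

# Usage (owner and repository are placeholders):
#   restore_cluster ./age.agekey my-github-user my-flux-repo
```

Once the bootstrap finishes, Flux reconciles the repository and the encrypted secrets are decrypted in the cluster without any manual `kubectl create secret` for the workloads themselves.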
Step-by-Step Guide: Securing Secrets with SOPS and FluxCD
Ready to get started? Here’s a step-by-step guide to encrypt and deploy secrets using SOPS.
Step 1: Install the Required Tools
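Judging only from the commands used in the steps that follow, you will need at least age-keygen, sops, kubectl, yq, and git on your machine. How you install them depends on your platform, so rather than guess, here is a small sketch that reports which of them are missing from your PATH. The function name is my own.

```shell
# Print the name of every tool that is NOT available on PATH, one per line.
# Prints nothing when all tools are present.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage (tool list inferred from the commands in this guide):
#   missing_tools age-keygen sops kubectl yq git
```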
Step 2: Generate an Age Key
Create a public and private key with the following command:
age-keygen -o age.agekey
It produces output similar to this:
Public key: age16ag8xf3zsmqy7jexq2030es20nsldyaa208ywpfv6n8anl478dtqyyccfc
Let's see what this file is all about:
➜ cat age.agekey
# created: 2025-02-08T08:44:27-06:00
# public key: age16ag8xf3zsmqy7jexq2030es20nsldyaa208ywpfv6n8anl478dtqyyccfc
AGE-SECRET-KEY-1MRRUWPANQ6GDKVGEKE6DD9RAF4EP5T09GUUN4EV0ZQKXKHCEGMES942WZ5
We have a public and private key. Before you try to decrypt the secrets of my FluxCD repo, know that this key is not my production key. In your case, you should take extra care of your private key—store it securely, such as in a password manager, to avoid unauthorized access.
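If you lose track of the public key later, it can be read back from the key file. The sketch below relies on the "# public key:" comment shown in the file above; the function name is hypothetical. As far as I know, `age-keygen -y` can also derive the public key from the private key itself, which is handy if the comment has been stripped.

```shell
# Print the public key (recipient) recorded in an age key file.
# Relies on the "# public key: ..." comment written by age-keygen.
age_public_key() {
  sed -n 's/^# public key: //p' "$1"
}

# Usage:
#   chmod 600 age.agekey        # keep the private key readable by you only
#   age_public_key age.agekey   # the value that goes into .sops.yaml
#   age-keygen -y age.agekey    # asks age to derive the same value
```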
Step 3: Create and Encrypt Secrets
- Generate the YAML manifest: First, let's create our Cloudflare secrets file with the following command:
kubectl -n cloudflare-operator-system create secret generic cloudflare-secrets \
  --from-literal CLOUDFLARE_API_TOKEN=<YourApiToken> \
  --from-literal CLOUDFLARE_API_KEY=<YourApiKey> \
  --dry-run=client -o yaml > cf-secret.yaml
This command will create that secret as a YAML file with the values already encoded. However, these secrets should not be committed to your Git repository in their current state. Let me show you why:
Here is our secret manifest file:
apiVersion: v1
data:
  CLOUDFLARE_API_KEY: aGFja2VkIQ==
  CLOUDFLARE_API_TOKEN: SW1fbm90X3RoYXRfc2VjcmV0
kind: Secret
metadata:
  creationTimestamp: null
  name: cloudflare-secrets
  namespace: cloudflare-operator-system
The values in this YAML file are base64-encoded, not encrypted. Base64 encoding is easily reversible, meaning anyone with access to this file can decode your secrets using a simple command:
➜ yq '.data | to_entries | .[] | "\(.key): \(.value | @base64d)"' cf-secret.yaml
"CLOUDFLARE_API_KEY: hacked!"
"CLOUDFLARE_API_TOKEN: Im_not_that_secret"
If you push this file to your Git repo, your secrets are exposed. Let's fix this by encrypting the secrets using SOPS.
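Before moving on, it is worth seeing that no special tooling is needed to read a base64 value. A minimal sketch using only the base64 utility (the function name is mine; older macOS versions may need `-D` instead of `-d`):

```shell
# Decode a single base64 value copied from the manifest. No key is required.
decode_b64() {
  printf '%s' "$1" | base64 -d
}

# Usage:
#   decode_b64 'aGFja2VkIQ=='   # prints: hacked!
```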
Create a SOPS Config File
To simplify the encryption process, create a configuration file for SOPS. This allows you to define encryption rules, making future encryption tasks more compact. Create a .sops.yaml file at clusters/kthulu:
creation_rules:
  - path_regex: .*\.(yaml|yml)$
    encrypted_regex: "^(data|stringData)$"
    age: age16ag8xf3zsmqy7jexq2030es20nsldyaa208ywpfv6n8anl478dtqyyccfc
Explanation:
- path_regex: Matches YAML files that need encryption.
- encrypted_regex: Specifies which fields (like data) should be encrypted.
- age: Defines the public key for encryption.
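To get a feel for what the two regular expressions select, here is a small sketch that approximates them with `grep -E`. SOPS evaluates them with Go's regular expression engine, so treat this as an illustration of the patterns rather than a reimplementation of SOPS. The helper name and variable names are my own.

```shell
# matches REGEX STRING -> exit status 0 when STRING matches REGEX
matches() {
  printf '%s\n' "$2" | grep -Eq "$1"
}

PATH_REGEX='.*\.(yaml|yml)$'
ENCRYPTED_REGEX='^(data|stringData)$'

# Which keys of a Secret manifest would be encrypted?
for key in apiVersion kind metadata data stringData; do
  if matches "$ENCRYPTED_REGEX" "$key"; then
    echo "$key: encrypted"
  else
    echo "$key: left in plain text"
  fi
done
```

Only data and stringData match, which is why apiVersion, kind, and metadata stay readable in Git after encryption. Note that path_regex matches every YAML file, so the rule applies to any manifest you hand to sops from this directory.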
Encrypt Your Secrets File
Use the configuration to encrypt the secret. First, change into clusters/kthulu, where the .sops.yaml file lives:
cd clusters/kthulu
sops --encrypt --in-place ../../infrastructure/kthulu/cloudflare-operator/cf-secret.yaml
If we open the file again, it will look completely different:
apiVersion: v1
data:
  CLOUDFLARE_API_KEY: ENC[AES256_GCM,data:J3IEb5jZzWDdVEP1ewBMpl6y0rFvtYTezxRZXA==,iv:yXNgh5SHAmk9X0bzYH9QZ9NT6M51nhN3QpgT2hxbTwk=,tag:mY3qC3Qo3+UNh7Dey29OsA==,type:str]
  CLOUDFLARE_API_TOKEN: ENC[AES256_GCM,data:hkaHgQ/+roY4nzAZjK22R1qNb5y62RDAilOXpDXhOlzNy4oA0OsbXklj9w2HxMDuXfNp3pkuRFo=,iv:P0CE6GQWg8Cxak42aN9Ftf+YKpBNaDGyTFqPyaxmBp8=,tag:2VRvqpKs99muxkw4Zmzh5g==,type:str]
kind: Secret
metadata:
  creationTimestamp: null
  name: cloudflare-secrets
  namespace: cloudflare-operator-system
sops:
  kms: []
  gcp_kms: []
  azure_kv: []
  hc_vault: []
  age:
    - recipient: age13h3sp2pzmc73qevth3ax2zq4ndzp2d9s7aqt79cc7l2lqycyzyls6nj52k
      enc: |
        -----BEGIN AGE ENCRYPTED FILE-----
        YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSA4WGlSblRzTDVqNncwWG9p
        ZkFId3FUbFAvZzBudkZ2ajR5S2I4WTdDQUhBCmptZlpnRW0vdmtmaUh5czBWWUZU
        YjlPMUdRY0c0WGpBcnJqcUpqSjI5UFkKLS0tIEp4RWorZkdCQ0xjaVg1cDQwMHZE
        WGVTam9LN0JWSDh4SnJKZXRXMTV1TjAKmUt1ORCS5/t3R52SgpLfErboV1hyNrwu
        BDXVw1z3zJhMD1yKbNiCk25WENyLxsbbMPmwpjX1Kdzqu00KXugeNA==
        -----END AGE ENCRYPTED FILE-----
  lastmodified: "2025-02-08T15:15:08Z"
  mac: ENC[AES256_GCM,data:nJyWm0RZxzCP9yJb2UOHUwuKJXnbt05C+f8aaF7TTStzb5wJOflfb4Brc49i893pNd0ub77omaKQ0neM6ACy6NPH05dhH+MLxqdr1hprOnMxuuqD5D4Z0pNyAP9jJWP9wXDgc1oeOCBPMWKUpQFODI+1mI4H5kVQDqTtmF6Txio=,iv:jxzsuqZG98LoQvc349GlkLiCwW6th/hpHjdYcWi9sZE=,tag:UiLu0MBjjJyaoRoH2i1KNA==,type:str]
  pgp: []
  encrypted_regex: ^(data|stringData)$
  version: 3.9.4
This file is now encrypted. Even if someone accesses it, they won't be able to decrypt the secrets without the private key. You can safely commit this file to your Git repository.
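Since committing the plain-text version by accident is the one mistake this whole setup is meant to prevent, a quick check before `git add` can help. The sketch below is a heuristic of my own, not a SOPS feature: it only looks for the top-level sops block and at least one ENC[...] value, as seen in the encrypted file above.

```shell
# Succeed only if the file looks like SOPS output: it has a top-level
# "sops:" block and at least one AES256_GCM-encrypted value.
is_sops_encrypted() {
  grep -q '^sops:' "$1" && grep -q 'ENC\[AES256_GCM,' "$1"
}

# Usage, run from the repository root before committing:
#   f=infrastructure/kthulu/cloudflare-operator/cf-secret.yaml
#   is_sops_encrypted "$f" || echo "refusing to commit plain-text secret: $f" >&2
```

To confirm the file can still be read back, running `sops --decrypt` on it with the SOPS_AGE_KEY_FILE environment variable pointing at age.agekey should print the original manifest.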
Step 4: Decrypting Secrets in the Cluster
We need to let FluxCD know that we have a private key and that it can be used to decrypt secrets. First, add the key to the cluster:
cat age.agekey |
kubectl create secret generic sops-age \
--namespace=flux-system \
--from-file=age.agekey=/dev/stdin
Now, tell FluxCD to use this secret for decryption by adding the following to clusters/kthulu/infrastructure.yaml:
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure
  namespace: flux-system
spec:
  interval: 10m0s
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure/kthulu
  prune: true
  wait: true
  timeout: 5m0s
  decryption:
    provider: sops
    secretRef:
      name: sops-age
Then add the cf-secret.yaml file to the kustomization.yaml file under infrastructure/kthulu/cloudflare-operator:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: cloudflare-operator-system
resources:
  - cf-secret.yaml
  - ../../base/cloudflare-operator
  - cluster-tunnel.yaml
Step 5: Testing!
First, delete the manually created secret:
kubectl delete -n cloudflare-operator-system secret/cloudflare-secrets
Then, commit and push your changes to the repository:
git add . && git commit -m "feat: Adding SOPS and first encrypted secret for Cloudflare operator" \
&& git push origin main
Wait a few minutes and verify that it was synced:
➜ kubectl get secrets -n cloudflare-operator-system
NAME                 TYPE     DATA   AGE
cloudflare-secrets   Opaque   2      114s
k3s-cluster-tunnel   Opaque   1      10d
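If you would rather not wait for the next sync interval, the flux CLI can trigger a reconciliation on demand. This sketch assumes the flux CLI is installed locally, which the steps above do not strictly require; the function name is hypothetical, while the Kustomization name and namespace come from the manifests in this guide.

```shell
# Fetch the latest commit and apply the "infrastructure" Kustomization now,
# then list the secrets in the operator namespace.
sync_and_check() {
  flux reconcile kustomization infrastructure --with-source || return 1
  kubectl get secrets -n cloudflare-operator-system
}

# Usage:
#   sync_and_check
```

If decryption fails, the reconcile command reports the error from the Kustomization, which is usually quicker to read than digging through controller logs.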
Finally, delete the existing pods in the cloudflare-operator-system namespace so that they are recreated, and verify that everything is working by logging in to linkding or any other service exposed through Cloudflare tunnels.
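The pod restart can be sketched as follows. The function name is my own, and deleting every pod in the namespace is a blunt instrument that assumes the pods are managed by a controller that recreates them, as is the case for a Deployment.

```shell
# Delete all pods in the operator namespace so their controllers recreate
# them with the freshly synced secret, then show the current pod list.
restart_operator_pods() {
  ns=cloudflare-operator-system
  kubectl -n "$ns" delete pod --all || return 1
  kubectl -n "$ns" get pods
}

# Usage:
#   restart_operator_pods
```

The new pods may take a few moments to appear, so rerun the `get pods` command until they report Running.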
Wrapping It Up: Building a Resilient, Secure Kubernetes Foundation
Kubernetes has transformed the way we manage applications, but without robust security and disaster recovery mechanisms, even the most sophisticated clusters can crumble. By leveraging GitOps, SOPS, and persistent storage solutions, you can achieve a Kubernetes environment that is not only secure but also resilient and scalable.
Here’s a quick recap of what we covered:
- Eliminate manual secret management with SOPS, encrypting secrets directly in Git and ensuring automatic decryption during deployment.
- Build a disaster recovery plan using GitOps principles with FluxCD to restore configurations and services efficiently.
- Safeguard persistent data with NAS systems, cloud backups, and Kubernetes backup tools like Velero.
With these strategies, your Kubernetes cluster won’t just survive failures—it will recover faster, migrate smoothly, and remain secure. Whether you’re scaling across regions or simply hardening your current setup, these tools provide the foundation for long-term reliability and growth.
As Kubernetes environments evolve, continuously refining backup and security practices will ensure that your infrastructure can withstand the unexpected while maintaining high availability. Stay tuned for more deep dives as we tackle advanced techniques in future posts! 🌐🚀