53 Commits

Author SHA1 Message Date
claude
17e55ae0c9 fix: allow Kyverno egress to Gitea external for registry token exchange
All checks were successful
AI Review / AI Code Review (pull_request) Successful in 1s
PR Checks / Validate & Security Scan (pull_request) Successful in 9s
After changing Gitea ROOT_URL to https://git.georgepet.duckdns.org,
the registry V2 auth challenge redirects to the external URL.
Kyverno needs to reach 185.47.204.231:443 for token exchange.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 21:29:00 +01:00
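An egress change like this typically amounts to one extra rule in the kyverno namespace NetworkPolicy. A minimal sketch — the policy name and pod selector are assumptions; only the IP and port come from the commit message:

```yaml
# Sketch: let Kyverno pods reach the external Gitea endpoint for
# registry V2 token exchange (name/selector are assumed)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-kyverno-egress-gitea-external
  namespace: kyverno
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/part-of: kyverno
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 185.47.204.231/32
      ports:
        - protocol: TCP
          port: 443
```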
claude
d670d880af fix: allow ingress-nginx egress to port 3000 (Gitea)
Required for git.georgepet.duckdns.org ingress to reach Gitea backend.
2026-02-24 20:18:44 +01:00
claude
b9a84c674f feat: expose Gitea externally at git.georgepet.duckdns.org
Service+Endpoints pointing to 10.10.10.1:3000, Ingress with TLS.
Phase 22: Git-based PaaS deploy pipeline.
2026-02-24 20:09:28 +01:00
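A Service backed by a manual Endpoints object (no pod selector) is the standard way to point cluster DNS at an off-cluster backend. A sketch under assumed names — only the IP and port are from the commit message:

```yaml
# Selector-less Service: Kubernetes will not manage Endpoints for it,
# so the manually created Endpoints object below supplies the backend
apiVersion: v1
kind: Service
metadata:
  name: gitea-external        # assumed name
  namespace: gitea            # assumed namespace
spec:
  ports:
    - port: 3000
      targetPort: 3000
---
apiVersion: v1
kind: Endpoints
metadata:
  name: gitea-external        # must match the Service name
  namespace: gitea
subsets:
  - addresses:
      - ip: 10.10.10.1        # host-level Gitea
    ports:
      - port: 3000
```

The TLS Ingress for git.georgepet.duckdns.org then routes to this Service like any in-cluster backend.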
claude
ddc3def7c4 feat: rename naas-portal to paas-portal across all resources
- Helm chart: charts/naas-portal → charts/paas-portal
- ArgoCD app: naas-portal → paas-portal
- Environment values: naas-portal → paas-portal
- ClusterRole: naas-manager → paas-manager (operational-rbac)
- Tenant labels: naas.georgepet.duckdns.org → paas.georgepet.duckdns.org
- Secret: naas-portal-secrets → paas-portal-secrets
- Image: claude/naas-portal → claude/paas-portal
2026-02-24 18:24:21 +01:00
Claude
3dc6b0dd68 phase19: cleanup — remove unused ArgoCD apps, convert arch-docs to Deployment
Remove components not needed for PaaS-focused infrastructure:
- argo-rollouts: only used by arch-docs canary, convert to plain Deployment
- oauth2-proxy: was for dev/staging auth (removed in Phase 18)
- nginx-test: test deployment, not needed
- kube-bench: CIS benchmark scanner, not needed for PaaS
- trivy-operator: vulnerability scanner, not needed for PaaS
- drift-check RBAC: drift-check service being removed

arch-docs-prod: rollout.enabled=false → Helm uses Deployment template
2026-02-24 10:40:13 +01:00
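The rollout.enabled switch suggests the chart guards its workload templates with a conditional, so setting it false makes Helm render a plain Deployment instead of an Argo Rollout. A hypothetical sketch (file path and value names beyond rollout.enabled are assumptions):

```yaml
# templates/deployment.yaml (hypothetical) — only rendered when the
# Rollout is disabled, so arch-docs-prod falls back to a Deployment
{{- if not .Values.rollout.enabled }}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}
spec:
  replicas: {{ .Values.replicaCount }}
  # ...pod template shared with the Rollout variant omitted
{{- end }}
```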
08f4f56a21 feat: expand naas-manager RBAC for PaaS (deployments, services, ingresses, pod logs) 2026-02-24 06:56:03 +01:00
5d7051cec7 cleanup: remove oauth2-proxy-ingress.yaml 2026-02-24 06:52:02 +01:00
80ce5ba4fd refactor: keep only prod namespace in manifests.yaml 2026-02-24 06:51:58 +01:00
dd9e60c6fd fix: allow K8s API server to reach ingress-nginx admission webhook 2026-02-23 22:20:18 +01:00
136669db84 feat(netpol): add ports 8085/8089 egress and explicit ingress sources for ingress-nginx 2026-02-23 18:17:35 +01:00
b27c9ff252 feat(netpol): add HTTP-01 self-check port 80 to cert-manager controller egress 2026-02-23 18:16:54 +01:00
e44279dbd4 feat: add NaaS tenant-namespace Helm chart + test tenant t1 2026-02-23 13:32:28 +01:00
Claude
f21106dbad rbac: extend k8s-audit SA for deep-audit script
Add PVC, ArgoCD Application, and Deployment read access
so deep-audit.sh can use least-privilege k8s-audit SA
instead of admin-emergency kubeconfig.
2026-02-23 10:28:45 +01:00
root
65930ceb1e sec: remove plaintext passwords from realm ConfigMap
Use keycloak-config-cli env var substitution $(env:VAR_NAME) to inject
user passwords from K8s Secret instead of hardcoding them in ConfigMap.

- realm-configmap.yaml: passwords replaced with $(env:KC_INFRA_ADMIN_PASSWORD)
  and $(env:KC_INFRA_CLAUDE_PASSWORD)
- keycloak ArgoCD app: added keycloakConfigCli.extraEnvVarsSecret
- Secrets sourced from OpenBao via create-keycloak-secrets.sh

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 13:24:44 +01:00
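The substitution pattern from the message, in context. keycloak-config-cli resolves $(env:...) placeholders at import time when its variable substitution is enabled; the fragment below is a sketch (username taken from the realm users mentioned elsewhere in this log):

```yaml
# realm-configmap.yaml fragment — no plaintext password; the value is
# resolved from the pod environment, which is fed by the K8s Secret
users:
  - username: admin
    credentials:
      - type: password
        value: $(env:KC_INFRA_ADMIN_PASSWORD)
        temporary: false
```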
root
08b0c41f45 fix: add kubernetes OIDC client + direct-grant-no-otp flow to realm config
The kubernetes client (Phase 15) and direct-grant-no-otp auth flow were
created via API but missing from realm-configmap.yaml. A realm re-import
would lose these configurations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 12:53:50 +01:00
root
9acb62e515 chore: remove report-generator from all environments
Report-generator was a load testing application. Decommissioning:
- Remove ArgoCD app definitions (6 apps)
- Remove infra manifests (networkpolicy, secrets, seed-jobs)
- Remove Helm values (dev/staging/prod)

K8s resources already deleted via ArgoCD cascade delete.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 09:43:02 +01:00
Claude
c25bc6c9ce refactor: remove MinIO from all environments
- Remove MINIO_* env vars from dev/staging/prod values
- Remove minio-access-key and minio-secret-key from secrets
- Remove port 9000 from NetworkPolicy egress rules
- PDF stored in PostgreSQL BYTEA, MinIO no longer needed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 17:57:32 +01:00
Claude
2881b388c4 chore: increase resource quotas for VM migration
PG/MinIO removed from K8s, report-generator needs more CPU for
in-Go aggregation. Prod quota supports HPA up to 5 replicas.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 11:04:01 +01:00
Claude
67b69c31d5 feat: move PG/MinIO to external VM, update all manifests
- Delete postgresql.yaml and minio.yaml (6 files) — stateful pods removed
- NetworkPolicy: replace podSelector with ipBlock 185.47.204.228/32
- Secrets: update credentials for VM PostgreSQL and MinIO
- Values: point DB_HOST/MINIO_ENDPOINT to VM, increase resources for CPU-intensive workload
- Seed jobs: v3 targeting VM databases (reports_dev/staging/prod)
- Prod: HPA 1-5 replicas, CPU req 1/lim 4, mem req 1Gi/lim 4Gi

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 10:44:00 +01:00
Claude
56d395864b fix: rewrite seed job to avoid PL/pgSQL $$ blocks
The $$ dollar-quoting in PL/pgSQL DO blocks gets expanded by bash (to the
shell's PID) when the script is passed via Kubernetes args to bash -c.
Rewrites the seed to use individual psql -c calls and a shell loop for
batch inserts.

Also adds CREATE TABLE IF NOT EXISTS and idempotency check.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 00:04:06 +01:00
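The failure mode is easy to reproduce: inside bash -c, $$ is the shell's PID, so a PL/pgSQL block like DO $$ BEGIN ... END $$ reaches psql mangled. A minimal demonstration (the psql line is a commented placeholder):

```shell
# Inside `bash -c`, $$ expands to the shell's PID — the dollar-quoting
# never survives to psql:
mangled=$(bash -c 'echo DO $$ BEGIN END $$')
echo "$mangled"    # e.g. "DO 12345 BEGIN END 12345"

# The fix from the commit: one plain statement per `psql -c`, no DO
# blocks (host/database are placeholders):
# psql -h "$DB_HOST" -d reports_prod -c 'CREATE TABLE IF NOT EXISTS orders (id bigint)'
```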
Claude
317f371542 fix: correct NetworkPolicy labels for report-generator pods
web-app chart uses Release.Name as app.kubernetes.io/name, so labels are
report-generator-dev/staging/prod, not just report-generator.
2026-02-19 23:48:13 +01:00
Claude
247beaca76 feat: add report-generator app (Go + PostgreSQL + MinIO) for load testing
- 6 ArgoCD apps (API + infra for dev/staging/prod)
- PostgreSQL StatefulSet + MinIO Deployment per namespace
- NetworkPolicies for app-to-db and app-to-minio
- Seed Job (5M orders, 100K customers, 10K products)
- HPA enabled in prod (2-5 replicas, 70% CPU target)
- Helm values with path-based ingress /reports on existing hosts
2026-02-19 23:40:34 +01:00
Claude
04be7fa15f fix(keycloak): proper passwords in realm config + remove forced TOTP
Root cause of recurring login failures:
- Password 'changeme' didn't meet realm password policy (12+ chars, digits, special)
- keycloak-config-cli failed HTTP 400 on every pod restart
- Failed state meant config-cli retried full import every restart
- requiredActions re-added CONFIGURE_TOTP on every restart

Fixes:
- Set proper passwords meeting password policy requirements
- Set temporary: false (no forced password change)
- Clear requiredActions on user level (realm defaultAction handles new users)
- Config-cli should now succeed and save state, preventing re-import loops

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 17:48:26 +01:00
Claude
944f00f23c fix: PostgreSQL NodePort in valid range (32432)
NodePort 35432 was outside the default K8s NodePort range (30000-32767).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 15:54:49 +01:00
Claude
f71c583d69 Phase 16: fine-grained RBAC (infra-operators) + DB rotation prep
- Add infra-operators group to Keycloak realm
- Add K8s RBAC: operators get full CRUD in dev/staging, readonly in prod,
  cluster-level readonly for nodes/namespaces/storage, no infra ns access
- Update ArgoCD RBAC: operators → role:readonly
- Update oauth2-proxy: allow infra-operators group
- Add PostgreSQL NodePort (35432) for OpenBao Database engine access
- Update NetworkPolicy: allow NodePort traffic from node CIDR
- Extend keycloak-secrets-manager Role: statefulset get/patch for rotation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 15:33:23 +01:00
root
ebf830bee1 Add OIDC RBAC for Keycloak groups (Phase 15)
2026-02-19 14:01:36 +01:00
root
dbe72075fb fix: allow ingress-nginx egress to oauth2-proxy port 4180
Ingress controller needs to reach oauth2-proxy for auth_request
subrequests on dev/staging arch-docs. Port 4180 was missing from
the egress rules, causing timeout on all auth-protected routes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 11:29:33 +01:00
root
3992d69c8e feat: switch Kyverno image verification to Enforce mode
All current images in dev/staging/prod are signed with cosign.
CI pipeline signs new images automatically.
Enforce mode will block unsigned images from our registry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 10:11:26 +01:00
root
3821b508f9 fix: add gitea DNS resolution for Kyverno signature verification
Gitea registry (ROOT_URL=http://gitea:3000) redirects V2 token auth
to http://gitea:3000/v2/token. K8s pods can't resolve the 'gitea' Docker
hostname, so this Service+Endpoints pair maps 'gitea' to 10.10.10.1 in
the kyverno namespace.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 09:49:07 +01:00
root
f8f27657f1 fix: skip Rekor tlog and SCT verification in Kyverno policy
Private infrastructure has no internet access from K8s nodes.
Kyverno was failing to verify signatures because it tried to
fetch Rekor TUF root from tuf-repo-cdn.sigstore.dev (timeout).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 09:39:08 +01:00
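In Kyverno's verifyImages schema this maps to per-attestor flags on the keys entry. A fragment sketch — the image reference and key material are placeholders:

```yaml
# ClusterPolicy rule fragment: key-based verification with the
# Sigstore transparency-log checks disabled for an air-gapped cluster
verifyImages:
  - imageReferences:
      - "gitea:3000/*"          # placeholder registry prefix
    attestors:
      - entries:
          - keys:
              publicKeys: |-    # placeholder key material
                -----BEGIN PUBLIC KEY-----
                ...
                -----END PUBLIC KEY-----
              rekor:
                ignoreTlog: true   # skip Rekor transparency-log lookup
              ctlog:
                ignoreSCT: true    # skip signed certificate timestamp check
```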
root
710be91b06 fix: set mutateDigest=false for Kyverno Audit mode policy
2026-02-18 06:23:22 +01:00
root
4188d1dd6f feat: add Kyverno admission controller + cosign image verification
- Deploy Kyverno v1.13.4 (chart 3.3.4) via ArgoCD Helm chart
- Add ClusterPolicy to verify cosign signatures on registry images (Audit mode)
- Add NetworkPolicy for kyverno namespace (default-deny + selective allow)
- Extend keycloak-secrets-manager RBAC to kyverno namespace for cosign key sync
- ArgoCD Application for kyverno-policies directory
2026-02-18 06:06:07 +01:00
root
b7ee0875b8 feat: add NetworkPolicy for cert-manager and ingress-nginx
Default-deny + selective allow policies:
- cert-manager: DNS, K8s API, ACME HTTPS, webhook ingress, Prometheus scrape
- ingress-nginx: DNS, K8s API, external HTTP/HTTPS, backend forwarding

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 21:47:50 +01:00
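The default-deny baseline that the selective allows build on is the canonical empty-selector policy:

```yaml
# Selects every pod in the namespace and declares both directions,
# denying all traffic not matched by a more specific allow policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: cert-manager      # same shape used for ingress-nginx
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```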
root
1ab66afa5f feat: add operational RBAC — scoped ServiceAccounts for scripts
Create least-privilege ServiceAccounts for k8s-audit, drift-check,
and keycloak-secrets-manager instead of sharing admin kubeconfig.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 21:12:02 +01:00
root
cc8dac59e8 fix(keycloak): use HmacSHA1 for TOTP (Google Authenticator compatible)
HmacSHA256 is not supported by Google Authenticator.
SHA1 is the default TOTP algorithm per RFC 6238.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 12:24:03 +01:00
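The SHA-1 default is easy to check against the RFC test vectors. A minimal Python sketch of HOTP (RFC 4226) and its TOTP wrapper (RFC 6238), unrelated to the Keycloak codebase itself:

```python
import hmac, hashlib, struct, time

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over the big-endian 8-byte counter,
    then dynamic truncation to `digits` decimal digits."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, period: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP with a time-step counter (SHA-1 default)."""
    return hotp(secret, int(time.time()) // period, digits)

# RFC 4226 test vector: counter 0 with this secret yields "755224"
print(hotp(b"12345678901234567890", 0))
```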
root
239102bdfe feat(keycloak): enable mandatory TOTP MFA for all users
- Configure OTP policy: TOTP, SHA256, 6 digits, 30s period
- Add CONFIGURE_TOTP as default required action for new users
- Force TOTP enrollment for existing users (admin, claude)
- Add password policy: min 12 chars, mixed case, digits, special chars
- Keycloak browser flow conditional OTP will enforce TOTP on every login

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 11:44:05 +01:00
root
efd636480b fix: pod-cleanup image bitnami/kubectl:1.35 → bitnamilegacy/kubectl:1.33
bitnami removed Docker Hub images (Aug 2025). The bitnami/kubectl:1.35 tag
does not exist, causing ImagePullBackOff. Switch to bitnamilegacy/kubectl:1.33
which is the latest available tag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 10:28:34 +01:00
root
623bb9aee7 fix: oauth2-proxy ingress port 4180 → 80 (service port)
Service exposes port 80 (→ targetPort 4180), ingress must use 80.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 21:49:42 +01:00
root
0aba0e7a87 feat(keycloak): move to localhost:30880 via SSH tunnel
- Disable external ingress, add NodePort 30880
- Set KC_HOSTNAME=127.0.0.1:30880 (fixed issuer for OIDC)
- oauth2-proxy: skip-oidc-discovery + explicit K8s internal URLs
- ArgoCD: remove OIDC (already behind SSH tunnel, will add Dex later)
- Realm: sslRequired=none for HTTP access via tunnel

Access: user SSH tunnel → localhost:30880 → K8s NodePort → Keycloak

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 21:37:42 +01:00
root
8ff10f8cdb fix(keycloak): remove 'openid' from defaultDefaultClientScopes
In Keycloak 26, 'openid' is implicit in the OIDC protocol and doesn't
exist as a named client scope. This caused the config-cli import to fail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 20:48:48 +01:00
root
2277d3592d feat: add Keycloak SSO + oauth2-proxy + ArgoCD OIDC config
- Keycloak (Bitnami Helm chart) with PostgreSQL on Longhorn
- oauth2-proxy for arch-docs dev/staging auth
- ArgoCD OIDC integration via ConfigMap
- Realm 'infrastructure': users admin/claude, groups infra-admins/infra-bots
- 4 OIDC clients: grafana, argocd, gitea, oauth2-proxy
- NetworkPolicy: default-deny + selective allow
- oauth2-proxy ingress for dev/staging subdomains
2026-02-16 19:48:43 +01:00
root
21f5794851 feat: Longhorn S3 backup to MinIO (daily, retain 7)
2026-02-16 16:19:24 +01:00
root
f9a71e2fec fix: update pod-cleanup kubectl image to 1.35
Match K8s cluster version (v1.35.1). The old bitnami/kubectl:1.31
image is no longer available in Docker Hub.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 15:11:23 +01:00
root
4a5a657e14 Remove visual test infrastructure
Remove visual-test-egress NetworkPolicy, allowVisualTest Helm flag,
and staging override. Visual testing proved ineffective at detecting
diagram rendering issues.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 19:02:47 +01:00
root
dc9f801e13 feat: Add visual-test NetworkPolicy support for AI screenshot testing
- Helm chart: add allowVisualTest flag to enable ingress from visual-test pods
- Staging: enable allowVisualTest for arch-docs
- Namespaces: add visual-test-egress NetworkPolicy in staging
  (allows egress to app pods on 8080 + external HTTPS for OpenRouter/Telegram)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 15:06:33 +01:00
root
3ce69b7892 feat: Add pod-cleanup CronJob + ArgoCD app (Phase 8.4)
Some checks failed
AI Review / AI Code Review (pull_request) Successful in 1s
PR Checks / Validate & Security Scan (pull_request) Failing after 5s
Daily cleanup of completed/failed/evicted pods at 03:00 UTC.
Runs on master node with proper RBAC (ServiceAccount + ClusterRole).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 19:57:41 +01:00
root
ac17a06fa7 feat: add Trivy Operator + kube-bench CronJob (Phase 8.3)
Trivy Operator: continuous vulnerability scanning of running images
kube-bench: weekly CIS benchmark on control plane node

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 19:21:28 +01:00
root
6cf51236dc Add NetworkPolicy: allow nginx-ingress + cert-manager HTTP-01 solver
- allow-ingress-from-nginx: permit traffic from ingress-nginx namespace
- allow-cert-manager-http01: permit solver pod ingress (8089) and egress
- Applied to dev, staging, prod namespaces
2026-02-14 14:32:25 +01:00
601bfac348 Add dev/staging/prod namespaces with ResourceQuota + NetworkPolicy 2026-02-14 12:44:11 +01:00
d137b5bdf2 Enterprise: NetworkPolicy default-deny + LimitRange + ResourceQuota 2026-02-14 10:09:08 +01:00