Architecture¶
This page explains how sikifanso sets up and manages Kubernetes clusters for AI agent infrastructure.
Directory structure¶
Each cluster's state lives under ~/.sikifanso/clusters/<name>/:
```
~/.sikifanso/clusters/<name>/
+-- session.yaml                 # Cluster metadata, credentials, ports
+-- gitops/                      # Local git repo (mounted into cluster)
    +-- bootstrap/
    |   +-- root-app.yaml        # ApplicationSet for custom apps
    |   +-- root-catalog.yaml    # ApplicationSet for catalog apps
    +-- apps/                    # Custom user-supplied Helm apps
    |   +-- coordinates/
    |   |   +-- <app>.yaml       # Helm chart coordinates (repo, chart, version, namespace)
    |   +-- values/
    |       +-- <app>.yaml       # Helm values overrides
    +-- catalog/                 # AI agent infrastructure catalog
    |   +-- <app>.yaml           # App definition with enabled flag
    |   +-- values/
    |       +-- <app>.yaml       # Helm values overrides
    +-- agents/                  # Agent sandbox definitions
    |   +-- <agent>.yaml         # Agent definition (name, quotas)
    |   +-- values/
    |       +-- <agent>.yaml     # Helm values for agent-template chart
    +-- infra/                   # Infrastructure config overrides
        +-- *.yaml               # Deep-merged with compiled-in defaults
```
How the local gitops repo works¶
The gitops/ directory is a regular git repository on your filesystem. During cluster creation, it is scaffolded from a bootstrap template repo.
This directory is mounted into the k3d cluster at /local-gitops via a hostPath volume. ArgoCD's repo-server reads from it directly -- no remote git server needed.
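In k3d terms, such a mount can be expressed as a volume mapping. A hedged illustration using k3d's declarative config format (the paths and cluster name are examples, not what sikifanso actually generates):

```yaml
# Illustrative k3d config fragment: mount the host gitops directory
# into the server node at /local-gitops.
apiVersion: k3d.io/v1alpha5
kind: Simple
metadata:
  name: dev
volumes:
  - volume: /home/user/.sikifanso/clusters/dev/gitops:/local-gitops
    nodeFilters:
      - server:0
```

With the directory present inside the node, ArgoCD's repo-server can be pointed at the mounted path, so commits to the host directory become visible on the next sync without any push step.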
AI Agent Infrastructure Catalog¶
The catalog contains curated tools organized by function:
| Category | Tools | Purpose |
|---|---|---|
| gateway | LiteLLM Proxy | LLM API routing, cost tracking, multi-provider support |
| observability | Langfuse, Prometheus+Grafana, Loki, Tempo | LLM tracing, metrics, logs, distributed tracing |
| guardrails | Guardrails AI, NeMo Guardrails, Presidio | Output validation, safety rails, PII redaction |
| rag | Qdrant, Text Embeddings Inference, Unstructured | Vector storage, embeddings, document parsing |
| runtime | Temporal, External Secrets, OPA | Workflow orchestration, secrets management, policy |
| models | Ollama | Local LLM inference and model management |
| storage | PostgreSQL, Valkey (Redis) | Supporting data stores for other tools |
Each tool is a self-contained YAML file with Helm chart coordinates and an enabled flag. ArgoCD's ApplicationSet deploys only enabled entries.
Root ApplicationSets¶
The bootstrap template includes two ApplicationSet manifests that manage two tracks of apps:
Custom apps (root-app.yaml)¶
Uses the git file generator to watch apps/coordinates/*.yaml. Each coordinate file defines a Helm chart source:
```yaml
name: podinfo
repoURL: https://stefanprodan.github.io/podinfo
chart: podinfo
targetRevision: 6.10.1
namespace: podinfo
```
The ApplicationSet creates a multi-source ArgoCD Application for every matching coordinate file, pairing it with the corresponding values file at apps/values/<name>.yaml. Adding or removing a coordinate file is all it takes to deploy or undeploy.
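A hedged sketch of how such a generator and multi-source template could fit together. The repo URL, project, and template wiring below are assumptions for illustration, not the shipped root-app.yaml:

```yaml
# Illustrative ApplicationSet: one Application per coordinate file,
# pairing the chart source with a values file from the gitops repo.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: root-app
spec:
  generators:
    - git:
        repoURL: file:///local-gitops
        revision: HEAD
        files:
          - path: apps/coordinates/*.yaml
  template:
    metadata:
      name: '{{name}}'
    spec:
      project: default
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{namespace}}'
      sources:
        - repoURL: '{{repoURL}}'
          chart: '{{chart}}'
          targetRevision: '{{targetRevision}}'
          helm:
            valueFiles:
              - $values/apps/values/{{name}}.yaml
        - repoURL: file:///local-gitops
          targetRevision: HEAD
          ref: values
```

The second source uses ArgoCD's `$values` ref pattern so Helm values can come from the gitops repo while the chart itself comes from an external Helm repository.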
Catalog apps (root-catalog.yaml)¶
Uses the git file generator to watch catalog/*.yaml, but only generates Applications for entries where enabled: true. Each catalog entry has additional metadata:
```yaml
name: litellm-proxy
category: gateway
description: LLM API gateway with multi-provider routing, cost tracking, and rate limiting
repoURL: https://litellm.github.io/helm-charts
chart: litellm
targetRevision: "0.2.1"
namespace: gateway
enabled: false
```
Setting enabled: true and committing causes ArgoCD to create and sync the Application. Setting it back to false causes ArgoCD to prune/delete it. Values overrides live at catalog/values/<name>.yaml.
How app sync works¶
ArgoCD's default reconciliation interval is 180 seconds. The sikifanso app sync command bypasses this by sending a webhook push event (mimicking a GitHub push notification) to two endpoints:
- ArgoCD server -- invalidates the repo-server's git revision cache, causing it to re-read the local gitops repo
- ApplicationSet controller -- triggers immediate re-evaluation of the git generator, picking up new or removed coordinate files
The ApplicationSet controller webhook is reached via the Kubernetes API server proxy, so no extra ports are exposed.
Cluster components¶
When you run sikifanso cluster create, the following happens in order:
1. k3d cluster is created as a single-node cluster running k3s v1.29. The flannel CNI and kube-proxy are disabled (Cilium replaces both).
2. Cilium is installed via Helm as a full kube-proxy replacement with ingress controller and Hubble UI enabled.
3. ArgoCD is installed via Helm, configured to use the hostPath-mounted gitops repo as its source.
4. GitOps repo is cloned from the bootstrap template, and both root ApplicationSets are applied.
5. Ports are mapped from the cluster to your host (ArgoCD UI, Hubble UI).
State management¶
Cluster metadata is persisted to ~/.sikifanso/clusters/<name>/session.yaml and includes:
- Cluster state (running / stopped)
- ArgoCD URL, username, password
- Hubble UI URL
- GitOps repo path
- k3d configuration (image, node counts)
- Port mappings
- Bootstrap template URL and version
This file is read on every CLI command to locate and interact with the cluster.
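For illustration, a session file with the fields listed above might be shaped like this; the key names are guesses derived from that list, not sikifanso's actual schema, and angle-bracket values are placeholders:

```yaml
# Illustrative session.yaml shape (field names are assumptions)
name: dev
state: running
argocd:
  url: http://localhost:30080
  username: admin
  password: <generated>
hubble_url: http://localhost:30081
gitops_path: ~/.sikifanso/clusters/dev/gitops
k3d:
  image: <k3s image>
  servers: 1
  agents: 0
ports:
  argocd: 30080
  hubble: 30081
bootstrap:
  template: <template URL>
  version: <version>
```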
Snapshot storage¶
Snapshots are stored at ~/.sikifanso/snapshots/<name>.tar.gz. Each archive contains the session metadata and the full gitops repo directory, allowing a cluster's configuration to be captured and restored independently of the running infrastructure.
Port allocation¶
Each cluster gets its own set of ports. Defaults are:
- 30080 -- ArgoCD UI
- 30081 -- Hubble UI
If defaults are taken by another cluster, sikifanso automatically finds free ports. Port assignments are stored in session.yaml.
The dashboard listens on :9090 by default, independent of cluster port mappings (configurable via --addr).
Profile system¶
Profiles are named presets that map to a set of catalog apps. When --profile is passed to cluster create, the CLI enables each app in the profile's set after the cluster is bootstrapped. Profiles are composable -- --profile agent-dev,rag takes the union of both app sets with no duplicates.
Profile definitions are compiled into the binary (in internal/profile/), not stored in the gitops repo. This means profiles are versioned with the CLI, and the same profile name always produces the same set of apps for a given CLI version.
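The union semantics can be sketched as an order-preserving set merge. The profile names and app sets below are illustrative, not the compiled-in definitions:

```go
package main

// mergeProfiles returns the union of the named profiles' app lists,
// preserving first-seen order and dropping duplicates -- the behavior
// described for --profile agent-dev,rag.
func mergeProfiles(profiles map[string][]string, names ...string) []string {
	seen := map[string]bool{}
	var out []string
	for _, n := range names {
		for _, app := range profiles[n] {
			if !seen[app] {
				seen[app] = true
				out = append(out, app)
			}
		}
	}
	return out
}
```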
Agent sandbox architecture¶
Agent sandboxes use the sikifanso-agent-template Helm chart to create isolated namespaces. Each sandbox includes:
- Namespace -- agent-<name>, dedicated to one agent workload
- ResourceQuota -- limits CPU, memory, and pod count (configurable per-agent)
- NetworkPolicy -- Cilium-enforced rules: default-deny egress, allowlisted access to LiteLLM Proxy, Qdrant, PostgreSQL, and Valkey; no cross-agent traffic; no Kubernetes API access
- ServiceAccount -- dedicated identity for the agent workload
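As an illustration of what the agent-template chart might render, here is a hedged sketch of a per-agent ResourceQuota and one Cilium egress allow rule; the resource names, namespace, limits, and port are invented for the example:

```yaml
# Illustrative quota for one agent sandbox
apiVersion: v1
kind: ResourceQuota
metadata:
  name: agent-quota
  namespace: agent-demo
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 4Gi
    pods: "10"
---
# Illustrative allowlist rule: permit egress only to the gateway namespace
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-litellm-egress
  namespace: agent-demo
spec:
  endpointSelector: {}
  egress:
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: gateway
      toPorts:
        - ports:
            - port: "4000"
              protocol: TCP
```

Because Cilium policies default-deny once any egress rule selects an endpoint, traffic not matched by an allow rule (including cross-agent and Kubernetes API traffic) is dropped.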
Agent definitions live in the gitops repo at agents/<name>.yaml with Helm values at agents/values/<name>.yaml. The root ApplicationSet picks up these files and creates ArgoCD Applications that deploy the agent-template chart.
MCP server architecture¶
The MCP server (internal/mcp/) exposes 25 tools across 6 categories: cluster management, catalog/profiles, agents, ArgoCD, Kubernetes, and health. It uses the Model Context Protocol Go SDK with stdio transport.
Each MCP tool calls the same internal functions that the CLI commands use -- there are no separate code paths or elevated privileges. The server is designed to be launched as a subprocess by MCP clients (Claude Code, Claude Desktop, Cursor, etc.) via sikifanso mcp serve.
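MCP clients launch the server from a small config entry. For example, a Claude Desktop claude_desktop_config.json entry might look like the following (the "sikifanso" server key is an arbitrary label chosen here, and the binary is assumed to be on PATH):

```json
{
  "mcpServers": {
    "sikifanso": {
      "command": "sikifanso",
      "args": ["mcp", "serve"]
    }
  }
}
```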
Infrastructure config¶
Infrastructure configuration (internal/infraconfig/) controls the versions and settings of cluster components (Cilium, ArgoCD). The system uses a deep-merge approach:
- Compiled-in defaults (internal/infraconfig/defaults/) -- shipped with the binary
- User overrides (gitops/infra/*.yaml) -- optional per-cluster customizations
Users only need to specify the fields they want to change. The merge produces the final configuration used during cluster creation and upgrades.
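The merge rule can be sketched in Go: nested maps merge key-by-key, and any other value in the override replaces the default. This is a minimal sketch of the deep-merge behavior described above, not the actual infraconfig code:

```go
package main

// deepMerge overlays src onto dst without mutating either: when both
// sides hold a nested map the merge recurses, otherwise the src value
// wins. Keys absent from src keep their dst (default) value.
func deepMerge(dst, src map[string]any) map[string]any {
	out := map[string]any{}
	for k, v := range dst {
		out[k] = v
	}
	for k, v := range src {
		if sv, ok := v.(map[string]any); ok {
			if dv, ok := out[k].(map[string]any); ok {
				out[k] = deepMerge(dv, sv)
				continue
			}
		}
		out[k] = v
	}
	return out
}
```

So an infra override that sets only a component version leaves every other default (for example, a nested feature toggle) untouched.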