
System Architecture

This chapter maps the Cirrus CDN system architecture end-to-end, covering deployment topology, component responsibilities, runtime dependencies, and data flows between subsystems. The descriptions below are derived directly from the source modules, including control-plane/src/cirrus/app.py, control-plane/src/cirrus/celery_app.py, control-plane/src/cirrus/cname/, and openresty/.

High‑Level Architecture

Redis acts as the central state store and pub/sub bus for all tiers. OpenResty reads per-domain configuration (cdn:dom:{domain}) and TLS assets (cdn:cert:{domain}) from Redis and subscribes to cdn:purge for cache invalidation.
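
For illustration, a minimal sketch of reading that per-domain state with redis-py, assuming a local Redis; the fields inside DomainConf are not spelled out here:

```python
# Minimal sketch: read the per-domain state the edge consumes, assuming a
# local Redis and the key names above. DomainConf field names are illustrative.
import json

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def inspect_domain(domain: str) -> None:
    conf_raw = r.get(f"cdn:dom:{domain}")   # JSON-encoded DomainConf
    cert = r.hgetall(f"cdn:cert:{domain}")  # TLS material loaded at SNI time
    if conf_raw is None:
        print(f"{domain}: not configured")
        return
    conf = json.loads(conf_raw)
    print(f"{domain}: config keys = {sorted(conf)}")
    print(f"{domain}: cert fields = {sorted(cert)}")

inspect_domain("example.com")
```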

Control vs Data Plane

Control Plane:

  • Configuration store (Redis)
  • Pub/sub + task queue (Redis + Celery)
  • DNS scheduler/zone builder

Data Plane:

  • Edge cache and routing
  • Health checks and self-heal
  • Logs and Prometheus metrics exporter

High Availability

  • Multi-region active-active with independent edge nodes and DNS replicas
  • Zero-downtime rollouts via templated Compose/Ansible (blue/green or rolling)
  • Automated failover: NOTIFY-driven secondaries and health-aware node activation
  • Self-healing loops: Celery health checks adjust active state and trigger zone rebuilds (see the sketch after this list)
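
As an illustration of that self-healing loop, here is a minimal sketch, assuming node hashes at cdn:node:{id} with ip, health_fail, and active fields; the field names, probe, and threshold are assumptions, not the real schema:

```python
# Minimal sketch of the self-healing loop described above. Field names,
# probe, and threshold are illustrative assumptions, not the real schema.
import socket

import redis
from celery import Celery

app = Celery("cirrus", broker="redis://localhost:6379/0")
r = redis.Redis(decode_responses=True)

FAIL_THRESHOLD = 3  # consecutive failures before deactivation (assumed)

def probe(ip: str, port: int = 8080, timeout: float = 2.0) -> bool:
    """Cheap TCP reachability check standing in for a real health probe."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False

@app.task
def check_nodes() -> None:
    dirty = False
    for node_id in r.smembers("cdn:nodes"):
        key = f"cdn:node:{node_id}"
        node = r.hgetall(key)
        fails = 0 if probe(node.get("ip", "127.0.0.1")) else int(node.get("health_fail", 0)) + 1
        active = "1" if fails < FAIL_THRESHOLD else "0"
        dirty = dirty or active != node.get("active")
        r.hset(key, mapping={"health_fail": fails, "active": active})
    if dirty:
        r.publish("cdn:cname:dirty", "*")  # ask the scheduler to rebuild zones
```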

Deployment Topology

docker-compose.yml orchestrates services:

  • redis: Primary data store with AOF persistence.
  • api: FastAPI server (serves REST + static Next.js export).
  • openresty: Edge proxy on the host network exposing 8080/8443; Prometheus metrics on 9145.
  • worker and beat: Celery worker/beat using Redis broker/backend.
  • caddy: Local ACME directory (development CA).
  • acmedns: acme-dns authority for DNS-01 challenges.
  • nsd: Authoritative DNS secondary receiving NOTIFY from the hidden master.
  • prometheus, grafana, fakedns: ops support.

Services api, worker, beat, caddy, acmedns, and nsd share the acmetest bridge network with static IPs for predictable trust and routing.

Build & Runtime Artefacts

Python & Frontend Image (Dockerfile)

  • Stage 1 builds the Next.js frontend with pnpm build, caching .next artifacts.
  • Stage 2 (ghcr.io/astral-sh/uv:python3.13-trixie-slim) installs Python deps via uv sync (respects uv.lock), bundles the static export under /app/static, and uses docker/entrypoint.sh to patch CA trust and launch Uvicorn.

OpenResty Image (openresty/Dockerfile)

  • Builder installs lua-resty-http, nginx-lua-prometheus, and compiles nginx_cache_multipurge.
  • Templater renders nginx.conf from openresty/conf/nginx.conf.j2 (ports, resolver, cache sizing, metrics ACL).
  • Runtime seeds a dummy TLS cert and loads Lua: access_router.lua (routing/caching), ssl_loader.lua (SNI-time cert load from Redis), redis_subscriber.lua (purge listener).

Prometheus Image (prometheus/Dockerfile)

The build selects prometheus.dev.yml or prometheus.prod.yml; the default scrape configuration targets OpenResty at 127.0.0.1:9145 and the FastAPI /metrics endpoint.
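
For context, a minimal sketch of how a FastAPI app can expose such a /metrics endpoint with prometheus-client; the actual app's wiring may differ:

```python
# Minimal sketch: expose /metrics from FastAPI for the Prometheus scrape
# above, using prometheus-client; the real app's wiring may differ.
from fastapi import FastAPI
from prometheus_client import make_asgi_app

app = FastAPI()
app.mount("/metrics", make_asgi_app())  # scraped alongside OpenResty's 9145
```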

Configuration & State Management

Redis is the single source of truth. Key namespaces include:

  • cdn:domains (set) – all managed domain names.
  • cdn:dom:{domain} – JSON-encoded domain configuration (DomainConf).
  • cdn:nodes (set) – known edge node IDs.
  • cdn:node:{id} – hash containing node IPs, health counters, and the active flag.
  • cdn:cert:{domain} (hash) – TLS fullchain and private key for SNI load in OpenResty.
  • cdn:acme:{domain} – ACME registration state, including acme-dns credentials.
  • cdn:acme:lock:{domain} / cdn:acme:task:{domain} – concurrency locks for issuance tasks.
  • cdn:tokens, cdn:token:{id}, cdn:token_hash:{hash} – service token registry and lookups.
  • cdn:acmeacct:global – shared ACME account key material.
  • cdn:acmecertkey:{domain} – stored certificate private keys for reuse across renewals.
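
A minimal sketch of registering a domain under this layout; only the key names come from the table above, the DomainConf payload is a stand-in:

```python
# Minimal sketch of registering a domain under this key layout; the
# DomainConf payload is illustrative, only the key names come from the table.
import json

import redis

r = redis.Redis(decode_responses=True)

def register_domain(domain: str, origin: str) -> None:
    conf = {"origin": origin}  # stand-in for a real DomainConf
    pipe = r.pipeline()
    pipe.sadd("cdn:domains", domain)                 # index of managed domains
    pipe.set(f"cdn:dom:{domain}", json.dumps(conf))  # per-domain configuration
    pipe.execute()

register_domain("example.com", "https://origin.example.net")
```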

Pub/sub channels include cdn:cname:dirty (for DNS zone rebuilds) and cdn:purge (cache invalidation).
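
A minimal sketch of publishing to both channels; the cdn:purge payload follows the {domain, path} convention noted under Integration Interfaces, while the cdn:cname:dirty payload here is an assumption:

```python
# Minimal sketch: publish a purge event and mark a domain dirty so zones
# get rebuilt. The cdn:cname:dirty payload shape is an assumption.
import json

import redis

r = redis.Redis(decode_responses=True)
r.publish("cdn:purge", json.dumps({"domain": "example.com", "path": "/assets/*"}))
r.publish("cdn:cname:dirty", "example.com")
```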

Data Flows

DNS & Traffic Engineering

  • Hidden master (HiddenMasterServer) serves authoritative responses and supports AXFR to NSD secondaries.
  • Zone generation picks replicas_per_site nodes per domain using rendezvous hashing (rendezvous_topk; sketched after this list).
  • OpenResty routes per request via access_router.lua using domain config from Redis; optional slice caching and rule-based cache controls.
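
A minimal sketch of rendezvous top-k selection in the spirit of rendezvous_topk; the hash function and tie-breaking here are assumptions, not the actual implementation:

```python
# Minimal sketch of rendezvous (highest-random-weight) top-k selection; the
# hash function and tie-breaking are assumptions, not the real rendezvous_topk.
import hashlib

def rendezvous_topk(domain: str, nodes: list[str], k: int) -> list[str]:
    def score(node: str) -> int:
        digest = hashlib.sha256(f"{domain}:{node}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return sorted(nodes, key=score, reverse=True)[:k]

# Each domain gets a stable subset; removing a node only remaps the domains
# that had scored it highest, which keeps assignment stable under churn.
print(rendezvous_topk("example.com", ["edge-a", "edge-b", "edge-c"], k=2))
```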

Dependencies & Integrations

  • Redis (redis.asyncio in Python, resty.redis in Lua) – configuration store, cache, and pub/sub.
  • Celery – asynchronous task execution with Redis broker/backend; periodic health checks and ACME renewal scans.
  • acme-dns – DNS-01 validation authority for ACME issuance.
  • Caddy – development-only ACME CA; workers trust it via entrypoint CA bundle.
  • NSD – authoritative DNS secondaries receiving NOTIFY from HiddenMasterServer.
  • Prometheus & Grafana – metrics scraping and visualization for API and OpenResty.

Integration Interfaces (Third‑party)

  • REST API – All control interactions are exposed under /api/v1/* with cookie sessions or bearer service tokens. See Control Plane API & Data Model for endpoints and schemas. The default FastAPI docs are available at /docs when enabled; a token-based call is sketched after this list.
  • Metrics – Prometheus scrapes FastAPI (/metrics) and OpenResty (9145/metrics). Use remote write or data source plugins to integrate with external monitoring/BI platforms.
  • Logs – OpenResty emits access/error logs to stdout/stderr and the Loki Docker logging driver forwards them. Query via Grafana/LogQL or export through the Loki API for downstream analysis.
  • Redis Pub/Sub – The edge layer subscribes to cdn:purge. External producers can publish purge events by writing JSON payloads {domain, path} to this channel (observe access controls).
  • DNS – Hidden master emits NOTIFY to NSD secondaries. External DNS stacks can slave from the hidden master using AXFR (allow list enforced in server settings).
  • ACME – Integrates with acme-dns for DNS‑01 and a local CA (Caddy) in development. Production environments can point ACME_DIRECTORY to a public CA.
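
As a hedged example of the REST surface, the sketch below calls a hypothetical /api/v1/domains route with a bearer service token; consult the Control Plane API & Data Model chapter for the real endpoints:

```python
# Minimal sketch of a control-plane call with a bearer service token; the
# /api/v1/domains route and token value are illustrative placeholders.
import requests

API = "http://localhost:8000"
TOKEN = "<service-token>"  # issued by the control plane

resp = requests.get(
    f"{API}/api/v1/domains",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```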

Environments

The Docker composition targets local development; production deployments leverage Ansible playbooks referenced by the just deploy recipe (see ansible/). Environment variables (such as CNAME_BASE_DOMAIN, ACME_DIRECTORY, DNS_MASTER_PORT) must be tuned per environment. See the Appendices for a consolidated list.
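
A minimal sketch of reading those variables; only the names come from this section, the defaults are illustrative:

```python
# Minimal sketch of per-environment tuning; the variable names come from the
# text above, the defaults shown are illustrative only.
import os

CNAME_BASE_DOMAIN = os.environ.get("CNAME_BASE_DOMAIN", "cdn.example.com")
ACME_DIRECTORY = os.environ.get("ACME_DIRECTORY", "https://caddy/acme/local/directory")
DNS_MASTER_PORT = int(os.environ.get("DNS_MASTER_PORT", "5300"))
```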

Scalability Considerations

  • API/server processes are stateless aside from an in-memory session cache; they can scale horizontally once the session mechanism is migrated to Redis (sketched after this list).
  • OpenResty scales via additional nodes registered through the /api/v1/nodes API; rendezvous hashing keeps per-domain assignment stable under churn and avoids hot-spotting.
  • DNS hidden master remains single-instance; NSD can scale out for regional redundancy.
  • Redis is a critical dependency; consider managed or clustered (or multi-AZ) deployments for production workloads.
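
For the first point, a minimal sketch of a Redis-backed session store, assuming a cdn:session:{id} key prefix and a one-hour TTL (both illustrative):

```python
# Minimal sketch of moving sessions from process memory to Redis so API
# replicas can scale out; the key prefix and TTL are illustrative.
import secrets

import redis

r = redis.Redis(decode_responses=True)
SESSION_TTL = 3600  # seconds; assumed

def create_session(user_id: str) -> str:
    sid = secrets.token_urlsafe(32)
    r.setex(f"cdn:session:{sid}", SESSION_TTL, user_id)
    return sid

def resolve_session(sid: str) -> str | None:
    return r.get(f"cdn:session:{sid}")
```
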
Tip: Use just up for local orchestration, just pytest for backend tests, and pnpm dev inside control-plane/frontend/ for UI iteration. Prefer uv run for Python entry points to respect uv.lock.