Skip to main content

Executive Overview

Cirrus CDN delivers a programmable content delivery network oriented toward operators who require tight control over origin routing, TLS automation, and operational telemetry. The control plane is built around FastAPI services, Redis-backed configuration, and a Next.js administration portal, while the data plane relies on OpenResty to serve and cache traffic. This chapter summarizes the mission, stakeholders, differentiators, and guiding principles that shape the platform.

Background & Goals

As global content delivery and edge computing rapidly expand, traditional CDN architectures face bottlenecks in latency, flexibility, and cost control. When dealing with multi-region traffic steering, multi-cloud ingress, security compliance, and intelligent operations, enterprises increasingly need a CDN control system that is self-hostable, extensible, and evolvable over time.

The Cirrus CDN control plane ("Cirrus") aims to provide a unified configuration center, a scheduling engine, and an observability layer; it supports cross-cloud and cross-region node governance to deliver highly available, highly autonomous content delivery.

Platform Differentiators

  • End-to-end automation: ACME issuance and renewals are baked into the control plane and executed via Celery workers, eliminating ad-hoc scripts.
  • Integrated DNS authority: The system synthesizes authoritative DNS zones (hidden master + NSD slave) to steer clients toward healthy edge nodes using rendezvous hashing.
  • Config-in-Redis: All mutable state—domains, nodes, certificates, purge queues—is stored in Redis. This provides atomic updates, pub/sub notifications, and easy inspection.
  • Composable frontend: The Next.js portal consumes the same REST APIs as external clients, ensuring equivalence between manual and automated workflows.
  • Observability-first: OpenResty emits Prometheus metrics, health checks update node status, and logs are rotated automatically; these are first-class citizens, not afterthoughts.

Guiding Principles

  1. Operational clarity: Each subsystem (API, DNS, edge, automation) favors explicit Redis keys and observable actions over hidden behavior.
  2. Deterministic behavior: Rendezvous hashing, lock-based ACME flows, and idempotent APIs are used to avoid race conditions.
  3. Security by default: Passwords are Argon2 hashed, tokens are SHA-256 hashed, TLS certificates never touch disk outside Redis/edge memory, and master tokens gate privileged operations.
  4. Extensibility: Docker-based deployment and modular Python/Lua components allow teams to swap integrations (e.g., external DNS secondaries, additional metrics sinks).
  5. Developer productivity: A justfile codifies common workflows (just up, just pytest, just deploy) to encourage consistent practice across contributors.

Value Proposition

DimensionTraditional CDN PainCirrus Value
Architectural flexibilityFragmented configs, centralized controlModular design + self-hostable control plane
Steering intelligenceStatic policies, coarse geographiesHealth-aware rendezvous hashing and dynamic node assignment
ObservabilityLimited end-to-end visibilityPrometheus metrics + structured logs across API, DNS, and edge
Security & complianceData sovereignty risksPrivate deployments, local certificate custody, hashed tokens
Operational efficiencyManual runbooks and toilAutomated ACME, deterministic purges, templated rollouts

Applicable Scenarios & Target Users

  • Scenarios:
    • Private/enterprise CDN (finance, healthcare, public sector)
    • Cross-border and multi-cloud ingress
    • IoT and edge applications
    • Content delivery and video acceleration
  • Target users:
    • Enterprise IT teams
    • Organizations with strict data security requirements
    • Vendors building hybrid cloud/edge apps
    • Infrastructure providers

Competitive Positioning

Cirrus is positioned as a self-hosted CDN control plane for engineering-led teams requiring sovereignty, automation, and extensibility. It complements managed CDNs by enabling private deployments and tight operational control while remaining API-first.

DimensionCirrus CDNManaged CDNs (Cloudflare, AWS, Alibaba)
DeploymentSelf-hosted or hybridFully managed
Control plane ownershipFullVendor
Data sovereigntyLocal custodyCloud/regional
AutomationBuilt-in ACME; API-first workflowsVaries by product
ObservabilityPrometheus + structured logsProduct-specific tooling
Cost modelInfra + bandwidth, controllablePer-GB/request billing

Sample KPIs (reference values)

Reference values aggregated from public NGINX/OpenResty benchmarks (see nginx-openresty_performance.csv).

MetricReference value
HTTP QPS (1 KB, cached static)≈ 1.31M rps (32–36 workers)
HTTPS QPS (1 KB, cached static)≈ 1.24M rps (36 workers)
TLS TPS (new handshake per req)≈ 58k tps (Ingress, 24 workers)
1 MB object throughput≈ 8.8 Gbps (≥ 4 workers)
How to interpret these sample metrics
  • Values are reference figures from public benchmarks; validate on your hardware and workload.
  • Compare p95 latency with and without cache to quantify benefit.
  • Track cache status distribution trends week-over-week.

Additional Scale Indicators (environment-dependent)

IndicatorHow to measureNotes
QPS per node (cached)Prometheus nginx_http_requests_totalCPU-bound; verify on target hardware.
Concurrent connectionsworker_processes * worker_connectionsDefault auto * 1024; tune at build time.
Latency reductionp95 nginx_http_request_duration_seconds vs origin RTTHITs should drop origin RTT from request path.
TLS stabilitynginx_ssl_handshake_errors_totalShould remain near zero.

Document Roadmap

Subsequent sections drill into each subsystem. For quick navigation:

Sections cross-reference code paths (for example, control-plane/src/cirrus/app.py:21 for API bootstrap) to ensure accuracy against the repository.

Future Iteration Roadmap

  • Celery metrics export: Expose worker/beat task durations and outcomes via Prometheus to close observability gaps.
  • Session store in Redis: Migrate in-memory sessions to Redis to enable stateless API scaling (referenced in scalability notes).
  • Managed/clustered Redis: Support high-availability configurations and failover testing guidance.
  • Extended DNS features: Optional geo overrides and additional record types in zone builder while preserving rendezvous stability.
  • Webhooks & audit trails: Outbound webhooks for domain/node changes; append-only audit logs for compliance.
  • Edge features: Origin shield tiers, per-path prefetch, richer cache rule predicates.
  • Security hardening: Token scoping/expiration, optional client TLS, configurable CORS policies in production.