Executive Overview
Cirrus CDN delivers a programmable content delivery network oriented toward operators who require tight control over origin routing, TLS automation, and operational telemetry. The control plane is built around FastAPI services, Redis-backed configuration, and a Next.js administration portal, while the data plane relies on OpenResty to serve and cache traffic. This chapter summarizes the mission, stakeholders, differentiators, and guiding principles that shape the platform.
Background & Goals
As global content delivery and edge computing rapidly expand, traditional CDN architectures face bottlenecks in latency, flexibility, and cost control. When dealing with multi-region traffic steering, multi-cloud ingress, security compliance, and intelligent operations, enterprises increasingly need a CDN control system that is self-hostable, extensible, and evolvable over time.
The Cirrus CDN control plane ("Cirrus") aims to provide a unified configuration center, a scheduling engine, and an observability layer. It supports cross-cloud and cross-region node governance, enabling highly available, highly autonomous content delivery.
Platform Differentiators
- End-to-end automation: ACME issuance and renewals are baked into the control plane and executed via Celery workers, eliminating ad-hoc scripts.
- Integrated DNS authority: The system synthesizes authoritative DNS zones (hidden master + NSD slave) to steer clients toward healthy edge nodes using rendezvous hashing.
- Config-in-Redis: All mutable state—domains, nodes, certificates, purge queues—is stored in Redis. This provides atomic updates, pub/sub notifications, and easy inspection.
- Composable frontend: The Next.js portal consumes the same REST APIs as external clients, ensuring equivalence between manual and automated workflows.
- Observability-first: OpenResty emits Prometheus metrics, health checks update node status, and logs are rotated automatically; these are first-class citizens, not afterthoughts.
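Rendezvous (highest-random-weight) steering can be illustrated with a minimal sketch. Every healthy node is scored against the hostname, and the top-scoring nodes become the DNS answers. The sketch assumes SHA-256 as the scoring hash, two answers per hostname, and a caller-supplied set of healthy node names. The function names are hypothetical and are not taken from the Cirrus zone builder.

```python
import hashlib
from typing import Iterable


def _score(key: str, node: str) -> int:
    # Hash the (key, node) pair. SHA-256 is an assumed choice; any stable hash works.
    digest = hashlib.sha256(f"{key}|{node}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_nodes(key: str, nodes: Iterable[str], healthy: set[str], count: int = 2) -> list[str]:
    """Return the top-`count` healthy nodes for `key` by highest random weight (hypothetical helper)."""
    candidates = [n for n in nodes if n in healthy]
    # Sort by descending score; ties are broken by node name for determinism.
    ranked = sorted(candidates, key=lambda n: (-_score(key, n), n))
    return ranked[:count]


if __name__ == "__main__":
    nodes = ["edge-a", "edge-b", "edge-c", "edge-d"]
    all_up = set(nodes)
    before = pick_nodes("www.example.com", nodes, all_up)
    # Take the first-ranked node out of service: the runner-up moves to the
    # top, and nodes outside the answer set are unaffected.
    after = pick_nodes("www.example.com", nodes, all_up - {before[0]})
    print(before, after)
```

The design choice matters for stability. When a node fails, only hostnames that ranked it in their answer set are reassigned. Every other hostname keeps the same answers, which limits cache churn across the edge fleet.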
Guiding Principles
- Operational clarity: Each subsystem (API, DNS, edge, automation) favors explicit Redis keys and observable actions over hidden behavior.
- Deterministic behavior: Rendezvous hashing, lock-based ACME flows, and idempotent APIs are used to avoid race conditions.
- Security by default: Passwords are Argon2-hashed, tokens are SHA-256-hashed, TLS certificates are held only in Redis and edge memory (never written to disk elsewhere), and master tokens gate privileged operations.
- Extensibility: Docker-based deployment and modular Python/Lua components allow teams to swap integrations (e.g., external DNS secondaries, additional metrics sinks).
- Developer productivity: A justfile codifies common workflows (just up, just pytest, just deploy) to encourage consistent practice across contributors.
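The token-handling principle can be sketched as follows. Only the SHA-256 digest of an API token is persisted. A presented token is hashed again and compared in constant time. The token length, the use of secrets.token_urlsafe, and the helper names are assumptions for illustration. Argon2 password hashing is a separate path and is not shown.

```python
import hashlib
import hmac
import secrets


def hash_token(token: str) -> str:
    # Hex SHA-256 digest; only this value would be stored (e.g. in Redis).
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def issue_token() -> tuple[str, str]:
    """Return (plaintext_token, digest). The plaintext is shown to the caller once."""
    token = secrets.token_urlsafe(32)  # assumed length: 32 random bytes
    return token, hash_token(token)


def verify_token(presented: str, stored_digest: str) -> bool:
    # Constant-time comparison avoids leaking digest prefixes through timing.
    return hmac.compare_digest(hash_token(presented), stored_digest)
```

Because only digests are stored, a read-only leak of the Redis keyspace does not reveal usable tokens.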
Value Proposition
| Dimension | Traditional CDN Pain | Cirrus Value |
|---|---|---|
| Architectural flexibility | Fragmented configs, centralized control | Modular design + self-hostable control plane |
| Steering intelligence | Static policies, coarse geographies | Health-aware rendezvous hashing and dynamic node assignment |
| Observability | Limited end-to-end visibility | Prometheus metrics + structured logs across API, DNS, and edge |
| Security & compliance | Data sovereignty risks | Private deployments, local certificate custody, hashed tokens |
| Operational efficiency | Manual runbooks and toil | Automated ACME, deterministic purges, templated rollouts |
Applicable Scenarios & Target Users
- Scenarios:
  - Private/enterprise CDN (finance, healthcare, public sector)
  - Cross-border and multi-cloud ingress
  - IoT and edge applications
  - Content delivery and video acceleration
- Target users:
  - Enterprise IT teams
  - Organizations with strict data security requirements
  - Vendors building hybrid cloud/edge apps
  - Infrastructure providers
Competitive Positioning
Cirrus is positioned as a self-hosted CDN control plane for engineering-led teams requiring sovereignty, automation, and extensibility. It complements managed CDNs by enabling private deployments and tight operational control while remaining API-first.
| Dimension | Cirrus CDN | Managed CDNs (Cloudflare, AWS, Alibaba) |
|---|---|---|
| Deployment | Self-hosted or hybrid | Fully managed |
| Control plane ownership | Full | Vendor |
| Data sovereignty | Local custody | Cloud/regional |
| Automation | Built-in ACME; API-first workflows | Varies by product |
| Observability | Prometheus + structured logs | Product-specific tooling |
| Cost model | Infra + bandwidth, controllable | Per-GB/request billing |
Sample KPIs (reference values)
Reference values aggregated from public NGINX/OpenResty benchmarks (see nginx-openresty_performance.csv).
| Metric | Reference value |
|---|---|
| HTTP QPS (1 KB, cached static) | ≈ 1.31M rps (32–36 workers) |
| HTTPS QPS (1 KB, cached static) | ≈ 1.24M rps (36 workers) |
| TLS TPS (new handshake per req) | ≈ 58k tps (Ingress, 24 workers) |
| 1 MB object throughput | ≈ 8.8 Gbps (≥ 4 workers) |
How to interpret these sample metrics
- Values are reference figures from public benchmarks; validate on your hardware and workload.
- Compare p95 latency with and without cache to quantify benefit.
- Track cache status distribution trends week-over-week.
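Both comparisons can be computed from access-log samples. The sketch below assumes each request has already been reduced to a (cache_status, latency_ms) pair, with status values such as nginx's HIT and MISS. The helper names are hypothetical.

```python
import statistics
from collections import Counter


def p95(samples_ms: list[float]) -> float:
    # quantiles(n=20) returns 19 cut points; index 18 is the 95th percentile.
    return statistics.quantiles(samples_ms, n=20, method="inclusive")[18]


def p95_by_status(records: list[tuple[str, float]]) -> dict[str, float]:
    """Group latencies by cache status and return the p95 of each group."""
    groups: dict[str, list[float]] = {}
    for status, latency in records:
        groups.setdefault(status, []).append(latency)
    # Skip groups with fewer than two samples; a percentile is not meaningful there.
    return {s: p95(v) for s, v in groups.items() if len(v) >= 2}


def cache_status_share(statuses: list[str]) -> dict[str, float]:
    """Fraction of requests per cache status, for week-over-week trend tracking."""
    counts = Counter(statuses)
    total = sum(counts.values())
    return {status: n / total for status, n in sorted(counts.items())}
```

Comparing p95_by_status(...)["HIT"] with the MISS value gives a direct estimate of what caching saves at the tail.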
Additional Scale Indicators (environment-dependent)
| Indicator | How to measure | Notes |
|---|---|---|
| QPS per node (cached) | rate() of the Prometheus counter nginx_http_requests_total | CPU-bound; verify on target hardware. |
| Concurrent connections | worker_processes * worker_connections | Default auto * 1024; tune at build time. |
| Latency reduction | p95 of nginx_http_request_duration_seconds vs origin RTT | Cache HITs should remove the origin round trip from the request path. |
| TLS stability | nginx_ssl_handshake_errors_total | Should remain near zero. |
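The first two indicators reduce to simple arithmetic. The sketch below derives a per-second rate from raw counter samples, treating a value drop as a counter reset, which follows the idea behind PromQL rate() but omits its window-edge extrapolation. It also computes the worker_processes * worker_connections ceiling, where worker_processes auto resolves to the CPU core count. Function names and the 1024 default mirror the table and are otherwise illustrative assumptions.

```python
def per_second_rate(samples: list[tuple[float, float]]) -> float:
    """Average per-second increase of a counter from (unix_ts, value) samples.

    A drop in value is treated as a counter reset, so the post-reset value
    counts as fresh increase. Simplified: no extrapolation to window edges.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    return increase / elapsed


def max_connections(worker_processes: int, worker_connections: int = 1024) -> int:
    # Upper bound per node. nginx counts upstream (proxied) connections against
    # worker_connections too, so client capacity is lower when proxying.
    return worker_processes * worker_connections
```

For example, per_second_rate([(0, 100), (10, 200)]) returns 10.0, and an 8-core node with the default settings above gives max_connections(8) == 8192.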
Document Roadmap
Subsequent sections drill into each subsystem. For quick navigation:
- High-level overview: Overview and System Architecture
- Automation, DNS, edge: Automation & Certificates, DNS & Traffic Engineering, Data Plane
- Frontend: Frontend Experience
- Security & compliance: Security & Compliance
- Operations: Operations & Observability
- References: Appendices
Sections cross-reference code paths (for example, control-plane/src/cirrus/app.py:21 for API bootstrap) to ensure accuracy against the repository.
Future Iteration Roadmap
- Celery metrics export: Expose worker/beat task durations and outcomes via Prometheus to close observability gaps.
- Session store in Redis: Migrate in-memory sessions to Redis to enable stateless API scaling (referenced in scalability notes).
- Managed/clustered Redis: Support high-availability configurations and failover testing guidance.
- Extended DNS features: Optional geo overrides and additional record types in zone builder while preserving rendezvous stability.
- Webhooks & audit trails: Outbound webhooks for domain/node changes; append-only audit logs for compliance.
- Edge features: Origin shield tiers, per-path prefetch, richer cache rule predicates.
- Security hardening: Token scoping/expiration, optional client TLS, configurable CORS policies in production.