Player metrics
The player exposes real-time metrics via getMetrics():
| Metric | Type | What it means |
|---|---|---|
| timeToFirstFrame | number (ms) | Time from load() to first video frame rendered. Includes key fetch + first segment decrypt. Target: under 2s on broadband. |
| avgDecryptTime | number (ms) | Average time per segment decryption. Should be under 5ms on modern hardware. |
| avgFragLoadTime | number (ms) | Average time per fragment download. High values indicate CDN or bandwidth issues. |
| avgKeyFetchTime | number (ms) | Average time per key server request. Target: under 100ms p95. |
| keyFetchCount | number | Total key requests made. |
| qualitySwitches | number | Number of ABR quality level changes. Frequent switches indicate unstable bandwidth. |
| fragmentsLoaded | number | Total segments loaded so far. |
| stallCount | number | Number of playback stalls (buffer underruns). Should be 0 under normal conditions. |
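
As a minimal sketch of how a snapshot could be checked against the targets in the table: the `PlayerMetrics` interface below is an assumed shape inferred from the table, not a confirmed type exported by the player, and `TARGETS`, `metricsOverTarget`, and `snapshot` are hypothetical names. Comparing a single session's avgKeyFetchTime with the 100ms figure is a simplification, since that target is stated as a p95.

```typescript
// Assumed shape of the object returned by getMetrics(), derived from the table above.
interface PlayerMetrics {
  timeToFirstFrame: number; // ms
  avgDecryptTime: number; // ms
  avgFragLoadTime: number; // ms
  avgKeyFetchTime: number; // ms
  keyFetchCount: number;
  qualitySwitches: number;
  fragmentsLoaded: number;
  stallCount: number;
}

// Targets quoted in the table. The constant and function names are illustrative.
const TARGETS = {
  timeToFirstFrameMs: 2000,
  avgDecryptTimeMs: 5,
  avgKeyFetchTimeMs: 100,
};

// Returns the names of the metrics that miss their documented target.
function metricsOverTarget(m: PlayerMetrics): string[] {
  const over: string[] = [];
  if (m.timeToFirstFrame >= TARGETS.timeToFirstFrameMs) over.push("timeToFirstFrame");
  if (m.avgDecryptTime >= TARGETS.avgDecryptTimeMs) over.push("avgDecryptTime");
  if (m.avgKeyFetchTime >= TARGETS.avgKeyFetchTimeMs) over.push("avgKeyFetchTime");
  if (m.stallCount > 0) over.push("stallCount");
  return over;
}

// Hypothetical snapshot standing in for a real getMetrics() result.
const snapshot: PlayerMetrics = {
  timeToFirstFrame: 1400,
  avgDecryptTime: 3,
  avgFragLoadTime: 180,
  avgKeyFetchTime: 140,
  keyFetchCount: 2,
  qualitySwitches: 1,
  fragmentsLoaded: 42,
  stallCount: 0,
};

console.log(metricsOverTarget(snapshot)); // flags only avgKeyFetchTime
```
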
Reporting metrics
Send metrics to your analytics backend when playback ends, or periodically during playback.
What to alert on
| Condition | Possible cause |
|---|---|
| timeToFirstFrame > 5s (p95) | Key server latency, slow CDN, or large first segment |
| stallCount > 0 (frequent) | CDN throughput issues or client CPU overloaded from decryption |
| avgKeyFetchTime > 500ms (p95) | Key server overloaded or network issue |
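
The alert conditions above are stated over many sessions (p95, "frequent"), so they would normally be evaluated on the backend over a batch of reported metrics. The sketch below shows one way to do that with a nearest-rank percentile. `SessionMetrics`, `percentile`, `evaluateAlerts`, and `STALL_SESSION_RATIO` are hypothetical names, and the 5% cutoff is only one possible reading of "frequent".

```typescript
// Only the fields used by the alert conditions; assumed to come from getMetrics().
interface SessionMetrics {
  timeToFirstFrame: number; // ms
  avgKeyFetchTime: number; // ms
  stallCount: number;
}

// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1] ?? 0; // 0 for an empty input
}

// Hypothetical interpretation of "frequent": more than 5% of sessions stalled.
const STALL_SESSION_RATIO = 0.05;

function evaluateAlerts(sessions: SessionMetrics[]): string[] {
  const alerts: string[] = [];
  if (sessions.length === 0) return alerts;

  const ttffP95 = percentile(sessions.map((s) => s.timeToFirstFrame), 95);
  const keyFetchP95 = percentile(sessions.map((s) => s.avgKeyFetchTime), 95);
  const stalledSessions = sessions.filter((s) => s.stallCount > 0).length;

  if (ttffP95 > 5000) alerts.push("timeToFirstFrame p95 > 5s");
  if (stalledSessions / sessions.length > STALL_SESSION_RATIO) alerts.push("frequent stalls");
  if (keyFetchP95 > 500) alerts.push("avgKeyFetchTime p95 > 500ms");
  return alerts;
}

// Made-up batch: 18 healthy sessions and 2 degraded ones.
const batch: SessionMetrics[] = [];
for (let i = 0; i < 18; i++) batch.push({ timeToFirstFrame: 1500, avgKeyFetchTime: 80, stallCount: 0 });
for (let i = 0; i < 2; i++) batch.push({ timeToFirstFrame: 6000, avgKeyFetchTime: 700, stallCount: 2 });

console.log(evaluateAlerts(batch)); // all three conditions fire for this batch
```
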
Key server metrics
Health check
GET /health returns 200 OK when the server is running. Use this for:
- Docker health checks:
  HEALTHCHECK CMD curl -f http://localhost:4100/health
- Load balancer probes
- Uptime monitoring (Pingdom, Better Uptime, etc.)
Request logging
The key server logs each request to stdout in JSON format. Pipe the output to your log aggregator (Datadog, CloudWatch, etc.) and monitor:
| Metric | How to measure | Alert threshold |
|---|---|---|
| Key fetch latency | p50/p95 of GET /keys/:contentId response time | p95 > 100ms |
| Error rate (4xx) | Count of 401 + 403 responses / total requests | > 5% |
| Error rate (5xx) | Count of 500 responses / total requests | > 0.1% |
| Lease creation rate | Count of POST /keys/leases per minute | Unusual spike (>2x baseline) |
| Lease revocation rate | Count of POST /keys/leases/revoke per minute | Unusual spike |
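
If your log aggregator cannot compute these directly, the latency and error-rate rows above could be derived from the raw log lines along the following lines. This is a sketch under assumptions: the document does not specify the log schema, so the field names `method`, `path`, `status`, and `durationMs` are hypothetical and need to be mapped to whatever the key server actually emits.

```typescript
// Hypothetical log entry shape; the real field names depend on the key server's log format.
interface RequestLog {
  method: string;
  path: string;
  status: number;
  durationMs: number;
}

// Parses JSON log lines, skipping anything that is not a well-formed request entry.
function parseLogLines(lines: string[]): RequestLog[] {
  const entries: RequestLog[] = [];
  for (const line of lines) {
    try {
      const raw = JSON.parse(line);
      if (
        typeof raw.method === "string" &&
        typeof raw.path === "string" &&
        typeof raw.status === "number" &&
        typeof raw.durationMs === "number"
      ) {
        entries.push({ method: raw.method, path: raw.path, status: raw.status, durationMs: raw.durationMs });
      }
    } catch {
      // Not JSON (for example a startup banner): skip the line.
    }
  }
  return entries;
}

interface LogSummary {
  keyFetchP95Ms: number;
  rate4xx: number;
  rate5xx: number;
}

function summarizeLogs(entries: RequestLog[]): LogSummary {
  // Key fetch latency: GET /keys/:contentId requests only.
  const keyLatencies = entries
    .filter((e) => e.method === "GET" && e.path.indexOf("/keys/") === 0)
    .map((e) => e.durationMs)
    .sort((a, b) => a - b);
  const rank = Math.ceil(0.95 * keyLatencies.length); // nearest-rank p95
  const total = entries.length;
  const count4xx = entries.filter((e) => e.status === 401 || e.status === 403).length;
  const count5xx = entries.filter((e) => e.status === 500).length;
  return {
    keyFetchP95Ms: keyLatencies[Math.max(rank, 1) - 1] ?? 0,
    rate4xx: total === 0 ? 0 : count4xx / total,
    rate5xx: total === 0 ? 0 : count5xx / total,
  };
}

// Applies the alert thresholds from the table above.
function logAlerts(s: LogSummary): string[] {
  const alerts: string[] = [];
  if (s.keyFetchP95Ms > 100) alerts.push("key fetch p95 > 100ms");
  if (s.rate4xx > 0.05) alerts.push("4xx rate > 5%");
  if (s.rate5xx > 0.001) alerts.push("5xx rate > 0.1%");
  return alerts;
}

// Made-up sample lines, including one non-JSON line that is skipped.
const sampleLines = [
  '{"method":"GET","path":"/keys/abc","status":200,"durationMs":40}',
  '{"method":"GET","path":"/keys/abc","status":403,"durationMs":12}',
  '{"method":"POST","path":"/keys/leases","status":200,"durationMs":25}',
  "key server listening on :4100",
  '{"method":"GET","path":"/keys/def","status":500,"durationMs":150}',
];

console.log(logAlerts(summarizeLogs(parseLogLines(sampleLines))));
```

Lease creation and revocation spikes are relative to a baseline, so they are better handled by the aggregator's own anomaly or rate alerts than by a fixed threshold like the ones here.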
Prometheus metrics (optional)
If you run a reverse proxy (nginx, Caddy) in front of the key server, use its Prometheus metrics to track request rates, latency histograms, and error codes. Caddy exposes Prometheus metrics natively; nginx typically needs a separate exporter.
Database monitoring
Postgres
If using Postgres for lease storage:
| Metric | What to watch |
|---|---|
| Connection count | Should stay well below max_connections |
| Query latency | Lease queries should be under 10ms |
| Table size | leases table grows over time — monitor row count |
| Dead tuples | Run VACUUM if dead tuple ratio is high |
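
As a rough sketch of how the connection-count and dead-tuple rows could be checked: the SQL strings below read the standard Postgres views pg_stat_activity and pg_stat_user_tables, and you would run them with whatever database client you already use. The `PostgresSnapshot` type, the function name, and both warning ratios are assumptions chosen for illustration, not recommended values from this document.

```typescript
// Queries against standard Postgres statistics views; run them with your own client.
const CONNECTIONS_SQL = "SELECT count(*) AS connections FROM pg_stat_activity";
const MAX_CONNECTIONS_SQL = "SHOW max_connections";
const LEASES_TUPLES_SQL =
  "SELECT n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'leases'";

// Values collected from the three queries above (hypothetical container type).
interface PostgresSnapshot {
  connections: number;
  maxConnections: number;
  liveTuples: number;
  deadTuples: number;
}

// Hypothetical thresholds; tune them for your deployment.
const CONNECTION_WARN_RATIO = 0.8;
const DEAD_TUPLE_WARN_RATIO = 0.2;

function postgresWarnings(s: PostgresSnapshot): string[] {
  const warnings: string[] = [];
  if (s.maxConnections > 0 && s.connections / s.maxConnections > CONNECTION_WARN_RATIO) {
    warnings.push("connection count is close to max_connections");
  }
  const totalTuples = s.liveTuples + s.deadTuples;
  if (totalTuples > 0 && s.deadTuples / totalTuples > DEAD_TUPLE_WARN_RATIO) {
    warnings.push("dead tuple ratio is high on leases; consider VACUUM");
  }
  return warnings;
}

// Made-up snapshot: 85 of 100 connections in use, 25% dead tuples.
console.log(
  postgresWarnings({ connections: 85, maxConnections: 100, liveTuples: 9000, deadTuples: 3000 })
);
```
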
SQLite
If using SQLite (single-instance deployments):
| Metric | What to watch |
|---|---|
| Database file size | Monitor /data/blindcast.db size |
| WAL file size | Large WAL files indicate slow checkpointing |
| Disk space | SQLite needs free disk space for journaling |
Infrastructure
| Component | Health check | What to monitor |
|---|---|---|
| Key server container | GET /health | CPU, memory, restart count |
| Postgres | pg_isready | Connections, replication lag, disk |
| S3 / R2 | AWS Health Dashboard / Cloudflare Status | 4xx/5xx error rates on GET requests |
| CDN | Provider dashboard | Cache hit ratio, bandwidth, error rates |
CDN cache hit ratio
Target: >95% cache hit ratio for segment requests. A low hit ratio means the CDN is fetching from origin on most requests, adding latency and cost.
Dashboard template
Build a monitoring dashboard with these panels:
- Player experience: time to first frame (p50, p95, p99), stall rate
- Key server: request rate, latency (p50, p95), error rate (4xx, 5xx)
- Leases: active lease count, creation rate, revocation rate
- Infrastructure: container CPU/memory, DB connections, CDN cache hit ratio
Debugging playback issues
When a viewer reports playback problems:
- Check key server logs: Was the key request successful? Look for 401 (auth), 403 (lease revoked/expired), 500 (server error).
- Check player metrics: If avgKeyFetchTime is high, the issue is key server latency. If stallCount is high, the issue is CDN or bandwidth.
- Check CDN logs: Are segments being served? Look for 403 (CORS) or 404 (missing segments).
- Check lease state: If using leases, query the database:
  SELECT * FROM leases WHERE viewer_id = 'user-123' ORDER BY created_at DESC LIMIT 5;
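
The checklist above can be sketched as a small triage helper that turns the collected evidence into a list of likely causes. The `TriageInput` shape and the `triage` function are hypothetical, and the 500ms cutoff for "high" avgKeyFetchTime is borrowed from the alert table earlier on this page rather than defined by the checklist itself.

```typescript
// Evidence gathered while working through the checklist (hypothetical shape).
interface TriageInput {
  keyRequestStatus: number | null; // status from key server logs, null if no request was found
  avgKeyFetchTime: number; // ms, from player metrics
  stallCount: number; // from player metrics
  segmentStatus: number | null; // status from CDN logs, null if no request was found
}

function triage(input: TriageInput): string[] {
  const findings: string[] = [];

  // Step 1: key server logs.
  if (input.keyRequestStatus === 401) findings.push("key request failed authentication (401)");
  if (input.keyRequestStatus === 403) findings.push("lease revoked or expired (403): check lease state in the database");
  if (input.keyRequestStatus === 500) findings.push("key server error (500)");

  // Step 2: player metrics. The 500ms cutoff is an assumption taken from the alert table.
  if (input.avgKeyFetchTime > 500) findings.push("key server latency is high");
  if (input.stallCount > 0) findings.push("stalls point to CDN or bandwidth");

  // Step 3: CDN logs.
  if (input.segmentStatus === 403) findings.push("CDN returned 403: check CORS configuration");
  if (input.segmentStatus === 404) findings.push("CDN returned 404: segments are missing");

  return findings;
}

// Made-up report: the key request was rejected, everything else looks normal.
console.log(
  triage({ keyRequestStatus: 403, avgKeyFetchTime: 80, stallCount: 0, segmentStatus: 200 })
);
```
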