Reliability report
RelayOrb has been running in production continuously since Feb 28, 2026, 2:15 AM. This page reports system behavior from internal instrumentation over 2026-05-02T00:00:00Z through 2026-06-01T23:59:59Z. The load is synthetic monitoring and control-plane traffic, not public user adoption, and the numbers reflect how the system performed under sustained automated load.
Gateway uptime
100.00%
Gateway requests served
86,450
Gateway p95 latency
31.57 ms
Gateway + registry uptime
99.996%
Gateway
Production-ready
100.00% uptime over the 30-day window, 86,450 requests served, and p95 latency of 31.57 ms.
Registry
Production-ready
99.99% uptime over the 30-day window with 120,978 internal control-plane requests served.
RAG worker
Resolved configuration regression
Most relayorb-rag-prod 5xx responses came from internal GET /metrics scrapes, not public invoke traffic. The worker repeatedly failed startup because capability registration returned 403 Forbidden for the default compute identity instead of the allowed relayorb-rag-sa service account.
The headline numbers on this page focus on gateway and registry because those are the production-grade control-plane components. The worker remained deployed, but this 30-day snapshot still includes the pre-fix synthetic monitoring period when a startup identity regression inflated worker errors without representing public invoke traffic.
These numbers come from Cloud Monitoring and Cloud Logging on the surviving prod services: gateway, registry, and worker. Availability is computed as non-5xx requests divided by total requests, latency comes from Cloud Run request latency percentiles, utilization comes from the container utilization distributions, and the cost profile is reconstructed from Cloud Run billable instance time plus public Cloud Billing SKU prices because billing export was not enabled.
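The availability definition above (non-5xx requests divided by total requests) can be sketched in a few lines. This is a minimal illustration, not the report's actual pipeline: the request totals are the ones reported for gateway and registry on this page, while the 5xx counts (0 and 9) are assumptions inferred from the reported 5xx rates and error-log counts, and the function and variable names are hypothetical.

```python
def availability_pct(total_requests: int, server_errors: int) -> float:
    """Availability in percent: non-5xx requests divided by total requests."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= server_errors <= total_requests:
        raise ValueError("server_errors must be between 0 and total_requests")
    return 100.0 * (total_requests - server_errors) / total_requests


# Request totals are taken from this page. The 5xx counts are ASSUMED:
# the page reports rates and error-log counts, not raw 5xx counts.
services = {
    "gateway": {"requests": 86_450, "errors_5xx": 0},
    "registry": {"requests": 120_978, "errors_5xx": 9},
}

per_service = {
    name: availability_pct(s["requests"], s["errors_5xx"])
    for name, s in services.items()
}

# Combined availability pools requests and errors across both services,
# rather than averaging the two percentages.
combined = availability_pct(
    sum(s["requests"] for s in services.values()),
    sum(s["errors_5xx"] for s in services.values()),
)

for name, pct in per_service.items():
    print(f"{name}: {pct:.3f}%")
print(f"gateway + registry: {combined:.3f}%")
```

Pooling the counts weights each service by its traffic, which is one plausible way to arrive at a single combined figure; averaging the two percentages would give a slightly different number.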
The signal is still useful even though the traffic was internal. It shows how the healthy components behaved under constant heartbeat, scrape, and health traffic, and it also surfaced a real worker configuration regression. The previous deployment burned money by keeping warm services alive for synthetic traffic. The current deployment keeps the same endpoints live while letting the control plane sleep at zero traffic.
Gateway stays public at the edge. Registry and worker sit behind the control plane: the registry owns capability state and heartbeats, and the gateway records invocation state in SQL-backed stores configured through secret-backed DATABASE_URL values.
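As a rough sketch of how a service might consume a secret-backed DATABASE_URL, the snippet below reads the value from the environment and extracts its connection parts without ever echoing the password. It assumes the secret is surfaced as an environment variable; the helper name, the returned fields, and the example URL are hypothetical and not taken from the RelayOrb codebase.

```python
import os
from urllib.parse import urlsplit


def describe_database_url(env=os.environ, var="DATABASE_URL"):
    """Return non-sensitive connection details parsed from a database URL.

    Hypothetical helper: assumes the secret is injected as an environment
    variable. The password is never included in the returned summary.
    """
    raw = env.get(var)
    if not raw:
        raise RuntimeError(f"{var} is not set")
    parts = urlsplit(raw)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
        "has_password": parts.password is not None,
    }


# Example with a made-up URL; real values would come from the secret store.
example_env = {"DATABASE_URL": "postgresql://app:s3cret@db.example.internal:5432/relay"}
summary = describe_database_url(example_env)
print(summary)
```

Logging only a redacted summary like this at startup makes misconfiguration visible without leaking credentials into the logs.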
Public gateway - private registry - worker - SQL state
Raw per-service numbers stay visible here. Gateway and registry are the components represented by the headline reliability cards above. Worker metrics remain included for honesty, with the failed `/metrics` startup loop called out separately instead of averaged into the front-door story.
| Service | Uptime | Requests | Latency p50 / p95 / p99 (ms) | CPU avg | Memory avg | 5xx rate | Error logs |
|---|---|---|---|---|---|---|---|
| Gateway `relayorb-gateway-prod` | 100.00% | 86,450 | 8.51 / 31.57 / 32.37 | 0.323% | 3.130% | 0.000% | 0 |
| Registry `relayorb-registry-prod` | 99.99% | 120,978 | 7.00 / 12.85 / 13.97 | 0.092% | 3.589% | 0.007% | 9 |
| Worker `relayorb-rag-prod` | 19.22% | 86,436 | 5.27 / 9.58 / 10.06 | 3.628% | 1.262% | 80.780% | 79,081 |
Most relayorb-rag-prod 5xx responses came from internal GET /metrics scrapes, not public invoke traffic.
The worker repeatedly failed startup because capability registration returned 403 Forbidden for the default compute identity instead of the allowed relayorb-rag-sa service account.
Cloud Run startup logs show failed readiness probes and worker registration errors for capability rag.search@v1 before later scrape retries succeeded.
Resolved 2026-06-02: relayorb-rag-prod now runs as relayorb-rag-sa@relayorb-prod.iam.gserviceaccount.com. The 30-day window on this page still reflects the pre-fix synthetic-monitoring period.
Result: the worker issue is real, but the 30-day 5xx volume mostly measures synthetic monitoring noise on the worker path, not public control-plane reliability.
Worker uptime
19.22%
Worker 5xx rate
80.78%
Worker error logs
79,081
Reference deployment
Modeled from the 30-day pre-shutdown topology: four always-warm demo services, the prod worker, the prod metrics scraper, and the request-driven prod edge services. This is the fixed cost of keeping instances warm to serve internal traffic.
Scale-to-zero current
The surviving prod services now run with minScale=0. With no traffic, fixed Cloud Run spend falls to zero and variable cost only appears when a real caller hits the gateway.
Cost is modeled from Cloud Monitoring billable instance time plus public Cloud Billing Catalog SKU prices for us-central1. Billing export was not enabled, so this is a reconstructed Cloud Run cost profile rather than an invoice export.
| Service | Billing mode | Min instances | Requests | Billable seconds | Estimated monthly cost |
|---|---|---|---|---|---|
| Demo gateway relayorb-demo - relayorb-gateway-demo | instance-based | 1 | 13,677 | 2,599,229 | $49.39 |
| Demo metrics scraper relayorb-demo - relayorb-metrics-scraper-demo | instance-based | 1 | 0 | 2,591,637 | $49.24 |
| Demo worker relayorb-demo - relayorb-rag-demo | instance-based | 1 | 5 | 2,592,224 | $49.25 |
| Demo registry relayorb-demo - relayorb-registry-demo | instance-based | 1 | 129,492 | 2,592,582 | $49.26 |
| Gateway relayorb-prod - relayorb-gateway-prod | request-based | 0 | 86,450 | 8,654 | $0.22 |
| Registry relayorb-prod - relayorb-registry-prod | request-based | 0 | 120,978 | 12,215 | $0.31 |
| Worker relayorb-prod - relayorb-rag-prod | instance-based | 0 | 86,436 | 511,309 | $9.71 |
| Prod metrics scraper relayorb-prod - relayorb-metrics-scraper-prod | instance-based | 1 | 0 | 2,591,816 | $49.24 |
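To make the split between fixed and request-driven cost easier to see, the figures in the cost table can be totaled by minimum-instance setting. The sketch below uses only the billable seconds and estimated costs listed above; it does not look up Cloud Billing SKU prices, and the per-second rate it prints is simply the listed cost divided by the listed seconds, an implied figure rather than an official price. The function and variable names are hypothetical.

```python
from decimal import Decimal

# Rows copied from the cost table above:
# (service, min_instances, billable_seconds, estimated_monthly_cost_usd)
ROWS = [
    ("relayorb-gateway-demo", 1, 2_599_229, Decimal("49.39")),
    ("relayorb-metrics-scraper-demo", 1, 2_591_637, Decimal("49.24")),
    ("relayorb-rag-demo", 1, 2_592_224, Decimal("49.25")),
    ("relayorb-registry-demo", 1, 2_592_582, Decimal("49.26")),
    ("relayorb-gateway-prod", 0, 8_654, Decimal("0.22")),
    ("relayorb-registry-prod", 0, 12_215, Decimal("0.31")),
    ("relayorb-rag-prod", 0, 511_309, Decimal("9.71")),
    ("relayorb-metrics-scraper-prod", 1, 2_591_816, Decimal("49.24")),
]


def total_cost(rows):
    """Sum the estimated monthly cost column for the given rows."""
    return sum((cost for _, _, _, cost in rows), Decimal("0"))


def implied_rate_per_second(seconds, cost):
    """Listed cost divided by listed billable seconds (not an official SKU price)."""
    return cost / Decimal(seconds)


warm = [r for r in ROWS if r[1] >= 1]        # min instances of 1: billed around the clock
on_demand = [r for r in ROWS if r[1] == 0]   # min instances of 0: billed only while running

print(f"always-warm total: ${total_cost(warm)}")
print(f"scale-to-zero total: ${total_cost(on_demand)}")
print(f"all services: ${total_cost(ROWS)}")
for name, _, seconds, cost in ROWS:
    print(f"{name}: ~${implied_rate_per_second(seconds, cost):.7f} per billable second")
```

On the table's figures, the rows with a minimum of one instance sum to $246.38 of the $256.62 modeled total, which is the fixed portion that scaling to zero removes.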