All zero-GPU spaces consume time even when no GPU is obtained. When a zero-GPU space is in the state “Waiting for a GPU to become available,” the user’s GPU time is already being charged. If you eventually get a GPU, it works as expected. However, if you don’t and the process ends with “No GPU was available for you,” the user’s time continues to be consumed for 60 seconds without ever reaching a GPU.
Hmm. Recently, I feel like we have been seeing a number of errors that look somewhat network-related, though not necessarily on the public Internet side. The tricky part is that the trigger and root cause are hard to pin down. @hysts
I do not think this is best explained as one single root cause.
The pattern looks more like several different managed-service boundaries producing similar user-visible symptoms:
- a build job does not start or does not produce logs;
- a Space is stuck in Paused / Building / Starting / 503;
- a server is running, but some requests do not reach it;
- a ZeroGPU job waits for a GPU, fails to obtain one, but still appears to consume quota;
- API/custom/frontend paths behave differently from the normal Hub page;
- large uploads/downloads stall in a Hub/Xet/CDN path;
- external API calls from a Space fail in a DNS/egress/policy-looking way.
Those are all “network-ish” from the outside, but they are not necessarily the same layer.
Short version
I would group the recent clues like this:
| Boundary | User-visible symptom | My current read |
|---|---|---|
| Spaces builder / scheduler / control-plane | Empty build logs, build never triggers, restart/factory rebuild 503, paused state | Stronger signal than I initially weighted. |
| Spaces proxy / routing | App/server is running, but some external requests 503 or never reach the app | Distinct from an app exception. |
| ZeroGPU reservation / refund / quota settlement | GPU is never assigned, but requested time appears consumed | Still suspicious; separate from known xlarge cost behavior. |
| ZeroGPU request identity | Browser/API/custom frontend/Space-to-Space quota behavior differs | Partly known; X-IP-Token and auth path matter. |
| Hub transfer backend | Large upload/download stalls, Xet/HTTP fallback/cache/CDN weirdness | Adjacent, probably separate from Spaces. |
| External egress / abuse-control | DNS/external API failures, Cloudflare/VPN/shared-IP/keepalive effects | Separate path, but also network-looking. |
| Blackwell / ZeroGPU runtime churn | sm_120, CUDA wheel, FlashAttention/xFormers/Triton failures | Relevant background, not the main theory for most reports here. |
| Industry capacity pressure | More queueing, quotas, routing, transfer backends, abuse controls | Background context only; not proof of direct causality. |
So I would not call this simply “ZeroGPU is broken” or “the public Internet is broken.”
A more useful description might be:
Several user-visible failures appear network-related, but the evidence points to multiple service boundaries: Spaces build/control-plane, proxy/routing, ZeroGPU scheduling/accounting, request identity, Hub transfer backend, and external egress/security policy.
Known or partly explained pieces
Some parts already seem publicly understood or at least partially handled.
| Area | Public signal | Why I would not treat it as unexplained |
|---|---|---|
| Free-user ZeroGPU run-count / quota-message confusion | In Free Account ZeroGPU Quota Issue, hysts explained that Free users had a request-count limit in addition to time quota, and that the error message was initially misleading. | This explains some “quota exceeded” reports, but not all quota/accounting symptoms. |
| xlarge 2× quota behavior | In Getting quota exceeded even though requested seconds is less than what’s left, hysts explained the xlarge fallback / 2× quota cost and a display-side issue. | This is important, but it is different from “no GPU was assigned but time was consumed.” |
| ZeroGPU Blackwell-backed sizing | Current ZeroGPU docs describe RTX PRO 6000 Blackwell-backed large / xlarge behavior and quota implications. | Runtime context changed, but CUDA/kernel mismatch has a different symptom shape. |
| Custom frontend / gr.Server quota identity | gradio-app/gradio#13209 describes missing ZeroGPU x-ip-token behavior in custom gr.Server frontends; gradio-app/gradio#13210 fixed it. | This is a known request-identity class, not necessarily a GPU allocator/accounting bug. |
| Space-to-Space / API identity | Gradio documents ZeroGPU client behavior and X-IP-Token forwarding in Using ZeroGPU Spaces with the Clients. HF also documents Spaces as API endpoints. | Browser path, API path, custom frontend path, and Space-to-Space path can behave differently. |
This matters because it narrows the remaining question.
The remaining suspicious reports are not simply “any quota error” or “any ZeroGPU delay.” They are more specific.
Still suspicious: ZeroGPU time charged before assignment
The clearest current ZeroGPU-specific unresolved-looking report is:
The reported flow is roughly:
Waiting for a GPU to become available
→ user GPU time is already being charged
→ no GPU is obtained
→ process ends with "No GPU was available for you"
→ 60 seconds appear to be consumed anyway
That does not look like a normal CUDA/Blackwell/kernel issue.
A Blackwell compatibility failure usually looks like:
sm_120
no kernel image is available
invalid device function
old CUDA wheel
old PyTorch wheel
FlashAttention / xFormers / Triton failure
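As a rough way to tell the two symptom shapes apart, a log excerpt can be scanned for the compatibility markers listed above. This is only a sketch: the function name and the marker list are my own choices, taken from the list above, and a match is a hint rather than a diagnosis.

```python
# Substrings taken from the symptom list above; matching is case-insensitive.
BLACKWELL_MARKERS = [
    "sm_120",
    "no kernel image is available",
    "invalid device function",
]


def blackwell_markers_in(log_text):
    """Return the compatibility markers found in log_text (empty list if none)."""
    lowered = log_text.lower()
    return [marker for marker in BLACKWELL_MARKERS if marker in lowered]


# A CUDA-shaped failure matches; the ZeroGPU allocation message does not.
cuda_error = "RuntimeError: CUDA error: no kernel image is available for execution on the device"
print(blackwell_markers_in(cuda_error))
print(blackwell_markers_in("No GPU was available for you"))
```

An empty result for a failed job is one more reason to look at scheduling and accounting rather than at wheels and kernels.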
The “time charged before assignment” symptom looks more like:
ZeroGPU scheduler
→ duration reservation
→ GPU allocator
→ no worker assigned
→ failure
→ settlement/refund questionable
I would keep this separate from already-explained xlarge quota cost.
Known/expected:
- `duration` matters for scheduling.
- `xlarge` costs more quota than `large`.
- remaining quota can affect queue behavior.
- authenticated vs unauthenticated paths can affect quota pool.
Still suspicious:
- no GPU worker is assigned;
- user code may not even enter the `@spaces.GPU` function;
- the job ends with “No GPU was available”;
- the requested duration still appears consumed.
If that is accurate, the interesting question is not “why did the model fail?” It is:
Is a reserved duration fully refunded when the ZeroGPU allocator fails to assign a GPU worker?
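To check from inside a Space whether user code was ever entered, an entry/exit log around the GPU function is probably enough. The sketch below uses only the standard library; `log_entry` and `generate` are hypothetical names, and I am assuming that log lines written inside the function show up in the Space logs.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("gpu-entry")


def log_entry(fn):
    """Log when fn is entered and how long it ran."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        log.info("entered %s", fn.__name__)
        try:
            return fn(*args, **kwargs)
        finally:
            log.info("left %s after %.2fs", fn.__name__, time.monotonic() - start)
    return wrapper


# In a real ZeroGPU Space this would sit under the GPU decorator, e.g.
#   @spaces.GPU(duration=60)
#   @log_entry
#   def generate(prompt): ...
@log_entry
def generate(prompt):
    return prompt.upper()  # stand-in for the actual model call


result = generate("hello")
```

If a request ends with “No GPU was available” and there is no `entered generate` line for it, while the quota display still dropped, that points at the reservation/settlement side rather than at the model code.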
Stronger than expected: Spaces builder / scheduler / control-plane symptoms
I would give this category more weight than ZeroGPU-only explanations.
There are several reports where the issue seems to happen before the app can do anything useful.
Examples:
- huggingface_hub#3452: Docker Space not triggering a build despite a valid Dockerfile and correct SDK; build logs empty.
- Docker spaces stuck in building with empty logs — all Docker builds affected: Docker SDK builds stuck with empty logs, while another SDK path works.
- Stuck at space problem: long build/container wait and little/no logging; a reply said a similar issue had already been reported internally and infra was working on it.
- Space stuck in Paused state — 503 on restart and factory rebuild: restart and factory rebuild both returning 503.
Those do not sound like ordinary Python exceptions.
The rough shape is:
Repo / Space settings
→ build scheduler
→ builder
→ runtime state machine
→ logs
→ app start
If logs are empty or the build never really starts, the failure is likely before the app’s Python code.
This is also why “just change requirements.txt” or “restart again” often feels random in these cases. The failure may be in the orchestration path rather than the application path.
Also suspicious: running server, request not reaching it
Another important class is:
- 503 shown for website for some request, while server is running and requests are being served
- Docker Spaces returning `{"data":[]}` / proxy not forwarding requests
The key phrase in the 503 report is that the request is not reaching the server.
That is a different failure class from:
request reaches app
→ app raises exception
→ app returns 500
It is closer to:
browser/API
→ HF edge/proxy
→ route/backend selection/health state
→ container
→ app
If the app is alive but failed requests never appear in the app’s access logs, the interesting layer is before user code.
This is not an exotic failure pattern in managed cloud systems. Load balancers and proxies can return 5xx before the backend application sees the request. General references:
- Google Cloud Load Balancing: troubleshooting 5xx errors
- AWS Application Load Balancer troubleshooting
- Kubernetes Service 503 troubleshooting
That does not prove the HF issue is the same mechanism. It only shows that “server process is alive” and “external proxy can route to it reliably” are separate facts.
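One way to test “did the failed request reach the app at all?” is to record every request at the very first layer of the app and compare that record with what the client saw. Below is a minimal sketch as WSGI middleware, using only the standard library. `AccessLog` and `hello_app` are hypothetical names, most Spaces are Gradio/FastAPI apps so this is only a stand-in for the idea, and whether a client-set `X-Request-ID` header survives the HF proxy is an assumption I have not verified.

```python
import time
import uuid


class AccessLog:
    """WSGI middleware that records every request that reaches the app."""

    def __init__(self, app, sink):
        self.app = app
        self.sink = sink  # any object with .append(), e.g. a list

    def __call__(self, environ, start_response):
        # Reuse the client's request ID if one arrived, else generate one.
        request_id = environ.get("HTTP_X_REQUEST_ID") or uuid.uuid4().hex
        self.sink.append({
            "id": request_id,
            "time": time.time(),
            "method": environ.get("REQUEST_METHOD"),
            "path": environ.get("PATH_INFO"),
        })
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


seen = []
app = AccessLog(hello_app, seen)

# Simulate one request the way a WSGI server would call the app.
fake_environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/predict", "HTTP_X_REQUEST_ID": "abc123"}
body = app(fake_environ, lambda status, headers: None)
```

If the client logs a 503 for request `abc123` and `seen` has no entry with that ID, the request most likely failed before user code.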
Hub upload/download instability: probably separate, but relevant context
I would not merge Hub upload/download instability with Spaces runtime issues.
Still, it is relevant as an adjacent pattern: HF-facing managed transfer layers can also produce “network-looking” failures.
Examples:
- HF status currently shows a resolved large-file download incident where downloads stalled/hung via US Central CDN endpoints, especially with XET protocol, due to a Google Cloud us-central1 infrastructure issue: Hugging Face Status.
- huggingface_hub#4085: large downloads via HF Hub stuck in Colab.
- huggingface_hub#3266: `HF_HUB_DISABLE_XET` reportedly not disabling Xet in one setup.
- huggingface_hub#3868: HTTP download path can be too large, requiring `hf_xet`.
- hf-xet on PyPI describes `hf-xet` as the transfer layer used by `huggingface_hub` for Xet storage, with chunk-based deduplication and local disk caching.
Different path:
Spaces request path:
browser/API
→ Spaces proxy
→ runtime/container
→ app
ZeroGPU path:
Gradio request
→ ZeroGPU queue
→ quota reservation
→ GPU allocator
→ worker
→ settlement/refund
Hub transfer path:
huggingface_hub / hf CLI / datasets / transformers
→ auth/account tier
→ Xet or HTTP fallback
→ CAS/range/chunking
→ cache/filesystem
→ CDN/cloud route
So I would mention Hub UL/DL only as a separate track.
It does not explain a Space restart 503 by itself. It does not explain ZeroGPU quota settlement by itself. But it supports a broader observation: several HF-facing systems now involve more managed transfer/routing/cache/quota layers than a simple “HTTP request to one server” mental model.
External egress and abuse-control: another separate network-looking class
Another separate class is outbound traffic from Spaces:
Space container
→ DNS / egress policy / external API
This is different from inbound Spaces proxy routing.
There have been scattered reports around external APIs, DNS, Cloudflare-like traffic, keepalive behavior, and abuse-control decisions. The exact causes may differ case by case. But this is another example where the user sees “network failure” while the actual boundary may be:
- outbound DNS;
- egress policy;
- blocked or classified target domain;
- shared IP reputation;
- VPN / Cloudflare / Worker / bot-like traffic classification;
- abuse-handler / pause-state logic.
So I would keep a separate mental bucket for:
inbound route fails
outbound egress fails
build/control-plane fails
quota/scheduler fails
transfer backend fails
abuse/security state changes
They are not the same problem.
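For the outbound bucket specifically, it helps to record which stage fails first, since a DNS failure and a refused or timed-out connection suggest different boundaries. The sketch below uses only the standard `socket` module. `probe_egress` is a hypothetical helper, and the `resolve` / `connect` parameters exist only so the logic can be exercised without network access.

```python
import socket


def probe_egress(host, port=443, timeout=5.0,
                 resolve=socket.getaddrinfo,
                 connect=socket.create_connection):
    """Report which outbound stage fails first: 'dns', 'connect', or 'ok'."""
    try:
        resolve(host, port)
    except socket.gaierror as exc:
        # Name resolution failed: look at DNS / egress policy first.
        return {"stage": "dns", "error": str(exc)}
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError as exc:
        # The name resolved, but the TCP connection was refused or timed out.
        return {"stage": "connect", "error": str(exc)}
    conn.close()
    return {"stage": "ok", "error": None}
```

Calling `probe_egress("api.example.com")` from inside the Space and again from a local machine gives a simple comparison. The hostname here is a placeholder, not a real endpoint from any of the reports.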
Blackwell / ZeroGPU runtime churn: useful background, not the center
I would not make Blackwell the main theory here.
The reason is not only that the specific symptoms look different. It is also that the broader report pattern seems to involve more Spaces/control-plane/proxy/routing reports than unresolved ZeroGPU-only reports.
Still, Blackwell is useful background.
Publicly visible facts:
- Current ZeroGPU docs describe RTX PRO 6000 Blackwell-backed `large` and `xlarge` ZeroGPU sizes.
- NVIDIA RTX PRO 6000 instead of H200 for ZeroGPU discusses the runtime change and compatibility symptoms.
- hub-docs PR #2474 updated the ZeroGPU docs for Blackwell.
This is a large visible runtime-contract change.
I cannot verify from the outside whether there was a coordinated internal backend rollout. But a runtime change of that size likely touches more than the displayed GPU label: supported PyTorch versions, GPU sizing, quota behavior, validation, allocation pools, and compatibility assumptions can all move.
So I would phrase it carefully:
Blackwell / ZeroGPU runtime churn is relevant context, not a proven root cause.
It may explain CUDA/kernel-shaped failures. It should not be used as a blanket explanation for builder logs, proxy routing, or quota refund behavior.
Broader industry context: capacity pressure may increase boundary-state failures
This is also background, not proof.
The AI infrastructure industry is under real pressure: GPU supply, data-center capacity, power, cooling, network/storage infrastructure, and cost.
Useful public context:
- AWS: Navigating GPU challenges — cost optimizing AI workloads
- JLL Global Data Center Outlook
- Reuters: US AI boom faces electric shock
- OpenAI status: reduced ChatGPT availability for Free users due to limited capacity
- Anthropic status history
- Cloudflare outage on February 20, 2026
This does not mean “HF is failing because GPUs are scarce.”
A safer interpretation is:
When compute, power, bandwidth, and cost pressure rise,
platforms tend to add or tighten:
- queueing
- quotas
- duration budgets
- request-count limits
- regional routing
- fallback pools
- transfer backends
- caching
- abuse controls
- plan-based capacity rules
Those layers are often necessary. They also create more boundary states.
A user might see:
No GPU available
Quota exceeded
503
Request did not reach server
Download stuck at 99%
Upload stalls
Space paused
Factory rebuild fails
But the underlying reason may be very different in each case.
What I would infer, cautiously
My current working hypothesis is:
There may be no single outage. There may be a cluster of boundary-state issues appearing around the same time.
The most likely buckets are:
| Bucket | Confidence | Why |
|---|---|---|
| Spaces builder / scheduler / control-plane | High | Empty logs, stuck builds, restart/factory rebuild 503, paused-state reports. |
| Spaces proxy / routing / backend selection | High | Reports that a server is running but some requests do not reach it. |
| ZeroGPU reservation / refund / settlement | Medium-high | “No GPU assigned but time consumed” is specific and not explained by normal duration behavior. |
| ZeroGPU request identity | Medium | Known X-IP-Token / auth path issues exist; some are fixed or documented. |
| Hub transfer backend | Medium | Public XET/CDN/GCP incident and several large-transfer reports exist, but this is separate from Spaces runtime. |
| Blackwell runtime churn | Background | Important timing/context, but most suspicious reports are not CUDA/kernel-shaped. |
| Industry resource pressure | Background | Makes quota/queue/routing/cache layers more plausible, but does not prove causality. |
A cleaner way to talk about this
Instead of saying:
HF network is broken.
or:
ZeroGPU is broken.
I would say something like:
Several recent reports look network-related from the outside, but the evidence points to different managed-service boundaries: Spaces build/control-plane, Spaces proxy/routing, ZeroGPU reservation/accounting, request identity, Hub transfer backend, and external egress/security policy. Some ZeroGPU pieces are already explained or fixed, while the remaining suspicious cases seem to involve reservation/refund and request-not-reaching-container behavior.
That keeps the claim narrow.
It also helps separate facts from speculation:
| Fact-like observation | Interpretation |
|---|---|
| Free-user ZeroGPU request-count limit was explained by hysts. | Do not treat all quota errors as unexplained. |
| xlarge 2× cost and misleading requested-time display were explained. | Do not confuse this with failed-GPU refund behavior. |
| gr.Server custom frontend ZeroGPU identity issue had a Gradio fix. | Some API/custom frontend quota behavior is known. |
| A report says no GPU was assigned but 60s consumed. | Possible reservation/refund/settlement issue. |
| A report says server is running but request does not reach it. | Possible proxy/routing/backend-state issue. |
| Docker build logs can be empty despite valid files. | Possible builder/scheduler/control-plane issue. |
| HF status shows an XET/CDN/GCP large-file download incident. | Transfer path can fail below the user’s code. |
| Blackwell migration changed the ZeroGPU runtime contract. | Background churn, not universal cause. |
Bottom line
I would focus less on reporting templates and more on this factual split:
- Already explained / partly fixed ZeroGPU issues
  - Free run-count vs quota message.
  - `xlarge` quota cost / display mismatch.
  - `gr.Server` / custom frontend `x-ip-token` path.
- Still suspicious ZeroGPU issue
  - GPU not assigned, but requested time appears consumed.
  - This looks like reservation/refund/settlement, not CUDA.
- Broader Spaces issues
  - Empty build logs.
  - Build not triggered.
  - Paused state.
  - Restart/factory rebuild 503.
  - Server running but requests not reaching it.
- Separate transfer/egress context
  - Xet/CDN large-file transfer incidents.
  - Upload/download stalls.
  - External API/DNS/egress problems.
- Background pressure
  - Blackwell runtime churn.
  - Industry-wide compute/network/power/cost pressure.
  - More queueing, quota, routing, cache, transfer, and abuse-control layers.
My current guess is that the interesting part is not “one bug” but where evidence disappears:
build logs disappear
→ builder / scheduler / control-plane
request logs disappear
→ proxy / routing / backend selection
GPU function entry log disappears but quota changes
→ ZeroGPU scheduler / reservation / settlement
download progress disappears
→ Hub transfer / Xet / CDN / cache
external API DNS disappears
→ egress / DNS / policy / abuse-control
That is probably the cleanest way to make the discussion useful without overclaiming.
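The same mapping can be written down as a small lookup, which may be handy when sorting several reports at once. The keys are hypothetical labels I made up for “which evidence went missing”. The values are the boundaries named above. The output is a list of suspects, not a diagnosis.

```python
# Hypothetical labels for missing evidence -> boundary to look at first.
BOUNDARY_BY_MISSING_EVIDENCE = {
    "build_logs": "builder / scheduler / control-plane",
    "request_logs": "proxy / routing / backend selection",
    "gpu_entry_log_with_quota_change": "ZeroGPU scheduler / reservation / settlement",
    "download_progress": "Hub transfer / Xet / CDN / cache",
    "external_dns": "egress / DNS / policy / abuse-control",
}


def suspect_boundaries(missing):
    """Map observed gaps in evidence to candidate boundaries, ignoring unknown labels."""
    return [
        BOUNDARY_BY_MISSING_EVIDENCE[label]
        for label in missing
        if label in BOUNDARY_BY_MISSING_EVIDENCE
    ]


suspects = suspect_boundaries(["request_logs", "external_dns"])
```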