Time charged before GPU assignment in ZeroGPU Spaces

All ZeroGPU Spaces consume quota time even when no GPU is obtained. While a ZeroGPU Space is in the state “Waiting for a GPU to become available,” the user’s GPU time is already being charged. If a GPU is eventually assigned, everything works as expected. If not, and the process ends with “No GPU was available for you,” the user’s time keeps being consumed for 60 seconds without the job ever reaching a GPU.

Hmm. Recently, I feel like we have been seeing a number of errors that look somewhat network-related, though not necessarily on the public Internet side. The tricky part is that the trigger and root cause are hard to pin down. @hysts


I do not think this is best explained as one single root cause.

The pattern looks more like several different managed-service boundaries producing similar user-visible symptoms:

  • a build job does not start or does not produce logs;
  • a Space is stuck in Paused / Building / Starting / 503;
  • a server is running, but some requests do not reach it;
  • a ZeroGPU job waits for a GPU, fails to obtain one, but still appears to consume quota;
  • API/custom/frontend paths behave differently from the normal Hub page;
  • large uploads/downloads stall in a Hub/Xet/CDN path;
  • external API calls from a Space fail in a DNS/egress/policy-looking way.

Those are all “network-ish” from the outside, but they are not necessarily the same layer.

Short version

I would group the recent clues like this:

| Boundary | User-visible symptom | My current read |
|---|---|---|
| Spaces builder / scheduler / control-plane | Empty build logs, build never triggers, restart/factory rebuild 503, paused state | Stronger signal than I initially weighted. |
| Spaces proxy / routing | App/server is running, but some external requests 503 or never reach the app | Distinct from an app exception. |
| ZeroGPU reservation / refund / quota settlement | GPU is never assigned, but requested time appears consumed | Still suspicious; separate from known xlarge cost behavior. |
| ZeroGPU request identity | Browser/API/custom frontend/Space-to-Space quota behavior differs | Partly known; X-IP-Token and auth path matter. |
| Hub transfer backend | Large upload/download stalls, Xet/HTTP fallback/cache/CDN weirdness | Adjacent, probably separate from Spaces. |
| External egress / abuse-control | DNS/external API failures, Cloudflare/VPN/shared-IP/keepalive effects | Separate path, but also network-looking. |
| Blackwell / ZeroGPU runtime churn | sm_120, CUDA wheel, FlashAttention/xFormers/Triton failures | Relevant background, not the main theory for most reports here. |
| Industry capacity pressure | More queueing, quotas, routing, transfer backends, abuse controls | Background context only; not proof of direct causality. |

So I would not call this simply “ZeroGPU is broken” or “the public Internet is broken.”

A more useful description might be:

Several user-visible failures appear network-related, but the evidence points to multiple service boundaries: Spaces build/control-plane, proxy/routing, ZeroGPU scheduling/accounting, request identity, Hub transfer backend, and external egress/security policy.

Known or partly explained pieces

Some parts already seem publicly understood or at least partially handled.

| Area | Public signal | Why I would not treat it as unexplained |
|---|---|---|
| Free-user ZeroGPU run-count / quota-message confusion | In Free Account ZeroGPU Quota Issue, hysts explained that Free users had a request-count limit in addition to time quota, and that the error message was initially misleading. | This explains some “quota exceeded” reports, but not all quota/accounting symptoms. |
| xlarge 2× quota behavior | In Getting quota exceeded even though requested seconds is less than what’s left, hysts explained the xlarge fallback / 2× quota cost and a display-side issue. | This is important, but it is different from “no GPU was assigned but time was consumed.” |
| ZeroGPU Blackwell-backed sizing | Current ZeroGPU docs describe RTX PRO 6000 Blackwell-backed large / xlarge behavior and quota implications. | Runtime context changed, but CUDA/kernel mismatch has a different symptom shape. |
| Custom frontend / gr.Server quota identity | gradio-app/gradio#13209 describes missing ZeroGPU x-ip-token behavior in custom gr.Server frontends; gradio-app/gradio#13210 fixed it. | This is a known request-identity class, not necessarily a GPU allocator/accounting bug. |
| Space-to-Space / API identity | Gradio documents ZeroGPU client behavior and X-IP-Token forwarding in Using ZeroGPU Spaces with the Clients. HF also documents Spaces as API endpoints. | Browser path, API path, custom frontend path, and Space-to-Space path can behave differently. |

This matters because it narrows the remaining question.

The remaining suspicious reports are not simply “any quota error” or “any ZeroGPU delay.” They are more specific.

Still suspicious: ZeroGPU time charged before assignment

The clearest current ZeroGPU-specific report that still looks unresolved describes roughly this flow:

Waiting for a GPU to become available
→ user GPU time is already being charged
→ no GPU is obtained
→ process ends with "No GPU was available for you"
→ 60 seconds appear to be consumed anyway

That does not look like a normal CUDA/Blackwell/kernel issue.

A Blackwell compatibility failure usually looks like:

sm_120
no kernel image is available
invalid device function
old CUDA wheel
old PyTorch wheel
FlashAttention / xFormers / Triton failure

The “time charged before assignment” symptom looks more like:

ZeroGPU scheduler
→ duration reservation
→ GPU allocator
→ no worker assigned
→ failure
→ settlement/refund questionable
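
To make the contrast between the two symptom shapes concrete, here is a minimal triage sketch. The marker strings are copied from the lists above; the function name and the category labels are my own, purely illustrative, and real error messages may be worded differently.

```python
# Hypothetical triage helper. The marker strings come from the symptom lists
# above; the category names are informal labels, not HF terminology.
CUDA_SHAPED = ("sm_120", "no kernel image is available", "invalid device function")
ALLOCATOR_SHAPED = ("waiting for a gpu to become available", "no gpu was available")


def symptom_shape(message: str) -> str:
    """Return 'cuda/kernel', 'scheduler/allocator', or 'unknown' for an error text."""
    text = message.lower()
    if any(marker in text for marker in CUDA_SHAPED):
        return "cuda/kernel"
    if any(marker in text for marker in ALLOCATOR_SHAPED):
        return "scheduler/allocator"
    return "unknown"


print(symptom_shape("No GPU was available for you"))            # scheduler/allocator
print(symptom_shape("RuntimeError: invalid device function"))   # cuda/kernel
```

A substring match like this is crude, but it is enough to keep the two classes from being filed under the same theory.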

I would keep this separate from already-explained xlarge quota cost.

Known/expected:

  • duration matters for scheduling.
  • xlarge costs more quota than large.
  • remaining quota can affect queue behavior.
  • authenticated vs unauthenticated paths can affect quota pool.
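
As a worked example of the already-explained xlarge behavior, the arithmetic below shows how a request can be rejected even though the requested seconds are below the remaining quota. This is a sketch under assumptions: I am treating quota cost as requested seconds times a size multiplier, with xlarge at 2× as described above and large at 1×. The real accounting rules may differ.

```python
# Assumption: quota cost = requested seconds * size multiplier.
# The 2x figure for xlarge is from the explanation cited above; 1x for large
# is assumed as the baseline.
SIZE_MULTIPLIER = {"large": 1, "xlarge": 2}


def quota_cost(requested_seconds: int, size: str) -> int:
    """Seconds of quota a request would cost under the assumed multiplier model."""
    return requested_seconds * SIZE_MULTIPLIER[size]


def fits_quota(requested_seconds: int, size: str, remaining_seconds: int) -> bool:
    """True if the assumed cost fits into the remaining quota."""
    return quota_cost(requested_seconds, size) <= remaining_seconds


print(fits_quota(60, "large", 100))   # True: 60 <= 100
print(fits_quota(60, "xlarge", 100))  # False: 120 > 100
```

The second case is the "requested seconds is less than what is left, but quota exceeded" shape, and it involves no failed allocation at all.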

Still suspicious:

  • no GPU worker is assigned;
  • user code may not even enter the @spaces.GPU function;
  • the job ends with “No GPU was available”;
  • the requested duration still appears consumed.

If that is accurate, the interesting question is not “why did the model fail?” It is:

Is a reserved duration fully refunded when the ZeroGPU allocator fails to assign a GPU worker?
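
To state that question precisely, here is a toy accounting model. Everything in it is hypothetical: I do not know whether ZeroGPU reserves the duration up front or how settlement is implemented, and the names `Account`, `run_job`, and `refund_on_failure` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Account:
    remaining: int  # seconds of quota left


def run_job(account: Account, duration: int, gpu_assigned: bool,
            refund_on_failure: bool) -> int:
    """Toy model: reserve `duration` at queue time, then settle.

    Returns the number of seconds charged to the account.
    """
    before = account.remaining
    account.remaining -= duration          # reservation while waiting for a GPU
    if not gpu_assigned and refund_on_failure:
        account.remaining += duration      # full refund when no worker is assigned
    return before - account.remaining


# Same failed allocation, two settlement policies:
print(run_job(Account(300), 60, gpu_assigned=False, refund_on_failure=True))   # 0
print(run_job(Account(300), 60, gpu_assigned=False, refund_on_failure=False))  # 60
```

The report matches the second line: the allocation fails, yet 60 seconds are gone. Whether that reflects the actual settlement logic or only the displayed quota is exactly what cannot be determined from the outside.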

Stronger than expected: Spaces builder / scheduler / control-plane symptoms

I would give this category more weight than ZeroGPU-only explanations.

There are several reports where the issue seems to happen before the app can do anything useful.

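Examples from the reports:

  • empty build logs;
  • a build that is never triggered;
  • a Space stuck in the paused state;
  • a 503 on restart or factory rebuild.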

Those do not sound like ordinary Python exceptions.

The rough shape is:

Repo / Space settings
→ build scheduler
→ builder
→ runtime state machine
→ logs
→ app start

If logs are empty or the build never really starts, the failure is likely before the app’s Python code.

This is also why “just change requirements.txt” or “restart again” often feels random in these cases. The failure may be in the orchestration path rather than the application path.

Also suspicious: running server, request not reaching it

Another important class is a server that is running while some external requests return 503 or never reach the app.

The key phrase in the 503 report is that the request is not reaching the server.

That is a different failure class from:

request reaches app
→ app raises exception
→ app returns 500

It is closer to:

browser/API
→ HF edge/proxy
→ route/backend selection/health state
→ container
→ app

If the app is alive but failed requests never appear in the app’s access logs, the interesting layer is before user code.
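
One way to check this is to tag each client request with a unique ID and then look for those IDs in the app's own log. The sketch below only shows the comparison step. The `rid` query parameter and the log line format are assumptions for illustration; any marker the app actually logs would work.

```python
import uuid


def new_request_id() -> str:
    """Unique marker a client could attach to a request (header or query parameter)."""
    return uuid.uuid4().hex


def split_by_arrival(sent_ids, app_log_lines):
    """Partition sent IDs into those found in the app's log and those missing from it."""
    log_text = "\n".join(app_log_lines)
    reached = [rid for rid in sent_ids if rid in log_text]
    missing = [rid for rid in sent_ids if rid not in log_text]
    return reached, missing


# Illustrative data: three requests sent, only two show up in the app log.
sent = [new_request_id() for _ in range(3)]
app_log = [
    f"GET /predict?rid={sent[0]} 200",
    f"GET /predict?rid={sent[2]} 500",
]
reached, missing = split_by_arrival(sent, app_log)
print(len(reached), len(missing))  # 2 1
```

An ID in `reached` with a 500 is an application problem. An ID in `missing` that the client saw as a 503 points at a layer in front of the container.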

This is not an exotic failure pattern in managed cloud systems. Load balancers and proxies can return 5xx before the backend application sees the request.

That does not prove the HF issue is the same mechanism. It only shows that “server process is alive” and “external proxy can route to it reliably” are separate facts.

Hub upload/download instability: probably separate, but relevant context

I would not merge Hub upload/download instability with Spaces runtime issues.

Still, it is relevant as an adjacent pattern: HF-facing managed transfer layers can also produce “network-looking” failures.

Examples:

  • HF status currently shows a resolved large-file download incident where downloads stalled/hung via US Central CDN endpoints, especially with XET protocol, due to a Google Cloud us-central1 infrastructure issue: Hugging Face Status.
  • huggingface_hub#4085: large downloads via HF Hub stuck in Colab.
  • huggingface_hub#3266: HF_HUB_DISABLE_XET reportedly not disabling Xet in one setup.
  • huggingface_hub#3868: HTTP download path can be too large, requiring hf_xet.
  • hf-xet on PyPI describes hf-xet as the transfer layer used by huggingface_hub for Xet storage, with chunk-based deduplication and local disk caching.

These are different paths:

Spaces request path:
browser/API
→ Spaces proxy
→ runtime/container
→ app

ZeroGPU path:
Gradio request
→ ZeroGPU queue
→ quota reservation
→ GPU allocator
→ worker
→ settlement/refund

Hub transfer path:
huggingface_hub / hf CLI / datasets / transformers
→ auth/account tier
→ Xet or HTTP fallback
→ CAS/range/chunking
→ cache/filesystem
→ CDN/cloud route

So I would mention Hub UL/DL only as a separate track.

It does not explain a Space restart 503 by itself. It does not explain ZeroGPU quota settlement by itself. But it supports a broader observation: several HF-facing systems now involve more managed transfer/routing/cache/quota layers than a simple “HTTP request to one server” mental model.

External egress and abuse-control: another separate network-looking class

Another separate class is outbound traffic from Spaces:

Space container
→ DNS / egress policy / external API

This is different from inbound Spaces proxy routing.

There have been scattered reports around external APIs, DNS, Cloudflare-like traffic, keepalive behavior, and abuse-control decisions. The exact causes may differ case by case. But this is another example where the user sees “network failure” while the actual boundary may be:

  • outbound DNS;
  • egress policy;
  • blocked or classified target domain;
  • shared IP reputation;
  • VPN / Cloudflare / Worker / bot-like traffic classification;
  • abuse-handler / pause-state logic.

So I would keep a separate mental bucket for:

inbound route fails
outbound egress fails
build/control-plane fails
quota/scheduler fails
transfer backend fails
abuse/security state changes

They are not the same problem.
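
For the outbound bucket specifically, a first step is to separate "the name did not resolve" from "the name resolved but the connection failed". Here is a minimal sketch using only the standard library. The function name and the three labels are my own, and the resolver and connector are injectable so the logic can be exercised without real network access.

```python
import socket


def classify_egress(host, port=443, resolve=socket.getaddrinfo,
                    connect=socket.create_connection, timeout=5.0):
    """Return 'dns_failure', 'connect_failure', or 'tcp_ok' for an outbound target."""
    try:
        resolve(host, port)                  # DNS step
    except socket.gaierror:
        return "dns_failure"
    try:
        conn = connect((host, port), timeout=timeout)   # TCP step
    except OSError:                          # refused, unreachable, timed out, ...
        return "connect_failure"
    conn.close()
    return "tcp_ok"
```

Run from inside a Space container against the failing target, this would at least tell DNS-shaped failures apart from connection-shaped ones. It says nothing about why either happens: egress policy, IP reputation, and abuse-control decisions can all sit behind the same result.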

Blackwell / ZeroGPU runtime churn: useful background, not the center

I would not make Blackwell the main theory here.

The reason is not only that the specific symptoms look different. It is also that the broader report pattern seems to involve more Spaces/control-plane/proxy/routing reports than unresolved ZeroGPU-only reports.

Still, Blackwell is useful background.

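Publicly visible facts: the current ZeroGPU docs describe RTX PRO 6000 Blackwell-backed large / xlarge sizing and its quota implications.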

This is a large visible runtime-contract change.

I cannot verify from the outside whether there was a coordinated internal backend rollout. But a runtime change of that size likely touches more than the displayed GPU label: supported PyTorch versions, GPU sizing, quota behavior, validation, allocation pools, and compatibility assumptions can all move.

So I would phrase it carefully:

Blackwell / ZeroGPU runtime churn is relevant context, not a proven root cause.

It may explain CUDA/kernel-shaped failures. It should not be used as a blanket explanation for builder logs, proxy routing, or quota refund behavior.

Broader industry context: capacity pressure may increase boundary-state failures

This is also background, not proof.

The AI infrastructure industry is under real pressure: GPU supply, data-center capacity, power, cooling, network/storage infrastructure, and cost.

This does not mean “HF is failing because GPUs are scarce.”

A safer interpretation is:

When compute, power, bandwidth, and cost pressure rise,
platforms tend to add or tighten:

- queueing
- quotas
- duration budgets
- request-count limits
- regional routing
- fallback pools
- transfer backends
- caching
- abuse controls
- plan-based capacity rules

Those layers are often necessary. They also create more boundary states.

A user might see:

No GPU available
Quota exceeded
503
Request did not reach server
Download stuck at 99%
Upload stalls
Space paused
Factory rebuild fails

But the underlying reason may be very different in each case.

What I would infer, cautiously

My current working hypothesis is:

There may be no single outage. There may be a cluster of boundary-state issues appearing around the same time.

The most likely buckets are:

| Bucket | Confidence | Why |
|---|---|---|
| Spaces builder / scheduler / control-plane | High | Empty logs, stuck builds, restart/factory rebuild 503, paused-state reports. |
| Spaces proxy / routing / backend selection | High | Reports that a server is running but some requests do not reach it. |
| ZeroGPU reservation / refund / settlement | Medium-high | “No GPU assigned but time consumed” is specific and not explained by normal duration behavior. |
| ZeroGPU request identity | Medium | Known X-IP-Token / auth path issues exist; some are fixed or documented. |
| Hub transfer backend | Medium | Public XET/CDN/GCP incident and several large-transfer reports exist, but this is separate from Spaces runtime. |
| Blackwell runtime churn | Background | Important timing/context, but most suspicious reports are not CUDA/kernel-shaped. |
| Industry resource pressure | Background | Makes quota/queue/routing/cache layers more plausible, but does not prove causality. |

A cleaner way to talk about this

Instead of saying:

HF network is broken.

or:

ZeroGPU is broken.

I would say something like:

Several recent reports look network-related from the outside, but the evidence points to different managed-service boundaries: Spaces build/control-plane, Spaces proxy/routing, ZeroGPU reservation/accounting, request identity, Hub transfer backend, and external egress/security policy. Some ZeroGPU pieces are already explained or fixed, while the remaining suspicious cases seem to involve reservation/refund and request-not-reaching-container behavior.

That keeps the claim narrow.

It also helps separate facts from speculation:

| Fact-like observation | Interpretation |
|---|---|
| Free-user ZeroGPU request-count limit was explained by hysts. | Do not treat all quota errors as unexplained. |
| xlarge 2× cost and misleading requested-time display were explained. | Do not confuse this with failed-GPU refund behavior. |
| gr.Server custom frontend ZeroGPU identity issue had a Gradio fix. | Some API/custom frontend quota behavior is known. |
| A report says no GPU was assigned but 60s consumed. | Possible reservation/refund/settlement issue. |
| A report says server is running but request does not reach it. | Possible proxy/routing/backend-state issue. |
| Docker build logs can be empty despite valid files. | Possible builder/scheduler/control-plane issue. |
| HF status shows an XET/CDN/GCP large-file download incident. | Transfer path can fail below the user’s code. |
| Blackwell migration changed the ZeroGPU runtime contract. | Background churn, not universal cause. |

Bottom line

I would focus less on reporting templates and more on this factual split:

  1. Already explained / partly fixed ZeroGPU issues
    • Free run-count vs quota message.
    • xlarge quota cost / display mismatch.
    • gr.Server / custom frontend x-ip-token path.

  2. Still suspicious ZeroGPU issue
    • GPU not assigned, but requested time appears consumed.
    • This looks like reservation/refund/settlement, not CUDA.

  3. Broader Spaces issues
    • Empty build logs.
    • Build not triggered.
    • Paused state.
    • Restart/factory rebuild 503.
    • Server running but requests not reaching it.

  4. Separate transfer/egress context
    • Xet/CDN large-file transfer incidents.
    • Upload/download stalls.
    • External API/DNS/egress problems.

  5. Background pressure
    • Blackwell runtime churn.
    • Industry-wide compute/network/power/cost pressure.
    • More queueing, quota, routing, cache, transfer, and abuse-control layers.

My current guess is that the interesting part is not “one bug” but where evidence disappears:

build logs disappear
→ builder / scheduler / control-plane

request logs disappear
→ proxy / routing / backend selection

GPU function entry log disappears but quota changes
→ ZeroGPU scheduler / reservation / settlement

download progress disappears
→ Hub transfer / Xet / CDN / cache

external API DNS disappears
→ egress / DNS / policy / abuse-control

That is probably the cleanest way to make the discussion useful without overclaiming.
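
The "where evidence disappears" mapping can also be written down as a small lookup, which makes it easy to tag incoming reports consistently. The keys are my informal labels for the missing evidence, not HF terminology, and the function name is hypothetical.

```python
# Lookup copied from the mapping above. Keys are informal, lower-case labels
# for the kind of evidence that goes missing.
WHERE_EVIDENCE_DISAPPEARS = {
    "build logs": "builder / scheduler / control-plane",
    "request logs": "proxy / routing / backend selection",
    "gpu function entry log": "ZeroGPU scheduler / reservation / settlement",
    "download progress": "Hub transfer / Xet / CDN / cache",
    "external api dns": "egress / DNS / policy / abuse-control",
}


def likely_boundary(missing_evidence: str) -> str:
    """Map a missing-evidence label to the boundary worth looking at first."""
    key = missing_evidence.strip().lower()
    return WHERE_EVIDENCE_DISAPPEARS.get(key, "unclassified")


print(likely_boundary("Request logs"))  # proxy / routing / backend selection
```

Anything that comes back as "unclassified" is a prompt to ask what the reporter could and could not observe, rather than a reason to force it into one of the buckets.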