fix: handle http2 goaway race conditions #1054

bayoumi17m · 2026-01-29T12:50:14Z

Fix HTTP/2 GOAWAY Race Conditions

Problem

When using HTTPX with HTTP/2 under high concurrency, users encounter errors where connections are closed for streams beyond the last_stream_id. This forces applications to implement retry logic for what should be transparent library behavior.

The root cause is a set of race conditions between h2's state machine and httpcore's event processing:

State machine race: h2 transitions to CLOSED before httpcore processes the GOAWAY event
Disconnect without context: Server disconnects immediately after GOAWAY, raising generic RemoteProtocolError("Server disconnected") with no retry opportunity
Insufficient exception context: ConnectionNotAvailable didn't convey why the connection was unavailable, preventing informed retry decisions

A user proposed changing > to >= in the stream ID comparison (discussion #1041), but this would violate RFC 7540 Section 6.8:

"The last stream identifier in the GOAWAY frame contains the highest-numbered stream identifier for which the sender of the GOAWAY frame might have taken some action on..."

A stream matching last_stream_id may have already been processed, making automatic retry unsafe for non-idempotent operations.

Solution

This PR introduces proper GOAWAY handling that maintains RFC compliance while enabling safe automatic retries:

1. New `ConnectionGoingAway` Exception

A dedicated exception (subclass of ConnectionNotAvailable) that carries GOAWAY context:

last_stream_id / request_stream_id for RFC 7540 stream comparison
error_code to distinguish graceful shutdown (NO_ERROR) from errors
headers_sent / body_sent to track request phase at time of GOAWAY
Helper properties: is_safe_to_retry, is_graceful_shutdown, may_have_side_effects

This exception is exported publicly, allowing applications to implement idempotency-aware retry logic when needed.

2. New `DRAINING` Connection State

Connections receiving a graceful GOAWAY (NO_ERROR) transition to DRAINING rather than CLOSED:

Allows in-flight streams to complete normally
Rejects new requests (connection reports is_available() = False)
Connection expires once all streams complete

This fits naturally into the existing HTTPConnectionState enum pattern and makes state transitions easier to reason about.

3. Request Phase Tracking

Each stream tracks headers_sent and body_sent to determine potential side effects:

If GOAWAY arrives before headers are sent → definitely no side effects
If headers/body were sent → server may have processed the request

This information flows through ConnectionGoingAway.may_have_side_effects for retry decisions.

4. Connection Pool Retry Logic

The connection pool now handles ConnectionGoingAway with RFC-compliant retry semantics:

Condition	Behavior
`stream_id > last_stream_id`	Auto-retry (guaranteed unprocessed per RFC 7540)
Graceful shutdown + no side effects	Auto-retry (likely safe)
Otherwise	Raise `RemoteProtocolError` with context for application decision

Behavior Summary

Scenario	Before	After
Stream ID > last_stream_id	`ConnectionNotAvailable` (retry worked but was implicit)	Auto-retry with explicit RFC justification
Graceful shutdown, headers not sent	`RemoteProtocolError` or `ConnectionNotAvailable`	Auto-retry
Potentially processed request	Generic error	`RemoteProtocolError` with cause chain to `ConnectionGoingAway`
h2 CLOSED race condition	Generic protocol error	Proper `ConnectionGoingAway` with conservative assumptions
Server disconnect after GOAWAY	`RemoteProtocolError("Server disconnected")`	`ConnectionGoingAway` with retry context

Design Decisions

Why a new connection state vs. a boolean flag?
The DRAINING state fits the existing HTTPConnectionState pattern and makes state transitions explicit. A boolean would work but adds cognitive overhead when reasoning about valid state combinations.

Why track headers and body separately?
Both are relevant for side-effect determination. Applications can use this granular information - for example, a read-only GET that sent headers but no body might be safer to retry than a POST with body sent.

Why raise RemoteProtocolError instead of ConnectionGoingAway for potentially-processed requests?
Balances API simplicity with power. Most users catch RemoteProtocolError; advanced users can inspect __cause__ to access the full ConnectionGoingAway context. Since ConnectionGoingAway extends ConnectionNotAvailable, it's also catchable directly for those who need it.

Why conservative assumptions when h2 is CLOSED without GOAWAY event?
When the connection is closed but we haven't processed a GOAWAY event, we can't know the true last_stream_id. Assuming the current stream may have been processed is the safe default - better to surface an error than silently retry a potentially-processed request.

Related Issues

Test Plan

Unit tests for ConnectionGoingAway exception properties
Unit tests for DRAINING state transitions
Integration tests simulating GOAWAY during request
Integration tests for race condition scenarios (h2 CLOSED before event processing)
Verify existing tests still pass

Checklist

I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
I've updated the documentation accordingly.

…itions

…ce conditions

…away race conditions

…ttp2 goaway race conditions

#2)

…coverage (#3)

fix: handle http2 goaway race conditions

081ccf0

bayoumi17m closed this Jan 29, 2026

Magd Bayoumi added 2 commits January 29, 2026 07:58

fixup! fix: handle http2 goaway race conditions

98b0dd8

fixup! fixup! fix: handle http2 goaway race conditions

473d82e

bayoumi17m reopened this Jan 29, 2026

Magd Bayoumi and others added 9 commits January 29, 2026 08:15

fixup! fixup! fixup! fix: handle http2 goaway race conditions

995c04f

fixup! fixup! fixup! fixup! fix: handle http2 goaway race conditions

5cf1d0c

fixup! fixup! fixup! fixup! fixup! fix: handle http2 goaway race cond…

61afe91

…itions

fixup! fixup! fixup! fixup! fixup! fixup! fix: handle http2 goaway ra…

2fe76e7

…ce conditions

fixup! fixup! fixup! fixup! fixup! fixup! fixup! fix: handle http2 go…

83056e3

…away race conditions

fixup! fixup! fixup! fixup! fixup! fixup! fixup! fixup! fix: handle h…

efac8b8

…ttp2 goaway race conditions

test: add comprehensive tests for HTTP/2 GOAWAY handling (#1)

6d0511a

fix: resolve linting issues and regenerate sync files for GOAWAY tests (

f2e483e

#2)

test: add edge case tests for HTTP/2 GOAWAY handling to achieve 100% …

e401864

…coverage (#3)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: handle http2 goaway race conditions #1054

fix: handle http2 goaway race conditions #1054

bayoumi17m commented Jan 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

fix: handle http2 goaway race conditions #1054

Are you sure you want to change the base?

fix: handle http2 goaway race conditions #1054

Conversation

bayoumi17m commented Jan 29, 2026

Fix HTTP/2 GOAWAY Race Conditions

Problem

Solution

1. New ConnectionGoingAway Exception

2. New DRAINING Connection State

3. Request Phase Tracking

4. Connection Pool Retry Logic

Behavior Summary

Design Decisions

Related Issues

Test Plan

Checklist

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

1. New `ConnectionGoingAway` Exception

2. New `DRAINING` Connection State