
bug(coderd/externalauth): refreshed tokens discarded when post-refresh validation fails, causing permanent auth failure #23473

@eugeneotto

Description

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When RefreshToken successfully obtains new tokens from the provider but the subsequent ValidateToken call fails (e.g., provider returns 500, 502, 503, or times out), the new tokens are silently discarded. Since the provider already rotated the refresh token during the refresh call, the old refresh token in the database is now invalid. The next refresh attempt fails with bad_refresh_token, the refresh token is cleared, and the user must manually re-authenticate.

The sequence:

  1. Access token expires. An agent calls RefreshToken.
  2. TokenSource(...).Token() succeeds — the provider issues a new access token and rotates the refresh token. The old refresh token is now invalid on the provider's side.
  3. ValidateToken (a GET to the provider's /user endpoint) fails: the provider returns a non-200 response such as 500, or the request times out. RefreshToken exits at line 271 without saving the new tokens. The DB update at line 289 is only reached after successful validation.
  4. The database still contains the old expired access token and the old (now-consumed) refresh token.
  5. On the next refresh attempt, the provider rejects the consumed refresh token with bad_refresh_token. isFailedRefresh clears the refresh token and caches the error. The user must re-authenticate.

This is made worse when multiple agents refresh concurrently: all agents read the same expired token, one wins the refresh (consuming the refresh token), and the losers immediately get bad_refresh_token. If the winner's validation also fails, the optimistic lock from #22904 doesn't help: the winner never updated the row, so a loser's UpdateExternalAuthLinkRefreshToken clears the token immediately rather than on the next attempt.
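
To illustrate why the lock does not protect this case, here is a rough sketch that assumes the optimistic lock behaves like a compare-and-clear keyed on the refresh token the caller originally read. That is an assumption about #22904, not a description of its implementation; `row`, `store`, and `clearIfUnchanged` are hypothetical names.

```go
package main

import "sync"

// row is a hypothetical in-memory stand-in for one external_auth_links row.
type row struct {
	mu      sync.Mutex
	refresh string
}

// store models the winner persisting its rotated refresh token.
func (r *row) store(next string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.refresh = next
}

// clearIfUnchanged models a loser clearing the refresh token, guarded by an
// optimistic check: the clear only applies if the row still holds the token
// the loser read before its failed refresh.
func (r *row) clearIfUnchanged(seen string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.refresh != seen {
		return false // someone else already wrote newer tokens
	}
	r.refresh = ""
	return true
}
```

If the winner calls `store("r1")` first, a loser's `clearIfUnchanged("r0")` is a no-op. If the winner exits before saving because validation failed, the row still holds `"r0"`, the check passes, and the token is cleared.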

Two additional issues compound this:

  • The validation retry loop (lines 264–283) only retries on !valid (401/403). A 500 or timeout from the provider exits immediately with no retry.
  • There is no server-side logging when this occurs. The error is returned in the HTTP response body to the agent at line 1991 but never logged.

Relevant Log Output

No server-side logs are produced for this failure. The only evidence is in the `external_auth_links.oauth_refresh_failure_reason` column:

> oauth2: "bad_refresh_token" "The refresh token passed is incorrect or expired."

All affected rows show oauth_expiry = updated_at + 8h, consistent with the failure occurring at token expiry time. Affected users are likely to have concurrent workspace agents.

Expected Behavior

  1. A successfully refreshed token should be persisted to the database before or regardless of post-refresh validation. Validation failure should not discard valid tokens obtained from the provider.
  2. Transient provider errors (5xx, timeouts) during validation should be retried, not just 401/403.
  3. The validation failure path should log a warning server-side so operators can diagnose the issue.

Steps to Reproduce

  1. Configure a GitHub App as an external auth provider (tokens expire after 8 hours, refresh tokens are rotated on use).
  2. Wait for the token to expire.
  3. Trigger a refresh while the provider is under enough load that its /user API endpoint occasionally returns a non-200 response.
  4. The refreshed tokens are discarded and the old (consumed) refresh token remains in the database, causing permanent auth failure on the next attempt.

Concurrent agents make this easier to hit: when several agents find the token expired at the same time, there are more chances for a validation call to coincide with provider load.

Environment

  • Coder version: v2.31.3

Additional Context

No response
