feat(rpc): report why a tx-status request timed out #15950

Draft
wacban wants to merge 9 commits into master from waclaw/rpc

Conversation

@wacban

@wacban wacban commented Jun 19, 2026

Contributor

Currently tx_status returns a timeout error while hiding the actual inner error. This PR changes the semantics so the error reports what actually led to the timeout.

Possible timeout causes:

  • not_observed: the transaction was never seen on chain
  • pending: the transaction was observed but is below the requested finality
  • error: the node could not produce a status before the timeout

Also refactors tx_status_fetch into focused helpers (poll_tx_status, detect_invalid_tx, tx_status_on_timeout).

Transaction-status requests (`tx`, `EXPERIMENTAL_tx_status`, and the
`wait_until` path of `send_tx`/`broadcast_tx_commit`) that time out before
reaching the requested `wait_until` finality previously returned a
context-free `TIMEOUT_ERROR`. They now return a `TIMEOUT_ERROR` whose
`reason` explains what happened:

- `pending`: the transaction was observed but is still below the requested
  finality; carries the last-known status so callers can re-poll.
- `not_observed`: the transaction was never seen on chain.
- `error`: the node could not produce a status before the timeout.
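
As a rough illustration of the three reasons listed above, here is a minimal sketch of how a caller might branch on them. The enum mirrors the names in this description, but the field types are simplified, and `ClientAction` and `next_action` are hypothetical helpers invented for this example; none of this is the nearcore definition.

```rust
/// Simplified stand-in for the PR's `TimeoutErrorReason` (assumption: the
/// real type carries a full transaction status, not a `String`).
#[derive(Debug, Clone, PartialEq)]
pub enum TimeoutErrorReason {
    /// Observed, but still below the requested finality.
    Pending { status: String },
    /// Never seen on chain.
    NotObserved,
    /// The node could not produce a status before the timeout.
    Error { debug_info: String },
}

impl TimeoutErrorReason {
    /// The tag names used in the description above.
    pub fn tag(&self) -> &'static str {
        match self {
            Self::Pending { .. } => "pending",
            Self::NotObserved => "not_observed",
            Self::Error { .. } => "error",
        }
    }
}

/// Hypothetical client-side policy; not part of the PR.
#[derive(Debug, PartialEq)]
pub enum ClientAction {
    RePoll,
    CheckSubmission,
    ReportFailure,
}

/// Picks a follow-up action from the timeout reason.
pub fn next_action(reason: &TimeoutErrorReason) -> ClientAction {
    match reason {
        TimeoutErrorReason::Pending { .. } => ClientAction::RePoll,
        TimeoutErrorReason::NotObserved => ClientAction::CheckSubmission,
        TimeoutErrorReason::Error { .. } => ClientAction::ReportFailure,
    }
}
```

With a context-free `TIMEOUT_ERROR`, all three branches would collapse into one, so a caller could not tell whether re-polling is worthwhile.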

Also refactors `tx_status_fetch` into focused helpers (`poll_tx_status`,
`detect_invalid_tx`, `tx_status_on_timeout`).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented Jun 19, 2026


Codecov Report

❌ Patch coverage is 79.27928% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.50%. Comparing base (6f16724) to head (50e8c98).
⚠️ Report is 4 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| chain/jsonrpc/src/lib.rs | 77.14% | 15 Missing and 1 partial ⚠️ |
| chain/jsonrpc/src/api/transactions.rs | 0.00% | 5 Missing ⚠️ |
| chain/jsonrpc-primitives/src/types/transactions.rs | 97.14% | 0 Missing and 1 partial ⚠️ |
| chain/jsonrpc/openapi/src/main.rs | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #15950      +/-   ##
==========================================
+ Coverage   72.46%   72.50%   +0.03%     
==========================================
  Files         946      946              
  Lines      204323   204322       -1     
  Branches   204323   204322       -1     
==========================================
+ Hits       148071   148144      +73     
+ Misses      51299    51218      -81     
- Partials     4953     4960       +7     
| Flag | Coverage Δ |
|---|---|
| pytests-nightly | 1.10% <0.00%> (+<0.01%) ⬆️ |
| unittests | 69.48% <79.27%> (+0.03%) ⬆️ |
| unittests-nightly | 69.43% <79.27%> (+0.03%) ⬆️ |
| unittests-spice | 66.87% <79.27%> (-0.02%) ⬇️ |


wacban and others added 3 commits June 19, 2026 15:47
Bump the spec version to 1.2.12 and regenerate it to include the new
`TimeoutErrorReason` schema (`pending`/`not_observed`/`error`) on the
`TIMEOUT_ERROR` of `RpcTransactionError`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@wacban wacban requested review from Wiezzel and jancionear June 19, 2026 14:07
@wacban wacban marked this pull request as ready for review June 19, 2026 14:07
@wacban wacban requested review from a team and frol as code owners June 19, 2026 14:07
Copilot AI review requested due to automatic review settings June 19, 2026 14:07

Copilot AI left a comment

Contributor

Pull request overview

This PR enhances the JSON-RPC transaction status APIs to preserve and return why a tx-status request timed out by attaching a structured reason to TIMEOUT_ERROR (not_observed, pending with last-known status, or error with debug info). It also refactors tx_status_fetch into smaller helpers and updates the OpenAPI spec accordingly.

Changes:

  • Extend RpcTransactionError::TimeoutError to include reason: TimeoutErrorReason and add unit tests to validate serialization/round-tripping.
  • Refactor tx_status_fetch polling into poll_tx_status, validate_tx, and tx_status_on_timeout while recording a more informative timeout reason.
  • Update OpenAPI version + schema to document the new TIMEOUT_ERROR.info.reason payload, and document the behavior in the changelog.
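
The polling shape summarized above can be sketched roughly as follows. This is a synchronous toy under stated assumptions: it counts attempts instead of measuring time, the names `fetch_with_reason`, `TimeoutReason`, and `TxError` are hypothetical, and starting from `NotObserved` is a choice made for the example rather than something the PR confirms. Only the `ControlFlow<Result<..>, Reason>` return shape is taken from the diff.

```rust
use std::ops::ControlFlow;

/// Simplified stand-in for the PR's timeout reason.
#[derive(Debug, Clone, PartialEq)]
pub enum TimeoutReason {
    NotObserved,
    Pending { last_status: String },
    Error { debug_info: String },
}

/// Simplified stand-in for the RPC error type.
#[derive(Debug, PartialEq)]
pub enum TxError {
    Timeout { reason: TimeoutReason },
    Invalid(String),
}

/// Runs `poll` up to `max_attempts` times. `Break` ends the loop with a final
/// result; `Continue` carries the latest explanation for "not done yet".
pub fn fetch_with_reason<F>(max_attempts: usize, mut poll: F) -> Result<String, TxError>
where
    F: FnMut(usize) -> ControlFlow<Result<String, TxError>, TimeoutReason>,
{
    // Assumption: with no polls at all, nothing was observed.
    let mut reason = TimeoutReason::NotObserved;
    for attempt in 0..max_attempts {
        match poll(attempt) {
            ControlFlow::Break(result) => return result,
            ControlFlow::Continue(latest) => reason = latest,
        }
    }
    // Attempts exhausted: report the most recent reason instead of a bare timeout.
    Err(TxError::Timeout { reason })
}
```

The point of the shape is that every non-final poll has to say why it is not final, so the timeout path always has something concrete to report.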

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| CHANGELOG.md | Documents new TIMEOUT_ERROR semantics for tx-status-related timeouts. |
| chain/jsonrpc/src/lib.rs | Implements timeout-reason reporting and refactors tx-status polling logic. |
| chain/jsonrpc/src/api/transactions.rs | Maps TxStatusError::TimeoutError into the richer TimeoutErrorReason::Error form. |
| chain/jsonrpc/openapi/src/main.rs | Bumps OpenAPI spec version. |
| chain/jsonrpc/openapi/openapi.json | Updates schema to require TIMEOUT_ERROR.info.reason and defines TimeoutErrorReason. |
| chain/jsonrpc-primitives/src/types/transactions.rs | Adds TimeoutErrorReason and makes TimeoutError carry reason; adds serde tests. |


Comment thread chain/jsonrpc/src/lib.rs Outdated
Comment on lines +917 to +927
near_jsonrpc_primitives::types::transactions::RpcTransactionError::TimeoutError
near_jsonrpc_primitives::types::transactions::RpcTransactionError::TimeoutError {
reason: TimeoutErrorReason::NotObserved,
}
Comment thread chain/jsonrpc/openapi/openapi.json
Comment thread chain/jsonrpc/src/lib.rs Outdated
near_jsonrpc_primitives::types::transactions::RpcTransactionError::TimeoutError
let debug_info = format!("tx_exists timeout, last error: {:?}", last_error);
let reason = TimeoutErrorReason::Error { debug_info };
near_jsonrpc_primitives::types::transactions::RpcTransactionError::TimeoutError { reason }

Contributor

Why do we have to fully qualify this? Looks pretty ugly to me.

Contributor Author

yeah I hate it too but it is the convention in jsonrpc

Contributor

I think we can fix that, I don't see any reason why we're using fully qualified ids there 🤔

Contributor

note: This ugliness was introduced there so we eventually come back and do something about it in a constructive way.

There are too many types in nearcore that are named almost the same way which leads to problems when nearcore engineers start using internal types instead of JSON RPC ones and then introduce unexpected breaking changes by accident while changing seemingly internal types.

Contributor Author

Thanks for sharing the context! I had no clue about the original motivation. It's not clear to me how fully qualified types prevent using internal types, though. Anyway, what would be a constructive solution? Maybe a lint that only allows imports from a certain whitelist; currently it would be something like {std, near_jsonrpc_*, near_primitives, near_client, ... }

Comment thread chain/jsonrpc/src/lib.rs Outdated
ControlFlow::Break(Ok(result.into()))
} else {
ControlFlow::Continue(TimeoutErrorReason::Pending {
status: Box::new(result.into()),

Contributor

Not really a big deal, but we're calling result.into() and Box::new() every block just to keep the reason that might or might not be used. Maybe we could stash the raw result instead and convert it in tx_status_on_timeout?
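
One way to read this suggestion, as a minimal sketch: keep only the raw result between polls and do the conversion and boxing once, on the timeout path. All names here (`RawStatus`, `RpcStatus`, `PollState`, `into_timeout_reason`) are hypothetical stand-ins, not the types in lib.rs.

```rust
/// Hypothetical raw view returned by the node on each poll.
#[derive(Debug, Clone, PartialEq)]
pub struct RawStatus {
    pub status: String,
}

/// Hypothetical RPC-facing form that is costlier to build.
#[derive(Debug, PartialEq)]
pub struct RpcStatus {
    pub status: String,
}

impl From<RawStatus> for RpcStatus {
    fn from(raw: RawStatus) -> Self {
        RpcStatus { status: raw.status }
    }
}

/// Simplified stand-in for the timeout reason.
#[derive(Debug, PartialEq)]
pub enum TimeoutReason {
    NotObserved,
    Pending { status: Box<RpcStatus> },
}

/// What the polling loop keeps between iterations: raw data only.
#[derive(Debug, Default)]
pub struct PollState {
    last_seen: Option<RawStatus>,
}

impl PollState {
    /// Cheap per-block bookkeeping: no conversion, no boxing.
    pub fn record(&mut self, raw: RawStatus) {
        self.last_seen = Some(raw);
    }

    /// Conversion and boxing happen once, and only if the request times out.
    pub fn into_timeout_reason(self) -> TimeoutReason {
        match self.last_seen {
            Some(raw) => TimeoutReason::Pending { status: Box::new(raw.into()) },
            None => TimeoutReason::NotObserved,
        }
    }
}
```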

Comment thread chain/jsonrpc/src/lib.rs
) -> ControlFlow<Result<RpcTransactionResponse, RpcTransactionError>, TimeoutErrorReason> {
let (tx_hash, account_id) = tx_info.to_tx_hash_and_account();
let request = TxStatus { tx_hash, signer_account_id: account_id.clone(), fetch_receipt };
match self.view_client_send(request).await {

Contributor

I believe that when the RPC node doesn't track the relevant shard, ViewClientActor::get_tx_status will return Ok(TxStatusView { execution_outcome: None, status: TxExecutionStatus::None }) not an Err variant, which in turn will be converted to Pending instead of NotObserved in poll_tx_status. That seems contrary to the not_observed/pending semantics described here. Some not observed transactions will be treated as observed, but pending finalization.
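
The concern can be made concrete with a small classification sketch. It assumes, as the comment above does, that an untracked shard yields `Ok` with no execution outcome and status `None`. The types are cut-down stand-ins (the variant set of `TxExecutionStatus` is reduced for the example), and `classify` is a hypothetical helper, not a function in the PR.

```rust
/// Reduced stand-in for the execution status named in the comment.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TxExecutionStatus {
    None,
    Included,
    Final,
}

/// Reduced stand-in for the view returned by the view client.
#[derive(Debug, Clone, PartialEq)]
pub struct TxStatusView {
    pub execution_outcome: Option<String>,
    pub status: TxExecutionStatus,
}

/// Simplified stand-in for the timeout reason.
#[derive(Debug, PartialEq)]
pub enum TimeoutReason {
    NotObserved,
    Pending { status: TxExecutionStatus },
    Error { debug_info: String },
}

/// Treats an "empty" `Ok` view as not observed rather than pending.
pub fn classify(result: &Result<TxStatusView, String>) -> TimeoutReason {
    match result {
        // No outcome and no status: nothing was actually seen on chain.
        Ok(view) if view.execution_outcome.is_none() && view.status == TxExecutionStatus::None => {
            TimeoutReason::NotObserved
        }
        Ok(view) => TimeoutReason::Pending { status: view.status },
        Err(err) => TimeoutReason::Error { debug_info: err.clone() },
    }
}
```

Without the guarded first arm, the empty view would fall through to `Pending`, which is the mismatch described above.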

Comment thread chain/jsonrpc/src/lib.rs Outdated
ProcessTxResponse::NoResponse => Self::TimeoutError,
ProcessTxResponse::NoResponse => Self::TimeoutError {
reason: TimeoutErrorReason::Error {
debug_info: "no response from the node".to_string(),

Contributor

Is this message intentionally different than "the node timed out fetching the transaction status" in transactions.rs? If yes, I don't quite understand the difference. If no, maybe add some constructor with a standardized message.
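
If the two messages are meant to be the same, a shared constructor could look something like the sketch below. The message text is the one quoted in this comment; `node_timeout`, `node_timeout_with`, and the single-variant `TimeoutReason` are hypothetical and exist only for illustration.

```rust
/// Cut-down stand-in for the timeout reason; only the `Error` case is shown.
#[derive(Debug, PartialEq)]
pub enum TimeoutReason {
    Error { debug_info: String },
}

impl TimeoutReason {
    /// Single source of truth for the generic node-timeout message.
    pub const NODE_TIMEOUT: &'static str = "the node timed out fetching the transaction status";

    /// Standardized reason for call sites with nothing more specific to say.
    pub fn node_timeout() -> Self {
        Self::Error { debug_info: Self::NODE_TIMEOUT.to_string() }
    }

    /// Same prefix, plus call-site detail when there is some.
    pub fn node_timeout_with(detail: &str) -> Self {
        Self::Error { debug_info: format!("{}: {}", Self::NODE_TIMEOUT, detail) }
    }
}
```

Call sites that genuinely differ can still pass their own detail, while the common prefix stays consistent.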

@jancionear jancionear left a comment

Contributor

I think the change makes sense, but it doesn't solve the biggest UX problems that we have. I remember there were two big ones:

  1. The RPC node sends the transaction to the chunk producer, which might reject the transaction based on its state (validity period, congestion, mempool full, etc), but it doesn't let the RPC node know that the transaction was rejected. In this case the user thinks that the transaction was successfully submitted and waits for the result, even though it was rejected early on. Ideally we would notify the user that the transaction was rejected and why.

  2. When the RPC node doesn't track the shard it responds with a TimeoutError, which makes no sense.

I think the RPC layer deserves some sort of holistic approach where we would go through every Err and NoResponse in the JSON-RPC handler and the view client and make sure that the error mapping is sane.

Not sure about the backwards compatibility as well. We need to figure out how to handle this in OpenAPI and OpenRPC.

Comment thread chain/jsonrpc/src/lib.rs
Err(RpcTransactionError::TimeoutError { reason })
}

/// Send a transaction idempotently (subsequent send of the same transaction will not cause

Contributor

Also refactors tx_status_fetch into focused helpers (poll_tx_status, detect_invalid_tx, tx_status_on_timeout).

nit: there's no detect_invalid_tx

Contributor

Probably renamed to validate_tx. But I think that detect_invalid_tx is actually a better name, as it sometimes is unable to determine if tx is valid or not.

@wacban

wacban commented Jun 24, 2026

Contributor Author

I think the change makes sense, but it doesn't solve the biggest UX problems that we have.

Sure, I wasn't aiming or claiming to solve the UX, I just wanted to improve the UI since the lack of context in the tx_status timeouts surfaces every now and then in user reports.

Not sure about the backwards compatibility as well. We need to figure out how to handle this in OpenAPI and OpenRPC.

Thanks, that's a good point, and Adam raised the same issue. Let me check with the DevEx team what our options are.

@jancionear

Contributor

lack of context in the tx_status timeouts surfaces every now and then in user reports.

The timeouts might also be coming from this code which converts any error inside of process_tx_internal into a ProcessTxResponse::NoResponse, which later gets converted to a timeout error (AFAIR)

#[must_use]
pub fn process_tx(
    &self,
    tx: SignedTransaction,
    is_forwarded: bool,
    check_only: bool,
) -> ProcessTxResponse {
    unwrap_or_return!(self.process_tx_internal(&tx, is_forwarded, check_only), {
        let signer = self.validator_signer.get();
        let me = signer.as_ref().map(|signer| signer.validator_id());
        tracing::debug!(target: "client", ?me, ?tx, "dropping tx");
        ProcessTxResponse::NoResponse
    })
}

Comment thread chain/jsonrpc-primitives/src/types/transactions.rs
@wacban wacban marked this pull request as draft June 26, 2026 12:07

5 participants