Current configuration (one URI per chain):
--source-uri wss://rococo-rpc.polkadot.io \
--target-uri wss://bridge-hub-westend-rpc.dwellir.com \
Proposed configuration (a list of URIs per chain):
--source-uri wss://rococo-rpc.polkadot.io
--source-uri wss://rococo-xyz1-rpc.polkadot.io
--source-uri wss://rococo-xyz2-rpc.polkadot.io
--target-uri wss://bridge-hub-westend-rpc.dwellir.com
--target-uri wss://bridge-hub-westend-xyz2-rpc.dwellir.com
--target-uri wss://bridge-hub-westend-xyz2-rpc.luckyfriday.com
Polkadot client has failed to return its sync status: FailedToGetSystemHealth { chain: "Polkadot", error: RpcError(RestartNeeded(Transport(connection closed
2024-07-15T08:05:06Z {} [Polkadot_to_BridgeHubKusama_Parachains_1002] 2024-07-15 08:05:06 +00 WARN bridge Failed to read best Polkadot block: ChannelError("Background task of BridgeHubKusama client has exited with result: Err(ChannelError(\"Mandatory best headers subscription for BridgeHubKusama has finished\"))")
2024-07-15T03:17:36Z {} [Polkadot_to_BridgeHubKusama_Parachains_1002] 2024-07-15 03:17:36 +00 WARN bridge Failed to read head of Polkadot parachain ParaId(1002) at BridgeHubKusama: FailedToReadStorageValue { chain: "BridgeHubKusama", hash: "0x181d…2a58", key: StorageKey([243, 240, 56, 234, 7, 239, 168, 105, 144, 9, 71, 27, 60, 48, 159, 184, 100, 28, 243, 91, 238, 116, 177, 147, 83, 37, 172, 214, 89, 235, 25, 203, 127, 32, 114, 84, 61, 57, 196, 82, 229, 51, 84, 40, 99, 135, 86, 81, 234, 3, 0, 0]), error: RpcError(RestartNeeded(Transport(connection closed
2024-07-15T00:47:50Z {} [Polkadot_to_BridgeHubKusama_Parachains_1002] 2024-07-15 00:47:50 +00 WARN bridge Polkadot client has failed to return its sync status: FailedToGetSystemHealth { chain: "Polkadot", error: RpcError(RestartNeeded(Transport(connection closed
2024-07-14T23:18:10Z {} 2024-07-14 23:18:10 +00 ERROR bridge [BridgeHubKusama-to-BridgeHubPolkadot-on-demand-parachain] Failed to read relay data from BridgeHubPolkadot client: ChannelError("Background task of BridgeHubPolkadot client has exited with result: Err(ChannelError(\"Finalized headers subscription for BridgeHubPolkadot has finished\"))")
2024-07-14T23:04:57Z {} [Polkadot_to_BridgeHubKusama_Sync] 2024-07-14 23:04:57 +00 INFO bridge Call of PolkadotFinalityApi_free_headers_interval at BridgeHubKusama has failed with an error: FailedStateCall { chain: "BridgeHubKusama", hash: "0x8551…5ec9", method: "PolkadotFinalityApi_free_headers_interval", arguments: Bytes([]), error: RpcError(Call(ErrorObject { code: ServerError(4003), message: "Client error: Execution failed: Other: Exported method PolkadotFinalityApi_free_headers_interval is not found", data: None })) }. Treating as `None`
2024-07-14T23:04:57Z {} [Polkadot_to_BridgeHubKusama_Sync] 2024-07-14 23:04:57 +00 ERROR bridge Finality sync loop iteration has failed with error: Target(FailedToGetSystemHealth { chain: "BridgeHubKusama", error: RpcError(RestartNeeded(Transport(connection closed
2024-07-14T23:04:57Z {} 2024-07-14 23:04:57 +00 ERROR bridge [Polkadot-to-BridgeHubKusama-on-demand-headers] Failed to read best finalized source header from target: FailedToGetSystemHealth { chain: "BridgeHubKusama", error: RpcError(RestartNeeded(Transport(connection closed
2024-07-14T20:27:45Z {} 2024-07-14 20:27:45 +00 WARN bridge [Polkadot-to-BridgeHubKusama-on-demand-headers] Failed to scan mandatory Polkadot headers range ((21644741, 21647633)): FailedToReadHeaderHashByNumber { chain: "Polkadot", number: "21647633", error: RpcError(RestartNeeded(Transport(i/o error: Connection reset by peer (os error 104)
2024-07-14T00:35:31Z {} [Kusama_to_BridgeHubPolkadot_Parachains_1002] 2024-07-14 00:35:31 +00 WARN bridge Kusama client has failed to return its sync status: FailedToGetSystemHealth { chain: "Kusama", error: RpcError(RestartNeeded(Transport(connection closed
2024-07-12T22:50:53Z {} [BridgeHubPolkadot_to_BridgeHubKusama_MessageLane_00000001] 2024-07-12 22:50:53 +00 ERROR bridge Error retrieving state from BridgeHubKusama node: FailedToGetSystemHealth { chain: "BridgeHubKusama", error: RpcError(RestartNeeded(Transport(connection closed
2024-07-12T22:50:53Z {} [BridgeHubKusama_to_BridgeHubPolkadot_MessageLane_00000001] 2024-07-12 22:50:53 +00 ERROR bridge Error retrieving state from BridgeHubPolkadot node: FailedToGetSystemHealth { chain: "BridgeHubKusama", error: RpcError(RestartNeeded(Transport(connection closed
2024-07-12T22:42:56Z {} [Polkadot_to_BridgeHubKusama_Parachains_1002] 2024-07-12 22:42:56 +00 WARN bridge Polkadot client has failed to return its sync status: FailedToGetSystemHealth { chain: "Polkadot", error: RpcError(RestartNeeded(Transport(connection closed
2024-07-12T21:03:49Z {} [Kusama_to_BridgeHubPolkadot_Parachains_1002] 2024-07-12 21:03:49 +00 WARN bridge Kusama client has failed to return its sync status: FailedToGetSystemHealth { chain: "Kusama", error: RpcError(RestartNeeded(Transport(connection closed
2024-07-12T20:37:38Z {} [Kusama_to_BridgeHubPolkadot_Parachains_1002] 2024-07-12 20:37:38 +00 WARN bridge Failed to read best Kusama block: ChannelError("Background task of BridgeHubPolkadot client has exited with result: Err(ChannelError(\"Mandatory best headers subscription for BridgeHubPolkadot has finished\"))")
2024-07-12T20:13:04Z {} [Polkadot_to_BridgeHubKusama_Parachains_1002] 2024-07-12 20:13:04 +00 WARN bridge Failed to read best Polkadot block: ChannelError("Background task of BridgeHubKusama client has exited with result: Err(ChannelError(\"Mandatory best headers subscription for BridgeHubKusama has finished\"))")
2024-07-12T19:58:39Z {} 2024-07-12 19:58:39 +00 ERROR bridge [Polkadot-to-BridgeHubKusama-on-demand-headers] Failed to read best finalized source header from source: ChannelError("Background task of Polkadot client has exited with result: Err(ChannelError(\"Mandatory best headers subscription for Polkadot has finished\"))")
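To see which endpoint drops connections most often, excerpts like the ones above can be tallied per chain. The following is only a minimal sketch: the function names `failing_chain` and `count_restart_needed` are hypothetical and not part of substrate-relay, and it assumes that the `chain: "..."` field of a `RestartNeeded` error names the node whose connection was closed.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: returns the value of the first `chain: "<name>"`
/// field found in a log line, if there is one.
pub fn failing_chain(line: &str) -> Option<&str> {
    let marker = "chain: \"";
    let start = line.find(marker)? + marker.len();
    let end = line[start..].find('"')? + start;
    Some(&line[start..end])
}

/// Hypothetical helper: counts the lines that mention `RestartNeeded`,
/// grouped by the chain named in the error. Lines without `RestartNeeded`
/// or without a `chain` field are skipped.
pub fn count_restart_needed<'a>(
    lines: impl IntoIterator<Item = &'a str>,
) -> BTreeMap<&'a str, usize> {
    let mut counts = BTreeMap::new();
    for line in lines {
        if !line.contains("RestartNeeded") {
            continue;
        }
        if let Some(chain) = failing_chain(line) {
            *counts.entry(chain).or_insert(0) += 1;
        }
    }
    counts
}
```

Feeding the log lines into `count_restart_needed` yields a map from chain name to the number of `RestartNeeded` errors, which may help decide which RPC endpoints most need a fallback.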
Investigate/check
RestartNeeded: does it stop the loop or restart it? Or is the only solution to restart substrate-relay?
Possible improvement 1:
Currently we connect to exactly one node URI per chain (a single --source-uri and a single --target-uri).
If the node is down or has some problem, we could configure a list of URIs (several --source-uri and --target-uri values), so that when RestartNeeded occurs, we rotate and try another URI.
So, if one node is overloaded, we just try another one.
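The rotation idea could look roughly like the sketch below. This is an illustration only, not the substrate-relay implementation: `UriRotation` and `connect_with_rotation` are hypothetical names, and the `connect` closure stands in for whatever actually opens the RPC connection and may fail (for example with `RestartNeeded`).

```rust
/// Hypothetical helper: cycles through a fixed, non-empty list of node URIs.
#[derive(Debug)]
pub struct UriRotation {
    uris: Vec<String>,
    current: usize,
}

impl UriRotation {
    /// Returns `None` for an empty list, since there is nothing to connect to.
    pub fn new(uris: Vec<String>) -> Option<Self> {
        if uris.is_empty() {
            None
        } else {
            Some(Self { uris, current: 0 })
        }
    }

    /// The URI that should be used right now.
    pub fn current(&self) -> &str {
        &self.uris[self.current]
    }

    /// Moves to the next URI, wrapping around at the end of the list.
    pub fn rotate(&mut self) -> &str {
        self.current = (self.current + 1) % self.uris.len();
        &self.uris[self.current]
    }
}

/// Tries `connect` on each configured URI at most once, starting with the
/// current one, and stops at the first success. If every URI fails, the
/// error from the last attempt is returned.
pub fn connect_with_rotation<C, E>(
    rotation: &mut UriRotation,
    mut connect: impl FnMut(&str) -> Result<C, E>,
) -> Result<C, E> {
    let attempts = rotation.uris.len();
    let mut result = connect(rotation.current());
    for _ in 1..attempts {
        if result.is_ok() {
            break;
        }
        result = connect(rotation.rotate());
    }
    result
}
```

With the three source URIs from the example, a failure on wss://rococo-rpc.polkadot.io would make the next attempt go to wss://rococo-xyz1-rpc.polkadot.io, and the rotation keeps pointing at whichever URI last succeeded. A real implementation would probably also want a delay or backoff between attempts, which this sketch leaves out.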
Possible improvement 2 - connect substrate-relay to some "load balancer"
This "load balancer" would route requests to a live, non-overloaded node, so we would not need to handle failover in our code.
Some logs from 2024-07-12 to 2024-07-15 (excerpts are listed above):
https://matrix.to/#/!FqmgUhjOliBGoncGwm:parity.io/$OjKXcX4aO9lkzM46fRLKXTMi-mf9vcpdJN_RDMgIn6o?via=parity.io