-
Notifications
You must be signed in to change notification settings - Fork 157
Decentralize parachain relayer #1265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 10 commits
a6bc10f
2fe3c30
d9f601d
fb3c4e0
929e101
a6b2529
261fd89
e401cb8
fce791d
2d4387c
b933ae8
09a9d11
ebb257b
7620602
0deea62
a3a7f02
6dfa134
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -1,13 +1,16 @@ | ||||||
| package parachain | ||||||
|
|
||||||
| import ( | ||||||
| "errors" | ||||||
| "fmt" | ||||||
|
|
||||||
| "github.com/snowfork/snowbridge/relayer/config" | ||||||
| ) | ||||||
|
|
||||||
| type Config struct { | ||||||
| Source SourceConfig `mapstructure:"source"` | ||||||
| Sink SinkConfig `mapstructure:"sink"` | ||||||
| Source SourceConfig `mapstructure:"source"` | ||||||
| Sink SinkConfig `mapstructure:"sink"` | ||||||
| Relay RelayerConfig `mapstructure:"relay"` | ||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Suggest more specific naming.
Suggested change
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||||||
| } | ||||||
|
|
||||||
| type SourceConfig struct { | ||||||
|
|
@@ -32,6 +35,18 @@ type SinkContractsConfig struct { | |||||
| Gateway string `mapstructure:"Gateway"` | ||||||
| } | ||||||
|
|
||||||
| type RelayerConfig struct { | ||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The naming should be more specific. "Relayer" is vague. Ie something like
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||||||
| ID uint64 `mapstructure:"id"` | ||||||
| Num uint64 `mapstructure:"num"` | ||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Comments needs for these fields please
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||||||
| } | ||||||
|
|
||||||
| func (r RelayerConfig) Validate() error { | ||||||
| if r.Num < 1 { | ||||||
| return errors.New("Number of relayer is not set") | ||||||
| } | ||||||
| return nil | ||||||
| } | ||||||
|
|
||||||
| type ChannelID [32]byte | ||||||
|
|
||||||
| func (c Config) Validate() error { | ||||||
|
|
@@ -66,5 +81,12 @@ func (c Config) Validate() error { | |||||
| if c.Sink.Contracts.Gateway == "" { | ||||||
| return fmt.Errorf("sink contracts setting [Gateway] is not set") | ||||||
| } | ||||||
|
|
||||||
| // Relay | ||||||
| err = c.Relay.Validate() | ||||||
| if err != nil { | ||||||
| return fmt.Errorf("relay config: %w", err) | ||||||
| } | ||||||
|
|
||||||
| return nil | ||||||
| } | ||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -24,5 +24,9 @@ | |
| "contracts": { | ||
| "Gateway": null | ||
| } | ||
| }, | ||
| "relay": { | ||
| "id": null, | ||
| "num": 2 | ||
| } | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some cleanup here remove the
subscribeNewMMRRoots.The reason is that when switching to a multiple relayers context with task scheduled in a round-robin manner. In case when a relayer is down,
subscribeNewMMRRootsdoes not make sense for avoid all relayers competing for the fail message at the same time.It seems a periodical scan with some randomness make more sense here, each round only one message is processed by a relayer, submit anyway when that message is timeout without anyone pick it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure i understand this - surely all relayers should still subscribe to getting new mmr roots, even if they're not actively going to send a message proof for a given message?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly for address the issue above. And I think subscribe is not required in any case, we do periodical scan for beacon also.
The pattern here is that in each round only one message will be processed by one relayer, then it will sleep for some time until next round, essentially give othere relayer chance to avoid race condition.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am open to this change, but I think we need more upfront design and analysis before writing more code. I'm worried that introducing polling & randomness could result in undefined behavior.
Specifically, I think the following would help:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://www.notion.so/snowfork/Decentralized-relayer-24511a7ec58b48f6911b9cf1307012af
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think instead of randomness we should have primary and secondary slots, or N many slots.
A relayer would calculate their slots as follows for a message nonce:
Primary slot:
nonce % numRelayers == relayerIdSecondary slot:
nonce % numRelyars - 1 == relayerIdTertiary Slot:
nonce % numRelayers - 2 == relayerId... etc, for as many slots as there are relayers
If a nonce lands on the primary slot for a relayer it relays immediately.
If a nonce lands on secondary slot it sleeps 3 minutes, checks if the nonce is relayed, and if not relays it.
If a nonce lands on the tertiary slot, it sleeps 6 minutes, checks if the nonce is relayed, and if not relays it.
... etc, for as many slots as there are relayers.
This is deterministic and every relayer covers every other relayers nonce at least once in a given timeframe. If there is a misbehaving relayer we can easily identify who is missing their slots.
The code can be quite simple as well:
Then you would always sleep before relaying the message for
slot*3minutes before relaying. Whenslot == 0the sleep would be 0 minutes i.e. relay immediately. And then before relaying the nonce we would add a check if the nonce has not been relayed yet, which will always be the case for slot 0.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! Though there is still concern in my mind does that mean the wait time will be linear correlation to the number of the relayers? So if we deployed 5 relayers, the worst case one relayer could wait for 15min, and 10 relayers for 30 mins. Is that waiting time too long?
Or is there any significant drawback for current impl(i.e. random polling + fixed waiting), IMHO it's not complicated and relayer also covers each other as tests demonstrated.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alistair-singh I think your idea is good, the concern above(e.g. waiting too long when there is a large number of relayers and most of them are down) is very rare and may not be a real problem.
So I add another PR in #1266 follow the above design, please review & check if that make sense.
IMO the current implementation(i.e. random polling + fixed waiting) is also not bad, it's very easy replicate to the execution relayer for the reverse direction, as there is only polling support there. With refactoring in this PR(i.e. remove subscribing) we share the same pattern for both directions.
I'll let team to make the final decision.