RFC-145 (former RFC-4) implementation #8641

Open
s0me0ne-unkn0wn wants to merge 166 commits into master from s0me0ne/rfc4
Conversation

@s0me0ne-unkn0wn
Contributor

@s0me0ne-unkn0wn s0me0ne-unkn0wn commented May 25, 2025

Overview

This PR implements RFC-0145: Remove the host-side runtime memory allocator

It uses @koute's picoalloc as the runtime allocator.

CI notes

This PR disables all the Chopsticks CI tests. As the host function provider for the tested runtime, Chopsticks cannot yet provide the new host functions introduced by the RFC. The tests can be re-enabled after this PR is merged and the Chopsticks codebase catches up.

try-runtime exposed the same problem, but it is somewhat easier to mitigate. This PR introduces a pipeline that builds the try-runtime binary in place instead of downloading binary artifacts, so the try-runtime tests are preserved. The old behavior, which is more resource-efficient, may be restored after this PR is merged and the changes are adopted in try-runtime.

Implementation notes

This PR adopts PPP#6 and PPP#7, as per the RFC.

The PPP#6 implementation implies that:

  1. Runtimes calling storage_root_v1 always use V0 trie layout
  2. Runtimes calling storage_root_v2 always provide the correct layout version

Thus, non-conforming runtimes may break. Since violating those rules means violating the protocol spec, we are okay with that. This was agreed with @bkchr.
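The two rules above can be read as a simple mapping from the host function called to the trie layout the host assumes. The following is a minimal sketch of that reading only; `StateVersion`, `StorageRootCall`, and `assumed_layout` are hypothetical names made up for illustration and are not taken from this PR.

```rust
/// Trie layout versions (hypothetical stand-in type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StateVersion {
    V0,
    V1,
}

/// Which `storage_root` host function the runtime called (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StorageRootCall {
    /// `storage_root_v1`: carries no layout information.
    V1,
    /// `storage_root_v2`: the runtime passes the layout version explicitly.
    V2(StateVersion),
}

/// Layout the host assumes under the two rules listed above.
pub fn assumed_layout(call: StorageRootCall) -> StateVersion {
    match call {
        // Rule 1: callers of v1 are assumed to use the V0 layout.
        StorageRootCall::V1 => StateVersion::V0,
        // Rule 2: callers of v2 are trusted to pass the correct version.
        StorageRootCall::V2(version) => version,
    }
}
```

A runtime that uses the V1 layout but still calls `storage_root_v1` would get a root computed for the wrong layout, which is the breakage described above.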

The storage_kill call is deprecated in favor of clear_prefix with the clear_prefix logic changing at the same time, according to PPP#7. Reviewers are encouraged to provide a thorough review of this part, as I am unfamiliar with it.

Adoption plan

TBD


@bkchr
Member

bkchr commented Jun 19, 2025

  • *_clear_prefix is using prototypes introduced by PPP#7. If these functions hit the limit, they are supposed to fill the output buffer with a cursor, which is a SCALE-encoded Option<last_key_seen>. Now the problem is that it's impossible to predict the size of the buffer the runtime needs to provide, as the storage key length is theoretically unbounded.

While we implemented these methods, they never got used. So, not sure. They provide some value and we could fix it by having some extra function for getting the cursor and we buffer the cursor internally on the node.
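To make the buffer-size problem concrete, here is a minimal hand-rolled sketch of the SCALE encoding of such a cursor. It assumes the key is encoded as a byte vector (compact length prefix followed by the raw bytes); `compact_len` and `encode_cursor` are hypothetical helper names, and the big-integer mode of the compact encoding is left out.

```rust
/// SCALE compact encoding of a length prefix.
/// Lengths of 2^30 and above need the big-integer mode, omitted in this sketch.
pub fn compact_len(len: u32) -> Vec<u8> {
    match len {
        // Single-byte mode: value in the upper six bits, mode bits 0b00.
        0..=0x3f => vec![(len as u8) << 2],
        // Two-byte mode: little-endian, mode bits 0b01.
        0x40..=0x3fff => (((len as u16) << 2) | 0b01).to_le_bytes().to_vec(),
        // Four-byte mode: little-endian, mode bits 0b10.
        0x4000..=0x3fff_ffff => ((len << 2) | 0b10).to_le_bytes().to_vec(),
        _ => panic!("big-integer mode is not covered by this sketch"),
    }
}

/// Encodes `Option<last_key_seen>`: 0x00 for `None`, or 0x01 followed by the
/// length-prefixed key bytes for `Some`.
pub fn encode_cursor(last_key_seen: Option<&[u8]>) -> Vec<u8> {
    match last_key_seen {
        None => vec![0x00],
        Some(key) => {
            let len = u32::try_from(key.len()).expect("key too long for this sketch");
            let mut out = vec![0x01];
            out.extend(compact_len(len));
            out.extend_from_slice(key);
            out
        }
    }
}
```

The encoded size is one byte plus the length prefix plus the key itself, so it grows with the key and has no fixed upper bound that the runtime could allocate for in advance.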

Comment thread substrate/primitives/runtime-interface/src/pass_by.rs Outdated
@s0me0ne-unkn0wn
Contributor Author

> The difference between v1 and v2 is that the state version is passed. This is actually not required, and thus v2 was deprecated.

Yeah, I just wasn't sure if I could just get rid of the version and that's it, was afraid to rip something useful. It's clear now, thank you!

> Not sure what you don't understand there? You have one function that returns the size of the input and the other one is for reading the data. The read function would take a mut ptr to where the input data is written.

That has already been clarified with @koute, and I think your proposal, which you made in the original RFC discussion, to have a single function instead of two, makes a lot of sense. Even more sense would be made by passing the data length as an argument (as well as passing the fat pointer right now) so the runtime knows at once how much to allocate before calling the "gimme input" function.
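A minimal sketch of the flow described above, assuming the host passes the input length to the entry point and exposes a single read function. `InputHost`, `input_read`, and `runtime_entry` are hypothetical names for illustration, not the signatures used in this PR.

```rust
/// Hypothetical host side: holds the call input until the runtime asks for it.
pub struct InputHost {
    input: Vec<u8>,
}

impl InputHost {
    pub fn new(input: Vec<u8>) -> Self {
        Self { input }
    }

    /// Length the host would pass to the entry point as an argument.
    pub fn input_len(&self) -> usize {
        self.input.len()
    }

    /// Stand-in for the single "read input" host function: copies the input
    /// into a buffer owned by the runtime.
    /// Panics if the buffer length differs from the input length.
    pub fn input_read(&self, dst: &mut [u8]) {
        dst.copy_from_slice(&self.input);
    }
}

/// Hypothetical runtime entry point: it already knows the input length,
/// so it allocates once and then asks the host to fill the buffer.
pub fn runtime_entry(host: &InputHost, input_len: usize) -> Vec<u8> {
    let mut buf = vec![0u8; input_len];
    host.input_read(&mut buf);
    buf
}
```

With the length known up front, the runtime needs exactly one allocation and one host call, and the host never allocates inside the runtime's memory.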

> While we implemented these methods, they never got used. So, not sure. They provide some value and we could fix it by having some extra function for getting the cursor and we buffer the cursor internally on the node.

I'm currently trying to integrate it into the implementation as much as possible (it's in #8866 but please don't dive there yet, it's far from being ready) cuz you and @tomaka did a great job sorting that out, so why would we lose that, especially given that we're breaking host function signatures anyway. That comes with some challenges because we don't really know how much buffer a storage key would require, as it's unbounded, and we basically cannot repeat the call if the buffer was not enough, because something has already been deleted after the call, but I believe we'll be able to find some common ground here when #8866 comes to the review point.

@bkchr
Member

bkchr commented Jun 20, 2025

> That comes with some challenges because we don't really know how much buffer a storage key would require, as it's unbounded, and we basically cannot repeat the call if the buffer was not enough, because something has already been deleted after the call, but I believe we'll be able to find some common ground here when #8866 comes to the review point.

What I mean above is that you create some new extra host function get_cursor and we cache the cursor of the latest clear_prefix* call and return it there.
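A minimal sketch of that idea, assuming an in-memory map in place of the real storage backend. `CursorHost` and the exact shapes of `clear_prefix` and `get_cursor` are hypothetical and only illustrate caching the cursor on the node side.

```rust
use std::collections::BTreeMap;

/// Hypothetical node-side state for this sketch.
#[derive(Default)]
pub struct CursorHost {
    storage: BTreeMap<Vec<u8>, Vec<u8>>,
    /// Cursor left by the latest `clear_prefix` call, if it hit the limit.
    last_cursor: Option<Vec<u8>>,
}

impl CursorHost {
    pub fn insert(&mut self, key: &[u8], value: &[u8]) {
        self.storage.insert(key.to_vec(), value.to_vec());
    }

    /// Removes up to `limit` keys starting with `prefix` and returns how many
    /// were removed. If matching keys remain, the last removed key is cached
    /// as the cursor; otherwise the cached cursor is cleared.
    pub fn clear_prefix(&mut self, prefix: &[u8], limit: usize) -> usize {
        let matching: Vec<Vec<u8>> = self
            .storage
            .keys()
            .filter(|k| k.starts_with(prefix))
            .cloned()
            .collect();
        let doomed = &matching[..matching.len().min(limit)];
        for key in doomed {
            self.storage.remove(key);
        }
        self.last_cursor = if matching.len() > limit {
            doomed.last().cloned()
        } else {
            None
        };
        doomed.len()
    }

    /// Copies as much of the cached cursor as fits into `out` and returns its
    /// full length, or `None` if no cursor is cached. Nothing is deleted here,
    /// so the call can be repeated with a larger buffer.
    pub fn get_cursor(&self, out: &mut [u8]) -> Option<usize> {
        let cursor = self.last_cursor.as_ref()?;
        let n = cursor.len().min(out.len());
        out[..n].copy_from_slice(&cursor[..n]);
        Some(cursor.len())
    }
}
```

Because reading the cursor has no side effects, the runtime can call `get_cursor` with a small buffer, learn the real length, and call it again, which sidesteps the "cannot repeat the call" problem quoted above.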

@s0me0ne-unkn0wn
Contributor Author

> What I mean above is that you create some new extra host function get_cursor and we cache the cursor of the latest clear_prefix* call and return it there.

A good idea, I'll give it a try, thank you!

Contributor

@koute koute left a comment

Yet another review pass.

#[cfg(not(substrate_runtime))]
#( #attrs )*
#maybe_allow_non_snake
pub fn #function_name( #( #args, )* ) #return_value {
Contributor

Are wrapped methods supposed to be internal? Should we make them non-pub?

Contributor Author

A question of preference, I suppose. There's nothing bad in giving developers access to the "raw" methods. I doubt it has much value, but at the same time, I can imagine someone writing their own wrapper implementing different logic / different initial limits / etc. I'd say it doesn't hurt to leave them public.
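As an illustration of the "own wrapper with different initial limits" case, here is a minimal sketch. It assumes a raw call that writes into a caller-provided buffer and returns the full value length; `raw_get` and `get_with_initial_capacity` are hypothetical and do not reflect the code the macro actually generates.

```rust
/// Hypothetical "raw" call: writes as much of the value as fits into `out`
/// and returns the full value length, or `None` if the key is absent.
/// A slice of pairs stands in for the real storage.
pub fn raw_get(store: &[(&[u8], &[u8])], key: &[u8], out: &mut [u8]) -> Option<usize> {
    let value = store.iter().find(|(k, _)| *k == key).map(|(_, v)| *v)?;
    let n = value.len().min(out.len());
    out[..n].copy_from_slice(&value[..n]);
    Some(value.len())
}

/// Custom wrapper with a caller-chosen initial buffer size: one call if the
/// guess was large enough, two calls otherwise.
pub fn get_with_initial_capacity(
    store: &[(&[u8], &[u8])],
    key: &[u8],
    initial: usize,
) -> Option<Vec<u8>> {
    let mut buf = vec![0u8; initial];
    let full_len = raw_get(store, key, &mut buf)?;
    if full_len > buf.len() {
        // The guess was too small: grow to the reported length and retry.
        buf.resize(full_len, 0);
        raw_get(store, key, &mut buf)?;
    }
    buf.truncate(full_len);
    Some(buf)
}
```

A caller that knows its values are usually small could pass a small `initial` and pay for a second call only in the rare large case, which is the kind of tuning that public raw methods would allow.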

Comment thread substrate/primitives/runtime-interface/proc-macro/src/utils.rs Outdated
Comment thread substrate/primitives/io/Cargo.toml Outdated
Comment thread substrate/primitives/io/src/lib.rs Outdated
// implementation, this is registered as version 4 instead.
#[version(4)]
#[wrapped]
fn clear_prefix(
Contributor

Related: the RFC says this:

> All cursors must be deemed invalid as soon as another storage-modifying function has been called. Different usage may result in remaining storage keys or undefined behaviour.

This is kinda... wishy-washy. Can we make it more explicit in the ERRATA'd version of the RFC? Ideally the RFC should say exactly what happens, and not "may result" or "undefined behavior".

Contributor Author

Sometimes we have to accept UB as a trade-off. I mean, what options do we have in this case? Do we really want to explicitly check/clear the last cursor on every storage operation just to introduce non-breakable determinism? Does it justify the introduced overhead?

Or maybe you have some ideas on how to formulate that, both to promote deterministic behavior and to avoid introducing overhead?
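For reference, the "clear the last cursor on every storage operation" option could look roughly like the sketch below. It is only an illustration of the trade-off under discussion; `StrictHost`, `StaleCursor`, and `resume` are hypothetical names, and an in-memory map stands in for real storage.

```rust
use std::collections::BTreeMap;

/// Error returned when a runtime tries to resume with an invalidated cursor.
#[derive(Debug, PartialEq, Eq)]
pub struct StaleCursor;

/// Hypothetical host that tracks the only cursor it considers valid.
#[derive(Default)]
pub struct StrictHost {
    storage: BTreeMap<Vec<u8>, Vec<u8>>,
    cursor: Option<Vec<u8>>,
}

impl StrictHost {
    /// Pretend a `clear_prefix` call hit its limit and left this cursor.
    pub fn set_cursor(&mut self, cursor: &[u8]) {
        self.cursor = Some(cursor.to_vec());
    }

    /// Every storage-modifying call drops the cursor first.
    /// This is the extra per-write work the comment above asks about.
    pub fn set(&mut self, key: &[u8], value: &[u8]) {
        self.cursor = None;
        self.storage.insert(key.to_vec(), value.to_vec());
    }

    /// Reads do not modify storage, so they leave the cursor alone.
    pub fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.storage.get(key).map(|v| v.as_slice())
    }

    /// Resuming with a stale cursor becomes a defined error instead of UB.
    pub fn resume(&self, cursor: &[u8]) -> Result<(), StaleCursor> {
        if self.cursor.as_deref() == Some(cursor) {
            Ok(())
        } else {
            Err(StaleCursor)
        }
    }
}
```

In this sketch the cost per write is a single assignment, but it also requires the host to keep the cursor state around, so whether that is acceptable in the real host is exactly the open question here.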

@s0me0ne-unkn0wn s0me0ne-unkn0wn requested a review from cheme as a code owner May 21, 2026 13:41
@paritytech-workflow-stopper

All GitHub workflows were cancelled due to the failure of one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/26446139255
Failed job name: fmt

@s0me0ne-unkn0wn
Contributor Author

/cmd update-ui

@github-actions
Contributor

Command "update-ui" has started 🚀 See logs here

@github-actions
Contributor

Command "update-ui" has finished ✅ See logs here

Labels

T5-host_functions This PR/Issue is related to host functions.

5 participants