Add bytesarray support to the c-api#8009
Conversation
📝 WalkthroughWalkthroughAdds CPython-compatible C-API bytearray bindings: extern FFI functions for construction, conversion, sizing, buffer access, and resize; makes VM resize public; adds FFI pointer conversion and exports the new module. ChangesC-API bytearray bindings
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
c32543e to
05cec26
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/capi/src/bytearrayobject.rs`:
- Around line 21-24: The code path in bytearrayobject.rs that handles
bytes.is_null() calls unsafe set_len which exposes uninitialized memory; change
it to allocate and zero-initialize the buffer instead (e.g., create a Vec<u8>
filled with zeros of length len or allocate then resize/fill with 0) so the
produced PyByteArray matches CPython's zero-filled semantics; update the branch
that currently creates "let mut data = Vec::with_capacity(len); unsafe {
data.set_len(len) }" to produce a zeroed Vec and then remove the
#[allow(clippy::uninit_vec)] since uninitialized vec usage will no longer be
present.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 7a1301fd-f873-44d2-a8d0-ee7b54ab6bb9
📒 Files selected for processing (4)
crates/capi/src/bytearrayobject.rscrates/capi/src/lib.rscrates/capi/src/util.rscrates/vm/src/builtins/bytearray.rs
| let mut data = Vec::with_capacity(len); | ||
| unsafe { data.set_len(len) }; |
There was a problem hiding this comment.
is this safe? to be safe, buffer must be written before read. this code doesn't seem to ensure that.
There was a problem hiding this comment.
This is unsafe indeed, but it is the responsibility of the caller to write the data before reading it.
See for example this snipped in PyO3.
https://github.com/PyO3/pyo3/blob/b2163a0916db9fe1baef3bcd3e224531dce508fe/src/types/bytearray.rs#L78-L97
pub fn new_with<F>(py: Python<'_>, len: usize, init: F) -> PyResult<Bound<'_, PyByteArray>>
where
F: FnOnce(&mut [u8]) -> PyResult<()>,
{
unsafe {
// Allocate buffer and check for an error
let pybytearray: Bound<'_, Self> =
ffi::PyByteArray_FromStringAndSize(core::ptr::null(), len as ffi::Py_ssize_t)
.assume_owned_or_err(py)?
.cast_into_unchecked();
let buffer: *mut u8 = ffi::PyByteArray_AsString(pybytearray.as_ptr()).cast();
debug_assert!(!buffer.is_null());
// Zero-initialise the uninitialised bytearray
core::ptr::write_bytes(buffer, 0u8, len);
// (Further) Initialise the bytearray in init
// If init returns an Err, pypybytearray will automatically deallocate the buffer
init(core::slice::from_raw_parts_mut(buffer, len)).map(|_| pybytearray)
}
}We could also initialise it with zero ourselves, but this writes the data 2 times, which is wasteful in my eyes.
There was a problem hiding this comment.
Please add a # Safety section to this function about this decision. The section needs to include:
- This is memory unsafe in certain conditions.
- And describe the conditions.
- Why we decide take this decision.
05cec26 to
210880b
Compare
210880b to
d95e102
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/capi/src/bytearrayobject.rs`:
- Around line 25-34: The C-API requires PyByteArray_AsString to return a buffer
with a trailing NUL at buf[len], but current implementation stores exactly len
bytes in PyBytesInner::elements and returns pointers into that Vec, breaking the
invariant (especially for len==0). Fix by ensuring the backing allocation always
has one extra sentinel byte: change PyBytesInner storage/management to allocate
elements with capacity/len+1 (store logical length separately if needed), update
PyByteArray_FromStringAndSize to allocate len+1 and set elements[len]=0, make
PyByteArray_AsString/borrow_buf_mut return pointer into that buffer
(guaranteeing ptr[len]==0), and update PyByteArray_Resize/resize to preserve or
reallocate the extra terminator (when shrinking keep
elements.len()==logical_len+1 and set terminator; when growing reserve len+1 and
set new terminator). Modify references to PyBytesInner::elements accordingly so
Python-visible length remains unchanged while the internal buffer includes the
extra NUL.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 500801e0-537b-4045-a294-b1d4580378be
📒 Files selected for processing (4)
crates/capi/src/bytearrayobject.rscrates/capi/src/lib.rscrates/capi/src/util.rscrates/vm/src/builtins/bytearray.rs
🚧 Files skipped from review as they are similar to previous changes (3)
- crates/capi/src/lib.rs
- crates/vm/src/builtins/bytearray.rs
- crates/capi/src/util.rs
Yet another AI assisted batch of c-api functions.
Summary by CodeRabbit
New Features
Chores