Skip to content

Add bytesarray support to the c-api#8009

Merged
youknowone merged 2 commits into
RustPython:mainfrom
bschoenmaeckers:c-api-bytesarray
Jun 3, 2026
Merged

Add bytesarray support to the c-api#8009
youknowone merged 2 commits into
RustPython:mainfrom
bschoenmaeckers:c-api-bytesarray

Conversation

@bschoenmaeckers
Copy link
Copy Markdown
Contributor

@bschoenmaeckers bschoenmaeckers commented Jun 1, 2026

Yet another AI assisted batch of c-api functions.

Summary by CodeRabbit

  • New Features

    • Added C-API support for bytearray: create from raw data or Python objects, query size, access mutable buffer, and resize dynamically.
  • Chores

    • Exposed bytearray C-API module and made internal adjustments to enable safe resizing and pointer interoperability.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

Adds CPython-compatible C-API bytearray bindings: extern FFI functions for construction, conversion, sizing, buffer access, and resize; makes VM resize public; adds FFI pointer conversion and exports the new module.

Changes

C-API bytearray bindings

Layer / File(s) Summary
Public resize method and FFI pointer trait
crates/vm/src/builtins/bytearray.rs, crates/capi/src/util.rs
PyByteArray::resize is made public for C-API use, and FfiResult<*mut c_char> is implemented for *mut u8 to enable mutable pointer returns in FFI.
C-API bytearray bindings and tests
crates/capi/src/bytearrayobject.rs
Extern "C" functions implement PyByteArray_FromStringAndSize (handles NULL and negative len, allocates/copies buffer), PyByteArray_FromObject (converts object to bytes), PyByteArray_Size, PyByteArray_AsString (returns mutable buffer pointer), and PyByteArray_Resize (delegates to VM). A disabled test module is included.
Module export
crates/capi/src/lib.rs
Adds pub mod bytearrayobject; to expose the new C-API module.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • youknowone
  • ShaharNaveh

🐰 I hopped through bytes with glee,
Rust cords stitched to C,
Buffers stretch and pointers flee,
Bytearrays dance for free,
A tiny rabbit's FFI spree.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Add bytesarray support to the c-api' clearly and directly summarizes the main change: adding C-API functions for bytearray operations.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/capi/src/bytearrayobject.rs`:
- Around line 21-24: The code path in bytearrayobject.rs that handles
bytes.is_null() calls unsafe set_len which exposes uninitialized memory; change
it to allocate and zero-initialize the buffer instead (e.g., create a Vec<u8>
filled with zeros of length len or allocate then resize/fill with 0) so the
produced PyByteArray matches CPython's zero-filled semantics; update the branch
that currently creates "let mut data = Vec::with_capacity(len); unsafe {
data.set_len(len) }" to produce a zeroed Vec and then remove the
#[allow(clippy::uninit_vec)] since uninitialized vec usage will no longer be
present.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 7a1301fd-f873-44d2-a8d0-ee7b54ab6bb9

📥 Commits

Reviewing files that changed from the base of the PR and between 885cf5c and 05cec26.

📒 Files selected for processing (4)
  • crates/capi/src/bytearrayobject.rs
  • crates/capi/src/lib.rs
  • crates/capi/src/util.rs
  • crates/vm/src/builtins/bytearray.rs

Comment thread crates/capi/src/bytearrayobject.rs
Comment on lines +22 to +23
let mut data = Vec::with_capacity(len);
unsafe { data.set_len(len) };
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this safe? to be safe, buffer must be written before read. this code doesn't seem to ensure that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unsafe indeed, but it is the responsibility of the caller to write the data before reading it.

See for example this snipped in PyO3.
https://github.com/PyO3/pyo3/blob/b2163a0916db9fe1baef3bcd3e224531dce508fe/src/types/bytearray.rs#L78-L97

    pub fn new_with<F>(py: Python<'_>, len: usize, init: F) -> PyResult<Bound<'_, PyByteArray>>
    where
        F: FnOnce(&mut [u8]) -> PyResult<()>,
    {
        unsafe {
            // Allocate buffer and check for an error
            let pybytearray: Bound<'_, Self> =
                ffi::PyByteArray_FromStringAndSize(core::ptr::null(), len as ffi::Py_ssize_t)
                    .assume_owned_or_err(py)?
                    .cast_into_unchecked();

            let buffer: *mut u8 = ffi::PyByteArray_AsString(pybytearray.as_ptr()).cast();
            debug_assert!(!buffer.is_null());
            // Zero-initialise the uninitialised bytearray
            core::ptr::write_bytes(buffer, 0u8, len);
            // (Further) Initialise the bytearray in init
            // If init returns an Err, pypybytearray will automatically deallocate the buffer
            init(core::slice::from_raw_parts_mut(buffer, len)).map(|_| pybytearray)
        }
    }

We could also initialise it with zero ourselves, but this writes the data 2 times, which is wasteful in my eyes.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a # Safety section to this function about this decision. The section needs to include:

  • This is memory unsafe in certain conditions.
    • And describe the conditions.
  • Why we decide take this decision.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/capi/src/bytearrayobject.rs`:
- Around line 25-34: The C-API requires PyByteArray_AsString to return a buffer
with a trailing NUL at buf[len], but current implementation stores exactly len
bytes in PyBytesInner::elements and returns pointers into that Vec, breaking the
invariant (especially for len==0). Fix by ensuring the backing allocation always
has one extra sentinel byte: change PyBytesInner storage/management to allocate
elements with capacity/len+1 (store logical length separately if needed), update
PyByteArray_FromStringAndSize to allocate len+1 and set elements[len]=0, make
PyByteArray_AsString/borrow_buf_mut return pointer into that buffer
(guaranteeing ptr[len]==0), and update PyByteArray_Resize/resize to preserve or
reallocate the extra terminator (when shrinking keep
elements.len()==logical_len+1 and set terminator; when growing reserve len+1 and
set new terminator). Modify references to PyBytesInner::elements accordingly so
Python-visible length remains unchanged while the internal buffer includes the
extra NUL.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 500801e0-537b-4045-a294-b1d4580378be

📥 Commits

Reviewing files that changed from the base of the PR and between 05cec26 and d95e102.

📒 Files selected for processing (4)
  • crates/capi/src/bytearrayobject.rs
  • crates/capi/src/lib.rs
  • crates/capi/src/util.rs
  • crates/vm/src/builtins/bytearray.rs
🚧 Files skipped from review as they are similar to previous changes (3)
  • crates/capi/src/lib.rs
  • crates/vm/src/builtins/bytearray.rs
  • crates/capi/src/util.rs

Comment thread crates/capi/src/bytearrayobject.rs
Copy link
Copy Markdown
Member

@youknowone youknowone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@youknowone youknowone merged commit 37fd1dd into RustPython:main Jun 3, 2026
26 checks passed
@bschoenmaeckers bschoenmaeckers deleted the c-api-bytesarray branch June 3, 2026 09:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants