Support resize on PyBytes when initialised via with_new #4003

@AudriusButkevicius

I'm building a Rust library that I expose to Python via pyo3. The library generates a large amount of data in Rust and hands it over to Python (100 GB, for example).

One problem is that I don't know exactly how much data there will be until it has been generated (it comes from decompression), so I cannot allocate a correctly sized PyBytes object ahead of time. Instead I have to decompress the data into a Rust-allocated buffer, check its size, allocate a PyBytes of the right size, and copy the data over. Given the amount of data, the copy from Rust storage to PyBytes becomes a large part of the overall cost of the API.

There also seems to be no CPython C API for creating a bytes object from an existing block of memory without copying the data. This is because the object header has to sit immediately before the memory that holds the object's content.

However, the C API does provide _PyBytes_Resize.
There are constraints on when it can and cannot be used, but I think a call made from within (or shortly after) the PyBytes::new_with closure passes all the checks.

My proposal is to either:

  1. Allow the PyBytes::new_with init closure to return the "used" size, which must be less than or equal to the initially requested allocation. If it is less, _PyBytes_Resize is called after the closure returns.
  2. Implement a new constructor, equivalent to new_with, that hands the closure a Vec-like interface which allows both growing and shrinking the PyBytes object from within the closure.
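
A rough sketch of the semantics of option 1, in plain Rust with no pyo3 involved. Everything here is a stand-in: a Vec<u8> plays the role of the PyBytes allocation, truncate plays the role of the C API resize call, and the names new_with_used_len, rle_decode and demo are hypothetical and exist only for illustration. It is not pyo3's API, only a model of the behaviour being proposed.

```rust
/// Hypothetical model of option 1. The closure fills a buffer of `capacity`
/// bytes and reports how many bytes it actually used; the buffer is then
/// shrunk to that length. In a real implementation the buffer would be the
/// bytes object's storage and the shrink would be a resize through the C API.
fn new_with_used_len<E>(
    capacity: usize,
    init: impl FnOnce(&mut [u8]) -> Result<usize, E>,
) -> Result<Vec<u8>, E> {
    // Zero-filled buffer of the worst-case size.
    let mut buf = vec![0u8; capacity];
    let used = init(&mut buf)?;
    // The closure may only shrink the allocation, never grow it.
    assert!(used <= capacity, "closure reported more bytes than were allocated");
    buf.truncate(used);
    Ok(buf)
}

/// Toy "decompressor" standing in for the real one: expands run-length
/// pairs of (count, byte) into `out` and returns the number of bytes written.
fn rle_decode(input: &[u8], out: &mut [u8]) -> Result<usize, String> {
    let mut written = 0;
    for pair in input.chunks(2) {
        let (count, byte) = match pair {
            [c, b] => (*c as usize, *b),
            _ => return Err("truncated input".to_string()),
        };
        let end = written + count;
        if end > out.len() {
            return Err("output buffer too small".to_string());
        }
        out[written..end].fill(byte);
        written = end;
    }
    Ok(written)
}

/// Usage: allocate for the worst case (16 bytes), decompress straight into
/// the buffer, and end up with an object of exactly the decompressed length.
fn demo() -> Vec<u8> {
    let compressed = [3u8, b'a', 2, b'b'];
    new_with_used_len(16, |out| rle_decode(&compressed, out)).unwrap()
}
```

The point of the shape is that the decompressor writes directly into the final storage, so the only extra work after the closure returns is the shrink, not a second copy of the payload.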

Currently I'm working around this by allocating a PyObject myself and juggling the object pointer's lifetime by hand. It seems to work, but I'm not sure I understand who owns the object at what time, or how to free it properly if ownership never gets handed over to the interpreter. It is a lot of unsafe code that I don't feel comfortable with.

I also cannot use bytearray, because I genuinely do not want the data to be modifiable from Python.

I'd be happy to implement option 1. However, I feel I might be too new to Rust to implement option 2 (I could do it with some hand-holding, I guess).
