First of all, thanks a lot for providing this project! It makes it so much easier to work with UTF-8 data.
I'm aware that this might be out of scope for this project, so I figured I'd just ask what you all think. When porting my code from std::string to tiny_utf8::string, I ran into several problems caused by the mismatch between size() and length().
E.g., my code uses templates with std::size(...) to work on arbitrary data types. This doesn't work with tiny_utf8::string, though, since size() returns the raw byte size while operator[] expects a codepoint index (see the sketch below). It would be nice if tiny_utf8 were consistent with the other STL containers here.
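To illustrate, here's roughly the pattern that breaks (a minimal sketch; the string literal, helper name, and include path are just for the example):

```cpp
#include <cstddef>
#include <iterator>
#include <tinyutf8/tinyutf8.h>  // adjust the include path to your setup

// Generic helper that returns the last element of any container.
// For std::string this is fine; for tiny_utf8::string, std::size()
// yields the byte count, which is then misused as a codepoint index.
template <typename Container>
auto last_element(const Container& c) {
    return c[std::size(c) - 1];  // byte count used as codepoint index
}

int main() {
    tiny_utf8::string s{ "h\xc3\xa9llo" };  // "héllo": 5 codepoints, 6 bytes
    auto c = last_element(s);  // indexes codepoint 5, one past the last valid index 4
    (void)c;
}
```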
Various other functions also made use of both size() and length(). Yes, this can be fixed on my side, but (1) it's difficult to get that right in a large code base, and (2) the library then no longer works as the "quick drop-in replacement" advertised in the README.
So, what I'm considering is adding a new raw_size() (similar to raw_at, the raw iterators, ...) that returns the byte size, and changing the default behavior of size() to match length(). This is obviously not a backwards-compatible change, but (1) there have been other non-backwards-compatible changes before, and (2) there could still be a define to switch between the two behaviors, roughly as sketched below.
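Concretely, I'm imagining something along these lines (just a sketch of the idea; raw_size() and the macro name are suggestions, not existing tiny_utf8 API):

```cpp
#include <cstddef>

// Sketch of the proposed split between byte count and codepoint count.
class utf8_string_sketch {
    std::size_t byte_count_ = 0;       // size of the raw UTF-8 buffer
    std::size_t codepoint_count_ = 0;  // number of codepoints

public:
    std::size_t raw_size() const noexcept { return byte_count_; }      // new: byte size
    std::size_t length()  const noexcept { return codepoint_count_; }  // unchanged

    // size() would default to the codepoint count for STL consistency,
    // with a compile-time switch to restore the old byte-count behavior.
    std::size_t size() const noexcept {
    #ifdef TINY_UTF8_SIZE_IS_BYTES   // hypothetical opt-out define
        return raw_size();
    #else
        return length();
    #endif
    }
};
```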
What do you think? If it's out of scope, I'll come up with a different solution. :)