
Fix drop_start on JavaScript for multi-byte strings #925

Open
jtdowney wants to merge 2 commits into gleam-lang:main from jtdowney:fix-drop-start-js

@jtdowney (Member)

string_byte_slice used String.prototype.slice, which operates on UTF-16 code units, but it is called with UTF-8 byte offsets from byte_size. Encode to UTF-8, slice the byte array, then decode back.
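The byte-slicing approach described above can be sketched as follows. This is a minimal illustration, not the stdlib's actual code: `byteSize` and `stringByteSlice` are hypothetical names, and `start`/`length` are assumed to be UTF-8 byte offsets of the kind a byte_size-style helper would produce. It also shows why the original `String.prototype.slice` call misbehaves once a multi-byte character appears.

```javascript
// Hypothetical helpers illustrating "encode to UTF-8, slice bytes, decode".
const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Number of bytes in the UTF-8 encoding of `string`.
function byteSize(string) {
  return encoder.encode(string).length;
}

// Slice `string` by UTF-8 byte offsets rather than UTF-16 code units.
// Note: if `start` lands inside a multi-byte sequence, the default
// (non-fatal) TextDecoder substitutes U+FFFD for the broken bytes.
function stringByteSlice(string, start, length) {
  const bytes = encoder.encode(string);
  return decoder.decode(bytes.slice(start, start + length));
}

// "\u00e9" (é) is 2 UTF-8 bytes but only 1 UTF-16 code unit,
// so a byte offset of 2 points just past it in UTF-8 but past
// the whole 2-code-unit string in UTF-16.
const s = "\u00e9a";
console.log(s.slice(2)); // "" : byte offset misread as a code-unit offset
console.log(stringByteSlice(s, 2, byteSize(s) - 2)); // "a"
```

Each call to `encoder.encode` allocates a fresh byte array, which is the double-encoding cost raised below when drop_start calls both helpers on the same string.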

This is a simple fix, but there is a downside: the string param to drop_start will be encoded to UTF-8 twice, once for byte_size and again for string_byte_slice. A better path might be to lift drop_start to a native implementation in JS.

Closes #924

@giacomocavalieri (Member)

Mhm, encoding it twice seems a bit wasteful. Maybe it's fine for this to have a native JS implementation?

@lpil (Member)

lpil commented Jun 3, 2026

Yes I think so. Normally I would want to avoid that, but performance is key in the standard library, so we can do things that would otherwise be unappealing.

@jtdowney (Member, Author)

jtdowney commented Jun 3, 2026

I'll take a look at a JS implementation for drop_start.

The JavaScript target reused unsafe_byte_slice with Erlang byte
offsets, but JS strings are indexed by UTF-16 code units, so
drop_start returned wrong results for multi-byte input. Add a
target-specific implementation that slices off the grapheme
prefix using code-unit lengths.
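The commit's approach (find the grapheme prefix, then slice by UTF-16 code units) can be sketched roughly as below. This is a hedged illustration, not the code from the PR: `dropStart` is a hypothetical name, and it assumes the runtime provides `Intl.Segmenter` for grapheme segmentation, whereas the stdlib may use its own grapheme iterator. Because each segment's `index` is already a code-unit offset, the string is sliced once and never encoded to UTF-8.

```javascript
// Hypothetical native drop_start sketch; assumes Intl.Segmenter exists.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Drop the first `numGraphemes` grapheme clusters from `string`.
function dropStart(string, numGraphemes) {
  if (numGraphemes <= 0) return string;
  let seen = 0;
  for (const { index } of segmenter.segment(string)) {
    // `index` is the UTF-16 code-unit offset where this grapheme starts,
    // which is exactly the unit String.prototype.slice expects.
    if (seen === numGraphemes) return string.slice(index);
    seen += 1;
  }
  // Fewer graphemes than requested: everything is dropped.
  return "";
}

console.log(dropStart("The Lamb", 2)); // "e Lamb"
console.log(dropStart("e\u0301x", 1)); // "x" (e + combining acute is one grapheme)
```

Slicing on grapheme boundaries also means a combining sequence is never split, which a pure byte or code-unit offset cannot guarantee on its own.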
jtdowney force-pushed the fix-drop-start-js branch from fd17505 to fd6f0da on June 3, 2026 17:25

@jtdowney (Member, Author)

jtdowney commented Jun 3, 2026

@lpil @giacomocavalieri I've pushed a new version that changes string.drop_start to a native implementation.

Review comment thread on src/gleam_stdlib.mjs (outdated)
Development

Successfully merging this pull request may close these issues.

string.drop_start behaves wrong on the JS target

3 participants