refactor: read alias files with shell builtins instead of sed/awk/head/tail pipeline #3787

Open
jakelodwick wants to merge 3 commits into nvm-sh:master from jakelodwick:perf/alias-resolve-builtins

@jakelodwick jakelodwick commented Feb 17, 2026

Responding to @ljharb's note in #1261 ("If you can think of ways to speed up alias resolution, I'm all for it"): this simplifies how nvm_alias() and nvm_resolve_alias() read alias files.

nvm_alias() currently pipes through sed | awk to strip comments and blank lines. Alias files almost always contain a single line with one version string, so the pipeline is more machinery than the job needs. A while IFS= read -r loop with inline filtering does the same work more directly. Trailing whitespace is then stripped in one pass via parameter expansion.
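
For illustration, the read loop described above might look roughly like the sketch below. It is not the code from the diff: the function name `filter_alias_file` is hypothetical, and it assumes a comment runs from the first `#` to the end of the line.

```shell
#!/bin/sh
# Hypothetical helper (not the PR's nvm_alias): print every meaningful line
# of an alias file, with comments and trailing whitespace removed.
filter_alias_file() {
  while IFS= read -r line || [ -n "${line}" ]; do
    # Drop everything from the first '#' onward.
    line="${line%%#*}"
    # One-pass trailing-whitespace strip: the inner expansion yields the
    # trailing run of whitespace, the outer one removes it as a suffix.
    line="${line%"${line##*[![:space:]]}"}"
    # Skip lines that are empty after filtering.
    [ -n "${line}" ] || continue
    printf '%s\n' "${line}"
  done < "$1"
}
```

Everything in the loop is a shell builtin, so no subprocess is started for each lookup.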

nvm_resolve_alias() wraps that in head -n N | tail -n 1 to extract a single line. Since we only ever need the first non-empty line, parameter expansion (${var%%newline*}) does the same thing without the pipeline. Cycle detection uses a case pattern anchored on literal newlines in SEEN_ALIASES, replacing the prior printf '%b' | nvm_grep -q "^name$" pipeline. Newline anchoring also handles alias names containing spaces; a token-based pattern would false-positive on nvm alias 'foo bar' midway chained to nvm alias midway bar.
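
As a small sketch of the first-line extraction mentioned above, assuming the filtered alias content is already held in a variable (the variable names and sample values here are illustrative, not taken from the diff):

```shell
#!/bin/sh
# A literal newline kept in a variable. It is assigned directly because
# command substitution would strip a trailing newline.
NL='
'
# Hypothetical multi-line value standing in for filtered alias output.
content="lts/iron${NL}v20.0.0${NL}v18.0.0"

# Remove the longest suffix that begins at the first newline,
# which leaves only the first line.
first_line="${content%%"${NL}"*}"
printf '%s\n' "${first_line}"
```

A value with no newline at all is left unchanged, so the single-line case needs no special handling.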

All replacements are POSIX (read -r, case, IFS=, parameter expansion). local declarations are separated from assignments for ksh compatibility.

As a side effect, this removes several subprocess forks per alias lookup — 5 for a single-hop alias, up to 12 for a two-hop with cycle check.

Edge-case tests cover: empty alias file, comment-only file, trailing whitespace, 4-deep chain, nonexistent target. Cycle tests cover: self-loop, multi-hop loop, cycle through a space-bearing alias name, and a non-cycle through a space-bearing alias name. Existing test/fast/Aliases/circular/ fixtures continue to pass.

[Tests] `nvm_alias`, `nvm_resolve_alias`: add edge-case tests

nvm_alias() used a sed/awk pipeline to strip comments and blank lines from alias files that almost always contain a single word.
A while-read loop with parameter expansion does the same filtering more directly.

nvm_resolve_alias() piped nvm_alias through head and tail to extract one line, and used printf/grep for cycle detection.
Parameter expansion and a case statement replace both without the extra plumbing.

All replacements are POSIX (read -r, case, IFS=, parameter expansion).
As a side effect, this also removes 4 external process invocations during shell init.
@ljharb
Member

ljharb commented Feb 17, 2026

> Alias files almost always contain a single line with one version string

however, they could contain anything, and we must be robust against that. (i haven't reviewed yet, to be clear)

Member

@ljharb ljharb left a comment

It would be good to add more test cases for edge cases: binary data, very long lines, embedded NUL characters, no trailing newline, other whitespace characters that [[:space:]] would catch (carriage return \r, form feed, etc.), and an alias name that has spaces (e.g. nvm alias 'foo bar' node).

Comment thread nvm.sh Outdated
while IFS= read -r NVM_ALIAS_LINE || [ -n "${NVM_ALIAS_LINE}" ]; do
NVM_ALIAS_LINE="${NVM_ALIAS_LINE%%#*}"
case "${NVM_ALIAS_LINE}" in
*[!\ \ ]*) ;;
Member

is there another way to represent the literal tab here, that's less likely to be accidentally converted to spaces by some random tool?

Author

Yes — [[:space:]] POSIX class. Can't be mangled by editors or formatters, and it's already used in nvm.sh (lines 540, 569, 570). The trailing-strip uses %[[:space:]] so the removal is explicit too.

[Tests] `nvm_alias`: add edge-case tests for hostile file content
@jakelodwick
Author

Added seven test files: binary data after version, CRLF + bare CR, embedded NUL, form feed + vertical tab, no trailing newline, 10k-char long line, alias name with spaces. All passing in sh/bash/zsh/dash.
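
To illustrate why the "no trailing newline" case matters for a read loop, here is a minimal sketch comparing a plain loop with one that uses the `|| [ -n ... ]` guard seen in the diff. The helper names `count_plain` and `count_guarded` are made up for this example.

```shell
#!/bin/sh
# Count the lines a plain read loop processes.
count_plain() {
  n=0
  while IFS= read -r line; do
    n=$((n + 1))
  done < "$1"
  printf '%s\n' "${n}"
}

# Same loop, but a final line without a newline is still processed:
# read fails at EOF yet leaves the partial line in the variable.
count_guarded() {
  n=0
  while IFS= read -r line || [ -n "${line}" ]; do
    n=$((n + 1))
  done < "$1"
  printf '%s\n' "${n}"
}

tmp="$(mktemp)"
printf 'v1.2.3' > "${tmp}"   # one line, no trailing newline
count_plain "${tmp}"
count_guarded "${tmp}"
rm -f "${tmp}"
```

The plain loop reports 0 lines for this file and the guarded loop reports 1, so without the guard an alias file saved without a final newline would appear empty.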

@jakelodwick
Author

Agreed. The new edge-case tests exercise exactly that.

skdas20 added a commit to skdas20/nvm that referenced this pull request Feb 22, 2026
@ljharb ljharb force-pushed the perf/alias-resolve-builtins branch from 2fd18bc to d62de0e Compare March 24, 2026 23:21
ljharb pushed a commit to skdas20/nvm that referenced this pull request Mar 24, 2026
Member

@ljharb ljharb left a comment

Nice direction overall - the nvm_alias rewrite is clean and the edge-case tests are welcome. However there's a critical bug in nvm_resolve_alias: changing SEEN_ALIASES from newline-delimited to space-delimited broke cycle detection (see inline). The PR description says the cycle check was replaced with a case pattern match, but the diff shows it wasn't. I verified by checking out this branch and running nvm_resolve_alias loopback against the existing test/fast/Aliases/circular/ fixtures - it hangs. The claim of "Tested in bash, zsh, dash, ksh, and sh" is inconsistent with this; running urchin test/fast/Aliases/circular/ in any shell would catch it.

Needed before merge: the case-based cycle check the description promised, plus making sure the existing test/fast/Aliases/circular/ fixtures still pass.

A couple smaller notes inline.

Comment thread nvm.sh Outdated
fi

-SEEN_ALIASES="${SEEN_ALIASES}\\n${ALIAS_TEMP}"
+SEEN_ALIASES="${SEEN_ALIASES}${ALIAS_TEMP} "
Member

Critical: cycle detection is broken by this format change.

SEEN_ALIASES is now a single space-separated string (e.g. " one two three "), but the cycle check at line 1386 above (not shown in this hunk) is unchanged:

if command printf '%b' "${SEEN_ALIASES}" | nvm_grep -q -e "^${ALIAS_TEMP}$"; then

printf '%b' " one two three " emits a single line; grep -q "^${ALIAS_TEMP}$" will never match it. Result: any cycle becomes an infinite loop.

Reproduces against the existing test/fast/Aliases/circular/ fixtures — hangs.

Suggested fix (matches what the PR description claims was done):

case "${SEEN_ALIASES}" in
  *" ${ALIAS_TEMP} "*) ALIAS=""; break ;;
esac

The surrounding spaces in SEEN_ALIASES are already in place to make this work — this last step just needs to actually be made.
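
The mismatch can be reproduced in isolation with a sketch like the following. It uses plain `grep` as a stand-in for `nvm_grep` (an assumption made to keep the example self-contained), and the alias names are illustrative.

```shell
#!/bin/sh
name='two'
# Old storage shape: "\n"-separated, expanded to real newlines by printf '%b'.
old_seen='\none\ntwo\nthree'
# Storage shape after the format change: a single space-separated line.
new_seen=' one two three '

if printf '%b' "${old_seen}" | grep -q -e "^${name}$"; then
  echo "newline-delimited: cycle detected"
fi

if ! printf '%b' "${new_seen}" | grep -q -e "^${name}$"; then
  echo "space-delimited: no match, so the loop would never break"
fi
```

Both messages are printed: the line-anchored pattern only matches when each alias name sits on its own line.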

Comment thread nvm.sh Outdated
esac
while : ; do
case "${NVM_ALIAS_LINE}" in
*[[:space:]]) NVM_ALIAS_LINE="${NVM_ALIAS_LINE%[[:space:]]}" ;;
Member

Stripping one whitespace char per iteration is O(n²) for trailing whitespace. Not a real concern for alias files in practice, but a single-pass equivalent (e.g. NVM_ALIAS_LINE="${NVM_ALIAS_LINE%"${NVM_ALIAS_LINE##*[![:space:]]}"}") avoids the loop entirely.
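
As a rough side-by-side, the per-character loop and the single-pass expansion could be compared like this. The function names `strip_loop` and `strip_once` are hypothetical, and the sample value is invented for the example.

```shell
#!/bin/sh
# Per-character approach: remove one trailing whitespace character per pass.
strip_loop() {
  s="$1"
  while :; do
    case "${s}" in
      *[[:space:]]) s="${s%[[:space:]]}" ;;
      *) break ;;
    esac
  done
  printf '%s' "${s}"
}

# Single-pass approach: compute the trailing whitespace run once,
# then remove it as a literal suffix.
strip_once() {
  s="$1"
  trailing="${s##*[![:space:]]}"
  printf '%s' "${s%"${trailing}"}"
}

# Sample value ending in space, tab, space, carriage return.
value="$(printf 'v1.2.3 \t \r')"
printf '[%s]\n' "$(strip_loop "${value}")"
printf '[%s]\n' "$(strip_once "${value}")"
```

When the value contains no non-whitespace character, the inner expansion leaves it unchanged, so the outer expansion removes the whole string and the result is empty.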

echo 'hop4' > ../../../alias/hop3
echo '0.0.99' > ../../../alias/hop4

ACTUAL="$(nvm_resolve_alias hop1)"
Member

Worth adding a sibling test for cycles (1-hop self-reference and a multi-hop loop). The existing test/fast/Aliases/circular/ fixtures already cover this - making sure they keep passing would have flagged the cycle-detection regression on the nvm.sh side.

Comment thread nvm.sh Outdated
Member

(here)

…[Refactor] `nvm_alias`: one-pass trailing whitespace strip

The earlier commit on this branch changed SEEN_ALIASES from `\n`-delimited
storage (interpreted by `printf '%b' | nvm_grep -e "^${name}$"`) to space-
delimited but left the line-anchored grep in place — without newlines in
the haystack the anchored pattern can never match, so cycles never break.

Switch to literal-newline storage and a `case` pattern anchored on those
newlines. Newline anchoring also handles alias names containing spaces,
which token-based patterns false-positive on (e.g. lookup of `bar` matches
substring " bar " inside " foo bar midway " when the chain visits the
multi-token alias `foo bar`).

Replace the per-character trailing-whitespace loop in `nvm_alias` with a
one-pass parameter expansion, per review.

New test file covers self-loop, multi-hop loop, cycle through a
space-bearing alias name, and a non-cycle through a space-bearing
alias name. Existing `test/fast/Aliases/circular/` fixtures continue
to pass.
@jakelodwick
Author

Thanks for the catch and the reproduction recipe. Pushed b763492.

The regression: the earlier commit changed SEEN_ALIASES storage shape but left the line-anchored nvm_grep -e "^${name}$" in place — without newlines in the haystack, an anchored grep can never match.

In b763492:

  1. Cycle detection now stores SEEN_ALIASES with literal newline separators and matches with case "${SEEN_ALIASES}" in *"<NL>${ALIAS_TEMP}<NL>"*). I went with newline anchors rather than the space-anchored *" ${ALIAS_TEMP} "* pattern you suggested, because the latter false-positives on alias names containing spaces. Concretely: given nvm alias 'foo bar' midway and nvm alias midway bar, resolving 'foo bar' builds SEEN=" foo bar midway ", and the non-cyclic lookup of bar then matches the substring " bar " between "foo" and "midway". Newline anchors avoid this. Happy to revert to the space-anchored pattern if you'd prefer to address space-bearing names separately.

  2. nvm_alias adopts your suggested one-pass parameter expansion for the trailing-whitespace strip.

  3. New test file test/fast/Aliases/nvm_resolve_alias handles cycles: self-loop, three-hop loop, cycle via space-bearing alias name, non-cycle via space-bearing alias name.
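
The false positive described in point 1 can be sketched in isolation as follows. The helper names `seen_with_spaces` and `seen_with_newlines` are hypothetical and only mirror the two pattern shapes under discussion; they are not functions from nvm.sh.

```shell
#!/bin/sh
# A literal newline kept in a variable for use inside case patterns.
NL='
'

# Space-anchored membership test: $1 is the seen list, $2 the alias name.
seen_with_spaces() {
  case "$1" in
    *" $2 "*) return 0 ;;
  esac
  return 1
}

# Newline-anchored membership test with the same arguments.
seen_with_newlines() {
  case "$1" in
    *"${NL}$2${NL}"*) return 0 ;;
  esac
  return 1
}

# Chain state after visiting 'foo bar' and then 'midway'.
if seen_with_spaces ' foo bar midway ' 'bar'; then
  echo "space anchors: 'bar' reported as seen (false positive)"
fi
if ! seen_with_newlines "${NL}foo bar${NL}midway${NL}" 'bar'; then
  echo "newline anchors: 'bar' not seen"
fi
if seen_with_newlines "${NL}foo bar${NL}midway${NL}" 'midway'; then
  echo "newline anchors: 'midway' seen (a genuine revisit)"
fi
```

All three messages are printed. The alias name is quoted inside each pattern, so glob characters in a name are matched literally.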

Verification:

  • urchin -f test/fast/Aliases/circular/ — passes (nvm_resolve_alias, nvm_resolve_local_alias).
  • bash 'test/fast/Aliases/nvm_resolve_alias handles cycles' — exit 0.
  • urchin -f test/fast/Aliases/ — 37 pass, 2 pre-existing local-env failures unrelated to alias resolution (nvm_list_aliases calls nvm_get_colors requires TTY for the color check; ... no LTS aliases present blocked on a stale _lts.bak/lts/ directory in my local checkout). Both untouched code paths.
  • Toggle-test: reverting the case block and re-running the new test hangs at the self-reference sub-test, which confirms the test exercises the regression path.

PR description revised to match the diff.

@jakelodwick
Author

Re-ran test/fast/Aliases/nvm_resolve_alias handles cycles across all five supported shells. dash, zsh, sh, and bash exit 0 with clean stderr. ksh also exits 0; assertions pass. Its stderr carries the project's standard local: not found warnings, identical to the noise emitted by the existing circular/nvm_resolve_alias test under ksh and unaffected by this commit.

The case "$SEEN_ALIASES" in *"<NL>${ALIAS_TEMP}<NL>"*) form is portable across the matrix. Literal newlines inside a double-quoted pattern do not trip ksh the way [[:space:]] inside case brackets did in the earlier review thread.
