refactor: read alias files with shell builtins instead of sed/awk/head/tail pipeline#3787
refactor: read alias files with shell builtins instead of sed/awk/head/tail pipeline#3787jakelodwick wants to merge 3 commits into
Conversation
[Tests] `nvm_alias`, `nvm_resolve_alias`: add edge-case tests nvm_alias() used a sed/awk pipeline to strip comments and blank lines from alias files that almost always contain a single word. A while-read loop with parameter expansion does the same filtering more directly. nvm_resolve_alias() piped nvm_alias through head and tail to extract one line, and used printf/grep for cycle detection. Parameter expansion and a case statement replace both without the extra plumbing. All replacements are POSIX (read -r, case, IFS=, parameter expansion). As a side effect, this also removes 4 external process invocations during shell init.
however, they could contain anything, and we must be robust against that. (i haven't reviewed yet, to be clear) |
ljharb
left a comment
There was a problem hiding this comment.
It would be good to add more test cases for edge cases, like binary data, very long lines, embedded NUL characters, no trailing newline, other whitespace characters (carriage return \r, form feed, etc.) that [[:space:]] would catch etc, an alias that has spaces (nvm alias 'foo bar' node, eg)?
| while IFS= read -r NVM_ALIAS_LINE || [ -n "${NVM_ALIAS_LINE}" ]; do | ||
| NVM_ALIAS_LINE="${NVM_ALIAS_LINE%%#*}" | ||
| case "${NVM_ALIAS_LINE}" in | ||
| *[!\ \ ]*) ;; |
There was a problem hiding this comment.
is there another way to represent the literal tab here, that's less likely to be accidentally converted to spaces by some random tool?
There was a problem hiding this comment.
Yes — [[:space:]] POSIX class. Can't be mangled by editors or formatters, and it's already used in nvm.sh (lines 540, 569, 570). The trailing-strip uses %[[:space:]] so the removal is explicit too.
[Tests] `nvm_alias`: add edge-case tests for hostile file content
|
Added seven test files: binary data after version, CRLF + bare CR, embedded NUL, form feed + vertical tab, no trailing newline, 10k-char long line, alias name with spaces. All passing in sh/bash/zsh/dash. |
|
Agreed. The new edge-case tests exercise exactly that. |
…d/tail pipeline (fixes nvm-sh#3787)
2fd18bc to
d62de0e
Compare
…ad/tail pipeline See nvm-sh#3787.
ljharb
left a comment
There was a problem hiding this comment.
Nice direction overall - the nvm_alias rewrite is clean and the edge-case tests are welcome. However there's a critical bug in nvm_resolve_alias: changing SEEN_ALIASES from newline-delimited to space-delimited broke cycle detection (see inline). The PR description says the cycle check was replaced with a case pattern match, but the diff shows it wasn't. I verified by checking out this branch and running nvm_resolve_alias loopback against the existing test/fast/Aliases/circular/ fixtures - it hangs. The claim of "Tested in bash, zsh, dash, ksh, and sh" is inconsistent with this; running urchin test/fast/Aliases/circular/ in any shell would catch it.
Needed before merge: the case-based cycle check the description promised, plus making sure the existing test/fast/Aliases/circular/ fixtures still pass.
A couple smaller notes inline.
| fi | ||
|
|
||
| SEEN_ALIASES="${SEEN_ALIASES}\\n${ALIAS_TEMP}" | ||
| SEEN_ALIASES="${SEEN_ALIASES}${ALIAS_TEMP} " |
There was a problem hiding this comment.
Critical: cycle detection is broken by this format change.
SEEN_ALIASES is now a single space-separated string (e.g. " one two three "), but the cycle check at line 1386 above (not shown in this hunk) is unchanged:
if command printf '%b' "${SEEN_ALIASES}" | nvm_grep -q -e "^${ALIAS_TEMP}$"; thenprintf '%b' " one two three " emits a single line; grep -q "^${ALIAS_TEMP}$" will never match it. Result: any cycle becomes an infinite loop.
Reproduces against the existing test/fast/Aliases/circular/ fixtures — hangs.
Suggested fix (matches what the PR description claims was done):
case "${SEEN_ALIASES}" in
*" ${ALIAS_TEMP} "*) ALIAS="∞"; break ;;
esacThe surrounding spaces in SEEN_ALIASES are already in place to make this work — this last step just needs to actually be made.
| esac | ||
| while : ; do | ||
| case "${NVM_ALIAS_LINE}" in | ||
| *[[:space:]]) NVM_ALIAS_LINE="${NVM_ALIAS_LINE%[[:space:]]}" ;; |
There was a problem hiding this comment.
Stripping one whitespace char per iteration is O(n²) for trailing whitespace. Not a real concern for alias files in practice, but a single-pass equivalent (e.g. NVM_ALIAS_LINE="${NVM_ALIAS_LINE%"${NVM_ALIAS_LINE##*[![:space:]]}"}") avoids the loop entirely.
| echo 'hop4' > ../../../alias/hop3 | ||
| echo '0.0.99' > ../../../alias/hop4 | ||
|
|
||
| ACTUAL="$(nvm_resolve_alias hop1)" |
There was a problem hiding this comment.
Worth adding a sibling test for cycles (1-hop self-reference and a multi-hop loop). The existing test/fast/Aliases/circular/ fixtures already cover this - making sure they keep passing would have flagged the cycle-detection regression on the nvm.sh side.
…[Refactor] `nvm_alias`: one-pass trailing whitespace strip
The earlier commit on this branch changed SEEN_ALIASES from `\n`-delimited
storage (interpreted by `printf '%b' | nvm_grep -e "^${name}$"`) to space-
delimited but left the line-anchored grep in place — without newlines in
the haystack the anchored pattern can never match, so cycles never break.
Switch to literal-newline storage and a `case` pattern anchored on those
newlines. Newline anchoring also handles alias names containing spaces,
which token-based patterns false-positive on (e.g. lookup of `bar` matches
substring " bar " inside " foo bar midway " when the chain visits the
multi-token alias `foo bar`).
Replace the per-character trailing-whitespace loop in `nvm_alias` with a
one-pass parameter expansion, per review.
New test file covers self-loop, multi-hop loop, cycle through a
space-bearing alias name, and a non-cycle through a space-bearing
alias name. Existing `test/fast/Aliases/circular/` fixtures continue
to pass.
|
Thanks for the catch and the reproduction recipe. Pushed The regression: the earlier commit changed In
Verification:
PR description revised to match the diff. |
|
Re-ran The |
Responding to @ljharb's note in #1261 — "If you can think of ways to speed up alias resolution, I'm all for it" — this simplifies how
nvm_alias()andnvm_resolve_alias()read alias files.nvm_alias()currently pipes throughsed | awkto strip comments and blank lines. Alias files almost always contain a single line with one version string, so the pipeline is more machinery than the job needs. Awhile IFS= read -rloop with inline filtering does the same work more directly. Trailing whitespace is then stripped in one pass via parameter expansion.nvm_resolve_alias()wraps that inhead -n N | tail -n 1to extract a single line. Since we only ever need the first non-empty line, parameter expansion (${var%%newline*}) does the same thing without the pipeline. Cycle detection uses acasepattern anchored on literal newlines inSEEN_ALIASES, replacing the priorprintf '%b' | nvm_grep -q "^name$"pipeline. Newline anchoring also handles alias names containing spaces; a token-based pattern would false-positive onnvm alias 'foo bar' midwaychained tonvm alias midway bar.All replacements are POSIX (
read -r,case,IFS=, parameter expansion).localdeclarations are separated from assignments for ksh compatibility.As a side effect, this removes several subprocess forks per alias lookup — 5 for a single-hop alias, up to 12 for a two-hop with cycle check.
Edge-case tests cover: empty alias file, comment-only file, trailing whitespace, 4-deep chain, nonexistent target. Cycle tests cover: self-loop, multi-hop loop, cycle through a space-bearing alias name, and a non-cycle through a space-bearing alias name. Existing
test/fast/Aliases/circular/fixtures continue to pass.