fix(sync-service): match Postgres LIKE semantics for newlines and escaped wildcards #4437
sravan27 wants to merge 1 commit into
…aped wildcards
`Casting.like?/3` compiled a SQL LIKE pattern into a regex anchored with `^..$`
and without the dotall flag. This diverges from Postgres in three ways, each of
which can silently include/exclude the wrong rows for a shape `WHERE col LIKE ..`
filter:
* `%` and `_` did not match newline characters
('a\nb' LIKE 'a%b' returned false; Postgres returns true)
* a trailing newline in the value was ignored
('ab\n' LIKE 'ab' returned true; Postgres returns false)
* escaped wildcards were matched literally including the backslash
('hell%' LIKE 'hell\%' returned false; Postgres returns true)
Compile with `:dotall`, anchor with `\A..\z` (absolute boundaries), and translate
the pattern with a backslash-aware pass so `\%`/`\_`/`\\` produce literal
characters. Adds regression tests covering all three cases.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.

| | main | #4437 | +/- |
|---|---|---|---|
| Coverage | 34.53% | 57.26% | +22.73% |
| Files | 188 | 293 | +105 |
| Lines | 14351 | 30345 | +15994 |
| Branches | 4897 | 8394 | +3497 |
| Hits | 4956 | 17377 | +12421 |
| Misses | 9381 | 12951 | +3570 |
| Partials | 14 | 17 | +3 |
Hey, thanks a lot for the contribution! This seems like a valid fix. Any reason the PR is a draft?
Thanks! It was a draft only because I opened it before waiting for the full CI / sync-service validation. I moved it out of draft after the Elixir formatting, Lux integration, sync-service pg14/15/17/18, TS formatting, package typecheck/test, and Codecov checks came back clean.

I checked the remaining red job as well. The failing … So that looks like a fork / GitHub token / GHCR package permission issue rather than a code/test failure from this patch.

One small semantic note from the PR body is still open: a trailing lone backslash in a LIKE pattern is currently treated as a literal backslash by this patch, while Postgres raises …
Thanks @icehaunter! No real reason: it just started as a draft while I reduced the repros at the … Each of the three divergences has a function-level reduced repro and a regression test (newline/dotall wildcards, the …
Summary
`Electric.Replication.PostgresInterop.Casting.like?/3`, used by `LIKE`/`ILIKE` in shape `where` clauses, compiles a SQL `LIKE` pattern into a regex anchored with `^…$` and without the dotall flag. That diverges from Postgres `LIKE` semantics in three ways. Because `like?/3` decides row membership for a shape filter, each divergence can silently include or exclude the wrong rows relative to Postgres.
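The effect of the anchors and the missing dotall flag can be checked directly with Elixir's `Regex`, independent of `Casting.like?/3`. The snippet below is only an illustrative sketch: the four regexes are hand-written stand-ins for what a `LIKE 'a%b'` and a `LIKE 'ab'` pattern would compile to under the old and new schemes, not output captured from the actual function.

```elixir
# Old behaviour: ^…$ anchors, no dotall flag.
old_wildcard = ~r/^a.*b$/
old_exact = ~r/^ab$/

# New behaviour: \A…\z anchors plus the dotall (s) flag.
new_wildcard = ~r/\Aa.*b\z/s
new_exact = ~r/\Aab\z/s

# "." does not cross a newline unless dotall is set.
IO.inspect(Regex.match?(old_wildcard, "a\nb"))  # false
IO.inspect(Regex.match?(new_wildcard, "a\nb"))  # true

# "$" also matches just before a trailing newline; "\z" does not.
IO.inspect(Regex.match?(old_exact, "ab\n"))     # true
IO.inspect(Regex.match?(new_exact, "ab\n"))     # false
```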
Divergences (reduced repros, at the function level)
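| Expression | Postgres | `Casting.like?` (before) |
|---|---|---|
| `like?("a\nb", "a%b")` | true | false |
| `like?("a\nb", "a_b")` | true | false |
| `like?("ab\n", "ab")` | false | true |
| `like?("hell%", "hell\\%")` | true | false |
| `like?("hell_", "hell\\_")` | true | false |

Root causes: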
* The translation maps `%` → `.*` and `_` → `.`, but `.` does not match newlines without the dotall flag, so the wildcards don't span newlines (Postgres' do).
* The regex is anchored with `^…$`; in PCRE/`:re`, `$` also matches before a trailing newline, so `'ab\n' LIKE 'ab'` wrongly matches. Postgres requires the pattern to cover the whole value.
* Matching wildcards with `(?<!\\)[_%]` leaves escaped wildcards inside literal chunks, so `Regex.escape/1` escapes the backslash too; `\%` ends up matching a literal `\%` rather than a literal `%`.

Fix
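* Translate the pattern with a backslash-aware pass: `\X` → literal `X` (so `\%`, `\_`, `\\` become literal `%`, `_`, `\`), `%` → `.*`, `_` → `.`, everything else → `Regex.escape/1`.
* Anchor with `\A…\z` (absolute string boundaries) instead of `^…$`.
* Compile with `:dotall` so the wildcards match newlines.

Tests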
Adds `describe "like?/2 Postgres compatibility"` in `casting_test.exs` covering all three cases (newlines, trailing newline, escaped wildcards) plus an `ilike?/2` case. The existing `like?` doctests are unchanged and still pass.

Notes
* … reference implementation of both the old and new behaviour against Postgres' documented semantics. Opened as a draft pending a CI / `mix test` run for the sync-service package.
* A trailing lone backslash in the pattern is currently treated as a literal backslash; Postgres instead raises `LIKE pattern must not end with escape character`. Happy to switch to raising if you'd prefer to match that exactly.
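For readers who want to see the shape of the backslash-aware translation, here is a minimal sketch. It is not the code from this PR: the module name `LikeSketch`, the `translate/2` helper, and the boolean `ignore_case?` argument are all hypothetical, and it assumes the pattern is valid UTF-8. It only mirrors the rules described above (`\X` becomes a literal `X`, `%` becomes `.*`, `_` becomes `.`, everything else is escaped, `\A…\z` anchors, dotall).

```elixir
defmodule LikeSketch do
  # Hypothetical illustration, not the implementation in Casting.like?/3.

  # Returns true when `text` matches the SQL LIKE `pattern`.
  def like?(text, pattern, ignore_case? \\ false) do
    # "s" is dotall; "i" is caseless (used for the ILIKE variant).
    opts = if ignore_case?, do: "si", else: "s"
    regex = Regex.compile!("\\A" <> translate(pattern, []) <> "\\z", opts)
    Regex.match?(regex, text)
  end

  # End of pattern: join the accumulated regex fragments.
  defp translate(<<>>, acc), do: acc |> Enum.reverse() |> IO.iodata_to_binary()

  # Backslash followed by any character: that character is a literal.
  defp translate(<<"\\", char::utf8, rest::binary>>, acc),
    do: translate(rest, [Regex.escape(<<char::utf8>>) | acc])

  # Trailing lone backslash: kept as a literal backslash in this sketch.
  defp translate("\\", acc), do: translate("", [Regex.escape("\\") | acc])

  # Unescaped wildcards.
  defp translate(<<"%", rest::binary>>, acc), do: translate(rest, [".*" | acc])
  defp translate(<<"_", rest::binary>>, acc), do: translate(rest, ["." | acc])

  # Any other character is matched literally.
  defp translate(<<char::utf8, rest::binary>>, acc),
    do: translate(rest, [Regex.escape(<<char::utf8>>) | acc])
end

IO.inspect(LikeSketch.like?("a\nb", "a%b"))        # true
IO.inspect(LikeSketch.like?("ab\n", "ab"))         # false
IO.inspect(LikeSketch.like?("hell%", "hell\\%"))   # true
IO.inspect(LikeSketch.like?("hello", "hell\\%"))   # false
```

Walking the pattern one character at a time means an escape is resolved before any wildcard check, so no lookbehind is needed. If matching Postgres exactly were preferred, the lone-backslash clause is the single place that would change from returning a literal to raising; which exception type to use is left open here.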