
fix: code hardening to prevent bugs #171

Open
adbar wants to merge 4 commits into main from fix/code_hardening

adbar (Owner) commented Jun 11, 2026

  • Correctness: NFC normalization of input, fixes to prevent errors, and corresponding tests
  • Security: add a token-length cap
  • Simplification: single-pass affix collapse, faster dictionary access via native .get(), and general code simplification
  • Maintenance: replace flake8 with ruff and adjust the code accordingly
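A minimal sketch of how the correctness, security and `.get()` items above might fit together in a lookup path. The names `MAX_TOKEN_LENGTH`, `normalize_token` and `lookup`, and the cap of 100 characters, are hypothetical; the PR description does not show simplemma's actual limit or function names.

```python
import unicodedata
from typing import Optional

# Hypothetical cap for illustration; simplemma's real limit may differ.
MAX_TOKEN_LENGTH = 100


def normalize_token(token: str) -> Optional[str]:
    """Return the NFC-normalized token, or None if it exceeds the length cap."""
    # Reject oversized input before doing any further work on it.
    if len(token) > MAX_TOKEN_LENGTH:
        return None
    # NFC composes e.g. "e" + U+0301 (combining acute accent) into the single
    # code point U+00E9, so lookups match keys stored in composed form.
    return unicodedata.normalize("NFC", token)


def lookup(token: str, dictionary: dict[str, str]) -> Optional[str]:
    """Look up a lemma; dict.get returns None instead of raising KeyError."""
    normalized = normalize_token(token)
    if normalized is None:
        return None
    return dictionary.get(normalized)
```

For example, the decomposed spelling `"cafe\u0301"` finds a dictionary entry stored under the composed `"caf\u00e9"`, while an over-long token is rejected before any normalization or lookup happens.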

Two comment threads on simplemma/strategies/dictionaries/dictionary_factory.py were dismissed.
codecov (bot) commented Jun 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.84%. Comparing base (c8d9821) to head (b07b5d0).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #171      +/-   ##
==========================================
- Coverage   97.92%   97.84%   -0.08%     
==========================================
  Files          36       36              
  Lines         626      651      +25     
==========================================
+ Hits          613      637      +24     
- Misses         13       14       +1     
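The percentages in the diff follow from hits divided by lines. The sketch below reproduces them, assuming Codecov's default display of two decimal places rounded down (the `coverage.precision` and `coverage.round` settings in `codecov.yml`). Under ordinary rounding the head value would show as 97.85% instead.

```python
from decimal import Decimal, ROUND_DOWN


def coverage_percent(hits: int, lines: int) -> Decimal:
    """Coverage as a percentage, truncated (rounded down) to two decimals."""
    ratio = Decimal(hits) * 100 / Decimal(lines)
    return ratio.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


base = coverage_percent(613, 626)  # main
head = coverage_percent(637, 651)  # this PR
print(base, head, head - base)     # 97.92 97.84 -0.08
```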

adbar requested a review from juanjoDiaz on Jun 11, 2026, 15:28
adbar linked an issue on Jun 11, 2026 that may be closed by this pull request

juanjoDiaz (Collaborator) left a comment

LGTM!

Just a couple of minor comments!

SHORTER_GREEDY = {"bg", "et", "fi", "lv"}


def greedy_min_length(lang: str) -> int:

juanjoDiaz (Collaborator):

Should this be a static method of GreedyDictionaryLookupStrategy, to keep the class-based approach?

adbar (Owner, Author):

I think module-level placement is easier here.
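To make the two placements under discussion concrete, here is a sketch of both side by side. Only `SHORTER_GREEDY` and the signature come from the diff; the function body, the constants `DEFAULT_MIN_LENGTH` and `SHORTER_MIN_LENGTH` and their values are hypothetical placeholders, and the class is a stripped-down stand-in, not simplemma's real `GreedyDictionaryLookupStrategy`.

```python
SHORTER_GREEDY = {"bg", "et", "fi", "lv"}

# Hypothetical placeholder thresholds: the PR excerpt does not show the body.
DEFAULT_MIN_LENGTH = 8
SHORTER_MIN_LENGTH = 6


def greedy_min_length(lang: str) -> int:
    """Module-level placement: importable and testable without any class."""
    return SHORTER_MIN_LENGTH if lang in SHORTER_GREEDY else DEFAULT_MIN_LENGTH


class GreedyDictionaryLookupStrategy:
    """Stand-in class showing the alternative, namespaced placement."""

    @staticmethod
    def greedy_min_length(lang: str) -> int:
        # Same logic; a static method needs no instance but lives on the class.
        return SHORTER_MIN_LENGTH if lang in SHORTER_GREEDY else DEFAULT_MIN_LENGTH
```

Both versions are called without an instance; the trade-off is mainly namespacing versus a simpler import path, which is the judgment call made in the reply.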

str | None: The lemma for the token, or None if no lemma is found.

"""
if not any(hyphen in token for hyphen in HYPHENS):

juanjoDiaz (Collaborator):

Is this faster than the regex?

adbar (Owner, Author):

Yes: most tokens don't contain hyphens, and for those the plain substring check is cheaper than running the regex.
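A small sketch for checking the speed claim locally. The contents of `HYPHENS` here (ASCII hyphen-minus and U+2010 HYPHEN) are an assumption, not simplemma's actual constant, and the regex is built from the same set so both checks agree. Timings vary by machine and Python version, so no expected numbers are given.

```python
import re
import timeit

# Hypothetical hyphen set; simplemma's actual HYPHENS constant may differ.
HYPHENS = {"-", "\u2010"}
# Character class built from the same set, so both checks stay in sync.
HYPHEN_REGEX = re.compile("[" + "".join(map(re.escape, HYPHENS)) + "]")


def has_hyphen_any(token: str) -> bool:
    # A few substring checks on a tiny set; no regex engine involved.
    return any(hyphen in token for hyphen in HYPHENS)


def has_hyphen_regex(token: str) -> bool:
    return HYPHEN_REGEX.search(token) is not None


if __name__ == "__main__":
    token = "Bundesverfassungsgericht"  # the common case: no hyphen
    for fn in (has_hyphen_any, has_hyphen_regex):
        seconds = timeit.timeit(lambda: fn(token), number=100_000)
        print(f"{fn.__name__}: {seconds:.3f}s")
```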

Labels: None yet
Projects: None yet
Development: successfully merging this pull request may close the issue "Establish linting and quality tools".
3 participants