hanparse

A lightweight, rule-based Korean sentence parser written in TypeScript.

Overview

hanparse parses Korean sentences with a deterministic, rule-driven approach based on the classic maximum-matching (longest chunk) algorithm. It runs entirely on the client side or in edge environments, with no backend or heavy AI dependencies.
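
For readers unfamiliar with maximum matching, here is a minimal sketch of the general technique, not hanparse's actual implementation. The chunk inventory `CHUNKS` and the function name `maxMatch` are assumptions made for illustration; the real rules live in data/rules.md and are likely richer than a flat set of strings.

```typescript
// Hypothetical chunk inventory, for illustration only.
const CHUNKS: ReadonlySet<string> = new Set(["이것", "은", "무엇", "이에요", "이"]);

// Length of the longest entry, so the scan never tries impossible widths.
const MAX_LEN = Math.max(...Array.from(CHUNKS, (c) => c.length));

/** Split `text` greedily, always taking the longest known chunk at the cursor. */
function maxMatch(text: string): string[] {
  const out: string[] = [];
  let pos = 0;
  while (pos < text.length) {
    let matched = "";
    // Try the widest window first and shrink until something matches.
    for (let len = Math.min(MAX_LEN, text.length - pos); len > 0; len--) {
      const candidate = text.slice(pos, pos + len);
      if (CHUNKS.has(candidate)) {
        matched = candidate;
        break;
      }
    }
    // Unknown character: emit it alone so the scan always advances.
    if (matched === "") matched = text.charAt(pos);
    out.push(matched);
    pos += matched.length;
  }
  return out;
}

console.log(maxMatch("무엇이에요")); // ["무엇", "이에요"]
```

Although "이" is a known chunk on its own, the longer "이에요" wins at that position. The same input always produces the same split, which is what makes the approach deterministic.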

Project Status

hanparse is in an early but active development stage. The parser currently implements more than 30 core grammatical rules, enough to handle basic Korean sentence patterns and demonstrate the underlying architecture.

A preliminary lemmatization system is included, though the tool is not yet ready for practical use. Ongoing work focuses on expanding rule coverage, refining lemmatization accuracy, and improving usability to move toward a production-ready parser.

Contributions are welcome to help extend parsing rules and enhance performance.

Features

Deterministic Parsing: The same input always produces the same structural analysis.
Ultra-Lightweight & Edge-Ready: No heavy dependencies and no external dictionary required. Suited to browsers and constrained edge runtimes (e.g., Cloudflare Workers, Vercel Edge Functions).
Extensible Rule System: Rules are kept in one place, separate from the core matching logic, so you can add new grammatical patterns without touching the matcher.
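
To illustrate what keeping rules separate from matching logic can look like, here is a hedged sketch. The `Rule` shape, the `RULES` table, and `longestSuffixRule` are hypothetical names invented for this example; they do not reflect the actual ChunkSpec scheme in data/rules.md.

```typescript
// Hypothetical rule shape, for illustration only.
interface Rule {
  surface: string; // the literal particle or ending
  kind: "josa" | "eomi";
}

// Rules live in one table, separate from the matching logic below.
const RULES: Rule[] = [
  { surface: "은", kind: "josa" },
  { surface: "에서", kind: "josa" },
  { surface: "이에요", kind: "eomi" },
];

/** Return the rule with the longest surface form that ends `word`, if any. */
function longestSuffixRule(word: string, rules: readonly Rule[]): Rule | undefined {
  let best: Rule | undefined;
  for (const rule of rules) {
    // Require a non-empty stem in front of the suffix.
    const fits = word.length > rule.surface.length && word.endsWith(rule.surface);
    if (fits && (best === undefined || rule.surface.length > best.surface.length)) {
      best = rule;
    }
  }
  return best;
}

// Adding coverage is a data change only: append a rule, leave the matcher alone.
const extended: Rule[] = [...RULES, { surface: "에서는", kind: "josa" }];

console.log(longestSuffixRule("학교에서는", RULES)?.surface);    // undefined
console.log(longestSuffixRule("학교에서는", extended)?.surface); // "에서는"
```

The matcher never changes between the two calls; only the table does.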

System Requirements

Production

Any modern browser or edge runtime with ES6 support

Development

Bun (for dependency management and tooling)
Make (for running build, type-check, and test tasks)
Perl 5.36+ (for running tool scripts)
Carton (for managing tool script dependencies)
ChunkSpec (for grammar rules)

Install

$ cd path/to/hanparse
$ bun install
$ carton install

Usage

Library

Run make release and locate the compiled artifact in the dist/ directory.

Parser

To compile the parser, run:

$ make release

To parse a Korean sentence, run:

$ ./bin/hanparse "이것은 무엇이에요?"

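You can also omit the quotation marks:

$ ./bin/hanparse 이것은 무엇이에요?

Because ? is a glob character in most shells, the quoted form is safer; zsh, for example, reports an error by default when the pattern matches no file.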

Design Goals

Deterministic behavior
Zero backend dependency
Lightweight enough to run anywhere

Non-Goals

Dictionary lookup or semantic understanding
Grammar checking or correction

🤝 Contributing

Why open-source?
hanparse is open source because no single person can cover the entire Korean language. Its verb endings (eomi) and postpositional particles (josa) are too numerous and too complex.

Rules
You don’t need to touch the core code. Just edit data/rules.md: copy an existing rule object, adapt it, and submit a PR. The rule scheme is still evolving, so focus on expanding coverage and experimenting. Major changes can be discussed before submission.

We migrated the grammar rules from JSON to ChunkSpec embedded in Markdown to eliminate redundant boilerplate.

Proper nouns
We also keep a small dictionary of proper nouns in data/proper-noun.csv. It’s minimal, meant as a proof of concept. Contributors are welcome to add common names, places, or brands—especially those found in beginner-level Korean materials.

👉 No coding required: If you can read and edit Markdown/CSV, you can already contribute.
