Chore: crashbox integration #31
Merged
- crashbox: enable `memory` detector + device memoryBudgetBytes; getMemoryEstimate pull source from web-llm load progress; reportMemoryPressure on OOM/pre-flight
- web-llm-memory.js: device budget (deviceMemory×0.6 / 1.5 GB iOS const), model pre-flight (vram + buffer-cap vs maxStorageBufferBindingSize), cumulative footprint across resident models
- single model in memory by default: loading a web-llm chat model evicts others (unload + clearLoad), everywhere (chat + Models table); experimentalMultipleModels setting to allow stacking
- web-llm: new MLCEngine()+reload() holding the handle → unloadLlmEngine() tears down even in-flight loads
- 3-state model badge: Not loaded / Cached / Loaded (isLlmCached probe via context getCached)
- crashes panel: warning rows show level·ratio·source; cap live warnings to newest 3; simulate-pressure debug button
- dev (revert before release): import map points crashbox at local /vendor/crashbox symlink; gitignore public/vendor
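The budget and pre-flight rules listed above can be sketched roughly as follows. This is a minimal illustration, not the code in web-llm-memory.js: the function names, the model record fields (`vramBytes`, `largestBufferBytes`), and the reuse of the iOS constant whenever `deviceMemory` is unavailable are assumptions. Only the ×0.6 factor, the 1.5 GB iOS constant, the cumulative footprint, and the comparison against `maxStorageBufferBindingSize` come from the commit message.

```javascript
// Sketch only: names and record shapes are hypothetical.
const GIB = 1024 ** 3;
const IOS_BUDGET_BYTES = 1.5 * GIB; // "1.5 GB iOS const" (read here as GiB, an assumption)
const DEVICE_MEMORY_FACTOR = 0.6;   // "deviceMemory×0.6"

// deviceMemoryGb would come from navigator.deviceMemory, which is undefined
// in browsers that do not implement the Device Memory API.
function deviceBudgetBytes({ deviceMemoryGb, isIos }) {
  if (isIos || typeof deviceMemoryGb !== "number") {
    return IOS_BUDGET_BYTES; // assumption: reuse the iOS constant as the fallback
  }
  return deviceMemoryGb * GIB * DEVICE_MEMORY_FACTOR;
}

// Pre-flight: reject a load that would exceed the budget once already-resident
// models are counted, or whose largest buffer exceeds the adapter limit.
function preflightModel(model, { budgetBytes, residentBytes, maxStorageBufferBindingSize }) {
  const reasons = [];
  if (residentBytes + model.vramBytes > budgetBytes) reasons.push("over-budget");
  if (model.largestBufferBytes > maxStorageBufferBindingSize) reasons.push("buffer-cap");
  return { ok: reasons.length === 0, reasons };
}
```

Returning a list of reasons rather than a bare boolean would let a caller report which check failed, for example when feeding a memory-pressure warning.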
…y-based model recs
- memory pressure: headroom-relative estimate with a KV-cache growth term; desktop budget tuning (deviceMemory ≥ 8 → ×0.9) plus a manual override setting
- web-llm loads serialize through a queue (evict-after-settle) so a second click no longer aborts the first's download; one model resident by default
- unload-from-memory and delete-from-disk actions in the Models table and loading buttons, gated to memory-managing providers (chrome excluded)
- reasoning models: strip `<think>` from the answer, view it via a dev-mode icon, `enable_thinking` toggle wired through to Qwen3/DeepSeek-R1 only
- recommendations: rank by capability (params → quant → Qwen gen → VRAM), block unknown-VRAM models, refresh the web-llm model lineup
- crashes panel: copy/toggle/reset-crashbox controls; System panel reorg + cache-backend row
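The serialized load queue described above could look something like the sketch below. It is one reading of "evict-after-settle", namely that other resident models are evicted only once the new load has finished successfully; the real ordering in loading.js may differ. `createLoadQueue`, `load`, and `evictOthers` are hypothetical names introduced for illustration.

```javascript
// Sketch only: a promise-chain queue so overlapping clicks run one at a time.
function createLoadQueue({ load, evictOthers }) {
  let tail = Promise.resolve();

  return function enqueue(modelId) {
    // Each load starts only after every earlier load has settled.
    const run = tail.then(async () => {
      const engine = await load(modelId);
      await evictOthers(modelId); // assumption: evict only after a successful load
      return engine;
    });
    // Swallow the rejection on the chain itself so one failed load does not
    // block later ones; the caller still sees the failure through `run`.
    tail = run.catch(() => {});
    return run;
  };
}
```

Because a second call only chains onto the first instead of cancelling it, a second click queues behind the in-flight download rather than aborting it.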
… on unmount to stop leaked setState calls after the provider unmounts
…opyButton, errMessage helper
- think: replace stripThinking/extractThinking/hasThinking with one single-pass parseThinking; memoize per conversation entry so streaming only re-parses the growing entry instead of re-scanning every entry each token
- data/models-table: memoize fitCtx and the assessModelFit pass so fit stops recomputing across ~90 models on every load-status tick
- copy-button: extract a shared CopyButton with the execCommand fallback (fixes answer-side copy silently failing in non-secure/legacy contexts)
- telemetry: add errMessage() and use it for the 5 duplicated error-truncation breadcrumb sites
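A single-pass `<think>` parser with per-entry memoization, as described in the first bullet above, might be sketched as follows. The return shape (`answer`, `thinking`, `hasThinking`), the treatment of an unclosed `<think>` as still-streaming reasoning, and the `createThinkingCache` helper are assumptions for illustration; the actual parseThinking in think.js may behave differently.

```javascript
// Sketch only: split text into visible answer and <think> content in one scan.
function parseThinking(text) {
  const OPEN = "<think>";
  const CLOSE = "</think>";
  let answer = "";
  let thinking = "";
  let hasThinking = false;
  let pos = 0;

  while (pos < text.length) {
    const open = text.indexOf(OPEN, pos);
    if (open === -1) {
      answer += text.slice(pos); // no more blocks: the rest is answer text
      break;
    }
    hasThinking = true;
    answer += text.slice(pos, open);
    const close = text.indexOf(CLOSE, open + OPEN.length);
    if (close === -1) {
      // Assumption: an unclosed block means reasoning is still streaming in.
      thinking += text.slice(open + OPEN.length);
      break;
    }
    thinking += text.slice(open + OPEN.length, close);
    pos = close + CLOSE.length;
  }
  return { answer: answer.trim(), thinking: thinking.trim(), hasThinking };
}

// Memoize per conversation entry: an entry whose text has not changed returns
// the cached result, so only the entry currently growing gets re-parsed.
function createThinkingCache() {
  const cache = new Map(); // entryId -> { text, parsed }
  return function parseEntry(entryId, text) {
    const hit = cache.get(entryId);
    if (hit && hit.text === text) return hit.parsed;
    const parsed = parseThinking(text);
    cache.set(entryId, { text, parsed });
    return parsed;
  };
}
```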
…RS instead of a "webLlm" literal
Replace the three `provider === "webLlm"` checks with usesSingleModelPolicy(), backed by a SINGLE_MODEL_PROVIDERS config list (mirrors MEMORY_MANAGED_PROVIDERS). Rename evictOtherResidentModels/singleModelLoadQueue to provider-agnostic names. No behavior change: web-llm is the sole entry; future in-page providers opt in via the list.
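The list-backed check described above can be illustrated with a short sketch. The list name and the "webLlm" entry come from the commit message; the function signature and the `shouldEvictOthers` helper that combines the policy with the `experimentalMultipleModels` setting are assumptions, not the code in shared-config.js.

```javascript
// Sketch only: provider capability list in place of a string literal.
const SINGLE_MODEL_PROVIDERS = ["webLlm"]; // sole entry, per the commit message

// Assumed signature: takes a provider id, returns whether it opts in.
function usesSingleModelPolicy(provider) {
  return SINGLE_MODEL_PROVIDERS.includes(provider);
}

// Hypothetical helper: eviction applies only when the provider opts in and
// the user has not enabled stacking through experimentalMultipleModels.
function shouldEvictOthers(provider, settings = {}) {
  return usesSingleModelPolicy(provider) && !settings.experimentalMultipleModels;
}
```

With this shape, a future in-page provider would opt in by adding its id to the list, with no call sites to change.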
Pull request overview
This PR integrates crashbox telemetry/crash recovery into the local-LLM stack, adds web-llm memory estimation + eviction policies, and extends the UI with device-fit model recommendations, 3‑state model caching status/actions, and reasoning <think> handling (hidden from normal answers, viewable in dev mode).
Changes:
- Add crashbox wrapper + UI (Crashes tab/panel), plus breadcrumb/snapshot wiring across loading and routing.
- Add web-llm memory estimation (preflight + during-load + KV-cache headroom), single-model eviction (default), and unload/delete-from-disk actions.
- Add device-aware model fit scoring + “Best for this device” recommendation, plus `<think>` parsing + optional “Model Thinking” setting.
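To make the KV-cache headroom item in the list above concrete, here is a rough sketch of how a KV-cache growth term can be estimated and turned into a budget ratio. It uses the generic transformer KV-cache size formula (keys plus values, per layer, per KV head); whether web-llm-memory.js uses this formula is not stated in the PR, and all function and field names here are hypothetical.

```javascript
// Sketch only: generic KV-cache sizing, not taken from the PR's estimator.
// Size = 2 (K and V) × layers × kvHeads × headDim × bytes per value × tokens.
function kvCacheBytes({ layers, kvHeads, headDim, bytesPerValue = 2 }, tokens) {
  return 2 * layers * kvHeads * headDim * bytesPerValue * tokens;
}

// Ratio of estimated use to the device budget; a value near or above 1
// would indicate memory pressure.
function memoryPressureRatio({ weightsBytes, kvBytes, budgetBytes }) {
  return (weightsBytes + kvBytes) / budgetBytes;
}
```

Because the KV term grows linearly with the number of tokens in context, an estimate of this kind rises during a long conversation even though the weights stay fixed.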
Reviewed changes
Copilot reviewed 29 out of 30 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| public/styles.css | Adds cached/delete icon styling, crashes panel styling, and tabs overflow tweaks. |
| public/shared-config.js | Updates default web-llm model picks; adds provider capability flags for memory management + single-model policy. |
| public/local/data/util.js | Adds UA-based getDeviceInfo() for device-class heuristics (notably iOS). |
| public/local/data/telemetry.js | Introduces crashbox shim (bootstrap/shutdown, external-store snapshot, breadcrumbs, merged snapshots, memory budget). |
| public/local/data/recommendations.js | Adds pure fit assessment + best-model selection utilities for UI “Fit” and recommendations. |
| public/local/data/loading.js | Adds telemetry breadcrumbs/snapshot updates; single-model eviction + serialized load queue; unload/delete cache APIs. |
| public/local/data/api/search.js | Wraps extractor load/query + vector search with telemetry wrap(); attaches GPU device when available. |
| public/local/data/api/rag.js | Wraps RAG context build step with telemetry wrap(). |
| public/local/data/api/providers/web-llm.js | Reworks engine lifecycle for preflight + load telemetry, OOM reporting, unload, and cache deletion; adds thinking support knob. |
| public/local/data/api/providers/web-llm-memory.js | Implements synchronous crashbox memory estimator (load progress + KV-cache growth) + preflight checks. |
| public/local/data/api/providers/chrome.js | Adds telemetry wrapping/breadcrumbs; adds no-op unload for OS-managed provider. |
| public/local/data/api/llm.js | Adds provider-agnostic unloadLlmEngine() + deleteModelCache() APIs. |
| public/local/data/api/chat-session.js | Threads enableThinking through to provider handlers. |
| public/local/app/context/loading.js | Tracks on-disk cache state and exposes unload/delete actions to components; fixes dynamic subscription cleanup. |
| public/local/app/components/models-table.js | Adds Fit column, cached status, unload/delete actions, and fit memoization/sorting support. |
| public/local/app/components/loading/button.js | Adds cached state and unload/delete actions to the loading button UI. |
| public/index.html | Adds crashbox importmap entry and early crashbox bootstrap gated by settings; adds device info into config. |
| public/app/util/think.js | Adds parseThinking() utility to strip/extract <think> blocks efficiently. |
| public/app/pages/settings.js | Adds toggles for crashbox/multi-models/thinking and memory-budget override; supports live crashbox enable/disable. |
| public/app/pages/data.js | Adds Fit-based recommendations UI + Crashes tab/panel wiring + device profile display. |
| public/app/pages/chat.js | Uses memoized <think> parsing per entry; gates answer rendering on visible content. |
| public/app/hooks/use-settings.js | Adds new settings defaults for crashbox/memory/multi-models/thinking. |
| public/app/hooks/use-crashbox.js | Adds useSyncExternalStore hook for crashbox UI state. |
| public/app/hooks/use-chat-session.js | Passes enableThinking into chat-session creation. |
| public/app/components/layout.js | Adds route tracking into crashbox snapshot + breadcrumbs. |
| public/app/components/forms.js | Adds cached status icon support to model selector. |
| public/app/components/crashes-panel.js | Adds dev-mode Crashes panel for recovered crash details, warnings, and debug tools. |
| public/app/components/copy-button.js | Adds shared copy button with async clipboard + execCommand fallback. |
| public/app/components/answer.js | Hides <think> from rendered/copyable answer; adds dev-only reasoning viewer action. |
| .gitignore | Ignores public/vendor/ temp development directory. |
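The telemetry.js and use-crashbox.js rows in the table above mention an external-store snapshot consumed through useSyncExternalStore. A minimal store that satisfies that hook's contract might look like the sketch below; `createExternalStore` and `update` are hypothetical names, and the real crashbox shim may be structured differently.

```javascript
// Sketch only: a tiny store compatible with React's useSyncExternalStore.
function createExternalStore(initialState) {
  let snapshot = initialState;
  const listeners = new Set();

  return {
    // subscribe must return an unsubscribe function.
    subscribe(listener) {
      listeners.add(listener);
      return () => listeners.delete(listener);
    },
    // getSnapshot must return the same reference until the state changes,
    // otherwise React would re-render on every call.
    getSnapshot() {
      return snapshot;
    },
    // Replace the snapshot with a new object, then notify subscribers.
    update(patch) {
      snapshot = { ...snapshot, ...patch };
      listeners.forEach((listener) => listener());
    },
  };
}
```

In a component, a hook built on this would call `useSyncExternalStore(store.subscribe, store.getSnapshot)`; neither method relies on `this`, so they can be passed unbound.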
…s, accurate breadcrumb field name
- LoadingButton: allow click-to-retry from error state (startLoading already retries; matches models table); add retry title hint
- CopyButton: only show "Copied!" on real success; route absent navigator.clipboard to the execCommand fallback instead of silently no-opping
- web-llm/chrome breadcrumbs: rename tokensSoFar -> charsSoFar (it's a character count, not tokens)
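The CopyButton behaviour described above (try the async Clipboard API, fall back to execCommand, report success only when a copy actually happened) could be sketched like this. `copyText` and `legacyCopy` are hypothetical names, and the `env` parameter exists only so the sketch can be exercised outside a browser; the shared component in copy-button.js may differ.

```javascript
// Sketch only: resolves to true only when a copy actually succeeded.
async function copyText(text, env = globalThis) {
  const clipboard = env.navigator && env.navigator.clipboard;
  if (clipboard && typeof clipboard.writeText === "function") {
    try {
      await clipboard.writeText(text);
      return true;
    } catch {
      // Permission denied or non-secure context: fall through to the fallback.
    }
  }
  return legacyCopy(text, env.document);
}

// Fallback for contexts without navigator.clipboard: select a hidden
// textarea and ask the document to copy the selection.
function legacyCopy(text, doc) {
  if (!doc || typeof doc.execCommand !== "function") return false;
  const el = doc.createElement("textarea");
  el.value = text;
  el.setAttribute("readonly", "");
  el.style.position = "fixed";
  el.style.opacity = "0";
  doc.body.appendChild(el);
  el.select();
  let ok = false;
  try {
    ok = doc.execCommand("copy"); // returns false when the copy was refused
  } catch {
    ok = false;
  }
  doc.body.removeChild(el);
  return ok;
}
```

A button handler would then show "Copied!" only when `await copyText(value)` resolves to `true`.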
Integrates crashbox (crash detection + telemetry) and adds web-llm memory management, device-fit model recommendations, and reasoning `<think>` support to the local-LLM stack.
Crashbox / telemetry
- … (`telemetry.js`): breadcrumbs, snapshot merging, memory-pressure reporting, GPU-device attach, crash recovery.
- … (`experimentalCrashbox`): recovered-crash details, live warnings, session diagnostics, debug actions.
- … `navigator.deviceMemory` (iOS fallback constant), overridable via `memoryBudgetMb`.
web-llm memory management
- … `experimentalMultipleModels` allows stacking.
Recommendations / UI
- `assessModelFit` / `pickBestModel`: per-device fit tiers feeding a "Fit" column in the models table and a "Best for this device" card.
- `<think>` stripped from visible answer, surfaced via dev-mode viewer; `enableThinking` setting.
- `CopyButton` (with execCommand fallback); shared `errMessage` truncation helper.
Settings
- `experimentalCrashbox`, `experimentalMultipleModels`, `enableThinking`, `memoryBudgetMb`. Crashbox toggles live (no reload).
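As an illustration of the device-fit recommendation idea in the description above, the sketch below filters out models that do not fit the memory budget (or whose VRAM requirement is unknown) and ranks the rest by capability. It is not the PR's `pickBestModel`: the function name, the record fields (`params`, `quantBits`, `vramBytes`), and the tie-break directions are assumptions, and the model-generation criterion mentioned in the commits is omitted.

```javascript
// Sketch only: hypothetical record shape { id, params, quantBits, vramBytes }.
function pickBestModelSketch(models, budgetBytes) {
  const candidates = models.filter(
    // Models with unknown VRAM are blocked rather than guessed at.
    (m) => Number.isFinite(m.vramBytes) && m.vramBytes <= budgetBytes
  );
  candidates.sort(
    (a, b) =>
      b.params - a.params ||       // more parameters first
      b.quantBits - a.quantBits || // then higher-precision quantization (assumed)
      a.vramBytes - b.vramBytes    // then the smaller footprint (assumed)
  );
  return candidates.length > 0 ? candidates[0] : null;
}
```

Returning `null` when nothing fits leaves the decision of what to show (for example, no recommendation card) to the caller.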