Releases · smart-models/Smart-Embedder

17 Jun 20:06

federicopalma-pro

1.2.0

2bea022

v1.2.0 Latest

Smart Embedder 1.2.0

Token truncation visibility + per-backend token limits.

✨ Added

- Truncation warnings in API responses. /embed and /rerank now return a warnings array. When input exceeds the model token limit, the response reports model, max_tokens, original_tokens, truncated_tokens, and truncation_side instead of silently truncating.
- Per-backend token length limits. Token limits are now split per backend (BGE-M3, Qwen dense, Qwen rerank) instead of sharing one payload limit.
- version field in the /config response ("version": "1.2.0").
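
A client can surface these warnings instead of ignoring them. The following is a minimal sketch, not the project's client code: the five field names come from the notes above, while the overall response layout (a top-level "warnings" list of objects), the "embeddings" key, the "right" value, and the helper name summarize_truncation_warnings are assumptions made for illustration.

```python
from typing import Any

# Field names are taken from the 1.2.0 notes; everything else about the
# response layout is assumed for this sketch.
WARNING_FIELDS = (
    "model",
    "max_tokens",
    "original_tokens",
    "truncated_tokens",
    "truncation_side",
)


def summarize_truncation_warnings(response: dict[str, Any]) -> list[str]:
    """Return one readable line per warning found in a response body."""
    lines = []
    for warning in response.get("warnings") or []:
        if not isinstance(warning, dict):
            # Unknown warning shape: pass it through rather than guess.
            lines.append(str(warning))
            continue
        parts = [f"{name}={warning[name]}" for name in WARNING_FIELDS if name in warning]
        lines.append("input truncated: " + ", ".join(parts))
    return lines


# Hypothetical response body; the values are illustrative only.
example = {
    "embeddings": [[0.1, 0.2]],
    "warnings": [
        {
            "model": "BAAI/bge-m3",
            "max_tokens": 8192,
            "original_tokens": 9000,
            "truncated_tokens": 808,
            "truncation_side": "right",
        }
    ],
}

for line in summarize_truncation_warnings(example):
    print(line)
```

Responses without a warnings array yield an empty list, so the same helper can be applied to every /embed or /rerank call.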

🔧 Changed

- Default token limits are now set to each model's documented maximum (e.g. QWEN_RERANK_MAX_LENGTH=32768, BGE-M3 8192). Launcher VRAM-tuned values still apply when the environment variable is unset.
- Removed legacy shared payload-limit settings.
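
One way to read the precedence described above is: an explicit environment variable wins, then a launcher VRAM-tuned value, then the model's documented maximum. The sketch below encodes that reading only. The function resolve_max_length and its parameters are hypothetical, and the server's actual resolution logic may differ; QWEN_RERANK_MAX_LENGTH and 32768 are the only names and values taken from the notes.

```python
import os
from typing import Mapping, Optional


def resolve_max_length(
    env_name: str,
    documented_max: int,
    launcher_value: Optional[int] = None,
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Pick a token limit: env var first, then launcher value, then default."""
    env = os.environ if env is None else env
    raw = env.get(env_name, "").strip()
    if raw:
        value = int(raw)  # raises ValueError on non-numeric input
        if value <= 0:
            raise ValueError(f"{env_name} must be positive, got {value}")
        return value
    if launcher_value is not None:
        # Assumed: the launcher supplies a VRAM-tuned value when the
        # variable is unset.
        return launcher_value
    return documented_max


# With nothing set, the documented maximum applies.
print(resolve_max_length("QWEN_RERANK_MAX_LENGTH", 32768, env={}))
```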

🐛 Fixed

- Qwen rerank decode drift: inputs are now re-truncated to the correct token boundary.
- Suppressed noisy tokenizer length warnings in logs.

✅ Tests

- The runtime suite now runs 17 checks (18 with --token) and adds coverage for the truncation-warning response shape.

08 Jun 12:00

federicopalma-pro

1.1.0

fab8660

v1.1.0

Smart Embedder 1.1.0

This release separates CPU and GPU setups into dedicated, explicit artifacts and rebrands all deployment components to
smart-embedder.

✨ Highlights

Split CPU/GPU setup

- New requirements-cpu.txt (CPU-only PyTorch wheel, torch==2.7.0+cpu) alongside requirements-gpu.txt (CUDA torch==2.7.0+cu126). Same pins otherwise, so model compatibility is unchanged.
- New Dockerfile.cpu, built on python:3.11-slim instead of the nvidia/cuda base, so no CUDA libraries are pulled in CPU mode (CPU image ~2.31 GB, verified build).
- start_server.bat / start_server.sh now install the correct requirements file automatically based on the selected device (cpu / gpu).
- docker-compose.cpu.yml builds the slim CPU image with a distinct image tag and container name.

Rebrand to smart-embedder

- Compose project name: smart-embedder
- Image tags: smart-embedder:gpu / smart-embedder:cpu
- Container names: smart-embedder-gpu / smart-embedder-cpu (now distinct, so GPU and CPU can run side by side)
- Default Hugging Face cache volume: smart-embedder-hf-cache
- Updated OCI image labels

Versioning & API

- Application version bumped to 1.1.0 (FastAPI version, Prometheus server_info)
- Device-aware Swagger title: shows Smart Embedder GPU or Smart Embedder CPU depending on detected hardware

⚠️ Breaking changes

- Default file names removed. Bare commands no longer work; pass explicit files:
  - GPU: docker compose -f docker-compose.gpu.yml up
  - CPU: docker compose -f docker-compose.gpu.yml -f docker-compose.cpu.yml up
  - Local install: pip install -r requirements-gpu.txt (or requirements-cpu.txt)
  - Or just use start_server.bat / start_server.sh, which handle this automatically.
- HF cache volume renamed to smart-embedder-hf-cache. To avoid re-downloading models (~3 GB), migrate the old volume:
      docker volume create smart-embedder-hf-cache
      docker run --rm -v bge-m3-embedder-reranker-hf-cache:/from -v smart-embedder-hf-cache:/to alpine sh -c "cp -a /from/. /to/"

29 May 15:17

federicopalma-pro

1.0.0

39ce424

v1.0.0

🚀 Smart Embedder v1.0.0

First stable release of Smart Embedder — a lightweight, self-hosted embedding server for hybrid search pipelines, running
entirely on local hardware with no cloud dependency.

Core Features

- Hybrid search stack: dense vectors (BGE-M3 or Qwen3), sparse lexical matching (BM25/SPLADE), and ColBERT late-interaction reranking in a single FastAPI server
- Selectable embedding backend: switch between BGE-M3 and Qwen3 dense embeddings at startup via environment variable
- Interactive reranker selection: choose between BGE-M3 and Qwen3 reranker at startup
- GPU VRAM auto-tuning: batch sizes and model parameters automatically calibrated to available VRAM
- Shared GPU executor: serialized inference across all endpoints to prevent CUDA OOM under concurrent load
- CPU fallback: full functionality on CPU when no NVIDIA GPU is available
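
The shared GPU executor idea can be illustrated with a single-worker thread pool that every async endpoint submits to, so blocking model calls never overlap. This is a generic sketch of that pattern, not the project's implementation: fake_inference, run_on_gpu, and the overlap counter are hypothetical stand-ins, and no real model or GPU is involved.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# One worker thread: submitted jobs run strictly one at a time.
_gpu_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="gpu")

_lock = threading.Lock()
_active = 0
max_active = 0  # highest number of overlapping calls observed


def fake_inference(batch):
    """Stand-in for a blocking model call; records how many calls overlap."""
    global _active, max_active
    with _lock:
        _active += 1
        max_active = max(max_active, _active)
    time.sleep(0.01)  # simulate work
    with _lock:
        _active -= 1
    return [len(text) for text in batch]


async def run_on_gpu(fn, *args):
    """Run a blocking function on the shared executor without blocking the loop."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_gpu_executor, fn, *args)


async def main():
    # Three "endpoints" submit work concurrently; the executor serializes it.
    return await asyncio.gather(
        run_on_gpu(fake_inference, ["a", "bb"]),
        run_on_gpu(fake_inference, ["ccc"]),
        run_on_gpu(fake_inference, ["dddd", "e"]),
    )


results = asyncio.run(main())
print(results, max_active)
```

Because only one job runs at a time, peak memory is bounded by the largest single batch rather than by the number of concurrent requests, at the cost of queueing latency under load.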

Infrastructure

- Docker + Docker Compose support (GPU and CPU profiles)
- Prometheus metrics endpoint
- Optional Bearer token authentication
- Graceful shutdown with request queue drainage
- Configurable PORT binding via environment variable
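
Optional Bearer token authentication usually means requests are accepted unconditionally when no token is configured and must carry a matching Authorization header otherwise. The sketch below shows that general pattern with a constant-time comparison. The is_authorized helper is hypothetical, and how the server actually reads and validates its token is not stated in these notes.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], expected_token: Optional[str]) -> bool:
    """Check an Authorization header against an optional configured token."""
    if not expected_token:
        # No token configured: authentication is treated as disabled.
        return True
    if not authorization_header:
        return False
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking match length through timing.
    return hmac.compare_digest(presented.strip().encode(), expected_token.encode())


print(is_authorized("Bearer s3cret", "s3cret"))
print(is_authorized("Bearer wrong", "s3cret"))
```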

Models Supported

| Role | Models |
| --- | --- |
| Dense embeddings | `BAAI/bge-m3`, `Qwen3-Embedding` |
| Sparse embeddings | `BAAI/bge-m3` |
| Reranker | `BAAI/bge-reranker-v2-m3`, `Qwen3-Reranker` |
