Every paper your team shares — found and mapped.
Dashboards rebuild & redeploy automatically every Sunday (GitHub Actions cron); the landing page footer shows the last data update.
PaperTrail automatically discovers papers shared across your Slack workspace, enriches them with metadata, computes LLM semantic embeddings, and builds an interactive visual dashboard with hierarchical topic clustering, AI-powered search, and full engagement metrics.
- Landing page (lab picker): papertrail-portal.vercel.app
- Koo Lab Dashboard — Vercel · GitHub Pages
- Standard Model Bio Dashboard — Vercel · GitHub Pages
Documentation · Report Bug · Request Feature
A self-contained HTML file — no server required.
- Canvas scatter plot with UMAP/t-SNE/PCA projections (hardware-accelerated)
- Hierarchical topic clustering — LLM-generated labels at 3 zoom levels
- Topic connection lines — configurable thickness, opacity, curve, color
- 8 color modes: Cluster, Channel, Year, Citations, Engagement, Density, Contributor, Journal
- Embedding-based semantic search — cosine similarity on LLM vectors (BGE-small)
- AI chatbot — natural language queries with tool use (HuggingFace, Claude, OpenAI)
- 3D WebGL view, sortable table, leaderboard, time travel animation
- Smooth animations — papers fade in/out on filter, timeline playback
- Dark theme, CSV/XLSX export, keyboard shortcuts, shareable URL state
- Multi-strategy enrichment — page scraping → OpenAlex → Crossref → bioRxiv API → Google fallback
- LLM embeddings — HuggingFace BGE-small (384d) for projections + client-side search
- Hierarchical clustering on UMAP projections with LLM-generated topic labels
- Dead link detection, junk title filtering, URL normalization
- Automated weekly pipeline via GitHub Actions → deploys to both Vercel and GitHub Pages
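The embedding-based search listed above ranks papers by cosine similarity between a query vector and each paper's vector. The following is a minimal sketch of that ranking step, not PaperTrail's actual dashboard code: the function names, the `PAPERS` records, and the toy 3-d vectors (standing in for 384-d BGE-small embeddings) are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def search(query_vec, papers, top_k=3):
    """Return (score, title) pairs for the top_k most similar papers."""
    scored = [
        (cosine_similarity(query_vec, paper["embedding"]), paper["title"])
        for paper in papers
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical records; real embeddings would come from the embed step.
PAPERS = [
    {"title": "Paper A", "embedding": [1.0, 0.0, 0.0]},
    {"title": "Paper B", "embedding": [0.0, 1.0, 0.0]},
    {"title": "Paper C", "embedding": [0.7, 0.7, 0.0]},
]

if __name__ == "__main__":
    for score, title in search([1.0, 0.1, 0.0], PAPERS, top_k=2):
        print(f"{score:.3f}  {title}")
```

Because cosine similarity ignores vector length, the ranking depends only on direction, which is why it is a common choice for comparing text embeddings.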
Run PaperTrail across multiple Slack workspaces from a single repo:
config/
├── koolab.yml # Koo Lab workspace
├── standardmodelbio.yml # Standard Model Bio workspace
└── yourlab.yml # Add your own!
Each workspace gets its own data directory and dashboard, published at
/<workspace>/ on both the Vercel app and GitHub Pages. A generated landing
page (scripts/build_landing.py) links to every workspace dashboard.
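To illustrate the one-config-per-workspace layout, here is a rough sketch that lists the `*.yml` files in a config directory and derives each dashboard's `/<workspace>/` URL. It assumes the workspace name is the config file's stem (as `koolab.yml` and `/koolab/` suggest); `discover_workspaces`, `dashboard_paths`, and the base URL are hypothetical and are not taken from `scripts/build_landing.py`.

```python
import tempfile
from pathlib import Path


def discover_workspaces(config_dir):
    """Return workspace names, assumed to be the stems of the *.yml files."""
    return sorted(path.stem for path in Path(config_dir).glob("*.yml"))


def dashboard_paths(config_dir, base_url):
    """Map each workspace name to its /<workspace>/ URL under base_url."""
    base = base_url.rstrip("/")
    return {name: f"{base}/{name}/" for name in discover_workspaces(config_dir)}


if __name__ == "__main__":
    # Build a throwaway config directory so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("koolab", "standardmodelbio"):
            (Path(tmp) / f"{name}.yml").write_text("channels: {}\n")
        for name, url in dashboard_paths(tmp, "https://example.invalid").items():
            print(name, "->", url)
```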
- Fork this repository
- Create a Slack bot app (guide)
- Add your config to config/yourworkspace.yml:
    title: "PaperTrail — My Lab"
    slack_workspace_url: "https://mylab.slack.com"
    channels: {}  # empty = auto-discover all public channels
    embedding_backend: huggingface
    slack_token_secret: SLACK_BOT_TOKEN
- Set GitHub secret:
    gh secret set SLACK_BOT_TOKEN
- Trigger:
    gh workflow run pipeline.yml
Dashboard deploys to both https://<project>.vercel.app/<workspace>/ and
https://<user>.github.io/PaperTrail/<workspace>/. For Vercel deployment, also
set the VERCEL_TOKEN, VERCEL_ORG_ID, and VERCEL_PROJECT_ID secrets (see
Configuration → Deployment).
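Before triggering the workflow, it can help to confirm that the secrets named above are available. The following is a small preflight sketch that checks a mapping such as `os.environ` for them. The helper `missing_secrets` is hypothetical, and it assumes the Slack token uses the default name `SLACK_BOT_TOKEN` (the `slack_token_secret` value in your config may differ).

```python
import os

# Secret names as given in the setup steps above.
REQUIRED = ["SLACK_BOT_TOKEN"]
VERCEL = ["VERCEL_TOKEN", "VERCEL_ORG_ID", "VERCEL_PROJECT_ID"]


def missing_secrets(env, deploy_to_vercel=True):
    """Return the names of expected secrets that are unset or empty in env."""
    names = REQUIRED + (VERCEL if deploy_to_vercel else [])
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_secrets(os.environ)
    print("missing:", ", ".join(missing) if missing else "none")
```

Note that this only inspects the local environment; secrets stored in GitHub are not readable back, so a local check like this is mainly useful when running the pipeline by hand.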
pip install "papertrail-lab[all]"
# Full pipeline
papertrail run-pipeline -c config/myworkspace.yml -o build/myworkspace
# Or step by step
papertrail scrape --token "$SLACK_BOT_TOKEN" -c CHANNEL_ID -o raw.json
papertrail enrich raw.json -o enriched.json
papertrail embed enriched.json -o final.json
papertrail build final.json -o dashboard.html

Slack Workspaces (multiple)
       │
       ▼
┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│   Scraper    │───▶│   Enricher   │───▶│  Embeddings  │───▶│  Dashboard   │
│              │    │              │    │              │    │              │
│ - Slack API  │    │ - Page scrape│    │ - HuggingFace│    │ - UMAP map   │
│ - 30+ domains│    │ - OpenAlex   │    │ - OpenAI     │    │ - 3D view    │
│ - Reactions  │    │ - Crossref   │    │ - Local ONNX │    │ - Table      │
│ - Auto-join  │    │ - bioRxiv API│    │ - TF-IDF     │    │ - AI agent   │
│              │    │ - Dead links │    │ - 3-level    │    │ - Semantic   │
│              │    │ - Junk filter│    │   clustering │    │   search     │
└──────────────┘    └──────────────┘    └──────────────┘    └──────────────┘
                                      │
                        GitHub Actions (weekly cron)
                                      │
                          ┌───────────┴───────────┐
                          ▼                       ▼
                        Vercel               GitHub Pages
             /koolab/ /standardmodelbio/ (+ lab-picker landing)
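The four stages in the diagram correspond to the step-by-step CLI commands shown earlier. As a rough sketch, they can be chained from Python with `subprocess`, using only the subcommands and flags that appear in those commands. The helpers `pipeline_commands` and `run_pipeline`, the placeholder token, and the channel ID are assumptions for illustration, not part of the package.

```python
import shlex
import subprocess


def pipeline_commands(token, channel_id, out_html="dashboard.html"):
    """Build the four CLI invocations in stage order."""
    return [
        ["papertrail", "scrape", "--token", token, "-c", channel_id, "-o", "raw.json"],
        ["papertrail", "enrich", "raw.json", "-o", "enriched.json"],
        ["papertrail", "embed", "enriched.json", "-o", "final.json"],
        ["papertrail", "build", "final.json", "-o", out_html],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command, or run it and stop at the first failing stage."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    # Placeholder values; a dry run only prints the commands.
    run_pipeline(pipeline_commands("xoxb-placeholder", "CHANNEL_ID"), dry_run=True)
```

Each stage reads the file the previous stage wrote, so `check=True` keeps a failed scrape or enrich step from feeding stale data into the next one.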
PaperTrail/
├── config/ # Per-workspace configurations
│ ├── koolab.yml
│ └── standardmodelbio.yml
├── data/ # Per-workspace paper data
│ ├── koolab/papers_final.json
│ └── standardmodelbio/papers_final.json
├── papertrail/ # Python package
│ ├── scraper.py # Slack scraping + URL extraction
│ ├── enricher.py # Metadata enrichment (OpenAlex + PubMed)
│ ├── enrich_cascade.py # Multi-strategy enrichment cascade
│ ├── embeddings.py # Embedding backends
│ ├── projections.py # Projections + hierarchical clustering
│ ├── pipeline.py # Automated pipeline runner
│ ├── preview.py # Dashboard builder
│ ├── cli.py # CLI commands
│ └── templates/dashboard.html # Dashboard template (~10K lines)
├── .github/workflows/
│ ├── pipeline.yml # Weekly pipeline + deploy (Vercel + GitHub Pages)
│ ├── docs.yml # Documentation deploy
│ └── ci.yml # Tests
├── scripts/
│ └── build_landing.py # Generates the lab-picker landing page
├── docs/ # MkDocs documentation
└── pyproject.toml # Package config
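The tree lists `enrich_cascade.py` as a multi-strategy enrichment cascade. The general pattern (try each metadata source in order and keep the first usable result) can be sketched as below. This is not the module's actual code: `enrich_with_cascade` and the three stub strategies are hypothetical stand-ins, and the real sources would make network requests.

```python
from typing import Callable, List, Optional, Tuple

Strategy = Callable[[str], Optional[dict]]


def enrich_with_cascade(
    url: str, strategies: List[Tuple[str, Strategy]]
) -> Optional[dict]:
    """Try each (name, strategy) in order; return the first non-empty result."""
    for name, strategy in strategies:
        try:
            metadata = strategy(url)
        except Exception:
            continue  # a failing source should not abort the whole cascade
        if metadata:
            return {**metadata, "source": name}
    return None  # every strategy failed or returned nothing


# Hypothetical stubs standing in for real lookups.
def scrape_page(url):
    raise TimeoutError("page did not respond")


def query_openalex(url):
    return None  # no match found


def query_crossref(url):
    return {"title": "Example title", "year": 2024}


CASCADE = [
    ("page", scrape_page),
    ("openalex", query_openalex),
    ("crossref", query_crossref),
]

if __name__ == "__main__":
    print(enrich_with_cascade("https://example.invalid/paper", CASCADE))
```

Recording which source supplied the metadata (the `source` key here) makes it easier to audit enrichment quality later.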
git clone https://github.com/bschilder/PaperTrail.git
cd PaperTrail
pip install -e ".[all,dev]"
pytest
mkdocs serve

MIT License. See LICENSE.