Merged
284 commits
52ce8f1
Refactor chat history
InAnYan Nov 29, 2025
0803345
refactor chat history + start working on UI
InAnYan Nov 29, 2025
79a6df3
initroduce 1 class. idk why
InAnYan Nov 29, 2025
5c7d11a
Move out database listeners
InAnYan Nov 29, 2025
bc5ebfe
middle work
InAnYan Nov 29, 2025
cb2b08d
middle work
InAnYan Nov 29, 2025
55bb1aa
middle work
InAnYan Nov 29, 2025
ea09dc6
middle work
InAnYan Nov 29, 2025
a00b9ac
Start working on tasks
InAnYan Nov 29, 2025
02ff6ff
middle work
InAnYan Nov 30, 2025
3dc512b
Somewhat AiSummary
InAnYan Nov 30, 2025
2398f4e
middle work
InAnYan Nov 30, 2025
edb3d4e
summary almost done
InAnYan Dec 1, 2025
90a2eaa
middle work
InAnYan Dec 3, 2025
67ef0a0
Finish AI summary
InAnYan Dec 3, 2025
6f8c361
Refactor locations
InAnYan Dec 3, 2025
e40cb01
Start working on AI chat UI
InAnYan Dec 7, 2025
ad9757e
Use single chat history repository
InAnYan Dec 7, 2025
9fefac2
Remove old classes
InAnYan Dec 7, 2025
57a81de
Remove old classes x2
InAnYan Dec 7, 2025
0e2e03f
Middle work
InAnYan Dec 7, 2025
b3cbda9
middle work
InAnYan Dec 9, 2025
3e59ef9
finish chat?
InAnYan Dec 20, 2025
e638de3
Something works
InAnYan Dec 28, 2025
b346a29
bug fixing
InAnYan Dec 29, 2025
37562df
fix Ai chat
InAnYan Dec 29, 2025
6bb421b
Implement status window
InAnYan Dec 29, 2025
1b79445
Fixings
InAnYan Dec 29, 2025
d9eb817
rename identifiers
InAnYan Dec 29, 2025
6382b62
Make group window
InAnYan Dec 29, 2025
af1f4f3
Fix window close
InAnYan Dec 29, 2025
e7c66a0
Some arch changes
InAnYan Dec 30, 2025
c8a3e7d
Add ADRs
InAnYan Dec 30, 2025
e7ca964
Some changes
InAnYan Dec 30, 2025
0969f8f
mid work
InAnYan Dec 30, 2025
7e87908
Almost finished
InAnYan Dec 31, 2025
cd7a7ba
Add draft reqs
InAnYan Dec 31, 2025
a001545
fix ai chat
InAnYan Jan 1, 2026
de98d83
add ai OFT
InAnYan Jan 1, 2026
c49a904
add adr for messages
InAnYan Jan 1, 2026
42f832c
Empty messages package
InAnYan Jan 1, 2026
2ef6fa8
add ADR on messages
InAnYan Jan 1, 2026
f8ecd53
add new types
InAnYan Jan 1, 2026
5c50a6d
Migrate to new type (todo: migrate messages v1->v2, show debug, expor…
InAnYan Jan 1, 2026
e1b6eea
feat: mid work on removing
InAnYan Mar 29, 2026
248c324
refactor(ai): refactor AiDatabaseListener and AiFeature
InAnYan Apr 3, 2026
b5b404e
refactor(ai): remove AiTemplateKind
InAnYan Apr 3, 2026
1bbbe72
refactor(ai): fix AiDatabaseListeners
InAnYan Apr 3, 2026
be3eb71
refactor(ai): start AiSummarizationLogic
InAnYan Apr 3, 2026
c0d5db0
refactor(ai): petit change
InAnYan Apr 3, 2026
6e8e11a
refactor(ai): remove some user message templates
InAnYan Apr 3, 2026
85b5bb9
refactor(ai): add ai library id
InAnYan Apr 3, 2026
58bc248
refactor(ai): refactor ChatIdentifier with ai library id
InAnYan Apr 3, 2026
d402a79
refactor(ai): refactor AiSummaryIdentifier
InAnYan Apr 3, 2026
810d8b0
refactor(ai): rename AI identifiers
InAnYan Apr 3, 2026
639748e
refactor(ai): rename AI identifiers
InAnYan Apr 3, 2026
bc40a43
refactor(ai): remove AiTemplateKind.java
InAnYan Apr 3, 2026
619f22c
refactor(ai): change tokenizators parameters
InAnYan Apr 3, 2026
eb62e20
refactor(ai): remove customimplementations package
InAnYan Apr 3, 2026
2784ddb
refactor(ai): make TokenizationAiFeature
InAnYan Apr 3, 2026
051a9fb
refactor(ai): change order of the responsibility chain
InAnYan Apr 3, 2026
fb5cc8d
refactor(ai): use string templates
InAnYan Apr 3, 2026
8919106
chore(gui): remove unused file
InAnYan Apr 3, 2026
1ccfc3e
fix(ai): add system message to the chat
InAnYan Apr 3, 2026
d856055
refactor(ai): use hash for ingested documents tracking
InAnYan Apr 3, 2026
c38d529
refactor(ai): add GenerateEmbeddingsAiDatabaseListener and move files
InAnYan Apr 3, 2026
073f7d8
refactor(ai): add transfer of summaries
InAnYan Apr 3, 2026
c4550d5
refactor(ai): petit change
InAnYan Apr 3, 2026
e690a65
refactor(ai): simplify templates
InAnYan Apr 5, 2026
abfd737
refactor(ai): fix name
InAnYan Apr 5, 2026
f4b808d
refactor(ai): add factories
InAnYan Apr 6, 2026
44e6abf
refactor(ai): refactor summarization + add RAM layer
InAnYan Apr 9, 2026
1e6f8e2
refactor(ai): make migrations
InAnYan Apr 9, 2026
cf4ffb8
refactor(ai): remove features
InAnYan Apr 9, 2026
5d5e6f2
refactor(ai): remove AiChatLogic and introduce GenerateRagResponseTask
InAnYan Apr 10, 2026
4610fcb
refactor(ai): clean-up code
InAnYan Apr 10, 2026
d44de0b
Initial plan
Copilot Apr 10, 2026
7b27d51
Initial plan
Copilot Apr 10, 2026
2af3303
feat: implement AI export feature for chat and summary
Copilot Apr 10, 2026
d6a5a0a
fix: use correct BibEntryWriter API for BibTeX serialization
Copilot Apr 10, 2026
0011b49
refactor: extract helper methods to reduce duplication in AiChatView …
Copilot Apr 10, 2026
c0fa43e
Implement follow-up questions feature for AI chat
Copilot Apr 10, 2026
7be33e5
Fix magic number and duplicate regex pattern issues from code review
Copilot Apr 10, 2026
3d058dd
Fix constant placement and remove duplicate constant for code review …
Copilot Apr 10, 2026
7c2844f
Changes before error encountered
Copilot Apr 10, 2026
7830a21
Address review comments: BackgroundTask for follow-up questions, View…
Copilot Apr 11, 2026
6ca5f63
Update AiTab.java
InAnYan Apr 11, 2026
0ff6310
Update AiTab.java
InAnYan Apr 11, 2026
a72ec17
refactor(ai): update embeddings to use file hash
InAnYan Apr 11, 2026
87c6622
Update GenerateFollowUpQuestions.java
InAnYan Apr 11, 2026
42ea039
refactor: move export logic to view models, add interfaces, add markd…
Copilot Apr 11, 2026
819c2bf
refactor: improve method names per code review feedback
Copilot Apr 11, 2026
f92b21e
Merge branch 'refactor/ai-1' into copilot/add-follow-up-questions-fea…
InAnYan Apr 11, 2026
9749dde
Merge pull request #209 from InAnYan/copilot/add-follow-up-questions-…
InAnYan Apr 11, 2026
2eff7a7
refactor: add AiMetadata record, remove ExportMessage, use ChatMessag…
Copilot Apr 11, 2026
2b9ef40
style: convert /// comments to standard Javadoc in AiMetadata and AiT…
Copilot Apr 11, 2026
b2b8df7
refactor: embed AiMetadata in AiSummary; remove AiMetadata.empty()
Copilot Apr 11, 2026
d8ac13c
refactor: move export responsibility to AiSummaryShowingViewModel
Copilot Apr 11, 2026
ac6607d
style: remove trailing blank line in AiSummaryShowingView
Copilot Apr 11, 2026
0d2d661
refactor(ai): add chat in memory cache
InAnYan Apr 12, 2026
7087ef3
refactor(ai): make compile
InAnYan Apr 12, 2026
76991c4
refactor(ai): clean ups
InAnYan Apr 12, 2026
e2c3084
refactor(ai): fix migrations
InAnYan Apr 12, 2026
cc3316d
refactor(ai): quick fix for migrations
InAnYan Apr 12, 2026
d9a12e6
refactor(ai): fix
InAnYan Apr 12, 2026
d2ba0da
refactor(ai): fix migration
InAnYan Apr 12, 2026
1e2de88
Merge branch 'refactor/ai-1' of https://github.com/InAnYan/jabref int…
Copilot Apr 13, 2026
db1415d
Merge pull request #208 from InAnYan/copilot/add-export-feature-ai
InAnYan Apr 13, 2026
2d1d3aa
refactor(ai): fix a bit the follow up questions
InAnYan Apr 13, 2026
e638abd
refactor(ai): fix chat history scroll
InAnYan Apr 13, 2026
1c14365
feat: redesign AI chat - move model to status window, add export ther…
Copilot Apr 13, 2026
03d1071
refactor: extract formatChatModelLabel into named method in AiChatSta…
Copilot Apr 13, 2026
87ba960
refactor(ai): clean quickly
InAnYan Apr 13, 2026
f674697
refactor: move chat model building and export to AiChatStatusViewMode…
Copilot Apr 14, 2026
53e6324
Merge remote-tracking branch 'origin/refactor/ai-1' into copilot/rede…
Copilot Apr 14, 2026
fadb61c
refactor(ai): add embedding model download cache
InAnYan Apr 14, 2026
ba75fe2
Merge remote-tracking branch 'origin/refactor/ai-1' into copilot/rede…
Copilot Apr 14, 2026
889fc3d
Merge pull request #211 from InAnYan/copilot/redesign-ai-chat-status-…
InAnYan Apr 15, 2026
675350a
refactor(ai): clear ingested documents on embedding model change
InAnYan Apr 15, 2026
4664cb1
refactor(ai): ui change
InAnYan Apr 15, 2026
604b656
refactor(ai): ui change for Ai Tab
InAnYan Apr 15, 2026
ae3241f
refactor(ai): restore regenerate message func
InAnYan Apr 15, 2026
6417ebb
refactor(ai): fix adr order
InAnYan Apr 17, 2026
880c3c1
refactor(ai): fix adr order
InAnYan Apr 17, 2026
1646292
refactor(ai): fix chatting requirements
InAnYan Apr 17, 2026
1ba5794
refactor(ai): change future feature
InAnYan Apr 17, 2026
3abb140
refactor(ai): clean and fix file hasher tests
InAnYan Apr 17, 2026
31fe467
refactor(ai): remove persited file ingestor test
InAnYan Apr 17, 2026
422d92e
refactor(ai): refactor plain citation parsing with llm
InAnYan Apr 17, 2026
33f125f
refactor(ai): less metadata changes
InAnYan Apr 17, 2026
fbce38d
refactor(ai): improve comment
InAnYan Apr 17, 2026
e6533e5
refactor(ai): refactor EmbeddingSimilarityMetric
InAnYan Apr 17, 2026
2db56d7
refactor(ai): quick modules change
InAnYan Apr 17, 2026
e0c7d7e
refactor(ai): remove ResolvedGroup
InAnYan Apr 17, 2026
dc241a1
refactor(ai): update AI docs
InAnYan Apr 17, 2026
d1b3862
refactor(ai): Rename PredefinedEmbeddingModel
InAnYan Apr 17, 2026
4ad31ba
refactor(ai): change ChatMessage.Role
InAnYan Apr 17, 2026
b5a9498
refactor(ai): change ChatHistoryRecord
InAnYan Apr 17, 2026
a1f5b40
refactor(ai): cleanups
InAnYan Apr 17, 2026
e86f51a
refactor(ai): remove ListenersHelper.java
InAnYan Apr 17, 2026
378958c
refactor(ai): remove CitationKeyCheck.java
InAnYan Apr 17, 2026
eedff09
refactor(ai): fix AiPreferences
InAnYan Apr 17, 2026
a7ec558
refactor(ai): revert MVStoreBase
InAnYan Apr 17, 2026
4e3efc2
refactor(ai): remove BibEntryListComparatorById
InAnYan Apr 17, 2026
37d6fd4
refactor(ai): do not save if deleted
InAnYan Apr 17, 2026
a81e2b2
refactor(ai): simplify chunked summarization logic
InAnYan Apr 17, 2026
e022213
refactor(ai): clean
InAnYan Apr 17, 2026
937632a
refactor(ai): clean
InAnYan Apr 17, 2026
dc3e977
refactor(ai): refactor
InAnYan Apr 17, 2026
41336bd
refactor(ai): move exporters
InAnYan Apr 17, 2026
59e0037
refactor(ai): static class refactors
InAnYan Apr 17, 2026
c898b6c
refactor(ai): remove comment
InAnYan Apr 17, 2026
df3a541
refactor(ai): refactor answer engines
InAnYan Apr 17, 2026
020b180
refactor(ai): remove comment
InAnYan Apr 17, 2026
b9550f6
refactor(ai): cleanup + ingestion
InAnYan Apr 17, 2026
8aec688
refactor(ai): cleanup
InAnYan Apr 17, 2026
d89bc33
refactor(ai): refactor ingestion + clean ups + follow-up questions
InAnYan Apr 17, 2026
f189fc7
refactor(ai): refactor bindings
InAnYan Apr 18, 2026
884a822
refactor(ai): simplify chat message
InAnYan Apr 18, 2026
ce0022d
refactor(ai): change UI ai chat
InAnYan Apr 18, 2026
69c67c2
refactor(ai): remove PropertiesHelper.java
InAnYan Apr 18, 2026
4ee03ea
refactor(ai): block chat history during loading
InAnYan Apr 18, 2026
96893e1
refactor(ai): refactor to use one status pane
InAnYan Apr 18, 2026
396376e
refactor(ai): move files + add tooltip
InAnYan Apr 18, 2026
f90a002
refactor(ai): refactor bindings + cleanup
InAnYan Apr 18, 2026
f287dcf
refactor(ai): refactor ai default preferences
InAnYan Apr 18, 2026
cf0eef1
refactor(ai): add API key changes in Preferences
InAnYan Apr 19, 2026
5ed90c4
refactor(ai): convert javadoc to markdown
InAnYan Apr 19, 2026
abb2c3b
Revert "refactor(ai): convert javadoc to markdown"
InAnYan Apr 19, 2026
b425505
Merge branch 'main' into refactor/ai-1
InAnYan Apr 19, 2026
d4c21b9
refactor(ai): make it run
InAnYan Apr 19, 2026
2c4547a
refactor(ai): fix + fix checkstyle
InAnYan Apr 19, 2026
bd4a2ca
refactor(ai): add tests
InAnYan Apr 19, 2026
c7a9c9f
refactor(ai): convert comments
InAnYan Apr 19, 2026
8d33c1e
refactor(ai): docs
InAnYan Apr 19, 2026
cd5d26f
refactor(ai): update ADR 0057
InAnYan Apr 21, 2026
e3342a7
refactor(ai): revert checkstyle
InAnYan Apr 21, 2026
e8e0a28
refactor(ai): add MVStore comment
InAnYan Apr 21, 2026
7f1aea2
refactor(ai): update docs
InAnYan Apr 21, 2026
1ace5c9
refactor(ai): add testing resources
InAnYan Apr 22, 2026
4c0c8dc
refactor(ai): update from my review
InAnYan Apr 22, 2026
36fcedf
refactor(ai): add context menu for chat messages
InAnYan Apr 22, 2026
ad45161
refactor(ai): remove assertions
InAnYan Apr 22, 2026
1203ec4
refactor(ai): just fixes
InAnYan Apr 24, 2026
d1b687b
refactor(ai): fix tracing
InAnYan Apr 24, 2026
8ef5d11
refactor(ai): add CHANGELOG entry
InAnYan Apr 25, 2026
acfbfda
refactor(ai): add answer engine combobox
InAnYan Apr 26, 2026
9c7a535
Merge branch 'main' into refactor/ai-1
koppor Apr 27, 2026
7af78d4
docs(adr): use CUID2 for aiLibraryId
koppor Apr 27, 2026
60eaa47
refactor(ai): rename vBox/buttonsVBox in AiChatMessageView
koppor Apr 27, 2026
ebbfe9d
undo
koppor Apr 27, 2026
28dd5b0
refactor(ai): tidy AiChatMessageView
koppor Apr 27, 2026
a79abc0
refactor(ai): localize FileStatus and use Directories.getUserDirectory
koppor Apr 27, 2026
c0f9533
refactor(ai): collapse 2-value State enums to BooleanProperty
koppor Apr 27, 2026
d3b6947
Fix HTML block
koppor Apr 27, 2026
188d7e1
Merge branch 'main' into refactor/ai-1
InAnYan May 4, 2026
14fbd67
Merge branch 'main' into refactor/ai-1
InAnYan May 4, 2026
9693f40
reactor(ai): fix markdown
InAnYan May 4, 2026
fefce14
reactor(ai): fix new lines at the end
InAnYan May 4, 2026
a0d172e
reactor(ai): fix from review
InAnYan May 4, 2026
4f66e5e
reactor(ai): fix do while loop
InAnYan May 4, 2026
3b4d23b
reactor(ai): fix new lines
InAnYan May 4, 2026
8119660
Fix submodules
InAnYan May 4, 2026
5bc1ab2
reactor(ai): fix code style
InAnYan May 4, 2026
597f0fa
reactor(ai): fix annotations
InAnYan May 4, 2026
2a3a700
Merge branch 'main' into refactor/ai-1
InAnYan May 6, 2026
0ea7dbd
reactor(ai): fix links in markdown
InAnYan May 6, 2026
1666477
reactor(ai): fix CHANGELOG
InAnYan May 6, 2026
c8393b4
reactor(ai): update module-info.java
InAnYan May 6, 2026
48dafe7
reactor(ai): apply openrewrite
InAnYan May 6, 2026
7e0048f
reactor(ai): migrate to jackson 3
InAnYan May 6, 2026
5b6ba98
Merge branch 'main' into refactor/ai-1
InAnYan May 8, 2026
7a3eccc
reactor(ai): fix links
InAnYan May 8, 2026
9a328f9
reactor(ai): fix code style
InAnYan May 8, 2026
6d971c5
reactor(ai): fix arch tests
InAnYan May 8, 2026
eb27921
reactor(ai): fix arch tests x2
InAnYan May 8, 2026
039d829
reactor(ai): add check if a file is already ingested
InAnYan May 8, 2026
8ad81bc
reactor(ai): fix chat migration
InAnYan May 9, 2026
9efce94
reactor(ai): fix code style
InAnYan May 9, 2026
fcf1a08
fix(docs/requirements): add expert settings AI requirements
InAnYan May 10, 2026
accad0b
Merge branch 'main' into refactor/ai-1
InAnYan May 10, 2026
a8b1931
reactor(ai): fix module info
InAnYan May 10, 2026
40ed7d6
Merge remote-tracking branch 'origin/refactor/ai-1' into refactor/ai-1
InAnYan May 10, 2026
d95af78
fix: fix module info
InAnYan May 11, 2026
a3c88d0
fix: fix module info
InAnYan May 11, 2026
f476826
fix: build gradle
InAnYan May 11, 2026
23c43f9
Merge branch 'main' into refactor/ai-1
InAnYan May 11, 2026
5dfb52e
reactor(ai): fix localization
InAnYan May 11, 2026
2048323
Merge branch 'main' into refactor/ai-1
InAnYan May 11, 2026
1a240b9
reactor(ai): fix tests
InAnYan May 12, 2026
cccde54
Merge branch 'main' into refactor/ai-1
InAnYan May 12, 2026
51e2dde
Merge branch 'main' into refactor/ai-1
InAnYan May 14, 2026
0c15308
Merge branch 'main' into refactor/ai-1
InAnYan May 15, 2026
5965d9f
Update CHANGELOG.md
InAnYan May 15, 2026
6397848
Merge branch 'main' into refactor/ai-1
koppor May 17, 2026
779f265
reactor(ai): fix order of ADR
InAnYan May 18, 2026
1c5cb1f
reactor(ai): add new good to the adr
InAnYan May 18, 2026
d41c599
reactor(ai): refine requirements
InAnYan May 18, 2026
4c968cf
reactor(ai): refine null check
InAnYan May 18, 2026
1d4736b
reactor(ai): add qodo comments
InAnYan May 18, 2026
1a063fb
reactor(ai): add qodo comments
InAnYan May 18, 2026
ad6fe00
reactor(ai): fix formatting
InAnYan May 18, 2026
00201b4
reactor(ai): fix from qodo
InAnYan May 18, 2026
fbc5ea8
Merge branch 'main' into refactor/ai-1
InAnYan May 18, 2026
6c41ec6
Merge branch 'main' into refactor/ai-1
InAnYan May 19, 2026
36761f4
reactor(ai): move AI adrs
InAnYan May 19, 2026
75c4cca
Merge branch 'main' into refactor/ai-1
koppor May 19, 2026
474e2f4
Merge branch 'main' into refactor/ai-1
koppor May 19, 2026
1cb86ad
Merge branch 'main' into refactor/ai-1
koppor May 20, 2026
5232fa9
Merge branch 'main' into refactor/ai-1
koppor May 20, 2026
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -11,6 +11,7 @@ Note that this project **does not** adhere to [Semantic Versioning](https://semv

### Added

- We added support for selecting answer engines and summarization algorithms, allowing users to change the underlying AI behavior. [#15688](https://github.com/JabRef/jabref/pull/15688)
- The citation key generator also normalizes super and subscript characters. [#15743](https://github.com/JabRef/jabref/pull/15743)
- We added automatic source groups to SLR results and fixed group merging to preserve all source groups. [#12542](https://github.com/JabRef/jabref/issues/12542)
- We enabled usage of relative or absolute file paths depending on your file directory settings. [#3590](https://github.com/JabRef/jabref/issues/3590)
12 changes: 12 additions & 0 deletions build.gradle.kts
@@ -51,6 +51,18 @@ requirementTracing {
"jabsrv/src/test/java"
)
)

filteredArtifactTypes =
listOf(
"impl",
"utest",
"model",
"guard",
"pp",
"feat",
"req"
)

// TODO: Short Tag Importer: https://github.com/itsallcode/openfasttrace-gradle#configuring-the-short-tag-importer
}

90 changes: 81 additions & 9 deletions docs/code-howtos/ai.md
@@ -4,29 +4,101 @@ parent: Code Howtos

# AI

The AI feature of JabRef is built on [LangChain4j](https://github.com/langchain4j/langchain4j) and [Deep Java Library](https://djl.ai/).
JabRef has the following AI features:

- Chatting with entries
- Chatting with groups
- Summarization of entries
- Parsing of plain citations using LLMs
- Extracting the "References" section from PDFs with the help of LLMs

The features are built on [LangChain4j](https://github.com/langchain4j/langchain4j) and [Deep Java Library](https://djl.ai/).

## Architectural Decisions

See [ADR-0037](../decisions/0037-rag-architecture-implementation.md) for the decision regarding the RAG infrastructure.
See [ADR-0037](./../decisions/0037-rag-architecture-implementation.md) for the decision regarding the RAG infrastructure.

[ADR-0032](./../decisions/0032-store-chats-in-local-user-folder.md) and [ADR-0033](./../decisions/0033-store-chats-in-mvstore.md) are particularly important, because they explain the decisions regarding the storage of AI artifacts (summaries, chat histories, embeddings, etc.).

## Requirements

See [the requirements page of AI features](./../requirements/ai.md).

## Features

### Feature "Chat with PDF(s)"

## Feature "Chat with PDF(s)"
The interface with all of its features (chat history, regeneration, follow-up questions, etc.) is implemented in the class [org.jabref.gui.ai.chat.AiChatView]. From there, one will find preferences and other required infrastructure.

This is implemented mainly in the class [org.jabref.logic.ai.chatting.AiChatLogic].
From there, one will find preferences and other required infrastructure.
The RAG entry point is located in [org.jabref.logic.ai.chatting.tasks.GenerateRagResponseTask].

## Feature "Summarize PDF(s)"
### Feature "Summarize PDF(s)"

This is implemented in the class [org.jabref.logic.ai.summarization.GenerateSummaryTask].
This is implemented in the class [org.jabref.logic.ai.summarization.tasks.GenerateSummaryTask].

## Feature "BibTeX from Reference Text"
### Feature "BibTeX from Reference Text"

The general interface is [org.jabref.logic.importer.plaincitation.PlainCitationParser].
The class implementing it using AI is [org.jabref.logic.importer.plaincitation.LlmPlainCitationParser].

## Feature "Reference Extractor"
### Feature "Reference Extractor"

Extracts the list of references (Section ["References"](../glossary/references.md)) from the last page of the PDF to a List of BibEntry.

The general interface is [org.jabref.logic.importer.fileformat.pdf.BibliographyFromPdfImporter].
The class implementing it using AI is [org.jabref.logic.importer.plaincitation.LlmPlainCitationParser].

## Code organization

Like every JabRef feature, AI is divided into 3 layers: GUI, logic, and model. Inside the `logic` package, the AI code is split by feature (each feature has its own package).

The GUI code strictly follows the [MVVM pattern](./javafx.md). However, the GUI code is somewhat complicated because:

1. Most of the core GUI components (chat and summary components) are designed as state machines. Typical states include: loading, presenting the result, error, etc.
2. These core GUI components are also designed so that they can be rebound to another `BibEntry`. For details, take a look at the section [How to add a new AI feature](#how-to-add-a-new-ai-feature).

## Internal model (v2)

There are 4 core models in the AI features:

1. Chat history.
2. Summaries.
3. Embeddings.
4. Fully ingested documents.

The code strictly follows the repository pattern: an interface abstracts access to the internal storage. At the time of writing, all of these models are stored in an [`MVStore`](https://www.h2database.com/html/mvstore.html). For the details of this decision, take a look at [ADR-0033](./../decisions/0033-store-chats-in-mvstore.md). A helper class, `MVStoreBase`, makes it possible to fall back to an in-memory `MVStore` if an error occurs while opening the on-disk storage.

A note on embeddings: the embeddings storage also implements the LangChain4j interface for embeddings so that it can be used in LangChain4j algorithms. Additionally, there is a "fully ingested" repository, which simply contains a "list" of files that were fully ingested. This helps to check whether a file needs to be ingested, as there is no one-to-one correspondence between embeddings and files (the relation is many-to-one).
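
The role of the "fully ingested" repository can be illustrated with a minimal sketch. All names below (`FullyIngestedRepositorySketch`, `InMemoryFullyIngestedRepository`, `IngestionSketch`) are hypothetical and do not mirror JabRef's actual classes; the in-memory set stands in for an `MVStore`-backed implementation, and SHA-256 is only an assumed choice of file hash.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.HexFormat;
import java.util.Set;

// Hypothetical repository interface: hides where the "list" is stored.
interface FullyIngestedRepositorySketch {
    void markAsFullyIngested(String fileHash);

    boolean isFullyIngested(String fileHash);
}

// In-memory stand-in for an MVStore-backed implementation.
final class InMemoryFullyIngestedRepository implements FullyIngestedRepositorySketch {
    private final Set<String> hashes = new HashSet<>();

    @Override
    public void markAsFullyIngested(String fileHash) {
        hashes.add(fileHash);
    }

    @Override
    public boolean isFullyIngested(String fileHash) {
        return hashes.contains(fileHash);
    }
}

final class IngestionSketch {
    // Assumption: the file is identified by the SHA-256 hash of its content.
    static String hash(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(digest.digest(content));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Returns true if the file was ingested by this call, false if it was skipped.
    static boolean ingestIfNeeded(byte[] content, FullyIngestedRepositorySketch repository) {
        String fileHash = hash(content);
        if (repository.isFullyIngested(fileHash)) {
            return false;
        }
        // Placeholder: split into chunks, embed every chunk, store the embeddings.
        // Only after ALL chunks are stored is the file marked as fully ingested.
        repository.markAsFullyIngested(fileHash);
        return true;
    }
}
```

Marking the file only after the last chunk is stored means an interrupted ingestion is simply repeated on the next attempt.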

Because JabRef is not built around one global database but is rather a `.bib` file editor, the problem of identifying a `BibEntry` arose. It was solved in a somewhat complicated way:

- To uniquely identify a library, an "AI library ID" was introduced (as a metadata field), which is a randomly generated identifier (a CUID2, see [ADR-0059](./../decisions/0059-use-cuid2-for-ai-library-id.md)). An alternative would be to use the library path, but if the library moves, the path changes, whereas the AI library ID does not.
- In order to uniquely identify an entry, the citation key is used, but only if it is non-empty and unique.
- In some cases (which potentially arise often), the conditions above are not met (for example, a library is not saved and therefore has no path, or an entry does not have a citation key), yet the user is actively working on an entry. For these cases, the AI features have an *in-memory cache layer*: whenever a chat or a summary is created for an entry, it is first stored in the in-memory layer. The cache is flushed to the on-disk storage when JabRef is closed.
- In order to uniquely identify a file, we use the file hash. An alternative would be to use the file path, but the file could be moved, or defined by a relative path. This is also useful when several libraries cite the same paper, and instead of ingesting
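
The identification rules and the flush of the in-memory cache can be sketched as follows. This is only an illustration under stated assumptions: `EntrySketch` and `SummaryCacheSketch` are hypothetical stand-ins (not JabRef classes), the `libraryId/citationKey` key format is invented for the example, and a plain `Map` plays the role of the on-disk repository.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a BibEntry together with its library's metadata.
record EntrySketch(Optional<String> aiLibraryId, Optional<String> citationKey, boolean citationKeyUnique) {
    // A persistent key exists only if the library has an AI library ID
    // and the citation key is non-empty and unique.
    Optional<String> persistentKey() {
        if (aiLibraryId.isEmpty() || citationKey.isEmpty()
                || citationKey.get().isBlank() || !citationKeyUnique) {
            return Optional.empty();
        }
        // Assumed key format, chosen for this example only.
        return Optional.of(aiLibraryId.get() + "/" + citationKey.get());
    }
}

final class SummaryCacheSketch {
    // Keyed by object identity so that entries without a citation key stay distinct.
    private final Map<EntrySketch, String> inMemory = new IdentityHashMap<>();

    void put(EntrySketch entry, String summary) {
        inMemory.put(entry, summary);
    }

    Optional<String> get(EntrySketch entry) {
        return Optional.ofNullable(inMemory.get(entry));
    }

    // Called on close: writes only the entries whose preconditions hold.
    // Returns the number of summaries written to the repository stand-in.
    int flushTo(Map<String, String> repositoryStandIn) {
        int written = 0;
        for (Map.Entry<EntrySketch, String> cached : inMemory.entrySet()) {
            Optional<String> key = cached.getKey().persistentKey();
            if (key.isPresent()) {
                repositoryStandIn.put(key.get(), cached.getValue());
                written++;
            }
        }
        return written;
    }
}
```

With this split, an entry in an unsaved library can still be summarized during the session; its summary is simply not persisted unless the preconditions hold at flush time.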

## [OLD] Internal model (v1)

The model v1 differs from v2 by:

1. Fields of the chat messages and summaries were differently organized in the `MVStore`.
2. A `LinkedFile#getLink()` was used to identify a file.

To migrate from v1 to v2, the classes `ChatHistoryMigrationV1` and `SummariesMigrationV2` were made.

## How to add a new AI feature

This section describes the standard pattern used for AI features. A new feature should follow a similar plan:

1. Define the model of the artifact of your feature (for example, for summarization it is an AI summary, for chatting they are chat messages and chat history).
2. Define a repository interface (e.g. `SummaryRepository`, `ChatHistoryRepository`) and implement an `MVStore` implementation using the [org.jabref.logic.ai.util.MVStoreBase].
3. Define a logic class in the `logic` package: either a task (e.g. `GenerateSummaryTask`) or a utility class for performing an AI feature. It is recommended to make it free of side effects (it does not change or write anything in the system). First, this helps with testing the class, and, second, the storage is typically handled in the *in-memory cache* layer, which is discussed next.
4. Make an in-memory cache layer for your feature that holds a RAM map between a `BibEntry` (or a group, or some other object that your artifact is linked to) and your model. Sometimes this can be omitted (for example, embeddings do not have an in-memory cache and always use a repository), but generally it is made so that the AI feature is always accessible even if some precondition is not satisfied (for example, storing chat history and summaries requires a database path and a non-empty unique citation key, but the in-memory layer allows working with them regardless). When JabRef (or a library) is closed, the in-memory cache layer checks the preconditions and only then writes the data to the repository.
5. Make a `TaskAggregator` class. This is needed in order to be able to switch a component between entries and to deduplicate tasks. Whenever you want to generate the artifact of your feature, always go through the `TaskAggregator` class, which will either create a new task or give you an already running one. The `TaskAggregator` also connects the results to the in-memory cache.
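
The deduplication performed by a `TaskAggregator` can be sketched as below. This is a minimal illustration, not JabRef's implementation: the class name `TaskAggregatorSketch`, the use of `CompletableFuture` instead of JabRef's background tasks, and the `generator` function are all assumptions made for the example.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// K identifies the entry (or group), V is the generated artifact (e.g. a summary).
final class TaskAggregatorSketch<K, V> {
    private final Map<K, CompletableFuture<V>> running = new ConcurrentHashMap<>();
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    // Side-effect-free generation logic; must not return null in this sketch.
    private final Function<K, V> generator;

    TaskAggregatorSketch(Function<K, V> generator) {
        this.generator = generator;
    }

    // Returns the already running task for the key, or starts a new one.
    CompletableFuture<V> start(K key) {
        CompletableFuture<V> fresh = new CompletableFuture<>();
        CompletableFuture<V> existing = running.putIfAbsent(key, fresh);
        if (existing != null) {
            return existing; // deduplicated: someone already asked for this artifact
        }
        CompletableFuture.supplyAsync(() -> generator.apply(key))
                .whenComplete((result, error) -> {
                    running.remove(key, fresh);
                    if (error != null) {
                        fresh.completeExceptionally(error);
                    } else {
                        // Connect the result to the in-memory cache before
                        // signalling completion to the callers.
                        cache.put(key, result);
                        fresh.complete(result);
                    }
                });
        return fresh;
    }

    Optional<V> cached(K key) {
        return Optional.ofNullable(cache.get(key));
    }
}
```

Because callers receive the same future for the same key, a component that is rebound to another entry and back can reattach to the running task instead of starting a second generation.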

The next points concern the GUI of the feature:

1. Design a component using the MVVM pattern. You need to write the interface in the FXML, then write a controller `Ai<Feature>View` and a view-model `Ai<Feature>ViewModel`.
2. A typical AI component will be a state machine: first and foremost, check if the AI features are enabled in JabRef (which is equivalent to the user having accepted the privacy policy of the AI features). If not, you must ensure that your component does nothing. To show the privacy policy banner, there is a dedicated component [org.jabref.gui.ai.AiPrivacyNoticeView]. The next states typically involve checking some preconditions (for example, you cannot summarize an entry if it does not have linked files), and the final one is the working state. You might find [org.jabref.gui.util.BindingsHelper#bindEnum] useful.
3. The entry editor tabs are designed to be switchable (rebound to some other `BibEntry`), so you can have an `entryProperty`, and whenever it changes, the state machine of the component is rerun.
4. When you read an artifact for an entry (or a group, or other entity that is linked to your AI feature), the look-up should be made in 3 steps: look into the repository, look into the in-memory cache, and only then contact the `TaskAggregator` to start a new generation task.
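
The state machine from point 2 might be resolved along these lines. The enum values and the `SummaryStateSketch` class are hypothetical names chosen for illustration; a real component would derive the inputs from preferences and from its `entryProperty`, and would rerun the resolution whenever one of them changes.

```java
import java.util.Optional;

// Hypothetical states of a summary component.
enum SummaryComponentState {
    PRIVACY_NOTICE,   // AI features are not enabled: show the notice, do nothing else
    NO_LINKED_FILES,  // precondition failed: nothing to summarize
    LOADING,          // preconditions hold, artifact is being generated
    SHOWING_RESULT    // working state: an artifact is available
}

final class SummaryStateSketch {
    // Checks are ordered exactly as the states are described:
    // privacy policy first, then preconditions, then the working states.
    static SummaryComponentState resolve(boolean aiEnabled,
                                         int linkedFileCount,
                                         Optional<String> existingSummary) {
        if (!aiEnabled) {
            return SummaryComponentState.PRIVACY_NOTICE;
        }
        if (linkedFileCount == 0) {
            return SummaryComponentState.NO_LINKED_FILES;
        }
        return existingSummary.isPresent()
                ? SummaryComponentState.SHOWING_RESULT
                : SummaryComponentState.LOADING;
    }
}
```

Keeping the resolution in one pure function makes the rebind case cheap: switching to another entry only means calling it again with the new entry's inputs.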
6 changes: 6 additions & 0 deletions docs/decisions/0033-store-chats-in-mvstore.md
@@ -4,6 +4,8 @@ parent: Decision Records
---

# Store Chats in MVStore
<!-- dsn->req~ai.summarization.general.storage~1 -->
<!-- dsn->req~ai.chat.entries.history-storage~1 -->

## Context and Problem Statement

@@ -51,3 +53,7 @@ Chosen option: "MVStore", because it is simple and memory-efficient.
* Good, because we have the full control
* Bad, because involves writing our own language and parser
* Bad, because we need to implement optimizations found in databases on our own (storing some data in RAM, other on disk)

## More information

For the same reasons, summaries are also stored in MVStore.
@@ -3,7 +3,7 @@ nav_order: 0036
parent: Decision Records
---

# Use `TextArea` for Chat Message Content
# Use Markdown rendering for Chat Message Content

## Context and Problem Statement

@@ -25,10 +25,8 @@ This decision record concerns the UI component that is used for rendering the co

## Decision Outcome

Chosen option: "Use `TextArea`".
All other options require more time to implement.
Some of the options do not support text selection and copying,
which for now we value more than Markdown rendering.
Chosen option: (modified) "Use a Markdown parser and convert AST nodes to JavaFX TextFlow elements".
JabRef has a component, `SelectableTextFlow`, that allows creating formatted text that can be selected. This makes it possible to use a Markdown parser that converts the content into JavaFX nodes while keeping text selection.

## Pros and Cons of the Options

65 changes: 65 additions & 0 deletions docs/decisions/0058-use-djl-for-embeddings.md
@@ -0,0 +1,65 @@
---
nav_order: 0058
parent: Decision Records
---
# Use Deep Java Library for embeddings in AI features

<!-- dsn->feat~ai.answer-engines.embeddings-search~1 -->

## Context and Problem Statement

JabRef needs to use embedding models to perform Retrieval-Augmented Generation (RAG) by generating embeddings for chunks of papers.

The Java AI ecosystem is not as diverse as the Python AI ecosystem, so the choice must be careful to ensure stability and ease of use for end users.

Which library to choose?

## Decision Drivers

* The library should not require additional setup from the user side
* It should be cross-platform
* It should support a wide variety of model architectures
* It should have an easy-to-use API
* The requests that the library makes should be known and controlled
* We should know how and where the library downloads and stores models

## Considered Options

* LangChain4j
* ONNX Runtime
* Deep Java Library (DJL)
* DeepLearning4j

## Decision Outcome

Chosen option: "Deep Java Library (DJL)", because it satisfies all our requirements for an all-in-one solution that handles model management and inference.

However, users have reported problems with the PyTorch engine integration and unstable behavior. Moreover, its API is a bit complex.

### Consequences

* Good, because it has an API to show available models
* Good, because it handles model downloading automatically
* Neutral, because the API is complex
* Bad, because users have reported problems with the PyTorch engine integration and unstable behavior

## Pros and Cons of the Options

### LangChain4j

* Good, because it offers a high-level abstraction for LLM workflows
* Neutral, because it actually wraps other libraries like DJL or ONNX Runtime for the embeddings
* Bad, because it is a general LLM framework

### ONNX Runtime

* Good, because it is fast and efficient
* Bad, because it is a low-level inference engine and does not provide model management or downloading features out of the box
* Bad, because it supplies all binaries for different platforms at once and also supplies debugging symbols, which makes it larger than necessary (see [this issue in LangChain4j repository](https://github.com/langchain4j/langchain4j/issues/1492) and [this issue in ONNX repository](https://github.com/langchain4j/langchain4j/issues/1492))

### Deep Java Library (DJL)

* Good, because it supports multiple engines including PyTorch and ONNX
* Good, because it has a built-in model zoo for downloading models
* Neutral, because its API is a bit complex
* Bad, because of reported stability issues with certain engines
89 changes: 89 additions & 0 deletions docs/decisions/0059-use-cuid2-for-ai-library-id.md
@@ -0,0 +1,89 @@
---
nav_order: 0059
parent: Decision Records
---
# Use CUID2 for `aiLibraryId`

## Context and Problem Statement

JabRef stores an `aiLibraryId` in the library's metadata to associate AI artifacts (chat history, summaries, embeddings) with a specific `.bib` library across launches.
The id is serialized into the `.bib` file as `@Comment{jabref-meta: aiLibraryId:<id>;}` and is therefore visible to anyone who opens the file in a text editor.
Carrying the id inside the file content (rather than keying off the file path) is what lets AI artifacts stay correlated with the library even when the user renames or moves the `.bib` file.
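
The metadata line quoted above can be sketched as a small round trip. This is only an illustration: the class `AiLibraryIdComment` and its regular expression are hypothetical, and JabRef's actual metadata parser is more general than this.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AiLibraryIdComment {
    // Hypothetical pattern for exactly the comment form quoted in this ADR.
    private static final Pattern META =
            Pattern.compile("@Comment\\{jabref-meta: aiLibraryId:([^;}]+);\\}");

    /** Renders the id as it would appear inside the .bib file. */
    static String format(String id) {
        return "@Comment{jabref-meta: aiLibraryId:" + id + ";}";
    }

    /** Extracts the id from the file content, independent of the file's path or name. */
    static Optional<String> parse(String bibContent) {
        Matcher matcher = META.matcher(bibContent);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String bib = "@Article{key, title = {T}}\n\n" + format("tz4a98xxat96iws9zmbrgj3a") + "\n";
        System.out.println(parse(bib).orElse("<none>")); // prints tz4a98xxat96iws9zmbrgj3a
    }
}
```

Because the lookup reads only the file content, renaming or moving the file does not affect the result.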

Because `.bib` files are routinely shared between researchers (e.g., via Git, email, cloud drives, supplementary material of papers), the id ends up in human-facing contexts.
A v4 UUID such as `550e8400-e29b-41d4-a716-446655440000` looks alarming or "machine-y" to a researcher who is just inspecting their references file.

What identifier scheme should we use for `aiLibraryId`?

## Decision Drivers

* The id must be globally unique with negligible collision probability (multiple researchers can independently create libraries; ids must not clash when libraries are merged).
* The id must be stable across JabRef launches and cross-platform.
* The id should look reasonably unobtrusive when a researcher reads the `.bib` file in a text editor: BibTeX files are shared, and the id should not startle or confuse a reader.
* The id should be generated locally without contacting a server (consistent with [ADR-0034](0034-use-citation-key-for-grouping-chat-messages.md): no server is available).
* Prefer a modern, actively maintained scheme.

## Considered Options

* `UUID.randomUUID()` (RFC 4122 v4 UUID).
* [CUID2](https://github.com/paralleldrive/cuid2).
* Short hash of the file path / first entry.

## Decision Outcome

Chosen option: **CUID2**, because it offers collision resistance comparable to a v4 UUID while producing a shorter, lowercase, alphanumeric string that is far less jarring inside a shared `.bib` file.
The Java port `io.github.thibaultmeyer:cuid` is on the dependency graph, and its v2.x line implements the CUID2 specification.

`AiService.ensureAiLibraryIdPresent` generates the id via the CUID2 generator.
The id remains an opaque `String` from the rest of the code's perspective, so no API changes propagate beyond that call site.
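
A minimal sketch of the "generate once, then reuse" behaviour is shown here. It assumes a simplified `MetaData` stand-in and takes the generator as a `Supplier<String>`; both are hypothetical and not the real JabRef signatures. The `placeholderId` method only mimics the shape of a CUID2 (a letter followed by 23 base-36 characters). It is not the CUID2 algorithm; the real code would supply the generator from the `io.github.thibaultmeyer:cuid` library instead.

```java
import java.security.SecureRandom;
import java.util.Optional;
import java.util.function.Supplier;

class AiLibraryIdSketch {
    /** Hypothetical stand-in for the library metadata; only the id field matters here. */
    static final class MetaData {
        private String aiLibraryId;

        Optional<String> getAiLibraryId() {
            return Optional.ofNullable(aiLibraryId);
        }

        void setAiLibraryId(String id) {
            this.aiLibraryId = id;
        }
    }

    /** Generates an id only when none is stored, so an existing id is never overwritten. */
    static String ensureAiLibraryIdPresent(MetaData metaData, Supplier<String> idGenerator) {
        return metaData.getAiLibraryId().orElseGet(() -> {
            String id = idGenerator.get();
            metaData.setAiLibraryId(id);
            return id;
        });
    }

    /** Placeholder with a CUID2-like shape. NOT the CUID2 algorithm. */
    static String placeholderId() {
        SecureRandom random = new SecureRandom();
        String alphabet = "0123456789abcdefghijklmnopqrstuvwxyz";
        StringBuilder sb = new StringBuilder(24);
        sb.append((char) ('a' + random.nextInt(26))); // first character is always a letter
        for (int i = 1; i < 24; i++) {
            sb.append(alphabet.charAt(random.nextInt(36)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        MetaData metaData = new MetaData();
        String first = ensureAiLibraryIdPresent(metaData, AiLibraryIdSketch::placeholderId);
        String second = ensureAiLibraryIdPresent(metaData, AiLibraryIdSketch::placeholderId);
        System.out.println(first.equals(second)); // prints true: the second call reuses the stored id
    }
}
```

Keeping the generator behind a single call site is what allows the scheme to change without touching code that merely passes the id around as a `String`.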

### Consequences

* Good, because the id is shorter (~24 chars instead of 36) and lowercase alphanumeric, which reads better in a shared `.bib` file.
* Good, because CUID2 is explicitly designed to be collision-resistant for horizontally-distributed generation, which matches our case (every JabRef install generates ids independently).
* Good, because CUID2 is, by design, hard to guess — slightly better than v4 UUIDs against fingerprinting if an id ever leaks into a URL or log.
* Bad, because we carry an extra (if small) dependency, whereas `UUID` is built into the JDK.
* Bad, because CUID2 is less universally recognized than UUID — a developer encountering one for the first time may need a moment to identify the format.

### Confirmation

The serialization round-trip tests (`BibDatabaseWriterTest.writeAiLibraryId`, `MetaDataParser`) treat the value as an opaque string and pass with a CUID2 value.
A code review of `AiService.ensureAiLibraryIdPresent` confirms the CUID2 generator is the only source of new ids.

## Pros and Cons of the Options

### `UUID.randomUUID()`

Example: `550e8400-e29b-41d4-a716-446655440000`.

* Good, because it is built into the JDK — no extra dependency.
* Good, because it is universally recognized.
* Neutral, because collision probability is negligible (122 random bits).
* Bad, because the canonical form (`8-4-4-4-12` hex with hyphens) is long and visually noisy in a `.bib` file shared with researchers.
* Bad, because it conveys a "this is a generated machine token" feeling that is at odds with the otherwise human-readable nature of `.bib` files.
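
To put "negligible" into numbers, the standard birthday-bound approximation p ≈ n(n-1) / (2 · 2^bits) can be evaluated for 122 random bits. The class name and the sample size of one billion ids are illustrative assumptions, not figures from JabRef.

```java
class CollisionEstimate {
    /** Birthday-bound approximation of the collision probability; valid while the result is small. */
    static double collisionProbability(double idCount, int randomBits) {
        return idCount * (idCount - 1) / (2.0 * Math.pow(2.0, randomBits));
    }

    public static void main(String[] args) {
        // One billion independently generated v4 UUIDs (122 random bits each).
        System.out.printf("%.2e%n", collisionProbability(1e9, 122));
    }
}
```

Even for a billion ids the estimate is on the order of 10^-19, which is why collision probability is listed as neutral rather than as a concern.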

### CUID2

Example: `tz4a98xxat96iws9zmbrgj3a`.

Java port used: [thibaultmeyer/cuid-java](https://github.com/thibaultmeyer/cuid-java).

* Good, because the textual form is shorter and lowercase alphanumeric, blending in with other identifiers researchers already see (citation keys, DOIs).
* Good, because the spec is explicit about collision resistance under distributed generation.
* Good, because it is a modern, actively maintained scheme (the original CUID has been deprecated in favor of CUID2).
* Good, because the library is already used in indexing and OpenOffice integration.
* Bad, because it is one more dependency to track.
* Bad, because it is slightly less familiar to developers than UUID.

### Short hash of the file path / first entry

Example: `a3f1c9d2` (CRC32 / truncated SHA-1 of the absolute path).

* Good, because it is deterministic: with a content-based hash (first entry), moving a `.bib` file would not orphan its AI artifacts.
* Bad, because it is not unique: two libraries can share a citation key, and file paths change.
* Bad, because if a user copies a library, both copies would point at the same AI artifacts — exactly what `aiLibraryId` is meant to prevent.
* Bad, because the id would change if the underlying input changes, breaking the stability requirement.
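
The stability and copy problems listed for this option can be illustrated with a short sketch. It assumes CRC32 over the absolute path string; the class name and the example paths are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class PathHashSketch {
    /** Derives an 8-hex-digit id from the absolute path using CRC32. */
    static String shortHash(String absolutePath) {
        CRC32 crc = new CRC32();
        crc.update(absolutePath.getBytes(StandardCharsets.UTF_8));
        return String.format("%08x", crc.getValue());
    }

    public static void main(String[] args) {
        String original = shortHash("/home/alice/refs.bib");
        String moved = shortHash("/home/alice/papers/refs.bib");
        // Same path always yields the same id; a moved file almost certainly gets a different one.
        System.out.println(original + " vs " + moved);
    }
}
```

A hash over the first entry has the opposite weakness: it survives a move, but a copied library hashes to the same value as the original, so both copies would share AI artifacts.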

## More Information

Implementation site: `AiService.ensureAiLibraryIdPresent` in `jablib/src/main/java/org/jabref/logic/ai/AiService.java`.