
fix: preserve custom node/edge attributes during merge operations #2990

Open
abc-lee wants to merge 2 commits into HKUDS:main from abc-lee:fix/merge-preserve-custom-attributes


abc-lee commented Apr 27, 2026

Summary

Fixes a data-loss bug in which custom node/edge attributes are silently discarded during merge operations.

The Bug

_merge_nodes_then_upsert and _merge_edges_then_upsert reconstruct node_data/edge_data from scratch with only 7 hardcoded fields. This silently discards any custom attributes (e.g. brain_meta_*, community_id, or any user-extended fields) added by downstream users.

This is inconsistent with:

  • aedit_entity, which uses the {**node_data, **updated_data} pattern
  • amerge_entities, which explicitly collects all keys via _merge_attributes
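To make the failure mode concrete, here is a minimal sketch of how rebuilding a dict from a fixed field list drops everything else. The field names in `HARDCODED_FIELDS` and the helper `rebuild_from_scratch` are hypothetical stand-ins for illustration, not the actual code in `_merge_nodes_then_upsert`.

```python
# Hypothetical stand-in for the hardcoded field list; the real names may differ.
HARDCODED_FIELDS = ("entity_id", "entity_type", "description", "source_id")


def rebuild_from_scratch(existing: dict, incoming: dict) -> dict:
    """Buggy pattern: build a brand-new dict from a fixed set of keys."""
    return {
        key: incoming.get(key, existing.get(key))
        for key in HARDCODED_FIELDS
    }


existing = {
    "entity_id": "Apple",
    "entity_type": "company",
    "description": "Old description",
    "source_id": "chunk-1",
    "community_id": 7,  # custom attribute added by a downstream user
}
incoming = {"description": "New description", "source_id": "chunk-2"}

rebuilt = rebuild_from_scratch(existing, incoming)
# The custom key never makes it into the rebuilt dict.
lost_keys = sorted(set(existing) - set(rebuilt))
```

In this sketch `lost_keys` contains `community_id`: nothing raises, the attribute is simply absent after the merge.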

The Fix

Instead of building a new dict from scratch:

  1. Start from the existing node/edge data dict
  2. Update with standard fields (source_type, updated_at, chunk_id_list, etc.)
  3. Leave every other key untouched, so custom attributes are preserved

This approach is minimal, targeted, and maintains backwards compatibility.
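The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the code in the diff: `merge_node_data` is a hypothetical helper, and the values used for `source_type`, `updated_at`, and `chunk_id_list` are made up.

```python
from typing import Optional


def merge_node_data(existing: Optional[dict], standard_updates: dict) -> dict:
    """Start from the existing dict, then overwrite only the standard fields."""
    node_data = dict(existing or {})    # 1. copy the existing data (empty for a new node)
    node_data.update(standard_updates)  # 2. apply the standard fields
    return node_data                    # 3. every other key is carried over untouched


existing = {
    "source_type": "document",
    "updated_at": 1700000000,
    "chunk_id_list": ["chunk-1"],
    "brain_meta_owner": "team-a",  # custom attribute (hypothetical name)
}
updates = {
    "updated_at": 1700000500,
    "chunk_id_list": ["chunk-1", "chunk-2"],
}

merged = merge_node_data(existing, updates)
```

Copying before updating keeps the caller's dict unchanged, and a node that does not exist yet (`existing` is `None`) still ends up with exactly the standard fields, which is why the change is backwards compatible in this sketch.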

Testing

  • All existing tests pass
  • The fix has been validated in the niu-agent downstream project

🤖 Generated with Claude Code

李磊 and others added 2 commits April 23, 2026 21:22
_merge_nodes_then_upsert and _merge_edges_then_upsert reconstruct
node_data/edge_data from scratch with only 7 hardcoded fields,
silently discarding any custom attributes added by downstream users.

This is inconsistent with aedit_entity (which uses {**node_data, **updated_data})
and amerge_entities (which collects all keys via _merge_attributes).

Fix: start from existing node/edge dict and update with standard fields,
so custom attributes (e.g. brain_meta_*, community_id) are preserved.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

NetworkX uses Python dict keys for node identity, making node IDs
case-sensitive. This causes duplicate nodes when LLM extraction
returns different casing (e.g., 'Brain:Region:文档库' vs
'brain:region:文档库'). In a knowledge graph, 'Apple' and 'apple'
are semantically the same entity.

Add _normalize_node_id() static method that applies .lower() to all
node IDs before they enter NetworkX. Applied consistently across all
node/edge operations: has_node, get_node, upsert_node, delete_node,
has_edge, get_edge, upsert_edge, remove_edges, and BFS entry point.

This fixes the issue at the storage layer, so all upstream paths
(LLM extraction via operate.py, custom injection via ainsert_custom_kg,
and direct API calls) automatically benefit without any changes.
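
A minimal sketch of the normalization idea described in this commit message, using a plain dict as a stand-in for the NetworkX graph so the example stays self-contained. The class `CaseInsensitiveStore` is hypothetical; only the `_normalize_node_id` name and the `.lower()` behavior come from the commit message.

```python
class CaseInsensitiveStore:
    """Toy dict-backed node store (assumed stand-in for the NetworkX storage)."""

    def __init__(self):
        self._nodes = {}

    @staticmethod
    def _normalize_node_id(node_id: str) -> str:
        # Lowercase every ID before it is used as a dict key.
        return node_id.lower()

    def upsert_node(self, node_id: str, node_data: dict) -> None:
        key = self._normalize_node_id(node_id)
        self._nodes.setdefault(key, {}).update(node_data)

    def has_node(self, node_id: str) -> bool:
        return self._normalize_node_id(node_id) in self._nodes

    def get_node(self, node_id: str):
        return self._nodes.get(self._normalize_node_id(node_id))


store = CaseInsensitiveStore()
store.upsert_node("Brain:Region:文档库", {"entity_type": "region"})
store.upsert_node("brain:region:文档库", {"description": "document library"})
node_count = len(store._nodes)
```

Because both spellings normalize to the same key, the second upsert updates the existing node instead of creating a duplicate. One design consequence worth noting: the original casing of the ID is not kept by this sketch, so a caller that needs it for display would have to store it as an attribute.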