fix(agents): require config.yaml before treating per-user agent dir as resolved#3209

Open
IECspace wants to merge 1 commit into bytedance:main from IECspace:fix/resolve-agent-dir-skips-memory-only-dir

@IECspace
Contributor

Background

When a user starts their first chat with a custom agent, the memory writer creates
{base_dir}/users/{uid}/agents/{name}/memory.json to persist agent memory.
At that point the per-user directory exists, but it contains only memory.json:
no config.yaml is written there because the agent itself still lives under
the legacy shared location {base_dir}/agents/{name}/.

resolve_agent_dir currently treats the existence of users/{uid}/agents/{name}/
as sufficient evidence that the user owns the agent:

user_path = paths.user_agent_dir(effective_user, name)
if user_path.exists():
    return user_path

So on the next request from the same user, resolve_agent_dir returns the
memory-only path. The downstream load_agent_config cannot find config.yaml
there and raises:

FileNotFoundError: Agent config not found: .../users/{uid}/agents/{name}/config.yaml
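The sequence above can be replayed in isolation. The following is a minimal sketch, not the DeerFlow code: `resolve_agent_dir_unguarded` and `load_agent_config_stub` are hypothetical stand-ins that take a plain `base_dir` instead of the real `paths` helper, and the directory layout is the one described in this PR.

```python
import tempfile
from pathlib import Path


def resolve_agent_dir_unguarded(base_dir: Path, uid: str, name: str) -> Path:
    """Hypothetical stand-in for the pre-patch logic: directory existence alone wins."""
    user_path = base_dir / "users" / uid / "agents" / name
    if user_path.exists():
        return user_path
    # Assumed fallback: the legacy shared location.
    return base_dir / "agents" / name


def load_agent_config_stub(agent_dir: Path) -> str:
    """Hypothetical loader: reads config.yaml and fails if it is missing."""
    config_file = agent_dir / "config.yaml"
    if not config_file.exists():
        raise FileNotFoundError(f"Agent config not found: {config_file}")
    return config_file.read_text()


def reproduce() -> str:
    """Replay first chat + second request; return the outcome of the second load."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)

        # The agent is installed only in the legacy shared layout.
        legacy = base / "agents" / "shared-agent"
        legacy.mkdir(parents=True)
        (legacy / "config.yaml").write_text("name: shared-agent\n")

        # First request: no per-user dir yet, so the legacy config loads.
        load_agent_config_stub(resolve_agent_dir_unguarded(base, "u1", "shared-agent"))

        # The memory writer lazily creates the per-user dir with memory.json only.
        user_dir = base / "users" / "u1" / "agents" / "shared-agent"
        user_dir.mkdir(parents=True)
        (user_dir / "memory.json").write_text("{}")

        # Second request: the memory-only dir now shadows the legacy agent.
        try:
            load_agent_config_stub(resolve_agent_dir_unguarded(base, "u1", "shared-agent"))
        except FileNotFoundError as exc:
            return type(exc).__name__
        return "loaded"
```

Under these assumptions, `reproduce()` returns `"FileNotFoundError"`: the first load succeeds via the legacy path, and the second fails once `memory.json` has been written.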

User-visible symptom: the custom agent disappears from the assistant menu after
the user's first successful chat with it. The agent only reappears after manual
intervention (deleting memory.json, restarting, or seeding config.yaml).

This was reproduced on a downstream deployment that uses a custom auth
middleware mapping external users to DeerFlow user IDs, but the failure is
intrinsic to upstream resolve_agent_dir and reproducible on plain bytedance/deer-flow
with any custom agent (including those created via the agents API).

Goal

resolve_agent_dir should resolve to the per-user directory only when that
directory actually contains a config.yaml; otherwise it should fall through
to the legacy shared layout (or to the empty per-user write target if neither
exists).

This keeps the existing semantics for:

  • properly-installed per-user agents (per-user dir wins)
  • legacy/pre-migration installs (legacy fallback)
  • brand-new agents (returns per-user write target)

…while fixing the case where the memory subsystem has lazily created a per-user
agent dir but no agent config lives there yet.

Solution

Add an explicit config.yaml existence check to the per-user fast path:

user_path = paths.user_agent_dir(effective_user, name)
if user_path.exists() and (user_path / "config.yaml").exists():
    return user_path

This mirrors the gate that list_custom_agents already applies when scanning
directories (it skips entries without config.yaml). Now resolve_agent_dir is
consistent with that contract.
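For reference, the full three-way resolution order described under Goal can be sketched as follows. This is an illustration under stated assumptions, not the patched function itself: `resolve_agent_dir_sketch` is a hypothetical name, it takes a plain `base_dir` rather than the real `paths` object, and the legacy branch is assumed to check only that the legacy directory exists.

```python
from pathlib import Path


def resolve_agent_dir_sketch(base_dir: Path, uid: str, name: str) -> Path:
    """Hypothetical sketch of the guarded resolution order."""
    user_path = base_dir / "users" / uid / "agents" / name

    # 1. Per-user dir wins only if it actually holds an agent config.
    if user_path.exists() and (user_path / "config.yaml").exists():
        return user_path

    # 2. Assumed legacy fallback: the shared pre-migration layout.
    legacy_path = base_dir / "agents" / name
    if legacy_path.exists():
        return legacy_path

    # 3. Neither exists: return the per-user path as the write target.
    return user_path
```

A memory-only per-user directory fails the check in step 1 and therefore reaches step 2, which is the behavioural change this PR introduces.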

The docstring is updated to spell out why directory existence alone is not
enough, so the next person who reads the code understands the memory-writer
interaction.

Tests

Added a new TestResolveAgentDir class in backend/tests/test_custom_agent.py
covering:

| Test | Behaviour |
| --- | --- |
| test_skips_user_dir_with_memory_only | per-user dir with only memory.json falls through to legacy |
| test_prefers_user_dir_when_config_yaml_exists | per-user dir with valid config.yaml still wins |
| test_returns_user_path_when_neither_exists | empty FS returns per-user write target (unchanged) |
| test_load_agent_config_falls_back_when_user_has_memory_only | end-to-end: load_agent_config succeeds via legacy fallback |
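The shape of the first three tests can be sketched with pytest's `tmp_path` fixture. This is not the code in backend/tests/test_custom_agent.py: `resolve` is a hypothetical stand-in for the patched `resolve_agent_dir`, `_make_agent_dir` is an illustrative helper, and the real tests presumably go through the project's own path configuration and fixtures.

```python
from pathlib import Path


def resolve(base_dir: Path, uid: str, name: str) -> Path:
    """Hypothetical stand-in for the patched resolve_agent_dir."""
    user_path = base_dir / "users" / uid / "agents" / name
    if user_path.exists() and (user_path / "config.yaml").exists():
        return user_path
    legacy_path = base_dir / "agents" / name
    return legacy_path if legacy_path.exists() else user_path


def _make_agent_dir(root: Path, *files: str) -> Path:
    """Create an agent directory containing the given (empty) files."""
    root.mkdir(parents=True, exist_ok=True)
    for filename in files:
        (root / filename).write_text("")
    return root


def test_skips_user_dir_with_memory_only(tmp_path: Path) -> None:
    legacy = _make_agent_dir(tmp_path / "agents" / "shared-agent", "config.yaml")
    _make_agent_dir(tmp_path / "users" / "u1" / "agents" / "shared-agent", "memory.json")
    assert resolve(tmp_path, "u1", "shared-agent") == legacy


def test_prefers_user_dir_when_config_yaml_exists(tmp_path: Path) -> None:
    _make_agent_dir(tmp_path / "agents" / "shared-agent", "config.yaml")
    user = _make_agent_dir(
        tmp_path / "users" / "u1" / "agents" / "shared-agent", "config.yaml", "memory.json"
    )
    assert resolve(tmp_path, "u1", "shared-agent") == user


def test_returns_user_path_when_neither_exists(tmp_path: Path) -> None:
    expected = tmp_path / "users" / "u1" / "agents" / "new-agent"
    assert resolve(tmp_path, "u1", "new-agent") == expected
```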

Test results

Run on Python 3.12.13, pytest 9.0.3, in a uv-managed environment.

With this patch:

tests/test_custom_agent.py::TestResolveAgentDir::test_skips_user_dir_with_memory_only PASSED
tests/test_custom_agent.py::TestResolveAgentDir::test_prefers_user_dir_when_config_yaml_exists PASSED
tests/test_custom_agent.py::TestResolveAgentDir::test_returns_user_path_when_neither_exists PASSED
tests/test_custom_agent.py::TestResolveAgentDir::test_load_agent_config_falls_back_when_user_has_memory_only PASSED
======================== 4 passed in 0.91s ========================

Without this patch (reverting just the 1-line change, tests kept):

FAILED tests/test_custom_agent.py::TestResolveAgentDir::test_skips_user_dir_with_memory_only
FAILED tests/test_custom_agent.py::TestResolveAgentDir::test_load_agent_config_falls_back_when_user_has_memory_only
============================================
E   FileNotFoundError: Agent config not found:
    .../users/test-user-autouse/agents/shared-agent/config.yaml

i.e. the two new tests that target the bug reliably reproduce the original
production symptom, and pass once the guard is added.

No regression in nearby modules:

tests/test_custom_agent.py ........................................... 59 passed
tests/test_paths_user_isolation.py ......................................... 34 passed
tests/test_update_agent_e2e_user_isolation.py ........................... 3 passed
tests/test_update_agent_tool.py .......................................... 16 passed
tests/test_lead_agent_skills.py .......................................... 13 passed

Total: 125 passed across test_custom_agent.py and the four most directly
related test modules.

Lint / format

$ uv run ruff check packages/harness/deerflow/config/agents_config.py tests/test_custom_agent.py
All checks passed!

$ uv run ruff format --check packages/harness/deerflow/config/agents_config.py tests/test_custom_agent.py
2 files already formatted

Compatibility

  • No public API change.
  • No config / docs / migration change required.
  • Behaviour for fully-installed per-user agents and for legacy-only installs is unchanged.
  • The only behavioural change is for the previously-broken case (memory-only user dir),
    which is silently steered back onto the legacy fallback.

…s resolved

`resolve_agent_dir` previously treated any existing
`users/{uid}/agents/{name}/` directory as a valid user-owned agent dir.
However the memory writer creates that directory with just `memory.json`
on the first chat, before any config exists. The next request then
resolves to that dir, `load_agent_config` cannot find `config.yaml`, and
the agent disappears from the assistant menu with
`FileNotFoundError: Agent config not found`.

Guard the per-user fast path with an explicit `config.yaml` existence
check so a memory-only directory falls through to the legacy shared
agents path, matching how `list_custom_agents` already treats
`config.yaml` as the gate.

Adds four regression tests covering:
- memory-only user dir falls through to legacy
- user dir with config.yaml still wins over legacy
- empty filesystem returns per-user write target
- end-to-end load_agent_config succeeds via legacy fallback

Co-authored-by: Cursor <cursoragent@cursor.com>