feat: add toolset plugin #13444

Closed
AlinsRan wants to merge 12 commits into master from feat/toolset-plugin
Conversation

Contributor

@AlinsRan AlinsRan commented May 26, 2026

Motivation

Diagnosing performance issues or memory leaks in a production APISIX deployment is difficult today:

  • Enabling verbose logging requires restarting APISIX and affects all traffic.
  • Adding observability requires route-level plugin changes that persist in etcd.
  • There is no lightweight way to temporarily instrument request phases or monitor internal Lua state without side effects.

This PR introduces the toolset plugin — a low-overhead diagnostics framework that can be toggled at runtime without restarting APISIX or touching any route configuration.

Design

The toolset plugin acts as a container for lightweight sub-plugins. Sub-plugins are not configured via the APISIX Admin API; instead they are configured by editing a single Lua file (apisix/plugins/toolset/config.lua) on disk. The plugin polls this file every second and hot-reloads sub-plugins when a change is detected.

Key design choices:

  • No per-route schema: the plugin always operates at global scope, so it instruments every request without touching routes or services.
  • File-based configuration: avoids etcd round-trips and lets operators apply changes by editing a single file (or with a configuration management tool). Because the file is polled once per second, changes are picked up within about one second.
  • Dynamic load/unload: sub-plugins are loaded or unloaded at runtime; only sub-plugins with non-empty configuration are active.
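
The poll-and-reload loop described above can be sketched in plain Lua. This is a minimal illustration, not the plugin's code: the names `read_file` and `reload_if_changed` are hypothetical, and detecting a change by comparing file content is an assumption (the PR does not say how changes are detected). In APISIX the function would be driven by a recurring timer rather than called by hand.

```lua
-- Lua 5.1/LuaJIT expose loadstring; Lua 5.2+ folded it into load.
local compile = loadstring or load

-- Read a whole file into a string, or return nil plus an error message.
local function read_file(path)
    local f, err = io.open(path, "r")
    if not f then
        return nil, err
    end
    local content = f:read("*a")
    f:close()
    return content
end

-- state.last_content remembers what was loaded on the previous poll
-- (hypothetical field names). Returns true only when a new, valid
-- configuration table was loaded into state.config.
local function reload_if_changed(path, state)
    local content, err = read_file(path)
    if not content then
        return false, err
    end
    if content == state.last_content then
        return false                    -- unchanged since the last poll
    end

    local chunk, compile_err = compile(content, "@" .. path)
    if not chunk then
        return false, compile_err       -- syntax error: keep the old config
    end
    local ok, conf = pcall(chunk)
    if not ok or type(conf) ~= "table" then
        return false, "config file must return a table"
    end

    state.last_content = content
    state.config = conf
    return true
end
```

A broken edit leaves `state.config` untouched, so a syntax error in the file would not tear down sub-plugins that are already running under this sketch.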

Sub-plugins

trace

Instruments all APISIX request phases (access, balancer, upstream, header_filter, body_filter, log) and emits a formatted timing table to the error log for sampled requests that match the configured filters.

Features:

  • Configurable sampling rate (N out of 100 requests)
  • Host and path allowlist filtering with glob pattern support
  • Recognises common trace headers (x-request-id, traceparent, sw8, x-b3-traceid) and attaches their values to the log line
  • Optional UUID generation when no trace header is present (gen_uid)
  • Minimum total-duration threshold to suppress fast requests from the log
  • Capture of additional nginx/APISIX variables alongside the timing table

Attributes (trace):

| Name | Type | Default | Description |
|------|------|---------|-------------|
| rate | integer | 1 | Sampling rate as N-out-of-100. 1 = 1%; 100 = every request. |
| hosts | array | [] | Allowlist of Host header values (glob). Empty = all hosts. |
| paths | array | [] | Allowlist of request URI patterns (glob). Empty = all paths. |
| gen_uid | boolean | false | Generate a UUID trace ID when no standard trace header is found. |
| vars | array | [] | Extra nginx/APISIX variables to prepend to the trace output. |
| timespan_threshold | number | 0 | Minimum total request duration (seconds) before emitting the log. |
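
To make the glob semantics of hosts and paths concrete, here is a small sketch that converts a glob into an anchored Lua pattern, escaping pattern metacharacters before substituting `*`. It assumes plain Lua patterns and the hypothetical helper names `glob_to_pattern` and `matches_any`; the actual plugin may well use a regex engine instead.

```lua
-- Convert a glob such as "*.example.com" into an anchored Lua pattern.
local function glob_to_pattern(glob)
    -- Escape every Lua-pattern magic character except '*'.
    local escaped = glob:gsub("[%^%$%(%)%%%.%[%]%+%-%?]", "%%%0")
    -- Then turn the glob wildcard into "any run of characters".
    escaped = escaped:gsub("%*", ".*")
    return "^" .. escaped .. "$"
end

-- An empty allowlist matches everything, mirroring the table above.
local function matches_any(value, globs)
    if #globs == 0 then
        return true
    end
    for _, glob in ipairs(globs) do
        if value:match(glob_to_pattern(glob)) then
            return true
        end
    end
    return false
end
```

Escaping first matters: without it, the `.` in `*.example.com` would match any character, so `apiXexampleXcom` would slip through the allowlist.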

table_count

Periodically measures and logs the entry count of specified Lua module tables. Useful for detecting table leaks or monitoring memory growth (e.g. router or plugin state that is expected to be stable).

Attributes (table_count):

| Name | Type | Default | Description |
|------|------|---------|-------------|
| lua_modules | array | | Lua module paths to measure (e.g. ["apisix.router"]). |
| interval | integer | 5 | Seconds between measurements. |
| depth | integer | 10 | Recursion depth when counting sub-tables. |
| scopes | array | ["worker","privileged agent"] | APISIX process types in which this sub-plugin runs. |
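
One way to read the depth attribute is as a limit on how far the counter descends into nested tables. The sketch below is an assumption about the counting rule, not the plugin's implementation: `count_entries` is a hypothetical name, every key at a visited level counts as one entry, and a `seen` set prevents circular references from being counted twice or looping forever.

```lua
-- Count entries in tbl, descending into sub-tables up to max_depth levels.
local function count_entries(tbl, max_depth, seen, depth)
    seen = seen or {}
    depth = depth or 1
    if seen[tbl] then
        return 0                 -- already visited: circular or shared table
    end
    seen[tbl] = true

    local count = 0
    for _, value in pairs(tbl) do
        count = count + 1
        if type(value) == "table" and depth < max_depth then
            count = count + count_entries(value, max_depth, seen, depth + 1)
        end
    end
    return count
end
```

With this rule, a module table that holds a reference to itself is still counted in finite time, which is the case the circular-reference fixtures in the test suite appear to target.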

Enable Plugin

Add toolset to the plugins list in config.yaml:

plugins:
  - toolset

Then edit apisix/plugins/toolset/config.lua to activate the desired sub-plugins. The default file ships with all sub-plugins disabled. Example full configuration:

return {
    trace = {
        rate = 10,              -- sample 10% of requests
        hosts = { "*.example.com" },
        paths = { "/api/*" },
        gen_uid = true,
        vars = { "remote_addr", "upstream_addr" },
        timespan_threshold = 0.5  -- only log requests slower than 500ms
    },
    table_count = {
        lua_modules = { "apisix.router" },
        interval = 30,
        depth = 5,
        scopes = { "worker" }
    }
}

Changes take effect within one second — no APISIX restart required.

To disable a sub-plugin at runtime, remove its key from the returned table (for table_count, setting lua_modules to {} also works) and save the file.

Example Usage

1. Tracing slow requests across a specific host

Edit apisix/plugins/toolset/config.lua:

return {
    trace = {
        rate   = 100,
        hosts  = { "api.example.com" },
        timespan_threshold = 1.0   -- only requests slower than 1s
    }
}

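When a matching request exceeds 1 second, APISIX writes a timing table to the error log at WARN level. The uid and remote_addr fields in this sample output assume that a trace ID is available (from a trace header or gen_uid = true) and that vars includes "remote_addr"; the configuration above sets neither: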

[toolset/trace] uid=7f3a1c2d-... remote_addr=10.0.0.1
+----------+---------------------------+----------+-------------------------+
| Role     | Phase                     | Timespan | Start time              |
+----------+---------------------------+----------+-------------------------+
| APISIX   | access                    | 3ms      | 2024-01-01 12:00:00.123 |
| APISIX   | _match_route              | 1ms      | 2024-01-01 12:00:00.124 |
| APISIX   | balancer                  | 1ms      | 2024-01-01 12:00:00.125 |
| Upstream | upstream (req + response) | 1200ms   | 2024-01-01 12:00:00.126 |
| APISIX   | header_filter             | 0ms      | 2024-01-01 12:00:01.326 |
| APISIX   | body_filter               | 0ms      | 2024-01-01 12:00:01.326 |
| Client   | response                  | 1ms      | 2024-01-01 12:00:01.327 |
| APISIX   | log                       | 0ms      | 2024-01-01 12:00:01.328 |
+----------+---------------------------+----------+-------------------------+

2. Sampling traffic with path and trace-ID correlation

return {
    trace = {
        rate    = 5,               -- 5% of requests
        paths   = { "/api/v1/*" },
        gen_uid = true,            -- attach generated UID for tracing
        vars    = { "remote_addr" }
    }
}

The gen_uid option ensures every sampled request has a traceable ID even when the client does not send a trace header, making it easy to correlate the log entry with downstream access logs.

3. Monitoring Lua table growth

return {
    table_count = {
        lua_modules = { "apisix.router", "apisix.core.config_etcd" },
        interval    = 60,
        depth       = 5,
        scopes      = { "worker" }
    }
}

Every 60 seconds, each worker logs:

package apisix.router table count is: 1482 for loaded: 1
package apisix.core.config_etcd table count is: 347 for loaded: 1

A count that keeps growing from one interval to the next may indicate a table leak in that module.

Changes

| File | Description |
|------|-------------|
| apisix/plugins/toolset/init.lua | Plugin entry point; 1-second timer for config hot-reload |
| apisix/plugins/toolset/config.lua | Default (disabled) sub-plugin configuration |
| apisix/plugins/toolset/src/trace.lua | trace sub-plugin implementation |
| apisix/plugins/toolset/src/table-count/init.lua | table_count sub-plugin implementation |
| conf/config.yaml.example | Added toolset entry (commented out, disabled by default) |
| docs/en/latest/plugins/toolset.md | English documentation |
| docs/zh/latest/plugins/toolset.md | Chinese documentation |
| t/plugin/toolset.t / t/plugin/trace*.t / t/plugin/table-count.t | Test suite |

The toolset plugin is a diagnostics and observability framework that
hosts multiple lightweight sub-plugins, each independently configured
via plugin_attr and dynamically loaded/unloaded at runtime.

Sub-plugins included:
- trace: instruments APISIX request phases and logs a timing table for
  sampled requests, supports host/path filtering, sampling rate,
  trace header detection, and minimum timespan threshold
- table_count: periodically measures and logs the entry count of
  specified Lua module tables, useful for monitoring memory growth

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
dosubot (bot) added the size:XL, doc, enhancement, and plugin labels on May 26, 2026
AlinsRan and others added 2 commits May 27, 2026 04:15
The timer returned by ngx.timer.at() was not checked, causing silent
failure if the timer could not be created (e.g. resource exhaustion).
Apply the same error-checking pattern used in the sync() function.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- t/plugin/toolset.t: sync loop and config reload tests
- t/plugin/trace.t: phase timing, rate sampling, threshold tests
- t/plugin/trace.host.t: host glob pattern matching tests
- t/plugin/trace.path.t: path glob pattern matching tests
- t/plugin/trace.headers.t: trace header detection and vars tests
- t/plugin/trace.dns.t: DNS resolve phase timing test
- t/plugin/table-count.t: Lua module table counting tests
- t/table-count-example.lua: test fixture for table-count tests
- docs/zh/latest/plugins/toolset.md: Chinese documentation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
dosubot (bot) added the size:XXL label and removed the size:XL label on May 27, 2026
…dition

- Add Apache 2.0 license headers to all plugin Lua files and test files
- Add toolset plugin install entries to Makefile so luarocks installs
  all plugin files including the src/ and src/table-count/ subdirectories
- Fix race condition in sync(): check stop_timer at function entry to
  prevent a scheduled sync() from re-initializing sub-plugins after
  destroy() has already cleared the cache and set stop_timer = true

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI left a comment

Pull request overview

This PR introduces a new global toolset plugin intended as a diagnostics/observability host that can dynamically load/unload lightweight sub-plugins (notably trace and table_count), along with tests, docs, and packaging/config updates.

Changes:

  • Added toolset plugin framework with sub-plugins for request phase timing logs (trace) and periodic Lua table size counting (table_count).
  • Added t::APISIX tests for trace, table_count, and toolset reload/sync behavior.
  • Updated defaults/docs/navigation and installation packaging to include the new plugin and its configuration examples.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 15 comments.

| File | Description |
|------|-------------|
| t/table-count-example.lua | Test helper module used by table_count tests (tables, circular refs, deep nesting). |
| t/plugin/trace.t | End-to-end tests for trace table output, sampling, and reload behavior. |
| t/plugin/trace.path.t | Tests glob path matching behavior for trace allowlist. |
| t/plugin/trace.host.t | Tests host matching behavior for trace allowlist. |
| t/plugin/trace.headers.t | Tests trace header detection, var capture, and UUID generation. |
| t/plugin/trace.dns.t | Tests DNS resolve phase instrumentation logging. |
| t/plugin/toolset.t | Tests toolset sync loop and reload behavior with config changes. |
| t/plugin/table-count.t | Tests periodic module table counting, circular detection, and depth enforcement. |
| Makefile | Installs toolset plugin Lua files into the runtime Lua dir. |
| docs/zh/latest/plugins/toolset.md | Chinese documentation for toolset plugin configuration and examples. |
| docs/zh/latest/config.json | Adds toolset doc entry to the Chinese docs sidebar/nav. |
| docs/en/latest/plugins/toolset.md | English documentation for toolset plugin configuration and examples. |
| docs/en/latest/config.json | Adds toolset doc entry to the English docs sidebar/nav. |
| conf/config.yaml.example | Adds toolset to plugin list (commented) and documents plugin_attr.toolset examples. |
| apisix/plugins/toolset/src/trace.lua | Implements trace sub-plugin by instrumenting APISIX phases and logging a timing table. |
| apisix/plugins/toolset/src/table-count/init.lua | Implements periodic Lua module table counting with depth/cycle handling and scope filtering. |
| apisix/plugins/toolset/init.lua | Toolset entry point: periodic sync loop and dynamic load/unload of sub-plugins. |
| apisix/plugins/toolset/config.lua | Default sub-plugin configuration values (currently loaded via require). |
| apisix/cli/config.lua | Adds default plugin_attr.toolset configuration in CLI defaults. |

Comment thread apisix/plugins/toolset/src/trace.lua Outdated
Comment thread apisix/plugins/toolset/src/trace.lua Outdated
Comment thread apisix/plugins/toolset/src/trace.lua
Comment thread apisix/plugins/toolset/src/trace.lua Outdated
Comment thread apisix/plugins/toolset/src/trace.lua
Comment thread t/plugin/trace.t Outdated
Comment thread docs/en/latest/plugins/toolset.md
Comment thread docs/zh/latest/plugins/toolset.md
Comment thread t/table-count-example.lua Outdated
Comment thread t/plugin/trace.t Outdated
AlinsRan and others added 8 commits May 27, 2026 14:41
…ondition

plugins/reload uses events:post() which is asynchronous - it returns 200
before destroy() is called. Without a sleep, the /hello request can arrive
before trace.destroy() restores the phase hooks, causing spurious trace: logs.

Use ngx.sleep(2) consistent with other reload tests in the test suite.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- trace: escape regex metacharacters in glob patterns before * substitution
- trace: replace unique_random() pool-depletion logic with math.random(100)
- trace: fix package.loaded reset from false to nil to allow require() reload
- trace: return match_route result to preserve router semantics
- trace: restore dns.resolve in destroy() to prevent stacking on reload
- toolset: change sync log from info to debug to avoid log flooding
- table-count: reset stop=false in init() so plugin can be re-enabled
- table-count: fix depth default from 1 to 10 to match config/docs
- test: fix undefined 'message' variable in trace.t TEST 3
- test: remove dead read-then-overwrite pattern in trace.t TEST 1
- test: fix comment typo in table-count-example.lua
- docs: add canonical link to EN/ZH plugin documentation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…lid plugin_attr

The toolset plugin reads configuration from apisix/plugins/toolset/config.lua,
not from plugin_attr in config.yaml. The plugin_attr entries added to
cli/config.lua and config.yaml.example were never read by any code.

- Remove plugin_attr.toolset from cli/config.lua
- Remove plugin_attr.toolset example from conf/config.yaml.example
- Rewrite EN/ZH docs to document config.lua as the configuration method
- Restore sync log from debug to info (required by toolset.t TEST 1)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ua load failure

Statements after 'return' in Lua 5.1 are a parse error, causing the entire
trace.lua module to fail loading and all trace tests to fail. Fix by capturing
return values with table.pack, measuring timing, then returning with unpack.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
luacheck with std=ngx_lua does not recognize table.pack/table.unpack.
Replace with a single return value capture, which is correct since
router.match in APISIX http router does not return meaningful values
(the caller in init.lua ignores the return value entirely).
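
The resulting pattern can be sketched as follows. In Lua, `return` must be the last statement of its block, so a wrapper that wants to measure elapsed time has to capture the result first, record the timing, and only then return. The names `wrap_phase`, `record`, and `now` are hypothetical; a real APISIX wrapper would presumably read the clock from the ngx API rather than `os.clock`.

```lua
-- Wrap fn so that each call reports its elapsed time through record().
-- `now` is injectable to keep the sketch testable; it defaults to os.clock.
local function wrap_phase(name, fn, record, now)
    now = now or os.clock
    return function(...)
        local start = now()
        local result = fn(...)          -- capture a single return value
        record(name, now() - start)     -- timing happens before the return
        return result
    end
end
```

Capturing only one value silently drops any additional return values, which is acceptable here only because, as the commit message notes, the caller ignores the result of router.match.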

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…uard

- Restore pool-based unique_random() with repopulation when pool is
  exhausted, ensuring exactly 'rate' out of every 100 requests are
  sampled (required by TEST 14 deterministic assertion)
- Add trace_active flag set true in init() and false in destroy() so
  that the access_phase wrapper skips sampling after plugin is removed,
  preventing stale trace logs from appearing in TEST 3
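
A pool-based sampler of the kind described here might look like the sketch below. It draws the numbers 1 to 100 without replacement and samples a request when the drawn number is at most `rate`, so each full cycle of 100 draws samples exactly `rate` requests. Only the name `unique_random` comes from the commit message; `new_sampler` and the swap-remove detail are assumptions.

```lua
-- Build an independent sampler with its own pool of 1..100.
local function new_sampler()
    local pool = {}

    -- Draw one number from the pool without replacement,
    -- repopulating the pool once it is exhausted.
    local function unique_random()
        if #pool == 0 then
            for i = 1, 100 do
                pool[i] = i
            end
        end
        local idx = math.random(#pool)
        local value = pool[idx]
        pool[idx] = pool[#pool]   -- move the last element into the gap
        pool[#pool] = nil         -- and shrink the pool by one
        return value
    end

    -- Exactly `rate` of every 100 consecutive draws return true.
    return function(rate)
        return unique_random() <= rate
    end
end
```

Compared with calling `math.random(100)` per request, this trades a little per-worker state for a deterministic hit count, which is what a test asserting an exact number of sampled requests needs.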

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
After destroy() clears the cache and sets stop_timer=true, the
pending sync timer still fires and sees an empty cache, triggering
re-init of all sub-plugins. Guard against this by returning at the
start of sync() when stop_timer is already true.

EE already had this guard; this adds it to the OSS backport.
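
The guard can be illustrated without any nginx timers. In this sketch `new_toolset`, `load_config`, and `init_sub_plugin` are hypothetical stand-ins; only `sync`, `destroy`, `stop_timer`, and the cache come from the commit message.

```lua
-- Minimal model of the sync/destroy interaction.
local function new_toolset(load_config, init_sub_plugin)
    local self = { stop_timer = false, cache = {} }

    -- Called by the recurring timer. Returns false when it refuses to run.
    function self.sync()
        if self.stop_timer then
            return false          -- destroy() already ran: do not re-init
        end
        for name, conf in pairs(load_config()) do
            if not self.cache[name] then
                init_sub_plugin(name, conf)
                self.cache[name] = conf
            end
        end
        return true
    end

    function self.destroy()
        self.cache = {}
        self.stop_timer = true
    end

    return self
end
```

Without the early return, a timer that was scheduled before `destroy()` would find the cache empty and initialize every sub-plugin again.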

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…worker race

External httpc:request_uri() can be dispatched to a different nginx worker
that hasn't yet processed the plugin reload event, causing trace: to still
appear in logs. ngx.location.capture() is always handled by the same worker
as the /t handler, which has already called destroy() after the 2-second
sleep, guaranteeing trace_active=false at request time.
@AlinsRan AlinsRan closed this Jun 4, 2026
Labels

doc, enhancement, plugin, size:XXL

2 participants