diff --git a/.agents/threat_model/THREAT_MODEL.md b/.agents/threat_model/THREAT_MODEL.md
new file mode 100644
index 0000000000000..6c4aeb4dde521
--- /dev/null
+++ b/.agents/threat_model/THREAT_MODEL.md
@@ -0,0 +1,121 @@
+# Threat Model: Envoy Proxy
+
+## 1. System context
+
+Envoy is a CNCF-graduated L3/L4 and L7 proxy written in modern C++ (with growing Rust components), built with Bazel, and deployed as an **edge/front proxy** terminating untrusted Internet traffic and as a **service-mesh sidecar** alongside every workload in a microservices fleet (Istio, Cilium, AWS App Mesh, Consul Connect). It speaks HTTP/1.1, HTTP/2, HTTP/3-QUIC, gRPC, **MCP (Model Context Protocol), A2A (Agent-to-Agent)**, and a long tail of L7 protocols (Redis, Thrift, DNS, …) via a filter-chain extension model. The MCP filters (`mcp`, `mcp_router`, `mcp_json_rest_bridge`, `mcp_multicluster`) make Envoy an AI-agent-protocol gateway: aggregating tool lists from multiple MCP backends and routing `tools/call` requests. Configuration is delivered statically via bootstrap YAML or dynamically via the gRPC/REST **xDS** APIs. The `mobile/` subtree ships Envoy as a client-side networking library for iOS/Android.
+
+This model covers the **upstream envoyproxy/envoy project** as shipped: core, Envoy Mobile, the CI/release pipeline, and, from the alpha tier, the MCP and A2A extensions only. `contrib/`, all other alpha-status extensions, and Windows-specific code paths are out of scope (§5).
+
+The project publishes its own threat model (`docs/root/intro/arch_overview/security/threat_model.rst`), which this document incorporates with **one owner-directed divergence**: ext_authz and ext_proc side-call responses are treated as **untrusted** here (upstream currently declares side-calls trusted). Other side-calls (rate-limit, ALS, tracers, credential suppliers) remain trusted. Upstream's other assumptions are adopted: downstream and upstream peers are untrusted for core; xDS transport is trusted; Lua/Wasm/dynamic-module code is trusted; the admin interface is operator-only; DoS triggers the security process only at ≥100× amplification or query-of-death on hardened components.
+
+The codebase is ~3k C++ source files plus ~113 vendored dependencies (BoringSSL, nghttp2, quiche, c-ares, V8, LuaJIT, wasmtime, RE2, abseil, …). It is continuously fuzzed via OSS-Fuzz and has shipped more than 80 first-party CVEs/GHSAs since 2019, dominated by HTTP/2 frame-handling DoS, HTTP filter-chain lifecycle use-after-free, and authn/authz filter bypass.
+
+## 2. Assets
+
+| asset | description | sensitivity |
+|---|---|---|
+| Process & host integrity | Native C++ parsing untrusted wire bytes; memory corruption → RCE in a process with network reach into the entire mesh/edge | critical |
+| TLS private keys & session tickets | Server/client cert private keys and TLS session-ticket keys held in-process by SDS/secret manager; leak = passive decrypt + edge impersonation | critical |
+| Proxied request/response data | Decrypted bodies, headers, cookies, bearer tokens for every request after TLS termination | critical |
+| Service availability | Envoy is the front door (edge) and the only data-path hop (mesh); a parser crash or OOM is a full outage | critical |
+| Mobile app process & on-device user data | Envoy Mobile runs inside iOS/Android apps; compromise = app-privilege code exec and access to on-device user data | critical |
+| Internal upstream services | Route table + cluster manager give Envoy authenticated reach to every backend, metadata server, and control plane (SSRF pivot) | high |
+| xDS configuration integrity | Routes, listeners, clusters, RBAC, filter bytecode delivered via xDS; integrity loss = arbitrary traffic redirection | high |
+| mTLS workload identity | SPIFFE SVID validation result; downstream RBAC and upstreams trust it | high |
+| AuthN/AuthZ verdicts | jwt_authn claims, ext_authz allow/deny, RBAC decisions propagated as trusted `x-envoy-*` / dynamic metadata | high |
+| Injected upstream credentials | OAuth2 client_secret + HMAC key, AWS SigV4 keys, credential-injector secrets attached to outbound requests | high |
+| Agent tool-call & capability integrity | MCP/A2A `tools/call` payloads, capability negotiations, and aggregated tool lists that downstream agents act on; tampering = arbitrary tool execution | high |
+| Release artifact integrity | `docker.io/envoyproxy/*` images, signed deb/rpm/tarballs, Maven Central / PyPI envoy-mobile packages consumed by the entire ecosystem | high |
+| CI/release secrets | GPG maintainer signing key, DockerHub creds, GCP SA keys, Sonatype creds, multiple GitHub App private keys with multi-repo write | high |
+| Access logs & telemetry | Request lines, headers, JWT subjects, client IPs to file/gRPC/OTel sinks; PII-bearing and an integrity target (log injection) | medium |
+| Internal trust headers | `x-envoy-*`, XFF, `x-request-id` that upstreams treat as authoritative; failure to strip = trust-boundary bypass | medium |
+
+## 3. Entry points & trust boundaries
+
+| entry_point | description | trust_boundary | reachable_assets |
+|---|---|---|---|
+| Downstream TCP listener | Socket accept → listener-filter chain → network-filter `onData`; first touch of untrusted bytes (`source/common/listener_manager/active_tcp_listener.cc`) | untrusted Internet client → worker thread | Process & host integrity, Service availability, Proxied request/response data |
+| Downstream UDP/QUIC ingest | UDP `recvmmsg` → quiche dispatcher; pre-handshake, source-spoofable (`source/common/quic/active_quic_listener.cc`) | spoofable Internet UDP → process memory | Process & host integrity, Service availability |
+| HTTP/1 codec | Balsa / http-parser tokenises request-line, headers, chunked TE (`source/common/http/http1/codec_impl.cc`) | untrusted downstream bytes → HCM | Process & host integrity, Proxied request/response data, AuthN/AuthZ verdicts, Internal trust headers |
+| HTTP/2 codec | nghttp2/oghttp2 HPACK decode, stream mux, flood limits, METADATA (`source/common/http/http2/codec_impl.cc`) | untrusted downstream bytes → HCM | Process & host integrity, Service availability, Proxied request/response data |
+| HTTP/3/QUIC codec | quiche QPACK + H3 frame parsing, CONNECT-UDP datagrams (`source/common/quic/envoy_quic_server_stream.cc`) | untrusted downstream bytes → HCM | Process & host integrity, Service availability |
+| Listener filters (TLS-inspector / PROXY-protocol) | Hand-rolled parsers peek raw bytes pre-filter-chain to extract SNI/ALPN or PROXY v1/v2 TLVs (`source/extensions/filters/listener/{tls_inspector,proxy_protocol}/`) | untrusted downstream bytes → filter-chain selection | Process & host integrity, Internal trust headers, AuthN/AuthZ verdicts |
+| TLS handshake & cert validation | BoringSSL handshake; client-cert chain verify, SAN/SPKI match, OCSP ASN.1 parse (`source/common/tls/cert_validator/`) | untrusted peer X.509 → trust decision | mTLS workload identity, TLS private keys & session tickets, Process & host integrity |
+| Header & path normalization | Header storage/validation, `:path` canonicalisation, `%2F`/slash-merge (`source/common/http/path_utility.cc`, `header_utility.cc`) | attacker-controlled headers → security decisions | AuthN/AuthZ verdicts, Internal upstream services, Internal trust headers |
+| Route matching & regex | VirtualHost/route eval; tenant-supplied RE2 patterns evaluated on attacker input (`source/common/router/config_impl.cc`, `source/common/common/regex.cc`) | untrusted headers × semi-trusted config → routing/auth | Internal upstream services, AuthN/AuthZ verdicts, Service availability |
+| Non-HTTP L7 codecs (core) | Hand-written wire decoders: Redis, Thrift, Dubbo, Mongo, ZooKeeper (`source/extensions/filters/network/*/decoder*.cc`) | untrusted downstream bytes → proxy logic | Process & host integrity, Service availability |
+| DNS UDP filter (server) | Envoy-as-DNS-server parses arbitrary queries (`source/extensions/filters/udp/dns_filter/dns_parser.cc`) | spoofable UDP → DNS parser | Process & host integrity, Service availability |
+| MCP / A2A agent-protocol filters | JSON-RPC request body parsing + multi-backend response aggregation, session routing (`source/extensions/filters/http/{mcp,mcp_router,a2a}/`, `mcp_json_parser.cc` 23K, `mcp_router.cc` 60K) | untrusted downstream agent ↔ untrusted upstream MCP/A2A server | Process & host integrity, Agent tool-call & capability integrity, Internal upstream services, Service availability |
+| xDS control-plane ingestion | gRPC/REST/filesystem `DiscoveryResponse` → proto unpack → live reconfig (`source/extensions/config_subscription/grpc/grpc_mux_impl.cc`) | management server (trusted transport, semi-trusted payload) → data-plane config | xDS configuration integrity, Service availability, Process & host integrity |
+| Upstream response path | Envoy-as-client parses untrusted upstream HTTP/1/2/3 responses (`source/common/router/upstream_request.cc`, `source/common/http/codec_client.cc`) | malicious/compromised upstream → process memory & downstream | Process & host integrity, Proxied request/response data, Service availability |
+| ext_proc / ext_authz responses | Side-call gRPC returns header/body mutations or allow/deny that Envoy applies verbatim (`source/extensions/filters/http/ext_proc/mutation_utils.cc`, `source/extensions/filters/common/ext_authz/`) | **untrusted** side-call service → request mutation & filter-chain state | AuthN/AuthZ verdicts, Proxied request/response data, Internal trust headers, Process & host integrity |
+| Decompression & transcoding | gzip/brotli/zstd inflate of attacker bodies; gRPC↔JSON transcoder (`source/extensions/compression/*/decompressor/`, `source/extensions/filters/http/grpc_json_transcoder/`) | untrusted body bytes → heap | Service availability, Process & host integrity |
+| JWT / OAuth2 authn filters | Parse attacker-supplied JWT (b64/JSON/sig), OAuth2 redirect/state/cookies, fetch JWKS (`source/extensions/filters/http/{jwt_authn,oauth2}/`) | untrusted credential bytes → auth decision | AuthN/AuthZ verdicts, Injected upstream credentials, Internal upstream services |
+| Access-log formatter | `%REQ()%` / `%RESP()%` / `%CEL()%` interpolate attacker headers into log sinks (`source/common/formatter/substitution_formatter.cc`) | untrusted headers → log sinks / SIEM | Access logs & telemetry |
+| DNS resolver (client) | c-ares / getaddrinfo / Apple / hickory parse responses from upstream resolver (`source/extensions/network/dns_resolver/cares/dns_impl.cc`) | untrusted DNS responder → cluster endpoints | Process & host integrity, Internal upstream services |
+| Envoy Mobile upstream ingest | Envoy Mobile (iOS/Android client library) parses responses from arbitrary servers; trust model inverted: server is the attacker (`mobile/library/common/`) | malicious origin server → mobile app process | Mobile app process & on-device user data, Proxied request/response data |
+| Bazel dependency fetch | ~113 `http_archive` deps fetched at build, SHA256-pinned (`bazel/repository_locations.bzl`) | upstream maintainer / mirror → compiled binary | Release artifact integrity, Process & host integrity |
+| GitHub Actions CI pipeline | `pull_request_target` → `workflow_run` chain executes PR code; `trusted` flag computed in `_load.yml` gates secrets (`.github/workflows/_run.yml`) | external contributor → CI runner with org secrets | CI/release secrets, Release artifact integrity |
+| Release artifact publish | Docker push (no cosign, `--provenance=false`), GPG-signed tarballs, Sonatype/PyPI upload; PyPI action branch-pinned (`distribution/docker/build.sh`, `.github/workflows/mobile-release.yml`) | CI runner → public registries | Release artifact integrity |
+
+## 4. Threats
+
+| id | threat | actor | surface | asset | impact | likelihood | status | controls | evidence |
+|---|---|---|---|---|---|---|---|---|---|
+| T1 | Memory corruption (use-after-free) leading to RCE via stream-reset / async-callback ordering races in the HTTP filter chain | remote_unauth | HTTP/1 codec, HTTP/2 codec, HTTP/3/QUIC codec, Upstream response path | Process & host integrity | critical | almost_certain | partially_mitigated | OSS-Fuzz, ASAN CI, deferred-deletion idiom | CVE-2026-26311, CVE-2026-26330, CVE-2024-45810, CVE-2023-35943, CVE-2023-35942, CVE-2022-29227, 7be853f757 |
+| T2 | Authentication / authorization bypass via auth-filter logic or header-handling flaws | remote_unauth | JWT / OAuth2 authn filters, Header & path normalization, ext_proc / ext_authz responses, Listener filters (TLS-inspector / PROXY-protocol) | AuthN/AuthZ verdicts, Internal upstream services | critical | almost_certain | partially_mitigated | header sanitization in HCM, RBAC filter | CVE-2022-29226, CVE-2023-35941, CVE-2026-26308, CVE-2020-25017, CVE-2024-45806, CVE-2025-55162, CVE-2024-23322 |
+| T3 | Memory corruption leading to RCE via hand-written non-HTTP L7 protocol parsers | remote_unauth | Non-HTTP L7 codecs (core), DNS UDP filter (server), Listener filters (TLS-inspector / PROXY-protocol) | Process & host integrity | critical | likely | partially_mitigated | per-extension `security_posture` tag; OSS-Fuzz on subset | CVE-2024-23322, CVE-2024-23323, CVE-2024-23324, CVE-2024-23325, CVE-2026-26310, CVE-2019-18801 |
+| T4 | Memory corruption or crash via crafted HTTP/3 / QUIC stream-state transitions | remote_unauth | Downstream UDP/QUIC ingest, HTTP/3/QUIC codec | Process & host integrity, Service availability | critical | likely | partially_mitigated | quiche owned/fuzzed by Google; QUIC behind feature flag in many deployments | CVE-2024-32974, CVE-2024-32976, CVE-2024-34362 |
+| T29 | Agent tool-call tampering, session confusion, or memory corruption via MCP/A2A JSON-RPC parsing and router response construction | remote_unauth | MCP / A2A agent-protocol filters | Process & host integrity, Agent tool-call & capability integrity, Internal upstream services | critical | likely | unmitigated | alpha tag only; no fuzz target | |
+| T5 | mTLS / TLS identity spoofing via certificate-validation edge cases | remote_unauth | TLS handshake & cert validation | mTLS workload identity, Internal upstream services | critical | possible | partially_mitigated | BoringSSL; SPIFFE validator; SAN matchers | CVE-2025-66220, CVE-2023-0286 |
+| T26 | Memory corruption or crash via malformed ext_proc / ext_authz gRPC response applied by `mutation_utils` | adjacent_network | ext_proc / ext_authz responses | Process & host integrity | critical | possible | unmitigated | none (path written assuming trusted input; no fuzz target) | |
+| T28 | Mobile-app RCE or data exfiltration via malicious server response to Envoy Mobile client | remote_unauth | Envoy Mobile upstream ingest | Mobile app process & on-device user data | critical | possible | partially_mitigated | shares hardened core codecs; iOS/Android process sandbox | |
+| T8 | CI secret exfiltration and signed-release takeover via `pull_request_target` trust-flag confusion or toolshed-action compromise | supply_chain | GitHub Actions CI pipeline | CI/release secrets, Release artifact integrity | critical | possible | partially_mitigated | actions SHA-pinned (one exception); `external-contributors` env gate; `trusted` flag in `_load.yml` | |
+| T9 | Malicious release binary via EngFlow remote build-cache poisoning | supply_chain | GitHub Actions CI pipeline, Bazel dependency fetch | Release artifact integrity | critical | rare | partially_mitigated | EngFlow tenant ACL (out of tree); Bazel action hashing | |
+| T11 | Request smuggling / cache poisoning via HTTP/1 parser differential between Envoy and upstream | remote_unauth | HTTP/1 codec, Upstream response path, Header & path normalization | Proxied request/response data, AuthN/AuthZ verdicts, Internal upstream services | high | almost_certain | partially_mitigated | Balsa parser; strict header checks (operator-configurable) | CVE-2019-9900, CVE-2019-18802, CVE-2024-45809, CVE-2024-34363, CVE-2023-35944, CVE-2025-64763, CVE-2020-25018 |
+| T10 | Resource exhaustion (CPU/memory) via crafted HTTP/2 frame sequences | remote_unauth | HTTP/2 codec, Downstream TCP listener | Service availability | high | almost_certain | partially_mitigated | `protocol_constraints.cc` flood limits; overload manager (operator-configured, off by default) | CVE-2024-30255, CVE-2023-44487, CVE-2020-11080, CVE-2020-12603, CVE-2020-12604, CVE-2020-12605, CVE-2023-35945, CVE-2026-27135 |
+| T12 | Route / RBAC policy bypass and SSRF to internal services via path-normalization or matcher edge cases | remote_unauth | Header & path normalization, Route matching & regex | Internal upstream services, AuthN/AuthZ verdicts | high | likely | partially_mitigated | `normalize_path`, `merge_slashes`, `path_with_escaped_slashes_action` (operator-configured) | CVE-2019-9901, CVE-2023-27487, CVE-2020-25017 |
+| T13 | Memory / CPU exhaustion via decompression bomb or transcoder amplification | remote_unauth | Decompression & transcoding | Service availability | high | likely | partially_mitigated | per-decompressor output-size limits (added post-CVE); buffer watermarks | CVE-2022-29225, CVE-2024-32475 |
+| T14 | Memory corruption or cluster-endpoint poisoning via malicious DNS responses | adjacent_network | DNS resolver (client) | Process & host integrity, Internal upstream services | high | likely | partially_mitigated | c-ares maintained upstream; hickory (Rust) alternative | GHSA-fg9g-pvc4-776f, CVE-2025-31498, CVE-2023-32067, CVE-2023-31147, GHSA-g9vw-6pvx-7gmw |
+| T15 | Crash, memory corruption, or response smuggling via malicious upstream responses | remote_auth | Upstream response path | Process & host integrity, Proxied request/response data, Service availability | high | likely | partially_mitigated | core declared robust-to-untrusted-upstream; same codec hardening as downstream | CVE-2022-29224, CVE-2023-35945, CVE-2024-45809, CVE-2024-34364 |
+| T18 | Request/response tampering and auth-verdict forgery via untrusted ext_proc / ext_authz side-call | adjacent_network | ext_proc / ext_authz responses | Proxied request/response data, AuthN/AuthZ verdicts, Internal trust headers | high | likely | partially_mitigated | `mutation_rules` allowlist (opt-in); `allowed_headers`/`disallowed_headers` on ext_authz (opt-in) | |
+| T27 | Resource exhaustion via unbounded ext_proc body mutation, header-set size, or stream hold-open | adjacent_network | ext_proc / ext_authz responses | Service availability | high | likely | partially_mitigated | `message_timeout`; per-message size limits (partial) | |
+| T16 | Fleet-wide crash or resource exhaustion via malformed / pathological xDS configuration | insider | xDS control-plane ingestion, Route matching & regex | Service availability, xDS configuration integrity | high | likely | partially_mitigated | PGV proto validation; config-rejection stats; RE2 (linear-time) for regex | a20c0ab8dd, 2bcebccc0d, CVE-2019-15225, CVE-2019-15226 |
+| T17 | Downstream consumer compromise via tampered OCI image (no signature / no SLSA provenance) | supply_chain | Release artifact publish | Release artifact integrity | high | possible | unmitigated | GPG on tarballs only; `--sbom=false --provenance=false` set; no cosign | |
+| T19 | Traffic redirection or filter-chain injection via compromised xDS management server | insider | xDS control-plane ingestion | xDS configuration integrity, Proxied request/response data, Internal upstream services | high | possible | risk_accepted | upstream threat model declares xDS transport trusted; mTLS to control plane | |
+| T20 | Build-time code injection via compromised upstream dependency tarball | supply_chain | Bazel dependency fetch | Release artifact integrity, Process & host integrity | high | rare | mitigated | all 113 deps SHA256-pinned; `tools/dependency/` validators; CPE tracking | |
+| T21 | Log injection / SIEM poisoning via unescaped attacker-controlled fields in access-log output | remote_unauth | Access-log formatter | Access logs & telemetry | medium | likely | partially_mitigated | `json_format` escaping; `JsonEscaper` (itself patched for OOB) | CVE-2024-45808, CVE-2026-26309 |
+| T22 | CPU exhaustion via tenant-supplied regex evaluated against attacker input (multi-tenant xDS) | remote_unauth | Route matching & regex, xDS control-plane ingestion | Service availability | medium | possible | partially_mitigated | RE2 default (linear); program-size limit; std::regex removed | CVE-2019-15225 |
+| T25 | Reflection / amplification abuse of Envoy QUIC or DNS UDP listeners against third parties | remote_unauth | Downstream UDP/QUIC ingest, DNS UDP filter (server) | Service availability | low | possible | partially_mitigated | QUIC Retry / amplification limit in quiche; DNS filter response-size cap | |
+
+## 5. Deprioritized
+
+| threat | reason |
+|---|---|
+| Admin HTTP interface: process control / info disclosure | Admin endpoint is operator-trusted per upstream threat model; network exposure is operator deployment misconfiguration, not an Envoy bug. Surface removed from §3. |
+| Wasm / Lua / dynamic-module host ABI: sandbox escape | Module code is trusted per upstream threat model; sandbox is explicitly not a security boundary. Runtime CVEs (CVE-2023-26489, CVE-2023-27477, CVE-2024-25176/77/78, CVE-2025-53901) are tracked as dependency hygiene, not modeled threats. Surface removed from §3. |
+| CLI / env: privilege via process environment | Process spawner (orchestrator/operator) is trusted; no privilege boundary crossed. Surface removed from §3. |
+| Hot-restart UDS: FD/state theft by same-netns process | Same-network-namespace local process is trusted; netns isolation is the boundary, not the abstract socket. Surface removed from §3. |
+| Bootstrap & file-based config: arbitrary file read via config paths | Config author is trusted per upstream threat model. Surface removed from §3. |
+| `contrib/` and alpha-status extensions (except MCP and A2A) | Outside upstream security-team coverage per `EXTENSION_POLICY.md`; owner confirms out of scope for this model. Includes Postgres/MySQL/Kafka/SIP/RocketMQ/Go-filter parsers. Surface removed from §3. |
+| Windows-specific code paths | Out of scope for this model per owner. |
+| Repudiation of proxied requests | Envoy is not the system of record for non-repudiation; access logs are an asset (§2), but cryptographic non-repudiation is out of scope. |
+| Side-channel / timing attacks on TLS private keys | BoringSSL's responsibility; constant-time guarantees inherited. |
+| DoS below 100× amplification on hardened components | Upstream explicitly does not treat sub-threshold DoS as a security issue; operators must configure overload manager. |
+
+## 6. Recommended mitigations
+
+| mitigation | threat_ids | closes_class | effort |
+|---|---|---|---|
+| Adopt a uniform deferred-stream-destruction / weak-ref idiom across all async filter callbacks; add a clang-tidy check for raw `this` capture in posted callbacks | T1 | partial | L |
+| Strict-by-default HCM header validation (reject duplicate auth-relevant headers, reject CL+TE conflict, reject embedded NUL/CR/LF) independent of UHV | T2, T11 | partial | M |
+| Make `normalize_path` / `merge_slashes` / `path_with_escaped_slashes_action: REJECT_REQUEST` the safe default | T12 | partial | S |
+| Require `robust_to_untrusted_downstream` posture + a fuzz target before any L7 codec graduates from alpha | T3, T29 | partial | M |
+| Ship overload-manager and HTTP/2 flood limits as an on-by-default safe baseline rather than operator opt-in | T10, T13, T22 | partial | M |
+| Make ext_proc `mutation_rules` deny-by-default; add a fuzz target for the `ProcessingResponse` / `CheckResponse` apply path | T18, T26, T27 | partial | M |
+| Replace `StrCat` JSON construction in `mcp_router` with a structured serializer; add fuzz targets for `mcp_json_parser` / `a2a_json_parser` and the backend-response merge path | T29 | partial | M |
+| Migrate default DNS resolver from c-ares to hickory (Rust) or getaddrinfo-behind-local-stub | T14 | partial | M |
+| Sign OCI images with cosign and emit SLSA provenance; SHA-pin `pypa/gh-action-pypi-publish`; move GCP auth to WIF/OIDC | T8, T17 | partial | S |
+| Isolate release builds from PR-triggered cache writes (separate EngFlow cache namespace or `--noremote_upload_local_results` for untrusted) | T9 | yes | S |
+| Hard-cap decompressed output as a ratio of input across all decompressors and the gRPC-JSON transcoder | T13 | yes | S |
+| Escape/validate all formatter-expanded fields against the sink's grammar (newline-strip for text, full JSON escape for `json_format`) | T21 | yes | S |
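The T13 mitigation above ("hard-cap decompressed output as a ratio of input") can be illustrated with a minimal sketch. This is not Envoy code: Envoy's decompressors are C++ extensions, and `bounded_inflate`, `DecompressionLimitExceeded`, `max_ratio`, and `min_budget` are hypothetical names chosen for illustration, as are the default values. The sketch uses Python's standard `zlib` module only to show the shape of the check: derive an output budget from the input size and stop inflating as soon as the budget is crossed.

```python
import zlib


class DecompressionLimitExceeded(Exception):
    """Raised when inflated output would exceed the ratio-derived budget."""


def bounded_inflate(data: bytes, max_ratio: int = 100, min_budget: int = 1024) -> bytes:
    """Inflate a zlib stream, refusing output larger than the budget.

    The budget is max(min_budget, max_ratio * len(data)). Both parameters are
    illustrative assumptions, not values taken from Envoy.
    """
    budget = max(min_budget, max_ratio * len(data))
    inflater = zlib.decompressobj()
    # Request at most budget + 1 bytes: a single byte over the budget proves
    # the limit was crossed without inflating the remainder of the stream.
    out = inflater.decompress(data, budget + 1)
    if len(out) > budget:
        raise DecompressionLimitExceeded(
            f"output exceeds {budget} bytes for {len(data)} input bytes"
        )
    if not inflater.eof:
        # The input ended before the stream's end marker was seen.
        raise ValueError("truncated or incomplete zlib stream")
    return out
```

A small payload passes through unchanged, while a highly compressible payload (for example, ten megabytes of zero bytes) is rejected after at most `budget + 1` bytes have been produced, so memory use is bounded by the input size rather than by the attacker's chosen expansion.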