Skip to content

fix(dns): Fix response buffer race condition in concurrent queries#1054

Open
abhik-roy85 wants to merge 3 commits into
espressif:masterfrom
abhik-roy85:component/esp_dns_fix
Open

fix(dns): Fix response buffer race condition in concurrent queries#1054
abhik-roy85 wants to merge 3 commits into
espressif:masterfrom
abhik-roy85:component/esp_dns_fix

Conversation

@abhik-roy85

Copy link
Copy Markdown
Collaborator

Description

Moved response_buffer out of the global handle context (esp_dns_handle) and into automatic/local storage within dns_resolve_doh, dns_resolve_dot, and dns_resolve_tcp. This prevents memory corruption and race conditions when multiple DNS queries are running concurrently.

Also included esp_dns_concurrent_test example to validate concurrent lookup behavior.

Related

Testing


Checklist

Before submitting a Pull Request, please ensure the following:

  • 🚨 This PR does not introduce breaking changes.
  • All CI checks (GH Actions) pass.
  • Documentation is updated as needed.
  • Tests are updated or added as necessary.
  • Code is well-commented, especially in complex areas.
  • Git history is clean — commits are squashed to the minimum necessary.

@abhik-roy85 abhik-roy85 self-assigned this May 4, 2026
@abhik-roy85 abhik-roy85 added the dns label May 4, 2026
@abhik-roy85 abhik-roy85 force-pushed the component/esp_dns_fix branch 2 times, most recently from 7a1adb3 to f2b75d9 Compare May 14, 2026 15:32
@abhik-roy85 abhik-roy85 force-pushed the component/esp_dns_fix branch from f2b75d9 to ec6a7b6 Compare May 15, 2026 10:48

@euripedesrocha euripedesrocha left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a partial review, but there are some discussions that might lead to structural changes.

#define ESP_DNS_BUFFER_SIZE 512
/** Maximum number of resource records to scan in a response */
#ifndef ESP_DNS_MAX_ANSWER_SCAN
#define ESP_DNS_MAX_ANSWER_SCAN 16

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be controlled by the user?

err_t status; /* Status of the answer */
ip_addr_t ip; /* IP address from the answer */
} dns_answer_storage_t;
#define ESP_DNS_BUFFER_SIZE 512

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be configurable as well.

* @param response The DNS response to process.
* @param ipaddr An array to store the extracted IP addresses.
* @param response The parsed DNS response
* @param ipaddr Array to store the copied IP addresses

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't better to have user to pass max number of entries?

uint16_t id; /* Transaction ID */
int num_answers; /* Number of valid answers */
dns_answer_storage_t answers[MAX_ANSWERS]; /* Array of answers */
ip_addr_t answers[MAX_ANSWERS]; /* Array of IP addresses */

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a breaking change? I think we are under 1.0, but it is worth to guide users on this.

- **Certificate Errors**:
- Verify that the correct certificate is provided for secure protocols
- For public DNS servers, use the certificate bundle approach
- If you see `No matching trusted root certificate found` when using the certificate bundle (e.g. with `dns.google` or Cloudflare), the server likely uses a cross-signed chain. You must enable `CONFIG_MBEDTLS_CERTIFICATE_BUNDLE_CROSS_SIGNED_VERIFY=y` in your `sdkconfig` (under Component config → mbedTLS → Certificate Bundle → Support cross-signed certificate verification).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How likely is this to happen? Should we have this set by default and let user to opt-out instead?


/* ESP-IDF xTaskCreate() stack size is in bytes. DoT/DoH paths pull in mbedTLS inside
* getaddrinfo(); 4 KiB is too small and triggers a stack overflow on typical builds. */
#define DNS_ADDRINFO_TASK_STACK 8192

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make it configurable?


* **Certificate Issues**:
For DoT and DoH protocols, ensure that the certificates are valid for the DNS server you're using. The example includes Google DNS certificates, but these may need to be updated if they expire.
If you are using the Certificate Bundle and see a `No matching trusted root certificate found` error, it is likely due to cross-signed chains (e.g., from Google). Make sure that `CONFIG_MBEDTLS_CERTIFICATE_BUNDLE_CROSS_SIGNED_VERIFY=y` is enabled in your configuration. This can be enabled via `idf.py menuconfig` -> `Component config` -> `mbedTLS` -> `Certificate Bundle` -> `Support cross-signed certificate verification`.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be set as default and let user opt out?

| Supported Targets | ESP32 | ESP32-C2 | ESP32-C3 | ESP32-C5 | ESP32-C6 | ESP32-C61 | ESP32-H2 | ESP32-P4 | ESP32-S2 | ESP32-S3 |
| ----------------- | ----- | -------- | -------- | -------- | -------- | --------- | -------- | -------- | -------- | -------- |

This example stress-tests **`getaddrinfo()`** while the **esp_dns** component owns the lwIP **external resolve** hook. It is meant to surface concurrency bugs documented in [ISSUES_AND_PROGRESS.md](../../ISSUES_AND_PROGRESS.md):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the reference of non-committed file.

@@ -0,0 +1,49 @@
# esp_dns concurrent resolver test

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a test app, should be treated as one. Not on the examples.


This example stress-tests **`getaddrinfo()`** while the **esp_dns** component owns the lwIP **external resolve** hook. It is meant to surface concurrency bugs documented in [ISSUES_AND_PROGRESS.md](../../ISSUES_AND_PROGRESS.md):

- **Issue 1** — shared singleton resolver state (`response_buffer`, transaction id) under concurrent lookups.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we review the architecture and adopt a queue to the requests? This way we would fix the issu by removing the singleton.

Moved `response_buffer` out of the global handle context (`esp_dns_handle`)
and into automatic/local storage within `dns_resolve_doh`, `dns_resolve_dot`,
and `dns_resolve_tcp`. This prevents memory corruption and race conditions
when multiple DNS queries are running concurrently.

Also included `esp_dns_concurrent_test` example to validate concurrent lookup behavior.
- Simplified `dns_response_t` to store `ip_addr_t` directly instead of a nested struct.
- Renamed `esp_dns_extract_ip_addresses_from_response` to `esp_dns_get_ips_from_response` and optimized the answer scanning limits.
- Migrated `esp_dns_basic` and `esp_dns_concurrent_test` examples from `protocol_examples_common` to `net_connect`.
- Increased the `addr_info_task` stack size in `esp_dns_basic` to 8192 bytes to prevent stack overflows during DoT/DoH mbedTLS operations.
- Enabled `CONFIG_MBEDTLS_CERTIFICATE_BUNDLE_CROSS_SIGNED_VERIFY` in example `sdkconfig.defaults` to fix validation errors with cross-signed certificate chains (e.g., Google, Cloudflare).
- Updated READMEs with troubleshooting steps for cross-signed certificate errors.
…494)

Fix ESP-IDF-ESP_DNS-DOS-001 by making esp_dns response parsing
strictly bounds-safe. A malicious DNS response could previously cause
an infinite loop or read past the end of the receive buffer.

- skip_dns_name(): use size_t offset to prevent wraparound, and check
  bounds before reading labels and compression pointers
- esp_dns_parse_response(): reject responses shorter than a DNS header,
  track buffer_end, guard pointer arithmetic, and validate answer
  record headers and RR data before memcpy
- TCP/DoT/DoH callers: require a minimum response size and verify the
  RFC 7858 length prefix before parsing
@abhik-roy85 abhik-roy85 force-pushed the component/esp_dns_fix branch from ec6a7b6 to 135fa98 Compare June 11, 2026 12:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants