Fix possible 100% CPU loop in CivetWeb #2882

Open
DL6ER wants to merge 1 commit into development from fix/spinning-civet

@DL6ER
Member

@DL6ER DL6ER commented May 6, 2026

What does this implement/fix?

Try a tiny backoff to avoid tight retry loops on idle HTTPS keep-alive connections.


Related issue or feature (if applicable): N/A

Pull request in docs with documentation (if applicable): N/A


By submitting this pull request, I confirm the following:

  1. I have read and understood the contributors guide, as well as this entire template. I understand which branch to base my commits and Pull Requests against.
  2. I have commented my proposed changes within the code.
  3. I am willing to help maintain this change if there are issues with it later.
  4. It is compatible with the EUPL 1.2 license
  5. I have squashed any insignificant commits. (git rebase)

Checklist:

  • The code change is tested and works locally.
  • I based my code and PRs against the repositories development branch.
  • I signed off all commits. Pi-hole enforces the DCO for all contributions
  • I signed all my commits. Pi-hole requires signatures to verify authorship
  • I have read the above and my PR is ready for review.

…connections

Signed-off-by: Dominik <dl6er@dl6er.de>
@DL6ER DL6ER marked this pull request as ready for review May 11, 2026 19:54
@DL6ER DL6ER requested a review from a team as a code owner May 11, 2026 19:54
Copilot AI review requested due to automatic review settings May 11, 2026 19:54
@DL6ER
Member Author

DL6ER commented May 11, 2026

TODO: Need to submit a PR upstream

Contributor

Copilot AI left a comment

Pull request overview

This pull request introduces a small configurable backoff for non-blocking mbedTLS operations in CivetWeb to prevent tight retry loops that can otherwise drive a worker thread to 100% CPU on idle/keep-alive HTTPS connections.

Changes:

  • Define MG_MBEDTLS_WANT_RETRY_DELAY_MS (default: 5ms) as a tunable backoff interval.
  • Sleep briefly in the mbedTLS handshake loop when WANT_READ/WRITE (or async-in-progress) is returned.
  • Sleep briefly in the mbedTLS read path when WANT_READ/WRITE (or async-in-progress) is returned after a poll-readability event.
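
The backoff described in the bullets above can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: only the macro name `MG_MBEDTLS_WANT_RETRY_DELAY_MS` and its 5 ms default come from the overview. The `DEMO_*` return codes, the `demo_step_fn` callback, and the `demo_*` helpers are hypothetical stand-ins for a non-blocking mbedTLS handshake or read attempt and its WANT_READ/WANT_WRITE results.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Tunable backoff; name and 5 ms default are taken from the PR overview. */
#ifndef MG_MBEDTLS_WANT_RETRY_DELAY_MS
#define MG_MBEDTLS_WANT_RETRY_DELAY_MS 5
#endif

/* Hypothetical stand-ins for the TLS library's "try again" return codes. */
enum { DEMO_OK = 0, DEMO_WANT_READ = -1, DEMO_WANT_WRITE = -2, DEMO_FATAL = -3 };

/* Hypothetical callback: one non-blocking handshake or read attempt. */
typedef int (*demo_step_fn)(void *ctx);

/* Sleep for ms milliseconds, resuming if a signal interrupts the sleep. */
static void demo_sleep_ms(long ms)
{
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR) {
        /* ts now holds the remaining time; keep sleeping */
    }
}

/* Call step() until it stops asking for a retry or max_retries is reached.
 * Between attempts the thread sleeps instead of spinning.
 * Returns the last result of step(); *retries counts the sleeps taken. */
static int demo_retry_with_backoff(demo_step_fn step, void *ctx,
                                   int max_retries, int *retries)
{
    int rc;
    assert(step != NULL && retries != NULL);
    *retries = 0;
    for (;;) {
        rc = step(ctx);
        if (rc != DEMO_WANT_READ && rc != DEMO_WANT_WRITE) {
            return rc; /* success or a real error: no retry */
        }
        if (*retries >= max_retries) {
            return rc; /* give up and let the caller decide */
        }
        (*retries)++;
        demo_sleep_ms(MG_MBEDTLS_WANT_RETRY_DELAY_MS);
    }
}

/* Fake step for demonstration: reports WANT_READ *ctx times, then succeeds. */
static int demo_fake_step(void *ctx)
{
    int *remaining = (int *)ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return DEMO_WANT_READ;
    }
    return DEMO_OK;
}
```

Because the macro is wrapped in `#ifndef`, a build could override the delay (for example with `-DMG_MBEDTLS_WANT_RETRY_DELAY_MS=10`). A fixed sleep trades a few milliseconds of added latency per retry for not burning a core; waiting on the socket with `poll()` instead of sleeping is a possible alternative that this sketch does not attempt.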

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/webserver/civetweb/mod_mbedtls.inl | Adds a tiny sleep in the mbedTLS handshake retry loop to avoid spinning on non-blocking sockets. |
| src/webserver/civetweb/civetweb.c | Introduces the backoff macro and applies it in the mbedTLS read path when WANT_* is returned. |

Member

@yubiuser yubiuser left a comment

Shouldn't such a change rather be a patch at https://github.com/pi-hole/FTL/tree/master/patch/civetweb?

@rdwebdesign
Member

I think this is the intention:

TODO: Need to submit a PR upstream

@gkuchta

gkuchta commented May 24, 2026

FWIW, I think I may have run into this after issuing a request to the admin UI (clean session; was just going to add a blocklist entry). I saw a single pihole-FTL thread spike to, and stay at, 100% CPU usage. From strace:

3185 20:44:39.629235 <... select resumed>) = 0 (Timeout) <0.048528>
3185 20:44:39.629253 select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=100000} <unfinished ...>
3191 20:44:39.629265 poll([{fd=35, events=POLLIN}, {fd=36, events=POLLIN}, {fd=37, events=POLLIN}, {fd=38, events=POLLIN}, {fd=34, events=POLLIN}], 5, 2000) = 4 ([{fd=35, revents=POLLIN}, {fd=36, revents=POLLIN}, {fd=37, revents=POLLIN}, {fd=38, revents=POLLIN}]) <0.000007>
3191 20:44:39.629292 poll([{fd=35, events=POLLIN}, {fd=36, events=POLLIN}, {fd=37, events=POLLIN}, {fd=38, events=POLLIN}, {fd=34, events=POLLIN}], 5, 2000) = 4 ([{fd=35, revents=POLLIN}, {fd=36, revents=POLLIN}, {fd=37, revents=POLLIN}, {fd=38, revents=POLLIN}]) <0.000007>
3191 20:44:39.629319 poll([{fd=35, events=POLLIN}, {fd=36, events=POLLIN}, {fd=37, events=POLLIN}, {fd=38, events=POLLIN}, {fd=34, events=POLLIN}], 5, 2000) = 4 ([{fd=35, revents=POLLIN}, {fd=36, revents=POLLIN}, {fd=37, revents=POLLIN}, {fd=38, revents=POLLIN}]) <0.000007>

fd35 = 0.0.0.0:80
fd36 = 0.0.0.0:443
fd37 = [::]:80
fd38 = [::]:443
poll() returns POLLIN for all four listener fds in ~7us
FTL performs no accept/read/write (or any other calls) between polls; basically just an infinite poll() loop.
Recv-Q remains nonzero (Recv-Q was at 2) on listeners
no inbound 80/443 traffic observed via tcpdump
dns resolution continued without interruption
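
The pattern in the trace (poll() reporting POLLIN on listeners again and again, with no accept() in between) matches how level-triggered poll() behaves in general: a listening socket stays readable until the pending connection is accepted. The sketch below only demonstrates that general behavior on a loopback socket. It makes no claim about why FTL did not call accept() here, and all `demo_*` names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a TCP listener on 127.0.0.1 with an OS-chosen port.
 * On success, *addr holds the bound address and the fd is returned. */
static int demo_open_listener(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0; /* let the kernel pick a free port */
    if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        listen(fd, 8) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Return 1 if poll() reports POLLIN on fd within timeout_ms, 0 if not,
 * -1 on error. */
static int demo_listener_ready(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int n;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    n = poll(&pfd, 1, timeout_ms);
    if (n < 0) {
        return -1;
    }
    return (n == 1 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Queue one connection on a listener, poll it `rounds` times without
 * accepting, then accept and poll once more.
 * Returns how many of the polls reported POLLIN (or -1 on setup failure);
 * *ready_after_accept receives the result of the final poll. */
static int demo_poll_without_accept(int rounds, int *ready_after_accept)
{
    struct sockaddr_in addr;
    int ready = 0;
    int i, client, accepted;
    int listener = demo_open_listener(&addr);
    if (listener < 0) {
        return -1;
    }
    client = socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0 ||
        connect(client, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        if (client >= 0) {
            close(client);
        }
        close(listener);
        return -1;
    }
    for (i = 0; i < rounds; i++) {
        /* Nothing drains the accept queue, so each poll returns at once. */
        if (demo_listener_ready(listener, 1000) == 1) {
            ready++;
        }
    }
    accepted = accept(listener, NULL, NULL); /* drain the queue */
    *ready_after_accept = demo_listener_ready(listener, 0);
    if (accepted >= 0) {
        close(accepted);
    }
    close(client);
    close(listener);
    return ready;
}
```

Calling `demo_poll_without_accept(5, &after)` should report the listener as readable on every one of the five polls and as not readable once the connection has been accepted, which is why a loop that polls but never accepts cannot block.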

If I run into it again I can try to get some more useful info via gdb or something, but it's just my home network so I just dumped what info I could and HUP'd the process
