Skip to content

SnapCap TCP Connection Reliability Improvements#2381

Closed
miceno wants to merge 12 commits into
indilib:masterfrom
miceno:feat/retry-snapcap
Closed

SnapCap TCP Connection Reliability Improvements#2381
miceno wants to merge 12 commits into
indilib:masterfrom
miceno:feat/retry-snapcap

Conversation

@miceno

@miceno miceno commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

SnapCap TCP Connection Reliability Improvements

Problem Statement

SnapCap communication can degrade when:

  • The controller reboots unexpectedly.
  • A TCP route is temporarily interrupted.
  • The socket remains open but the endpoint is no longer responsive.

Without recovery logic, manual disconnect/reconnect is required from the client.

Proposed Design

The driver now uses a reconnect abstraction:

  • SnapCap::ReconnectInterface in drivers/auxiliary/snapcap.h
  • SnapCapReconnect implementation in drivers/auxiliary/snapcap.cpp

Reconnect is non-blocking and driven from TimerHit(), so command handlers do not sleep-loop while recovering.

Configurable policy from INDI UI (RECONNECT_POLICY)

  • Exposed on CONNECTION_TAB as property RECONNECT_POLICY.
  • UI labels are now explicit:
    • FAILURES_BEFORE_RECONNECT shows as Max Failures
    • DELAY_MS shows as Base delay (ms)
  • Fields:
    • FAILURES_BEFORE_RECONNECT (MIN_FAILURES_BEFORE_RECONNECT..MAX_FAILURES_BEFORE_RECONNECT, default 2)
      • Meaning: consecutive command-failure threshold before reconnect is scheduled.
      • It does not limit the total number of reconnect attempts.
    • DELAY_MS (MIN_RECONNECT_DELAY_MS..MAX_RECONNECT_DELAY_SETTING_MS, default 500)
      • Meaning: base delay used by reconnect backoff after a failed reconnect attempt.
      • The first reconnect attempt is scheduled immediately; delay applies to subsequent retries.
  • Values are clamped in ISNewNumber() and persisted with saveConfig(ReconnectNP).
  • When changed, the driver logs a user-facing confirmation message with the interpreted meaning (failure threshold + base backoff delay).

Exponential backoff with cap

  • Delay is computed in computeReconnectDelayMs(attemptCount).
  • Uses base delay from DELAY_MS, doubles by attempt, capped by MAX_RECONNECT_DELAY_MS (4000 ms).
  • Effective sequence after reconnect failures is approximately: 2x, 4x, 8x, ... of DELAY_MS (with cap).

Future work

Use this same strategy for other components that communicate over TCP, like the rolloffino.

@knro

knro commented Apr 30, 2026

Copy link
Copy Markdown
Contributor

Thanks. This would affect any INDI driver, not just SnapCap. What would lead to this exactly? WiFi/Network Controller drops the connection or what?

@miceno

miceno commented Apr 30, 2026

Copy link
Copy Markdown
Contributor Author

The driver is for a home-made SnapCap dust cover. The cover is implemented using a ESP8266 and the connection is by WiFi. From time to time the ESP8266 disconnects from the network or reboots. When it reboots, or disconnects, the connection is lost forever, the driver does not try it again.

If you have automated a session, the session will not succeed, since the cap will never open nor close when required.

@miceno

miceno commented May 9, 2026

Copy link
Copy Markdown
Contributor Author

hi again @knro

does this pr need refinement? will you please advice on what the next step is?

thank you in advance 👍

@knro

knro commented May 10, 2026

Copy link
Copy Markdown
Contributor

Sorry for the delay. I don't think it's the right place to do it at the driver level. This issue should be handled at the plugin level for TCP/UDP connections and then delegated to the concrete drivers to make a decision on how to reconnect..etc. Let me think more about it since this change needs to be at INDI base level.

@miceno

miceno commented May 10, 2026

Copy link
Copy Markdown
Contributor Author

Thank you!

I am planning to get involved in solving the TCP reconnect issue, how can I get involved?

In fact, I have been working also on a solution at the Connection::TCP class (for the Dome interface), but I think it might require more discussion. I will share it in another PR and let's start discussing about it there.

@knro

knro commented May 10, 2026

Copy link
Copy Markdown
Contributor

Exactly, it needs to happen at the Connection::TCP class so that all drivers benefit from this.

@miceno

miceno commented May 10, 2026

Copy link
Copy Markdown
Contributor Author

I will open a new PR with my current approach.

@miceno

miceno commented May 10, 2026

Copy link
Copy Markdown
Contributor Author

Here it is #2390

@knro knro closed this May 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants