Skip to content

ssh: fix mlkem768x25519 hybrid shared secret encoding#11209

Open
cole-christensen wants to merge 1 commit into
erlang:maintfrom
cole-christensen:fix/ssh-mlkem-hybrid-x25519-secret-encoding
Open

ssh: fix mlkem768x25519 hybrid shared secret encoding#11209
cole-christensen wants to merge 1 commit into
erlang:maintfrom
cole-christensen:fix/ssh-mlkem-hybrid-x25519-secret-encoding

Conversation

@cole-christensen
Copy link
Copy Markdown

@cole-christensen cole-christensen commented Jun 6, 2026

Closes #11208

Summary

mlkem768x25519-sha256 key exchange intermittently fails (~1 in 512 handshakes)
with incorrect signature on the peer. hybrid_common/4 reconstructed the
classical X25519 shared secret by taking the last ?X25519_PUBLICKEY_SIZE bytes
of its mpint encoding. Because mpint is minimal-length, a shared secret whose
most significant byte is 0x00 and whose next byte is < 0x80 encodes to a
31-byte payload, and "the last 32 bytes" then includes a byte of the 4-byte
length prefix (0x1f) in place of the genuine leading 0x00. The two peers hash
different octet strings, derive a different exchange hash H, and the signature
check fails.

This encodes the ECDH shared secret as a fixed-width big-endian octet string,
preserving leading zero bytes.
draft-ietf-sshm-mlkem-hybrid-kex
§2.4 specifies this encoding — the component secrets are hashed as fixed-length
big-endian byte arrays — and OpenSSH's kexmlkem768x25519.c matches it.

 hybrid_common(K_pq_secret, Curve, PeerPublic, MyPrivate) ->
     K_cl_secret = compute_key(ecdh, PeerPublic, MyPrivate, Curve),
-    K_cl_secret_mpint = <<?Empint(K_cl_secret)>>,
-    K_cl_secret_mpint_trim =
-        binary:part(K_cl_secret_mpint, byte_size(K_cl_secret_mpint), -?X25519_PUBLICKEY_SIZE),
-    crypto:hash(sha(Curve), <<K_pq_secret/binary, K_cl_secret_mpint_trim/binary>>).
+    K_cl_secret_fixed = <<K_cl_secret:(?X25519_PUBLICKEY_SIZE*8)/big-unsigned-integer>>,
+    crypto:hash(sha(Curve), <<K_pq_secret/binary, K_cl_secret_fixed/binary>>).

The new encoding is byte-for-byte identical to the old one in every case except
the previously-broken one, so it only changes behaviour where the handshake was
already failing.

Probability / impact

P(trigger) = P(secret[0]==0x00) × P(secret[1] < 0x80) = 1/256 × 1/2 =
1/512 ≈ 0.195% per mlkem768x25519-sha256 handshake. At any non-trivial
connection volume this is a frequent, hard-to-diagnose interop failure against
OpenSSH (which is correct).

Introduced in 95d0848c60 ("ssh: MLKEM kex ssh"), released in OTP-28.4.

Verification

  • Deterministic unit check (<<0,1,0:240>> → old code yields 1F01…, fixed
    yields 0001…). See linked issue.

  • Monte-Carlo: 300k random secrets, old-vs-fixed mismatch rate 0.194%, every
    mismatch satisfies secret[0]==0x00 ∧ secret[1] < 0x80.

  • End-to-end against OpenSSH 10.3p1, BOTH roles, with the hybrid code
    instrumented to count how many handshakes actually hit the trigger ("case C")
    so success is not luck-of-the-draw:

    Role Mode Trigger handshakes Failures
    OTP daemon ← OpenSSH client stock 20 20 (1:1)
    OTP daemon ← OpenSSH client fixed 45 0
    OTP client → OpenSSH sshd stock 21 21 (1:1)
    OTP client → OpenSSH sshd fixed 31 0
  • Regression test in the suite (ssh_algorithms_SUITE:mlkem768x25519_hybrid_secret_encoding):
    passes with the fix, fails against the stock encoding. Run via the supported
    framework: ts:run(ssh, ssh_algorithms_SUITE, mlkem768x25519_hybrid_secret_encoding, [batch])
    TEST COMPLETE, 1 ok, 0 failed.

  • Full ssh_algorithms_SUITE: TEST COMPLETE, 930 ok, 0 failed of 930

The classical X25519 part of the mlkem768x25519-sha256 hybrid key
exchange shared secret was reconstructed by taking the last 32 bytes of
the mpint encoding of the ECDH secret. mpint is a minimal-length
big-endian representation, so when the shared secret's most significant
byte is 0x00 and the following byte is < 0x80 the payload is only 31
bytes and "the last 32 bytes" pulls in a byte of the 4-byte length
prefix (0x1f) in place of the genuine leading 0x00.

hybrid_common/4 then hashes a different octet string than the peer
(e.g. OpenSSH), producing a different exchange hash H. The server signs
an H the client never computed, and the client aborts the connection
with "incorrect signature". This affects roughly 1 in 512 handshakes
(P[msb=0x00] * P[next<0x80] = 1/256 * 1/2).

Encode the ECDH shared secret as a fixed-width big-endian octet string
so a genuine leading 0x00 byte is preserved, matching the peer.

Add a regression test (ssh_algorithms_SUITE:mlkem768x25519_hybrid_secret_encoding)
that searches for an X25519 key pair whose ECDH secret triggers the
condition and asserts hybrid_common/4 hashes it as a fixed-width 32-byte
octet string. The test fails against the old code and passes with the fix.

Introduced in 95d0848 (ssh: MLKEM kex ssh); present since OTP-28.4.
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Jun 6, 2026

CLA assistant check
All committers have signed the CLA.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 6, 2026

CT Test Results

    2 files     29 suites   26m 5s ⏱️
  503 tests   497 ✅  6 💤 0 ❌
1 712 runs  1 684 ✅ 28 💤 0 ❌

Results for commit 869e6b5.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

@Mikaka27 Mikaka27 added team:PS Assigned to OTP team PS testing currently being tested, tag is used by OTP internal CI labels Jun 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

team:PS Assigned to OTP team PS testing currently being tested, tag is used by OTP internal CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants