crypto/tls: interoperability problems between go tls server and microsoft/outlook.com tls (smtp starttls) client #70232

Open
mjl- opened this issue Nov 6, 2024 · 5 comments

mjl- commented Nov 6, 2024

Go version

go1.23.2 linux/amd64

Output of go env in your module/workspace:

n/a

What did you do?

Deploy mox, a mail server, and successfully get incoming email message deliveries from microsoft (outlook.com, both office365 and personal/free accounts) to mox over SMTP with STARTTLS (crypto/tls server).

What did you see happen?

On October 24 I started receiving "TLS reporting" errors with a "validation failure" error in the "sts" (MTA-STS) section. Up to and including October 23, the TLS reports I received contained only successful delivery attempts. I investigated, but couldn't find anything wrong. Yesterday I learned that message deliveries from Microsoft (outlook.com servers) to mox were failing. The TLS reporting error message wasn't precise, but there's a good chance it was about these failing delivery attempts.

The symptoms: I would see an incoming SMTP connection, the "starttls" command, and an abrupt close of the connection by the remote. Debugging revealed that the remote closed the connection after reading the server-side response to the TLS client hello message, without writing anything in response (EOF while trying to read the first bytes of the "client finished" message). During more debugging, I noticed the Go TLS server code sends a session ticket message as part of its response to the client hello message. Setting tls.Config.SessionTicketsDisabled = true prevents the new session ticket from being sent, and makes the Microsoft SMTP STARTTLS command, and delivery of messages, succeed.

At https://datatracker.ietf.org/doc/html/rfc8446#section-4.6.1 I noticed:

At any time after the server has received the client Finished
message, it MAY send a NewSessionTicket message.

One theory: The Go TLS server is sending the NewSessionTicket message too soon, and Microsoft changed their implementation to be more strict about when it allows certain messages.

This isn't specific to mox. Maddy, another mail server written in Go, is also seeing TLS interoperability issues with Microsoft/outlook.com. More details:

https://github.com/mjl-/mox/issues/237
foxcpp/maddy#730

What did you expect to see?

The Go TLS session ticket may come too early for some other TLS clients. I did not try changing the crypto/tls code to only send a new session ticket message after having read the client finished message. May be worth trying, to see if that will result in a successful TLS session or sees the same abrupt connection close.

@seankhliao
Member

cc @golang/security

@seankhliao seankhliao added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 6, 2024
@ianlancetaylor
Contributor

Possibly a case of https://tldr.fail.

@FiloSottile
Contributor

At that stage in the handshake tldr.fail is unlikely. We did have an issue I can’t find at the moment about the size of our tickets with MS stacks, but it’s mostly client certificates that make those grow.

nabeken commented Nov 8, 2024

@mjl- Thank you for your report. We were also about to report this at #61721 but I found this issue. So I decided to redirect our report here.

We (@stupoid and I) are investigating a TLS 1.3 handshake issue with connections from Outlook (Exchange Online). Let us share our findings.

In short: Microsoft's TLS 1.3 implementation seems to terminate a TLS connection during the handshake, depending on when it receives the NewSessionTicket message.

If you are experiencing similar issues with TLS 1.3 handshakes and find that setting SessionTicketsDisabled: true resolves the problem, you might be impacted by this issue.

Below are the details of our findings.

TLS 1.3 Handshake Interoperability Issue

Our server setup

# go version
go version go1.22.7 linux/amd64

Problem

As @stupoid describes at #61721 (comment), we were encountering EOF errors during TLS 1.3 handshakes on connections from outlook.com (Exchange Online). The problem went away after we disabled TLS 1.3 for those connections.

Observation

We compared how the TLS 1.3 handshake with a Go 1.22.7 server went for two clients: an SMTP test client (openssl s_client) and Exchange Online (outlook.com).

The following is a dump of a successful handshake with SMTP client (openssl s_client -tls1_3 -starttls smtp -connect x.x.x.x:25).

13	0.012573	148.109.19.178	10.0.102.69	TLSv1.3	350	Client Hello
14	0.019012	10.0.102.69	148.109.19.178	TLSv1.3	1514	Server Hello, Change Cipher Spec, Encrypted Extensions, Certificate, Certificate Verify, Finished
15	0.019017	10.0.102.69	148.109.19.178	TLSv1.3	207	New Session Ticket
16	0.022057	148.109.19.178	10.0.102.69	TCP	66	47586 → 25 [ACK] Seq=318 Ack=1557 Win=130176 Len=0 TSval=3073220876 TSecr=799896659
17	0.022058	148.109.19.178	10.0.102.69	TCP	66	47586 → 25 [ACK] Seq=318 Ack=1698 Win=130048 Len=0 TSval=3073220876 TSecr=799896659
18	0.023597	148.109.19.178	10.0.102.69	TLSv1.3	130	Change Cipher Spec, Finished
19	0.072053	10.0.102.69	148.109.19.178	TCP	66	25 → 47586 [ACK] Seq=1698 Ack=382 Win=62464 Len=0 TSval=799896713 TSecr=3073220877

The following is a dump of a failed handshake with Exchange Online.

10	0.043841	104.47.23.113	10.0.102.69	TLSv1.3	361	Client Hello
11	0.058433	10.0.102.69	104.47.23.113	TLSv1.3	1514	Server Hello, Change Cipher Spec, Encrypted Extensions, Certificate, Certificate Verify
12	0.058439	10.0.102.69	104.47.23.113	TLSv1.3	248	Finished, New Session Ticket
13	0.068171	104.47.23.113	10.0.102.69	TCP	60	55106 → 25 [ACK] Seq=370 Ack=1792 Win=525568 Len=0
14	0.070915	104.47.23.113	10.0.102.69	TCP	60	55106 → 25 [FIN, ACK] Seq=370 Ack=1792 Win=525568 Len=0
15	0.071059	10.0.102.69	104.47.23.113	SMTP	101	S: 454 TLS not available due to temporary reason

The handshake didn't finish because the client sent a FIN packet right after receiving the server's Finished and New Session Ticket.
tls.Conn.Handshake returned an EOF at frame 15.

The key difference is the timing of the New Session Ticket message:

  • With the openssl SMTP client, New Session Ticket arrives in its own packet (frame 15), after the packet that carries the server's Finished (frame 14)
  • With Exchange Online, New Session Ticket arrives in the same packet as the server's Finished (frame 12), before the server has received the client's Finished

On a side note, we also tested this with Postfix + OpenSSL (openssl-3.0.8-1.amzn2023.0.16.x86_64). That works fine, but it uses a different flow: OpenSSL (Postfix) sends New Session Ticket only after receiving the Finished message from the client (outlook.com).

To verify the assumption that the New Session Ticket message triggers the problem in Microsoft's TLS implementation, we set SessionTicketsDisabled: true in Go and confirmed that the handshake succeeded:

10	0.017334	104.47.23.169	10.0.102.69	TLSv1.3	361	Client Hello
11	0.031968	10.0.102.69	104.47.23.169	TLSv1.3	1514	Server Hello, Change Cipher Spec, Application Data, Application Data, Application Data
12	0.031973	10.0.102.69	104.47.23.169	TLSv1.3	104	Application Data
13	0.035080	104.47.23.169	10.0.102.69	TCP	60	42783 → 25 [ACK] Seq=370 Ack=1648 Win=525568 Len=0
14	0.037309	104.47.23.169	10.0.102.69	TLSv1.3	118	Change Cipher Spec, Application Data
15	0.079227	10.0.102.69	104.47.23.169	TCP	54	25 → 42783 [ACK] Seq=1648 Ack=434 Win=62592 Len=0
16	0.082362	104.47.23.169	10.0.102.69	TLSv1.3	128	Application Data
17	0.082402	10.0.102.69	104.47.23.169	TCP	54	25 → 42783 [ACK] Seq=1648 Ack=508 Win=62592 Len=0
18	0.082849	10.0.102.69	104.47.23.169	TLSv1.3	149	Application Data
19	0.095041	104.47.23.169	10.0.102.69	TLSv1.3	82	Application Data
20	0.095141	104.47.23.169	10.0.102.69	TCP	60	42783 → 25 [RST, ACK] Seq=536 Ack=1743 Win=0 Len=0
21	22.653077	40.93.73.24	10.0.102.69	TCP	66	60619 → 25 [SYN] Seq=0 Win=64240 Len=0 MSS=1398 WS=256 SACK_PERM
22	22.653108	10.0.102.69	40.93.73.24	TCP	66	25 → 60619 [SYN, ACK] Seq=0 Ack=1 Win=62727 Len=0 MSS=8961 SACK_PERM WS=128
23	22.656931	40.93.73.24	10.0.102.69	TCP	60	60619 → 25 [ACK] Seq=1 Ack=1 Win=524288 Len=0
24	22.657119	10.0.102.69	40.93.73.24	SMTP	80	S: 220 mx.example.com ESMTP

Analysis

While I'm not an expert in TLS implementation, I reviewed the spec and found the following:

https://datatracker.ietf.org/doc/html/rfc8446#section-4.6.1 says:

At any time after the server has received the client Finished message, it MAY send a NewSessionTicket message.

and

Note: Although the resumption master secret depends on the client's second flight,
a server which does not request client authentication MAY compute the remainder of the transcript independently
and then send a NewSessionTicket immediately upon sending its Finished rather than waiting for the client Finished.

I think Go's TLS stack follows the second case because the server doesn't request client authentication.

On the other hand, Microsoft's TLS stack might expect to receive the server's Finished first and the NewSessionTicket message in a separate flight. Go's TLS stack writes Finished and NewSessionTicket to the buffer and flushes them together, rather than flushing Finished first and sending NewSessionTicket afterwards.

To verify this hypothesis, I made a small modification to Go's handshake code so that it flushes the buffer first and sends NewSessionTicket only after that flush.

Here is the patch I tested with:

--- src/crypto/tls/handshake_server_tls13.go.orig	2024-11-07 04:28:50.967023405 +0000
+++ src/crypto/tls/handshake_server_tls13.go	2024-11-07 05:02:21.053073557 +0000
@@ -75,9 +75,17 @@
 	if _, err := c.flush(); err != nil {
 		return err
 	}
+
 	if err := hs.readClientCertificate(); err != nil {
 		return err
 	}
+
+	if !hs.requestClientCert() {
+		if err := hs.sendSessionTickets(); err != nil {
+			return err
+		}
+	}
+
 	if err := hs.readClientFinished(); err != nil {
 		return err
 	}
@@ -777,11 +785,11 @@
 	// If we did not request client certificates, at this point we can
 	// precompute the client finished and roll the transcript forward to send
 	// session tickets in our first flight.
-	if !hs.requestClientCert() {
-		if err := hs.sendSessionTickets(); err != nil {
-			return err
-		}
-	}
+	//if !hs.requestClientCert() {
+	//	if err := hs.sendSessionTickets(); err != nil {
+	//		return err
+	//	}
+	//}
 
 	return nil
 }

It seemed to work.

10	0.044304	104.47.23.112	10.0.102.69	TLSv1.3	361	Client Hello
11	0.051822	10.0.102.69	104.47.23.112	TLSv1.3	1514	Server Hello, Change Cipher Spec, Encrypted Extensions, Certificate, Certificate Verify
12	0.051827	10.0.102.69	104.47.23.112	TLSv1.3	104	Finished
13	0.051889	10.0.102.69	104.47.23.112	TLSv1.3	198	New Session Ticket
14	0.061689	104.47.23.112	10.0.102.69	TCP	60	55773 → 25 [ACK] Seq=370 Ack=1648 Win=525568 Len=0
15	0.063710	104.47.23.112	10.0.102.69	TLSv1.3	118	Change Cipher Spec, Finished
16	0.107821	10.0.102.69	104.47.23.112	TCP	54	25 → 55773 [ACK] Seq=1792 Ack=434 Win=62592 Len=0
17	0.117695	104.47.23.112	10.0.102.69	SMTP	128	C: EHLO JPN01-OS0-obe.outbound.protection.outlook.com

Questions...

@FiloSottile, as the author of this code almost 6 years ago, what do you think about this issue? Given these findings, should Go adjust its handshake behavior, or should Microsoft update their TLS 1.3 implementation for better interoperability?

stupoid commented Nov 8, 2024

I did not try changing the crypto/tls code to only send a new session ticket message after having read the client finished message. May be worth trying, to see if that will result in a successful TLS session or sees the same abrupt connection close.

@mjl-

Just to add to this for anyone looking to sidestep this issue.

We encountered very similar issues. We got the behavior you describe, with the session ticket sent only after the client's Finished, by changing tls.Config.ClientAuth to either of the following two modes.
Both seem to work fine without issues.

Dump of interaction with Exchange Online (outlook.com) with ClientAuth set to RequestClientCert

1	2.991447	40.93.130.3	10.0.15.74	TLSv1.3	361	Client Hello
2	2.992553	10.0.15.74	40.93.130.3	TLSv1.3	1527	Server Hello, Change Cipher Spec, Encrypted Extensions, Certificate Request, Certificate, Certificate Verify, Finished
3	3.002707	40.93.130.3	10.0.15.74	TCP	60	11164 → 25 [ACK] Seq=369 Ack=1610 Win=524288 Len=0
4	3.006255	40.93.130.3	10.0.15.74	TLSv1.3	4125	Change Cipher Spec, Certificate, Certificate Verify, Finished
5	3.006322	10.0.15.74	40.93.130.3	TCP	54	25 → 11164 [ACK] Seq=1610 Ack=4440 Win=58624 Len=0
6	3.006673	10.0.15.74	40.93.130.3	TCP	2850	25 → 11164 [PSH, ACK] Seq=1610 Ack=4440 Win=58624 Len=2796 [TCP PDU reassembled in 51]
7	3.006688	10.0.15.74	40.93.130.3	TLSv1.3	1137	New Session Ticket

Dump of interaction with Exchange Online (outlook.com) with ClientAuth set to RequireAndVerifyClientCert

1	2.157838	40.93.130.1	10.0.15.74	TLSv1.3	361	Client Hello
2	2.159897	10.0.15.74	40.93.130.1	TLSv1.3	1526	Server Hello, Change Cipher Spec, Encrypted Extensions, Certificate Request, Certificate, Certificate Verify, Finished
3	2.170862	40.93.130.1	10.0.15.74	TCP	60	15428 → 25 [ACK] Seq=369 Ack=1609 Win=524288 Len=0
4	2.174135	40.93.130.1	10.0.15.74	TLSv1.3	4125	Change Cipher Spec, Certificate, Certificate Verify, Finished
5	2.174193	10.0.15.74	40.93.130.1	TCP	54	25 → 15428 [ACK] Seq=1609 Ack=4440 Win=58624 Len=0
6	2.174901	10.0.15.74	40.93.130.1	TCP	2850	25 → 15428 [PSH, ACK] Seq=1609 Ack=4440 Win=58624 Len=2796 [TCP PDU reassembled in 53]
7	2.174923	10.0.15.74	40.93.130.1	TCP	2850	25 → 15428 [PSH, ACK] Seq=4405 Ack=4440 Win=58624 Len=2796 [TCP PDU reassembled in 53]
8	2.175234	10.0.15.74	40.93.130.1	TLSv1.3	555	New Session Ticket
