Thursday, April 2, 2009

Black Hole Socket Problem

Problem:



  • One client and one server, connected by a non-blocking TCP socket, on Linux 2.6.*+.

  • The client writes about 3 KB into the socket by calling the write(2) system call.

  • write returns the full byte count, as if everything had been written to the socket.

  • The server-side socket receives nothing.


Causes:



  • The client-side MTU is slightly larger than the MTU of some router(s) between the two machines.

  • The ICMP messages reporting that fragmentation is needed are blocked somewhere (perhaps by a firewall).

  • The client keeps retransmitting the same packet over and over.


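An IP packet with the DF (Don't Fragment) flag set is not fragmented by a router when the packet is larger than the MTU of the router's outgoing link. Instead, the router drops the packet and sends back an ICMP "fragmentation needed" message telling the sender to use smaller packets.

But those ICMP messages are blocked somewhere, so the client never learns that its packets are too large.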


Circumstances And Phenomenon:
In this case, MTUs on client and server are both 1500.

I dumped the network traffic with tcpdump, so we can take a close look:

Phenomenon on client:
Two packets were sent. The first one was retransmitted again and again:


17:23:06.933574 IP (tos 0x0, ttl 64, id 57558, offset 0, flags [DF], proto: TCP (6), length: 1500) 10.54.40.43.43145 > 10.29.14.74.http: ., cksum 0x5096 (incorrect (-> 0x5c4e), 0:1448(1448) ack 1 win 46

17:23:06.933580 IP (tos 0x0, ttl 64, id 57559, offset 0, flags [DF], proto: TCP (6), length: 730) 10.54.40.43.43145 > 10.29.14.74.http: P, cksum 0x4d94 (incorrect (-> 0x3933), 1448:2126(678) ack 1 win 46

17:23:07.167049 IP (tos 0x0, ttl 64, id 57560, offset 0, flags [DF], proto: TCP (6), length: 1500) 10.54.40.43.43145 > 10.29.14.74.http: ., cksum 0x5096 (incorrect (-> 0x5b5b), 0:1448(1448) ack 1 win 46

17:23:07.634922 IP (tos 0x0, ttl 64, id 57561, offset 0, flags [DF], proto: TCP (6), length: 1500) 10.54.40.43.43145 > 10.29.14.74.http: ., cksum 0x5096 (incorrect (-> 0x5987), 0:1448(1448) ack 1 win 46



Phenomenon on server:
Only the second packet, of length 730, was received successfully:

17:23:08.605622 IP (tos 0x0, ttl 59, id 57559, offset 0, flags [DF], proto: TCP (6), length: 730) 202.108.3.204.43145 > 10.29.14.74.http: P, cksum 0x9d5b (correct), 1448:2126(678) ack 1 win 46
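
The sizes in the two dumps can be checked with a little arithmetic. The sketch below is only an illustration: the 32-byte TCP header is inferred from the dump (1500 - 1448 = 52 bytes of headers, i.e. 20 bytes of IP plus 32 of TCP), and the path MTU of 1400 is a hypothetical value, since the real MTU of the router in between is unknown. The names packet_lengths and survivors are made up for this example.

```python
IP_HEADER = 20    # bytes, IPv4 header without options
TCP_HEADER = 32   # bytes, assumed from the dump: 20 base + 12 of TCP options


def packet_lengths(payload, mtu):
    """Split `payload` bytes into TCP segments and return each IP packet length."""
    mss = mtu - IP_HEADER - TCP_HEADER   # largest payload per packet
    lengths = []
    while payload > 0:
        chunk = min(payload, mss)
        lengths.append(chunk + IP_HEADER + TCP_HEADER)
        payload -= chunk
    return lengths


def survivors(lengths, path_mtu):
    """Packets with DF set only get through if they fit the smallest MTU on the path."""
    return [n for n in lengths if n <= path_mtu]


sent = packet_lengths(2126, 1500)   # 2126 payload bytes, as in the client dump
print(sent)                         # [1500, 730]
print(survivors(sent, 1400))        # [730], with a hypothetical path MTU of 1400
```

This matches what the server saw: the 730-byte packet fits through the narrower link, while the 1500-byte packet is dropped on every retransmission.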


Solution:
Since you may not have the privileges to log in to the router and adjust it, the simplest fix is to change the configuration on the client in one of the following ways:

At network level:
Decrease the MTU on the network adapter:


ifconfig eth* mtu 1400

At system level:
Disable Path MTU discovery, which is enabled by default, with sysctl:


net.ipv4.ip_no_pmtu_disc = 1
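
After applying either of the two settings above, it can be handy to confirm what the kernel is actually using. The following is a minimal sketch that reads the values from /sys and /proc on Linux. The helper names read_int and current_settings are made up here, and "eth0" is only an assumed interface name.

```python
from pathlib import Path


def read_int(path):
    """Return the integer stored in a /proc or /sys file, or None if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def current_settings(iface="eth0"):
    """Collect the adapter MTU and the PMTU discovery switch for one interface."""
    return {
        "mtu": read_int(f"/sys/class/net/{iface}/mtu"),
        "ip_no_pmtu_disc": read_int("/proc/sys/net/ipv4/ip_no_pmtu_disc"),
    }


if __name__ == "__main__":
    # "eth0" is an assumption; substitute the adapter you changed.
    print(current_settings("eth0"))
```

A value of None means the file could not be read, for example because the interface name does not exist on this machine.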

At user application level:
Set the socket option IP_MTU_DISCOVER to IP_PMTUDISC_DONT with setsockopt(2), so that the DF flag is cleared on the IP packets sent from that socket.
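
A rough sketch of the application-level option, written in Python rather than C for brevity. It assumes Linux. The numeric fallbacks (10 and 0) are the values from <linux/in.h> and are used only if the Python socket module does not export the names. The function name make_socket_without_df is made up for this example.

```python
import socket

# Linux values from <linux/in.h>, used as fallbacks (an assumption that we run on Linux).
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DONT = getattr(socket, "IP_PMTUDISC_DONT", 0)


def make_socket_without_df():
    """Create a TCP socket whose outgoing IP packets do not carry the DF flag."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Turn off Path MTU discovery for this socket only.
    sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DONT)
    return sock


if __name__ == "__main__":
    s = make_socket_without_df()
    # Read the option back to confirm the kernel accepted it.
    print(s.getsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER))
    s.close()
```

The socket is then connected and used as usual. Unlike the adapter or sysctl changes, this affects only the one socket, so it needs no root privileges and leaves other applications alone.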



