Friday, June 18, 2010

dropped connections from iptables

In December 2009, we upgraded the lab's connection to the outside world. We went from a crappy Netgear SOHO router to a cute little no-moving-parts machine I built from a VIA Mini-ITX motherboard with two gigabit Ethernet ports and a CompactFlash drive, all housed in an M350 enclosure. It looks like this:



and lives under the floor tiles.

This machine runs Ubuntu Server 9.10 and acts as the lab's NATing firewall. All of this is done with an iptables script. This was a real upgrade, and network performance improved noticeably. Unfortunately, we would occasionally get dropped connections. This was most noticeable when logging in from outside using ssh: after a few seconds, the connection would get dropped with the error message "connection reset by peer". It would also happen sometimes when downloading large files over HTTP.

After putting up with this for a few months, we spent an afternoon hunting it down. The problem seems to be traceable to a Cisco router in our university that does something it's not supposed to. In short, the Linux iptables/conntrack connection tracking adheres a little too strictly to the TCP standard and was closing a connection whenever the Cisco router sent it something not completely kosher. Since there was no hope of changing the way the Cisco router works, we went looking for a fix on our side and found that this little gem takes care of the problem:


echo 1 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal


Voilà. Problem solved.
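One caveat if you try this on a different kernel: newer kernels expose the same knob as /proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal, and the older ip_conntrack path may not exist at all. Here is a small sketch of a helper that picks whichever file is present. The function name and its optional root-directory argument are my own inventions for illustration, not part of our actual firewall script.

```shell
#!/bin/sh
# Print the first conntrack "be liberal" control file that exists.
# $1 is the sysctl root directory; it defaults to /proc/sys.
# (The argument is a hypothetical convenience so the function can be tried
# against a scratch directory.)
conntrack_liberal_path() {
    root="${1:-/proc/sys}"
    for candidate in \
        "$root/net/netfilter/nf_conntrack_tcp_be_liberal" \
        "$root/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal"
    do
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # Neither file exists (the conntrack module is probably not loaded).
    return 1
}
```

As root, something like path=$(conntrack_liberal_path) && echo 1 > "$path" would then do the same job as the one-liner above. Values written under /proc are lost on reboot, so the line has to live in the firewall script or some other boot-time script.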

wakeonlan from S3 suspend

I struggled for quite a while to get wakeonlan to work in the lab. The BIOS settings were correct, and sending a magic packet would wake a machine that was turned off, but it wouldn't wake one up from the S3 (suspend) state.

My co-sysadmin Dana Jansens eventually figured this one out. The file /proc/acpi/wakeup contains a list of hardware devices that are allowed to wake the computer. It looks like this:


morin@laplace:~$ cat /proc/acpi/wakeup
Device S-state Status Sysfs node
VBTN S4 *enabled
PCI0 S5 disabled no-bus:pci0000:00
PCI2 S5 disabled pci:0000:00:02.0
PCI3 S5 disabled pci:0000:01:00.0
PCIF S5 disabled pci:0000:02:00.0
PCI5 S5 disabled pci:0000:01:00.3
PCI6 S5 disabled pci:0000:00:03.0
PCI7 S5 disabled pci:0000:00:04.0
PCI8 S5 enabled pci:0000:00:1c.0
PCI9 S5 disabled pci:0000:00:1e.0
KBD S3 disabled pnp:00:06
USB0 S3 disabled pci:0000:00:1d.0
USB1 S3 disabled pci:0000:00:1d.1
USB2 S3 disabled pci:0000:00:1d.2
USB3 S3 disabled pci:0000:00:1d.3


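Notice that PCI8 is enabled. PCI8 is the PCI bridge to which the network adapter in this machine is attached. You enable this device to wake the machine with the command echo PCI8 > /proc/acpi/wakeup. Be aware that writing a device name to this file toggles its status, so running the command on a device that is already enabled disables it again.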
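Because the write is a toggle, a boot script that blindly echoes PCI8 can just as easily switch wakeup off. Here is a minimal sketch of a guard that checks the current status first. The function names and the optional file argument are hypothetical, made up for this post; only /proc/acpi/wakeup itself is real.

```shell
#!/bin/sh
# Print the status ("enabled" or "disabled") of one ACPI wakeup device.
# $1 = device name (e.g. PCI8), $2 = wakeup file (defaults to /proc/acpi/wakeup).
wakeup_status() {
    # Some kernels prefix the status with '*', as in "*enabled"; strip it.
    awk -v dev="$1" '$1 == dev { sub(/^\*/, "", $3); print $3 }' \
        "${2:-/proc/acpi/wakeup}"
}

# Enable wakeup for a device only if it is currently disabled, since
# writing the name to the file toggles the status. Needs root.
wakeup_enable() {
    file="${2:-/proc/acpi/wakeup}"
    if [ "$(wakeup_status "$1" "$file")" = "disabled" ]; then
        echo "$1" > "$file"
    fi
}
```

Calling wakeup_enable PCI8 from a boot script such as /etc/rc.local should then be safe to run repeatedly. The boot script is needed because the setting does not survive a reboot.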

How did Dana know that PCI8 was the right thing to enable? You can use dmesg and lspci to list your PCI devices, look for something plausible, and then match the PCI address you find against the rightmost column (Sysfs node) of /proc/acpi/wakeup. It may take some trial and error, though.
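If you want to cut down on the trial and error, sysfs can sometimes tell you which bridge the network card hangs off. This is only a sketch and not how Dana actually did it: the interface name eth0 and both function names are assumptions, and it only helps when your /proc/acpi/wakeup has a Sysfs node column like the one shown above.

```shell
#!/bin/sh
# Given a resolved sysfs device path such as
#   /sys/devices/pci0000:00/0000:00:1c.0/0000:03:00.0
# print the PCI address of the parent device (here 0000:00:1c.0).
pci_parent() {
    basename "$(dirname "$1")"
}

# Print the ACPI wakeup device whose Sysfs node matches a PCI address.
# $1 = PCI address, $2 = wakeup file (defaults to /proc/acpi/wakeup).
wakeup_device_for() {
    awk -v node="pci:$1" '$NF == node { print $1 }' \
        "${2:-/proc/acpi/wakeup}"
}
```

On a real machine you would feed it the card's device path, along the lines of wakeup_device_for "$(pci_parent "$(readlink -f /sys/class/net/eth0/device)")". If that prints nothing, the card may sit directly on the root bus or the bridge may not be listed, and you are back to dmesg and lspci.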

As a bonus, you can let any of the other listed devices resume the machine from suspend in the same way, for example the keyboard (KBD above) or the mouse (USB0 above).

What is this?

This is a blog I started after years of administering a Linux-based lab. I will use it to post solutions to problems I had that took a long time to hunt down and fix, in the hopes that some future Googler will find them handy.