Sunday, August 1, 2010

TFTP Oddities

Like RS-232, TFTP is less and less used these days, but remains an absolutely critical protocol for network engineers.

Here I'll present a couple of TFTP implementation oddities that sometimes cause transfers to fail.

TFTP Uses Ephemeral Ports
The bulk of the data flow between a TFTP client and server uses ephemeral ports on both ends.  This is strange behavior.  Most applications use ephemeral ports on the client side, but the server responds from its well known port number.  Here's an example of a normal application (a DNS query):
13:12:13.497988 IP 192.168.1.91.56119 > 192.168.1.1.53:  54600+ A? www.google.com. (32)
13:12:13.583574 IP 192.168.1.1.53 > 192.168.1.91.56119:  54600 5/4/4 CNAME www.l.google.com.,[|domain]
The client originated the query from an ephemeral port (56119), and sent it to the server's well known port (53).  The server's response was symmetric: 53 -> 56119.

TFTP servers don't generally behave this way.  Consider this TFTP transaction:

12:55:33.188470 IP 192.168.1.9.34617 > 192.168.1.91.69:  22 RRQ "myfile.dat" netascii
12:55:33.189962 IP 192.168.1.91.42829 > 192.168.1.9.34617: UDP, length 516
12:55:33.190060 IP 192.168.1.9.34617 > 192.168.1.91.42829: UDP, length 4
12:55:33.190088 IP 192.168.1.91.42829 > 192.168.1.9.34617: UDP, length 92
12:55:33.190190 IP 192.168.1.9.34617 > 192.168.1.91.42829: UDP, length 4
The client originated the transaction from an ephemeral port (34617), and sent it to the server's well known port (69).  But the server's response was NOT symmetric.  Rather than replying from port 69, the server spawned a new ephemeral port (42829) when replying to the client.  After the initial request packet, the four packets responsible for the file transfer (data, ack, data, ack) were exchanged between two ephemeral ports.
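The packet sizes in this capture follow from the fixed layouts in RFC 1350: a 2-byte opcode, followed either by NUL-terminated filename and mode strings (RRQ) or by a 2-byte block number (DATA, ACK). The number tcpdump prints before RRQ appears to be the UDP payload length, and it matches: 22 bytes for "myfile.dat" in netascii mode. The 516-byte packets are full DATA blocks (4 header bytes plus 512 bytes of file), the 4-byte packets are ACKs, and the final 92-byte DATA packet carries the last 88 bytes. Here is a minimal Python sketch of those encodings; the helper names are hypothetical, chosen for illustration only.

```python
import struct

# Opcodes defined by RFC 1350 (WRQ=2 and ERROR=5 are not needed here).
OP_RRQ, OP_DATA, OP_ACK = 1, 3, 4


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Read request: 2-byte opcode, filename, NUL, mode, NUL."""
    return (struct.pack("!H", OP_RRQ)
            + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")


def build_data(block: int, chunk: bytes) -> bytes:
    """DATA: 2-byte opcode, 2-byte block number, up to 512 bytes of file."""
    return struct.pack("!HH", OP_DATA, block) + chunk


def build_ack(block: int) -> bytes:
    """ACK: 2-byte opcode plus the block number being acknowledged."""
    return struct.pack("!HH", OP_ACK, block)


# Payload sizes that show up in the captures in this post:
print(len(build_rrq("myfile.dat", "netascii")))          # 22
print(len(build_rrq("c1200-k9w7-tar.default", "octet")))  # 31
print(len(build_data(1, bytes(512))))                     # 516
print(len(build_ack(1)))                                  # 4
```

The same arithmetic explains the "31 RRQ" in the Aironet capture further down: 2 + 22 + 1 + 5 + 1 bytes.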

This is strange behavior, and most flow-mangling smart devices (firewalls, NAT engines, reflexive ACLs, etc.) won't recognize the second or later packets as being related to the first.  Unlike active-mode FTP, TFTP doesn't include an application header describing what's going to happen next.  Accordingly, there's nothing for a protocol-aware firewall module to monitor in order to open a pinhole for return traffic.  TFTP is a perfect candidate for a man-in-the-middle attack.

This isn't just an implementation habit: RFC 1350 specifies it.  Each end of a TFTP transfer chooses its own transfer identifier (TID), which is simply its UDP port, and the server answers the initial request from a newly chosen TID rather than from port 69.  My guess about the reason for the design is this:  the TFTP server merely needs to fork() itself for each client request.  The child process opens its own socket to handle the client request, then exits when done.  Without the separate process and socket, a single server process would have to keep track of all clients simultaneously, and would have to demultiplex incoming client packets for each flow (because they'd all arrive on the single connectionless UDP socket).
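To make the socket-per-request idea concrete, here is a minimal Python sketch. It uses one thread per transfer instead of fork(), which is my substitution for portability, and it leaves out retransmission, options, and error packets; the function names are hypothetical. Each worker binds a fresh socket to port 0 so the kernel assigns an ephemeral port, then connect()s that socket to the client so the kernel discards datagrams from anyone else. That filtering is exactly the demultiplexing a single-socket server would otherwise have to do by hand.

```python
import socket
import struct
import threading

OP_RRQ, OP_DATA, OP_ACK = 1, 3, 4
BLOCK_SIZE = 512


def send_file(client_addr, payload, local_ip="127.0.0.1"):
    """Serve one read request from a brand-new socket, i.e. a new TID."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((local_ip, 0))      # port 0: the kernel picks an ephemeral port
        sock.connect(client_addr)     # only this client's datagrams reach us now
        sock.settimeout(5.0)
        block = 1                     # sketch: block-number wraparound ignored
        while True:
            chunk = payload[(block - 1) * BLOCK_SIZE:block * BLOCK_SIZE]
            sock.send(struct.pack("!HH", OP_DATA, block) + chunk)
            try:
                reply = sock.recv(516)
            except socket.timeout:
                return                # sketch: a real server would retransmit
            if len(reply) < 4:
                return
            opcode, acked = struct.unpack("!HH", reply[:4])
            if opcode != OP_ACK or acked != block:
                return                # sketch: give up on anything unexpected
            if len(chunk) < BLOCK_SIZE:
                return                # a short (or empty) block ends the transfer
            block += 1


def serve(listener, files):
    """Read RRQs on the well-known socket and start one worker per request."""
    while True:
        packet, client_addr = listener.recvfrom(516)
        if len(packet) < 4 or struct.unpack("!H", packet[:2])[0] != OP_RRQ:
            continue
        filename = packet[2:].split(b"\0")[0].decode("ascii", "replace")
        if filename in files:
            worker = threading.Thread(
                target=send_file, args=(client_addr, files[filename]), daemon=True
            )
            worker.start()
```

A real server would bind the listener to port 69 (normally a root-only operation); for a local experiment any port works. A client that sends an RRQ to the listener receives its DATA packets from a different, ephemeral port, just like 42829 in the capture above.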

TFTP Supports Broadcast Servers
Sort of.
I expect lots of people to run into this issue as enterprises upgrade their Cisco wireless environments to 802.11n, the old gear floods onto eBay, and people attempt to load the autonomous AP code.  There's a peculiarity with the TFTP implementation on the Cisco Aironet autonomous-capable access points.  Consider my AIR-AP1231G-A-K9 trying to download its IOS software from the TFTP server.  The AP's console reports:

process_config_recovery: set IP address and config to default 10.0.0.1
process_config_recovery: image recovery
image_recovery: Download default IOS tar image tftp://255.255.255.255/c1200-k9w7-tar.default
examining image...
extracting info (274 bytes)
Premature end of tar file
ERROR: Image is not a valid IOS image archive.Loading "flash:/c1200-blah-blah-blah

What's going on here?  I'm not the first to notice this problem, but I haven't seen an explanation anywhere yet.  First, notice that the client is hitting its TFTP server at 255.255.255.255.  Cisco's documentation of this process directs that the TFTP server be configured with an address between 10.0.0.2 and 10.0.0.30.  That's because the AP will be running with 10.0.0.1, and it doesn't know exactly where the server will be, so it hits the all-ones address.  Why we're not supposed to use 10.0.0.31, I don't know for sure; my best guess is that the AP uses a /27 mask (255.255.255.224), which would make 10.0.0.31 the broadcast address of the 10.0.0.0/27 subnet.  On the wire we get this:

12:50:54.498194 IP 10.0.0.1.1024 > 255.255.255.255.tftp:  31 RRQ "c1200-k9w7-tar.default" octet
Next, the server unicasts the first chunk of the IOS image to the AP (ephemeral port to ephemeral port again):
12:50:54.547794 IP 10.0.0.5.56216 > 10.0.0.1.1024: UDP, length 516
So far, so good, but we're getting to the point where things come off the rails.  The AP, having just received a packet from 10.0.0.5, should know the server's address.  But it sends the TFTP acknowledgement for the first chunk of the file to the broadcast address, not directly to the server:
12:50:54.606812 IP 10.0.0.1.1024 > 255.255.255.255.56216: UDP, length 4
The TFTP server doesn't hear these ACKs, and both ends repeatedly retransmit their last packet.  I'm not sure whether this should reasonably be expected to work, but...
  • It flies in the face of the process-per-client model that I believe is behind the port asymmetry described above
  • My antique Xyplex terminal servers use broadcast mode TFTP, and get the replies right
  • It doesn't work
Here's the whole exchange.  Everything after the third packet is a retransmission, either of data block 1 or of its ACK:

12:50:54.498194 IP 10.0.0.1.1024 > 255.255.255.255.tftp:  31 RRQ "c1200-k9w7-tar.default" octet
12:50:54.547794 IP 10.0.0.5.56216 > 10.0.0.1.1024: UDP, length 516
12:50:54.606812 IP 10.0.0.1.1024 > 255.255.255.255.56216: UDP, length 4
12:50:55.543296 IP 10.0.0.5.56216 > 10.0.0.1.1024: UDP, length 516
12:50:57.543421 IP 10.0.0.5.56216 > 10.0.0.1.1024: UDP, length 516
12:51:01.543679 IP 10.0.0.5.56216 > 10.0.0.1.1024: UDP, length 516
12:51:01.544614 IP 10.0.0.1.1024 > 255.255.255.255.56216: UDP, length 4
12:51:06.626855 IP 10.0.0.1.1024 > 255.255.255.255.56216: UDP, length 4
12:51:09.544176 IP 10.0.0.5.56216 > 10.0.0.1.1024: UDP, length 516
12:51:14.627345 IP 10.0.0.1.1024 > 255.255.255.255.56216: UDP, length 4
In 20 seconds, the first block of the file got sent 5 times, and was (poorly) acknowledged 4 times, before both ends gave up.

That's what happens with tftp-hpa 0.49, anyway.  Philippe Jounin's TFTPD32 v3.33 doesn't seem to mind this sort of thing:

13:41:02.083839 IP 10.0.0.1.1024 > 255.255.255.255.tftp:  31 RRQ "c1200-k9w7-tar.default" octet
13:41:02.139137 IP 10.0.0.2.4068 > 10.0.0.1.1024: UDP, length 516
13:41:02.197885 IP 10.0.0.1.1024 > 255.255.255.255.4068: UDP, length 4
13:41:02.198033 IP 10.0.0.2.4068 > 10.0.0.1.1024: UDP, length 516
13:41:02.211902 IP 10.0.0.1.1024 > 255.255.255.255.4068: UDP, length 4
13:41:02.211987 IP 10.0.0.2.4068 > 10.0.0.1.1024: UDP, length 516
13:41:02.213473 IP 10.0.0.1.1024 > 255.255.255.255.4068: UDP, length 4
So, when installing new code on your Aironet APs, be careful which TFTP daemon you use.
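For contrast, here is a sketch of what the client side of a read transfer is supposed to do under RFC 1350: send the request (to the broadcast address, if it must), take the source address and port of the first DATA packet as the server's TID, and send every ACK to that unicast address. RFC 1350 also says a packet arriving from any other TID should be answered with an error (code 5, "Unknown transfer ID") without disturbing the transfer. The class and method names are hypothetical, and socket I/O and timeouts are left out so the state handling stands on its own. Judging by the captures, the Aironet ROM client skips the "lock onto the server's address" step and keeps sending its ACKs to 255.255.255.255.

```python
import struct

OP_DATA, OP_ACK, OP_ERROR = 3, 4, 5
ERR_UNKNOWN_TID = 5
BLOCK_SIZE = 512


class ReadTransfer:
    """Client-side state for one read request, following RFC 1350's TID rules."""

    def __init__(self):
        self.server_tid = None      # (ip, port), learned from the first DATA packet
        self.expected_block = 1
        self.data = bytearray()
        self.done = False

    def handle(self, packet, source):
        """Process one datagram from `source`; return (reply, destination) or None."""
        if len(packet) < 4:
            return None
        opcode, block = struct.unpack("!HH", packet[:4])
        if self.server_tid is None:
            if opcode != OP_DATA:
                return None         # sketch: ignore anything before the first DATA
            self.server_tid = source    # lock onto the server's unicast IP and port
        if source != self.server_tid:
            # Stray packet from another TID: answer it, leave the transfer alone.
            error = struct.pack("!HH", OP_ERROR, ERR_UNKNOWN_TID) + b"Unknown transfer ID\0"
            return error, source
        if opcode != OP_DATA:
            return None
        if block == self.expected_block:
            payload = packet[4:]
            self.data += payload
            self.expected_block += 1
            self.done = len(payload) < BLOCK_SIZE
        elif block != self.expected_block - 1:
            return None             # neither the next block nor a duplicate
        # Duplicates get re-ACKed; every ACK goes to the learned unicast TID.
        return struct.pack("!HH", OP_ACK, block), self.server_tid
```

Fed the first DATA packet from 10.0.0.5:56216, this returns an ACK addressed to 10.0.0.5:56216, which tftp-hpa's per-transfer socket would have received.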

5 comments:

  1. I have found a simple solution to this for the LWAPP-to-autonomous issue....

    Change /etc/default/tftpd-hpa from:
    # /etc/default/tftpd-hpa

    TFTP_USERNAME="tftp"
    TFTP_DIRECTORY="/var/lib/tftpboot"
    TFTP_ADDRESS="0.0.0.0:69"
    TFTP_OPTIONS="--secure --timeout 30"
    to:

    # /etc/default/tftpd-hpa

    TFTP_USERNAME="tftp"
    TFTP_DIRECTORY="/var/lib/tftpboot"
    TFTP_ADDRESS="255.255.255.255:69"
    TFTP_OPTIONS="--secure --timeout 30"



    Notice the changing of 0.0.0.0:69 to 255.255.255.255:69. This seemed to work without any issues.

    Enjoy,
    George

  2. Interesting, thank you George!

    That's a subtle change from 0.0.0.0 to 255.255.255.255. I wonder what it does inside the TFTP daemon?

    I'm unable to test (I don't have any APs that need upgrading), but presume this is the same as using --address 255.255.255.255:69

    My distribution doesn't use the /etc/default style configuration parsing.

    Good find!

  3. Ahh many many thanks for this. Saw EXACTLY these symptoms and found this page. The change from George worked like a charm and saved me having to faff about with other daemons.

  4. Thanks very much for the solution. I was tearing my hair out trying to figure out why the TFTP process failed. Even though I haven't tested it on my AP yet, I believe that's it. Great...

  5. Came across the “Premature end of tar file” symptom while converting a 3602i to autonomous mode. Both SolarWinds and tftpd32/64 were causing timeouts during the new image extraction process.
    Tried everything from the “10.0.0.1/27” method to config_recovery through broadcast with the xxx-tar.default image, but no dice.
    As a last act of desperation I used one of the available Cisco routers as a TFTP server, and that solved the problem.
    It seems the root cause lies in the actual implementation of the TFTP client in the AP's ROM mode, but I’m open to correction here.

    Wacha
