CS 144 Checkpoint 4 - Interoperating in the world

“This checkpoint is about testing your TCP implementation in the real world and measuring the long-term statistics of a particular Internet path.”

If your implementation is correct and passes all the previous tests, you might not need to write any code for this checkpoint. However, if you are not using the recommended development environment, you may run into strange environment issues. I am using an arm64 Ubuntu 23.10 devcontainer on my M1 Pro MacBook, and the TCP packets I receive somehow all have incorrect checksums. It took me quite a while to figure out the problem. Eventually, I managed to pass the test on a VPS without changing any code.

There is a lot of supporting code provided for this checkpoint. I find it helpful to understand the whole codebase from end (IP) to end (Byte Stream).

Let’s get started!

The TUN device is a virtual network device provided by the kernel that lets a user-space program send and receive IP packets. We can create one with the following commands:

bash

# Create a TUN device
sudo ip tuntap add mode tun user pcloud dev tun144
# Add an IP address to the TUN device
sudo ip addr add "169.254.144.1/24" dev tun144
# Bring up the TUN device
sudo ip link set dev tun144 up
# Add a route to the TUN device
sudo ip route change  "169.254.144.0/24" dev tun144 rto_min 10ms

Check out scripts/tun.sh for more details.

We can then get a file descriptor for the TUN device by opening /dev/net/tun and configuring it with the ioctl system call (TUNSETIFF). Here is the implementation in tun.cc:

tun.cc

static constexpr const char* CLONEDEV = "/dev/net/tun";

TunTapFD::TunTapFD( const string& devname, const bool is_tun )
  : FileDescriptor( ::CheckSystemCall( "open", open( CLONEDEV, O_RDWR | O_CLOEXEC ) ) )
{
  struct ifreq tun_req
  {};

  tun_req.ifr_flags = static_cast<int16_t>( ( is_tun ? IFF_TUN : IFF_TAP ) | IFF_NO_PI ); // no packetinfo

  // copy devname to ifr_name, making sure to null terminate

  strncpy( static_cast<char*>( tun_req.ifr_name ), devname.data(), IFNAMSIZ - 1 );
  tun_req.ifr_name[IFNAMSIZ - 1] = '\0';

  CheckSystemCall( "ioctl", ioctl( fd_num(), TUNSETIFF, static_cast<void*>( &tun_req ) ) );
}

Now, we can send IP packets by writing to the TUN device, and receive IP packets by reading from the TUN device.
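Each read() on the TUN file descriptor returns one complete IP datagram, and because the device was opened with IFF_NO_PI, the bytes start directly with the IPv4 header (no packet-information prefix). As a rough illustration, here is a minimal sketch of peeking at the fixed 20-byte header of such a buffer. IPv4Summary and peek_ipv4_header are hypothetical names made up for this post; minnow does its own parsing (the parse( ip_dgram, buffers ) call in the adapter’s read()).

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical summary of the fixed IPv4 header fields (not a minnow type).
struct IPv4Summary
{
  uint8_t version {};
  uint8_t header_len_bytes {}; // IHL field * 4
  uint16_t total_len {};       // header + payload, in bytes
  uint8_t protocol {};         // 6 = TCP, 17 = UDP
  uint32_t src {};             // addresses in host byte order
  uint32_t dst {};
};

// Read a big-endian (network order) 16-bit field starting at raw[i].
static uint16_t be16( const std::string& raw, size_t i )
{
  return static_cast<uint16_t>( ( static_cast<uint8_t>( raw[i] ) << 8 ) | static_cast<uint8_t>( raw[i + 1] ) );
}

// Read a big-endian 32-bit field starting at raw[i].
static uint32_t be32( const std::string& raw, size_t i )
{
  return ( static_cast<uint32_t>( be16( raw, i ) ) << 16 ) | be16( raw, i + 2 );
}

// Inspect one buffer returned by read() on the TUN fd. With IFF_NO_PI the
// buffer starts directly with the IPv4 header.
std::optional<IPv4Summary> peek_ipv4_header( const std::string& raw )
{
  if ( raw.size() < 20 ) {
    return std::nullopt; // too short for the fixed 20-byte header
  }
  IPv4Summary s;
  const auto first = static_cast<uint8_t>( raw[0] );
  s.version = static_cast<uint8_t>( first >> 4 );
  s.header_len_bytes = static_cast<uint8_t>( ( first & 0x0F ) * 4 );
  s.total_len = be16( raw, 2 );
  s.protocol = static_cast<uint8_t>( raw[9] );
  s.src = be32( raw, 12 );
  s.dst = be32( raw, 16 );
  if ( s.version != 4 or s.header_len_bytes < 20 or s.total_len < s.header_len_bytes
       or s.total_len > raw.size() ) {
    return std::nullopt; // not a well-formed IPv4 datagram
  }
  return s;
}
```

Printing these fields for every datagram read from tun144 is a cheap way to confirm that the kernel is handing you what you expect before blaming your TCP code.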

The TUN device only carries IP packets, so each TCP segment must be wrapped in an IPv4 datagram before it is written to the TUN device.

tuntap_adapter.hh

class TCPOverIPv4OverTunFdAdapter : public TCPOverIPv4Adapter
{
  //! Creates an IPv4 datagram from a TCP segment and writes it to the TUN device
  void write( const TCPMessage& seg ) { _tun.write( serialize( wrap_tcp_in_ip( seg ) ) ); }
};

The wrap_tcp_in_ip function fills in the port numbers and the IP header (addresses and total length), computes the TCP checksum using the IP pseudo-header, and computes the IP header checksum.

tcp_over_ip.cc

//! Takes a TCP segment, sets port numbers as necessary, and wraps it in an IPv4 datagram
//! \param[in] seg is the TCP segment to convert
InternetDatagram TCPOverIPv4Adapter::wrap_tcp_in_ip( const TCPMessage& msg )
{
  TCPSegment seg { .message = msg };
  // set the port numbers in the TCP segment
  seg.udinfo.src_port = config().source.port();
  seg.udinfo.dst_port = config().destination.port();

  // create an Internet Datagram and set its addresses and length
  InternetDatagram ip_dgram;
  ip_dgram.header.src = config().source.ipv4_numeric();
  ip_dgram.header.dst = config().destination.ipv4_numeric();
  ip_dgram.header.len = ip_dgram.header.hlen * 4 + 20 /* tcp header len */ + seg.message.sender.payload.size();

  // set payload, calculating TCP checksum using information from IP header
  seg.compute_checksum( ip_dgram.header.pseudo_checksum() );
  ip_dgram.header.compute_checksum();
  ip_dgram.payload = serialize( seg );

  return ip_dgram;
}
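Since incorrect checksums were exactly what bit me, it helps to see what is being computed. Below is a minimal sketch of the RFC 1071 Internet checksum, which both the IPv4 header and TCP use, plus the sum of the TCP pseudo-header (source address, destination address, protocol number 6, TCP length). This is my own illustration, not minnow’s code: internet_checksum and tcp_pseudo_header_sum are hypothetical names, and I am only assuming that the pseudo-header sum plays a role similar to ip_dgram.header.pseudo_checksum() above.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// RFC 1071 Internet checksum: the one's complement of the one's-complement
// sum of the data taken as 16-bit big-endian words. `initial` lets the caller
// fold in extra words, such as the TCP pseudo-header sum.
uint16_t internet_checksum( const std::string& data, uint32_t initial = 0 )
{
  uint32_t sum = initial;
  for ( size_t i = 0; i + 1 < data.size(); i += 2 ) {
    sum += static_cast<uint32_t>( ( static_cast<uint8_t>( data[i] ) << 8 ) | static_cast<uint8_t>( data[i + 1] ) );
  }
  if ( data.size() % 2 == 1 ) {
    // Odd length: treat the last byte as the high half of a zero-padded word.
    sum += static_cast<uint32_t>( static_cast<uint8_t>( data.back() ) << 8 );
  }
  while ( sum >> 16 ) {
    sum = ( sum & 0xFFFF ) + ( sum >> 16 ); // end-around carry
  }
  return static_cast<uint16_t>( ~sum );
}

// Sum of the TCP pseudo-header words: source IP, destination IP,
// zero byte + protocol number (6 for TCP), and the TCP length in bytes
// (TCP header + payload). Addresses are in host byte order.
uint32_t tcp_pseudo_header_sum( uint32_t src_ip, uint32_t dst_ip, uint16_t tcp_length )
{
  return ( src_ip >> 16 ) + ( src_ip & 0xFFFF ) + ( dst_ip >> 16 ) + ( dst_ip & 0xFFFF ) + 6 + tcp_length;
}
```

To checksum a TCP segment, zero its checksum field (bytes 16 and 17 of the TCP header), pass the pseudo-header sum as `initial`, and store the result in network byte order. The TCP length is the IP total length minus the IP header length; with no options that is 20 + payload bytes, matching the length computation in wrap_tcp_in_ip. Running the same function over a segment whose checksum is correct yields 0, which is a handy sanity check when every received segment seems to fail.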

When we read a received IP packet from the TUN device, we need to unwrap the IP packet and extract the TCP segment.

tcp_over_ip.cc

class TCPOverIPv4OverTunFdAdapter : public TCPOverIPv4Adapter
{
  optional<TCPMessage> read()
  {
    vector<string> strs( 2 );
    strs.front().resize( IPv4Header::LENGTH );
    _tun.read( strs );

    InternetDatagram ip_dgram;
    const vector<string> buffers = { strs.at( 0 ), strs.at( 1 ) };
    if ( parse( ip_dgram, buffers ) ) {
      return unwrap_tcp_in_ip( ip_dgram );
    }
    return {};
  }
};

The unwrap_tcp_in_ip function attempts to parse a TCP segment from the IP datagram’s payload. If this succeeds, it then checks that the received segment is related to the current connection. When a TCP connection has been established, this means checking that the source and destination ports in the TCP header are correct. You can find the implementation in tcp_over_ip.cc.
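Here is a minimal sketch of this kind of filtering, assuming the datagram has already been parsed and the connection is established. Endpoints, IncomingTCP, and belongs_to_connection are hypothetical; I also check the IP protocol number and addresses, which goes beyond the port check described above, and minnow’s actual unwrap_tcp_in_ip may handle details (such as a listening socket that does not know its peer yet) differently.

```cpp
#include <cstdint>

// Hypothetical endpoints of an established connection (not minnow's FdAdapterConfig).
struct Endpoints
{
  uint32_t local_ip;
  uint16_t local_port;
  uint32_t remote_ip;
  uint16_t remote_port;
};

// Hypothetical fields taken from an already-parsed IPv4 datagram and TCP header.
struct IncomingTCP
{
  uint32_t ip_src;
  uint32_t ip_dst;
  uint8_t ip_protocol;
  uint16_t src_port;
  uint16_t dst_port;
};

// Accept a segment only if it is TCP and travels from the peer to us: the
// peer's address and port must be the source, ours the destination.
bool belongs_to_connection( const Endpoints& conn, const IncomingTCP& in )
{
  constexpr uint8_t protocol_tcp = 6;
  return in.ip_protocol == protocol_tcp and in.ip_src == conn.remote_ip and in.src_port == conn.remote_port
         and in.ip_dst == conn.local_ip and in.dst_port == conn.local_port;
}
```

Note the direction: an incoming segment has the peer as its source, so the ports are swapped relative to the ones wrap_tcp_in_ip writes into outgoing segments.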

In minnow, all the network operations happen in a separate background thread. The foreground thread (the “main thread”) runs the application logic: it connects or listens, and writes to and reads from a reliable byte stream using the public interface of the TCPMinnowSocket class:

tcp_minnow_socket.hh

template<TCPDatagramAdapter AdaptT>
class TCPMinnowSocket
{
public:
  //! Construct from the interface that the TCPPeer thread will use to read and write datagrams
  explicit TCPMinnowSocket( AdaptT&& datagram_interface );

  //! Close socket, and wait for TCPPeer to finish
  //! \note Calling this function is only advisable if the socket has reached EOF,
  //! or else may wait forever for remote peer to close the TCP connection.
  void wait_until_closed();

  //! Connect using the specified configurations; blocks until connect succeeds or fails
  void connect( const TCPConfig& c_tcp, const FdAdapterConfig& c_ad );

  //! Listen and accept using the specified configurations; blocks until accept succeeds or fails
  void listen_and_accept( const TCPConfig& c_tcp, const FdAdapterConfig& c_ad );

  // Inherited from FileDescriptor
  // Read into `buffer`
  void read( std::string& buffer );
  void read( std::vector<std::string>& buffers );

  // Inherited from FileDescriptor
  // Attempt to write a buffer
  // returns number of bytes written
  size_t write( std::string_view buffer );
  size_t write( const std::vector<std::string_view>& buffers );
  size_t write( const std::vector<std::string>& buffers );

  //! When a connected socket is destructed, it will send a RST
  ~TCPMinnowSocket();

  //! \name
  //! This object cannot be safely moved or copied, since it is in use by two threads simultaneously

  //!@{
  TCPMinnowSocket( const TCPMinnowSocket& ) = delete;
  TCPMinnowSocket( TCPMinnowSocket&& ) = delete;
  TCPMinnowSocket& operator=( const TCPMinnowSocket& ) = delete;
  TCPMinnowSocket& operator=( TCPMinnowSocket&& ) = delete;
  //!@}

  //! \name
  //! Some methods of the parent Socket wouldn't work as expected on the TCP socket, so delete them

  //!@{
  void bind( const Address& address ) = delete;
  Address local_address() const = delete;
  void set_reuseaddr() = delete;
  //!@}

  // Return peer address from underlying datagram adapter
  const Address& peer_address() const { return _datagram_adapter.config().destination; }
};

In practice, AdaptT (a type satisfying the TCPDatagramAdapter concept) is the TCPOverIPv4OverTunFdAdapter we discussed earlier, which is responsible for reading IP packets from and writing them to the TUN device.

The background thread takes care of the back-end tasks that the kernel would perform for a TCPSocket: reading and parsing datagrams from the wire, filtering out segments unrelated to the connection, etc.

cpp

class TCPPeer {
  void receive( TCPMessage msg, const TransmitFunction& transmit )
  {
    if ( not active() ) {
      return;
    }

    // Record time in case this peer has to linger after streams finish.
    time_of_last_receipt_ = cumulative_time_;

    // If SenderMessage occupies a sequence number, make sure to reply.
    need_send_ |= ( msg.sender.sequence_length() > 0 );

    // If SenderMessage is a "keep-alive" (with intentionally invalid seqno), make sure to reply.
    // (N.B. orthodox TCP rules require a reply on any unacceptable segment.)
    const auto our_ackno = receiver_.send().ackno;
    need_send_ |= ( our_ackno.has_value() and msg.sender.seqno + 1 == our_ackno.value() );

    // Did the inbound stream finish before the outbound stream? If so, no need to linger after streams finish.
    if ( receiver_.writer().is_closed() and not sender_.reader().is_finished() ) {
      linger_after_streams_finish_ = false;
    }

    // Give incoming TCPSenderMessage to receiver.
    receiver_.receive( std::move( msg.sender ) );

    // Give incoming TCPReceiverMessage to sender.
    sender_.receive( msg.receiver );

    // Send reply if needed.
    if ( need_send_ ) {
      send( sender_.make_empty_message(), transmit );
    }
  }
};

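TCPPeer::receive is called whenever a new TCP message arrives. It first decides whether a reply will be needed, then gives the sender half of the message to our TCPReceiver and the receiver half (ackno and window size) to our TCPSender. Finally, if a reply is needed, it sends an empty segment carrying the current ackno. The transmit function is just the TCPOverIPv4OverTunFdAdapter::write function we saw earlier.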

cpp

  void send( const TCPSenderMessage& sender_message, const TransmitFunction& transmit )
  {
    TCPMessage msg { sender_message, receiver_.send() };
    transmit( std::move( msg ) );
    need_send_ = false;
  }
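The reply decision in TCPPeer::receive can be isolated into a small function. This is a sketch with simplified, hypothetical types (SenderMsg, needs_reply): raw uint32_t sequence numbers stand in for minnow’s Wrap32 so that seqno + 1 wraps modulo 2^32. As in TCPPeer::receive, our_ackno has to be read before the receiver processes the incoming message.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for TCPSenderMessage: only the fields the reply
// decision needs. seqno is a raw 32-bit value, so arithmetic wraps mod 2^32.
struct SenderMsg
{
  uint32_t seqno {};
  bool syn {};
  uint64_t payload_len {};
  bool fin {};

  // SYN and FIN each occupy one sequence number, like payload bytes.
  uint64_t sequence_length() const { return syn + payload_len + fin; }
};

// Mirrors the two checks in TCPPeer::receive: reply if the segment occupies
// sequence space, or if it looks like a keep-alive (seqno one below our ackno).
bool needs_reply( const SenderMsg& msg, std::optional<uint32_t> our_ackno )
{
  const bool occupies_seqnos = msg.sequence_length() > 0;
  const bool keep_alive = our_ackno.has_value() and static_cast<uint32_t>( msg.seqno + 1 ) == *our_ackno;
  return occupies_seqnos or keep_alive;
}
```

A pure ACK (no SYN, FIN, or payload) with an in-window seqno does not trigger a reply, which is what keeps two peers from ACKing each other’s ACKs forever.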

How do the main thread and the background thread communicate? They use the socketpair system call to create a pair of connected Unix-domain sockets. The call returns two file descriptors, one for each end, and each thread communicates by writing to and reading from its own end.

tcp_minnow_socket_impl.hh

//! \brief Call [socketpair](\ref man2::socketpair) and return connected Unix-domain sockets of specified type
//! \param[in] type is the type of AF_UNIX sockets to create (e.g., SOCK_SEQPACKET)
//! \returns a std::pair of connected sockets
template<std::derived_from<Socket> SocketType>
inline std::pair<SocketType, SocketType> socket_pair_helper( int domain, int type, int protocol = 0 )
{
  std::array<int, 2> fds {};
  CheckSystemCall( "socketpair", ::socketpair( domain, type, protocol, fds.data() ) );
  return { SocketType { FileDescriptor { fds[0] } }, SocketType { FileDescriptor { fds[1] } } };
}

//! \param[in] datagram_interface is the underlying interface (e.g. to UDP, IP, or Ethernet)
template<TCPDatagramAdapter AdaptT>
TCPMinnowSocket<AdaptT>::TCPMinnowSocket( AdaptT&& datagram_interface )
  : TCPMinnowSocket( socket_pair_helper<LocalStreamSocket>( AF_UNIX, SOCK_STREAM ),
                     std::move( datagram_interface ) )
{}
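To see the mechanism in isolation, here is a minimal sketch using raw POSIX calls rather than minnow’s Socket wrappers: the main thread writes a request into one end of an AF_UNIX SOCK_STREAM socketpair, and a background thread reads it from the other end and writes back an upper-cased reply. round_trip_through_socketpair is a hypothetical name, and the sketch assumes a small, non-empty message that arrives in a single read().

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <sys/socket.h>
#include <thread>
#include <unistd.h>

// Sketch: assumes a non-empty request shorter than 256 bytes.
std::string round_trip_through_socketpair( const std::string& request )
{
  std::array<int, 2> fds {};
  if ( ::socketpair( AF_UNIX, SOCK_STREAM, 0, fds.data() ) != 0 ) {
    throw std::runtime_error( "socketpair failed" );
  }

  // Background thread: uses fds[1], like the TCPPeer thread uses its end.
  std::thread background( [fd = fds[1]] {
    char buf[256];
    const ssize_t n = ::read( fd, buf, sizeof( buf ) );
    std::string reply( buf, n > 0 ? static_cast<size_t>( n ) : 0 );
    for ( char& ch : reply ) {
      ch = static_cast<char>( std::toupper( static_cast<unsigned char>( ch ) ) );
    }
    const ssize_t written = ::write( fd, reply.data(), reply.size() );
    (void)written; // sketch: short writes are ignored
  } );

  // Main thread: uses fds[0], like the application side of TCPMinnowSocket.
  const ssize_t sent = ::write( fds[0], request.data(), request.size() );
  (void)sent;
  char buf[256];
  const ssize_t n = ::read( fds[0], buf, sizeof( buf ) );
  background.join();
  ::close( fds[0] );
  ::close( fds[1] );
  return std::string( buf, n > 0 ? static_cast<size_t>( n ) : 0 );
}
```

Because both ends are ordinary file descriptors, the main thread can treat its end exactly like a kernel TCP socket, which is why TCPMinnowSocket can expose read() and write() inherited from FileDescriptor.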

Who calls TCPPeer::receive? The background thread runs an event loop that uses the poll system call to wait for events on the TUN device and on its end of the socketpair. When a new packet arrives, or the main thread writes new data, the TCPPeer is notified. Check out EventLoop::wait_next_event in eventloop.cc and TCPMinnowSocket<AdaptT>::_initialize_TCP in tcp_minnow_socket_impl.hh for more details. Essentially, there are three events to handle:

  1. Incoming datagram received (needs to be given to TCPPeer::receive method)
  2. Outbound bytes received from the main thread via a write() call (needs to be read from the socketpair and given to TCPPeer)
  3. Incoming bytes reassembled by the Reassembler (needs to be read from the inbound_stream and written to the socketpair back to the main thread)

Now let’s put everything together: what happens behind these few lines of code?

webget.cc

auto socket = CS144TCPSocket();
socket.connect( { host, "http"s } );
socket.write( std::format( "GET {} HTTP/1.1\r\n"s, path ) );
socket.write( std::format( "Host: {}\r\n"s, host ) );
socket.write( std::format( "Connection: close\r\n\r\n"s ) );
string buffer;
while ( !socket.eof() ) {
  socket.read( buffer );
  cout << buffer;
}
socket.wait_until_closed();

socket.connect starts the three-way handshake with the server: it sends a SYN segment and then runs the event loop until the SYN has been acknowledged by the server’s SYN-ACK.

tcp_minnow_socket_impl.hh

std::cerr << "DEBUG: minnow connecting to " << c_ad.destination.to_string() << "...\n";

if ( not _tcp.has_value() ) {
  throw std::runtime_error( "TCPPeer not successfully initialized" );
}

_tcp->push( [&]( auto x ) { _datagram_adapter.write( x ); } );

if ( _tcp->sender().sequence_numbers_in_flight() != 1 ) {
  throw std::runtime_error( "After TCPConnection::connect(), expected sequence_numbers_in_flight() == 1" );
}

_tcp_loop( [&] { return _tcp->sender().sequence_numbers_in_flight() == 1; } );
if ( _tcp->inbound_reader().has_error() ) {
  std::cerr << "DEBUG: minnow error on connecting to " << c_ad.destination.to_string() << ".\n";
} else {
  std::cerr << "DEBUG: minnow successfully connected to " << c_ad.destination.to_string() << ".\n";
}

If the connection is successful, it launches the background thread to handle the event loop.
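Why does connect() expect exactly one sequence number in flight? The SYN occupies one sequence number, and it stays outstanding until a segment arrives with ackno = ISN + 1. The toy model below mimics the loop that waits while sequence_numbers_in_flight() == 1; ToySender and wait_for_handshake are hypothetical and much simpler than minnow’s TCPSender and _tcp_loop.

```cpp
#include <cstdint>
#include <deque>

// Toy sender: tracks absolute sequence numbers, where the SYN is 0.
struct ToySender
{
  uint32_t isn;
  uint64_t next_abs_seqno = 0;
  uint64_t acked_abs_seqno = 0;

  void send_syn() { next_abs_seqno = 1; } // the SYN consumes one sequence number

  uint64_t sequence_numbers_in_flight() const { return next_abs_seqno - acked_abs_seqno; }

  void receive_ackno( uint32_t ackno )
  {
    // Unwrap relative to the ISN (fine near the start of a connection).
    const uint64_t abs = static_cast<uint32_t>( ackno - isn );
    if ( abs <= next_abs_seqno and abs > acked_abs_seqno ) {
      acked_abs_seqno = abs; // ignore acknos for data we never sent
    }
  }
};

// Like connect(): keep processing incoming acknos while the SYN is outstanding.
bool wait_for_handshake( ToySender& s, std::deque<uint32_t> incoming_acknos )
{
  while ( s.sequence_numbers_in_flight() == 1 ) {
    if ( incoming_acknos.empty() ) {
      return false; // the real loop would block or time out here
    }
    s.receive_ackno( incoming_acknos.front() );
    incoming_acknos.pop_front();
  }
  return true;
}
```

With an ISN of 0xFFFFFFFF, the SYN-ACK must carry ackno 0, because ISN + 1 wraps around; a bogus ackno is simply ignored and the loop keeps waiting.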

Every time we call socket.write, the main thread writes the data to its end of the socketpair; the background thread reads it from the other end and gives it to the TCPPeer, and the TCPSender we implemented turns it into segments that are written to the TUN device. Every time a packet arrives from the TUN device, it is handed to the TCPReceiver and TCPSender we implemented.

When we call socket.read, the main thread reads from its end of the socketpair, where the background thread has written the bytes reassembled by our Reassembler.

Finally, when we call socket.wait_until_closed, the main thread shuts down its end of the socketpair and joins the background thread.

socket.cc

template<TCPDatagramAdapter AdaptT>
void TCPMinnowSocket<AdaptT>::wait_until_closed()
{
  shutdown( SHUT_RDWR );
  if ( _tcp_thread.joinable() ) {
    std::cerr << "DEBUG: minnow waiting for clean shutdown... ";
    _tcp_thread.join();
    std::cerr << "done.\n";
  }
}
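The shutdown-then-join order matters: the background thread only finishes once it sees end-of-file on its end of the socketpair. Here is a minimal sketch of that pattern with raw POSIX calls; bytes_seen_before_shutdown is a hypothetical name, and where this sketch merely counts bytes, I believe minnow’s background thread reacts to EOF by closing the outbound stream so that a FIN eventually goes out.

```cpp
#include <cstddef>
#include <string>
#include <sys/socket.h>
#include <thread>
#include <unistd.h>

// The background thread reads until read() returns 0 (EOF), which happens
// once the main thread calls shutdown() on its end of the socketpair.
size_t bytes_seen_before_shutdown( const std::string& data )
{
  int fds[2];
  if ( ::socketpair( AF_UNIX, SOCK_STREAM, 0, fds ) != 0 ) {
    return 0;
  }

  size_t total = 0;
  std::thread background( [&total, fd = fds[1]] {
    char buf[64];
    ssize_t n = 0;
    while ( ( n = ::read( fd, buf, sizeof( buf ) ) ) > 0 ) {
      total += static_cast<size_t>( n );
    }
    // n == 0: the main thread shut down its side, so the thread can exit.
  } );

  const ssize_t written = ::write( fds[0], data.data(), data.size() );
  (void)written;                  // sketch: assume the data fits in the socket buffer
  ::shutdown( fds[0], SHUT_RDWR ); // like wait_until_closed(): no more data from this side
  background.join();               // returns once the background thread has seen EOF
  ::close( fds[0] );
  ::close( fds[1] );
  return total;
}
```

If the join came before the shutdown, the background thread would block in read() forever and the program would hang, which is why wait_until_closed shuts down first.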

That’s it! Hopefully this helps you debug your TCP implementation in the real world.

This concludes Checkpoint 4.