OSI Layers, Routing Protocols, and How Packets Actually Move

Contents

  1. The OSI Model — all 7 layers
  2. Encapsulation & the data unit journey
  3. Layer 3 in depth: IP, subnetting, ARP
  4. Layer 4 in depth: TCP vs UDP
  5. Routing protocols: RIP, OSPF, BGP
  6. Modern angle: SDN and overlay networks
  7. Why this matters for ML systems

Network fundamentals are one of those things that get taught once in school and then quietly drift into "I'll re-learn this when I need it" territory. But if you're building or optimizing distributed systems — especially ML inference pipelines — the OSI model stops being a trivia question and starts being a debugging checklist. This is my attempt to write the reference I wish I'd had.

The OSI Model — All 7 Layers

The Open Systems Interconnection model partitions network communication into seven abstraction layers. Each layer depends only on the layer below it and exposes a clean interface upward. In practice, TCP/IP collapses these into four, but the seven-layer model gives you precise vocabulary for where things break.

# | Layer | Data Unit | Key Protocols / Examples | What it does
7 | Application | Data | HTTP, HTTPS, DNS, SMTP, FTP, gRPC | User-facing services; defines message format and app-level semantics.
6 | Presentation | Data | TLS/SSL, JPEG, Protobuf encoding | Translation, encryption, compression. Ensures sender and receiver agree on data format.
5 | Session | Data | NetBIOS, RPC, NFS | Manages sessions (open/close/checkpoint). Often baked into L7 protocols today.
4 | Transport | Segment (TCP) / Datagram (UDP) | TCP, UDP, QUIC, SCTP | End-to-end delivery, port multiplexing, flow control, reliability (TCP) or not (UDP).
3 | Network | Packet | IP (v4/v6), ICMP, OSPF, BGP (BGP itself rides on TCP port 179, but its job is L3 routing) | Logical addressing (IP), routing across networks. Routers live here.
2 | Data Link | Frame | Ethernet, Wi-Fi (802.11), ARP, PPP | Node-to-node delivery on a single link. MAC addressing, error detection (CRC). Switches live here.
1 | Physical | Bit | Ethernet cable, fiber, 802.11 radio, USB | Electrical, optical, or radio transmission. Bit timing and signal encoding.
Mnemonic (top-down): "All People Seem To Need Data Processing" — Application, Presentation, Session, Transport, Network, Data Link, Physical.

Encapsulation & the Data Unit Journey

When you send an HTTP request, each layer wraps the data from the layer above in its own header (and sometimes trailer). This wrapping is called encapsulation. On the receiving end, each layer strips its header and passes the payload up.

[HTTP request]                              ← Application data
[TCP header | HTTP data]                    ← Segment (L4 adds port numbers, seq no.)
[IP header | TCP header | HTTP data]        ← Packet  (L3 adds src/dst IP)
[ETH header | IP | TCP | HTTP | ETH FCS]    ← Frame   (L2 adds MAC addresses, CRC trailer)
[bits on the wire / optical signal]         ← Physical transmission

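Each header adds overhead. Every TCP segment carried over IPv4 and Ethernet has at least 54 bytes of headers (14 Ethernet + 20 IPv4 + 20 TCP, without options) before a single byte of application data, plus a 4-byte Ethernet FCS trailer. In high-throughput systems (ML serving, HFT), especially with small messages, that fixed cost matters.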
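To make the encapsulation and the 54-byte figure concrete, here is a minimal sketch that packs TCP, IPv4, and Ethernet headers by hand with Python's struct module, then strips them again. The MAC/IP addresses and ports are hypothetical, options are omitted, and the IP/TCP checksums are left at zero, so these are not valid packets for a real wire; the FCS uses zlib.crc32 purely for illustration, since NICs normally compute and check it in hardware.

```python
import socket
import struct
import zlib

# Hypothetical addresses and ports, for illustration only.
SRC_MAC, DST_MAC = bytes.fromhex("aabbccddeeff"), bytes.fromhex("112233445566")
SRC_IP, DST_IP = socket.inet_aton("192.168.1.10"), socket.inet_aton("192.168.1.5")

def tcp_segment(payload: bytes) -> bytes:
    # L4: 20-byte TCP header, no options. Checksum left at 0 for brevity.
    offset_flags = (5 << 12) | 0x18  # data offset = 5 x 32-bit words; PSH+ACK flags
    header = struct.pack("!HHIIHHHH", 49152, 80, 1, 1, offset_flags, 65535, 0, 0)
    return header + payload

def ip_packet(segment: bytes) -> bytes:
    # L3: 20-byte IPv4 header, no options. 0x45 = version 4, IHL 5; protocol 6 = TCP.
    header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(segment), 0, 0, 64, 6, 0, SRC_IP, DST_IP)
    return header + segment

def eth_frame(packet: bytes) -> bytes:
    # L2: 14-byte header (dst MAC, src MAC, EtherType 0x0800 = IPv4) plus a 4-byte CRC-32 trailer.
    frame = struct.pack("!6s6sH", DST_MAC, SRC_MAC, 0x0800) + packet
    return frame + struct.pack("<I", zlib.crc32(frame))

http = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
frame = eth_frame(ip_packet(tcp_segment(http)))  # encapsulate top-down
print(len(frame) - len(http))  # 58 = 14 (Ethernet) + 20 (IPv4) + 20 (TCP) + 4 (FCS)

# Decapsulation: the receiver strips the 54 bytes of headers and the 4-byte trailer.
recovered = frame[14 + 20 + 20:-4]
print(recovered == http)  # True
```

The overhead is fixed per frame, so a 40-byte request pays more than its own size in headers while a full-size segment amortizes them.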

Layer 3 In Depth: IP, Subnetting, ARP

IP Addressing

IPv4 uses 32-bit addresses written in dotted-decimal notation (192.168.1.1). IPv6 uses 128-bit addresses in hex (2001:db8::1). An IP address consists of a network prefix and a host identifier, and CIDR notation gives the prefix length: 10.0.0.0/24 means the first 24 bits are the network, leaving 8 bits for hosts. That is 2^8 = 256 addresses, of which 254 are usable because the all-zeros host address (the network address) and the all-ones host address (the broadcast address) are reserved.
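Python's standard ipaddress module is a convenient way to check CIDR arithmetic like this. A short sketch using the 10.0.0.0/24 example from the text; the probe address 10.0.0.77 is arbitrary.

```python
import ipaddress

net = ipaddress.ip_network("10.0.0.0/24")
print(net.prefixlen)        # 24 network bits
print(net.num_addresses)    # 256 = 2 ** (32 - 24)

hosts = list(net.hosts())   # excludes the network (.0) and broadcast (.255) addresses
print(len(hosts), hosts[0], hosts[-1])  # 254 10.0.0.1 10.0.0.254

print(ipaddress.ip_address("10.0.0.77") in net)  # True: same 24-bit prefix

# IPv6 addresses are 128 bits; runs of zero groups compress to "::".
print(ipaddress.ip_address("2001:0db8:0000:0000:0000:0000:0000:0001"))  # 2001:db8::1
```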

Subnetting

Subnetting divides a network into smaller blocks to shrink broadcast domains and make it easier to enforce security boundaries between segments. A subnet mask (255.255.255.0 = /24) ANDed with an IP address yields the network address. Key subnetting math:
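The mask-and-AND arithmetic can be sketched with plain integer operations. The host address 192.168.1.130 is an arbitrary example; ipaddress is used only for dotted-decimal conversion and, at the end, for splitting a /24 into /26 blocks.

```python
import ipaddress

def to_int(dotted: str) -> int:
    # Dotted-decimal -> 32-bit integer.
    return int(ipaddress.IPv4Address(dotted))

def to_dotted(value: int) -> str:
    return str(ipaddress.IPv4Address(value))

def prefix_to_mask(prefix: int) -> int:
    # /24 -> 24 one-bits followed by 8 zero-bits = 255.255.255.0
    return (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF

prefix = 24
ip = to_int("192.168.1.130")
mask = prefix_to_mask(prefix)
network = ip & mask                            # AND with the mask -> network address
broadcast = network | (~mask & 0xFFFFFFFF)     # set all host bits -> broadcast address
usable_hosts = 2 ** (32 - prefix) - 2          # minus network and broadcast addresses

print(to_dotted(mask), to_dotted(network), to_dotted(broadcast), usable_hosts)
# 255.255.255.0 192.168.1.0 192.168.1.255 254

# Borrowing 2 host bits splits the /24 into 2**2 = 4 subnets of /26 (62 usable hosts each).
subnets = list(ipaddress.ip_network("192.168.1.0/24").subnets(new_prefix=26))
print([str(s) for s in subnets])
# ['192.168.1.0/26', '192.168.1.64/26', '192.168.1.128/26', '192.168.1.192/26']
```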

ARP — Address Resolution Protocol

IP gives you a logical destination, but frames need MAC addresses to traverse a single link. ARP bridges this gap. When host A needs to send to 192.168.1.5 on its own subnet (for an off-subnet destination, A resolves its default gateway's MAC instead):

  1. A broadcasts: "Who has 192.168.1.5? Tell AA:BB:CC:DD:EE:FF."
  2. The target replies to A with a unicast ARP reply containing its MAC address.
  3. A caches the mapping in its ARP table (inspect it with arp -n, or ip neigh on modern Linux).
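The three steps above can be sketched by building the actual ARP frames with struct, following the RFC 826 layout for Ethernet/IPv4 (a 28-byte payload after a 14-byte Ethernet header). A's IP 192.168.1.10 and the target MAC 11:22:33:44:55:66 are hypothetical, and no frames are actually sent; real NICs also pad these 42-byte frames to the 60-byte Ethernet minimum.

```python
import socket
import struct

BROADCAST = b"\xff" * 6
ARP_FMT = "!HHBBH6s4s6s4s"  # RFC 826 fields for Ethernet/IPv4: 28 bytes

def arp_frame(op, sha, spa, tha, tpa, dst):
    payload = struct.pack(
        ARP_FMT,
        1, 0x0800,   # hardware type Ethernet, protocol type IPv4
        6, 4,        # MAC and IPv4 address lengths
        op,          # 1 = request, 2 = reply
        sha, socket.inet_aton(spa),  # sender MAC / IP
        tha, socket.inet_aton(tpa),  # target MAC / IP
    )
    # Ethernet header: destination MAC, source MAC, EtherType 0x0806 = ARP.
    return struct.pack("!6s6sH", dst, sha, 0x0806) + payload

def parse_arp(frame):
    fields = struct.unpack(ARP_FMT, frame[14:14 + 28])
    op, sha, spa = fields[4], fields[5], fields[6]
    return op, socket.inet_ntoa(spa), sha.hex(":")

A_MAC, B_MAC = bytes.fromhex("aabbccddeeff"), bytes.fromhex("112233445566")

# 1. A broadcasts a request; the target MAC field is zeros because it is unknown.
request = arp_frame(1, A_MAC, "192.168.1.10", b"\x00" * 6, "192.168.1.5", BROADCAST)
# 2. The owner of 192.168.1.5 answers with a unicast reply addressed to A.
reply = arp_frame(2, B_MAC, "192.168.1.5", A_MAC, "192.168.1.10", A_MAC)
# 3. A caches the sender's IP -> MAC mapping from the reply.
arp_table = {}
op, ip, mac = parse_arp(reply)
if op == 2:
    arp_table[ip] = mac
print(len(request), arp_table)  # 42 {'192.168.1.5': '11:22:33:44:55:66'}
```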

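Gratuitous ARP (an unsolicited announcement of a host's own IP-to-MAC mapping, sent as a request or reply that nobody asked for) is used for duplicate-address detection and to refresh neighbors' caches after an address moves. That second use is the basis of many L2 failover schemes, such as VRRP virtual IPs.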

Layer 4 In Depth: TCP vs UDP

TCP — Transmission Control Protocol

  • Connection-oriented (3-way handshake: SYN, SYN-ACK, ACK)
  • Reliable: retransmits lost segments
  • Ordered delivery via sequence numbers
  • Flow control (receiver window) + congestion control (CWND, slow start, CUBIC/BBR)
  • 20-byte minimum header
  • Use: HTTP, databases, SSH
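The 20-byte header and the handshake fit in one small sketch: it packs the three handshake segments and shows how the sequence and acknowledgment numbers advance by one for each SYN. The ports and initial sequence numbers (ISNs) are hypothetical and checksums are left at zero; real stacks randomize ISNs and usually add options (MSS, window scaling) to SYNs, so real SYN headers are longer than 20 bytes.

```python
import struct

FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

def build(src, dst, seq, ack, flags):
    offset_flags = (5 << 12) | flags  # data offset 5 x 32-bit words = 20 bytes, no options
    return struct.pack("!HHIIHHHH", src, dst, seq, ack, offset_flags, 65535, 0, 0)

def parse(header):
    src, dst, seq, ack, offset_flags, _window, _csum, _urg = struct.unpack("!HHIIHHHH", header[:20])
    names = [name for name, bit in FLAGS.items() if offset_flags & bit]
    return {"src": src, "dst": dst, "seq": seq, "ack": ack,
            "header_len": (offset_flags >> 12) * 4, "flags": names}

# 3-way handshake with hypothetical ISNs: each SYN consumes one sequence number.
client_isn, server_isn = 1000, 5000
syn = build(49152, 443, client_isn, 0, FLAGS["SYN"])
syn_ack = build(443, 49152, server_isn, client_isn + 1, FLAGS["SYN"] | FLAGS["ACK"])
final_ack = build(49152, 443, client_isn + 1, server_isn + 1, FLAGS["ACK"])

for segment in (syn, syn_ack, final_ack):
    print(parse(segment))
```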

UDP — User Datagram Protocol

  • Connectionless, fire-and-forget
  • No retransmission, no ordering guarantees
  • 8-byte header
  • Application layer must handle reliability if needed
  • Use: DNS, QUIC, video streaming, ML parameter servers
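  • Latency: no handshake (TCP spends one RTT before sending data) and no head-of-line blocking; the often-quoted ~50–100µs round-trip advantage over TCP on a LAN varies widely with the kernel stack and NIC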
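The connectionless model is visible directly in the socket API: there is no listen(), accept(), or connect(), and each sendto() is an independent datagram. A minimal loopback sketch; the message contents are arbitrary, and the 2-second timeout is an assumption standing in for whatever loss handling a real application would need.

```python
import socket

# Receiver: bind a UDP socket to an ephemeral loopback port. No listen()/accept().
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)  # UDP promises nothing, so never block forever
addr = receiver.getsockname()

# Sender: no handshake; each sendto() emits one self-contained datagram.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for i in range(3):
    sender.sendto(f"msg-{i}".encode(), addr)

received = []
try:
    for _ in range(3):
        data, _src = receiver.recvfrom(2048)  # one call returns exactly one datagram
        received.append(data.decode())
except socket.timeout:
    pass  # a lost datagram is simply missing; recovering it is the application's job
finally:
    sender.close()
    receiver.close()

print(received)
```

On loopback the datagrams almost always arrive in order; across a real network, nothing in UDP guarantees that.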

TCP Congestion Control — the short version

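TCP tries to infer available bandwidth without explicit feedback. Slow start begins conservatively and doubles the congestion window (CWND) each RTT until it reaches the slow-start threshold (ssthresh); from there, congestion avoidance grows CWND by roughly one segment per RTT. Upon packet loss (inferred by timeout or triple duplicate ACK), CWND is slashed: Reno-style algorithms halve it after a triple duplicate ACK and drop it back to one segment after a timeout. Modern algorithms like BBR (Bottleneck Bandwidth and RTT) instead build an explicit model of the path's bottleneck bandwidth and round-trip time and avoid filling queues, achieving lower latency and better throughput on lossy or long-haul paths.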
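A toy model shows the characteristic sawtooth. This is a simplified Reno-style sketch, not CUBIC or BBR: CWND is counted in whole segments and sampled once per RTT, the slow-start doubling is capped at ssthresh for simplicity, and only the triple-duplicate-ACK case is modeled (a timeout would reset CWND to 1). The loss round is a hypothetical input.

```python
def simulate_cwnd(rtts, ssthresh=32, loss_at=frozenset()):
    """Toy Reno-style model: returns cwnd (in segments) at the start of each RTT."""
    cwnd, history = 1, []
    for rtt in range(rtts):
        history.append(cwnd)
        if rtt in loss_at:
            # Triple duplicate ACK: multiplicative decrease, halve the window.
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: double per RTT up to ssthresh
        else:
            cwnd += 1  # congestion avoidance: additive increase, +1 segment per RTT
    return history

print(simulate_cwnd(12, ssthresh=16, loss_at=frozenset({8})))
# [1, 2, 4, 8, 16, 17, 18, 19, 20, 10, 11, 12]
```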

Routing Protocols: RIP, OSPF, BGP

Routing protocols let routers automatically discover paths and react to topology changes. They fall into two categories:

RIP — Routing Information Protocol

RIP is a distance-vector protocol: each router sends its full routing table to its neighbors every 30 seconds (broadcast in RIPv1, multicast to 224.0.0.9 in RIPv2). The metric is hop count (max 15; 16 = unreachable). Convergence is slow: the "count to infinity" problem can leave stale routes for minutes. RIPv2 adds subnet masks in updates (so CIDR works) and authentication. RIP is rarely used in production today except in small or legacy networks.
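A toy distance-vector sketch shows both the Bellman-Ford update and count to infinity. Assumptions: a hypothetical three-router line A-B-C, synchronous exchange rounds instead of 30-second timers, and none of the mitigations real RIP uses (split horizon, poison reverse, triggered updates).

```python
INFINITY = 16  # RIP: a metric of 16 hops means "unreachable"

def dv_update(table, neighbor, advertised):
    """One Bellman-Ford step. table maps dest -> (metric, next_hop); True if changed."""
    changed = False
    for dest, metric in advertised.items():
        new = (min(metric + 1, INFINITY), neighbor)  # one extra hop via this neighbor
        cur = table.get(dest)
        # Take the route if it is shorter, or if it comes from our current next hop
        # (news from the next hop always replaces what we had, even when worse).
        if cur is None or new[0] < cur[0] or (cur[1] == neighbor and cur != new):
            table[dest] = new
            changed = True
    return changed

def advertise(table):
    # Full-table advertisement with no split horizon.
    return {dest: metric for dest, (metric, _hop) in table.items()}

# Each router starts knowing only itself; two rounds converge a 3-router line.
tables = {r: {r: (0, r)} for r in "ABC"}
for _ in range(2):
    for x, y in [("A", "B"), ("B", "C")]:
        dv_update(tables[x], y, advertise(tables[y]))
        dv_update(tables[y], x, advertise(tables[x]))
print(tables["A"]["C"])  # (2, 'B')

# Link B-C fails: B marks C unreachable, but A still advertises its stale route.
tables["B"]["C"] = (INFINITY, "C")
rounds = 0
while True:
    rounds += 1
    changed = dv_update(tables["B"], "A", advertise(tables["A"]))  # B hears A first
    changed |= dv_update(tables["A"], "B", advertise(tables["B"]))
    if not changed:
        break
print(rounds, tables["A"]["C"])  # 9 (16, 'B'): metrics climbed 3, 4, 5, ... up to 16
```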

OSPF — Open Shortest Path First

OSPF is a link-state protocol. Each router floods Link State Advertisements (LSAs) throughout its area, so every router in the area builds the same complete topology map (LSDB, the Link State Database). Every router then independently runs Dijkstra's SPF algorithm on the LSDB to compute shortest paths. Key properties:
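The SPF step can be sketched with a heap-based Dijkstra that also records the first hop, which is what a router actually installs in its forwarding table. The LSDB below is a hypothetical four-router area with made-up costs; real OSPF implementations commonly derive costs from interface bandwidth and also handle equal-cost multipath, which this sketch ignores.

```python
import heapq

def spf(lsdb, root):
    """Dijkstra over an LSDB {router: {neighbor: cost}} -> {router: (cost, first_hop)}."""
    best = {root: (0, None)}
    frontier = [(0, root, None)]  # (cost so far, router, first hop out of root)
    done = set()
    while frontier:
        cost, node, first_hop = heapq.heappop(frontier)
        if node in done:
            continue  # stale heap entry: a cheaper path was already finalized
        done.add(node)
        for nbr, link_cost in lsdb.get(node, {}).items():
            new_cost = cost + link_cost
            hop = nbr if node == root else first_hop
            if nbr not in best or new_cost < best[nbr][0]:
                best[nbr] = (new_cost, hop)
                heapq.heappush(frontier, (new_cost, nbr, hop))
    return best

# Hypothetical LSDB, identical on every router in the area after flooding.
lsdb = {
    "R1": {"R2": 10, "R3": 1},
    "R2": {"R1": 10, "R3": 1, "R4": 1},
    "R3": {"R1": 1, "R2": 1},
    "R4": {"R2": 1},
}
print(spf(lsdb, "R1"))
# {'R1': (0, None), 'R2': (2, 'R3'), 'R3': (1, 'R3'), 'R4': (3, 'R3')}
```

Note that R1 reaches R2 through R3 at cost 2 rather than over the direct cost-10 link.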

BGP — Border Gateway Protocol

BGP is the routing protocol of the internet. It's a path-vector protocol: instead of hop counts or link costs, BGP advertises full AS paths, enabling policy-based routing. A few key concepts:
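Two core path-vector ideas, AS-path loop prevention and policy taking precedence over path length, fit in a few lines. This is a heavily simplified sketch of part of BGP's best-path selection: real implementations evaluate more attributes (ORIGIN, MED, eBGP vs iBGP, IGP cost to the next hop, router ID, and others). The prefixes, ASNs, and next hops are documentation-range placeholders, and comparing next_hop strings is only a stand-in tiebreak, not a real BGP step.

```python
from dataclasses import dataclass

MY_ASN = 64500  # hypothetical ASN of the local router (documentation range)

@dataclass
class Route:
    prefix: str
    as_path: list          # ASNs the advertisement crossed, nearest neighbor first
    next_hop: str
    local_pref: int = 100  # local policy knob; higher wins

def best_path(routes):
    # Loop prevention: discard any path that already contains our own ASN.
    candidates = [r for r in routes if MY_ASN not in r.as_path]
    # Simplified order: highest LOCAL_PREF, then shortest AS_PATH, then stand-in tiebreak.
    return min(candidates, key=lambda r: (-r.local_pref, len(r.as_path), r.next_hop), default=None)

routes = [
    Route("203.0.113.0/24", [64501, 64510], "192.0.2.1"),
    Route("203.0.113.0/24", [64502, 64503, 64510], "192.0.2.2", local_pref=200),
    Route("203.0.113.0/24", [64504, 64500, 64510], "192.0.2.3", local_pref=300),  # loop
]
best = best_path(routes)
print(best.next_hop, best.as_path)  # 192.0.2.2 [64502, 64503, 64510]
```

The third route has the highest LOCAL_PREF but is rejected because its AS path already contains 64500; among the rest, policy (LOCAL_PREF 200) beats the shorter AS path.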

Quick comparison: RIP = simple, slow, legacy. OSPF = fast convergence, hierarchical, widely used inside ISPs and data centers. BGP = the internet's glue, policy-driven, not designed for speed.

Modern Angle: SDN and Overlay Networks

Software-Defined Networking decouples the control plane (deciding where traffic goes) from the data plane (actually forwarding packets). A centralized controller programs forwarding rules into switches via protocols like OpenFlow. This makes network topology programmable — critical for cloud data centers where VMs spin up and down constantly.
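A toy model makes the control/data-plane split concrete: the switch only performs priority-ordered match-action lookups, and all policy arrives from outside through install(). This is a sketch of the general idea, not OpenFlow's wire protocol or any controller's API; the field names (ip_dst, tcp_dst), action strings, and the punt-to-controller miss behavior are hypothetical choices for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class FlowRule:
    priority: int
    match: dict  # header field -> required value; fields not listed are wildcards
    action: str  # hypothetical action strings, e.g. "output:2" or "drop"

@dataclass
class Switch:
    """Data plane: matches packets against installed rules, makes no decisions itself."""
    table: list = field(default_factory=list)

    def install(self, rule: FlowRule) -> None:
        # Invoked by the controller (control plane), analogous to pushing a flow entry.
        self.table.append(rule)
        self.table.sort(key=lambda r: r.priority, reverse=True)

    def forward(self, packet: dict) -> str:
        for rule in self.table:  # highest priority first; first match wins
            if all(packet.get(k) == v for k, v in rule.match.items()):
                return rule.action
        # Table miss: this sketch punts to the controller; real behavior depends on
        # the OpenFlow version and the table-miss configuration.
        return "controller"

# The controller programs policy into the switch.
sw = Switch()
sw.install(FlowRule(50, {"tcp_dst": 22}, "drop"))
sw.install(FlowRule(100, {"ip_dst": "10.0.0.5", "tcp_dst": 80}, "output:2"))

print(sw.forward({"ip_dst": "10.0.0.5", "tcp_dst": 80}))  # output:2
print(sw.forward({"ip_dst": "10.0.0.7", "tcp_dst": 22}))  # drop
print(sw.forward({"ip_dst": "10.0.0.7", "udp_dst": 53}))  # controller
```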

Overlay networks (VXLAN, Geneve, GRE) build virtual L2 networks on top of L3 infrastructure. Kubernetes CNI plugins (Flannel, Calico, Cilium) use overlays or direct routing to give each pod its own IP regardless of which physical host it runs on. In ML clusters, RDMA over Converged Ethernet (RoCE) bypasses the kernel networking stack entirely to get GPU-to-GPU latencies in the single-digit microseconds.
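The core of a VXLAN overlay is small: the tenant's original L2 frame is prefixed with an 8-byte VXLAN header carrying a 24-bit network identifier (VNI) and sent inside an ordinary UDP/IP packet (UDP destination port 4789). A minimal sketch of just that header, following the RFC 7348 layout; the VNI 5001 and the zero-filled inner frame are hypothetical, and the outer UDP/IP/Ethernet headers are not built here.

```python
import struct

VXLAN_PORT = 4789  # IANA-assigned UDP destination port for VXLAN
OVERHEAD = 14 + 20 + 8 + 8  # outer Ethernet + outer IPv4 + UDP + VXLAN = 50 bytes per frame

def vxlan_header(vni: int) -> bytes:
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI is a 24-bit field")
    # Word 1: flags byte with the I bit (0x08) set, then 24 reserved bits.
    # Word 2: 24-bit VNI, then 8 reserved bits.
    return struct.pack("!II", 0x08 << 24, vni << 8)

def parse_vni(packet: bytes) -> int:
    flags_word, vni_word = struct.unpack("!II", packet[:8])
    if not (flags_word >> 24) & 0x08:
        raise ValueError("I flag not set: VNI is not valid")
    return vni_word >> 8

inner_frame = b"\x00" * 64  # stand-in for the tenant's original Ethernet frame
encapsulated = vxlan_header(5001) + inner_frame
print(encapsulated[:8].hex(), parse_vni(encapsulated))  # 0800000000138900 5001
```

Those 50 bytes of overhead are why overlays commonly run a 1450-byte inner MTU over a 1500-byte underlay.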

Why This Matters for ML Systems

Distributed ML training and inference are fundamentally network-bound workloads once you scale beyond a single node. A few concrete implications: