OSI Layers & Routing Protocols

Network fundamentals are one of those things that get taught once in school and then quietly drift into "I'll re-learn this when I need it" territory. But if you're building or optimizing distributed systems — especially ML inference pipelines — the OSI model stops being a trivia question and starts being a debugging checklist. This is my attempt to write the reference I wish I'd had.

The OSI Model — All 7 Layers

The Open Systems Interconnection model partitions network communication into seven abstraction layers. Each layer depends only on the layer below it and exposes a clean interface upward. In practice, TCP/IP collapses these into four, but the seven-layer model gives you precise vocabulary for where things break.

#	Layer	Data Unit	Key Protocols / Examples	What it does
7	Application	Data	HTTP, HTTPS, DNS, SMTP, FTP, gRPC	User-facing services; defines message format and app-level semantics.
6	Presentation	Data	TLS/SSL, JPEG, Protobuf encoding	Translation, encryption, compression. Ensures sender and receiver agree on data format.
5	Session	Data	NetBIOS, RPC, NFS	Manages sessions (open/close/checkpoint). Often baked into L7 protocols today.
4	Transport	Segment	TCP, UDP, QUIC, SCTP	End-to-end delivery, port multiplexing, flow control, reliability (TCP) or not (UDP).
3	Network	Packet	IP (v4/v6), ICMP, OSPF, BGP	Logical addressing (IP), routing across networks. Routers live here.
2	Data Link	Frame	Ethernet, Wi-Fi (802.11), ARP, PPP	Node-to-node delivery on a single link. MAC addressing, error detection (CRC). Switches live here.
1	Physical	Bit	Ethernet cable, fiber, 802.11 radio, USB	Electrical, optical, or radio transmission. Bit timing and signal encoding.

Mnemonic (top-down): "All People Seem To Need Data Processing" — Application, Presentation, Session, Transport, Network, Data Link, Physical.

Encapsulation & the Data Unit Journey

When you send an HTTP request, each layer wraps the data from the layer above in its own header (and sometimes trailer). This wrapping is called encapsulation. On the receiving end, each layer strips its header and passes the payload up.

[HTTP request: headers + body]            ← Application data
[TCP header | HTTP data]                  ← Segment (L4 adds port numbers, seq no.)
[IP header | TCP header | HTTP data]      ← Packet (L3 adds src/dst IP)
[ETH header | IP | TCP | HTTP | ETH CRC]  ← Frame (L2 adds MAC addresses, CRC)
[bits on the wire / optical signal]       ← Physical transmission

Each header adds overhead. With minimum-size headers, an HTTP/1.1 request over TCP/IPv4/Ethernet carries 54 bytes of header (14 Ethernet + 20 IPv4 + 20 TCP), plus a 4-byte Ethernet CRC trailer, before a single byte of application data. In high-throughput systems (ML serving, HFT), this matters.
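To make the 54-byte figure concrete, here is a small sketch of the arithmetic. It assumes minimum-size headers (no IP options, no TCP options, no VLAN tag); the function name `header_overhead` is mine, not from any library.

```python
# Minimum header sizes in bytes (assumes no IP/TCP options and no VLAN tag).
ETHERNET_HEADER = 14   # dst MAC (6) + src MAC (6) + EtherType (2)
IPV4_HEADER = 20
TCP_HEADER = 20
ETHERNET_FCS = 4       # CRC trailer; not part of the 54-byte header figure


def header_overhead(payload_bytes: int, include_fcs: bool = False) -> float:
    """Return the fraction of frame bytes that are headers rather than payload."""
    overhead = ETHERNET_HEADER + IPV4_HEADER + TCP_HEADER
    if include_fcs:
        overhead += ETHERNET_FCS
    return overhead / (overhead + payload_bytes)


if __name__ == "__main__":
    for payload in (1, 64, 512, 1460):
        print(f"{payload:>5} B payload -> {header_overhead(payload):.1%} header overhead")
```

The takeaway is that overhead is brutal for tiny payloads and close to negligible for full-size segments, which is why batching small messages pays off.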

Layer 3 In Depth: IP, Subnetting, ARP

IP Addressing

IPv4 uses 32-bit addresses written in dotted-decimal notation (192.168.1.1). IPv6 uses 128-bit addresses in hex (2001:db8::1). An IP address consists of a network prefix and a host identifier; CIDR notation specifies the prefix length: 10.0.0.0/24 means the first 24 bits are the network, leaving 8 bits for hosts (254 usable addresses in this subnet).
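Python's standard-library `ipaddress` module is a quick way to check this kind of CIDR arithmetic. A minimal sketch using the same example prefix; the host address 10.0.0.42 is just an illustrative value.

```python
import ipaddress

net = ipaddress.ip_network("10.0.0.0/24")

# 2^8 = 256 addresses in total; the network and broadcast addresses are not usable.
usable_hosts = net.num_addresses - 2

v6 = ipaddress.ip_address("2001:db8::1")

if __name__ == "__main__":
    print(net.network_address, net.broadcast_address, net.netmask)
    print(usable_hosts)
    print(ipaddress.ip_address("10.0.0.42") in net)   # membership test against the prefix
    print(v6.exploded)                                  # the "::" shorthand written out in full
```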

Subnetting

Subnetting divides a network into smaller blocks to reduce broadcast domains and improve security. A subnet mask (255.255.255.0 = /24) ANDed with an IP address yields the network address. Key subnetting math:

Number of subnets when borrowing n bits: 2ⁿ
Hosts per subnet: 2^{host bits} − 2 (subtract broadcast and network address)
VLSM (Variable Length Subnet Masking) lets you carve different-sized blocks from the same space
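The subnetting math above can be sketched in a few lines. The helper names (`network_address`, `usable_hosts`, `split_network`) are my own, and the addresses are illustrative; the only library used is the standard `ipaddress` module.

```python
import ipaddress


def network_address(ip: str, mask: str) -> str:
    """Bitwise-AND an IPv4 address with its subnet mask."""
    ip_int = int(ipaddress.IPv4Address(ip))
    mask_int = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(ip_int & mask_int))


def usable_hosts(prefix_len: int) -> int:
    """2^(host bits) - 2; meaningful for /30 and shorter prefixes."""
    return 2 ** (32 - prefix_len) - 2


def split_network(cidr: str, borrowed_bits: int) -> list:
    """Borrowing n bits yields 2^n equal-sized subnets."""
    return list(ipaddress.ip_network(cidr).subnets(prefixlen_diff=borrowed_bits))


if __name__ == "__main__":
    print(network_address("192.168.1.77", "255.255.255.0"))
    for subnet in split_network("10.0.0.0/24", 2):
        print(subnet, usable_hosts(subnet.prefixlen))

    # VLSM: one /25 for a large segment, then the other half split into two /26s.
    large, rest = ipaddress.ip_network("10.0.0.0/24").subnets(prefixlen_diff=1)
    print(large, list(rest.subnets(prefixlen_diff=1)))
```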

ARP — Address Resolution Protocol

IP gives you a logical destination, but frames need MAC addresses to traverse a single link. ARP bridges this gap. When host A needs to send to 192.168.1.5:

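A broadcasts: "Who has 192.168.1.5? Tell A." The request carries A's own IP and MAC address (here AA:BB:CC:DD:EE:FF) so the target knows where to reply.
The target replies directly to A (unicast) with its MAC address.
A caches the mapping in its ARP table (arp -n or ip neigh to inspect).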

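Gratuitous ARP (an unsolicited announcement of a host's own IP-to-MAC mapping, sent as either a request or a reply) is used for address-conflict detection and to refresh neighbors' caches, which makes it the basis of many L2 failover schemes.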
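For a feel of what actually goes on the wire, here is a sketch that packs the 28-byte ARP request payload for Ethernet/IPv4 with `struct`. The sender IP 192.168.1.10 is an assumed value for host A, and the function name is hypothetical. The code only builds the bytes; sending them would need a raw socket and elevated privileges.

```python
import socket
import struct

ARP_REQUEST = 1
ARP_REPLY = 2


def build_arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    """Pack an ARP request payload (goes inside an Ethernet frame, EtherType 0x0806)."""
    sender_hw = bytes.fromhex(sender_mac.replace(":", ""))
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length (MAC)
        4,                            # protocol address length (IPv4)
        ARP_REQUEST,
        sender_hw,                    # "tell me": the sender's MAC
        socket.inet_aton(sender_ip),  # and the sender's IP
        b"\x00" * 6,                  # target MAC is unknown, that is the question
        socket.inet_aton(target_ip),  # "who has" this IP
    )


if __name__ == "__main__":
    payload = build_arp_request("AA:BB:CC:DD:EE:FF", "192.168.1.10", "192.168.1.5")
    print(len(payload), payload.hex())
```

A gratuitous ARP would reuse the same layout with the sender IP and target IP set to the same address.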

Layer 4 In Depth: TCP vs UDP

TCP — Transmission Control Protocol

Connection-oriented (3-way handshake: SYN, SYN-ACK, ACK)
Reliable: retransmits lost segments
Ordered delivery via sequence numbers
Flow control (receiver window) + congestion control (CWND, slow start, CUBIC/BBR)
20-byte minimum header
Use: HTTP, databases, SSH

UDP — User Datagram Protocol

Connectionless, fire-and-forget
No retransmission, no ordering guarantees
8-byte header
Application layer must handle reliability if needed
Use: DNS, QUIC, video streaming, ML parameter servers
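Latency: no handshake and no ACK wait; as a rough rule of thumb, ~50–100µs less round-trip than TCP on a LAN, though the real figure depends heavily on NICs and kernel tuning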
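The difference shows up directly in the socket API. A minimal loopback sketch, assuming a machine where 127.0.0.1 is usable; the function names are mine. UDP sends a datagram with no setup, while TCP needs a listener and a completed handshake before any data moves.

```python
import socket


def udp_round_trip(message: bytes) -> bytes:
    """Send one datagram over loopback: no handshake, no connection state."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
        receiver.settimeout(2.0)
        sender.sendto(message, receiver.getsockname())
        data, _addr = receiver.recvfrom(2048)
        return data


def tcp_round_trip(message: bytes) -> bytes:
    """Send bytes over a loopback TCP connection: handshake first, then a byte stream."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        listener.settimeout(2.0)
        # connect() returns once the kernel has completed SYN, SYN-ACK, ACK.
        with socket.create_connection(listener.getsockname(), timeout=2.0) as client:
            conn, _addr = listener.accept()
            with conn:
                conn.settimeout(2.0)
                client.sendall(message)
                received = b""
                # TCP is a stream, so loop until all expected bytes have arrived.
                while len(received) < len(message):
                    chunk = conn.recv(2048)
                    if not chunk:
                        break
                    received += chunk
                return received


if __name__ == "__main__":
    print(udp_round_trip(b"ping"))
    print(tcp_round_trip(b"ping"))
```

Note the receive loop on the TCP side: message boundaries are an application concern there, whereas each UDP `recvfrom` returns exactly one datagram.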

TCP Congestion Control — the short version

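TCP tries to infer available bandwidth mostly without explicit feedback from the network. Slow start begins conservatively and doubles the congestion window (CWND) each RTT until it reaches the slow-start threshold, after which growth becomes roughly linear. On packet loss, CWND is cut: a retransmission timeout drops it back to about one segment, while a triple duplicate ACK triggers a milder multiplicative decrease (halving in classic Reno). Modern algorithms like BBR (Bottleneck Bandwidth and RTT) model the network explicitly and try to avoid filling queues, which can give lower latency and better throughput on lossy or long-haul paths.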
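A toy simulation makes the sawtooth visible. This is a deliberately simplified Reno-style model (CWND counted in whole segments, one update per RTT), not CUBIC or BBR, and the function name, parameters and loss schedule are all assumptions for illustration.

```python
def simulate_cwnd(rtts, loss_events, initial_cwnd=1, ssthresh=64):
    """Return CWND (in segments) at the start of each RTT.

    loss_events maps an RTT index to "dupack" or "timeout".
    """
    cwnd = initial_cwnd
    history = []
    for rtt in range(rtts):
        history.append(cwnd)
        event = loss_events.get(rtt)
        if event == "timeout":
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1                           # start over from slow start
        elif event == "dupack":
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh                    # multiplicative decrease
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)     # slow start: double per RTT
        else:
            cwnd += 1                          # congestion avoidance: +1 per RTT
    return history


if __name__ == "__main__":
    print(simulate_cwnd(10, {6: "dupack"}, ssthresh=16))
    # -> [1, 2, 4, 8, 16, 17, 18, 9, 10, 11]
```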

Routing Protocols: RIP, OSPF, BGP

Routing protocols let routers automatically discover paths and react to topology changes. They fall into two categories:

IGP (Interior Gateway Protocols) — operate within a single autonomous system (AS): RIP, OSPF, IS-IS, EIGRP.
EGP (Exterior Gateway Protocols) — operate between ASes: BGP.

RIP — Routing Information Protocol

RIP is a distance-vector protocol: each router sends its full routing table to its neighbors every 30 seconds (RIPv1 by broadcast, RIPv2 by multicast to 224.0.0.9). The metric is hop count (max 15; 16 = unreachable). Convergence is slow: the "count to infinity" problem can leave stale routes in place for minutes. RIPv2 adds CIDR and authentication. RIP is rarely used in production today except in small or legacy networks.
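The core of a distance-vector protocol fits in a short loop. The sketch below is a synchronous, idealized version of the exchange (no timers, no split horizon, no link failures), with router names and the `rip_converge` function invented for illustration.

```python
INFINITY = 16  # RIP treats 16 hops as unreachable


def rip_converge(links):
    """links: iterable of (router_a, router_b) adjacencies. Returns per-router hop tables."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    tables = {router: {router: 0} for router in neighbors}
    changed = True
    while changed:
        changed = False
        # Every router advertises the table it held at the start of this round.
        snapshot = {router: dict(table) for router, table in tables.items()}
        for router, nbrs in neighbors.items():
            for nbr in nbrs:
                for dest, hops in snapshot[nbr].items():
                    candidate = min(hops + 1, INFINITY)
                    if candidate < tables[router].get(dest, INFINITY):
                        tables[router][dest] = candidate
                        changed = True
    return tables


if __name__ == "__main__":
    tables = rip_converge([("A", "B"), ("B", "C"), ("C", "D")])
    print(tables["A"])
```

Because a route with 16 hops is never installed, a chain longer than 15 hops simply cannot be reached end to end, which is the hard size limit RIP imposes on a network.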

OSPF — Open Shortest Path First

OSPF is a link-state protocol. Each router floods Link State Advertisements (LSAs) throughout the AS, building a complete topology map (LSDB — Link State Database). Every router then independently runs Dijkstra's SPF algorithm on the LSDB to compute shortest paths. Key properties:

Metric: interface cost (inversely proportional to bandwidth by default)
Converges in seconds after a link failure
Hierarchical: organizes routers into areas; Area 0 is the backbone. Inter-area routing goes through Area Border Routers (ABRs).
Supports ECMP (Equal Cost Multi-Path) — load-balances across multiple equal-cost routes
Uses multicast (224.0.0.5/224.0.0.6) to exchange LSAs, reducing unnecessary traffic vs RIP's broadcasts
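The SPF step is plain Dijkstra over the LSDB. A minimal sketch, assuming the LSDB has already been flattened into a dictionary of link costs; the router names, costs and the `spf` function are illustrative, and real OSPF also handles areas, LSA types and ECMP, which this ignores.

```python
import heapq


def spf(lsdb, source):
    """lsdb: {router: {neighbor: cost}}. Returns {dest: (total_cost, first_hop)}."""
    best = {}
    heap = [(0, source, source)]   # (cost so far, node, first hop from source)
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if node in best:
            continue               # already settled via a cheaper path
        best[node] = (cost, first_hop)
        for nbr, link_cost in lsdb.get(node, {}).items():
            if nbr not in best:
                hop = nbr if node == source else first_hop
                heapq.heappush(heap, (cost + link_cost, nbr, hop))
    return best


if __name__ == "__main__":
    lsdb = {
        "R1": {"R2": 10, "R3": 1},
        "R2": {"R1": 10, "R4": 1},
        "R3": {"R1": 1, "R4": 1},
        "R4": {"R2": 1, "R3": 1},
    }
    for dest, (cost, hop) in sorted(spf(lsdb, "R1").items()):
        print(dest, cost, hop)
```

In this topology R1 reaches R2 through R3 and R4 (total cost 3) rather than over the direct cost-10 link, which is exactly the behavior cost-based metrics are meant to produce.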

BGP — Border Gateway Protocol

BGP is the routing protocol of the internet. It's a path-vector protocol: instead of hop counts or link costs, BGP advertises full AS paths, enabling policy-based routing. A few key concepts:

eBGP — between different ASes (e.g., your ISP and a CDN)
iBGP — within a single AS to propagate external routes internally
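BGP selects routes using a multi-step decision process, roughly: highest local preference → shortest AS path → lowest origin type → lowest MED → eBGP over iBGP → lowest IGP cost to the next hop → lowest router-id. Because policy attributes such as local preference are evaluated before path length, a single bad policy can override otherwise sane routing, which is how misconfigurations turn into famous outages.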
Route reflectors solve the full-mesh requirement for iBGP in large networks
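BGP hijacking (e.g., the 2008 Pakistan Telecom/YouTube incident) happens when a rogue AS announces more specific prefixes; routers forward on the longest matching prefix, so the more specific route wins wherever it propagates. RPKI (Resource Public Key Infrastructure), which lets routers validate the origin AS of a prefix, is the emerging fix.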
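The decision process is essentially a lexicographic comparison, which a sort key captures nicely. This sketch is a simplification: it omits vendor-specific weight and the IGP-cost step, and it compares MED across all routes, whereas real implementations normally compare MED only between routes from the same neighboring AS. The `Route` class, field names, ASNs and prefixes are illustrative (documentation ranges), not from any BGP library.

```python
from dataclasses import dataclass

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass(frozen=True)
class Route:
    prefix: str
    local_pref: int
    as_path: tuple
    origin: str
    med: int
    is_ebgp: bool
    router_id: str


def preference_key(route: Route):
    """Smaller key means more preferred; fields are compared left to right."""
    return (
        -route.local_pref,                                   # higher local preference wins
        len(route.as_path),                                  # shorter AS path wins
        ORIGIN_RANK[route.origin],                           # IGP < EGP < incomplete
        route.med,                                           # lower MED wins
        0 if route.is_ebgp else 1,                           # eBGP preferred over iBGP
        tuple(int(x) for x in route.router_id.split(".")),   # lowest router-id breaks ties
    )


def best_path(routes):
    return min(routes, key=preference_key)


if __name__ == "__main__":
    short = Route("203.0.113.0/24", 100, (64500,), "igp", 0, True, "10.0.0.1")
    preferred = Route("203.0.113.0/24", 200, (64500, 64501, 64502), "igp", 0, False, "10.0.0.2")
    print(best_path([short, preferred]).router_id)
```

Here the route with the longer AS path still wins because its local preference is higher, which is the policy-before-topology behavior that makes BGP both flexible and easy to misconfigure.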

Quick comparison: RIP = simple, slow, legacy. OSPF = fast convergence, hierarchical, widely used inside ISPs and data centers. BGP = the internet's glue, policy-driven, not designed for speed.

Modern Angle: SDN and Overlay Networks

Software-Defined Networking decouples the control plane (deciding where traffic goes) from the data plane (actually forwarding packets). A centralized controller programs forwarding rules into switches via protocols like OpenFlow. This makes network topology programmable — critical for cloud data centers where VMs spin up and down constantly.

Overlay networks (VXLAN, Geneve, GRE) build virtual L2 networks on top of L3 infrastructure. Kubernetes CNI plugins (Flannel, Calico, Cilium) use overlays or direct routing to give each pod its own IP regardless of which physical host it runs on. In ML clusters, RDMA over Converged Ethernet (RoCE) bypasses the kernel networking stack entirely to get GPU-to-GPU latencies in the single-digit microseconds.
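To show how thin the overlay layer itself is, here is a sketch of the 8-byte VXLAN header as laid out in RFC 7348: a flags byte with the "VNI valid" bit set, reserved bits, and a 24-bit VXLAN Network Identifier. The function names and the VNI value are illustrative; the outer Ethernet, IP and UDP headers (destination port 4789) that wrap this header are not built here.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08
VXLAN_UDP_PORT = 4789  # IANA-assigned destination port for VXLAN


def build_vxlan_header(vni: int) -> bytes:
    """Pack flags (8 bits) + reserved (24) + VNI (24) + reserved (8)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    _flags_word, vni_word = struct.unpack("!II", header[:8])
    return vni_word >> 8


if __name__ == "__main__":
    header = build_vxlan_header(5000)
    print(header.hex(), parse_vni(header))
```

The 24-bit VNI is what lets an overlay carry roughly 16 million isolated segments, compared with the 4094 usable IDs of a 12-bit VLAN tag.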

Why This Matters for ML Systems

Distributed ML training and inference are fundamentally network-bound workloads once you scale beyond a single node. A few concrete implications:

All-reduce in training (gradient synchronization) saturates east-west bandwidth. Collective communication libraries (NCCL, Gloo) are designed around the network topology — ring-allreduce optimizes for bandwidth, while tree-allreduce optimizes for latency.
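P99 latency in inference serving is often dominated by head-of-line blocking in TCP: one lost segment stalls every byte queued behind it on that connection. This is one reason serving stacks such as vLLM and TensorRT-LLM increasingly look to UDP-based transports or RDMA for KV-cache transfer between nodes.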
Routing misconfigurations in cloud environments (e.g., asymmetric paths causing TCP retransmits) can silently inflate p99 by an order of magnitude or more. Understanding BGP and OSPF convergence helps you distinguish "my code is slow" from "the network is flapping."
Overlay overhead: VXLAN adds 50 bytes of encapsulation per packet (outer Ethernet 14 + outer IPv4 20 + UDP 8 + VXLAN 8). For small tensor RPC calls this overhead is not trivial, so batch size tuning and request coalescing matter at the network layer, not just the model layer.
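The overlay point is easy to quantify. A back-of-the-envelope sketch, assuming minimum-size inner headers (Ethernet + IPv4 + TCP, no options), an IPv4 underlay, and payloads that fit in a single packet; the function names and payload sizes are illustrative.

```python
VXLAN_OVERHEAD = 14 + 20 + 8 + 8   # outer Ethernet + outer IPv4 + UDP + VXLAN = 50 bytes
INNER_HEADERS = 14 + 20 + 20       # inner Ethernet + IPv4 + TCP = 54 bytes


def wire_bytes(payload_bytes: int, overlay: bool = True) -> int:
    """Bytes on the wire for one packet carrying payload_bytes (FCS not counted)."""
    return payload_bytes + INNER_HEADERS + (VXLAN_OVERHEAD if overlay else 0)


def goodput_fraction(payload_bytes: int, overlay: bool = True) -> float:
    """Share of wire bytes that is application payload."""
    return payload_bytes / wire_bytes(payload_bytes, overlay)


if __name__ == "__main__":
    for size in (64, 128, 1024):
        print(f"{size:>5} B: plain {goodput_fraction(size, False):.1%}, "
              f"VXLAN {goodput_fraction(size, True):.1%}")

    # Coalescing: eight 128-byte requests sent separately vs. in one packet.
    separate = 8 * wire_bytes(128)
    coalesced = wire_bytes(8 * 128)
    print(separate, coalesced)
```

Under these assumptions, eight separate 128-byte requests cost 1856 bytes on the wire, while one coalesced 1024-byte packet costs 1128.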
