Network fundamentals are one of those things that get taught once in school and then quietly drift into "I'll re-learn this when I need it" territory. But if you're building or optimizing distributed systems — especially ML inference pipelines — the OSI model stops being a trivia question and starts being a debugging checklist. This is my attempt to write the reference I wish I'd had.
The OSI Model — All 7 Layers
The Open Systems Interconnection model partitions network communication into seven abstraction layers. Each layer depends only on the layer below it and exposes a clean interface upward. In practice, TCP/IP collapses these into four, but the seven-layer model gives you precise vocabulary for where things break.
| # | Layer | Data Unit | Key Protocols / Examples | What it does |
|---|---|---|---|---|
| 7 | Application | Data | HTTP, HTTPS, DNS, SMTP, FTP, gRPC | User-facing services; defines message format and app-level semantics. |
| 6 | Presentation | Data | TLS/SSL, JPEG, Protobuf encoding | Translation, encryption, compression. Ensures sender and receiver agree on data format. |
| 5 | Session | Data | NetBIOS, RPC, NFS | Manages sessions (open/close/checkpoint). Often baked into L7 protocols today. |
| 4 | Transport | Segment | TCP, UDP, QUIC, SCTP | End-to-end delivery, port multiplexing, flow control, reliability (TCP) or not (UDP). |
| 3 | Network | Packet | IP (v4/v6), ICMP, OSPF, BGP | Logical addressing (IP), routing across networks. Routers live here. |
| 2 | Data Link | Frame | Ethernet, Wi-Fi (802.11), ARP, PPP | Node-to-node delivery on a single link. MAC addressing, error detection (CRC). Switches live here. |
| 1 | Physical | Bit | Ethernet cable, fiber, 802.11 radio, USB | Electrical, optical, or radio transmission. Bit timing and signal encoding. |
Encapsulation & the Data Unit Journey
When you send an HTTP request, each layer wraps the data from the layer above in its own header (and sometimes trailer). This wrapping is called encapsulation. On the receiving end, each layer strips its header and passes the payload up.
- [HTTP Request Body] ← Application data
- [TCP header | HTTP data] ← Segment (L4 adds port numbers, sequence number)
- [IP header | TCP header | HTTP data] ← Packet (L3 adds src/dst IP)
- [ETH header | IP | TCP | HTTP | ETH CRC] ← Frame (L2 adds MAC addresses, CRC)
- [bits on the wire / optical signal] ← Physical transmission
Each header adds overhead. A minimal HTTP/1.1 request over TCP/IP/Ethernet carries at least 54 bytes of headers (14 Ethernet + 20 IPv4 + 20 TCP, with no options and not counting the 4-byte Ethernet CRC trailer) before a single byte of application data. In high-throughput systems (ML serving, HFT), this matters.
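To make the overhead concrete, here is a small back-of-envelope sketch. It assumes minimum-size headers (no IP or TCP options, no VLAN tag) and ignores the Ethernet trailer and preamble; the payload sizes are arbitrary illustrations, not measurements.

```python
# Minimum header sizes in bytes (no options, no VLAN tag, CRC trailer not counted).
ETHERNET_HEADER = 14  # dst MAC (6) + src MAC (6) + EtherType (2)
IPV4_HEADER = 20
TCP_HEADER = 20

HEADER_BYTES = ETHERNET_HEADER + IPV4_HEADER + TCP_HEADER


def header_overhead(payload_bytes: int) -> float:
    """Fraction of the transmitted bytes that are headers rather than payload."""
    return HEADER_BYTES / (HEADER_BYTES + payload_bytes)


# Illustrative payload sizes: a tiny RPC, a medium message, a full TCP segment.
for size in (64, 512, 1460):
    print(f"{size:>5} B payload -> {header_overhead(size):.1%} header overhead")
```

The takeaway is that the fixed cost is amortized quickly for large payloads but dominates for small ones.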
Layer 3 In Depth: IP, Subnetting, ARP
IP Addressing
IPv4 uses 32-bit addresses written in dotted-decimal notation (192.168.1.1). IPv6 uses 128-bit addresses in hex (2001:db8::1). An IP address consists of a network prefix and a host identifier; CIDR notation specifies the prefix length: 10.0.0.0/24 means the first 24 bits are the network, leaving 8 bits for hosts (254 usable addresses in this subnet).
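Python's standard `ipaddress` module is a convenient way to check this arithmetic. The sketch below reuses the `10.0.0.0/24` and `2001:db8::1` examples from the text; the host address `10.0.0.42` is made up for illustration.

```python
import ipaddress

net = ipaddress.ip_network("10.0.0.0/24")
print(net.netmask)          # dotted-decimal form of the /24 prefix
print(net.num_addresses)    # all addresses, including network and broadcast

# hosts() skips the network and broadcast addresses.
hosts = list(net.hosts())
print(len(hosts), hosts[0], hosts[-1])

# Membership test: does this (hypothetical) host fall inside the prefix?
print(ipaddress.ip_address("10.0.0.42") in net)

# IPv6 addresses are 128 bits; exploded shows the full hex form.
v6 = ipaddress.ip_address("2001:db8::1")
print(v6.version, v6.exploded)
```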
Subnetting
Subnetting divides a network into smaller blocks to reduce broadcast domains and improve security. A subnet mask (255.255.255.0 = /24) ANDed with an IP address yields the network address. Key subnetting math:
- Number of subnets when borrowing n bits: 2^n
- Hosts per subnet: 2^(host bits) − 2 (subtract the network and broadcast addresses)
- VLSM (Variable Length Subnet Masking) lets you carve different-sized blocks from the same space
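The subnetting math above can be sketched with the standard `ipaddress` module. The `192.168.1.0/24` block, the host `192.168.1.77`, and the particular VLSM split are assumptions chosen for illustration.

```python
import ipaddress

base = ipaddress.ip_network("192.168.1.0/24")

# Mask AND address = network address.
ip = int(ipaddress.ip_address("192.168.1.77"))
mask = int(ipaddress.ip_address("255.255.255.0"))
network_address = ipaddress.ip_address(ip & mask)

# Borrowing n = 2 bits from the host part yields 2**2 = 4 subnets (/26 each).
borrowed = 2
subnets = list(base.subnets(prefixlen_diff=borrowed))
host_bits = 32 - subnets[0].prefixlen
usable_hosts = 2**host_bits - 2  # minus network and broadcast addresses

# VLSM: carve one /25, one /26 and two /27 blocks out of the same /24.
half, rest = base.subnets(prefixlen_diff=1)
quarter, last_quarter = rest.subnets(prefixlen_diff=1)
eighths = list(last_quarter.subnets(prefixlen_diff=1))
vlsm_plan = [half, quarter, *eighths]

print(network_address, len(subnets), usable_hosts)
print([str(n) for n in vlsm_plan])
```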
ARP — Address Resolution Protocol
IP gives you a logical destination, but frames need MAC addresses to traverse a single link. ARP bridges this gap. When host A needs to send to 192.168.1.5:
- A broadcasts: "Who has `192.168.1.5`? Tell `AA:BB:CC:DD:EE:FF`."
- The target replies with its MAC address.
- A caches the mapping in its ARP table (`arp -n` to inspect).
Gratuitous ARP (sending an ARP reply without a request) is used for conflict detection and is the basis of many L2 failover schemes.
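For a feel of what the broadcast in the exchange above contains, here is a sketch that packs the 28-byte ARP request payload for IPv4 over Ethernet. It only builds the bytes and does not send anything; the sender IP `192.168.1.10` is a hypothetical value, and the helper name `build_arp_request` is mine.

```python
import socket
import struct


def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Pack an ARP request payload (the part that follows the Ethernet header)."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length (MAC)
        4,                            # protocol address length (IPv4)
        1,                            # opcode: 1 = request, 2 = reply
        sender_mac,
        socket.inet_aton(sender_ip),
        b"\x00" * 6,                  # target MAC is unknown, so zeroed
        socket.inet_aton(target_ip),
    )


mac = bytes.fromhex("aabbccddeeff")
payload = build_arp_request(mac, "192.168.1.10", "192.168.1.5")
print(len(payload), payload.hex())
```

On the wire this payload would sit inside an Ethernet frame addressed to the broadcast MAC `ff:ff:ff:ff:ff:ff`, which is what makes it reach every host on the link.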
Layer 4 In Depth: TCP vs UDP
TCP — Transmission Control Protocol
- Connection-oriented (3-way handshake: SYN, SYN-ACK, ACK)
- Reliable: retransmits lost segments
- Ordered delivery via sequence numbers
- Flow control (receiver window) + congestion control (CWND, slow start, CUBIC/BBR)
- 20-byte minimum header
- Use: HTTP, databases, SSH
UDP — User Datagram Protocol
- Connectionless, fire-and-forget
- No retransmission, no ordering guarantees
- 8-byte header
- Application layer must handle reliability if needed
- Use: DNS, QUIC, video streaming, ML parameter servers
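- Latency: no handshake and no waiting on ACKs; on a LAN this can save roughly 50–100 µs per round trip versus TCP, depending on the stack and workload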
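The difference in connection setup shows up directly in the socket API. This is a minimal single-process sketch on the loopback interface, assuming loopback sockets are available; the payload strings are arbitrary.

```python
import socket

# UDP: no connection. The sender just addresses a datagram and sends it.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
receiver.settimeout(2.0)
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"ping over udp", receiver.getsockname())
udp_data, _ = receiver.recvfrom(2048)

# TCP: connect() triggers the SYN, SYN-ACK, ACK handshake in the kernel
# before any application data can flow.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
listener.settimeout(2.0)
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.settimeout(2.0)
client.connect(listener.getsockname())
conn, _ = listener.accept()
conn.settimeout(2.0)
client.sendall(b"ping over tcp")
# TCP is a byte stream, so recv() may return fewer bytes in general;
# a message this small on loopback normally arrives in one piece.
tcp_data = conn.recv(2048)

print(udp_data, tcp_data)

for s in (sender, receiver, client, conn, listener):
    s.close()
```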
TCP Congestion Control — the short version
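TCP tries to infer available bandwidth without explicit feedback. Slow start begins conservatively and doubles the congestion window (CWND) each RTT until it reaches the slow-start threshold, after which growth becomes roughly linear (congestion avoidance). On packet loss, CWND is cut: in classic Reno-style TCP it is halved after a triple duplicate ACK and reset to about one segment after a timeout. Modern algorithms like BBR (Bottleneck Bandwidth and RTT) instead model the path's bandwidth and round-trip time explicitly and try to avoid filling queues, which often yields lower latency and better throughput on lossy or long-haul paths.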
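A toy simulation can show the shape of the resulting sawtooth. This is a deliberately simplified, Reno-flavored sketch (window counted in whole segments, one update per RTT, loss treated as a triple duplicate ACK); it is not how CUBIC or BBR behave, and the function name and parameters are hypothetical.

```python
def simulate_cwnd(rtts: int, loss_at: set[int], ssthresh: int = 32) -> list[int]:
    """Return the congestion window (in segments) at the start of each RTT."""
    cwnd = 1
    history = []
    for rtt in range(rtts):
        history.append(cwnd)
        if rtt in loss_at:
            # Loss detected: halve the window and continue from there.
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            # Slow start: double every RTT, capped at the threshold.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: add one segment per RTT.
            cwnd += 1
    return history


trace = simulate_cwnd(12, loss_at={8})
print(trace)
```

The exponential ramp, the linear climb, and the drop after the loss at RTT 8 are all visible in the printed trace.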
Routing Protocols: RIP, OSPF, BGP
Routing protocols let routers automatically discover paths and react to topology changes. They fall into two categories:
- IGP (Interior Gateway Protocols) — operate within a single autonomous system (AS): RIP, OSPF, IS-IS, EIGRP.
- EGP (Exterior Gateway Protocols) — operate between ASes: BGP.
RIP — Routing Information Protocol
RIP is a distance-vector protocol: each router broadcasts its full routing table to neighbors every 30 seconds. Metric is hop count (max 15; 16 = unreachable). Convergence is slow — the "count to infinity" problem can leave stale routes for minutes. RIPv2 adds CIDR and authentication. RIP is rarely used in production today except in small or legacy networks.
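The core of a distance-vector update fits in a few lines. The sketch below is a simplified model of how a router might merge a neighbor's advertisement, assuming a table of `destination -> (metric, next_hop)`; it leaves out timers, split horizon, and the packet format, and the router names and prefixes are invented.

```python
INFINITY = 16  # RIP's "unreachable" metric


def merge_advertisement(table: dict, neighbor: str, advertised: dict) -> bool:
    """Merge a neighbor's advertised metrics into our table; True if it changed."""
    changed = False
    for dest, metric in advertised.items():
        new_metric = min(metric + 1, INFINITY)  # one more hop to go via neighbor
        current = table.get(dest)
        if current is None:
            if new_metric >= INFINITY:
                continue  # do not learn unreachable routes
        else:
            cur_metric, cur_hop = current
            if cur_hop != neighbor and new_metric >= cur_metric:
                continue  # a different neighbor already offers an equal or better path
            if cur_hop == neighbor and new_metric == cur_metric:
                continue  # nothing new from our current next hop
        # Either a better path, or an update (even a worse one) from our next hop.
        table[dest] = (new_metric, neighbor)
        changed = True
    return changed


table = {}
merge_advertisement(table, "B", {"10.0.2.0/24": 1, "10.0.3.0/24": 2})
merge_advertisement(table, "C", {"10.0.3.0/24": 1})   # shorter path via C
merge_advertisement(table, "B", {"10.0.2.0/24": 16})  # B reports the route as dead
print(table)
```

Because each router only knows what its neighbors claim, bad news has to propagate hop by hop, which is where the slow convergence comes from.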
OSPF — Open Shortest Path First
OSPF is a link-state protocol. Each router floods Link State Advertisements (LSAs) throughout the AS, building a complete topology map (LSDB — Link State Database). Every router then independently runs Dijkstra's SPF algorithm on the LSDB to compute shortest paths. Key properties:
- Metric: interface cost (inversely proportional to bandwidth by default)
- Converges in seconds after a link failure
- Hierarchical: organizes routers into areas; Area 0 is the backbone. Inter-area routing goes through Area Border Routers (ABRs).
- Supports ECMP (Equal Cost Multi-Path) — load-balances across multiple equal-cost routes
- Uses multicast (`224.0.0.5`/`224.0.0.6`) to exchange LSAs, reducing unnecessary traffic vs RIP's broadcasts
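The SPF step is ordinary Dijkstra over the link-state database. Here is a minimal sketch, assuming the LSDB is modeled as a dict of `router -> {neighbor: cost}`; the four-router topology, the link speeds, and the 100 Gbps reference bandwidth are illustrative assumptions rather than vendor defaults.

```python
import heapq

REFERENCE_BW_MBPS = 100_000  # assumed reference bandwidth for cost calculation


def cost(bandwidth_mbps: int) -> int:
    """Interface cost: inversely proportional to bandwidth, never below 1."""
    return max(1, REFERENCE_BW_MBPS // bandwidth_mbps)


def spf(lsdb: dict, root: str) -> dict:
    """Dijkstra's algorithm: lowest total cost from root to every router."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, link_cost in lsdb.get(node, {}).items():
            candidate = d + link_cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical topology: 10 Gbps links via R2, 100 Gbps links via R3.
lsdb = {
    "R1": {"R2": cost(10_000), "R3": cost(100_000)},
    "R2": {"R1": cost(10_000), "R4": cost(10_000)},
    "R3": {"R1": cost(100_000), "R4": cost(100_000)},
    "R4": {"R2": cost(10_000), "R3": cost(100_000)},
}
distances = spf(lsdb, "R1")
print(distances)
```

Since every router runs the same computation on the same database, they all arrive at consistent paths without exchanging routing tables.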
BGP — Border Gateway Protocol
BGP is the routing protocol of the internet. It's a path-vector protocol: instead of hop counts or link costs, BGP advertises full AS paths, enabling policy-based routing. A few key concepts:
- eBGP — between different ASes (e.g., your ISP and a CDN)
- iBGP — within a single AS to propagate external routes internally
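- BGP selects routes using a multi-step decision process: highest local preference → shortest AS path → lowest origin type → lowest MED → eBGP over iBGP → lowest IGP metric to the next hop → lowest router-id. Because policy attributes such as local preference are evaluated before path length, one misconfigured policy can override otherwise sensible routing, which is a recurring factor in well-known outages.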
- Route reflectors solve the full-mesh requirement for iBGP in large networks
- BGP hijacking (e.g., the 2008 Pakistan Telecom/YouTube incident) happens when a rogue AS announces more specific prefixes. RPKI (Resource Public Key Infrastructure) is the emerging fix.
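The decision process listed above is essentially a lexicographic comparison, which can be sketched as a sort key. This is a simplified model covering only the steps named in the list: real implementations have more tie-breakers and, by default, compare MED only between routes from the same neighboring AS. The `Route` class, the AS numbers, and the addresses are hypothetical (drawn from documentation ranges).

```python
from dataclasses import dataclass

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}  # lower is preferred


@dataclass(frozen=True)
class Route:
    prefix: str
    local_pref: int
    as_path: tuple[int, ...]
    origin: str
    med: int
    learned_via_ebgp: bool
    router_id: str


def preference_key(route: Route) -> tuple:
    """Smaller key = more preferred; fields are compared left to right."""
    return (
        -route.local_pref,                       # higher local preference wins
        len(route.as_path),                      # shorter AS path wins
        ORIGIN_RANK[route.origin],               # lower origin type wins
        route.med,                               # lower MED wins (simplified)
        0 if route.learned_via_ebgp else 1,      # eBGP preferred over iBGP
        tuple(int(p) for p in route.router_id.split(".")),  # lowest router-id
    )


def best_path(candidates: list[Route]) -> Route:
    return min(candidates, key=preference_key)


short = Route("203.0.113.0/24", 100, (64500,), "igp", 0, True, "192.0.2.1")
preferred = Route("203.0.113.0/24", 200, (64501, 64502, 64500), "igp", 0, True, "192.0.2.2")
medium = Route("203.0.113.0/24", 100, (64501, 64500), "igp", 0, True, "192.0.2.3")

print(best_path([short, preferred]).router_id)  # local preference beats path length
print(best_path([short, medium]).router_id)     # equal local pref: shorter path wins
```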
Modern Angle: SDN and Overlay Networks
Software-Defined Networking decouples the control plane (deciding where traffic goes) from the data plane (actually forwarding packets). A centralized controller programs forwarding rules into switches via protocols like OpenFlow. This makes network topology programmable — critical for cloud data centers where VMs spin up and down constantly.
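The control/data plane split can be illustrated with a toy model. This is not OpenFlow and not any real controller API: `FlowRule`, `Switch`, and `Controller` are hypothetical classes that only mimic the idea of a controller installing match-action rules that the switch then applies blindly.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class FlowRule:
    priority: int
    match_dst: ipaddress.IPv4Network
    action: str  # e.g. "output:3" or "drop"


class Switch:
    """Data plane: looks up rules, never decides policy itself."""

    def __init__(self) -> None:
        self.flow_table: list[FlowRule] = []

    def install(self, rule: FlowRule) -> None:
        self.flow_table.append(rule)
        self.flow_table.sort(key=lambda r: r.priority, reverse=True)

    def forward(self, dst_ip: str) -> str:
        addr = ipaddress.ip_address(dst_ip)
        for rule in self.flow_table:  # highest priority first
            if addr in rule.match_dst:
                return rule.action
        return "send_to_controller"  # table miss


class Controller:
    """Control plane: decides where traffic goes and programs the switch."""

    def __init__(self, switch: Switch) -> None:
        self.switch = switch

    def handle_new_vm(self, vm_ip: str, port: int) -> None:
        match = ipaddress.ip_network(f"{vm_ip}/32")
        self.switch.install(FlowRule(100, match, f"output:{port}"))


switch = Switch()
controller = Controller(switch)
before = switch.forward("10.0.0.7")
controller.handle_new_vm("10.0.0.7", port=3)  # a VM appears on port 3
after = switch.forward("10.0.0.7")
print(before, after)
```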
Overlay networks (VXLAN, Geneve, GRE) build virtual L2 networks on top of L3 infrastructure. Kubernetes CNI plugins (Flannel, Calico, Cilium) use overlays or direct routing to give each pod its own IP regardless of which physical host it runs on. In ML clusters, RDMA over Converged Ethernet (RoCE) bypasses the kernel networking stack entirely to get GPU-to-GPU latencies in the single-digit microseconds.
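As one concrete piece of an overlay, the VXLAN header itself is just 8 bytes carrying a 24-bit virtual network identifier (VNI). The sketch below packs and parses that header following the layout in RFC 7348; the helper names and the VNI value 5000 are my own choices, and the outer Ethernet/IP/UDP headers are not built here.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """8-byte VXLAN header: flags (8 bits), reserved (24), VNI (24), reserved (8)."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    _flags_word, vni_word = struct.unpack("!II", header[:8])
    return vni_word >> 8


header = vxlan_header(5000)
print(header.hex(), parse_vni(header))
```

The 24-bit VNI is what lets an overlay distinguish millions of virtual networks on shared L3 infrastructure, compared with the 4096 IDs a 12-bit VLAN tag allows.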
Why This Matters for ML Systems
Distributed ML training and inference are fundamentally network-bound workloads once you scale beyond a single node. A few concrete implications:
- All-reduce in training (gradient synchronization) saturates east-west bandwidth. Collective communication libraries (NCCL, Gloo) are designed around the network topology — ring-allreduce optimizes for bandwidth, while tree-allreduce optimizes for latency.
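- P99 latency in inference serving can be inflated by head-of-line blocking in TCP: one lost segment stalls every byte queued behind it on that connection. This is one motivation for serving stacks such as vLLM and TensorRT-LLM to move KV-cache transfer between nodes onto RDMA or other non-TCP transports.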
- Routing misconfigurations in cloud environments (e.g., asymmetric paths causing TCP retransmits) can silently inflate p99 by 10–100x. Understanding BGP and OSPF convergence helps you distinguish "my code is slow" from "the network is flapping."
- Overlay overhead: VXLAN adds 50 bytes of encapsulation per packet (outer Ethernet 14 + outer IPv4 20 + UDP 8 + VXLAN 8). For small tensor RPC calls this overhead is not trivial, so batch size tuning and request coalescing matter at the network layer, not just the model layer.
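Two of the points above lend themselves to quick estimates. The sketch below is a bandwidth-only model that ignores latency, protocol overhead, and compute overlap: in a ring all-reduce each node sends about 2(N−1)/N times the tensor size, and VXLAN adds a fixed 50 bytes per inner frame. The function names and the example numbers (1 GiB of gradients, 8 nodes, 100 Gbps links) are assumptions for illustration.

```python
VXLAN_OVERHEAD_BYTES = 50  # outer Ethernet 14 + outer IPv4 20 + UDP 8 + VXLAN 8


def ring_allreduce_bytes_per_node(tensor_bytes: float, nodes: int) -> float:
    """Bytes each node sends: reduce-scatter plus all-gather, (N-1)/N each."""
    return 2 * (nodes - 1) / nodes * tensor_bytes


def ring_allreduce_seconds(tensor_bytes: float, nodes: int, link_gbps: float) -> float:
    """Lower bound on transfer time, assuming the link is the only bottleneck."""
    bytes_per_second = link_gbps * 1e9 / 8
    return ring_allreduce_bytes_per_node(tensor_bytes, nodes) / bytes_per_second


def vxlan_overhead_fraction(inner_frame_bytes: int) -> float:
    """Share of on-wire bytes spent on VXLAN encapsulation."""
    return VXLAN_OVERHEAD_BYTES / (inner_frame_bytes + VXLAN_OVERHEAD_BYTES)


gib = 2**30
estimate = ring_allreduce_seconds(1 * gib, nodes=8, link_gbps=100)
print(f"ring all-reduce, 1 GiB over 8 nodes at 100 Gbps: >= {estimate:.3f} s")

for frame in (128, 1500):
    print(f"{frame:>5} B inner frame -> {vxlan_overhead_fraction(frame):.1%} VXLAN overhead")
```

Note that the per-node volume approaches twice the tensor size as the node count grows, which is why ring all-reduce is bandwidth-efficient but does nothing for latency.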