ThinkLab

What Does the Web Consist Of?

Who uses it?
Who provides the content?
How do they communicate?
How do we find the content?
How is the content organized?
How is it displayed?

Web Components

Infrastructure: Clients, Servers (DNS, CDN, Datacenters)
Content:
- URL: naming content
- HTML: formatting content
Protocol for exchanging information: HTTP

Why Is There Nothing About the Network?

Clients, servers, and routers operate at multiple layers:
- Transport
- Network
- Datalink
- Physical
Web protocols (e.g., HTTP) exist at the Application layer, abstracting away lower layers.

What We Want

Example:

URL: http://123.xyz
Request: HTTP Request → GET /index.html
Response: HTTP Response with content

URL: Uniform Resource Locator

Format:

protocol://host-name[:port]/directory-path/resource

Protocol: http, ftp, https, smtp, rtsp, etc.
Host-name: DNS name or IP address
Port: defaults to protocol’s standard port (HTTP: 80, HTTPS: 443)
Directory path: hierarchical, reflecting file system
Resource: identifies the desired resource

Examples:

File system: https://github.com/eecs489staff/slides/blob/main/04-HTTPandWeb.pptx
Program execution: https://www.google.com/search?q=eecs489
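
As a rough illustration, the URL format above can be taken apart with Python's standard urllib.parse module. The helper name describe_url and the DEFAULT_PORTS table are assumptions made for this sketch; they do not come from the notes.

```python
from urllib.parse import urlsplit, parse_qs

# Standard ports named in the notes; used when the URL omits ":port".
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_url(url):
    """Split a URL into the parts named in the format above."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "path": parts.path,                # directory path plus resource
        "query": parse_qs(parts.query),    # arguments for program execution
    }


if __name__ == "__main__":
    print(describe_url("https://www.google.com/search?q=eecs489"))
```

The query string is what distinguishes the "program execution" example from the plain file-system one: the path names the program and the query carries its arguments.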

Hyper Text Transfer Protocol (HTTP)

Client-server architecture
- Server is “always on” and “well-known”
- Clients initiate contact
Request/reply protocol (synchronous)
Runs over TCP, Port 80
Stateless
ASCII format (before HTTP/2)

Example: HTML

<!DOCTYPE html>
<html>
<body>

<h1>My First Heading</h1>
<p>My first paragraph.</p>

</body>
</html>

Client-Server Interaction – HTTP/0.9

Establish TCP connection
Client request
Server response
Close connection

Result: return HTML file.
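
The four steps can be sketched with raw sockets on the loopback interface. This is a minimal illustration, assuming an HTTP/0.9-style exchange in which the request is a single "GET <path>" line and the response is the bare HTML with no status line or headers. The names PAGE, serve_once, fetch, and demo are hypothetical.

```python
import socket
import threading

PAGE = b"<html><body><h1>My First Heading</h1></body></html>"


def serve_once(listener):
    """Handle exactly one HTTP/0.9-style request, then return."""
    conn, _ = listener.accept()                # step 1: TCP connection established
    with conn:
        request = b""
        while not request.endswith(b"\n"):     # step 2: read the one-line request
            chunk = conn.recv(1024)
            if not chunk:
                break
            request += chunk
        if request.startswith(b"GET "):
            conn.sendall(PAGE)                 # step 3: response is just the HTML
    # step 4: leaving the with-block closes the connection


def fetch(host, port, path):
    """Client side: connect, send the request, read until the server closes."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(b"GET " + path.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:                       # empty read means the server closed
                break
            chunks.append(data)
    return b"".join(chunks)


def demo():
    """Run one request/response over 127.0.0.1 on an OS-chosen port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener,), daemon=True)
    server.start()
    try:
        return fetch("127.0.0.1", port, "/index.html")
    finally:
        server.join(timeout=5)
        listener.close()


if __name__ == "__main__":
    print(demo().decode("ascii"))
```

Closing the connection is how the client learns the response is complete, which is why every object costs a fresh connection in this model.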

Performance Goals

User: fast downloads, high availability
Content provider: happy users, cost-effective infrastructure
Network: avoid overload

Solutions

Improve networking protocols (HTTP, TCP, etc.)
Caching and replication
Exploit economies of scale (e.g., webhosting, CDNs, datacenters)

HTTP Performance

Most web pages = multiple objects (HTML + images, CSS, JS, etc.)
Naive retrieval: one item at a time
New TCP connection for each small object → inefficient

Object Request Response Time

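RTT (Round-Trip Time): time for a small packet to travel from client to server and back
Response time = 2 RTT + transmission time (one RTT to set up the TCP connection, one RTT for the request and the start of the response)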
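
A small worked example of the formula, assuming transmission time is simply object size divided by bandwidth. The function name and the sample numbers are illustrative only.

```python
def object_response_time(rtt_s, size_bits, bandwidth_bps):
    """Response time for one object over a new TCP connection.

    2 RTT (connection setup, then request/response) plus the time
    needed to push the object's bits onto the link.
    """
    transmission_time = size_bits / bandwidth_bps
    return 2 * rtt_s + transmission_time


if __name__ == "__main__":
    # 50 ms RTT, 100 kB object (800,000 bits), 10 Mbit/s link:
    # 2 * 0.05 s + 0.08 s, so roughly 0.18 s.
    print(object_response_time(0.05, 800_000, 10_000_000))
```

For small objects the 2 RTT term dominates, which is why the later techniques focus on reducing round trips rather than raising bandwidth.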

Non-Persistent Connections

Default in HTTP/1.0
Each object requires 2 RTT + Δ, where Δ is the object's transmission time
Very inefficient for pages made of many small objects

The Web: History

HTTP/1.1 (1997)
- Persistent connections (multiple requests/responses per connection)
- Performance & security improvements
HTTP/2 (2015)
- Multiplexing (multiple requests/responses concurrently)
- Binary protocol
- Server push
HTTP/3 (2022)
- Built on QUIC over UDP
- Avoids TCP-level head-of-line blocking between streams

Techniques for Improving Performance

Concurrent Requests

Multiple parallel connections
Client & provider benefit
Network load increases

Persistent Connections

Maintain TCP connection across multiple requests
Avoid setup/teardown overhead
Default in HTTP/1.1

Pipelined Requests & Responses

Multiple requests sent in batches
FIFO responses → head-of-line blocking issue
Priority & preemption needed
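
Head-of-line blocking under FIFO responses can be illustrated with a toy model in which the server works on one response at a time, in request order. The function name and the service times (in milliseconds) are assumptions for the sketch, not measurements.

```python
def fifo_completion_times(service_times_ms):
    """Completion time of each response when responses are sent strictly in order."""
    finished = []
    clock = 0
    for service_time in service_times_ms:
        clock += service_time          # a response cannot start before the previous one ends
        finished.append(clock)
    return finished


if __name__ == "__main__":
    # One slow object requested first delays the two fast ones behind it.
    print(fifo_completion_times([50, 1, 1]))
    # The same objects with the slow one last: the fast ones finish early.
    print(fifo_completion_times([1, 1, 50]))
```

The total time is the same in both orders; only the waiting time of the small objects changes, which is the motivation for priorities and preemption.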

Scorecard

n small objects:
- One-at-a-time: ~2n RTT
- Concurrent (m connections): ~2⌈n/m⌉ RTT
- Persistent: ~(n+1) RTT
- Pipelined: ~2 RTT
n large objects (each of size F):
- Dominated by per-connection throughput B_C
- One-at-a-time: nF/B_C
- Concurrent (m connections): nF/(m·B_C), if m·B_C ≤ B_L (the link bandwidth)
- Pipelined/persistent: nF/B_C
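
The scorecard can be turned into a short calculator. This is a sketch under the same approximations as the list above; it reads the bracket in the concurrent case as a ceiling and takes B_C as per-connection throughput and B_L as the link bandwidth, both of which are interpretations rather than definitions given in the notes.

```python
import math


def small_object_rtts(n, m=1):
    """Approximate round trips needed to fetch n small objects."""
    return {
        "one_at_a_time": 2 * n,
        "concurrent": 2 * math.ceil(n / m),   # m parallel connections per batch
        "persistent": n + 1,                  # one setup RTT, then one RTT per object
        "pipelined": 2,                       # one setup RTT, one RTT for the whole batch
    }


def large_object_times(n, size_bits, b_c, b_l, m=1):
    """Approximate transfer time for n large objects of size_bits each."""
    if m * b_c > b_l:
        raise ValueError("m connections would exceed the link bandwidth B_L")
    serial = n * size_bits / b_c
    return {
        "one_at_a_time": serial,
        "concurrent": n * size_bits / (m * b_c),
        "pipelined_or_persistent": serial,
    }


if __name__ == "__main__":
    print(small_object_rtts(n=10, m=5))
    print(large_object_times(n=4, size_bits=8e6, b_c=1e6, b_l=4e6, m=4))
```

For large objects only concurrency helps, because it is the only technique that raises the aggregate throughput instead of saving round trips.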

Caching

Why?

Exploits locality of reference
Highly effective, though limited by unique requests

How?

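Request header: If-Modified-Since (server replies 304 Not Modified if the cached copy is still current)
Response headers:
- Expires: time after which the cached copy is stale
- Cache-Control: no-cache (the copy must be revalidated before reuse)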
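
The server side of a conditional GET can be sketched as follows. This is a simplified illustration, assuming the server knows the resource's last-modified time as a timezone-aware datetime; the function name is hypothetical, and a real server would handle more headers.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get_status(last_modified, if_modified_since=None):
    """Return 304 if the client's cached copy is still current, else 200."""
    if if_modified_since is None:
        return 200                      # unconditional request: send the full body
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200                      # unparseable date: ignore the condition
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    if last_modified <= since:
        return 304                      # Not Modified: headers only, no body
    return 200


if __name__ == "__main__":
    modified = datetime(2006, 6, 22, 12, 0, 0, tzinfo=timezone.utc)
    header = format_datetime(modified, usegmt=True)
    print(header, conditional_get_status(modified, header))
```

A 304 still costs a round trip but saves the transmission time of the body, so it pays off most for large objects that rarely change.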

Where?

Client/browser
Forward proxies (near clients)
Reverse proxies (near servers)
CDNs

HTTP Methods (HTTP/1.1)

GET: retrieve a resource
HEAD: like GET, but return only the headers
POST: send info (e.g., forms)
PUT: upload a file to the given resource
DELETE: delete the given resource

Client-to-Server Communication

HTTP Request Message

Request line: method, resource, version
Headers: metadata (e.g., Host, User-Agent)
Body: optional (e.g., form data)

Example:

GET /somedir/page.html HTTP/1.1
Host: www.someschool.edu
User-agent: Mozilla/4.0
Connection: close
Accept-language: fr
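
A request message like the one above can be assembled by hand. The sketch below assumes the HTTP/1.x text format, with lines ending in CRLF and a blank line separating headers from the optional body; build_request is an illustrative helper, not a library function.

```python
def build_request(method, resource, headers, version="HTTP/1.1", body=b""):
    """Serialize a request line, headers, and optional body into bytes."""
    lines = [f"{method} {resource} {version}"]                  # request line
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"                      # blank line ends the headers
    return head.encode("ascii") + body


if __name__ == "__main__":
    message = build_request(
        "GET",
        "/somedir/page.html",
        {
            "Host": "www.someschool.edu",
            "User-agent": "Mozilla/4.0",
            "Connection": "close",
            "Accept-language": "fr",
        },
    )
    print(message.decode("ascii"))
```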

Server-to-Client Communication

HTTP Response Message

Status line: version, status code, phrase
Headers: metadata (e.g., Content-Type, Date)
Body: data

Example:

HTTP/1.1 200 OK
Connection: close
Date: Fri, 06 Jan 2017 12:00:15 GMT
Server: Apache/1.3.0 (Unix)
Last-Modified: Thu, 22 Jun 2006 ...
Content-Length: 6821
Content-Type: text/html

<data>
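
Going the other way, a response message can be split into the three parts listed above. This is a minimal parser for illustration, assuming a complete HTTP/1.x response held in memory; parse_response is a hypothetical name, and the sketch ignores chunked encoding and repeated headers.

```python
def parse_response(raw):
    """Split a raw HTTP/1.x response into status line fields, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")           # blank line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, phrase = status_line.split(" ", 2)    # e.g. HTTP/1.1, 200, OK
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()    # header names are case-insensitive
    return version, int(code), phrase, headers, body


if __name__ == "__main__":
    raw = (
        b"HTTP/1.1 200 OK\r\n"
        b"Connection: close\r\n"
        b"Content-Length: 5\r\n"
        b"Content-Type: text/html\r\n"
        b"\r\n"
        b"hello"
    )
    print(parse_response(raw))
```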

HTTP Is Stateless

Each request-response independent
Advantages: scalability, easy failure handling, high request rate
Disadvantages: some applications need state (e.g., shopping cart)

State in Stateless Protocols: Cookies

Client stores small state for server
Sent with future requests
Can provide authentication

Example:

Server response: Set-Cookie: XYZ
Later client requests: Cookie: XYZ
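
The round trip of a cookie can be sketched with Python's standard http.cookies module. Real cookies are name=value pairs, so the example assumes a made-up cookie named session_id with the value XYZ; cookie_header_for is likewise a hypothetical helper.

```python
from http.cookies import SimpleCookie


def cookie_header_for(set_cookie_value):
    """Turn a server's Set-Cookie value into the Cookie value the client sends back."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)          # client stores the state on the server's behalf
    # Only name=value pairs go back to the server; attributes such as Path stay local.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


if __name__ == "__main__":
    print("Cookie:", cookie_header_for("session_id=XYZ; Path=/"))
```

The protocol stays stateless: the server recognizes a returning client only because the client replays the stored value on each request.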

Beyond Cookies

Marketing and tracking concerns
Example: FLoC (Federated Learning of Cohorts), trialed in Google Chrome and later dropped in favor of the Topics API

Summary

Persistence: reuse TCP connections
Pipelining: batch requests, ordered responses
Concurrent requests: multiple TCP connections
Multiplexing: many streams, fully interleaved
Evolution: HTTP/1.1 (text-based) → HTTP/2 (binary, multiplexed) → HTTP/3 (QUIC over UDP)
Performance improvements: pipelining, batching, caching, CDNs, datacenters
