What Does the Web Consist Of?
- Who uses it?
- Who provides the content?
- How do they communicate?
- How do we find the content?
- How is the content organized?
- How is it displayed?
Web Components
- Infrastructure: Clients, Servers (DNS, CDN, Datacenters)
- Content:
  - URL: naming content
  - HTML: formatting content
- Protocol for exchanging information: HTTP
Why Is There Nothing About the Network?
- Clients, servers, and routers operate at multiple layers:
  - Transport
  - Network
  - Datalink
  - Physical
- Web protocols (e.g., HTTP) exist at the Application layer, abstracting away lower layers.
What We Want
Example:
- URL: http://123.xyz
- Request: HTTP request → GET /index.html
- Response: HTTP response with content
URL format:
protocol://host-name[:port]/directory-path/resource
- Protocol: http, ftp, https, smtp, rtsp, etc.
- Host-name: DNS name or IP address
- Port: defaults to protocol’s standard port (HTTP: 80, HTTPS: 443)
- Directory path: hierarchical, reflecting file system
- Resource: identifies the desired resource
Examples:
- File system:
https://github.com/eecs489staff/slides/blob/main/04-HTTPandWeb.pptx
- Program execution:
https://www.google.com/search?q=eecs489
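The URL format above can be checked mechanically. The following is a minimal sketch using Python's standard `urllib.parse`; the helper name `describe_url` and the `DEFAULT_PORTS` table are illustrative assumptions, not part of the slides.

```python
from urllib.parse import urlsplit, parse_qs

# Default ports for the two schemes whose ports are named in the text.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_url(url: str) -> dict:
    """Split a URL into the parts of protocol://host-name[:port]/directory-path/resource."""
    parts = urlsplit(url)
    # rpartition splits the path at its last "/" into directory and resource.
    directory, _, resource = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        # Fall back to the protocol's standard port when the URL gives none.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "directory": directory + "/",
        "resource": resource,
        "query": parse_qs(parts.query),
    }


if __name__ == "__main__":
    for example in (
        "https://github.com/eecs489staff/slides/blob/main/04-HTTPandWeb.pptx",
        "https://www.google.com/search?q=eecs489",
    ):
        print(describe_url(example))
```

In the second example the resource is a program (`search`) and the query string carries its arguments, which is why the sketch reports the query separately from the path.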
Hyper Text Transfer Protocol (HTTP)
- Client-server architecture
  - Server is “always on” and “well-known”
  - Clients initiate contact
- Request/reply protocol (synchronous)
- Runs over TCP, port 80
- Stateless
- ASCII format (before HTTP/2)
Example: HTML
<!DOCTYPE html>
<html>
<body>
<h1>My First Heading</h1>
<p>My first paragraph.</p>
</body>
</html>
Client-Server Interaction – HTTP/0.9
- Establish TCP connection
- Client request
- Server response
- Close connection
Result: return HTML file.
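The four steps above can be sketched over the loopback interface. This is a toy illustration, not a real web server: the names `serve_once`, `fetch`, `demo`, and the `PAGE` contents are assumptions made for the example, and the server handles exactly one request.

```python
import socket
import threading

PAGE = b"<html><body><h1>My First Heading</h1></body></html>"


def serve_once(listener: socket.socket) -> None:
    """Handle one HTTP/0.9-style exchange, then close the connection."""
    conn, _ = listener.accept()
    with conn, conn.makefile("rb") as reader:
        request = reader.readline()  # e.g. b"GET /index.html\r\n"
        method, _, path = request.strip().partition(b" ")
        if method == b"GET" and path == b"/index.html":
            conn.sendall(PAGE)  # raw HTML only: no status line, no headers
    # Leaving the `with` block closes the socket, which marks the end of the response.


def fetch(host: str, port: int, path: str) -> bytes:
    """Connect, send a one-line request, and read until the server closes."""
    with socket.create_connection((host, port)) as sock:  # 1. establish TCP connection
        sock.sendall(f"GET {path}\r\n".encode("ascii"))     # 2. client request
        chunks = []
        while chunk := sock.recv(4096):                     # 3. server response
            chunks.append(chunk)
    return b"".join(chunks)                                 # 4. connection is closed


def demo() -> bytes:
    """Run one request/response exchange between a local server thread and a client."""
    with socket.create_server(("127.0.0.1", 0)) as listener:  # port 0: OS picks a free port
        port = listener.getsockname()[1]
        server = threading.Thread(target=serve_once, args=(listener,))
        server.start()
        body = fetch("127.0.0.1", port, "/index.html")
        server.join()
    return body


if __name__ == "__main__":
    print(demo().decode())
```

Because the response carries no length information, the client only knows the page is complete when the server closes the connection, which is why each exchange here uses its own TCP connection.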
- User: fast downloads, high availability
- Content provider: happy users, cost-effective infrastructure
- Network: avoid overload
Solutions
- Improve networking protocols (HTTP, TCP, etc.)
- Caching and replication
- Exploit economies of scale (e.g., webhosting, CDNs, datacenters)
- Most web pages = multiple objects (HTML + images, CSS, JS, etc.)
- Naive retrieval: one item at a time
- New TCP connection for each small object → inefficient
Object Request Response Time
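- RTT (Round-Trip Time): time for a small packet to travel from client to server and back
- Response time = 2 RTT + transmission time
  - One RTT to set up the TCP connection
  - One RTT for the request and the first byte of the response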
Non-Persistent Connections
- Default in HTTP/1.0
- Each object requires 2 RTT + Δ, where Δ is the object's transmission time
- Very inefficient
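To see why this is inefficient, the cost can be worked out with a few lines of arithmetic. The sketch below assumes all objects are the same size and are fetched strictly one after another; the function names and the sample numbers are hypothetical.

```python
def object_response_time(rtt: float, size_bits: float, bandwidth_bps: float) -> float:
    """Time to fetch one object over a fresh TCP connection: 2 RTT + transmission time."""
    transmission_time = size_bits / bandwidth_bps
    # One RTT for the TCP handshake, one RTT for the request and first response byte.
    return 2 * rtt + transmission_time


def non_persistent_page_time(
    n_objects: int, rtt: float, size_bits: float, bandwidth_bps: float
) -> float:
    """Fetch n equally sized objects one at a time, with a new connection for each."""
    return n_objects * object_response_time(rtt, size_bits, bandwidth_bps)


if __name__ == "__main__":
    # Hypothetical numbers: 100 ms RTT, 10 objects of 100 kbit each, 10 Mbit/s link.
    total = non_persistent_page_time(
        10, rtt=0.1, size_bits=100_000, bandwidth_bps=10_000_000
    )
    print(f"{total:.2f} s")
```

With these sample numbers the round trips (2 s in total) dwarf the transmission time (0.1 s in total), so the per-object connection setup is the dominant cost.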
The Web: History
- HTTP/1.1 (1997)
  - Persistent connections (multiple requests/responses per connection)
  - Performance & security improvements
- HTTP/2 (2015)
  - Multiplexing (multiple requests/responses concurrently)
  - Binary protocol
  - Server push
- HTTP/3 (2022)
  - Built on QUIC over UDP
  - Avoids the transport-level head-of-line blocking that TCP imposes on multiplexed streams
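The multiplexing idea listed under HTTP/2 can be pictured as cutting each response into frames tagged with a stream identifier and interleaving them on one connection. The sketch below is a simplified model under that assumption: real HTTP/2 frames also carry headers, flow control, and priorities, and the function names here are hypothetical.

```python
from itertools import zip_longest


def frames(stream_id: int, payload: bytes, frame_size: int) -> list:
    """Cut one response into (stream_id, chunk) frames of at most frame_size bytes."""
    return [
        (stream_id, payload[i:i + frame_size])
        for i in range(0, len(payload), frame_size)
    ]


def multiplex(streams: dict, frame_size: int) -> list:
    """Interleave frames from all streams round-robin onto one connection."""
    per_stream = [frames(sid, data, frame_size) for sid, data in streams.items()]
    wire = []
    for round_of_frames in zip_longest(*per_stream):
        # Streams that have already finished contribute None in later rounds.
        wire.extend(f for f in round_of_frames if f is not None)
    return wire


def demultiplex(wire: list) -> dict:
    """Reassemble each stream's payload from the tagged frames."""
    out = {}
    for stream_id, chunk in wire:
        out[stream_id] = out.get(stream_id, b"") + chunk
    return out


if __name__ == "__main__":
    responses = {1: b"a" * 10, 3: b"bb"}  # a large response and a small one
    wire = multiplex(responses, frame_size=4)
    print(wire)
    print(demultiplex(wire) == responses)
```

The small response on stream 3 is complete after the second frame on the wire instead of waiting behind the whole of stream 1, which is the application-level benefit of interleaving.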
Concurrent Requests
- Multiple parallel connections
- Client & provider benefit
- Network load increases
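A rough estimate of the benefit is sketched below. It assumes that the parallel connections share one bottleneck link, so that only the round trips overlap and the bytes of all objects still take the same total time to arrive; the function name and numbers are hypothetical.

```python
import math


def concurrent_page_time(
    n_objects: int, n_connections: int, rtt: float, transmission_time: float
) -> float:
    """Estimate page load time with up to n_connections parallel TCP connections."""
    rounds = math.ceil(n_objects / n_connections)
    # Each round pays 2 RTT (handshake + request) once, because its connections overlap.
    # Assumption: all connections share one bottleneck link, so transmitting
    # n objects still takes n * transmission_time in total.
    return rounds * 2 * rtt + n_objects * transmission_time


if __name__ == "__main__":
    # Hypothetical numbers: 10 objects, 100 ms RTT, 10 ms transmission time each.
    for m in (1, 5, 10):
        print(m, "connections:", round(concurrent_page_time(10, m, 0.1, 0.01), 3), "s")
```

With one connection the estimate reduces to the sequential non-persistent case; adding connections removes round trips for the client but multiplies the number of simultaneous TCP connections the network and server must carry.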
Persistent Connections
- Maintain TCP connection across multiple requests
- Avoid setup/teardown overhead
- Default in HTTP/1.1
Pipelined Requests & Responses
- Multiple requests sent in batches
- FIFO responses → head-of-line blocking issue
- Priority & preemption needed
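The difference between plain persistent connections and pipelining, and the head-of-line blocking that FIFO responses cause, can be sketched with a simple timing model. The model assumes requests are tiny and server processing time is zero; the function names and numbers are hypothetical.

```python
def persistent_completion_times(rtt: float, transmission_times: list) -> list:
    """One connection, one request at a time: each object costs RTT + its transmission."""
    clock = rtt  # TCP handshake, paid once
    finish = []
    for t in transmission_times:
        clock += rtt + t  # send request, wait for the full response
        finish.append(clock)
    return finish


def pipelined_completion_times(rtt: float, transmission_times: list) -> list:
    """All requests sent back-to-back; responses come back in request (FIFO) order."""
    clock = 2 * rtt  # TCP handshake + one round trip for the whole batch of requests
    finish = []
    for t in transmission_times:
        clock += t  # each response must wait for every response ahead of it
        finish.append(clock)
    return finish


if __name__ == "__main__":
    rtt = 0.1
    big_first = [1.0, 0.01, 0.01]    # a large object requested before two small ones
    small_first = [0.01, 0.01, 1.0]  # same objects, large one requested last
    print("persistent:           ", persistent_completion_times(rtt, big_first))
    print("pipelined, big first: ", pipelined_completion_times(rtt, big_first))
    print("pipelined, small first:", pipelined_completion_times(rtt, small_first))
```

In the "big first" ordering the two small objects finish only after the large one, even though they need almost no transmission time themselves. That is the head-of-line blocking problem, and it is why request ordering (priority) and the ability to interrupt a large response (preemption) matter.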
Caching
Why?
- Exploits locality of reference
- Highly effective, though limited by unique requests
How?
- If-Modified-Since header
- Response headers:
Where?
- Client/browser
- Forward proxies (near clients)
- Reverse proxies (near servers)
- CDNs
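The If-Modified-Since check mentioned under "How?" can be sketched from the server's side. This is a minimal illustration of the comparison only: the function name `conditional_get` is hypothetical, the status codes 200 and 304 are the standard ones, and a real server would also handle malformed dates and other validators.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional, Tuple


def conditional_get(
    last_modified: datetime, if_modified_since: Optional[str]
) -> Tuple[int, dict]:
    """Return (status code, headers) for a GET that may carry If-Modified-Since.

    last_modified must be a timezone-aware UTC datetime.
    """
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since is not None:
        cached_at = parsedate_to_datetime(if_modified_since)
        # HTTP dates have one-second resolution, so drop microseconds before comparing.
        if last_modified.replace(microsecond=0) <= cached_at:
            return 304, headers  # Not Modified: the cache may reuse its stored copy
    return 200, headers  # the full body would be sent with this response


if __name__ == "__main__":
    modified = datetime(2006, 6, 22, 10, 0, 0, tzinfo=timezone.utc)
    print(conditional_get(modified, None))
    print(conditional_get(modified, "Thu, 22 Jun 2006 10:00:00 GMT"))
    print(conditional_get(modified, "Wed, 21 Jun 2006 10:00:00 GMT"))
```

A 304 response has no body, so a cache that already holds a current copy pays roughly one round trip instead of a full transfer.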
HTTP Methods (HTTP/1.1)
- GET, HEAD
- POST: send info (e.g., forms)
- PUT: upload file
- DELETE: delete file
Client-to-Server Communication
HTTP Request Message
- Request line: method, resource, version
- Headers: metadata (e.g., Host, User-Agent)
- Body: optional (e.g., form data)
Example:
GET /somedir/page.html HTTP/1.1
Host: www.someschool.edu
User-agent: Mozilla/4.0
Connection: close
Accept-language: fr
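The three parts of the request message can be pulled apart with plain string handling. The sketch below assumes a well-formed HTTP/1.x request with CRLF line endings and a blank line after the headers; `parse_request` is a hypothetical helper, not a library function.

```python
def parse_request(raw: str):
    """Split an HTTP/1.x request into (method, resource, version, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")  # a blank line ends the header section
    request_line, *header_lines = head.split("\r\n")
    method, resource, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them to lower case.
        headers[name.strip().lower()] = value.strip()
    return method, resource, version, headers, body


RAW_REQUEST = (
    "GET /somedir/page.html HTTP/1.1\r\n"
    "Host: www.someschool.edu\r\n"
    "User-agent: Mozilla/4.0\r\n"
    "Connection: close\r\n"
    "Accept-language: fr\r\n"
    "\r\n"
)

if __name__ == "__main__":
    print(parse_request(RAW_REQUEST))
```

A GET request like this one has an empty body; a POST carrying form data would place it after the blank line.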
Server-to-Client Communication
HTTP Response Message
- Status line: version, status code, phrase
- Headers: metadata (e.g., Content-Type, Date)
- Body: data
Example:
HTTP/1.1 200 OK
Connection: close
Date: Fri, 06 Jan 2017 12:00:15 GMT
Server: Apache/1.3.0 (Unix)
Last-Modified: Thu, 22 Jun 2006 ...
Content-Length: 6821
Content-Type: text/html

<data>
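A response can be taken apart the same way, with Content-Length telling the client how many body bytes belong to it. The sketch below assumes a well-formed response held entirely in memory; `parse_response` and the short sample message are hypothetical and much smaller than the 6821-byte example above.

```python
def parse_response(raw: bytes):
    """Split an HTTP/1.x response into (version, status code, phrase, headers, body)."""
    head, _, rest = raw.partition(b"\r\n\r\n")  # a blank line separates headers from body
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, phrase = status_line.split(" ", 2)  # the phrase may contain spaces
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Content-Length says how many of the following bytes are this response's body.
    length = int(headers.get("content-length", len(rest)))
    return version, int(code), phrase, headers, rest[:length]


RAW_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Connection: close\r\n"
    b"Content-Length: 13\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<p>Hello</p>\n"
)

if __name__ == "__main__":
    print(parse_response(RAW_RESPONSE))
```

Knowing the body length in advance is what lets a persistent connection carry several responses back to back without closing the connection to mark the end of each one.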
HTTP Is Stateless
- Each request-response independent
- Advantages: scalability, easy failure handling, high request rate
- Disadvantages: some applications need state (e.g., shopping cart)
State in Stateless Protocols: Cookies
- Client stores small state for server
- Sent with future requests
- Can provide authentication
Example:
- Server response includes: Set-Cookie: XYZ
- Later client requests include: Cookie: XYZ
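The exchange above can be sketched as a shopping cart that survives across otherwise independent requests. This is an illustration under several assumptions: the cookie name `session`, the in-memory `SESSIONS` table, and the `handle_request` function are all hypothetical, and a real site would add expiry and security attributes to the cookie.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # server-side carts, keyed by the opaque value stored in the cookie


def handle_request(cookie_header, item=None):
    """Process one request; return (response headers, current cart contents)."""
    jar = SimpleCookie(cookie_header or "")
    response_headers = {}
    if "session" in jar and jar["session"].value in SESSIONS:
        session_id = jar["session"].value  # returning client: look up its state
    else:
        session_id = secrets.token_hex(8)  # new client: create state and hand out a cookie
        SESSIONS[session_id] = []
        response_headers["Set-Cookie"] = f"session={session_id}"
    if item is not None:
        SESSIONS[session_id].append(item)
    return response_headers, list(SESSIONS[session_id])


if __name__ == "__main__":
    # First request carries no cookie, so the server issues one.
    headers, cart = handle_request(None, item="book")
    stored_cookie = headers["Set-Cookie"]  # the client keeps this value
    # Later requests send the stored value back in the Cookie header.
    headers, cart = handle_request(stored_cookie, item="pen")
    print(cart)
```

HTTP itself stays stateless here: each request is still handled on its own, and the only link between them is the value the client chooses to send back.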
Beyond Cookies
- Marketing and tracking concerns
- Example: FLoC (Federated Learning of Cohorts) in Google Chrome
Summary
- Persistence: reuse TCP connections
- Pipelining: batch requests, ordered responses
- Concurrent requests: multiple TCP connections
- Multiplexing: many streams, fully interleaved
- HTTP/1.1: text-based → followed by HTTP/2 (binary, over TCP) → HTTP/3 (over QUIC/UDP)
- Performance improvements: pipelining, batching, caching, CDNs, datacenters