
Introduction
URL shorteners look deceptively simple - take a long link, generate a tiny one, and redirect users whenever they click it.
But behind the scenes, systems like TinyURL or Bitly handle billions of redirects and must ensure zero collisions, low latency, and high availability.
Let’s walk through how to design such a system from scratch.
1. API Design: Creating and Using Short URLs
At the core, a URL shortener needs just two endpoints:
POST /shorten - Create a short URL
A user submits a long URL. The server should:
- Generate a short code
- Store the mapping { shortCode → longURL }
- Return the short URL
Example request:
POST /shorten
Content-Type: application/json

{
  "url": "https://www.example.com/blog/why-software-scale-matters"
}
Example response:
{
  "shortUrl": "https://tiny.ke/AB12cdE"
}
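To make the three server steps concrete, here is a minimal Python sketch of the logic behind POST /shorten. It is an illustration under assumptions, not a definitive implementation: `handle_shorten`, `BASE_URL`, the `store` dictionary (standing in for the database), the `next_code` callable, and the 201/400 status codes are all hypothetical choices that the article does not specify.

```python
import json
from urllib.parse import urlparse

BASE_URL = "https://tiny.ke"  # domain taken from the example response


def handle_shorten(body: str, store: dict, next_code) -> tuple[int, str]:
    """Process the body of POST /shorten and return (status, JSON body).

    `store` is a plain dict standing in for the database, and `next_code`
    is any zero-argument callable that returns a fresh short code.
    Both are assumptions made for this sketch.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "body must be valid JSON"})

    long_url = payload.get("url") if isinstance(payload, dict) else None
    if not isinstance(long_url, str):
        return 400, json.dumps({"error": "missing 'url' field"})

    parsed = urlparse(long_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return 400, json.dumps({"error": "url must be an absolute http(s) URL"})

    code = next_code()                 # 1. generate a short code
    store[code] = long_url             # 2. store the mapping
    return 201, json.dumps({"shortUrl": f"{BASE_URL}/{code}"})  # 3. return it
```

Passing the code generator in as a callable keeps this handler independent of how codes are actually produced.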
GET /{code} - Redirect to the original URL
When someone clicks the short link, the service:
- Looks up the code in the database
- Issues an HTTP 301 redirect to the long URL
Example response:
HTTP/1.1 301 Moved Permanently
Location: https://www.example.com/blog/why-software-scale-matters
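The redirect side can be sketched just as briefly. In the Python snippet below, `handle_redirect` is a hypothetical helper and `store` is a dict standing in for the database; returning 404 for an unknown code is an assumption, since the article only describes the successful case.

```python
def handle_redirect(code: str, store: dict) -> tuple[int, dict]:
    """Resolve GET /{code} and return (status, response headers).

    `store` is a dict standing in for the database (an assumption).
    """
    long_url = store.get(code)
    if long_url is None:
        # Unknown code: nothing to redirect to (assumed behaviour).
        return 404, {}
    # Known code: permanent redirect to the original URL.
    return 301, {"Location": long_url}
```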
2. How Long Should a Short Code Be?
A good shortener must generate unique, compact, and human-friendly IDs.
A popular choice is base62 encoding:
- a–z → 26 chars
- A–Z → 26 chars
- 0–9 → 10 chars
Total = 62 characters
Using a 7-character code gives:
62⁷ ≈ 3.5 trillion combinations
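A quick back-of-the-envelope check of that number, written as a small Python sketch. The helper names are hypothetical, and the rate of 1,000 creations per second is only an illustrative assumption used to show how long the code space would last.

```python
ALPHABET_SIZE = 26 + 26 + 10  # a-z, A-Z, 0-9


def code_space(length: int) -> int:
    """Number of distinct base62 codes of exactly `length` characters."""
    return ALPHABET_SIZE ** length


def years_to_exhaust(length: int, creations_per_second: float) -> float:
    """Years until every code of this length is used at a constant rate."""
    seconds = code_space(length) / creations_per_second
    return seconds / (365 * 24 * 3600)


print(code_space(7))                      # 3521614606208
print(round(years_to_exhaust(7, 1000)))   # 112
```

At that assumed rate, 7 characters would last for more than a century, which is why the length is usually considered sufficient.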
3. Scaling: Handling Thousands of Requests per Second
Even with the right code space, one application server isn't enough.
At 1000 short-URL creations per second, a single node becomes a bottleneck. So we scale horizontally:

Solution: Add more servers behind a load balancer
- Requests are evenly distributed
- Each server can process URL creation in parallel
- Throughput grows roughly linearly with the number of servers, as long as shared components such as the database keep up
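As a rough illustration of "evenly distributed", here is a minimal round-robin sketch in Python. Round-robin is only one possible balancing policy and is an assumption here; the `RoundRobinBalancer` class and the server names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation (one simple balancing policy)."""

    def __init__(self, servers: list[str]):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(servers)

    def pick(self) -> str:
        """Return the server that should handle the next request."""
        return next(self._servers)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assignments = [balancer.pick() for _ in range(6)]
print(assignments)  # ['app-1', 'app-2', 'app-3', 'app-1', 'app-2', 'app-3']
```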
But this introduces a new problem...
4. The Collision Problem
When multiple servers generate short codes independently, two of them may create the same code at the same time and try to insert duplicate keys into the database.
Base62 alone does not prevent this: collisions remain possible unless code generation is coordinated across servers.
We need collision-free code generation across all servers.
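To get a feel for how likely a collision is, the Python sketch below applies the standard birthday approximation. It assumes servers pick 7-character base62 codes uniformly at random, which is one possible uncoordinated scheme and not something the article specifies; `collision_probability` is a hypothetical helper.

```python
import math


def collision_probability(num_codes: int, code_length: int = 7) -> float:
    """Approximate P(at least one collision) among `num_codes` random codes.

    Assumes codes are drawn uniformly at random from the base62 space and
    uses the birthday approximation 1 - exp(-n(n-1) / (2N)).
    """
    space = 62 ** code_length
    exponent = num_codes * (num_codes - 1) / (2 * space)
    return -math.expm1(-exponent)  # equals 1 - exp(-exponent)


# With one million random codes the chance of a duplicate is roughly 13%.
print(f"{collision_probability(1_000_000):.2%}")
```

Even though the code space holds trillions of values, duplicates become likely long before it fills up, so some form of coordination or uniqueness check is needed.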
5. Solution: Unique ID Ranges with Zookeeper
To eliminate collisions entirely, use Zookeeper to assign each server a non-overlapping numeric range.
Example:
- Server 1 → IDs 0–1M
- Server 2 → 1M–2M
- Server 3 → 2M–3M
- ...and so on

Each server:
- Gets its own range
- Maintains a local counter
- Converts the counter to base62
- Inserts the mapping without checking the database
Since the ranges never overlap, collisions are mathematically impossible.
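The range-leasing idea can be sketched in a few lines of Python. This is not Zookeeper client code: `RangeAllocator` is an in-memory stand-in for the coordinator, `IdGenerator` is a hypothetical per-server counter, and fetching a fresh range once the current one is exhausted is an assumption the article does not spell out.

```python
import threading


class RangeAllocator:
    """In-memory stand-in for the coordinator (Zookeeper in the article)."""

    def __init__(self, range_size: int = 1_000_000):
        self._range_size = range_size
        self._next_start = 0
        self._lock = threading.Lock()

    def allocate(self) -> range:
        """Hand out the next unused, non-overlapping block of IDs."""
        with self._lock:  # only one caller may advance the pointer at a time
            start = self._next_start
            self._next_start += self._range_size
        return range(start, start + self._range_size)


class IdGenerator:
    """Per-server local counter that walks through its leased range.

    Not thread-safe on its own; a real server would guard next_id().
    """

    def __init__(self, allocator: RangeAllocator):
        self._allocator = allocator
        self._ids = iter(allocator.allocate())

    def next_id(self) -> int:
        try:
            return next(self._ids)
        except StopIteration:
            # Range used up: lease a fresh one (assumed behaviour).
            self._ids = iter(self._allocator.allocate())
            return next(self._ids)


allocator = RangeAllocator(range_size=3)   # tiny ranges to keep the demo short
server_1 = IdGenerator(allocator)
server_2 = IdGenerator(allocator)
ids_1 = [server_1.next_id() for _ in range(4)]
ids_2 = [server_2.next_id() for _ in range(3)]
print(ids_1, ids_2)  # [0, 1, 2, 6] [3, 4, 5]
```

The only coordinated step is `allocate()`; every individual ID after that is produced locally, which is what keeps the hot path free of database checks.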
6. End-to-End Flow
Here's how the entire creation pipeline works:

- User calls POST /shorten
- Request hits the load balancer
- Load balancer forwards it to any app server
- That server increments its counter
- Converts the number to base62
- Inserts { shortCode → longURL } into Cassandra
- Returns the short URL
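The server-side steps of this pipeline can be sketched in Python as follows. Everything here is illustrative: `to_base62` uses one possible alphabet ordering (digits, then lowercase, then uppercase), `AppServer` is a hypothetical class, the ID range is passed in directly instead of being leased from Zookeeper, and a plain dict stands in for Cassandra.

```python
import string

# One possible ordering of the 62 symbols (an assumption of this sketch).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def to_base62(number: int) -> str:
    """Convert a non-negative integer to its base62 representation."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


class AppServer:
    """One app server with a pre-assigned, inclusive ID range."""

    def __init__(self, first_id: int, last_id: int, store: dict):
        self._next_id = first_id
        self._last_id = last_id
        self._store = store  # dict standing in for Cassandra

    def shorten(self, long_url: str) -> str:
        if self._next_id > self._last_id:
            raise RuntimeError("ID range exhausted; a new range is needed")
        code = to_base62(self._next_id)   # convert the counter to base62
        self._next_id += 1                # advance the local counter
        self._store[code] = long_url      # insert without a lookup first
        return f"https://tiny.ke/{code}"  # return the short URL


store = {}
server = AppServer(1_000_000, 1_999_999, store)
print(server.shorten("https://www.example.com/blog/why-software-scale-matters"))
# https://tiny.ke/4c92
```

Note that codes produced this way are shorter than 7 characters until the counter grows large; padding them to a fixed length is a design choice left out of this sketch.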
7. Handling Redirects Efficiently (100× More Load)
Redirects happen far more often than creations.
If your service creates 1 million URLs per day, it might serve 100 million redirects in the same period.
Hitting the database for every redirect would be too slow.
Solution: Add a Redis cache layer

Flow:
- User clicks short URL
- Load balancer → app server
- Server checks Redis
- Cache hit? Return redirect immediately
- Cache miss? Query Cassandra
- Cache the result
- Issue 301 redirect
Redis keeps hot URLs in memory, so most lookups are served without touching the database and typically complete in well under a millisecond.
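The flow above is the classic cache-aside pattern. Here is a minimal Python sketch of it, with plain dicts standing in for Redis and Cassandra; `RedirectResolver` and its `db_reads` counter are hypothetical names added only to make the behaviour observable.

```python
from typing import Optional


class RedirectResolver:
    """Cache-aside lookup: check the cache first, fall back to the database."""

    def __init__(self, cache: dict, database: dict):
        self.cache = cache          # dict standing in for Redis
        self.database = database    # dict standing in for Cassandra
        self.db_reads = 0           # counts how often the database is hit

    def resolve(self, code: str) -> Optional[str]:
        long_url = self.cache.get(code)
        if long_url is not None:
            return long_url                  # cache hit: no database access
        self.db_reads += 1
        long_url = self.database.get(code)   # cache miss: query the database
        if long_url is not None:
            self.cache[code] = long_url      # cache the result for next time
        return long_url


database = {"AB12cdE": "https://www.example.com/blog/why-software-scale-matters"}
resolver = RedirectResolver(cache={}, database=database)
resolver.resolve("AB12cdE")   # miss: reads the database and fills the cache
resolver.resolve("AB12cdE")   # hit: served from the cache
print(resolver.db_reads)      # 1
```

A real cache would also need an eviction or expiry policy so that only hot URLs stay in memory; that part is left out here.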
Conclusion
A URL shortener may look like a tiny product, but designing it for real-world scale involves:
- Clean API design
- Efficient base62 short-code generation
- Collision-free distributed ID allocation
- Load-balancing across multiple servers
- Zookeeper-managed ID ranges
- Fast Redis caching for redirect traffic
These patterns apply not just to URL shorteners, but to any system that combines a high write volume with a much higher read volume.