System Design Glossary

Key terms from the System Design course, linked to the lesson that introduces each one.

5,528 terms.

A

ABAC policy engine
is a dedicated component that evaluates authorization policies written in a specialized language.
Lesson 936ABAC Policy Engines
Abort on their own
another participant might be committing
Lesson 573The Blocking Problem in 2PC
Aborted
The transaction was rolled back (could happen from any state)
Lesson 572Participant State Transitions
Absolute deadline (recommended)
Lesson 1112HTTP Header-Based Propagation
Absolute time
"Expire at 3:00 PM today"
Lesson 156Time-Based Expiration (TTL)
Abstraction layers
hide implementation details behind clean boundaries.
Lesson 38Design for Change
Abuse detection
Identify tenants hitting limits frequently or attempting to bypass restrictions
Lesson 1825Monitoring and Analytics Per Tenant
Accelerates learning
Without defensive behavior, teams uncover root causes faster and identify systemic weaknesses that might affect other areas.
Lesson 1351Blameless Postmortem Culture
Accent-Insensitive Indexing
Build tries that strip diacritics for matching but preserve them for display.
Lesson 1768Typeahead for Multi-Language Support
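
As a rough illustration of matching without diacritics while displaying the original spelling, the sketch below folds each term to an accent-free key. The function name `fold_accents` and the plain dictionary standing in for the trie are assumptions made for this example, not part of the lesson.

```python
import unicodedata

def fold_accents(text: str) -> str:
    """Return a case-folded, diacritic-free key used only for matching."""
    # NFKD splits "é" into "e" plus a combining accent mark.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

# Hypothetical index: folded key -> original spellings kept for display.
index: dict[str, list[str]] = {}
for term in ["Café", "cafe", "Zürich"]:
    index.setdefault(fold_accents(term), []).append(term)
```

A query typed as `cafe` or `Café` folds to the same key, so both stored spellings are returned for display.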
Accept Bounded Overage
Design for a 10-20% overage buffer and adjust advertised limits accordingly (limit=90, enforce at 100).
Lesson 981Race Conditions in Distributed Counters
Accept phase
In this phase, the proposer sends its value (or a value it adopted in Phase 1) to the acceptors.
Lesson 612The Two-Phase Protocol
Accept requests
and allow temporary inconsistency (AP system)
Lesson 532Why Eventual Consistency Exists
Accept staleness
(serve old data temporarily)
Lesson 155Cache Invalidation Problem
Accept temporary inconsistency
Data might be out of sync for seconds, minutes, or longer
Lesson 583Alternative: Best Effort with Eventual Consistency
Accept webhook requests
from providers at dedicated endpoints (e.
Lesson 1693Delivery Receipt Tracking
Acceptable
Provide value even if degraded
Lesson 1061Fallback Strategies
Acceptable Degradation
Define what "graceful" means.
Lesson 1073Bulkhead Sizing: Balancing Isolation and Utilization
Acceptable seconds
→ CDN edge caching for global users
Lesson 130Choosing the Right Caching Layer
Acceptable storage cost
More than incremental but manageable
Lesson 1404Differential Backups
Acceptors
form the voting body that decides which proposal to accept.
Lesson 610The Three Roles in Paxos
Accepts writes
that will be reconciled later
Lesson 315Basically Available: Prioritizing Uptime
Access control
You can grant permissions at the database or collection level
Lesson 383Collections and Databases
Access Control List
is a table attached to a resource that specifies exactly which users or groups have what permissions on that specific resource.
Lesson 937Access Control Lists (ACLs)
Access controls
Role-based permissions on tables and columns
Lesson 764Data Governance and Quality
Access logs
capture the who, what, and when of each request: timestamps, endpoints hit, HTTP methods, status codes, response times, and client identifiers.
Lesson 890Logging and Metrics Collection
Lesson 1135Log Retention and Volume Management
access tokens
that expire quickly (minutes to hours) for API requests, paired with **refresh tokens** that last much longer (days to weeks) but are only used to obtain new access tokens.
Lesson 915Token Expiration and Refresh Tokens
Lesson 1615Signed URLs and Token-Based Access
Access-Based Extension
Implement a "last accessed" timestamp.
Lesson 1573Handling Never-Expiring Pastes
Account balance checks
Quorum reads (R=QUORUM) balance consistency and availability
Lesson 563Tunable Consistency in Practice
Account balance updates
(CP): Require strong consistency across all regions before confirming transaction
Lesson 510Real Systems: Multi-Region Trade-offs
Accuracy is non-negotiable
If your business requires absolutely correct results (financial reporting, compliance audits, regulatory data), Lambda's batch layer provides a "source of truth" that corrects any errors from the speed layer.
Lesson 755When to Choose Lambda vs Kappa
Accurate expiration tracking
TTL policies work from the moment of creation
Lesson 1559Write Path: Synchronous vs Asynchronous Storage
Accurate percentiles
for that specific service instance (no bucketing error)
Lesson 1186Summary Metrics
ACID
stands for **Atomicity, Consistency, Isolation, and Durability** — four properties that ensure your database transactions are safe, predictable, and reliable even when things go wrong.
Lesson 303ACID Properties Overview
ACID guarantees
Strong consistency when preferences affect billing or compliance.
Lesson 1721Preference Storage Strategy
ACID properties
(Atomicity, Consistency, Isolation, Durability) across these boundaries becomes extraordinarily complex.
Lesson 1489Cross-Partition Transactions
Acknowledgment flows
track these stages through a series of confirmations from various components in the delivery chain.
Lesson 1718Acknowledgment and Confirmation Flows
Acknowledgments (`acks`)
Controls durability vs speed.
Lesson 724Kafka Performance Tuning
ACLs
excel when you need per-resource control—like giving specific users access to specific documents without creating roles for every combination.
Lesson 937Access Control Lists (ACLs)
Acquisition
Thread requests connection → Pool checks for idle connection → Validates if configured → Marks as active → Returns to application
Lesson 270Connection Lifecycle in a Pool
Action
What operation they attempted (read, write, delete)
Lesson 944Auditing and Compliance for Authorization
Action attributes
What operation is being attempted
Lesson 935Attribute-Based Access Control (ABAC) Introduction
Action Items
Concrete follow-ups with owners and deadlines—"Add canary deployment step" or "Increase monitoring coverage for X"
Lesson 1304Blameless Postmortems
Lesson 1350What is a Postmortem?
Lesson 1352Postmortem Structure and Action Items
Actionable
Should trigger automatic remediation (failover, alerts)
Lesson 1339Health Checks and Failure Detection
Actionable alerts only
Every alert should have a clear action.
Lesson 1171Log Review and Alert Fatigue
Actionable metrics
directly inform decisions:
Lesson 1215Avoiding Vanity Metrics
Actions
(eager): count, collect, save—trigger actual computation
Lesson 768Apache Spark Overview
Activation
Promote read replicas, restore from backups if needed
Lesson 1437Failover and Failback Procedures
Active
While active, the span can collect attributes, events, and status updates
Lesson 1231Span Lifecycle and Structure
Active checks
catch problems *before* users are affected.
Lesson 99Active vs Passive Health Checks
Active connection count
Number of in-flight HTTP requests to prevent overwhelming the host with concurrent connections
Lesson 1848Politeness Table and Per-Host State
Active connections
How many requests each server is currently handling
Lesson 92Least Response Time Algorithm
Lesson 1175Gauge Metrics
Lesson 1184Gauge Metrics
Active health checks
work like a heartbeat monitor—the load balancer regularly sends test requests to each server (like pinging "Are you alive?
Lesson 99Active vs Passive Health Checks
Lesson 180DNS-Based Request Routing
Active invalidation on write
means your system explicitly deletes or updates cache entries the instant you modify the source data.
Lesson 157Active Invalidation on Write
Active-Active (Dual-Operation)
Both regions actively serve traffic simultaneously.
Lesson 1436Active-Passive vs Active-Active DR
Active-Passive (Standby Failover)
Your primary region handles all traffic.
Lesson 1436Active-Passive vs Active-Active DR
Acyclic
No circular dependencies (Task A can't depend on Task B if B depends on A)
Lesson 766Apache Airflow Fundamentals
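
The "no circular dependencies" rule can be checked mechanically. The sketch below is not Airflow code; it uses Python's standard `graphlib` module, and the helper name `is_acyclic` plus the task names are made up for illustration.

```python
from graphlib import CycleError, TopologicalSorter

def is_acyclic(dependencies: dict[str, set[str]]) -> bool:
    """dependencies maps each task to the set of tasks it depends on."""
    try:
        # prepare() raises CycleError if the graph contains a cycle.
        TopologicalSorter(dependencies).prepare()
        return True
    except CycleError:
        return False

valid_dag = {"B": {"A"}, "C": {"B"}}   # A -> B -> C
circular = {"A": {"B"}, "B": {"A"}}    # A and B depend on each other
```

Here `is_acyclic(valid_dag)` holds, while `circular` is rejected because Task A depends on Task B and B depends on A.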
Adaptation
If bandwidth drops, player seamlessly switches to lower quality mid-stream
Lesson 1602Adaptive Bitrate Streaming (ABR)
Adaptive
Increase sample rate when error rates rise
Lesson 1164Sampling for High-Volume Logs
Adaptive bitrate streaming
encodes the same video at multiple quality levels (240p, 480p, 720p, 1080p, etc.
Lesson 193CDN for Video Streaming
Lesson 1630Live Streaming Architecture
Adaptive Bitrate Streaming (ABR)
is your foundation.
Lesson 1618Optimizing for Mobile Networks
Adaptive freshness
If a page hasn't changed in 10 consecutive crawls, exponentially back off the recrawl frequency.
Lesson 1873Handling Recrawls and Freshness
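
One way to read "exponentially back off" is to double the recrawl interval for every unchanged crawl at or beyond the threshold. The sketch below assumes that reading; the doubling factor, the 30-day cap, and the function name are illustrative choices, not values from the lesson.

```python
def next_recrawl_interval(base_hours: int, unchanged_crawls: int,
                          threshold: int = 10, max_hours: int = 24 * 30) -> int:
    """Hours to wait before the next crawl of a page."""
    if unchanged_crawls < threshold:
        return base_hours  # page still changes often enough; keep the base rate
    # Double once at the threshold, then once more per further unchanged crawl.
    doublings = unchanged_crawls - threshold + 1
    return min(base_hours * 2 ** doublings, max_hours)
```

A crawler would reset `unchanged_crawls` to zero as soon as a content change is detected, which returns the page to its base interval.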
Adaptive sampling
Increase sample rate when detecting anomalies or errors (sample failures more aggressively than successes)
Lesson 1217Sampling for Expensive Metrics
Adaptive sync
Increase frequency when approaching limits
Lesson 1802Synchronization Strategies for Local Caches
Adaptive timeouts
continuously measure actual request latencies and adjust timeout values based on **percentile calculations**—typically P95 or P99.
Lesson 1117Adaptive Timeouts Based on Historical Latency
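
A minimal sketch of the percentile idea, assuming a nearest-rank percentile over a recent sample of latencies. The headroom multiplier and the floor and ceiling values are assumptions added so the timeout cannot collapse or grow without bound.

```python
import math

def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def adaptive_timeout(latencies_ms: list[float], pct: int = 99,
                     headroom: float = 1.5,
                     floor_ms: float = 50, ceiling_ms: float = 5000) -> float:
    """Timeout derived from observed latency, clamped to sane bounds."""
    target = percentile(latencies_ms, pct) * headroom
    return min(max(target, floor_ms), ceiling_ms)
```

In practice the sample would come from a sliding window of recent requests, so the timeout tracks current conditions rather than all-time history.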
Add logging and monitoring
to track latency, throughput, and error rates
Lesson 40Measure Before Optimizing
Add smart filtering
Use correlation IDs to group related failures, avoiding duplicate alerts for the same incident.
Lesson 1171Log Review and Alert Fatigue
Add/update a score
O(log N) complexity
Lesson 359Redis for Leaderboards and Counting
Adding new endpoints
Existing routes continue unchanged
Lesson 1905Breaking vs Non-Breaking Changes
Adding optional fields
to requests: Old clients omit them; new clients can include them
Lesson 1905Breaking vs Non-Breaking Changes
Adding RAM
improves caching (more data in memory = faster queries)
Lesson 54Scaling Databases: Special Considerations
Adds metadata
Automatically includes it in the `grpc-timeout` header
Lesson 1104gRPC Timeout Propagation
Adjust TTL values
(keep hot data longer)
Lesson 129Cache Hit Ratio Optimization
Adjusts frequency caps
dynamically (reduce from 5/day to 2/day if engagement drops)
Lesson 1729Analytics-Driven Optimization
Advance notice requirements
SLAs typically specify how far ahead you must announce maintenance (e.
Lesson 1328Scheduled Maintenance and Availability Accounting
Advanced features
Proxies can provide connection pooling, query caching, lag-aware routing, and even query rewriting.
Lesson 222Proxy-Based Read-Write Splitting
Advantage
Strong consistency — if the primary fails immediately after a write, the replica has the exact same data.
Lesson 203Synchronous Replication Explained
Lesson 1534Rate Limiting for URL Creation
Advantages of summaries
Lesson 1186Summary Metrics
Affected scope
Which service instances, regions, or user segments
Lesson 1293Alert Context and Enrichment
Age/freshness matters most
→ TTL
Lesson 153Choosing an Eviction Policy
Agent overhead
consuming precious CPU and memory on production hosts
Lesson 1252Sampling Strategies Overview
Agent resource usage
Tracing agents running on each host consume CPU and memory to process, buffer, and forward spans.
Lesson 1259Network and Agent Overhead
Aggregate at the edge
Group similar values.
Lesson 1210Cardinality Management
Aggregate metrics
Count events by type, group by user, calculate percentiles automatically
Lesson 1137What is Structured Logging
Aggregate related events
Instead of alerting on every individual timeout, alert when timeout rate exceeds 5% over 5 minutes.
Lesson 1171Log Review and Alert Fatigue
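
The "5% over 5 minutes" rule can be sketched as a sliding window over request outcomes. The class below is a hypothetical in-memory illustration with caller-supplied timestamps in seconds, not the behavior of any particular alerting product.

```python
from collections import deque

class TimeoutRateAlert:
    """Fires when the timeout rate in the recent window exceeds a threshold."""

    def __init__(self, threshold: float = 0.05, window_seconds: int = 300):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, timed_out) pairs, oldest first

    def record(self, timestamp: float, timed_out: bool) -> None:
        self.events.append((timestamp, timed_out))
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            self.events.popleft()

    def should_alert(self, now: float) -> bool:
        self._evict(now)
        if not self.events:
            return False
        timeouts = sum(1 for _, timed_out in self.events if timed_out)
        return timeouts / len(self.events) > self.threshold
```

A single timeout never pages anyone; only a sustained rate above the threshold does.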
Aggregation accuracy
Test that roll-ups, percentiles, and rate calculations produce correct results.
Lesson 1218Testing Metric Pipelines
Aggregation and Roll-Ups
(lesson 1179) and **Metrics Federation and Long-Term Storage** (lesson 1206).
Lesson 1270Monitoring Resolution and Retention Tradeoffs
Aggregation overhead
Computing sums, averages, or counts across large datasets requires reading many documents into memory.
Lesson 408Query Performance Limitations
Aggregation structures
that count documents matching each facet value
Lesson 1775Faceted Search and Filters
Aggregation tables
(also called **summary tables** or **rollup tables**) store pre-computed metrics and summaries.
Lesson 294Aggregation Tables
Lesson 297Denormalization in Practice
Aggregations
Compute running totals, averages, or counts over time windows.
Lesson 722Kafka Streams API
Aggressive caching
Use local caches more heavily for hot tenants to reduce Redis load
Lesson 1823Hot Tenant Problem
Lesson 1875HTTP Methods: GET, POST, PUT, DELETE Semantics
agreement
, **validity** (the agreed value was actually proposed by someone), and **termination** (the decision completes eventually)—all while nodes crash and networks partition.
Lesson 599What Is Distributed Consensus?
Lesson 608The Problem Paxos Solves
Alert enrichment
solves this by packaging critical context *inside* the alert itself.
Lesson 1293Alert Context and Enrichment
Lesson 1295Testing Alerts and Dry Runs
alert fatigue
, and it's dangerous because real incidents get missed in the noise.
Lesson 1171Log Review and Alert Fatigue
Lesson 1287Actionability: Every Alert Needs a Runbook
Alert firing
Does the condition correctly trigger the alert?
Lesson 1295Testing Alerts and Dry Runs
Alert on symptoms
to know *when* to respond.
Lesson 1286Symptoms vs Causes
Alert quality matters
Actionable, low-noise alerts (remember alert fatigue?
Lesson 1297On-Call Fundamentals and Rotation Models
Alerting logic
Trigger threshold violations deliberately and confirm alerts fire within expected timeframes.
Lesson 1218Testing Metric Pipelines
Alertmanager
(separate component): Handles alert routing and notifications
Lesson 1198Prometheus Architecture and Data Model
Alerts
are urgent, actionable signals that require immediate human intervention to prevent or mitigate user impact.
Lesson 1285Alert vs Notification
Alerts become actionable
"Users can't log in" is clearer than "Database CPU at 87%"
Lesson 1313Monitoring and Observability for SRE
Algorithm type
(token bucket, sliding window, etc.
Lesson 1819Per-Tenant Configuration Storage
all
replicas fail, you can fall back to reading from the primary database (accepting the increased load) or return degraded service errors, depending on your requirements.
Lesson 227Handling Replica Failures in Routing
Lesson 425Tunable Consistency Levels
Lesson 877The API Gateway Bottleneck Risk
Lesson 1321Redundancy and Parallel Availability
all relevant shards
(or sometimes all shards if you can't determine which ones hold relevant data).
Lesson 255Scatter-Gather Pattern
Lesson 1780Distributed Query Coordination
Allow N test requests
through (e.
Lesson 1060Half-Open State Testing
Allowed
`GET user:12345` retrieves a user by their ID
Lesson 342No Secondary Indexes or Query Language
Allowlisting
Define which fields are safe to log; drop everything else by default.
Lesson 1145Sensitive Data in Structured Logs
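
A small sketch of drop-by-default logging. The field names in `SAFE_FIELDS` and the helper `to_log_line` are hypothetical; the point is that a field is emitted only if it was explicitly listed.

```python
import json

# Hypothetical allowlist: only these keys may ever reach the log output.
SAFE_FIELDS = frozenset({"event", "user_id", "status", "duration_ms"})

def to_log_line(record: dict) -> str:
    """Serialize a record, silently dropping any field not on the allowlist."""
    safe = {key: value for key, value in record.items() if key in SAFE_FIELDS}
    return json.dumps(safe, sort_keys=True)
```

Because unknown keys are dropped rather than masked, a newly added `password` or `card_number` field stays out of the logs until someone deliberately allows it.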
Allows recovery
(periodically tests if service recovered)
Lesson 1044The Electrical Analogy
Alphabet reduction
(map rare characters to shared slots)
Lesson 1759Trie Space Optimization Techniques
Also Layer 4
can operate at the transport layer for pure TCP/UDP load balancing
Lesson 111NGINX as a Load Balancer
Alternative Service Paths
Route to a backup service or simplified version.
Lesson 1061Fallback Strategies
Always available
DNS servers respond to queries even if they haven't received the latest updates yet
Lesson 500DNS Systems (AP)
Always include units
Never make engineers guess.
Lesson 1182Metric Naming Conventions
Always return 429
, not 503 or 500
Lesson 960Rate Limit Response Codes
Always use HTTPS
Tokens in transit over HTTP are trivial to intercept
Lesson 931OAuth2 Security Best Practices
Always-On Protection
Unlike scrambling to respond after an attack begins, CDN-based protection is continuously active at the edge, closest to attack sources.
Lesson 195CDN for DDoS Protection
Always-writable
Writes succeed even during failures
Lesson 375Sloppy Quorum and Hinted Handoff
Amazon SES
, or **Mailgun** that handle the heavy lifting:
Lesson 1686Email Notifications
Amazon SQS
or **Apache Kafka** distribute file processing tasks across worker pools, ensuring reliable, scalable, and fault-tolerant job handling for image and video operations.
Lesson 1604Message Queue for Processing Jobs
Amazon SQS/SNS
are managed cloud services from AWS.
Lesson 665Overview of Message Broker Landscape
Analytics and metrics
– view counts or "likes" don't need instant global accuracy
Lesson 318When to Choose ACID or BASE
Analytics dashboards
don't need real-time precision
Lesson 317ACID vs BASE Tradeoffs
Analytics events
Recording the same action multiple times skews metrics
Lesson 1001Side Effects and Idempotency
Analytics payloads
If logging metadata, increase per-request size accordingly
Lesson 1499Bandwidth Requirements for Redirects
Analytics pipeline
– writes the event to a data warehouse for reporting
Lesson 1725Analytics Pipeline Architecture
Analyze access patterns
to cache the right data
Lesson 129Cache Hit Ratio Optimization
Analyze real traffic patterns
from production (not just synthetic tests)
Lesson 40Measure Before Optimizing
Android (FCM)
Uses API keys for authentication.
Lesson 1684Push Notifications: Mobile and Web
Answer
Yes, because Bob → member_of → Team Y → can_view → Document X
Lesson 938Relationship-Based Access Control (ReBAC)
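
The chain in that answer can be checked by walking relationship tuples. The sketch below hard-codes the two relationships from the example and follows `member_of` edges breadth-first; the tuple layout and the function name `can_view` are assumptions for illustration, not a specific ReBAC product's API.

```python
from collections import deque

# (subject, relation, object) tuples taken from the example above.
TUPLES = {
    ("Bob", "member_of", "Team Y"),
    ("Team Y", "can_view", "Document X"),
}

def can_view(user: str, document: str, tuples=TUPLES) -> bool:
    """True if the user, or any group reached via member_of, can view the document."""
    seen, queue = {user}, deque([user])
    while queue:
        subject = queue.popleft()
        if (subject, "can_view", document) in tuples:
            return True
        for s, relation, obj in tuples:
            if s == subject and relation == "member_of" and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return False
```

The `seen` set keeps the walk from looping if groups are nested in a cycle.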
Anti-entropy
is a continuous background reconciliation process where replicas compare their data and synchronize differences.
Lesson 369Anti-Entropy and Merkle Trees
Anti-pattern
Drawing service boundaries on a whiteboard without considering your org chart.
Lesson 819Team Structure and Conway's Law
Anycast
is a network addressing method where multiple edge servers share the *same IP address* across different locations worldwide.
Lesson 181Anycast Routing for CDNs
Lesson 1616Geographic Routing and DNS
AOF (Append-Only File)
works like a detailed diary.
Lesson 351Redis Persistence: AOF Logs
AOF rewriting
compacts the log by creating a minimal command set that produces the same final state.
Lesson 351Redis Persistence: AOF Logs
AP choice
in the CAP theorem framework.
Lesson 531What is Eventual Consistency?
AP response
Keep taking orders in both cities, reconcile conflicts later (preserve availability)
Lesson 505The Partition Question: When, Not If
AP system
(prioritizing availability) might allow both customers to complete their purchase during a network partition, discovering the problem only when you try to fulfill orders.
Lesson 499Inventory Management (CP)
Lesson 500DNS Systems (AP)
Lesson 502Mixed Strategies: Hybrid Systems
AP systems
(like Cassandra with eventual consistency): Always respond, accepting temporary inconsistencies
Lesson 481What CAP Theorem States
Lesson 494AP Systems: Prioritizing Availability
Lesson 512Social Media: Availability Over Consistency
Apache Cassandra
adopted Dynamo's core architecture almost wholesale: consistent hashing with virtual nodes, tunable consistency via quorum reads/writes, and gossip-based membership.
Lesson 378Dynamo's Influence on Modern Systems
Lesson 554Consistency Model Examples in Real Systems
Apache Flink
is purpose-built for stream processing with true event-time processing and stateful computations.
Lesson 744Stream Processing Frameworks
Lesson 756Hybrid and Modern Alternatives
Apache Kafka
is designed for high-throughput event streaming and log-based messaging.
Lesson 665Overview of Message Broker Landscape
Lesson 1604Message Queue for Processing Jobs
Apache ZooKeeper
coordinates distributed systems with linearizable operations.
Lesson 530Strong Consistency in Practice
API access delegation
Users grant your analytics dashboard read-only access to their Stripe data.
Lesson 920OAuth2 Fundamentals and Use Cases
API composition needs
where clients require aggregated data from multiple services
Lesson 879When to Introduce an API Gateway
API contract
defines the structured conversation between the client and rate limiter.
Lesson 1786API Contract: Request and Response Format
API Gateway
is a server that acts as a centralized front door for all client requests before they reach your backend services.
Lesson 870What is an API Gateway?
Lesson 872API Gateway vs Reverse Proxy
Lesson 1132Correlation IDs and Request Tracing
Lesson 1585Upload Flow Architecture Overview
API Gateway + Observability
Combine a robust API gateway with distributed tracing (via OpenTelemetry) and centralized logging.
Lesson 869Alternatives to Full Service Mesh
API Gateway Cache
Frequently-requested queries (top 5%) cached before reaching your application servers.
Lesson 1771Query Caching Strategies
API Gateway/Load Balancer
Token bucket or sliding window algorithms reject excess requests before they hit your servers
Lesson 1596Upload Rate Limiting and Quotas
API gateways
and **reverse proxies** sit in front of backend services and route incoming requests, but they serve different architectural needs.
Lesson 872API Gateway vs Reverse Proxy
Lesson 1239Root Span and Entry Points
API Keys
are the simplest approach.
Lesson 1818Tenant Identification and Context
API management platform
beyond basic gateway features.
Lesson 898Apigee and Enterprise API Management
API monetization
you can create pricing tiers (free, basic, premium), enforce usage quotas, generate invoices, and integrate with payment systems.
Lesson 898Apigee and Enterprise API Management
API version
evolves independently from your **SDK version**.
Lesson 1909Client SDK Versioning and Distribution
Apigee
(now Google Cloud Apigee) positions itself as a full **API management platform** beyond basic gateway features.
Lesson 898Apigee and Enterprise API Management
App Engine
(standard and flexible)
Lesson 1244Google Cloud Trace
App servers
handle business logic: validating input, generating short keys, and coordinating between storage layers.
Lesson 1552Initial Architecture Diagram
Append a salt
Add a random string to the original URL before hashing again: `hash(original_url + random_salt)`.
Lesson 1509Handling Hash Collisions
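
A sketch of the salt-and-rehash step, assuming SHA-256 truncated to a fixed number of hex characters and a dictionary standing in for the key store. The key length, the attempt limit, and the function name `short_key` are illustrative assumptions; a real service would likely use a different encoding and a database uniqueness check.

```python
import hashlib
import secrets

def short_key(original_url: str, existing: dict[str, str],
              length: int = 7, max_attempts: int = 5) -> str:
    """Derive a short key, appending a random salt whenever the key is taken."""
    candidate_input = original_url
    for _ in range(max_attempts):
        digest = hashlib.sha256(candidate_input.encode("utf-8")).hexdigest()
        key = digest[:length]
        # Free slot, or the same URL was shortened before: reuse the key.
        if existing.get(key) in (None, original_url):
            existing[key] = original_url
            return key
        # Collision with a different URL: hash(original_url + random_salt).
        candidate_input = original_url + secrets.token_hex(4)
    raise RuntimeError("no free key found within max_attempts")
```

The attempt limit turns a pathological run of collisions into an explicit error instead of an endless loop.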
Append-Only Logs
Streams never update or delete events—they only append new ones.
Lesson 692Streams vs Traditional Databases
Lesson 1737Index Building and Updates
Append-only workloads
play to the strengths of LSM trees (the write path we covered earlier).
Lesson 418Time-Series and Time-Ordered Data
AppendEntries RPC
the fundamental replication mechanism that the leader sends periodically to all followers.
Lesson 624AppendEntries RPC: Replication Mechanism
Lesson 635Consul: Service Discovery with Raft Consensus
Application Cache Layer
Between your API gateway and trie cluster, maintain an in-memory cache (Redis, Memcached) of the hottest queries.
Lesson 1766Caching Suggestions at Multiple Layers
Application code is simpler
less defensive validation needed
Lesson 301Schema Enforcement and Type Safety
Application controls everything
Your code explicitly manages cache reads and writes
Lesson 131Cache-Aside (Lazy Loading) Pattern
Application Gateway
Layer 7 with WAF (Web Application Firewall) capabilities
Lesson 114Cloud Load Balancers (GCP and Azure)
Application impact
Heavy instrumentation slows down services
Lesson 1228Trace Sampling Fundamentals
Application Insights
for built-in telemetry and distributed tracing
Lesson 899Azure API Management Features
Application latency
Serializing span data and sending it synchronously blocks request threads.
Lesson 1259Network and Agent Overhead
application layer
your code explicitly routes the query, not some middleware or proxy.
Lesson 221Application-Level Connection Management
Lesson 1596Upload Rate Limiting and Quotas
Application logic
Are worker threads responsive?
Lesson 101Health Check Endpoints
Application logic burden
Your app needs special code to handle these moves
Lesson 263Shard Key Immutability Problem
Application reads data
→ Load balancer distributes read requests across Replicas
Lesson 199Primary-Replica Architecture
Application receives a request
with a key (e.
Lesson 242Directory-Based Sharding
Application rewrites
if query patterns differ significantly
Lesson 328Migration and Legacy System Constraints
Application writes data
→ Goes to the Primary database only
Lesson 199Primary-Replica Architecture
Application-Layer Validation Burden
Lesson 407Schema Flexibility Trade-offs
Application-level
Each microservice maintains its own connection pool configuration
Lesson 1071Connection Pool Bulkheads: Database and Service Isolation
Application-Level Connection Management
(lesson 221), your application code had to decide whether each query should go to the primary or a replica.
Lesson 222Proxy-Based Read-Write Splitting
Application-level coordination
Handle multi-step operations in application code with compensation logic
Lesson 261Distributed Transactions Across Shards
Application-Level Retry Logic
If a read-after-write fails (you write to primary, then read stale data from replica), your code detects this mismatch and retries the read against the primary.
Lesson 219Application-Level Consistency Patterns
Apply filters before sorting
Filters reduce the dataset; sorting operates on that reduced set.
Lesson 1896Combining Pagination, Filtering, and Sorting
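
The ordering can be shown in a few lines: filter first, sort the reduced set, then slice out the page. The function name `query` and its parameters are hypothetical and only illustrate the sequence of steps.

```python
def query(items, predicate, sort_key, offset=0, limit=10):
    """Filter, then sort the reduced set, then paginate."""
    filtered = [item for item in items if predicate(item)]  # shrink the dataset
    ordered = sorted(filtered, key=sort_key)                # sort only what is left
    return ordered[offset:offset + limit]                   # slice out one page

products = [
    {"name": "desk", "price": 300, "in_stock": True},
    {"name": "lamp", "price": 40, "in_stock": False},
    {"name": "chair", "price": 120, "in_stock": True},
]
page = query(products, lambda p: p["in_stock"], lambda p: p["price"], limit=2)
```

Sorting after filtering means the sort cost scales with the number of matching rows rather than the full dataset.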
Apply limits/pagination
after combining
Lesson 254Cross-Shard Queries and Joins
Appropriate log levels
Use `debug` for verbose loops, not `info` or `error`
Lesson 1167Avoid Log Explosion
Approximate counting
Use probabilistic data structures (HyperLogLog) when exact counts aren't critical
Lesson 977Algorithm Implementation Patterns
Lesson 1785Non-Functional Requirements: Accuracy vs Performance
Approximate limits
The actual global limit may be exceeded (up to N×local_limit in worst case)
Lesson 979Centralized vs Decentralized Approaches
Approximate limits acceptable
Use **Fixed Window Counter** (suffers from boundary issues but very efficient)
Lesson 975Algorithm Selection Criteria
Approximation
Apply less precise algorithms (gossip-based) for tenants exceeding thresholds
Lesson 1823Hot Tenant Problem
Architecture
The big-picture structure (like deciding if your city is a grid or has winding streets)
Lesson 1What Is System Design?
Archival becomes trivial
move `logs_2023_06` to cold storage without touching active shards.
Lesson 249Time-Based Sharding
Archiving analytics
before deletion (preserves click history)
Lesson 1532Expiration and Time-to-Live
Array-based representations
replace pointer-heavy node structures with packed arrays, improving cache locality.
Lesson 1776Typeahead Index Optimization
As you mature
You discover that "success rate" hides important nuances.
Lesson 1284Iterating on SLIs and SLOs
Ask recursive resolver
(ISP or public DNS like 8.
Lesson 1856DNS Resolution Fundamentals for Crawlers
Assign to healthy workers
The frontier redistributes them to active workers in the next pull cycle
Lesson 1866Worker Health Monitoring and Failover
Async
Queue thumbnail generation, image optimization, backup to storage — user doesn't wait
Lesson 654When to Use Async vs Sync
Async flush to disk
The OS eventually flushes data from page cache to physical disk in the background
Lesson 713Kafka's Write Path and Durability
Async Processing
For non-critical operations, accept the request immediately and process it asynchronously.
Lesson 1042Idempotency vs Performance Tradeoffs
Async reconciliation
Regions track locally, sync periodically (every 5-10 seconds) to redistribute quotas.
Lesson 987Multi-Region Rate Limiting Challenges
Asynchronous checking
Don't slow down legitimate URL creation—queue suspicious URLs for deeper analysis
Lesson 1540Spam and Malicious Link Detection
Asynchronous logging
Use buffered, non-blocking loggers that write in background threads, preventing I/O from blocking request handling.
Lesson 1133Logging Performance Impact
Lesson 1134Synchronous vs Asynchronous Logging
Lesson 1143Performance Impact of Structured Logging
Asynchronous processing
means you immediately acknowledge the upload, store the raw file, and queue processing tasks (like transcoding, thumbnail generation) for background workers to handle later.
Lesson 1598Synchronous vs Asynchronous Processing
Lesson 1698Message Queue for Decoupling
asynchronous replication
, when your application writes data to the primary database, the primary acknowledges the write as "successful" immediately—without waiting for replica databases to confirm they've copied the data.
Lesson 204Asynchronous Replication Explained
Lesson 205Semi-Synchronous Replication
Lesson 217Semi-Synchronous Replication Trade-offs
Lesson 1354Synchronous vs Asynchronous Replication
Lesson 1356Asynchronous Replication: Speed and Risk
Lesson 1364Choosing a Replication Mode
Asynchronous transmission
Never block request threads waiting for trace data to be sent.
Lesson 1259Network and Agent Overhead
At creation time
, store an `expires_at` timestamp alongside each URL.
Lesson 1532Expiration and Time-to-Live
at least one
replica to confirm the write before declaring success to the client.
Lesson 217Semi-Synchronous Replication Trade-offs
Lesson 1357Semi-Synchronous Replication
At Read Time
When a user requests their feed, fetch *both*:
Lesson 1655Celebrity Follower Caching
At rest
means data sitting in storage—on disk, tape, or cloud object storage.
Lesson 1409Backup Encryption and Security
At-least-once with durability
RabbitMQ, SQS, Azure Service Bus
Lesson 676Choosing Between Message Broker Technologies
at-most-once
delivery semantics: a message is delivered either zero or one time, never more.
Lesson 673NATS and Lightweight Messaging
Lesson 689Choosing Delivery Semantics
Lesson 710Offsets and Commit Strategies
At-most-once (fast, lossy)
Redis pub/sub, basic NATS
Lesson 676Choosing Between Message Broker Technologies
Atomic broadcast
Updates either succeed everywhere or nowhere
Lesson 633ZooKeeper: Coordination Service Built on Consensus
Atomic increment
Use `INCR` to increment the counter for that key
Lesson 1794Redis-Based Rate Limiting with INCR
Atomic state update
Mark complete in same transaction as the work
Lesson 1037Idempotency in Distributed Workflows
Atomic updates
new version appears instantly, no race conditions
Lesson 165Versioned Cache Keys
Atomic Write
The snapshot is written to a temporary file, then atomically renamed to replace the old snapshot
Lesson 350Redis Persistence: RDB Snapshots
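
The write-to-temp-then-rename pattern is general and easy to sketch. The code below is not Redis's implementation; it is a Python illustration of the same idea using `os.replace`, with the function name `atomic_write` chosen for this example.

```python
import os
import tempfile

def atomic_write(path: str, data: bytes) -> None:
    """Write data to a temp file, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())   # make sure bytes are on disk before the swap
        os.replace(tmp_path, path)   # readers see the old file or the new one
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)      # do not leave partial temp files behind
        raise
```

If the process crashes mid-write, the previous snapshot is untouched because only the temporary file was being modified.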
Attack mitigation
Slows down brute-force attacks, DDoS attempts, and web scrapers
Lesson 955What is Rate Limiting?
Attribute-based
Filter on message metadata (e.
Lesson 658Topic Subscriptions and Filtering
Audit trail
Compliance record of what couldn't be delivered
Lesson 1705Retry and Dead Letter Queues
Audit trails
Logs provide evidence of who did what and when — critical for security and compliance.
Lesson 1127What is Logging and Why It Matters
Lesson 1807In-Memory vs Persistent Storage for Rate Limiting
Auditing
Easy to see "who has admin access?
Lesson 933Role-Based Access Control (RBAC) Fundamentals
Auditing requires precision
regulatory compliance systems
Lesson 518PC/EC Systems: Consistency Always
Authenticate
the provider using signatures or tokens to prevent spoofing
Lesson 1693Delivery Receipt Tracking
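
Signature checks on webhooks are commonly done with an HMAC over the raw request body. The sketch below assumes a shared secret and a hex-encoded HMAC-SHA256 signature; real providers differ in header names, encodings, and signed payload format, so treat these details as placeholders.

```python
import hashlib
import hmac

def is_authentic(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Recompute the HMAC of the raw body and compare in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode("ascii"),
                               signature_hex.encode("utf-8"))
```

The check must run against the exact bytes received, before any JSON parsing or re-serialization changes the payload.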
Authenticated Requests
Client includes the token in request headers (usually `Authorization: Bearer <token>`)
Lesson 912Token-Based Authentication Fundamentals
Authentication & Authorization
Instead of every service validating JWT tokens or checking API keys, the gateway handles it once.
Lesson 876API Gateway as a Cross-Cutting Concern Hub
Authentication and authorization
before forwarding requests
Lesson 870What is an API Gateway?
Authentication Layer
Implement OAuth2 or JWT-based authentication to verify user identity.
Lesson 1578User Accounts and Paste Management
Author relationship
Posts from close friends vs distant connections
Lesson 1665Feed Ranking Fundamentals
Lesson 1666Ranking Signals and Features
Authorization request
Send `code_challenge` and `code_challenge_method=S256` with the auth request
Lesson 923PKCE: Proof Key for Code Exchange
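
For the `S256` method, the `code_challenge` is the base64url-encoded SHA-256 hash of the `code_verifier`, with padding removed. The helper names below are chosen for this sketch; the transformation itself follows the PKCE specification (RFC 7636).

```python
import base64
import hashlib
import secrets

def make_code_verifier() -> str:
    """Random URL-safe verifier; 32 random bytes yield 43 characters."""
    return secrets.token_urlsafe(32)

def make_code_challenge(verifier: str) -> str:
    """S256: BASE64URL(SHA256(verifier)) without '=' padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The client keeps the verifier private, sends only the challenge with the authorization request, and reveals the verifier later when exchanging the code for tokens.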
Auto-ack
Message is acknowledged automatically upon delivery (risky—enables at-most-once delivery)
Lesson 681Acknowledgment Mechanisms
Auto-ack (pre-processing)
The broker considers the message delivered as soon as it sends it to the consumer
Lesson 683Consumer Acknowledgment Timing
Auto-commit
Kafka automatically saves your progress at regular intervals (e.
Lesson 710Offsets and Commit Strategies
Auto-generate interactive docs
(Swagger UI) where developers can make real API calls
Lesson 1885API Documentation with OpenAPI/Swagger
Auto-incrementing IDs
User IDs like `user_10001`, `user_10002`, `user_10003`.
Lesson 1474Hotspot Problems in Range Partitioning
Lesson 1515Short URL Predictability Tradeoffs
Auto-scaling integration
Automatically adjusts to your EC2 Auto Scaling groups
Lesson 113Cloud Load Balancers (AWS ELB/ALB)
Auto-scaling policies
take this further: based on metrics like CPU usage (>70%), request rate, or response latency, the infrastructure automatically provisions new redirect servers from a template.
Lesson 1536Horizontal Scaling of Redirect Servers
Automatable
could be scripted or tooled
Lesson 1311Toil: The Enemy of Scale
Automate integrity checks
immediately after each backup completes
Lesson 1430Backup Verification and Testing
Automate toil
repetitive manual work that doesn't provide lasting value
Lesson 1307What is Site Reliability Engineering (SRE)?
Automate verification
Run checksum validation after every backup job
Lesson 1408Backup Verification and Testing
Automated extraction
from metric systems using their metadata APIs
Lesson 1216Metric Documentation and Discovery
Automated failover
uses health checks and monitoring to detect failures and trigger the switch without human intervention.
Lesson 1437Failover and Failback Procedures
Automated rollback strategies
that consider downstream impacts
Lesson 810Deployment Complexity
Automated runbook
one-button recovery with monitoring built-in
Lesson 1441Runbooks and Automation
Automated testing gates
unit tests, integration tests, and end-to-end tests must pass before code proceeds
Lesson 1314Release Engineering and Safe Deployment
Automatic cleanup
If the lock holder crashes, its session expires and the lock is automatically released
Lesson 637Distributed Locks via Consensus
Automatic expiration
Use Time-to-Live (TTL) to automatically delete inactive sessions.
Lesson 356Redis as a Session Store
Automatic instrumentation
means frameworks detect and wrap common libraries (HTTP clients, database drivers, message queues) to create spans without code changes.
Lesson 1224Automatic vs Manual Instrumentation
Lesson 1240OpenTelemetry Overview
Lesson 1244Google Cloud Trace
Automatic load balancing
The broker distributes messages across available consumers
Lesson 661Competing Consumers Pattern
Automatic protection
during unexpected load or degradation
Lesson 972Adaptive Rate Limiting
Automatic re-election
when the current leader fails
Lesson 636Consensus for Leader Election
Automatic replica distribution
No central coordinator needed to decide where replicas live
Lesson 1466Replication with Consistent Hashing
Automatic retries
with exponential backoff
Lesson 777Workflow Orchestration Patterns
Automatic retry logic
for temporary failures
Lesson 1686Email Notifications
Automatic Sharding
Unlike the manual sharding we studied earlier, CockroachDB automatically distributes your data across nodes using range-based sharding.
Lesson 334CockroachDB and Distributed SQL
Automatic Update
DNS responses now return your DR site's IP address instead
Lesson 1440DNS and Traffic Management in DR
Automation-friendly
Configuration as code, CI/CD integration
Lesson 108Hardware vs Software Load Balancers
Autonomy
Frontend teams deploy client and BFF together, moving at their own pace
Lesson 906BFF Ownership and Team Structure
Availability Goals
Social feeds are often the primary engagement driver.
Lesson 1633Non-Functional Requirements: Scale and Performance
Availability improves
If one server goes down, others can still serve requests
Lesson 68What is Data Replication?
Availability isn't all-or-nothing
CAP treats availability as "every request gets a response," but real systems degrade gracefully.
Lesson 492Limitations of CAP as a Framework
Availability loss during partitions
System becomes unavailable in affected regions
Lesson 526The Cost of Strong Consistency
Availability SLO
tells you if your service responds at all—but not *how well*.
Lesson 1278Multiple SLOs for Comprehensive Coverage
Average payload size
(bytes per request)
Lesson 26Bandwidth Estimation from Data Size
Average record size
500 bytes per post (text + metadata)
Lesson 29Database Size Growth Projection
Average Size
How big is each item?
Lesson 25Storage Estimation Basics
AVIF
instead of JPEG (30–50% smaller)
Lesson 1621Compression and Format Optimization
Avoid Sequential Hotspots
Auto-incrementing IDs or timestamps cause all new writes to hit the latest partition.
Lesson 1472Range Partition Key Selection
Avoid user-specific identifiers
as span tags—use them as searchable attributes only when essential
Lesson 1258Cardinality Explosion
Avoiding false positives
A momentary network hiccup shouldn't mark a healthy server as dead
Lesson 100Health Check Intervals and Timeouts
Avoiding hotspots
If sharding by `timestamp` alone, recent data gets hammered.
Lesson 245Composite Shard Keys
AWS App Mesh
is Amazon's managed service mesh that works seamlessly with AWS services like ECS, EKS, EC2, and Fargate.
Lesson 864AWS App Mesh and Cloud-Native Meshes
AWS Integration
| Native (Lambda, S3, etc.
Lesson 728AWS Kinesis Overview
AWS SNS → SQS
, **Google Pub/Sub with subscriptions**, and **Kafka consumer groups**.
Lesson 663Hybrid Patterns: Topic + Queue
AWS X-Ray
, **Google Cloud Trace**, and commercial offerings like Datadog APM provide:
Lesson 1251Choosing a Tracing System
Azure Active Directory
for OAuth2/OpenID Connect authentication
Lesson 899Azure API Management Features
Azure Cosmos DB
offers bounded staleness as a consistency level, popular for scenarios needing "fresh enough" data
Lesson 549Bounded Staleness
Lesson 554Consistency Model Examples in Real Systems
Azure Front Door
Global Layer 7 with CDN integration
Lesson 114Cloud Load Balancers (GCP and Azure)
Azure Functions
as backends for serverless APIs
Lesson 899Azure API Management Features
Azure Load Balancer
Layer 4 only, regional or cross-zone
Lesson 114Cloud Load Balancers (GCP and Azure)
Azure Monitor
for unified logging and alerting
Lesson 899Azure API Management Features

B

B-tree indexes
(balanced trees) are the most common type.
Lesson 307Indexes and Query Optimization
Backend for Frontend (BFF)
, dramatically reduces client-side complexity and network round trips.
Lesson 873Request Routing and Aggregation
Backend Independence
Services can change without forcing client updates
Lesson 887API Composition and Aggregation
Backend policies
modify how APIM calls your services
Lesson 899Azure API Management Features
Backend service changes
frequently impacting multiple clients
Lesson 879When to Introduce an API Gateway
Backend services are protected
from unauthenticated requests entirely
Lesson 883Authentication at the Gateway
Backfilling
is reprocessing historical data after fixing bugs or adding new logic.
Lesson 777Workflow Orchestration Patterns
Background task APIs
that survive app suspension
Lesson 1618Optimizing for Mobile Networks
Background users
(app open but not viewing): trigger in-app badge updates
Lesson 1681Mobile Push Notification Integration
Backoff intervals
Exponential delays between retries (1s, 2s, 4s.
Lesson 684Negative Acknowledgments and Redelivery
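The doubling schedule in this entry can be sketched as a small helper. The name `backoff_delays` and the `cap_seconds` ceiling are assumptions added for illustration; real brokers expose their own redelivery settings.

```python
def backoff_delays(max_retries: int,
                   base_seconds: float = 1.0,
                   cap_seconds: float = 60.0) -> list[float]:
    """Return the wait before each retry: base, 2x base, 4x base, ... up to a cap."""
    return [min(cap_seconds, base_seconds * (2 ** attempt))
            for attempt in range(max_retries)]
```

Capping the delay is a common choice so that a long retry chain does not wait unboundedly; adding random jitter on top is another frequent refinement that this sketch leaves out.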
Backpressure control
Consumers naturally regulate their consumption speed
Lesson 697Push vs Pull Consumption Models
Backup 1
Nightly full backups on network-attached storage (NAS)
Lesson 1407The 3-2-1 Backup Rule
Backup 2
Weekly snapshots uploaded to cloud object storage (different region)
Lesson 1407The 3-2-1 Backup Rule
Backup and disaster recovery
Replicas serve as live backups
Lesson 198What is Database Replication?
Backup challenges
Each shard needs its own backup strategy.
Lesson 264Operational Complexity of Sharded Systems
Backup monitoring
tracks the health of your backup processes continuously, while **alerting** notifies teams immediately when something goes wrong.
Lesson 1410Backup Monitoring and Alerting
Backup storage tiers
match different storage technologies to different access patterns, balancing speed and cost.
Lesson 1405Backup Storage Tiers
Lesson 1429Geographic Backup Distribution
Backup window time
Copying 100 MB of changes beats copying 10 TB every night
Lesson 1403Incremental Backups
Backups
are point-in-time copies of data designed for **disaster recovery**.
Lesson 1401Backup vs Replication vs Snapshots
backward compatibility
means new versions still support old clients.
Lesson 809Versioning and Backward Compatibility
Lesson 1898Why API Versioning Matters
Backward compatible
New consumers can read old messages (add optional fields)
Lesson 725Schema Registry and Evolution
Backward recovery
Use compensating transactions to undo completed steps
Lesson 585Alternative: Saga Pattern Introduction
Backward-compatible changes
APIs must evolve without breaking existing consumers
Lesson 791Independent Deployability
Backward-compatible versioning
to avoid breaking other services
Lesson 808Team Coordination Overhead
Backwards compatible
as your API evolves
Lesson 930OAuth2 Scopes and Consent
Bad (not idempotent)
`balance = balance + 100` (running twice adds $200!
Lesson 679At-Least-Once Delivery
Bad shard key
`created_date` — recent dates create hotspots as all new writes go to one shard
Lesson 232Shard Key Selection
BadgerDB/LevelDB
Embedded databases for small-scale or single-node deployments
Lesson 1245Trace Storage Backends
Balance needed
Configure connection timeout, idle timeout, and max lifetime appropriately for your workload patterns.
Lesson 275Common Pooling Anti-Patterns
Balanced approach
`N=3, R=2, W=2`
Lesson 558N, R, W Configuration Trade-offs
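One way to see why `N=3, R=2, W=2` counts as balanced is the usual quorum overlap rule: if `R + W > N`, every read set shares at least one replica with every write set. The helper below is a minimal sketch of that check; the function name is hypothetical.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when any R replicas read must include at least one of the W replicas written."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n
```

With `N=3, R=2, W=2` the check passes, and either operation still succeeds with one replica down. With `R=1, W=1` it fails, so a read may miss the latest write.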
Bandwidth ceiling
Even a high-speed connection (1 Gbps) can only download so much per second.
Lesson 1862Why Distribute a Web Crawler
Bandwidth cost savings
Once cached at the edge, the same file serves millions of users without touching your origin.
Lesson 1609Why CDNs Are Essential for Media Hosting
bandwidth costs
when transferring pastes between regions or to CDN edge locations.
Lesson 1562Content Compression and Encoding
Lesson 1621Compression and Format Optimization
Bandwidth optimization
Regional peering agreements reduce transit costs
Lesson 1616Geographic Routing and DNS
Bandwidth vs Volume
Sending full, unsampled logs gives you complete visibility but can saturate network links.
Lesson 1159Log Aggregation Performance Considerations
Banking and payments
– transferring money requires exact balances, no partial updates
Lesson 318When to Choose ACID or BASE
Bare minimum
4 servers (risky!
Lesson 1333N+1 and N+2 Redundancy
BASE
offers a more relaxed approach designed for systems that prioritize availability and partition tolerance over immediate consistency.
Lesson 314BASE Properties Overview
Base58
Removes visually ambiguous characters (`0`/`O` and `l`/`I`), leaving 58 characters
Lesson 1500URL Length and Encoding Constraints
Base62
`[a-zA-Z0-9]` = 62 characters (alphanumeric, case-sensitive)
Lesson 1500URL Length and Encoding Constraints
Base62 Encoding of IDs
converts sequential database IDs (like auto-increment values) into short strings using alphanumeric characters (a-z, A-Z, 0-9).
Lesson 1551Key Generation Strategy
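A minimal sketch of the conversion described above, assuming the character order digits, then lowercase, then uppercase. The glossary does not fix an alphabet order, so `ALPHABET` and the function names here are illustrative choices only.

```python
import string

# Assumed ordering: 0-9, a-z, A-Z (62 symbols in total).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(number: int) -> str:
    """Encode a non-negative integer ID as a Base62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    chars = []
    while number:
        number, remainder = divmod(number, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def base62_decode(text: str) -> int:
    """Invert base62_encode."""
    number = 0
    for ch in text:
        number = number * 62 + ALPHABET.index(ch)
    return number
```

Because the input IDs are sequential, the resulting short strings are sequential too, which is worth keeping in mind if predictability of short URLs is a concern.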
Base62-encoded identifier
from the previous lesson.
Lesson 1519Database Schema for URL Shortener
Base64
Adds `+` and `/` = 64 characters (URL-unsafe without encoding)
Lesson 1500URL Length and Encoding Constraints
BashOperator
Runs shell commands
Lesson 767Airflow Operators and Executors
Basic metrics
timestamp, short URL clicked
Lesson 1530Analytics and Click Tracking
Basically Available
The system guarantees availability, even if some parts fail
Lesson 314BASE Properties Overview
Batch export job
30-second timeout (legitimately slow)
Lesson 1118Per-Operation Timeout Configuration
Batch notifications
Don't send 50 pushes for 50 new posts; aggregate: "15 new posts in your feed"
Lesson 1681Mobile Push Notification Integration
Batch operations
Update multiple counters in one Redis pipeline instead of separate calls
Lesson 977Algorithm Implementation Patterns
Batch processing maximizes throughput
by collecting data into large groups before processing.
Lesson 740Latency vs Throughput Tradeoffs
Batch queue
→ bundled digest sent every N minutes
Lesson 1677Selective Push Strategies
Batch Resolution
Workers collect batches of hostnames from their URL frontier and resolve them in parallel using async DNS libraries.
Lesson 1869Scaling DNS Resolution
Batch views
– Complete, accurate datasets processed by the batch layer
Lesson 750Lambda Architecture: Serving Layer
Batch-ack
Multiple messages acknowledged together for performance
Lesson 681Acknowledgment Mechanisms
Batching and buffering
Aggregate writes before applying them
Lesson 1483Celebrity User Problem
BC asks
"How does customer support continue serving clients while the data center is down?
Lesson 1433Disaster Recovery vs Business Continuity
Be consistent
across all endpoints
Lesson 960Rate Limit Response Codes
Bearer Token Transport
The client explicitly includes the token in the `Authorization` header (e.
Lesson 918Cookie vs Bearer Token Transport
Bearer tokens
in headers are not attached automatically by the browser; they require **explicit JavaScript handling**, which makes them immune to CSRF.
Lesson 918Cookie vs Bearer Token Transport
Before accepting a write
, a replica checks: "Have I seen all the operations this write depends on?
Lesson 548Causal Consistency Implementation
Before indexing
, check if this hash already exists
Lesson 1852Content Fingerprinting with Hashing
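The check-before-indexing step might look like the sketch below. It assumes SHA-256 as the fingerprint and an in-memory set as the store; a real crawler would more likely use a shared key-value store, and `should_index` is a hypothetical name.

```python
import hashlib

# Assumption: fingerprints fit in memory; a production system would use shared storage.
seen_hashes: set[str] = set()


def should_index(content: bytes) -> bool:
    """Return True the first time a given content hash is seen, False for exact duplicates."""
    fingerprint = hashlib.sha256(content).hexdigest()
    if fingerprint in seen_hashes:
        return False
    seen_hashes.add(fingerprint)
    return True
```

This only catches byte-identical content; near-duplicate pages need a similarity-preserving fingerprint instead of a cryptographic hash.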
Begin transaction
Mark the start of atomic operations
Lesson 310Atomicity: All-or-Nothing Transactions
Best effort
Single preferred channel
Lesson 1688Channel Selection Strategy
Better approach
Use Conway's Law intentionally.
Lesson 819Team Structure and Conway's Law
Better availability
than strong consistency during network partitions
Lesson 1397Bounded Staleness Consistency
Better balance
With 150+ virtual nodes per server, data distributes more evenly across the ring
Lesson 363Virtual Nodes and Load Distribution
Better bandwidth efficiency
Your servers send data once to the CDN, which distributes it thousands of times
Lesson 125CDN as Edge Caching Layer
Better cache hit rates
Origin Shield maintains a larger, warmer cache
Lesson 179Origin Shield: Protecting Origin Servers
Better distribution
A simple `user_id` hash might still produce hotspots if some users generate far more data than others.
Lesson 245Composite Shard Keys
Better for ephemeral/batch jobs
that finish quickly
Lesson 1197Pull vs Push Metrics Collection Models
Better geo-distribution
– Distance to replicas doesn't slow down writes
Lesson 1356Asynchronous Replication: Speed and Risk
Better monetization
You can offer tiered pricing that reflects actual infrastructure costs, not arbitrary request counts.
Lesson 992Cost-Based Rate Limiting
Better monitoring
Track saga progress through orchestrator state
Lesson 591Orchestration-Based Sagas
Better partition tolerance
The system stays operational during network splits
Lesson 560Eventual Consistency with Quorums
Better resource utilization
when systems are healthy
Lesson 972Adaptive Rate Limiting
Better scalability
No central bottleneck
Lesson 979Centralized vs Decentralized Approaches
Better storage (SSDs)
reduces disk I/O bottlenecks
Lesson 54Scaling Databases: Special Considerations
Better throughput
especially for write-heavy workloads
Lesson 136Write-Behind (Write-Back) Caching Pattern
Better tradeoff decisions
When you know the simple solution, you can justify why each added complexity is necessary
Lesson 34Start Simple: The Minimum Viable Design
Better user experience
(pages load quickly everywhere)
Lesson 53Geographic Distribution Benefits
Betweenness centrality
Measures how often a node sits on shortest paths between others—who's the essential bridge?
Lesson 468Graph Algorithms: PageRank and Centrality
BFS
for general-purpose crawlers that need fresh, diverse content across many domains (like search engines).
Lesson 1830Breadth-First vs Depth-First Crawling
BGP routing decides
Internet routers use Border Gateway Protocol (BGP) to determine the "shortest" path, measured mainly by AS-path length and routing policy rather than physical distance
Lesson 181Anycast Routing for CDNs
Bidirectional Navigation
Your observability UI should let you:
Lesson 1249Integrating Traces with Logs and Metrics
Billing
Processes payments, generates invoices
Lesson 815Domain-Driven Design and Bounded Contexts
Billing accuracy
Track exactly how many requests each tenant made to charge appropriately
Lesson 1825Monitoring and Analytics Per Tenant
Binary blobs
let you store images, serialized objects, or encrypted data.
Lesson 341Data Types and Value Complexity
Binary Data
Raw bytes for storing files, encrypted data, or binary content
Lesson 390BSON Format and Data Types
Bitrate variants
Even at the same resolution, encode at different bitrates (e.
Lesson 1601Video Transcoding Fundamentals
Blackbox Monitoring
observes your system as an external user would.
Lesson 1266Blackbox vs Whitebox Monitoring
blameless culture
means treating system failures as organizational learning moments rather than individual mistakes deserving punishment.
Lesson 1317Blameless Culture and Learning from Failure
Lesson 1350What is a Postmortem?
Blazing fast reads
feeds are pre-built, just fetch and display
Lesson 1638Push (Write-Time) Feed Model
Blazing fast writes
no database latency in the critical path
Lesson 136Write-Behind (Write-Back) Caching Pattern
Blob Store
Use distributed object storage (S3, HDFS) to hold actual HTML/content, keyed by hash
Lesson 1870Content Storage and Deduplication
Block full scans
Reject requests without filters on massive tables
Lesson 1897Performance Considerations and Limits
Block or flag
Reject malicious URLs immediately or flag suspicious ones for review
Lesson 1540Spam and Malicious Link Detection
Block producers
Slow down the application (dangerous—can freeze critical paths)
Lesson 1155Log Buffering and Backpressure
Block reserved words
Reject codes like `api`, `admin`, `delete`, `stats` that conflict with your service's routes
Lesson 1514Custom Short URL Support
Block storage
works like a traditional hard drive attached to a server.
Lesson 1588Object Storage vs Block Storage
Block storage use cases
(rare for media):
Lesson 1588Object Storage vs Block Storage
Block the producer
(defeating the purpose of async communication)
Lesson 647Message Queue Basics
Block/mute graph
(separate storage or cached lookup)
Lesson 1653Selective Fanout Optimization
Blocked relationships
(either party blocked the other)
Lesson 1653Selective Fanout Optimization
Blocking
If the crawl delay hasn't elapsed, the queue remains blocked until the timer expires
Lesson 1845Back Queue: Politeness Enforcement
Blocking producers
creates a domino effect: if Service A blocks waiting for queue space, its own queues fill, forcing *its* callers to block.
Lesson 1080Queue Saturation and Backpressure Loss
bloom filter
is a space-efficient probabilistic data structure that answers: "Is this key *definitely not* in this SSTable?
Lesson 416Read Path and Bloom Filters
Lesson 429Read Path and Bloom Filters
Lesson 1853Bloom Filters for URL Seen Checking
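To make the "definitely not present" behavior concrete, here is a small Bloom filter sketch. The bit-array size, the number of hash functions, and the use of salted SHA-256 digests are all assumptions picked for readability, not what any particular database uses.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive several bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means the key was never added; True means "possibly added".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))
```

A `False` answer lets the read path skip an SSTable (or lets a crawler treat a URL as unseen) without touching disk; a `True` answer still has to be confirmed against the real data.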
Body modification
Convert JSON to XML, rename fields (`clientId` → `customer_id`), filter unnecessary data
Lesson 882Request and Response Transformation
Booking systems
– preventing double-bookings for flights, hotels, or appointments
Lesson 318When to Choose ACID or BASE
BookKeeper
(storage layer): Distributed log storage system whose nodes, called "bookies", durably store messages
Lesson 730Apache Pulsar Architecture
books
(blobs) sit on shelves (object storage), while the **library catalog** (metadata database) tells you what books exist, who checked them out, and where to find them.
Lesson 1590Metadata Database Design
Lesson 1874What REST Means: Resource-Oriented Architecture
Boolean operators
to combine terms.
Lesson 1739Boolean Query Operators
Bottleneck
The central store becomes a single point of failure and a throughput limit.
Lesson 1793Centralized vs Distributed Rate Limiting
Bottleneck potential
The shared store becomes a performance constraint at scale
Lesson 979Centralized vs Decentralized Approaches
Bottlenecks
If 80% of traces show Service B taking longest (from **critical path analysis**), it's your slowest link.
Lesson 1229Service Dependency Graphs
Bounded convergence
provides a concrete promise: "All replicas will converge within X milliseconds after the last write.
Lesson 533Convergence Guarantees
bounded staleness
provides a middle ground between strong and eventual consistency by guaranteeing a maximum lag for replicas.
Lesson 549Bounded Staleness
Lesson 554Consistency Model Examples in Real Systems
Bracket notation
(more structured):
Lesson 1892Filtering Query Parameters
Breadth coverage
Ensures you sample widely across the web early
Lesson 1830Breadth-First vs Depth-First Crawling
Breadth-First Search (BFS)
crawls all pages at one "level" before moving deeper.
Lesson 1830Breadth-First vs Depth-First Crawling
Breaking changes
in backend services directly impact clients
Lesson 870What is an API Gateway?
Lesson 1905Breaking vs Non-Breaking Changes
Broadcasts
the query to all relevant shards (often all of them)
Lesson 1769Horizontal Scaling of Search Infrastructure
broker
is a Kafka server that stores data and serves client requests.
Lesson 700Kafka Overview and Core Components
Lesson 704Brokers and Cluster Architecture
Brokers
(compute layer): Stateless servers that handle client connections, message routing, and subscription management
Lesson 730Apache Pulsar Architecture
Browser-friendly
Works seamlessly in browsers, cURL, and documentation
Lesson 1899URI Versioning (Path-Based)
Browser/CDN Layer
Ultra-popular queries (top 0.
Lesson 1771Query Caching Strategies
Bucketing strategy matters
You define boundaries upfront (e.
Lesson 1185Histogram Metrics
Budget
Open-source trades engineer time for licensing costs.
Lesson 1251Choosing a Tracing System
Budget consistently underused
Your SLO might be too conservative; consider tightening it or investing saved engineering effort elsewhere
Lesson 1279Error Budgets: The Core Concept
Budget Constraints
Open-source solutions (Kong Community, Nginx) minimize licensing costs but increase operational overhead.
Lesson 901Choosing the Right API Gateway Technology
Lesson 1260Cost-Benefit Analysis
Budget depleted
Freeze risky changes, focus on reliability improvements
Lesson 1279Error Budgets: The Core Concept
Budget ratio
= retries / total requests
Lesson 1029Retry Budgets and Rate Limiting
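The ratio above could gate retries roughly as follows. The 10% ceiling (`max_ratio`) and the function name are assumptions for illustration; the glossary gives only the ratio itself.

```python
def retry_allowed(retries: int, total_requests: int, max_ratio: float = 0.1) -> bool:
    """Permit another retry only while retries / total requests stays under max_ratio."""
    if total_requests == 0:
        return True  # no traffic observed yet, so there is nothing to compare against
    return retries / total_requests < max_ratio
```

In practice the two counters would be tracked over a sliding window so that an old burst of retries does not block retries indefinitely.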
Budget remaining
Deploy freely, take calculated risks
Lesson 1279Error Budgets: The Core Concept
Bugs surface immediately
during development, not in production
Lesson 301Schema Enforcement and Type Safety
Build
a new microservice that implements that feature
Lesson 822The Strangler Fig Pattern for Migration
Build confidence
in your fault tolerance mechanisms
Lesson 1343What is Chaos Engineering?
Build muscle memory
for your runbooks and playbooks without customer impact
Lesson 1345Starting with Game Days
Build verification
compile-time checks, linting, security scanning
Lesson 1314Release Engineering and Safe Deployment
Built-in atomic operations
`INCR`, `EXPIRE`, Lua scripts prevent race conditions
Lesson 1807In-Memory vs Persistent Storage for Rate Limiting
Built-in data structures
Store complex session data like shopping carts using Redis hashes or lists.
Lesson 356Redis as a Session Store
Built-in health checks
Automated monitoring of target health
Lesson 113Cloud Load Balancers (AWS ELB/ALB)
Built-in Observability
Envoy generates detailed metrics, logs, and distributed traces out-of-the-box.
Lesson 115Envoy Proxy Architecture
Built-in Proxy
A lightweight, native Go proxy that comes with Consul.
Lesson 863Consul Connect: HashiCorp's Approach
Bulk Operations
Enable users to delete multiple pastes, change privacy settings in batch, or export their content, with every operation requiring an authorization check to prevent unauthorized access.
Lesson 1578User Accounts and Paste Management
Bulkhead Pattern
isolates different parts of your system into separate resource pools—just like the watertight compartments (bulkheads) in a ship's hull.
Lesson 1337Bulkhead Pattern for Fault Isolation
Bulkheads provide resource isolation
Even if one circuit breaker trips, the bulkhead ensures that failure is contained to a specific resource pool (thread pool, connection pool, or semaphore).
Lesson 1085Preventing Cascades with Circuit Breakers and Bulkheads
Burst allowance
You can spend $2,000 in one day, but it still counts against your monthly total
Lesson 994Quota Management and Burst Allowances
Lesson 1824Tiered Rate Limiting
Burst allowances
Allow short spikes above the baseline rate
Lesson 885Rate Limiting and Throttling
Burst credits
Allow exceeding the steady-state rate temporarily if quota headroom exists
Lesson 994Quota Management and Burst Allowances
Burst scenarios
Send traffic that rapidly hits the limit to verify bucket/counter algorithms respond correctly
Lesson 997Testing and Monitoring Rate Limiters
Bursty traffic
Accept thousands of requests instantly, process them gradually over time
Lesson 650Temporal Decoupling
Bursty with quiet periods
**Token Bucket**
Lesson 975Algorithm Selection Criteria
Business analytics
track API usage patterns: which clients call which endpoints most, geographic distribution, peak usage times, and feature adoption rates.
Lesson 890Logging and Metrics Collection
Business Continuity
is the broader organizational strategy for keeping critical business operations running during *and* after any disruption—including non-technical issues like pandemics, supply chain failures, or key personnel losses.
Lesson 1433Disaster Recovery vs Business Continuity
Business impact
Shorter windows mean faster recovery attempts but more probing traffic during actual outages.
Lesson 1059Timeout Windows and Reset Logic
Business Impact Analysis (BIA)
is the structured process of identifying what each service's downtime and data loss actually *cost* your organization, so you can set appropriate RPO/RTO targets and justify the investment needed to meet them.
Lesson 1420Business Impact Analysis for RPO/RTO
Business impact drives urgency
A P0 for a payments service during Black Friday demands instant all-hands response.
Lesson 1298Incident Severity Levels and Escalation
Business Insights
Beyond technical health, monitoring can track business metrics like transaction volumes or user sign-ups.
Lesson 1262What is Monitoring and Why It Matters
Business priorities
Admin edits override user edits
Lesson 1383Application-Level Conflict Resolution
Business priority rules
VIP customers, loyalty tiers, regulatory requirements
Lesson 1387Custom Merge Functions
Business rules
Always replicate content from premium users or verified creators
Lesson 1631Multi-Region Replication Strategy
Business units
Separate data by department, brand, or subsidiary
Lesson 1452List-Based Partitioning
Business-critical events
User sign-ups, purchases, authentication successes/failures, permission changes.
Lesson 1129What to Log vs What Not to Log
Business-Focused Boundaries
Services map to business domains (inventory, shipping, recommendations), not technical layers.
Lesson 781What are Microservices?
Byzantine failures
are the nightmare scenario.
Lesson 602Crash-Stop vs Byzantine Failures

C

Cache control issues
You can't force clients to refresh cached DNS records
Lesson 116DNS-Based Load Balancing
Cache extension
Serve stale data temporarily
Lesson 1303Incident Mitigation vs Fix
Cache frequent queries
Expensive aggregations or reports should be pre-computed or cached
Lesson 1897Performance Considerations and Limits
Cache Hit Rates
measure how often results come from cache versus requiring expensive index lookups.
Lesson 1777Query Performance Monitoring
Cache invalidation complexity
You must decide when to remove or update cached entries.
Lesson 132Cache-Aside: Pros and Cons
Cache Invalidation Problem
you learned about earlier by providing a simple, time-based invalidation strategy.
Lesson 156Time-Based Expiration (TTL)
Cache is optional
The database is the source of truth; cache failures don't break your system
Lesson 131Cache-Aside (Lazy Loading) Pattern
Cache Key
Typically the paste ID (`cdn.
Lesson 1569CDN Integration for Paste Delivery
Cache key structure
Include user ID, action, and resource.
Lesson 951Caching Authorization Decisions
Cache metadata
Store it in your database to avoid re-fetching
Lesson 1538Link Preview and Metadata
Cache Miss Handling
If no robots.
Lesson 1861Robots.txt Caching and Parsing
Cache operations
Record cache hits, misses, or invalidations
Lesson 1234Span Events and Logs
Cache reads
20ms timeout (should be fast)
Lesson 1118Per-Operation Timeout Configuration
Cache results
Store threat verdicts to avoid repeated API calls for the same domain
Lesson 1540Spam and Malicious Link Detection
Cache subscribers receive event
across different services or regions
Lesson 158Event-Based Invalidation
Cache tagging
means attaching labels (tags) to cache entries when you store them.
Lesson 164Cache Tagging and Grouping
Cache tags/keys
Use consistent naming so one invalidation command affects all layers
Lesson 163Multi-Level Cache Invalidation
Cache warming
solves this problem by stocking your kitchen *before* opening the doors.
Lesson 140Cache Warming Strategies
Lesson 161Cache Warming Strategies
Lesson 1611Multi-Tier Caching Architecture
Cache-aside
Application manages cache population
Lesson 133Read-Through Caching Pattern
cache-control headers
from backend responses to guide caching decisions.
Lesson 888Caching at the Gateway
Lesson 1569CDN Integration for Paste Delivery
Cached inheritance maps
Pre-compute effective permissions for performance
Lesson 939Permission Inheritance and Hierarchies
Cached Responses
Return stale but acceptable data from a cache.
Lesson 1061Fallback Strategies
Caching aggressively
Cache celebrity data at multiple levels (CDN, application cache) with longer TTLs.
Lesson 257Celebrity Problem in Social Graphs
Caching is critical
With 99%+ operations being reads, aggressive caching at multiple levels (CDN, application cache, database query cache) becomes non-negotiable for performance and cost.
Lesson 1636Capacity Estimation: Feed Reads vs Writes
Caching layers
Place celebrity data in dedicated cache clusters
Lesson 1483Celebrity User Problem
Caching Strategy
Expired links must be evicted from cache (lesson 1502) to prevent serving dead redirects.
Lesson 1504Link Expiration and Retention Policies
Caching-friendly
Different versions can have independent cache policies
Lesson 1899URI Versioning (Path-Based)
Calculate
remaining time budget after processing
Lesson 1113Cross-Protocol Deadline Handling
Calculate percentiles
from this distribution (P50, P95, P99)
Lesson 1117Adaptive Timeouts Based on Historical Latency
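As a sketch of the percentile step, the helper below uses the nearest-rank method on a list of observed latencies. Nearest-rank is one of several percentile definitions and is chosen here only for simplicity; the function name is hypothetical.

```python
import math


def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range 1..100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]
```

An adaptive timeout might then be set to some multiple of the P99 value, with the multiplier tuned per service.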
Calculate remaining budget
after accounting for time already spent
Lesson 1098Per-Hop Timeout Budgets
Calculate split point
– find the midpoint key in the partition's range
Lesson 1475Dynamic Range Splitting
Calculate the average load
across all nodes
Lesson 1468Bounded Loads Extension
Calculate the hash
of the incoming file's content
Lesson 1622Deduplication Strategies
Calculates remaining time
Downstream services see how much budget is left
Lesson 1104gRPC Timeout Propagation
Campaign ID
Which marketing campaign or notification type
Lesson 1726Aggregation and Reporting
Camunda
provides a visual workflow designer and execution engine, popular in enterprise settings.
Lesson 598Saga Frameworks and Real-World Adoption
CAN-SPAM
(US) require verifiable records showing when users opted out, what they opted out of, and that you stopped messaging them accordingly.
Lesson 1728Opt-Out and Compliance Tracking
Cancel downstream requests
if they're still pending
Lesson 1115Deadline Exceeded Error Handling
Cancel the stale request
for "ama"—its results are obsolete.
Lesson 1763Debouncing and Request Optimization
Cancels requests
proactively when budget expires
Lesson 1101Timeout Propagation in Service Meshes
Cannot aggregate
percentiles across multiple instances (you can't average p99s meaningfully)
Lesson 1186Summary Metrics
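To see why averaging fails, consider two instances with synthetic latencies (the numbers are invented for illustration). The sketch computes each instance's p99, averages them, and compares the result with the p99 of the combined samples.

```python
import math

def p99(samples):
    """Nearest-rank 99th percentile."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(99 * len(ordered) / 100))
    return ordered[rank - 1]

instance_a = [10] * 100               # healthy: every request takes 10 ms
instance_b = [10] * 90 + [1000] * 10  # degraded: 10% of requests take 1000 ms

averaged_p99 = (p99(instance_a) + p99(instance_b)) / 2
true_p99 = p99(instance_a + instance_b)
```

Here the averaged figure is 505 ms while the real fleet-wide p99 is 1000 ms, so the average understates the tail. Aggregating correctly requires the raw samples or mergeable histogram buckets.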
Cannot aggregate across services
(you can't average p95s meaningfully)
Lesson 1177Summary Metrics
Cannot be trusted
because clients can be modified, bypassed, or malicious
Lesson 1789Client-Side vs Server-Side Rate Limiting
Cannot revoke mid-lifetime
Valid token stays valid until expiration
Lesson 916Session vs Token Tradeoffs
CAP Availability
= Both branches stay open and answer questions using whatever information they have locally, even if it's outdated
Lesson 485Availability in CAP Context
CAP Theorem
(also called Brewer's Theorem) states that any distributed database system can simultaneously guarantee **at most two** of these three properties:
Lesson 481What CAP Theorem StatesLesson 484Consistency in CAP Context
Capacity
Full utilization of all resources
Lesson 1332Active-Active vs Active-Passive Redundancy
Capacity model
Pre-provisioned per shard, versus based on broker disk and network capacity.
Lesson 728AWS Kinesis Overview
Capacity Planning
Understanding resource utilization trends helps you scale proactively, not reactively during an outage.
Lesson 1262What is Monitoring and Why It MattersLesson 1323Mean Time Between Failures (MTBF)Lesson 1825Monitoring and Analytics Per Tenant
Captures the timeout context
at the edge proxy
Lesson 1101Timeout Propagation in Service Meshes
Capturing slow requests
that exceed latency thresholds
Lesson 1254Tail-Based Sampling
Carbon
The listener daemon that receives metrics over the network (typically via plaintext protocol or pickle)
Lesson 1202Graphite Time-Series Database
Cardinality limits
Intentionally push high-cardinality metrics to ensure your system rejects or samples them appropriately—preventing production label explosions.
Lesson 1218Testing Metric Pipelines
Careful shard key selection
Design so related data lives on the same shard (like keeping all data for a user together)
Lesson 261Distributed Transactions Across Shards
Carrier filtering
Spam filters may block messages with certain keywords or patterns
Lesson 1685SMS Notifications
Cart abandonment rate
UX and performance quality signals
Lesson 1196Business vs Technical Metrics
Cart updates are low-stakes
If two data centers briefly disagree about your cart contents, no money has changed hands yet
Lesson 498Shopping Cart Systems (AP)
Cascade options
You can configure what happens on delete—cascade the delete to child records, set foreign keys to NULL, or reject the operation
Lesson 300Foreign Keys and Referential Integrity
Cascading delays
Slow downstream services block upstream callers, even when the top-level request already timed out
Lesson 1096Why Timeouts Must Propagate
Cascading effects
Too short, and you risk repeatedly hammering a struggling service.
Lesson 1059Timeout Windows and Reset LogicLesson 1125Timeout Testing and Chaos Engineering
Cascading failures
Service A's error might be caused by Service B, which is actually failing because of Service C
Lesson 807Debugging and TroubleshootingLesson 1043What Is a Circuit Breaker?
Cascading invalidation
Application triggers invalidation at each layer sequentially
Lesson 163Multi-Level Cache Invalidation
Case-insensitive
means they're treated identically.
Lesson 1518Case Sensitivity Considerations
Cassandra, Elasticsearch, Kafka
(as a buffer), and in-memory stores for testing.
Lesson 1241Jaeger Architecture and Components
Catch up to real-time
, then switch traffic to the new output
Lesson 754Event Log Replay in Kappa
Catch-up
Other replicas begin replicating from the newly promoted primary
Lesson 207Replica Promotion and Failover Basics
Category diversity
Include URLs spanning different topics, languages, and regions.
Lesson 1828Seed URLs and Starting Point
Caution mode
Increase code review scrutiny
Lesson 1281Error Budget Policies
CDN checks its cache
at the nearest edge location
Lesson 1569CDN Integration for Paste Delivery
CDN distribution
Push celebrity content to edge locations
Lesson 1483Celebrity User Problem
CDN Edge Cache
Deploy regional edge servers that cache popular prefix responses.
Lesson 1766Caching Suggestions at Multiple Layers
CDN edge locations
using the push model you studied earlier, minimizing propagation delay.
Lesson 1630Live Streaming Architecture
CDN offloading
Popular links can be cached at edge, dramatically reducing origin bandwidth
Lesson 1499Bandwidth Requirements for Redirects
CDN provider
(Cloudflare, AWS CloudFront, Fastly have different rates)
Lesson 30CDN Bandwidth and Cost Estimation
CDN-friendly
Static image assets can be edge-cached globally
Lesson 1539QR Code Generation
Celebrities (> threshold)
Use **fanout-on-read**.
Lesson 1648Hybrid Fanout Strategy
Celebrity engagement
(how often they post) also matters.
Lesson 1658Fanout Strategy Selection Criteria
Celebrity Post Cache
When a celebrity posts, store it in a dedicated, shared cache keyed by the celebrity's ID
Lesson 1655Celebrity Follower Caching
Celebrity post latency
Track high-follower accounts separately—they're outliers
Lesson 1657Measuring Fanout Performance
Celebrity posts
Often 1-5 seconds to reach millions (high priority)
Lesson 1671Real-Time Requirements for Social Feeds
Celebrity User Problem
when one key generates disproportionate traffic that a single partition cannot handle, creating a **hot spot** that degrades performance for everyone hitting that partition.
Lesson 1483Celebrity User Problem
Celebrity/influencer effect
creates hotspots when specific entities generate massive traffic.
Lesson 234Data Distribution and Hotspots
Centrality measures
identify key players:
Lesson 468Graph Algorithms: PageRank and Centrality
Centralization
Update the "order shipped" message once and every future notification picks up the change immediately, with no code deploy needed.
Lesson 1701Template Service for Content
Centralized Aggregation
uses log shipping tools (Fluentd, Logstash) or streaming platforms (Kafka) to funnel events to a single data store—often a time-series database or specialized SIEM (Security Information and Event Management) system.
Lesson 954Distributed Auth Audit Logging
Centralized control
The monitoring system controls scrape intervals and targets
Lesson 1197Pull vs Push Metrics Collection Models
Centralized logging
means all services send their logs to a single, unified system where you can search, filter, and analyze them together.
Lesson 1148Centralized Logging ArchitectureLesson 1169Centralized vs Localized Logging
Centralized logic
Instead of every microservice implementing routing rules, one proxy handles it for all connections.
Lesson 222Proxy-Based Read-Write SplittingLesson 591Orchestration-Based Sagas
Centralized management
Change a role's permissions once, affects all users with that role
Lesson 933Role-Based Access Control (RBAC) Fundamentals
Centralized policies
Apply rate limiting, authentication differently per version
Lesson 1907Gateway-Level Version Routing
Centralized Policy Definition
Set timeout rules in YAML or configuration APIs once, apply everywhere.
Lesson 1126Timeout Configuration in Service Mesh
Centralized Politeness Service
Workers consult a shared politeness table (Redis or similar) before crawling.
Lesson 1868Coordinating Politeness Across Workers
Centralized State
All servers query the same counters
Lesson 980Redis-Based Distributed Rate Limiting
Centralized validation
Route token checks through a single service with one authoritative clock.
Lesson 949Clock Skew and Token Validation
Certificate Management
You only need to install and renew SSL certificates on the load balancer, not across dozens or hundreds of backend servers.
Lesson 118SSL/TLS Termination at Load BalancersLesson 861Istio: Architecture and Components
Certificate management simplicity
Instead of distributing SSL certificates to dozens of microservices, you manage them in one place: the gateway.
Lesson 891SSL/TLS Termination
Chain propagation
→ Each node applies update, forwards downstream
Lesson 1373Chain Replication
Chain vulnerability
one corrupted backup breaks the chain
Lesson 1422Incremental Backup Strategy
Challenge at scale
Comparing every new fingerprint against billions of stored ones is impractical.
Lesson 1855Near-Duplicate Detection with Simhash
Change Data Capture
monitors databases for INSERT, UPDATE, and DELETE operations, turning these changes into events that flow through streaming pipelines.
Lesson 776Change Data Capture Tools
Change Data Capture (CDC)
Maintain the current state of database rows
Lesson 712Log Compaction
Changelog
highlighting breaking changes
Lesson 1909Client SDK Versioning and Distribution
Changelog Semantics
Kafka can mirror database changes.
Lesson 720Log Compaction
Changing data types
Switching `age` from integer to string breaks parsing logic
Lesson 1905Breaking vs Non-Breaking Changes
Channel
Email, SMS, push, in-app
Lesson 1726Aggregation and Reporting
Channel availability
Is the push notification token valid?
Lesson 1703Channel Routing Logic
Channel opt-ins
email enabled, SMS disabled, push enabled
Lesson 1702User Preferences Lookup
Channel Provider Abstraction
defines a common contract that all vendors must implement:
Lesson 1690Channel Provider AbstractionLesson 1695Fallback and Retry Logic
Channel Selection Strategy
(lesson 1688):
Lesson 1695Fallback and Retry Logic
Channel-specific formats
(HTML for email, plain text for SMS)
Lesson 1701Template Service for Content
Channel-specific workers
that handle the actual sending:
Lesson 1696Notification System High-Level Architecture
Chaos Monkey
randomly terminates virtual machine instances in production.
Lesson 1348Chaos Engineering Tools
Character limits
160 characters per message with the standard GSM-7 alphabet; Unicode (UCS-2) content reduces this to 70
Lesson 1685SMS Notifications
Character Normalization
Store multiple normalized forms of each query.
Lesson 1768Typeahead for Multi-Language Support
Character set size
How many distinct characters can you use?
Lesson 1500URL Length and Encoding Constraints
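The effect of character set size is simple arithmetic: an alphabet of `b` characters and a code of length `n` yields `b ** n` possible codes. The sketch below assumes a base-62 alphabet (a-z, A-Z, 0-9); the function names are illustrative.

```python
def code_space(alphabet_size, length):
    """Number of distinct codes of exactly `length` characters."""
    return alphabet_size ** length

def min_length_for(alphabet_size, required_codes):
    """Shortest code length whose space covers `required_codes`."""
    length = 1
    while code_space(alphabet_size, length) < required_codes:
        length += 1
    return length

base62_seven_chars = code_space(62, 7)           # 3,521,614,606,208 codes
chars_for_a_billion = min_length_for(62, 10**9)  # 6 characters
```

Dropping to a 36-character, case-insensitive alphabet shrinks the space at every length, so the same capacity needs longer codes.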
Chargebacks and refunds
burden for the business
Lesson 1002The Double-Charge Problem
Chatty inter-service calls
Service A calling services B, C, and D synchronously to complete a single user request creates tight coupling.
Lesson 824Avoiding Distributed Monoliths
Check availability
Query your database to see if `ceo-blog` already exists
Lesson 1514Custom Short URL SupportLesson 1531Custom Aliases and Vanity URLs
Check before executing
Each step first queries: "Did I already finish?
Lesson 1037Idempotency in Distributed Workflows
Check constraints
Does this value meet custom business rules (e.
Lesson 305Consistency Guarantees
Check existence
in the database with a quick lookup
Lesson 1512Random String Generation
Check limit
If the returned value exceeds your threshold, reject the request
Lesson 1794Redis-Based Rate Limiting with INCR
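The check can be sketched without a Redis server by using a dictionary as a stand-in for the counter keys. In a real deployment the count would come from Redis `INCR`, with `EXPIRE` cleaning up old windows; the class name and key layout here are assumptions for illustration.

```python
class FixedWindowLimiter:
    """Fixed-window counter; a dict stands in for Redis keys."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self._counters = {}

    def allow(self, client_id, now):
        window = int(now // self.window_seconds)
        key = (client_id, window)                 # e.g. "rate:{client}:{window}" in Redis
        count = self._counters.get(key, 0) + 1    # INCR returns the value after incrementing
        self._counters[key] = count
        return count <= self.limit                # reject once the threshold is exceeded

limiter = FixedWindowLimiter(limit=3, window_seconds=60)
decisions = [limiter.allow("tenant-1", now=t) for t in (0, 1, 2, 3)]
```

The fourth call inside the same window is rejected; the first call in the next window starts from a fresh counter.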
Check locally first
If the counter exists and is within the current time window, increment it in memory (nanoseconds, versus milliseconds for a network round trip)
Lesson 1801Local Caching for Performance
Check memtable
if found, return immediately
Lesson 416Read Path and Bloom Filters
Check positions
For each matching doc, verify "learning" appears exactly one position after "machine"
Lesson 1751Phrase Queries and Positional Indexes
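A small sketch of the position check, assuming whitespace tokenization and a two-word phrase. The index maps each term to the documents containing it and the positions where it occurs; all names are illustrative.

```python
from collections import defaultdict

def build_positional_index(docs):
    """term -> {doc_id -> [positions]}"""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split()):
            index[term][doc_id].append(position)
    return index

def phrase_match(index, first, second):
    """Docs where `second` appears exactly one position after `first`."""
    first_postings = index.get(first, {})
    second_postings = index.get(second, {})
    hits = []
    for doc_id in first_postings.keys() & second_postings.keys():
        following = set(second_postings[doc_id])
        if any(pos + 1 in following for pos in first_postings[doc_id]):
            hits.append(doc_id)
    return sorted(hits)

docs = {
    1: "machine learning at scale",
    2: "learning about the machine",
    3: "a machine for learning",
}
index = build_positional_index(docs)
matches = phrase_match(index, "machine", "learning")
```

All three documents contain both words, but only document 1 has them adjacent and in order.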
Check robots.txt regularly
Sites update their rules; cache the file but refresh it periodically (every 24 hours is common).
Lesson 1831Robots.txt and Crawl Etiquette
Check SSTables on disk
immutable files stored in GFS, potentially many of them
Lesson 449Read Path and Compaction
Check the cache first
When your app needs data, it looks in the cache
Lesson 131Cache-Aside (Lazy Loading) Pattern
Check the current time
using your local clock
Lesson 1110Calculating Remaining Time
Check the local cache
for this ID
Lesson 1714Client-Side Deduplication
Check the MemStore first
– Since recent writes live in memory, HBase looks here before touching disk
Lesson 437HBase Read Path and Bloom Filters
Check the MemTable
the in-memory structure holding recent writes
Lesson 449Read Path and Compaction
Check the memtable first
(in-memory write buffer) — fastest lookup
Lesson 429Read Path and Bloom Filters
Check the stored progress
Load the workflow state
Lesson 1016Idempotency for Multi-Step Operations
Checkout timeout
(or *connection wait timeout*): How long a thread will wait to acquire a connection from the pool before giving up.
Lesson 272Connection Timeouts and Limits
Checkpoint after each step
Store completion markers (e.
Lesson 1037Idempotency in Distributed Workflows
Checkpoint the frontier periodically
to durable storage.
Lesson 1849URL Frontier Persistence and Recovery
Checks
if this event ID has been processed before
Lesson 1035Idempotency in Event Processing
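A minimal sketch of that check, using an in-memory set as the record of processed IDs. A production consumer would keep this record in durable storage and update it atomically with the side effect; `IdempotentConsumer` is a hypothetical name.

```python
class IdempotentConsumer:
    """Runs the handler at most once per event ID."""

    def __init__(self, handler):
        self._handler = handler
        self._processed_ids = set()

    def process(self, event_id, payload):
        if event_id in self._processed_ids:
            return False              # duplicate delivery: skip the side effect
        self._handler(payload)
        self._processed_ids.add(event_id)
        return True

applied = []
consumer = IdempotentConsumer(applied.append)
consumer.process("evt-1", 10)
consumer.process("evt-1", 10)  # redelivery of the same event
```

After both calls, `applied` holds a single entry, so the redelivered event had no additional effect.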
Checksum validation
Comparing cryptographic hashes of backed-up data against originals
Lesson 1408Backup Verification and Testing
Choose Availability (AP)
Accept writes on both sides of the partition.
Lesson 483The CAP Tradeoff During Partitions
Choose Consistency (CP)
Reject write requests until the partition heals.
Lesson 483The CAP Tradeoff During Partitions
Choose counters
when you need maximum URL brevity, can handle centralized ID coordination, and predictability isn't a security concern (most URL shorteners).
Lesson 1516Counter-Based vs UUID Approaches
Choose UUIDs
when you need truly distributed generation without coordination, security through obscurity matters, or you're operating at extreme scale where multiple data centers must generate IDs independently.
Lesson 1516Counter-Based vs UUID Approaches
Choose wide-column stores when
Lesson 419Wide-Column vs Document Stores
Choreography
distributes logic across services—each listens for events and decides what to do next.
Lesson 592Choreography vs Orchestration Tradeoffs
Chronological feeds
display posts in time order (newest first).
Lesson 1644Feed Personalization and Ranking Requirements
Chubby
, a distributed lock service similar to ZooKeeper, for critical coordination tasks: discovering tablet servers, storing schema information, and managing master election.
Lesson 439Google BigTable Architecture
Circuit breaker integration
Does a slow dependency correctly trip the breaker before exhausting resources?
Lesson 1125Timeout Testing and Chaos Engineering
Circuit breaker per dependency
Payments failing won't flood the system with doomed retries
Lesson 1085Preventing Cascades with Circuit Breakers and Bulkheads
Circuit Breaking
When a downstream service becomes unhealthy, the proxy can "trip" a circuit breaker, temporarily stopping requests to that service to prevent cascading failures, similar to an electrical circuit breaker protecting your home.
Lesson 839Data Plane: Proxy ResponsibilitiesLesson 840Data Plane: Envoy Proxy FundamentalsLesson 877The API Gateway Bottleneck RiskLesson 1167Avoid Log Explosion
Circular transactions
Money moving A → B → C → A to legitimize stolen funds
Lesson 474Fraud Detection Through Pattern Matching
Classification models
detect categories like violence, adult content, hate symbols
Lesson 1629Content Moderation at Scale
Clean separation
Backend services focus on business logic, not version negotiation
Lesson 1907Gateway-Level Version Routing
Cleaner URLs
Resource paths stay stable; `/users/123` is always the same user
Lesson 1902Content Negotiation with Media Types
Cleanup
The hints are deleted from the temporary nodes
Lesson 1372Sloppy Quorums and Hinted Handoff
Cleanup after expiration
old keys can be purged to save space
Lesson 1004Server-Side State for Idempotency
Cleanup job effectiveness
Instrument your scheduled deletion tasks (from lesson 1568) to emit metrics like `pastes_deleted_per_run`, `cleanup_duration_ms`, and `failed_deletion_count`.
Lesson 1574Monitoring Expiration and Storage Health
Cleanup job failures
Job didn't run, crashed, or deleted zero records when expired data exists
Lesson 1574Monitoring Expiration and Storage Health
Clear accountability
The team that needs the feature builds it
Lesson 906BFF Ownership and Team Structure
Clear boundaries
Each team's communication pattern becomes a service boundary
Lesson 788Organizational Alignment: Conway's Law
Clear consistency model
Leader always has the latest data
Lesson 71Single-Leader Replication Model
Clear interfaces
Simple APIs with obvious behaviors reduce integration errors
Lesson 1315Simplicity as a Core Value
Clear popularity hierarchies
Some items are consistently more popular
Lesson 147Least Frequently Used (LFU)
Clear service boundaries
(well-defined APIs)
Lesson 794Team Autonomy and Ownership
Clearer Dependencies
In a monolith, modules can silently depend on each other in tangled ways.
Lesson 797Improved Code Maintainability
Click count
The simplest metric—increment a counter per redirect.
Lesson 1505Analytics and Tracking Requirements
Clicked
The user took action on a link or button within the notification (e.
Lesson 1724Notification Analytics Events
ClickHouse
Columnar database offering excellent compression and fast analytical queries at lower cost than Elasticsearch
Lesson 1245Trace Storage Backends
Client → Gateway (HTTP)
→ Gateway translates → **Gateway → Service (gRPC)**
Lesson 874Protocol Translation
Client automatically includes cookie
in every subsequent request
Lesson 909Session-Based Authentication Fundamentals
Client creates code challenge
Hash the verifier with SHA-256, then base64url-encode the resulting digest
Lesson 923PKCE: Proof Key for Code Exchange
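The S256 transformation can be sketched with the standard library alone. The helper names are illustrative; the padding is stripped because PKCE uses unpadded base64url.

```python
import base64
import hashlib
import secrets

def make_code_verifier(n_bytes=32):
    """Random, URL-safe verifier (43 characters for 32 random bytes)."""
    raw = secrets.token_bytes(n_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def make_code_challenge(verifier):
    """S256 challenge: unpadded base64url of SHA-256(verifier)."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

verifier = make_code_verifier()
challenge = make_code_challenge(verifier)
```

The client sends `challenge` with the authorization request and reveals `verifier` only when exchanging the code, so an intercepted code is useless without it.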
Client Credentials Flow
solves this by allowing a service to authenticate using its own identity.
Lesson 925Client Credentials Flow
Client diversity is high
Your iOS, Android, and web apps have fundamentally different data needs, screen sizes, or performance constraints.
Lesson 908When to Use BFF Pattern
Client errors
(400, 401, 404) → ignore, these aren't service health issues
Lesson 1048Failure Thresholds and Detection
Client initiates
Opens WebSocket, sends subscription request
Lesson 1915GraphQL Subscriptions for Real-Time Data
Client needs are similar
If all your clients consume roughly the same data and services, a shared gateway is simpler.
Lesson 908When to Use BFF Pattern
Client optimization
Each BFF tailors responses perfectly for its client (mobile gets compact JSON, web gets richer data)
Lesson 904BFF vs Single Gateway Tradeoffs
Client overwhelm
Even if the response arrives, the client's browser or application must parse and render massive datasets, freezing the UI and consuming device resources unnecessarily.
Lesson 1887Why Pagination Is Essential at Scale
Client retry windows
Most well-behaved clients retry failed requests within seconds to minutes, not days
Lesson 1012Idempotency Key Expiration Strategy
Client sends a write
→ Goes only to the leader
Lesson 71Single-Leader Replication Model
Client SLAs
If you promise clients they can safely retry for 48 hours, honor that
Lesson 1012Idempotency Key Expiration Strategy
Client Storage
Client stores the token (typically in memory, localStorage, or a cookie)
Lesson 912Token-Based Authentication Fundamentals
client-side
(browser downloads data, filters locally), **server-side** (every keystroke triggers a backend query), or use a **hybrid approach**.
Lesson 1762Client-Side vs Server-Side TypeaheadLesson 1789Client-Side vs Server-Side Rate Limiting
Client-side rate limiting
during recovery detection
Lesson 1081Thundering Herd After Recovery
Client-side rendering
delivers raw code to browsers, which apply highlighting via JavaScript libraries (`highlight.
Lesson 1575Syntax Highlighting and Language Detection
Client-side timeout
The maximum time a client will wait for a response before giving up
Lesson 1090Client-Side vs Server-Side Timeouts
Client-side timeouts
protect the caller from waiting indefinitely—they're about resource management and user experience.
Lesson 1123Client-Side vs Server-Side Timeout Enforcement
Client-side timestamps
The client remembers the timestamp of its last write and includes it in read requests, ensuring it only reads from sufficiently up-to-date replicas.
Lesson 1359Read-Your-Writes Consistency with Replicas
Client-side tracking
The client remembers the last-seen transaction ID or timestamp.
Lesson 1360Monotonic Reads Across Replicas
Client-to-Server (External)
Between end users and your application servers
Lesson 78Load Balancer Placement in Architecture
Client/Gateway
2s total budget
Lesson 1097The Timeout Chain Problem
Clients send reads
→ Can go to leader OR any follower
Lesson 71Single-Leader Replication Model
clock drift
, and **concurrent load**—conditions where subtle bugs hide.
Lesson 988Testing Distributed Rate LimitersLesson 1381Limitations of Last-Write-Wins
Clock skew tolerance (leeway)
Accept tokens within a grace period (e.
Lesson 949Clock Skew and Token Validation
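As a sketch of how leeway widens the validity window, the check below accepts a token slightly before its not-before time and slightly after its expiry. The 30-second default is an assumed value for illustration, and timestamps are plain Unix seconds.

```python
def is_token_time_valid(now, not_before, expires_at, leeway=30):
    """True if `now` falls inside the validity window widened by `leeway` seconds."""
    return (not_before - leeway) <= now < (expires_at + leeway)

issued, expires = 1_000, 1_600
strict_late = is_token_time_valid(1_610, issued, expires, leeway=0)   # rejected
lenient_late = is_token_time_valid(1_610, issued, expires)            # accepted
```

Keeping the leeway small matters: every extra second of tolerance also extends the life of a stolen token.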
Closed → Open
Send enough failures to breach the threshold, then verify the breaker opens and fast-fails subsequent requests
Lesson 1065Testing Circuit Breaker Behavior
Closed circuit
Retries execute normally for transient errors
Lesson 1030Combining Retries with Circuit Breakers
Closed to Open
when failures exceed your configured threshold within a time window.
Lesson 1050State Transition Mechanics
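The transition can be sketched as a counter over a sliding window of failure timestamps. This minimal version models only the Closed to Open step, with no half-open recovery; the class and parameter names are assumptions.

```python
class CircuitBreaker:
    """Opens when failures within the window exceed the threshold."""

    def __init__(self, failure_threshold, window_seconds):
        self.failure_threshold = failure_threshold
        self.window_seconds = window_seconds
        self.state = "closed"
        self._failure_times = []

    def record_failure(self, now):
        cutoff = now - self.window_seconds
        # Keep only failures that are still inside the window.
        self._failure_times = [t for t in self._failure_times if t > cutoff]
        self._failure_times.append(now)
        if self.state == "closed" and len(self._failure_times) > self.failure_threshold:
            self.state = "open"

    def allow_request(self):
        return self.state != "open"

breaker = CircuitBreaker(failure_threshold=3, window_seconds=10)
for t in (0, 1, 2, 3):
    breaker.record_failure(now=t)
```

Four failures in four seconds exceed the threshold of three, so the breaker opens; the same four failures spread 20 seconds apart would leave it closed.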
Closeness centrality
How quickly can someone reach everyone else?
Lesson 468Graph Algorithms: PageRank and Centrality
Cloud Monitoring
(formerly Stackdriver, GCP) are fully managed metrics services that automatically collect, store, and visualize metrics from your cloud resources.
Lesson 1204Cloud-Native Metrics: CloudWatch and Stackdriver
Cloud-native
Use managed services (SQS/SNS, GCP Pub/Sub) for lower operational burden
Lesson 676Choosing Between Message Broker Technologies
Cloud-Native BigTable
itself evolved into Google Cloud Bigtable, the managed service version.
Lesson 450BigTable's Influence on Modern Systems
Cloud-native options
(CloudWatch, Stackdriver) scale elastically but at higher cost.
Lesson 1208Choosing a Metrics System for Your Scale
CloudWatch
(AWS) and **Cloud Monitoring** (formerly Stackdriver, GCP) are fully managed metrics services that automatically collect, store, and visualize metrics from your cloud resources.
Lesson 1204Cloud-Native Metrics: CloudWatch and Stackdriver
Clustered or covering
(stores URL directly in index)
Lesson 1521Indexing Strategy for Fast Lookups
Clustering
connects multiple RabbitMQ nodes together so they act as one logical broker, sharing metadata and providing redundancy.
Lesson 668RabbitMQ Clustering and High Availability
CNAME
Alias pointing to another domain (adds extra lookup!
Lesson 1856DNS Resolution Fundamentals for Crawlers
Coalescing
Multiple identical pending requests merge into one, reducing redundant work.
Lesson 1914DataLoader and Batching Solutions
Coarse-grained
Works at the connection level, not the request level
Lesson 116DNS-Based Load Balancing
Coarse-grained authorization
makes broad decisions at high levels (e.
Lesson 940Coarse-Grained vs Fine-Grained Authorization
CockroachDB
, 2PC coordinates the commit behind the scenes.
Lesson 576When 2PC is Used in Practice
Code Exchange
The authorization server sends back a temporary **authorization code** to your app's redirect URL.
Lesson 922Authorization Code Flow
Code maintainability
Consistent patterns reduce mental overhead
Lesson 1877Singular vs Plural Resource Names
Code pollution
Libraries clutter application code; service meshes keep business logic clean
Lesson 830Service Mesh vs Library-Based Solutions
Code reviews
Check logging statements for exposed secrets
Lesson 1163Avoid Logging Sensitive Data
Coding
is like being the carpenter who hammers nails, installs drywall, and connects the plumbing pipes.
Lesson 3System Design vs Coding
Cognitive Load Reduction
A developer can hold the entire service's logic in their head.
Lesson 797Improved Code Maintainability
Cold data
is older timeline content that users rarely view.
Lesson 1663Hot and Cold Timeline Data
cold standby
is a backup that exists only as stored data—backups, snapshots, or archived configurations.
Lesson 1417Hot Standby vs Cold StandbyLesson 1443DR Cost Optimization
Cold tier (91+ days)
Object storage like S3 Glacier.
Lesson 1165Log Retention Policies
Cold/archived (30-90 days)
Keep only trace metadata and aggregated statistics for compliance or trend analysis.
Lesson 1246Trace Data Retention Policies
Collaborative editing
→ Causal consistency (preserve cause-effect relationships)
Lesson 553Choosing Consistency Levels
Collaborative filtering
"Users who bought X also bought Y" traverses purchase edges
Lesson 457Use Cases: Social Networks and Recommendations
Collect
all words in the subtree below 'e'
Lesson 1758Trie Data Structure for Prefix Matching
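A compact sketch of the collection step: walk down to the node for the prefix, then traverse everything beneath it. The class names are illustrative, and results are sorted alphabetically here rather than ranked by popularity.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def with_prefix(self, prefix):
        """All stored words that start with `prefix`."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results, stack = [], [(node, prefix)]
        while stack:  # depth-first walk of the subtree below the prefix node
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)

trie = Trie()
for word in ("tea", "team", "ten", "to"):
    trie.insert(word)
suggestions = trie.with_prefix("te")
```

The lookup cost depends on the prefix length plus the size of the subtree, not on the total number of stored words.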
Collect training data
Log queries, returned results, their positions, and which ones users clicked
Lesson 1781Machine Learning for Ranking
Collection scans
Without the right index, queries must examine every document.
Lesson 408Query Performance Limitations
Collection validation
Emit known metric values from test services and verify they appear in your metrics backend (Prometheus, InfluxDB, etc.
Lesson 1218Testing Metric Pipelines
Collector
receives span data from instrumented applications via HTTP, Kafka, or RabbitMQ
Lesson 1242Zipkin Architecture and Design
Collision risk
Truncation increases collision probability (two different URLs producing the same short code).
Lesson 1508Hash-Based Generation Approach
Collision-free
Practically guaranteed uniqueness across all nodes
Lesson 1520Primary Key Selection: Auto-Increment vs UUID
Collision-resistant
Virtually impossible for two different pages to produce the same hash
Lesson 1852Content Fingerprinting with Hashing
Column keys
identify attributes (like "name" or "email")
Lesson 444Data Model: Sparse, Distributed, Multi-Dimensional Map
Column-oriented storage flips this
it groups all values from the *same column* together on disk.
Lesson 414Column-Oriented Storage Benefits
Columnar layouts
Values, timestamps, and tags are stored separately.
Lesson 1269Time Series Databases for Metrics
Columnar storage
storing columns together instead of rows, perfect for aggregations
Lesson 760Data Warehouse ArchitectureLesson 1530Analytics and Click Tracking
Command
The actual operation to execute (e.
Lesson 623Log Structure and Entries
Comments/likes
Sub-second updates feel "live"
Lesson 1671Real-Time Requirements for Social Feeds
Commit Latency
Time between a write being proposed and committed.
Lesson 643Monitoring and Operating Consensus Clusters
Commit Log (Write-Ahead Log)
The write is immediately appended to a sequential log file stored in GFS.
Lesson 448Write Path: MemTable and Commit Logs
Commit on their own
another participant might have voted "no"
Lesson 573The Blocking Problem in 2PC
Commit or rollback
Either save all changes permanently or undo everything
Lesson 310Atomicity: All-or-Nothing Transactions
Commit phase
The coordinator sends the final decision (`COMMIT` or `ABORT`) to all participants, who then execute it.
Lesson 569The Coordinator Role in 2PCLesson 5752PC Performance Characteristics
Committed use discounts
lower rates for guaranteed traffic volumes
Lesson 191CDN Provider Feature Comparison
Committing
means saving your current offset position back to Kafka.
Lesson 710Offsets and Commit Strategies
Common optimization strategies
Lesson 129Cache Hit Ratio Optimization
Common schedule
Full backup weekly, differential backups daily.
Lesson 1423Differential Backup Strategy
Common Schema
Standards like ECS (Elastic Common Schema) define field names (`http.
Lesson 1136Logging Libraries and Standards
Communicate degradation
Let users know when features are limited (optional but honest)
Lesson 1336Graceful Degradation
Communication Becomes Explicit
Teams communicate through well-defined APIs instead of navigating a shared codebase.
Lesson 798Organizational Alignment
Communications Lead
Manages all outbound communication—status updates to stakeholders, customer notifications, and executive briefings.
Lesson 1300Incident Command System (ICS)
Compare-and-Set (CAS)
operations like `WATCH` in Redis allow optimistic locking: watch a key, prepare a transaction, and execute only if the key hasn't changed.
Lesson 1800Race Conditions and Concurrency Control
Compare-and-Swap (CAS)
Updates a record only if its current value matches what you expect.
Lesson 1015Conditional Writes for Idempotency
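A minimal in-process sketch of the operation, with a lock standing in for the atomicity a database would provide. `KeyValueStore` and the order states are hypothetical.

```python
import threading

class KeyValueStore:
    """In-memory store with an atomic compare-and-swap."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def compare_and_swap(self, key, expected, new):
        """Write `new` only if the current value equals `expected`."""
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True

store = KeyValueStore()
store.compare_and_swap("order:1", None, "pending")
first_attempt = store.compare_and_swap("order:1", "pending", "paid")
retry = store.compare_and_swap("order:1", "pending", "paid")  # duplicate request
```

The retry fails because the value is no longer `"pending"`, which is what makes a repeated request harmless.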
Comparison
`_gt`, `_gte`, `_lt`, `_lte` (greater/less than)
Lesson 1892Filtering Query Parameters
Comparison operators
let you match values:
Lesson 393MongoDB Query Language Basics
Compensation stack
tracking which rollbacks to execute if failure occurs
Lesson 597Saga State Management and Persistence
Competing Consumers Pattern
solves this by adding multiple consumer instances that all read from the *same* queue.
Lesson 661Competing Consumers Pattern
Complete
the span when work finishes, recording duration and metadata
Lesson 1223Instrumentation Basics
Complete audit trail
You know *why* something is in its current state
Lesson 691Events as First-Class Citizens
Complete rewrites
The codebase becomes so tangled that starting over feels easier than fixing it
Lesson 2Why System Design Matters
Completion
When the operation finishes, the end time is recorded
Lesson 1231Span Lifecycle and StructureLesson 1586Multipart Upload for Large Files
Complex aggregation
Combining partial edits from multiple sources intelligently
Lesson 1387Custom Merge Functions
Complex business logic
Where partial success is unacceptable
Lesson 322Transaction Requirements and Trade-offs
Complex data synchronization
Must handle multi-region writes and conflicts
Lesson 1436Active-Passive vs Active-Active DR
Complex filtering
"Find all users who want email AND push for mentions BUT not marketing.
Lesson 1721Preference Storage Strategy
Complex Joins and Relationships
Lesson 320When SQL Is the Right Choice
Complex queries
Join with user metadata, analyze patterns
Lesson 1807In-Memory vs Persistent Storage for Rate Limiting
Complex relationships
that change often
Lesson 405When Joins Are Required
Complex restoration
must apply backups sequentially
Lesson 1422Incremental Backup Strategy
Complex routing logic
where clients would otherwise need to know about many backend service locations
Lesson 879When to Introduce an API Gateway
Complex service meshes
If a request touches 10+ microservices, tracing becomes essential.
Lesson 1260Cost-Benefit Analysis
Complex Traffic Management Needs
Lesson 868When Service Mesh Adds Value
Compliance mandates
GDPR, HIPAA, or industry regulations may require 1–7 years.
Lesson 1165Log Retention Policies
Compliance requires audit trails
You need to reconstruct state at any historical moment
Lesson 1427Continuous Data Protection
Compliance windows
Some industries require request deduplication for specific audit periods
Lesson 1012Idempotency Key Expiration Strategy
Components
The individual parts that do specific jobs (like databases that store data, servers that handle requests, or caches that speed things up)
Lesson 1What Is System Design?
Composability
means these services can be combined in different ways—like LEGO bricks—to build diverse experiences without rewriting code.
Lesson 800Reusability and Composability
Composite indexes
for multi-column filters (`user_id, created_at`)
Lesson 278Index Strategy for Large TablesLesson 1563Indexing for Ownership and Search
Composite keys
Balance distribution and ordering needs
Lesson 703Partitioning Strategies and Key Selection
Compound indexes
cover multiple fields together, like indexing `(country, city)` to efficiently query "all users in Paris, France.
Lesson 385Indexing in Document Stores
Compressed image formats
(WebP, AVIF) with aggressive optimization
Lesson 1618Optimizing for Mobile Networks
Compressed tries
merge single-child node chains into edge labels, dramatically reducing memory footprint.
Lesson 1776Typeahead Index Optimization
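As a rough illustration of how merging single-child chains shrinks a trie, here is a minimal sketch. The nested-dict representation, the `"$"` end-of-word marker, and the function names `build_trie` and `compress` are assumptions made for this example, not the structure the lesson prescribes.

```python
def build_trie(words):
    """Build a plain character trie as nested dicts; "$" marks end of word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = {}  # assumption: "$" never appears inside a word
    return root


def compress(node):
    """Merge chains of single-child nodes into one multi-character edge label."""
    out = {}
    for label, child in node.items():
        # Keep absorbing the child while it has exactly one outgoing edge
        # and that edge is not the end-of-word marker.
        while label != "$" and len(child) == 1 and "$" not in child:
            (next_label, next_child), = child.items()
            label += next_label
            child = next_child
        out[label] = compress(child)
    return out


compressed = compress(build_trie(["car", "card"]))
print(compressed)
```

Here the three single-character edges `c`, `a`, `r` collapse into one `car` edge, so the shared prefix costs one node instead of three.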
Compute batch views
(aggregations, reports, analytics)
Lesson 748Lambda Architecture: Batch Layer
Compute-intensive
Each query may need to parse and validate data
Lesson 759Schema-on-Write vs Schema-on-Read
Computes optimal send windows
(User X engages 5x more at 8 PM than 10 AM)
Lesson 1729Analytics-Driven Optimization
Con
Larger index size, slower writes (more data to maintain)
Lesson 279Covering Indexes
Lesson 1518Case Sensitivity Considerations
Concurrency Limiter
for strict limits
Lesson 975Algorithm Selection Criteria
Concurrent Conflicts
Different distributed transactions might touch the same data in different orders across services, creating deadlocks.
Lesson 566What is a Distributed Transaction?
Concurrent Reads
Multiple read operations can scan different SSTables simultaneously without coordination or locks.
Lesson 427SSTables and Immutable Storage
Concurrent writes
Two users editing the same document from different datacenters—one entire edit vanishes without trace
Lesson 1381Limitations of Last-Write-Wins
Conditional execution
Only run if not completed
Lesson 1037Idempotency in Distributed Workflows
Config servers
Store metadata about which data lives where
Lesson 396Sharding in MongoDB
Lesson 398Config Servers and mongos Routers
Configurable policies
Your crawler maintains per-host settings—either from `robots.
Lesson 1842Politeness Budget and Crawl Delay
Configuration burden
Each bulkhead needs tuning for size, timeouts, and thresholds—multiply this by dozens of dependencies
Lesson 1076Bulkhead Tradeoffs: Complexity and Resource Overhead
Configuration changes
Should we add this new node to the cluster?
Lesson 599What Is Distributed Consensus?
Lesson 617Why Paxos Is Difficult in Practice
Configuration distribution
Pushes routing rules, policies, and traffic management settings to all Envoy proxies
Lesson 861Istio: Architecture and Components
Configuration Management
Store service configs keyed by service name.
Lesson 720Log Compaction
Lesson 846Control Plane: API and User Interface
Configuration stores
Retain current config values, not all changes
Lesson 712Log Compaction
Conflict Detection
Store metadata (like update timestamps or version numbers) with your data.
Lesson 219Application-Level Consistency Patterns
Conflict resolution
Use distributed ID generation (from lesson 1511) to avoid collisions across regions
Lesson 1535Multi-Region Deployment
Conflict resolution complexity
after partition heals
Lesson 494AP Systems: Prioritizing Availability
Conflicting entries
(different commands at the same log index)
Lesson 629Log Inconsistencies and Repair
Conflicting writes
happen on both nodes
Lesson 1340Split-Brain Problem
Confluent Schema Registry
(for Kafka) and cloud-native equivalents like AWS Glue Schema Registry.
Lesson 725Schema Registry and Evolution
Connection Acquisition Patterns
Lesson 273Connection Pool Monitoring
Connection failures
Simulate network partitions or complete service unavailability
Lesson 858Fault Injection for Testing
Connection interruption handling
requires resilience:
Lesson 1618Optimizing for Mobile Networks
Connection level
Specific applications get different limits
Lesson 285Query Timeout and Statement Limits
Connection limits
cap how many simultaneous connections a service accepts.
Lesson 852Circuit Breaking at the Mesh Level
Connection persists
Stays open until client disconnects or unsubscribes
Lesson 1915GraphQL Subscriptions for Real-Time Data
Connection pool configuration
Libraries allow multiple named pools with different size limits
Lesson 1071Connection Pool Bulkheads: Database and Service Isolation
Connection refused
Service briefly overloaded, recovering in seconds
Lesson 1020Why Retries Are Necessary in Distributed Systems
Connection Registry
When a feed update arrives, lookup which server holds the user's connection and route the message accordingly
Lesson 1674Connection Management at Scale
Connection timeout
limits how long you'll wait to establish a TCP connection with the remote service.
Lesson 1088Connection Timeout vs Request Timeout
Connection validation
is the practice of testing a connection before handing it to application code.
Lesson 271Connection Validation and Stale Connections
Consecutive error count
"Open after 5 straight failures" — detects immediate, persistent issues.
Lesson 1057Failure Detection and Counting
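One way to picture the "open after 5 straight failures" rule is a counter that any success resets. The class name `ConsecutiveFailureBreaker` and its methods are hypothetical names chosen for this sketch. It covers only the counting rule, not half-open probing or recovery timers.

```python
class ConsecutiveFailureBreaker:
    """Opens once `threshold` failures occur in a row; a success resets the streak."""

    def __init__(self, threshold=5):
        self.threshold = threshold
        self.consecutive_failures = 0
        self.is_open = False

    def record_success(self):
        # A single success breaks the streak, so isolated errors never trip the breaker.
        self.consecutive_failures = 0

    def record_failure(self):
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold:
            self.is_open = True


breaker = ConsecutiveFailureBreaker(threshold=5)
for _ in range(5):
    breaker.record_failure()
print(breaker.is_open)
```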
consensus algorithms
(like Zab or Raft) that guarantee: if you get an answer, it's the globally consistent answer.
Lesson 501Distributed Locking Services (CP)
Lesson 607Consensus vs Consistency Models
Consider increasing timeout
or breaking the request into smaller chunks
Lesson 1115Deadline Exceeded Error Handling
Consider the 99th percentile
Don't tune for average latency.
Lesson 1091Default Timeout Pitfalls
Consistency across services
If your payments service uses `payment_processed_total`, don't let checkout use `checkout_txn_count`.
Lesson 1182Metric Naming Conventions
Consistency check
Each AppendEntries includes the `prevLogIndex` and `prevLogTerm` of the entry immediately before the new ones.
Lesson 629Log Inconsistencies and Repair
Consistency complexity
cache and database temporarily out of sync
Lesson 136Write-Behind (Write-Back) Caching Pattern
Consistency isn't one-size-fits-all
CAP's "consistency" means linearizability—the strongest guarantee.
Lesson 492Limitations of CAP as a Framework
Consistency models
answer: *"What guarantees does the system provide about the order and visibility of reads and writes?"*
Lesson 607Consensus vs Consistency Models
Consistency options
– You can choose sync/async per your needs (learned in lessons 1354-1357)
Lesson 1365Single-Leader Replication Topology
Consistency requirements
How correct must your data appear?
Lesson 553Choosing Consistency Levels
Consistency trade-offs
You might over-allow requests during sync windows
Lesson 1791Single Data Center vs Distributed Setup
Consistency tradeoffs
(synchronous cross-region replication adds latency)
Lesson 1334Geographic Redundancy and Multi-Region
Consistency with changing data
If items are inserted/deleted during pagination, cursors keep you at the logical position.
Lesson 1889Cursor-Based Pagination
Consistent behavior
Users experience uniform rate limiting regardless of which server handles their request
Lesson 979Centralized vs Decentralized Approaches
Consistent data formats
(like the W3C Trace Context you've already learned)
Lesson 1240OpenTelemetry Overview
Consistent enforcement
across all services without code changes
Lesson 859Rate Limiting at Service Boundaries
Consistent hashing helps
Minimizes data movement when adding/removing nodes
Lesson 258Resharding and Data Migration
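To make "minimizes data movement" concrete, the sketch below builds a small hash ring and checks which keys change owner when a node is added. The `HashRing` class, the choice of SHA-256, and the default of 100 virtual nodes per server are illustrative assumptions. Production systems differ in the details.

```python
import bisect
import hashlib


def ring_position(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node owns many virtual positions to smooth the distribution.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def get_node(self, key):
        # First virtual node at or after the key's position, wrapping around.
        idx = bisect.bisect(self._ring, (ring_position(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["a", "b", "c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("d")
moved = [k for k in keys if ring.get_node(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to the new node")
```

Every key that moves lands on the newly added node. Keys owned by the existing nodes never shuffle among themselves, which is what keeps resharding cheap.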
Consistent performance
Whether you're on page 1 or page 10,000, the database performs an index seek—always fast
Lesson 1890Keyset Pagination
Consistent Policy Enforcement
A user shouldn't be able to bypass limits by hitting different endpoints.
Lesson 1782Rate Limiter Service Overview
Consistent prefix
Reads never see out-of-order writes
Lesson 554Consistency Model Examples in Real Systems
Consistent security policies
applied uniformly to all routes
Lesson 883Authentication at the Gateway
Lesson 891SSL/TLS Termination
Consistently
enforce all constraints immediately
Lesson 567The ACID Problem in Distributed Systems
Constraint boundaries
– If your limit is 1,000 QPS and you estimate 980, don't round to 1,000
Lesson 32Rounding and Approximation Techniques
Constraints
`NOT NULL` prevents missing values, `UNIQUE` prevents duplicates
Lesson 301Schema Enforcement and Type Safety
Consul
, and **ZooKeeper** all use consensus-based leader election to coordinate distributed operations safely.
Lesson 636Consensus for Leader Election
Lesson 638Configuration Management with Consensus
Consul Clients
Lightweight agents on each node, forward registrations to servers
Lesson 635Consul: Service Discovery with Raft Consensus
Consul Servers
Run Raft consensus, maintain the service catalog
Lesson 635Consul: Service Discovery with Raft Consensus
Consult bloom filters
for each SSTable — probabilistic check to skip entire files
Lesson 429Read Path and Bloom Filters
ConsumeKafka
/ **PublishKafka**: Integrate with Kafka
Lesson 775Apache NiFi for Data Flow
Consumer
Retrieves and processes messages at its own pace
Lesson 646The Producer-Consumer Model
Consumer (Worker Pool)
Multiple workers poll the queue, process files, then acknowledge completion.
Lesson 1604Message Queue for Processing Jobs
Consumer groups
solve this by allowing multiple consumers to coordinate and share the workload.
Lesson 708Consumer Groups and Parallel Consumption
Consumer retrieves schema
Consumers fetch the schema by ID to deserialize messages
Lesson 725Schema Registry and Evolution
Contact multiple nodes
to establish agreement
Lesson 526The Cost of Strong Consistency
Container orchestration platforms
like Kubernetes don't run themselves.
Lesson 811Infrastructure and Tooling Costs
Content Aggregation
News aggregators, job boards, or real estate platforms crawl multiple sources to compile listings in one place.
Lesson 1826What is a Web Crawler
Content delivery
– blog posts, images, videos tolerate brief staleness
Lesson 318When to Choose ACID or BASE
Content Delivery Network (CDN)
is a geographically distributed network of servers that cache and deliver static content (like images, videos, CSS, JavaScript files) from locations physically closer to your users.
Lesson 168What is a CDN and Why Use It
Content discovery
"Products similar to what you viewed" follows category and attribute relationships
Lesson 457Use Cases: Social Networks and Recommendations
Content features
post type (video/text), topic category, recency
Lesson 1668Machine Learning for Feed Ranking
Content feed
(AP): Display cached posts immediately; eventual consistency is acceptable
Lesson 510Real Systems: Multi-Region Trade-offs
content hashing
generating a unique fingerprint of a file's actual content (not its filename or metadata).
Lesson 1622Deduplication Strategies
Lesson 1870Content Storage and Deduplication
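A minimal sketch of content hashing for deduplication follows, assuming SHA-256 as the fingerprint and an in-memory dict as the blob store. `DedupStore` and `content_fingerprint` are hypothetical names used only for illustration.

```python
import hashlib


def content_fingerprint(data):
    """Fingerprint the bytes themselves; filenames and metadata play no part."""
    return hashlib.sha256(data).hexdigest()


class DedupStore:
    """Stores each distinct blob once, keyed by its content hash."""

    def __init__(self):
        self._blobs = {}

    def put(self, data):
        key = content_fingerprint(data)
        # setdefault keeps the first copy and ignores identical re-uploads.
        self._blobs.setdefault(key, data)
        return key

    def blob_count(self):
        return len(self._blobs)


store = DedupStore()
key_a = store.put(b"same bytes")  # e.g. uploaded as report.pdf
key_b = store.put(b"same bytes")  # e.g. uploaded again as copy-of-report.pdf
print(key_a == key_b, store.blob_count())
```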
Content negotiation
works the same way: the client tells the server which response format it prefers (JSON, XML, HTML, etc.
Lesson 1882Content Negotiation and Accept Headers
Content safety
Does the notification meet policy guidelines?
Lesson 1699Notification Processing Workers
Content Store
Blob storage for raw HTML/documents
Lesson 1732Crawling and Document Collection
Content type detection
lets you filter out non-HTML content early and decide what's worth processing.
Lesson 1833Content Type Detection
Content type preferences
Do you interact more with videos, photos, or text?
Lesson 1665Feed Ranking Fundamentals
Content-based
Filter on actual message payload (e.
Lesson 658Topic Subscriptions and Filtering
Context Maps
Document how contexts relate and integrate with each other
Lesson 815Domain-Driven Design and Bounded Contexts
Context switching
juggling multiple terminals and dashboards
Lesson 1441Runbooks and Automation
Context-aware
Lambda receives HTTP details (headers, body, path parameters) as structured input
Lesson 895AWS API Gateway and Serverless Integration
Context-aware queries
Understanding that "Washington" might mean a person, city, or state based on surrounding relationships
Lesson 475Knowledge Graphs and Semantic Networks
Contextual decisions
Merge based on user roles, time zones, or external state
Lesson 1387Custom Merge Functions
Continue existing context
if the caller already sent trace headers (distributed tracing across organizational boundaries)
Lesson 1239Root Span and Entry Points
Continued revenue
Your system still processes 7/8 of transactions
Lesson 266Shard Failure and Partial Outages
Continues sequentially
through remaining transactions
Lesson 591Orchestration-Based Sagas
Continuous Computation
Your processing logic runs continuously, waiting for the next event rather than starting and stopping on a schedule.
Lesson 737What is Stream Processing?
Continuous Consumption
Streams are designed for consumers to read continuously, processing events as they arrive.
Lesson 692Streams vs Traditional Databases
Continuous training
Retrain regularly on fresh click data to adapt to changing user intent
Lesson 1781Machine Learning for Ranking
Contributing Factors
Additional conditions that enabled or worsened the incident.
Lesson 1352Postmortem Structure and Action Items
Controlled staleness
→ Distributed cache with short TTLs
Lesson 130Choosing the Right Caching Layer
Conversion rate
Percentage of visitors who complete a desired action
Lesson 1196Business vs Technical Metrics
Conversion rates
Calculate cost-per-action, not just cost-per-send
Lesson 1694Channel Costs and Economics
Converts to absolute time
The deadline becomes a timestamp (not a duration)
Lesson 1104gRPC Timeout Propagation
Conway's Law
observes that a system's architecture tends to mirror the communication structure of the organization that builds it; if you have multiple autonomous teams, a monolith fights against that.
Lesson 821When to Transition from Monolith to Microservices
Cookie Transport
The server sets an HTTP cookie containing the token.
Lesson 918Cookie vs Bearer Token Transport
Cooperative (Incremental) Rebalancing
Lesson 717Rebalancing Protocol and Strategies
Coordinate delivery
Use a notification orchestrator that sends requests to each channel's dedicated service (push notification service, SMS gateway, email sender, in-app storage).
Lesson 1689Multi-Channel Delivery
Coordinated deployments
when changes must go live together
Lesson 808Team Coordination Overhead
Coordination overhead explodes
Simple operations that once happened in a single transaction now require multiple services to coordinate.
Lesson 802Distributed System Complexity
Coordination required
Nodes must agree before responding, which takes time
Lesson 493CP Systems: Prioritizing Consistency
Coordinator-Worker Pattern
splits your web crawler into two distinct roles:
Lesson 1863Coordinator-Worker Pattern for Crawling
Copy-on-Write
The parent process continues serving requests while the child writes the snapshot to disk.
Lesson 350Redis Persistence: RDB Snapshots
Corner cases everywhere
Handling preemptions, retries, and failures creates a combinatorial explosion of edge cases
Lesson 617Why Paxos Is Difficult in Practice
Correlate easily
Join logs across services using `correlation_id` without parsing strings
Lesson 1137What is Structured Logging
Correlation
Connect logs from multiple services using correlation IDs
Lesson 1169Centralized vs Localized Logging
correlation ID
is a unique identifier (often a UUID) attached to a request when it enters your system.
Lesson 1132Correlation IDs and Request Tracing
Lesson 1158Correlation IDs Across Services
Lesson 1161Context-Rich Logging
Correlation IDs
linking requests to saga instances
Lesson 597Saga State Management and Persistence
CORS considerations
Bearer tokens require proper CORS headers because JavaScript makes the request.
Lesson 918Cookie vs Bearer Token Transport
Cortex
offers similar capabilities with a focus on multi-tenancy.
Lesson 1206Metrics Federation and Long-Term Storage
Cost Constraints
Every "nine" of availability roughly multiplies infrastructure costs.
Lesson 1276Setting Realistic SLOs
Cost control
Limits expensive operations (database queries, third-party API calls)
Lesson 955What is Rate Limiting?
Lesson 1577Paste Editing and Version History
Cost flexibility
Use smaller, cheaper commodity hardware instead of expensive high-end machines
Lesson 44What is Horizontal Scaling?
Cost inefficiency
Database storage is typically 10-20× more expensive than object storage
Lesson 1550Object Storage for Paste Content
Cost optimization
Mix powerful and modest instances
Lesson 86Weighted Round Robin
Cost reduction
Storing 1% of debug logs instead of 100% cuts debug-log volume, and the cost that scales with it, by about 99%
Lesson 1164Sampling for High-Volume Logs
Cost savings
Less bandwidth and compute at the origin
Lesson 179Origin Shield: Protecting Origin Servers
Cost tradeoff
You're trading increased backend load (~5-10% more requests) for better user-facing latency
Lesson 1031Hedged Requests and Speculative Execution
Cost vs value tradeoff
What's the business value of speed?
Lesson 746Choosing Batch vs Stream
Cost-based rate limiting
charges users differently based on what their requests actually cost your system.
Lesson 992Cost-Based Rate Limiting
Cost-effective at scale
Built on cheap object storage (like AWS S3, Azure Data Lake Storage)
Lesson 758Data Lake Fundamentals
Cost-effective scaling
Replicas are cheaper than sharding or massive vertical scaling
Lesson 1522Read-Heavy Workload and Database Scaling
Cost-sensitive calculations
– Bandwidth costs compound; 10% error means real money
Lesson 32Rounding and Approximation Techniques
Costs skyrocket
Redundancy and safeguards have diminishing returns
Lesson 1310Embracing Risk: The 100% Availability Trap
Count-based windows
group by number of events (e.
Lesson 741Windowing in Stream Processing
counter
is a metric type that represents a monotonically increasing value—it only goes up (or resets to zero).
Lesson 1174Counter Metrics
Lesson 1179Aggregation and Roll-Ups
Lesson 1183Counter Metrics
Counter Resets
When a service restarts, counters reset to zero.
Lesson 1187Rate Calculations from Counters
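The sketch below shows one common way to compute a rate that tolerates resets. Any drop in the raw value is treated as a restart from zero, so the post-reset reading counts as fresh growth. The function names and the evenly spaced `samples` list are assumptions for illustration. Real monitoring systems also handle timestamps and extrapolation.

```python
def counter_increase(samples):
    """Total growth across consecutive raw counter readings, tolerating resets."""
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            # The counter went backwards, so the process restarted at zero;
            # everything in `curr` accumulated after the restart.
            total += curr
    return total


def per_second_rate(samples, window_seconds):
    return counter_increase(samples) / window_seconds


# Readings taken every 10 s; the service restarted between the 2nd and 3rd.
readings = [100, 150, 10, 40]
print(counter_increase(readings), per_second_rate(readings, 30))
```

A naive `last - first` on the same readings would report a negative increase, which is why the reset check matters.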
Counters are maintained
– It keeps a rolling window of recent call results (e.
Lesson 1045The Three States: Closed
covering index
is an index that contains *all* the columns needed to satisfy a query.
Lesson 279Covering Indexes
Lesson 284Aggregation Query Optimization
Covering indexes
When aggregations vary but use consistent columns
Lesson 284Aggregation Query Optimization
CP (Consistency over Availability)
systems we learned about in the CAP theorem — you sacrifice some availability to maintain perfect consistency.
Lesson 522What is Strong Consistency?
CP response
Stop taking orders until communications restore (preserve consistency)
Lesson 505The Partition Question: When, Not If
CP system
(prioritizing consistency) instead refuses to complete a purchase unless it can guarantee the inventory count is accurate and up-to-date.
Lesson 499Inventory Management (CP)
Lesson 502Mixed Strategies: Hybrid Systems
CPU and Memory
Every sidecar proxy is a separate process consuming resources.
Lesson 834Service Mesh Performance Overhead
Lesson 841Data Plane: Performance and Latency Overhead
CPU and memory utilization
tighten limits when resources are strained
Lesson 972Adaptive Rate Limiting
CPU caches (L1/L2/L3)
are tiny, ultra-fast memory banks built directly into your processor:
Lesson 127CPU and Disk Caching Layers
CPU constraints
Parsing HTML, extracting links, and computing content fingerprints are CPU-intensive.
Lesson 1862Why Distribute a Web Crawler
CPU cores × 2
While one query waits for disk I/O, another can use the CPU
Lesson 269Pool Size Configuration
CPU cycles
spent converting objects to JSON strings
Lesson 1143Performance Impact of Structured Logging
CPU limits
Maximum CPU cores or time slices (e.
Lesson 1072CPU and Memory Bulkheads: Resource Quotas
CPU savings
QR generation involves matrix calculations and image encoding — expensive to repeat
Lesson 1539QR Code Generation
CPU upgrade
4 cores → 16 cores to handle more concurrent requests
Lesson 43What is Vertical Scaling?
CPU utilization
(current percentage)
Lesson 1175Gauge Metrics
CPU-bound tasks
Video transcoding, image processing
Lesson 971Concurrency Limiter Pattern
Crash-stop
(or fail-stop) failures are the "well-behaved" failures.
Lesson 602Crash-Stop vs Byzantine Failures
Crawl delay
is the enforced wait time between consecutive requests to the same domain, ensuring you don't hammer servers with rapid-fire requests.
Lesson 1842Politeness Budget and Crawl Delay
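A crawl delay can be enforced by remembering, per host, the earliest time the next request is allowed. The `PolitenessScheduler` class below is a hypothetical sketch. Timestamps are passed in explicitly so the logic is easy to follow, whereas a real crawler would read a clock.

```python
class PolitenessScheduler:
    """Tracks the earliest allowed time of the next request for each host."""

    def __init__(self, default_delay=1.0):
        self.default_delay = default_delay
        self._next_allowed = {}  # host -> timestamp in seconds

    def wait_time(self, host, now):
        """Seconds the caller must still wait before fetching from `host`."""
        return max(0.0, self._next_allowed.get(host, 0.0) - now)

    def record_fetch(self, host, now, delay=None):
        # A per-host override (e.g. parsed from robots.txt) wins over the default.
        effective = self.default_delay if delay is None else delay
        self._next_allowed[host] = now + effective


scheduler = PolitenessScheduler(default_delay=2.0)
scheduler.record_fetch("example.com", now=10.0)
print(scheduler.wait_time("example.com", now=10.5))  # still inside the delay
print(scheduler.wait_time("other.org", now=10.5))    # different host, no wait
```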
Crawl freshness
is about keeping your index up-to-date while respecting resource limits—you can't recrawl the entire web constantly.
Lesson 1835Crawl Freshness Requirements
Crawl-delay
Minimum seconds between requests (integrates with your politeness budget)
Lesson 1861Robots.txt Caching and Parsing
Crawling
is the process of systematically discovering and downloading web pages (or internal documents) so they can be indexed later.
Lesson 1732Crawling and Document Collection
Crawling/Ingesting
content from various sources
Lesson 1730What is a Search Engine?
Create
inserts a new document into a collection, often auto-generating a document ID if you don't provide one.
Lesson 387CRUD Operations on Documents
Lesson 1223Instrumentation Basics
Lesson 1542Pastebin System Overview
Create an ephemeral resource
A client creates a special node/key with a **session/lease**
Lesson 637Distributed Locks via Consensus
Create cascading failures
when dependent systems (cache clusters, storage nodes) become overloaded
Lesson 1654Fanout Rate Limiting
Create ephemeral nodes
Automatically deleted when the client disconnects (useful for leader election)
Lesson 633ZooKeeper: Coordination Service Built on Consensus
Create new context
if this is a fresh request (generate `trace_id` and `span_id`)
Lesson 1239Root Span and Entry Points
Create new partition
– split the range at that key (e.
Lesson 1475Dynamic Range Splitting
Create sequential nodes
Auto-numbered for implementing distributed queues or locks
Lesson 633ZooKeeper: Coordination Service Built on Consensus
Create, Read, Update, Delete
the four fundamental operations you perform on data.
Lesson 387CRUD Operations on Documents
Creating records
`POST /orders` — repeating creates duplicate orders
Lesson 1006Natural Idempotency vs Engineered Idempotency
Creation
A span is created when an operation starts, recording the start time and operation name
Lesson 1231Span Lifecycle and Structure
Credits or refunds
if SLAs are breached
Lesson 191CDN Provider Feature Comparison
Critical (P0/P1)
These are "drop everything" alerts.
Lesson 1291Alert Severity Levels
Critical (seconds)
SMS + Push notification (e.
Lesson 1688Channel Selection Strategy
Critical data
(payments, inventory counts, user passwords): Use CP strategies.
Lesson 502Mixed Strategies: Hybrid Systems
Critical financial transactions
might favor strong consistency (CP-leaning), refusing to proceed if data sync is uncertain
Lesson 488CAP as a Spectrum, Not Binary
Critical operations
20% of total capacity, no rate limit
Lesson 974Rate Limiting with Priority Queues
critical path
is a dependency chain whose failure would cause the most severe cascading damage to your system.
Lesson 1082Critical Path Identification
Lesson 1442Dependency Mapping and Critical Path Analysis
Critical Path Analysis
examines your trace data to identify the *longest chain* of dependent spans — the bottleneck sequence that, if optimized, would actually reduce total response time.
Lesson 1227Critical Path Analysis
Lesson 1229Service Dependency Graphs
Lesson 1232Span Relationships and Hierarchy
Lesson 1442Dependency Mapping and Critical Path Analysis
Critical requests
(login, checkout, emergency services): minimal or no throttling
Lesson 995Graceful Degradation Through Throttling
Critical resources
Do we have disk space, memory?
Lesson 101Health Check Endpoints
Critical vs non-critical paths
Protecting payment processing from search analytics failures
Lesson 1076Bulkhead Tradeoffs: Complexity and Resource Overhead
Critical/Urgent queue
Security alerts, payment failures, password resets
Lesson 1700Priority Queues and Urgency Levels
Cross-boundary queries
If you allow fuzzy matching or typo correction, you might need to query multiple shards in parallel and merge results.
Lesson 1764Distributed Trie Architecture
Cross-collection operations
Since document stores discourage joins, fetching related data often requires multiple round-trips or application-level logic, multiplying latency.
Lesson 408Query Performance Limitations
Cross-cutting concerns
(authentication, rate limiting, logging) get duplicated across services
Lesson 870What is an API Gateway?
Cross-cutting concerns becoming duplicated
across services (authentication, rate limiting, logging)
Lesson 879When to Introduce an API Gateway
Cross-Region Strategy
Data center affinity vs global session replication
Lesson 947Distributed Session Management
Cross-team meetings
to agree on API contracts
Lesson 808Team Coordination Overhead
Crystal clear
Version is immediately visible in URLs and logs
Lesson 1899URI Versioning (Path-Based)
CSRF (Cross-Site Request Forgery)
attacks because browsers auto-send them even from malicious sites.
Lesson 918Cookie vs Bearer Token Transport
Cumulative counts
Many implementations store cumulative counts: "≤10ms", "≤50ms", etc.
Lesson 1185Histogram Metrics
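As a small worked example of cumulative buckets, each bucket below counts every observation at or under its upper bound, so a value of 5 ms is counted in "≤10ms", "≤50ms", and every larger bucket. The function name and the bucket bounds are assumptions for illustration.

```python
def cumulative_buckets(observations_ms, upper_bounds_ms):
    """Return {upper_bound: count of observations <= upper_bound}, plus +inf."""
    counts = {}
    for bound in upper_bounds_ms:
        counts[bound] = sum(1 for value in observations_ms if value <= bound)
    # The +inf bucket always equals the total number of observations.
    counts[float("inf")] = len(observations_ms)
    return counts


latencies = [5, 8, 20, 45, 120]
print(cumulative_buckets(latencies, [10, 50, 100]))
```

Because the counts are cumulative, the number of requests between 10 ms and 50 ms is the difference of two adjacent buckets.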
Current metrics
The actual value that triggered the alert (e.
Lesson 1293Alert Context and Enrichment
current state
of entities in a database—like "User balance: $100.
Lesson 691Events as First-Class Citizens
Lesson 1175Gauge Metrics
Current step index
in the saga sequence
Lesson 597Saga State Management and Persistence
Current temperature
CPU temperature that varies with load
Lesson 1184Gauge Metrics
Cursor-based pagination
replaces numeric offsets with **opaque tokens** (cursors) that encode a specific position in the dataset.
Lesson 1889Cursor-Based Pagination
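One possible shape for an opaque cursor is a base64-encoded JSON object holding the last item's ID. The helper names, the `last_id` field, and the in-memory `items` list are assumptions for this sketch. A real API would run the equivalent `id > last_id` filter in the database.

```python
import base64
import json


def encode_cursor(last_id):
    raw = json.dumps({"last_id": last_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["last_id"]


def fetch_page(items, cursor=None, limit=2):
    """Return (page, next_cursor); `items` must be sorted by ascending "id"."""
    last_id = decode_cursor(cursor) if cursor else None
    remaining = [it for it in items if last_id is None or it["id"] > last_id]
    page = remaining[:limit]
    # Only hand out a cursor when more items exist beyond this page.
    next_cursor = encode_cursor(page[-1]["id"]) if len(remaining) > limit else None
    return page, next_cursor


items = [{"id": i} for i in range(1, 6)]
page1, cursor = fetch_page(items)
items.pop(0)  # item 1 is deleted while the client is paginating
page2, cursor = fetch_page(items, cursor)
print([it["id"] for it in page1], [it["id"] for it in page2])
```

With a numeric offset, the deletion would have shifted the list and caused item 3 to be skipped. The cursor resumes strictly after ID 2 regardless.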
Custom claims
(your application data):
Lesson 913JWT Structure and Claims
Custom Conflict Resolution
For shopping carts, you might merge conflicting versions (combine items).
Lesson 219Application-Level Consistency Patterns
Custom expiration
(1 hour, 1 day, 1 week, 1 month, etc.
Lesson 1565Expiration Requirements and TTL Basics
Custom logic
You can use business rules, not just hash functions
Lesson 242Directory-Based Sharding
Lesson 702Producers and Message Publishing
Custom rules per endpoint
(optional overrides)
Lesson 1819Per-Tenant Configuration Storage
Custom signals
(user interest, business priorities, content type)
Lesson 1844Front Queue: Priority Management
Customer ID
(e-commerce order systems)
Lesson 244Entity-Based Sharding
Customer lifetime value
Long-term business health
Lesson 1196Business vs Technical Metrics
Customer Service
owns user profiles—it manages authentication and preferences
Lesson 817Identifying Service Boundaries by Data Ownership
Cypher
(used by Neo4j) is designed to look like the graph patterns you're searching for.
Lesson 456Graph Query Languages: Cypher and Gremlin
Lesson 465Variable-Length Paths

D

DAG
is the blueprint of your pipeline.
Lesson 766Apache Airflow Fundamentals
Dagster
represent the next generation.
Lesson 773Prefect and Dagster for Modern Workflows
Daily backups
(sons): Keep 7–14 days
Lesson 1406Backup Retention Policies
Dangling references
Creating records pointing to non-existent entities
Lesson 262Referential Integrity Across Shards
Dashboard Architecture
Build a user dashboard that queries pastes by `user_id` with pagination.
Lesson 1578User Accounts and Paste Management
Dashboards
are collections of panels organized into rows.
Lesson 1200Grafana for Metrics Visualization
Data Aggregation
Collect streams from edge locations into a central analytics cluster
Lesson 726Multi-Datacenter Replication
Data Change Rate
drives minimum frequency.
Lesson 1424Backup Scheduling and Frequency
Data corruption
can occur when the partition heals and nodes try to merge state
Lesson 1340Split-Brain Problem
Data distribution
Choose partition keys that spread data evenly across nodes (avoid hot spots)
Lesson 423Primary Key Components
Data divergence
makes reconciliation complex or impossible
Lesson 1340Split-Brain Problem
Data efficiency
matters on metered connections:
Lesson 1618Optimizing for Mobile Networks
Data filtering
Remove internal metadata or sensitive information
Lesson 882Request and Response Transformation
Data flow
How information moves between these components (like planning how packages get from warehouses to customers)
Lesson 1What Is System Design?
Data keys
(like user IDs, session tokens, cache keys)
Lesson 1458Mapping Keys and Nodes to the Ring
Data Lakes
prioritize **flexibility** by storing raw, unprocessed data.
Lesson 762Query Performance TradeoffsLesson 763Cost and Storage Efficiency
Data Locality
Comply with regulations requiring data processing in specific regions
Lesson 726Multi-Datacenter Replication
Data locality matters
Accessing nearby memory addresses is dramatically faster
Lesson 127CPU and Disk Caching Layers
Data loss is guaranteed
earlier writes disappear completely
Lesson 1380Last-Write-Wins (LWW) Strategy
Data loss occurs
– Recent writes haven't reached replicas yet
Lesson 1356Asynchronous Replication: Speed and Risk
Data loss risk
if cache crashes before flushing, recent writes are lost
Lesson 136Write-Behind (Write-Back) Caching Pattern
Data Mining
Companies crawl e-commerce sites, news outlets, or social platforms to gather pricing data, trends, or public sentiment.
Lesson 1826What is a Web Crawler
Data Ownership
Each microservice has its own database or schema.
Lesson 781What are Microservices?
Data quality is guaranteed
queries can trust the data types
Lesson 301Schema Enforcement and Type Safety
Data replication
means storing the same data on multiple servers (nodes) instead of keeping it in just one place.
Lesson 68What is Data Replication?
Lesson 1334Geographic Redundancy and Multi-Region
Lesson 1338Stateless vs Stateful Redundancy
Data Residency and Compliance
Different countries have laws requiring user data to stay within their borders (GDPR in Europe, data sovereignty in China).
Lesson 1435Multi-Region Architecture for DR
Data sprawl
Multiple copies, outdated versions, unknown lineage
Lesson 764Data Governance and Quality
Data structures
Complex operations happen at memory speed
Lesson 349Redis In-Memory Storage Model
Data tables
listing the top N error messages
Lesson 1152The ELK Stack: Kibana
Data transformation
work to fit new schemas
Lesson 328Migration and Legacy System Constraints
Data types
Are values the correct type (numbers, strings, booleans)?
Lesson 886Request Validation
data warehouse
is a centralized repository designed specifically for analytical workloads, not day-to-day transactions.
Lesson 757Data Warehouse Fundamentals
Lesson 1530Analytics and Click Tracking
Data Warehouses
use **pre-aggregation** and **indexing** to optimize query speed.
Lesson 762Query Performance Tradeoffs
Lesson 763Cost and Storage Efficiency
Database cache
Reduces query load at the source
Lesson 120Caching Hierarchy Overview
Database call
500ms (leaving buffer for Service B)
Lesson 1097The Timeout Chain Problem
Database connection pool utilization
Lesson 1175Gauge Metrics
Database connections
You have 100 connection pool slots
Lesson 971Concurrency Limiter Pattern
Database connectivity
Can we read/write?
Lesson 101Health Check Endpoints
Database load
More UPDATE queries mean higher CPU, I/O, and lock contention
Lesson 296Write Amplification Costs
Database load spikes
– Every feed read hits the database with complex queries
Lesson 1637Pull (Read-Time) Feed Model
Database overload
Sudden spike in identical queries
Lesson 159Cache Stampede Problem
Database proxy layer
Tools like PgBouncer or ProxySQL can enforce per-user or per-application connection limits
Lesson 1071Connection Pool Bulkheads: Database and Service Isolation
Database query results
→ Database query cache or Redis
Lesson 130Choosing the Right Caching Layer
Database replication
is the process of copying data from a **primary (master) database** to one or more **replica (slave) databases**.
Lesson 198What is Database Replication?
Database restarts
invalidate all existing connections
Lesson 271Connection Validation and Stale Connections
Database strain
– every feed refresh triggers complex joins and aggregations
Lesson 1647Fanout-on-Read (Pull Model)
Database tier
10-30 second intervals, 5-10 second timeouts (databases can have legitimate temporary slowdowns)
Lesson 100Health Check Intervals and Timeouts
Database-modifying functions
`SELECT my_update_function()` looks like a read but isn't
Lesson 223Detecting Read vs Write Queries
Database-side connection limits
might force-close old connections
Lesson 271Connection Validation and Stale Connections
Datadog's metric summaries
to maintain these catalogs.
Lesson 1216Metric Documentation and Discovery
Dataflow
Google's managed service
Lesson 772Apache Beam Programming Model
DataFrames
provide a higher-level abstraction with schema awareness (like database tables).
Lesson 768Apache Spark Overview
DataLoader
solves this by collecting all data requests within a single execution tick, batching them into one efficient query, and caching results.
Lesson 1914DataLoader and Batching Solutions
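As a rough illustration of the DataLoader entry above, the sketch below collects every key requested in one event-loop tick, resolves them with a single batch call, and caches the results. `TinyLoader` and `batch_fn` are hypothetical names chosen for this example; this is not the API of the real DataLoader library.

```python
import asyncio


class TinyLoader:
    """Minimal batching loader (illustrative sketch, not the real DataLoader API)."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn   # hypothetical: takes a list of keys, returns values in order
        self.cache = {}            # key -> already loaded value
        self.pending = {}          # key -> future waiting for the next batch
        self.scheduled = False

    async def load(self, key):
        if key in self.cache:
            return self.cache[key]
        loop = asyncio.get_running_loop()
        if key not in self.pending:
            self.pending[key] = loop.create_future()
        if not self.scheduled:
            # Run the batch once, after the callers already queued in this tick.
            self.scheduled = True
            loop.call_soon(self._dispatch)
        return await self.pending[key]

    def _dispatch(self):
        pending, self.pending = self.pending, {}
        self.scheduled = False
        keys = list(pending)
        values = self.batch_fn(keys)  # one query for the whole batch
        for key, value in zip(keys, values):
            self.cache[key] = value
            pending[key].set_result(value)
```

Three concurrent `load` calls for keys 1, 2, and 1 would reach `batch_fn` as a single call with `[1, 2]`.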
Date
True timestamp values, not just strings that look like dates
Lesson 390BSON Format and Data Types
Date-prefixed records
Orders keyed as `2024-12-15-order-001`
Lesson 1474Hotspot Problems in Range Partitioning
Day 0
Full backup (100GB)
Lesson 1404Differential Backups
Day 4
Incremental backup captures 4 GB (changes since Day 3)
Lesson 1422Incremental Backup Strategy
Days (7)
For critical "never duplicate" scenarios
Lesson 1712Deduplication Windows and Storage
Dead letter destination
Where permanently failed messages go
Lesson 684Negative Acknowledgments and Redelivery
Dead letter handling
quarantine bad data without blocking the entire pipeline
Lesson 777Workflow Orchestration Patterns
Dead letter queue
After N failures, route to a special queue for investigation
Lesson 684Negative Acknowledgments and RedeliveryLesson 1705Retry and Dead Letter Queues
Dead Letter Queue (DLQ)
is a special holding queue where messages go after exhausting all retry attempts.
Lesson 687Dead Letter QueuesLesson 1715Retry Strategies for Failed Deliveries
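The retry-then-quarantine behavior described in the dead letter queue entries above can be sketched with an in-memory queue. The function name `process_with_dlq`, the `handler` callback, and the default of three attempts are assumptions made for illustration; a real broker would track attempts and route messages itself.

```python
from collections import deque


def process_with_dlq(messages, handler, max_attempts=3):
    """Process messages, moving any that fail max_attempts times to a dead letter list."""
    main_queue = deque((msg, 0) for msg in messages)
    processed, dead_letters = [], []
    while main_queue:
        msg, attempts = main_queue.popleft()
        try:
            handler(msg)
            processed.append(msg)
        except Exception:
            attempts += 1
            if attempts >= max_attempts:
                dead_letters.append(msg)             # retries exhausted: quarantine
            else:
                main_queue.append((msg, attempts))   # redeliver later
    return processed, dead_letters
```

Healthy messages keep flowing while the failing one is retried, so a single poison message does not block the queue.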
Dead Letter Queues
Redirect undeliverable messages
Lesson 671ActiveMQ and Traditional Enterprise Messaging
Deadline propagation
solves this by passing an absolute deadline down the call chain instead of durations.
Lesson 1108What is Deadline Propagation
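A minimal sketch of the deadline propagation entry above: each hop receives the same absolute deadline and derives its own timeout from the time left. The function names and the optional `now` parameter (used only to make the example deterministic) are assumptions, not part of any particular framework.

```python
import time
from typing import Optional


def remaining_budget(deadline: float, now: Optional[float] = None) -> float:
    """Seconds left before the absolute deadline, never negative."""
    current = time.time() if now is None else now
    return max(0.0, deadline - current)


def call_downstream(deadline: float, now: Optional[float] = None) -> str:
    """Hypothetical hop: refuse to start work if the deadline has already passed."""
    budget = remaining_budget(deadline, now)
    if budget == 0.0:
        return "deadline exceeded"
    # A real client would use `budget` as its request timeout and forward
    # the same absolute `deadline` unchanged to the next service.
    return f"proceed with {budget:.1f}s timeout"
```

Because the deadline is absolute, time spent in queues or earlier hops is subtracted automatically, which a fixed per-hop duration would not capture.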
Debezium
is an open-source CDC platform built on Kafka Connect.
Lesson 776Change Data Capture Tools
Debouncing
means waiting for a brief pause in typing before sending a request.
Lesson 1763Debouncing and Request Optimization
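To make the debouncing entry above concrete, the sketch below simulates it over a list of keystroke timestamps rather than real timers. The function name and the use of integer milliseconds are assumptions for illustration.

```python
def debounced_requests(keystroke_times_ms, wait_ms):
    """Return the times (ms) at which a debounced request would fire.

    A request fires wait_ms after a keystroke only if no other keystroke
    arrives within that pause.
    """
    fired = []
    for i, t in enumerate(keystroke_times_ms):
        is_last = i == len(keystroke_times_ms) - 1
        if is_last or keystroke_times_ms[i + 1] - t >= wait_ms:
            fired.append(t + wait_ms)
    return fired
```

With keystrokes at 0, 100, 200, and 1000 ms and a 300 ms wait, only two requests are sent instead of four.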
DEBUG
Detailed diagnostic information for troubleshooting during development
Lesson 1141Log Levels in Structured Logs
Debug logs
3-7 days (expensive, high volume, rarely needed after immediate troubleshooting)
Lesson 1135Log Retention and Volume Management
Debug logs in production
Fine-grained debug statements should stay off unless actively troubleshooting—they create massive volumes and performance drag.
Lesson 1129What to Log vs What Not to Log
Debug sampling
Engineers add a debug header (`X-Trace-Debug: 1`) to specific requests during investigation.
Lesson 1256Priority and Debug Sampling
Debugging
When something breaks, logs tell you the sequence of events leading to failure.
Lesson 1127What is Logging and Why It Matters
Debugging is easier
if data exists, it passed all checks
Lesson 301Schema Enforcement and Type Safety
Decides the next step
based on the result
Lesson 591Orchestration-Based Sagas
Decimal128
High-precision decimal numbers for financial calculations
Lesson 390BSON Format and Data Types
Decision point
The coordinator collects all responses.
Lesson 569The Coordinator Role in 2PC
Decoupling
Producers and consumers don't need to know about each other or be online simultaneously
Lesson 646The Producer-Consumer ModelLesson 647Message Queue BasicsLesson 1698Message Queue for Decoupling
Decoupling is critical
Producers and consumers operate independently
Lesson 734NATS Streaming
Decrease MTTR
(fix failures faster) — automated failover, better monitoring, faster deployments
Lesson 1325Availability Formula: MTBF and MTTR Relationship
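The effect of lowering MTTR can be shown with a quick calculation, assuming the standard steady-state formula availability = MTBF / (MTBF + MTTR). The hour values in the usage note are made-up illustration numbers.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up, given mean time between failures and mean time to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)
```

For example, `availability(999, 1)` is 0.999, and halving the repair time while keeping MTBF fixed raises the result without making failures any rarer.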
Decrements the remaining budget
automatically at each proxy
Lesson 1101Timeout Propagation in Service Meshes
Dedicated DNS Resolution Pool
Instead of each worker handling DNS independently, deploy a cluster of specialized DNS resolver services.
Lesson 1869Scaling DNS Resolution
Dedicated IP pools
with pre-warmed reputation
Lesson 1686Email Notifications
Dedicated sharding
Route hot tenants to isolated Redis instances to prevent interference
Lesson 1823Hot Tenant Problem
Deduplicate intelligently
Track which messages have been successfully delivered across channels.
Lesson 1689Multi-Channel Delivery
Deduplication Layer
Before saving, compute hash and check if it exists; if yes, just add new URL → hash reference
Lesson 1870Content Storage and Deduplication
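The deduplication layer entry above could look roughly like this in memory. The class name `ContentStore` and the choice of SHA-256 are assumptions; the entry itself only says to compute a hash and reuse it when it already exists.

```python
import hashlib


class ContentStore:
    """Stores each distinct page body once and maps URLs to content hashes."""

    def __init__(self):
        self.blobs = {}        # content hash -> bytes
        self.url_to_hash = {}  # URL -> content hash

    def save(self, url: str, content: bytes) -> bool:
        """Save content for url; return True only if the body was not stored before."""
        digest = hashlib.sha256(content).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = content
        self.url_to_hash[url] = digest  # always record the URL -> hash reference
        return is_new
```

Two URLs serving identical bytes end up sharing one stored blob.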
Deep checks
provide more confidence that the service can handle real traffic, but they:
Lesson 102Shallow vs Deep Health Checks
Default Expiration Policy
Instead of allowing infinite TTL, set a "very long" default—perhaps 1 year, 5 years, or 10 years.
Lesson 1573Handling Never-Expiring Pastes
Default sort order
Define sensible defaults (often ID or creation timestamp)
Lesson 1894Sorting Query Parameters
Default to safe behavior
Pre-define fallback responses
Lesson 1336Graceful Degradation
Default values
Make new fields optional with sensible defaults
Lesson 809Versioning and Backward CompatibilityLesson 1061Fallback Strategies
Defense in depth
means implementing rate limits at every major boundary, so if one layer fails, others still protect your resources.
Lesson 962Rate Limiting at Different LayersLesson 991Hierarchical Rate Limiting
Defer costly decisions
Once domain boundaries emerge naturally through growth, you can extract services intentionally
Lesson 820When a Monolith is the Right ChoiceLesson 825Starting with a Modular Monolith
Defer side effects
until after the core operation completes successfully
Lesson 1038Side Effect Management
Define rejection criteria
thresholds that indicate your system can't handle current load
Lesson 1084Load Shedding Under Cascading Failure
Degrade performance
of trace collection pipelines
Lesson 1258Cardinality Explosion
Degraded mode
Return results from N-1 shards with a warning flag rather than failing completely—users get 95% coverage instead of nothing.
Lesson 1780Distributed Query Coordination
Degrades functionality
gracefully rather than failing completely
Lesson 315Basically Available: Prioritizing Uptime
Degrading dependencies
makes external services slow, unavailable, or return errors.
Lesson 1347Common Chaos Experiments
Degree centrality
Simply counts connections—who knows the most people?
Lesson 468Graph Algorithms: PageRank and Centrality
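Degree centrality as described above is just a count of edges per node. The sketch assumes an undirected graph given as a list of `(a, b)` pairs; the function name is illustrative.

```python
from collections import Counter


def degree_centrality(edges):
    """Count how many connections each node has in an undirected edge list."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree
```

Calling `most_common(1)` on the result gives the node that "knows the most people".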
Delay enforcement
Before dequeuing for fetch, check if enough time has passed since the last request to that host
Lesson 1845Back Queue: Politeness Enforcement
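The delay enforcement check above can be sketched as a small per-host gate. `PolitenessGate`, `min_delay`, and passing `now` explicitly are assumptions made so the example stays deterministic; a crawler would typically read a monotonic clock instead.

```python
class PolitenessGate:
    """Tracks the last fetch time per host and enforces a minimum delay between fetches."""

    def __init__(self, min_delay: float):
        self.min_delay = min_delay
        self.last_fetch = {}  # host -> time of the most recent request

    def ready(self, host: str, now: float) -> bool:
        """True if the host was never fetched or enough time has passed since the last fetch."""
        last = self.last_fetch.get(host)
        return last is None or now - last >= self.min_delay

    def record_fetch(self, host: str, now: float) -> None:
        self.last_fetch[host] = now
```

The back queue would call `ready` before dequeuing a URL and `record_fetch` right after issuing the request.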
Delayed expiration
Tokens accepted when they should be expired
Lesson 949Clock Skew and Token Validation
Delayed retry
Wait before making it available again (backoff strategy)
Lesson 684Negative Acknowledgments and RedeliveryLesson 1021Immediate Retry vs Delayed Retry
Delete before adding
Before introducing a new service, library, or pattern, ask if existing tools can solve the problem
Lesson 1315Simplicity as a Core Value
Deleted (90+ days)
Unless regulatory requirements demand it, purge everything.
Lesson 1246Trace Data Retention Policies
Deleting a record
`DELETE FROM orders WHERE id = 123` — deleting again changes nothing
Lesson 1006Natural Idempotency vs Engineered Idempotency
Deletion
When messages expire per retention policy, Kafka simply deletes entire old segment files (fast!).
Lesson 711Message Retention and Log Segments
Deletion (after retention period)
Permanently remove logs that exceed legal and business requirements.
Lesson 1165Log Retention Policies
Delivered
The provider successfully delivered the message to the user's device or inbox.
Lesson 1724Notification Analytics Events
Delivery attempt
→ Message fails processing
Lesson 687Dead Letter Queues
Delivery Guarantee Requirements
Lesson 1688Channel Selection Strategy
Delivery guarantees
High-priority notifications might require a fallback: try push first, then SMS if undelivered.
Lesson 1703Channel Routing Logic
Delivery layer
WebSocket servers or polling endpoints
Lesson 1687In-App Notifications
Delivery rate
per channel (SMS vs push vs email)
Lesson 1729Analytics-Driven Optimization
Delivery Receipt Tracking
(lesson 1693).
Lesson 1695Fallback and Retry Logic
Delivery success
90–98% depending on region and carrier
Lesson 1685SMS Notifications
Delta encoding
for document IDs (store differences, not absolute values)
Lesson 1745Posting Lists and Document IDs
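A small sketch of delta encoding for a sorted posting list, as described above. The function names are illustrative, and the first gap is taken relative to zero as a simplifying assumption.

```python
def delta_encode(doc_ids):
    """Turn sorted document IDs into gaps between consecutive IDs."""
    gaps, previous = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps


def delta_decode(gaps):
    """Rebuild the original document IDs by summing the gaps."""
    doc_ids, total = [], 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids
```

The IDs `[1000, 1003, 1010, 1042]` become `[1000, 3, 7, 32]`; the small gaps are what make later variable-length compression effective.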
Delta/diff storage
saves only the changes between versions.
Lesson 1577Paste Editing and Version History
Denormalization
Duplicate data across shards to avoid cross-shard queries
Lesson 261Distributed Transactions Across ShardsLesson 1519Database Schema for URL Shortener
Denormalized
Optimizes for reads—fetch everything in one query, but updates require touching multiple records
Lesson 289Normalized vs Denormalized Schema Design
Denormalized approach
Store `seller_name` and `rating` directly in the Products table.
Lesson 288Why Denormalization?
Dense posting lists
(common terms): Bitmap compression
Lesson 1752Index Compression Techniques
Dependencies
Which tasks must complete before others start
Lesson 766Apache Airflow Fundamentals
Dependency failures
Make a downstream service unavailable to test bulkhead isolation
Lesson 1342Testing Redundancy with Fault Injection
Dependency mapping
creates a visual graph showing which services rely on which, while **critical path analysis** determines the optimal restoration sequence to minimize total recovery time.
Lesson 1442Dependency Mapping and Critical Path Analysis
Deploy
Replace or augment traditional scoring with ML predictions
Lesson 1781Machine Learning for Ranking
Deploy application code
that works with both old and new schemas
Lesson 265Schema Changes in Sharded Environments
Deploying new features
(which might occasionally fail)
Lesson 1279Error Budgets: The Core Concept
Deployment
Fully managed vs. self-managed (or managed services)
Lesson 728AWS Kinesis Overview
Deployment complexity
Each service needs its own CI/CD pipeline, container orchestration config, and rollback strategy
Lesson 803Operational Overhead
Deployment coordination
Schema changes, software upgrades, or configuration updates must roll out across all shards.
Lesson 264Operational Complexity of Sharded Systems
Deployment pipelines
for each service with its own build, test, and release cycle
Lesson 810Deployment Complexity
deployment simplicity
advantage of monoliths for network complexity, while losing the **independent deployability** promise of microservices.
Lesson 789The Distributed Monolith Anti-PatternLesson 1864Stateless Worker Design
Deployments Are Safer
Rolling updates become straightforward—drain traffic from old instances, start new ones.
Lesson 878Stateless Gateway Design
Deprecation periods
Announce breaking changes months in advance, giving teams time to migrate
Lesson 809Versioning and Backward Compatibility
Depth limiting
restricts how many levels deep a client can nest fields.
Lesson 1916Rate Limiting and Complexity Analysis in GraphQL
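Depth limiting can be sketched without a GraphQL library by modeling a query's selection set as nested dictionaries, where each key is a field and each value is its sub-selection (empty for leaf fields). That representation and both function names are assumptions for illustration only.

```python
def query_depth(selection: dict) -> int:
    """Number of nesting levels in a selection set modeled as nested dicts."""
    if not selection:
        return 0
    return 1 + max(query_depth(child) for child in selection.values())


def within_depth_limit(selection: dict, max_depth: int) -> bool:
    """Reject queries that nest fields deeper than max_depth."""
    return query_depth(selection) <= max_depth
```

A server would run a check like this before executing the query, so deeply nested requests are rejected before they reach any resolver.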
Depth-First Search (DFS)
follows one path as deeply as possible before backtracking.
Lesson 1830Breadth-First vs Depth-First Crawling
Description
What the metric measures in plain language
Lesson 1216Metric Documentation and Discovery
Design for Change
when requirements shift, documented decisions help you understand what's safe to modify.
Lesson 42Document Your Decisions
Design for failure
building systems that anticipate and gracefully handle component failures
Lesson 1307What is Site Reliability Engineering (SRE)?
Details/Context
Optional additional information like which field caused the problem, validation rules violated, or trace IDs for debugging.
Lesson 1883Error Response Structure and Consistency
Detect duplicate content
to avoid redundant fetches
Lesson 1826What is a Web Crawler
Detect threshold breach
– when a partition exceeds a configured limit (e.
Lesson 1475Dynamic Range Splitting
Detection Speed vs Accuracy
Fast detection means quicker failover, but too sensitive checks cause false positives.
Lesson 1335Failover Mechanisms
Detection time
How long until you notice the failure
Lesson 1324Mean Time To Repair (MTTR)
Detection/Response Analysis
How was it detected?
Lesson 1352Postmortem Structure and Action Items
Detects
abnormal conditions (failure rates exceeding thresholds)
Lesson 1044The Electrical Analogy
Determine applicable channels
Check user preferences, channel availability, and message urgency.
Lesson 1689Multi-Channel Delivery
Determines the target partition
within the topic
Lesson 702Producers and Message Publishing
Determinism
Same URL always generates the same short code—no duplicates stored for identical links.
Lesson 1508Hash-Based Generation Approach
Deterministic
Same content always produces the same hash
Lesson 1852Content Fingerprinting with Hashing
Developer intuition
How quickly new API consumers understand your endpoints
Lesson 1877Singular vs Plural Resource Names
Development speed matters
One codebase means faster iteration, easier testing, and fewer bugs from maintaining duplicate logic.
Lesson 755When to Choose Lambda vs Kappa
Device offline scenarios
User reconnects after hours—sees only final state
Lesson 1713Provider-Side Deduplication
Device token registration
Store tokens when users log in on mobile
Lesson 1681Mobile Push Notification Integration
Device-specific schemas
Different IoT device models may report different metrics.
Lesson 404Mobile and IoT Backend Storage
Device/Browser
Mobile vs desktop, Chrome vs Safari.
Lesson 1505Analytics and Tracking Requirements
DevOps tooling
expands dramatically: CI/CD pipelines for each service, container registries, automated testing frameworks, deployment automation, and configuration management systems all require licenses, infrastructure, and maintenance.
Lesson 811Infrastructure and Tooling Costs
DFS
for targeted crawls of specific sites or when memory is severely constrained.
Lesson 1830Breadth-First vs Depth-First Crawling
Diagnosis time
Identifying the root cause
Lesson 1324Mean Time To Repair (MTTR)
Different environments
Test restoring to alternate infrastructure
Lesson 1408Backup Verification and Testing
Different Questions Answered
Databases answer "what is the state now?
Lesson 692Streams vs Traditional Databases
Different security rules
Refresh endpoints can require additional checks (device fingerprinting, IP validation)
Lesson 915Token Expiration and Refresh Tokens
Different server specs
Use weighted distribution.
Lesson 226Load Distribution Across Replicas
Different Storage Needs
Unlike URL shorteners that store tiny key-value pairs, Pastebin stores variable-length text blobs (bytes to megabytes), introducing interesting storage and retrieval challenges.
Lesson 1542Pastebin System Overview
Difficult updates
Changing rules requires redeploying applications
Lesson 941Policy Decision Points (PDP) and Enforcement Points (PEP)
Dimension and Duration Checks
validate image resolution and video length to prevent edge cases that could crash processing systems or violate business rules.
Lesson 1599Upload Validation and Virus Scanning
Dimension tables
describe the "who, what, when, where"—customer details, product catalogs, dates.
Lesson 760Data Warehouse Architecture
Diminishing returns
Going from 5 to 7 nodes adds significant cost for just one more failure
Lesson 639Consensus Cluster Sizing Tradeoffs
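The diminishing-returns point above follows from majority-quorum arithmetic. The sketch assumes a cluster that needs a strict majority of nodes to make progress; the function names are illustrative.

```python
def quorum_size(nodes: int) -> int:
    """Smallest strict majority of the cluster."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can fail while a majority is still reachable."""
    return nodes - quorum_size(nodes)
```

Three nodes tolerate one failure, five tolerate two, and seven tolerate three, so each extra tolerated failure costs two more nodes.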
Direct database read
When fetching *your own* feed, bypass caches and read directly from the authoritative source (Posts table) to include any content you just created.
Lesson 1678Read-After-Write Consistency
Direct user uploads
via pre-signed URLs
Lesson 1593Distributed File System Considerations
Direction
Points from one node to another (though some systems support undirected edges)
Lesson 452Graph Model: Nodes and Edges
Directory + Consistent Hashing
Use a directory to map logical shards, but employ consistent hashing within the directory to minimize data movement when adding shards.
Lesson 250Hybrid Sharding Strategies
directory partitioning
maintains an explicit lookup table—a "directory"—that records which partition key belongs to which physical node or partition.
Lesson 1476Directory Partitioning FundamentalsLesson 1478Directory Partitioning FlexibilityLesson 1480Hybrid Partitioning ApproachesLesson 1481Range vs Directory Tradeoffs
Directory responds
"Shard C"
Lesson 242Directory-Based Sharding
Directory-based
adds a lookup service dependency
Lesson 253Evaluating Sharding Strategy Tradeoffs
Disable optional features
entirely (turn off recommendations)
Lesson 1083Graceful Degradation Strategies
Disadvantage
Higher latency — every write operation must wait for network round-trips and replica disk writes.
Lesson 203Synchronous Replication Explained
Disadvantages of summaries
Lesson 1186Summary Metrics
Disaster Recovery (DR) plan
is your blueprint for restoring systems and data after a catastrophic event—whether that's a data center fire, ransomware attack, or natural disaster.
Lesson 1434Disaster Recovery Planning Fundamentals
Disaster resilience
A fire in your primary data center won't destroy backups stored 1,000 miles away.
Lesson 1429Geographic Backup Distribution
Discard old log entries
All entries up to and including that index can now be safely deleted
Lesson 632Log Compaction: Snapshotting
Discover gaps
in monitoring, alerts, and documentation in a controlled setting
Lesson 1345Starting with Game Days
Disk I/O saturation
Writing crawled content and persisting frontier state creates I/O bottlenecks.
Lesson 1862Why Distribute a Web Crawler
Disk space
Available storage that changes with writes and deletes
Lesson 1184Gauge Metrics
Disk Writes
After logging, changes are eventually written to the actual data files on disk.
Lesson 313Durability: Surviving System Failures
Distance constraint
Typically ≤ 100 km due to latency
Lesson 1439Data Replication for DR
Distribute to downstream calls
Split remainder among dependencies (equally or weighted by expected latency)
Lesson 1119Timeout Budget Management Across Service Chains
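Splitting the remaining timeout budget among dependencies, as the entry above describes, might look like the following. The function name, the millisecond unit, and the dependency names in the usage note are assumptions; equal weights give the equal split, while unequal weights stand in for expected latency.

```python
def split_budget(remaining_ms: float, weights: dict) -> dict:
    """Divide the remaining budget among downstream calls in proportion to their weights."""
    total = sum(weights.values())
    return {name: remaining_ms * weight / total for name, weight in weights.items()}
```

With 900 ms left and weights `{"db": 2, "cache": 1}`, the database call gets 600 ms and the cache call 300 ms.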
Distribute to Workers
Each batch is sent to a different worker instance—typically via the message queue you set up for asynchronous fanout processing.
Lesson 1652Fanout Worker Parallelization
Distributed cache
Production systems requiring both speed and horizontal scaling
Lesson 910Session Storage Options
distributed cache layer
moves the cache outside your application servers into a specialized external service.
Lesson 123Distributed Cache Layer (Redis/Memcached)Lesson 124Database Query Result Caching
Distributed caches
Systems designed for shared access across many servers
Lesson 59Externalizing State with Shared Storage
Distributed coordination services
like ZooKeeper and etcd
Lesson 493CP Systems: Prioritizing Consistency
Distributed counters without coordination
Accept that different servers might have slightly stale views
Lesson 1785Non-Functional Requirements: Accuracy vs Performance
Distributed crawling
solves this by:
Lesson 1862Why Distribute a Web Crawler
Distributed Denial-of-Service (DDoS)
attack floods your servers with overwhelming traffic from many sources, trying to make your service unavailable to legitimate users.
Lesson 195CDN for DDoS Protection
Distributed Denial-of-Service (DDoS) attack
against target servers.
Lesson 1840Politeness Requirements for Web Crawling
distributed monolith
occurs when you've adopted microservices architecture (multiple services, separate deployments) but these services remain **tightly coupled** behind the scenes.
Lesson 789The Distributed Monolith Anti-PatternLesson 824Avoiding Distributed Monoliths
Distributed ownership
Multiple teams/services must coordinate (e.
Lesson 598Saga Frameworks and Real-World Adoption
Distributed Politeness Table
Each worker maintains local politeness state but synchronizes with peers.
Lesson 1868Coordinating Politeness Across Workers
Distributed scenarios
If using multiple nodes, verify counters sync properly and don't over-allow or over-restrict
Lesson 997Testing and Monitoring Rate Limiters
Distributed storage
S3, HDFS, or database for multi-node crawlers
Lesson 1849URL Frontier Persistence and Recovery
Distributes certificates
– Securely pushes certificates to each sidecar proxy through encrypted channels
Lesson 844Control Plane: Certificate Management
Distributes traffic
evenly across healthy servers
Lesson 76What Is a Load Balancer?
Distributing root CA
information to all proxies
Lesson 851Mutual TLS (mTLS) Authentication
Distribution challenge
Requires coordination (like distributed ID generation from lesson 1511) to avoid duplicates across servers
Lesson 1516Counter-Based vs UUID Approaches
DMCA compliance workflow
(disable pastes upon valid notices)
Lesson 1581Abuse Prevention and Content Moderation
DNS failover
monitors your primary site's health and automatically updates DNS records when problems arise.
Lesson 1440DNS and Traffic Management in DR
DNS pre-resolution
and **connection pre-warming** to CDN edges
Lesson 1618Optimizing for Mobile Networks
DNS resolution
happens for *every unique domain* you crawl.
Lesson 1856DNS Resolution Fundamentals for Crawlers
DNS resolution delays
Temporary lookup failures
Lesson 1020Why Retries Are Necessary in Distributed Systems
DNS Resolver
Lookup IP addresses efficiently at scale
Lesson 1732Crawling and Document Collection
DNS round-robin
the DNS server has multiple IP addresses registered for one domain name and rotates through them in order.
Lesson 82DNS-Based Load Balancing
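The rotation described in the DNS round-robin entry can be imitated in a few lines. `RoundRobinDNS` is a toy stand-in for a DNS server, and the addresses in the usage note come from the reserved documentation range; none of this reflects a real resolver's API.

```python
from itertools import cycle


class RoundRobinDNS:
    """Toy resolver that rotates through the addresses registered for each name."""

    def __init__(self, records: dict):
        # records maps a domain name to its list of IP addresses
        self._rotations = {name: cycle(ips) for name, ips in records.items()}

    def resolve(self, name: str) -> str:
        """Return the next address in the rotation for this name."""
        return next(self._rotations[name])
```

With `192.0.2.1` and `192.0.2.2` registered for one name, successive lookups alternate between the two, regardless of whether either server is healthy.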
DNS-Based Load Balancing
(#116), but adds health awareness and geographic intelligence.
Lesson 117Global Server Load Balancing (GSLB)
DNS-based request routing
to intelligently choose the *best* server for you.
Lesson 180DNS-Based Request RoutingLesson 181Anycast Routing for CDNs
DNS/traffic routing
Update routes to point to DR site
Lesson 1437Failover and Failback Procedures
document
typically JSON or similar formats — that can contain nested objects, arrays, and varying fields.
Lesson 380Document Structure and Schema FlexibilityLesson 383Collections and Databases
Document everything
during tests—actual times, issues encountered, and procedure gaps
Lesson 1438DR Testing Strategies
Document metadata cache
Keep titles and snippets in memory
Lesson 1742Search System Architecture Overview
Document observations
what broke?
Lesson 1345Starting with Game Days
Document outcomes
Did the alert fire in time?
Lesson 1295Testing Alerts and Dry Runs
Document restore procedures
and keep them updated
Lesson 1430Backup Verification and Testing
document store
is a type of NoSQL database that stores data as complete, self-contained **documents**—typically in formats like JSON, BSON (binary JSON), or XML.
Lesson 379What Is a Document Store?Lesson 381Documents vs Rows in Relational Databases
Document stores
(like MongoDB) organize data as self-contained JSON-like documents.
Lesson 419Wide-Column vs Document Stores
Document the process
Ensure your team can restore without the one person who "knows how"
Lesson 1408Backup Verification and Testing
Document your limits
clearly in API documentation
Lesson 960Rate Limit Response Codes
Document-based sharding
splits documents into groups (e.
Lesson 1753Distributed Index Sharding
Documenting your decisions
means writing down *what* you decided, *why* you chose it, and *what alternatives you rejected*.
Lesson 42Document Your Decisions
Domain authority
(well-known sites first)
Lesson 1839FIFO vs Priority-Based Frontier
Domain constraints
Inventory can't go negative, appointments can't overlap
Lesson 1387Custom Merge Functions
Domain rules
Inventory can't go negative; bids only increase
Lesson 1383Application-Level Conflict Resolution
Domain-Based
identification extracts the tenant from the request domain (e.
Lesson 1818Tenant Identification and Context
Domain-Specific Language (DSL)
that abstracts this complexity.
Lesson 722Kafka Streams API
Don't over-index
each index slows down `INSERT/UPDATE/DELETE` operations
Lesson 278Index Strategy for Large Tables
Don't retry
4xx errors (except 408 Request Timeout, 429 Too Many Requests)
Lesson 1026Retry on Which Errors
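The rule above translates into a small predicate. Treating 5xx responses as retryable is an assumption added here for completeness; the entry itself only covers the 4xx exceptions.

```python
RETRYABLE_4XX = {408, 429}  # Request Timeout, Too Many Requests


def should_retry(status: int) -> bool:
    """Decide whether an HTTP response status is worth retrying."""
    if 400 <= status < 500:
        return status in RETRYABLE_4XX
    # Assumption: server errors (5xx) are transient and can be retried.
    return status >= 500
```

A client would check this before scheduling a backoff, so a 404 or 400 fails immediately instead of being retried pointlessly.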
Don't warm everything
Only cache data with proven access patterns
Lesson 161Cache Warming Strategies
Download bandwidth
Your servers send that photo to 1,000 viewers → much larger requirement
Lesson 26Bandwidth Estimation from Data Size
Downloadable files
(PDFs, ZIPs)
Lesson 173Content Types Suited for CDNs
Downsampling
for efficient historical queries
Lesson 1206Metrics Federation and Long-Term Storage
Downside
Complete processing halt during rebalance
Lesson 717Rebalancing Protocol and Strategies
Downstream service health
protect struggling dependencies
Lesson 972Adaptive Rate Limiting
Downtime
during cutover (or complex dual-write patterns)
Lesson 328Migration and Legacy System Constraints
Downtime avoidance
Users expect 24/7 availability
Lesson 258Resharding and Data Migration
Downtime risk
The record might be temporarily unavailable during the move
Lesson 263Shard Key Immutability Problem
DR asks
"How do we restore our database cluster after the data center floods?
Lesson 1433Disaster Recovery vs Business Continuity
Drop
unwanted fields or entire log entries
Lesson 1151The ELK Stack: Logstash
Drop messages
(losing data)
Lesson 647Message Queue Basics
Drop new logs
Keep historical context
Lesson 1155Log Buffering and Backpressure
Drop oldest logs
Preserve recent data
Lesson 1155Log Buffering and Backpressure
Drop this trace
Sample flag = false
Lesson 1238Span Sampling Decisions
Dropping messages
means data loss—unacceptable for critical operations like payments or orders.
Lesson 1080Queue Saturation and Backpressure Loss
Dry runs
are practice incidents where teams rehearse their response procedures without actual customer impact.
Lesson 1295Testing Alerts and Dry Runs
Dual writes problem
Writing to two systems separately (database, then broker) creates a consistency gap.
Lesson 688Transactional Semantics
Dual-read/write periods
Applications may need to check both old and new locations temporarily
Lesson 258Resharding and Data Migration
Dual-write phase
Write to both old and new shards while copying historical data
Lesson 258Resharding and Data Migration
duplicate
returns the stored result without re-executing
Lesson 1003Idempotency KeysLesson 1010Idempotency Keys for POST Requests
Duplicate Critical Fields
means intentionally copying certain data across multiple tables so you can retrieve everything you need without performing joins.
Lesson 293Duplicate Critical FieldsLesson 297Denormalization in Practice
Duplicate Detection
automatically identifies and discards messages with the same `MessageId` within a configurable time window—critical when exactly-once processing matters.
Lesson 675Azure Service Bus FeaturesLesson 1732Crawling and Document Collection
Duplicate Logic
Your "calculate daily revenue" logic exists in both the batch codebase and the streaming codebase.
Lesson 751Lambda Architecture Tradeoffs
Duplication
Every service reimplements similar rules
Lesson 941Policy Decision Points (PDP) and Enforcement Points (PEP)
Duration distribution
(p50, p95, p99)
Lesson 1265RED Method: Rate, Errors, Duration
Duration increases
→ service degrading, database slow, or resource contention
Lesson 1265RED Method: Rate, Errors, Duration
Durations
`<operation>_duration_seconds` → `request_duration_seconds`
Lesson 1182Metric Naming Conventions
During normal operation (Else)
Should I optimize for lower latency or stronger consistency?
Lesson 520Practical PACELC Analysis for Design Decisions
During partition
Now you must choose—wait for consistency (CP) or serve potentially stale data (AP)
Lesson 504Why 'Choose Two' is Oversimplified
Dynamic authorization code flow
When a user from Tenant X logs in, redirect them to *their* configured IdP.
Lesson 932Multi-Tenant OAuth2 and Identity Federation
dynamic configuration
it can update routing rules, health checks, and load balancing algorithms on-the-fly without restarts.
Lesson 115Envoy Proxy ArchitectureLesson 840Data Plane: Envoy Proxy Fundamentals
Dynamic Configuration via APIs
Envoy's xDS (discovery service) APIs allow a central control plane to push configuration changes in real-time.
Lesson 115Envoy Proxy Architecture
Dynamic content acceleration
capabilities
Lesson 191CDN Provider Feature Comparison
Dynamic Partition Splitting
Start with fewer partitions that automatically split when they grow too large.
Lesson 1485Rebalancing Partitions
Dynamic rebalancing
through directory updates without full resharding
Lesson 1480Hybrid Partitioning Approaches
Dynamic Resolution
The sidecar queries the control plane for current Service B instances
Lesson 832Service Discovery in a Mesh
Dynamic subscriptions
Subscribers can join or leave without affecting publishers
Lesson 656Pub-Sub Pattern Fundamentals
Dynamic Updates
Change timeout values without restarting services—the mesh propagates updates to all sidecars in real-time.
Lesson 1126Timeout Configuration in Service Mesh

E

E-commerce checkout
If the recommendation service fails, show a static "popular items" list instead of personalized suggestions—but keep the checkout flow working
Lesson 1336Graceful Degradation
E-commerce order processing
– inventory counts and order placement must be precise
Lesson 318When to Choose ACID or BASE
E-commerce orders
Shard key = `(region, order_id)` → enables regional queries, avoids global scans.
Lesson 245Composite Shard KeysLesson 1411Defining Recovery Point Objective (RPO)
Each cache invalidates
its local copy of product 456
Lesson 158Event-Based Invalidation
Eager deletion
reclaims storage promptly and keeps the database clean.
Lesson 1567Lazy vs Eager Deletion Strategies
Eager Deletion (Scheduled Cleanup)
Background jobs periodically scan for expired pastes and proactively remove them.
Lesson 1567Lazy vs Eager Deletion Strategies
Eager Rebalancing (Stop-the-World)
Lesson 717Rebalancing Protocol and Strategies
Early rejection
prevents wasted processing on doomed requests
Lesson 859Rate Limiting at Service BoundariesLesson 886Request Validation
Early stages
You might start with basic SLIs like overall HTTP success rate because you lack granular data or understanding of user journeys.
Lesson 1284Iterating on SLIs and SLOs
Early termination
Stop scoring after finding the top K results
Lesson 1741Search Latency and Response Time
Early-stage product
Start with a single gateway.
Lesson 908When to Use BFF Pattern
Easier data consistency
No multi-region write coordination
Lesson 1436Active-Passive vs Active-Active DR
Easier debugging
You can see exactly where a saga failed
Lesson 591Orchestration-Based Sagas
Easier Onboarding
New team members can become productive quickly by focusing on one service rather than learning an entire monolith.
Lesson 797Improved Code Maintainability
Easier Rollbacks
If something goes wrong in production, rolling back is straightforward—revert to the previous single artifact.
Lesson 783Deployment Simplicity: Monolith Advantage
Easier to change
Modifications have predictable effects, reducing risk
Lesson 1315Simplicity as a Core Value
Easier to monitor
Fewer moving parts means clearer signals and less noise
Lesson 1315Simplicity as a Core Value
Easier to recover
Fewer failure modes and clearer recovery paths
Lesson 1315Simplicity as a Core Value
Easier to understand
New team members ramp up faster; on-call engineers diagnose issues quickly
Lesson 1315Simplicity as a Core Value
Easy communication
Simpler designs are easier to explain and discuss with interviewers or teammates
Lesson 34Start Simple: The Minimum Viable Design
Easy debugging
All logs and metrics in one place
Lesson 1791Single Data Center vs Distributed Setup
Easy enumeration
All completions live in one subtree
Lesson 1758Trie Data Structure for Prefix Matching
Easy Indexing
Log storage systems automatically index JSON fields.
Lesson 1138JSON as Log Format
Easy rebalancing
Just update the directory entries—no mathematical recalculation needed
Lesson 1476Directory Partitioning Fundamentals
Easy routing
Most frameworks handle path-based routing naturally
Lesson 1899URI Versioning (Path-Based)
Easy scaling
Spin up more instances in minutes
Lesson 108Hardware vs Software Load Balancers
Easy state changes
Update permissions in one place, affect all requests
Lesson 916Session vs Token Tradeoffs
Easy to configure
uses simple configuration files, not complex GUIs
Lesson 111NGINX as a Load Balancer
Easy to reason about
for users
Lesson 967Fixed Window Counter
Easy to understand
The mental model is straightforward—everyone talks to everyone
Lesson 1369Multi-Leader Topologies: All-to-All
EC
Even during normal operation (the "Else" clause), it sacrifices **latency** for consistency
Lesson 518PC/EC Systems: Consistency Always
Economic Defense
Without rate limiting, a single misbehaving client (malicious or buggy) can rack up massive infrastructure costs or degrade service for everyone.
Lesson 1782Rate Limiter Service Overview
Edge caches
sit closest to users (geographically distributed)
Lesson 1611Multi-Tier Caching Architecture
Edge filtering
The CDN analyzes incoming requests at edge locations using rate limiting, pattern detection, and behavioral analysis.
Lesson 189DDoS Protection and Security at CDN Edge
Edit distance
measures the minimum number of single-character operations (insert, delete, substitute) needed to transform one word into another.
Lesson 1774Spell Correction and Query Expansion
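As an illustration, the classic dynamic-programming formulation (Levenshtein distance) can be sketched as follows. The function name is chosen here for illustration only. It keeps a single previous row, so memory stays proportional to the length of the second word.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of inserts, deletes, and substitutions turning a into b."""
    # prev[j] = distance between the first i-1 chars of a and first j chars of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]
```

A spell corrector might, for example, suggest dictionary words within distance 1 or 2 of the typed query.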
Efficient archival
Drop or move old partitions wholesale
Lesson 1473Range Partitioning Benefits
Efficient for celebrities
– avoids pushing to millions of followers
Lesson 1647Fanout-on-Read (Pull Model)
Efficient for frequent backups
hourly backups become practical
Lesson 1422Incremental Backup Strategy
Efficient range queries
Fetching all users born between 1990 and 1995 only touches one or two partitions
Lesson 1451Range-Based Partitioning
Lesson 1471Range Partitioning Fundamentals
Efficient reads
GFS handles file serving, block caching, and network optimization
Lesson 446SSTable and GFS Dependencies
Efficient resource use
All infrastructure serves production workload
Lesson 1436Active-Passive vs Active-Active DR
Egress (outgoing)
Data leaving when your system responds with redirects
Lesson 1499Bandwidth Requirements for Redirects
Egress cost savings
Cloud providers charge heavily for data leaving a region (egress fees).
Lesson 1626Geolocation-Based Storage
Egress/transfer costs
Initial replication bandwidth
Lesson 1631Multi-Region Replication Strategy
Elastic Load Balancer (ELB)
and **Application Load Balancer (ALB)**.
Lesson 113Cloud Load Balancers (AWS ELB/ALB)
Elastic response
Auto-scale hot services during traffic spikes without touching stable services
Lesson 795Independent Scaling
Election fails
→ Term ends, new term begins
Lesson 620Terms: Logical Time in Raft
Election Frequency
How often leadership changes occur.
Lesson 643Monitoring and Operating Consensus Clusters
Election Restriction
ensures new leaders have all committed entries
Lesson 630Safety Argument: Committing Entries from Current Term
Election succeeds
→ One leader for the term
Lesson 620Terms: Logical Time in Raft
Else
(no partition), choose Latency or Consistency.
Lesson 516The 'Else' Clause: Normal Operation Tradeoffs
Email addresses
effectively infinite
Lesson 1178Metric Cardinality and Labels
Email addresses or UUIDs
Guaranteed uniqueness means guaranteed cardinality explosion.
Lesson 1211Avoiding High-Cardinality Labels
Email notifications
Sending duplicate emails creates poor user experience
Lesson 1001Side Effects and Idempotency
Email Services (SendGrid, SES)
Lesson 1691Rate Limits per Channel
EmailOperator
Sends email notifications
Lesson 767Airflow Operators and Executors
Embed related data
in document stores for single-read access
Lesson 297Denormalization in Practice
Emergency debugging
Structured fields in a human-scannable format
Lesson 1166Human-Readable vs Machine-Parseable
Enable API validation tools
to ensure requests/responses match the spec
Lesson 1885API Documentation with OpenAPI/Swagger
Enable faster incident response
Automated remediation runs in seconds, not minutes
Lesson 1308The SRE Philosophy: Treating Operations as Software
Enables seamless scaling
by adding/removing servers without changing client configuration
Lesson 76What Is a Load Balancer?
Encode
Convert to Base62 for a URL-friendly short code
Lesson 1508Hash-Based Generation Approach
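A minimal sketch of the Base62 step, assuming the input is a non-negative integer (for example, a numeric ID or a truncated hash). The alphabet order used here (digits, then lowercase, then uppercase) is an assumption; any fixed ordering of the 62 characters works as long as encoder and decoder agree.

```python
import string

# Assumed alphabet ordering: 0-9, a-z, A-Z
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    # Digits come out least-significant first, so reverse them.
    return "".join(reversed(chars))
```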
Encourages honesty
When engineers fear punishment, they hide mistakes or provide incomplete information.
Lesson 1351Blameless Postmortem Culture
encryption
(protecting data in transit), **authentication** (verifying identities), and **authorization** (controlling who can do what).
Lesson 727Kafka Security: Authentication and Encryption
Lesson 851Mutual TLS (mTLS) Authentication
End-to-end latency
Measure the time from metric emission to dashboard visibility.
Lesson 1218Testing Metric Pipelines
End-to-end testing
amplifies this problem.
Lesson 806Testing Complexity
Endpoint-level limits
protect expensive operations differently than cheap ones (e.
Lesson 973Multi-Tier Rate Limiting
Enforced delays
Between requests from the same queue, insert a delay (e.
Lesson 1841Single-Host Queue Pattern
Enforcement
Can actually reject requests and return 429 (Too Many Requests)
Lesson 1789Client-Side vs Server-Side Rate Limiting
Enforcement in the frontier
Before dispatching a URL from a single-host queue, check the timestamp of the last request to that host.
Lesson 1842Politeness Budget and Crawl Delay
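The timestamp check can be sketched roughly as below. The class name, the single global crawl delay, and passing `now` explicitly are all simplifying assumptions made for illustration; a real frontier would likely track a per-host delay.

```python
class PolitenessGate:
    """Tracks the last request time per host and enforces a minimum gap."""

    def __init__(self, crawl_delay_seconds: float):
        self.crawl_delay = crawl_delay_seconds
        self.last_request = {}  # host -> timestamp of last dispatched request

    def try_dispatch(self, host: str, now: float) -> bool:
        """Return True and record the time if the host may be fetched now."""
        last = self.last_request.get(host)
        if last is not None and now - last < self.crawl_delay:
            return False  # too soon; leave the URL in its queue
        self.last_request[host] = now
        return True
```

Passing `now` in rather than calling a clock inside the method keeps the logic easy to test.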
Enforces per-hop timeouts
without service code changes
Lesson 1101Timeout Propagation in Service Meshes
Engagement Metrics
Likes, comments, shares, saves, and click-through rates.
Lesson 1666Ranking Signals and Features
Engagement rate
(clicks, conversions)
Lesson 1729Analytics-Driven Optimization
Engagement signals
Likes, comments, shares from your network
Lesson 1665Feed Ranking Fundamentals
Engineering
Building automation, improving systems
Lesson 1312Measuring and Reducing Toil
Engineering Feasibility
Your SLOs must account for dependencies you can't control.
Lesson 1276Setting Realistic SLOs
Engineering teams
traditionally want perfect reliability.
Lesson 1282Error Budget as a Shared Currency
Enqueue
Add newly discovered URLs to a queue for future crawling
Lesson 1732Crawling and Document Collection
Enqueues the URL
to that specific back queue
Lesson 1846Queue Router and Host Mapping
Enrich
logs with additional context (geolocation, user lookup)
Lesson 1151The ELK Stack: Logstash
Ensure database indexes exist
for all filterable and sortable fields.
Lesson 1897Performance Considerations and Limits
Enterprise data
connects customers, products, suppliers, and regulations
Lesson 458Use Cases: Fraud Detection and Knowledge Graphs
Enterprise Integration Patterns (EIP)
design patterns for system integration:
Lesson 671ActiveMQ and Traditional Enterprise Messaging
Enterprise users
10,000 requests/hour (or custom limits)
Lesson 990Tiered Rate Limits for Different User Classes
Entity + Range
Shard by customer ID (entity-based) to keep all customer data together, then range-shard historical data by timestamp to archive old records efficiently.
Lesson 250Hybrid Sharding Strategies
Entity integrity
is a fundamental database principle: the primary key of a table must never be null or duplicated.
Lesson 299Primary Keys and Entity Integrity
Entity-based
keeps related data together for transactions
Lesson 253Evaluating Sharding Strategy Tradeoffs
Environment
production, staging, etc.
Lesson 1161Context-Rich Logging
Environment attributes
Time, location, IP address, device type
Lesson 935Attribute-Based Access Control (ABAC) Introduction
Envoy
is an open-source, high-performance proxy designed for microservices and service mesh architectures.
Lesson 115Envoy Proxy Architecture
Lesson 856Observability: Metrics Collection
Envoy Integration
For advanced scenarios, Consul Connect can configure Envoy as the sidecar proxy instead, giving you all of Envoy's sophisticated traffic management features while Consul handles service discovery and certificate management.
Lesson 863Consul Connect: HashiCorp's Approach
Envoy Proxy
is a high-performance C++ proxy originally developed at Lyft.
Lesson 897Envoy Proxy for API Management
Lesson 1062Circuit Breaker Libraries and Frameworks
Equality
`field=value` or `field_eq=value`
Lesson 1892Filtering Query Parameters
ERROR
Serious problems that require attention but the service continues
Lesson 1141Log Levels in Structured Logs
error budget
is the complement of your SLO (100% minus the SLO target): it represents the amount of "failure" you can afford before breaking your reliability promise.
Lesson 1279Error Budgets: The Core Concept
Lesson 1280Calculating and Tracking Error Budget
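As a worked example of the arithmetic, the sketch below converts an availability SLO into allowed downtime. The 30-day window and the time-based reading of the SLO are assumptions chosen for illustration.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed minutes of unavailability for an SLO given as a fraction."""
    total_minutes = window_days * 24 * 60
    return (1 - slo) * total_minutes

def budget_remaining(slo: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo, window_days)
    return 1 - downtime_minutes / budget
```

Under these assumptions, a 99.9% SLO over 30 days allows roughly 43.2 minutes of downtime.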
Error Budget Policies
are predefined agreements that answer these questions by establishing team behaviors and priorities tied to budget health.
Lesson 1281Error Budget Policies
Lesson 1350What is a Postmortem?
Error Code
A machine-readable identifier (e.
Lesson 1883Error Response Structure and Consistency
Error codes
– Application-specific codes that map to documentation or runbooks (e.
Lesson 1142Logging Exceptions and Stack Traces
Error information
`error=true`, `error.
Lesson 1225Span Attributes and Tags
Error injection
Force downstream services to return errors that count toward your failure threshold
Lesson 1065Testing Circuit Breaker Behavior
Error logs
30-90 days typically suffice for investigating recent incidents
Lesson 1135Log Retention and Volume Management
Error message
– The human-readable description from the exception itself.
Lesson 1142Logging Exceptions and Stack Traces
Error policies
define fallback behavior when things go wrong
Lesson 899Azure API Management Features
Error Rate SLO
tracks the percentage of requests that fail.
Lesson 1278Multiple SLOs for Comprehensive Coverage
Error responses
Return HTTP 503 for 20% of calls to test retry logic
Lesson 858Fault Injection for Testing
Errors and exceptions
Stack traces, error codes, context about what the system was attempting.
Lesson 1129What to Log vs What Not to Log
Errors spike
→ bugs deployed, dependencies failing, or capacity exceeded
Lesson 1265RED Method: Rate, Errors, Duration
Escalation paths
If the primary responder doesn't acknowledge, does it escalate?
Lesson 1295Testing Alerts and Dry Runs
Escalation policies
when alerts go unacknowledged
Lesson 1305On-Call Tooling and Automation
ETag
(entity tag) is a unique identifier for a resource version.
Lesson 121Browser Caching and HTTP Headers
Lesson 1570CDN Cache Control Headers
Even load distribution
Hash function spreads keys uniformly
Lesson 1806Rate Limiting with Consistent Hashing
Event broadcasting
Publish an event that all cache layers listen to
Lesson 163Multi-Level Cache Invalidation
Lesson 357Redis Pub/Sub for Real-Time Messaging
Event Capture
happens at multiple layers:
Lesson 954Distributed Auth Audit Logging
Event occurs
Backend mutation triggers event
Lesson 1915GraphQL Subscriptions for Real-Time Data
Event Publishing
When someone posts, the fanout service doesn't just write to timelines—it also publishes an event (e.
Lesson 1672WebSocket Architecture for Live Updates
event sourcing
records every state change as an immutable event in an append-only log.
Lesson 586Alternative: Event Sourcing for Consistency
Lesson 720Log Compaction
Event streaming
A river flows continuously, and anyone can drink from it at any time
Lesson 690What is Event Streaming?
Event streams
are *data-oriented*.
Lesson 698Streaming vs Message Queues
Event time
when the event actually occurred (timestamp on the package)
Lesson 770Apache Flink Architecture
Event time semantics
Handle late-arriving data gracefully regardless of processing mode
Lesson 756Hybrid and Modern Alternatives
Event-driven
Each request triggers a function execution
Lesson 895AWS API Gateway and Serverless Integration
event-driven architectures
where adding a new feature often means adding a new subscriber, not modifying existing services.
Lesson 662Fan-Out with Pub-Sub
Lesson 732Google Cloud Pub/Sub
Event-Driven Revocation
Publish revocation events to a message bus (Kafka, RabbitMQ).
Lesson 948Token Revocation at Scale
Event-driven warming
triggers cache loading when certain events occur—a new product launch, a viral post, or a scheduled sale—preloading data you *know* will be requested heavily.
Lesson 140Cache Warming Strategies
Events are facts
"OrderPlaced at 10:05am" cannot be undone, only compensated with a new event like "OrderCancelled"
Lesson 586Alternative: Event Sourcing for Consistency
Eventual consistency acceptable
(social feeds, analytics): **Asynchronous replication** works beautifully—prioritize speed and availability over immediate consistency.
Lesson 1364Choosing a Replication Mode
Eventual consistency is acceptable
A few milliseconds delay before a new short URL appears on replicas doesn't matter
Lesson 1522Read-Heavy Workload and Database Scaling
Eventual consistency OK
→ CDN or browser cache with longer TTLs
Lesson 130Choosing the Right Caching Layer
Eventual read
"Give me whatever's available on the nearest replica"
Lesson 1398Consistency Level Per-Operation
Eventual system failure
when resources are exhausted
Lesson 1211Avoiding High-Cardinality Labels
Eventually consistent
The system will become consistent eventually, but not immediately
Lesson 314BASE Properties Overview
Eventually consistent reads
(default): Lower latency, may not reflect recent writes
Lesson 554Consistency Model Examples in Real Systems
Eventually consistent session stores
Use global distributed caches (like DynamoDB Global Tables or Cassandra) that replicate session state, accepting brief inconsistency windows.
Lesson 952Cross-Region Authentication
Eventually-consistent data
(social feeds, recommendations, DNS) → **AP**
Lesson 503Choosing Between CP and AP
Evicts cold URLs automatically
Links that haven't been accessed in a while get pushed out organically
Lesson 1525Cache Eviction Policy for URL Shortener
Evolve services independently
without breaking existing clients
Lesson 882Request and Response Transformation
Exactly one consumer
should process each message
Lesson 664Choosing Between Queue and Pub-Sub
Exactly one leader
is elected per term (logical time period)
Lesson 636Consensus for Leader Election
Exactly-once
is necessary when duplicates are unacceptable and natural idempotency is hard: financial transactions, inventory updates, or billing events.
Lesson 689Choosing Delivery Semantics
Lesson 718Exactly-Once Semantics (EOS)
Example (conceptual)
With RF=3, the coordinator places one copy on the primary node (determined by the partition key), then one each on the next two nodes clockwise around the ring.
Lesson 424Replication Strategy and Factor
Example aggregation table schema
Lesson 1726Aggregation and Reporting
Example concept
(descriptive, not runnable):
Lesson 456Graph Query Languages: Cypher and Gremlin
Exception type
– The class name (e.
Lesson 1142Logging Exceptions and Stack Traces
Exceptions
Capture when errors occur without ending the span
Lesson 1234Span Events and Logs
Exclusive
One consumer per subscription (like Kafka with one consumer group)
Lesson 731Pulsar's Unique Features
Execute locally first
Each system commits its part independently
Lesson 583Alternative: Best Effort with Eventual Consistency
Execute statements
Perform inserts, updates, deletes
Lesson 310Atomicity: All-or-Nothing Transactions
Execution status
of each transaction (pending, completed, failed, compensated)
Lesson 597Saga State Management and Persistence
Exhausting resources
means consuming CPU, memory, disk I/O, or file descriptors until components struggle.
Lesson 1347Common Chaos Experiments
Expanding enums
Adding new status values (if clients handle unknowns gracefully)
Lesson 1905Breaking vs Non-Breaking Changes
Expected Load
Start with your baseline traffic patterns.
Lesson 1073Bulkhead Sizing: Balancing Isolation and Utilization
Expensive bandwidth
Every gigabyte served directly from your origin incurs cloud egress fees (often $0.
Lesson 1609Why CDNs Are Essential for Media Hosting
Expensive computations
you don't want to repeat for every user
Lesson 888Caching at the Gateway
Expensive queries
Complex aggregations, reporting queries
Lesson 124Database Query Result Caching
Expensive reads
Every paste retrieval loads megabytes through the database, wasting I/O bandwidth
Lesson 1550Object Storage for Paste Content
Expensive writes
celebrities with millions of followers create massive fan-out
Lesson 1638Push (Write-Time) Feed Model
Experimentation
(testing new technologies)
Lesson 1279Error Budgets: The Core Concept
Expiration
The indexed `expires_at` field enables efficient cleanup jobs that periodically delete or recycle expired links.
Lesson 1519Database Schema for URL Shortener
Expiration timestamp
(how long it's valid)
Lesson 1627Access Control and Signed URLs
Expire old entries
after a time window (e.
Lesson 1714Client-Side Deduplication
Expires
is the older header specifying an absolute date/time when the resource becomes stale (mostly replaced by `Cache-Control`).
Lesson 121Browser Caching and HTTP Headers
Lesson 1570CDN Cache Control Headers
EXPLAIN
shows the query plan without running it:
Lesson 469Indexes and Query Performance
Explicit Boundaries
Clear interfaces between contexts prevent model confusion
Lesson 815Domain-Driven Design and Bounded Contexts
Explicit contracts
Each media type can have its own documented schema
Lesson 1902Content Negotiation with Media Types
Explicit Defaults
Return explicit values rather than relying on implicit behavior.
Lesson 1919API Design for Polyglot Clients and Backwards Compatibility
Explicit overrides
Allow child resources to override inherited permissions when needed
Lesson 939Permission Inheritance and Hierarchies
Explicit proxying
requires your application to be configured to send traffic directly to the proxy.
Lesson 831Transparent vs Explicit Proxying
Explicit Renewal Mechanism
Require users to actively renew pastes beyond a certain period (e.
Lesson 1573Handling Never-Expiring Pastes
Exploratory analysis
Try different schemas on the same raw data
Lesson 759Schema-on-Write vs Schema-on-Read
Lesson 762Query Performance Tradeoffs
Export
The completed span is sent to your tracing backend for storage and analysis
Lesson 1231Span Lifecycle and Structure
exporters
to send data to any backend (Prometheus, Grafana, CloudWatch, etc.
Lesson 1205OpenTelemetry Metrics SDK
Lesson 1240OpenTelemetry Overview
Extensible
Powerful filter chain architecture allows custom logic via WebAssembly or native extensions
Lesson 840Data Plane: Envoy Proxy Fundamentals
External API calls
Third-party service limits concurrent connections
Lesson 971Concurrency Limiter Pattern
External calls
Note when you initiated a third-party API call
Lesson 1234Span Events and Logs
External system interactions
API calls (with correlation IDs), database queries that fail, third-party service responses.
Lesson 1129What to Log vs What Not to Log
Extra lookup overhead
Every request requires two hops (directory, then shard)
Lesson 242Directory-Based Sharding
Extra uncommitted entries
(from a failed leader that never committed them)
Lesson 629Log Inconsistencies and Repair
Extract content
from the fetched page (after HTML parsing)
Lesson 1852Content Fingerprinting with Hashing
Extract features
For each query-document pair, compute signals like BM25 score, document freshness, click-through rate, time-on-page, and domain authority
Lesson 1781Machine Learning for Ranking
Extract links only
– Some PDFs contain URLs worth discovering
Lesson 1833Content Type Detection
Extract the deadline
from incoming request context (header, metadata, etc.
Lesson 1110Calculating Remaining Time
Extract the notification ID
from the payload
Lesson 1714Client-Side Deduplication
Extracted functional requirements
Lesson 10Identifying Functional Requirements
Extreme performance
Purpose-built chips can handle millions of connections
Lesson 108Hardware vs Software Load Balancers
Extremely lagging replicas
can be temporarily removed from rotation
Lesson 218Lag-Aware Load Balancing

F

Fact tables
contain the measurable events or transactions—sales amounts, click counts, temperatures.
Lesson 760Data Warehouse Architecture
Fail-open defaults
when rate limiter is unreachable
Lesson 1784Non-Functional Requirements: Latency and Availability
Failback
is the reverse—returning operations from the DR site back to the restored primary site.
Lesson 1437Failover and Failback Procedures
Failed
The notification could not be delivered—device token invalid, phone number inactive, email bounced, etc.
Lesson 1724Notification Analytics Events
Failed requests
(elevated error rates)
Lesson 1286Symptoms vs Causes
Failover is lossy
– Promoting a replica means accepting data loss
Lesson 1356Asynchronous Replication: Speed and Risk
Failover Mechanisms
Use global load balancers or DNS-based routing (with health checks) to automatically redirect traffic when a region fails.
Lesson 1435Multi-Region Architecture for DR
Fails fast
– Users get instant error responses instead of hanging for 30+ seconds waiting for timeouts
Lesson 1046The Three States: Open
Failure correlation
When Service D fails, which upstream services suffer?
Lesson 1229Service Dependency Graphs
Failure count
(or rate) in the current window
Lesson 1056Circuit Breaker State Machine
Failure count threshold
is the simplest approach: open the circuit after N consecutive failures (e.
Lesson 1048Failure Thresholds and Detection
Failure Detection
If health checks fail repeatedly, the DNS service marks that endpoint as unhealthy
Lesson 1440DNS and Traffic Management in DR
Failure detection and counting
determines *how* the circuit breaker recognizes problems, accumulates evidence, and decides when a downstream service is unhealthy enough to open the circuit.
Lesson 1057Failure Detection and Counting
Failure handling
Server 3 crashed—how do other servers continue working?
Lesson 49Application Complexity Trade-offs
Failure isolation
If a replica goes down, your primary pool remains unaffected
Lesson 221Application-Level Connection Management
Lesson 648Decoupling Through Messaging
Failure patterns
reveal whether problems are isolated incidents or systemic.
Lesson 107Monitoring Health Check Metrics
Failure rate
percentage of requests failing
Lesson 1055Circuit Breaker Observability
Failure rate threshold
is more sophisticated: open when the error rate exceeds a percentage over a time window (e.
Lesson 1048Failure Thresholds and Detection
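A failure-rate threshold over a time window might look like the sketch below. The class name, the `min_calls` guard (which avoids tripping on a tiny sample), and passing `now` explicitly are illustrative assumptions rather than any particular library's API.

```python
from collections import deque

class FailureRateDetector:
    """Decides whether the error rate in a sliding time window is too high."""

    def __init__(self, window_seconds: float, threshold: float, min_calls: int):
        self.window_seconds = window_seconds
        self.threshold = threshold   # e.g. 0.5 means "more than 50% failures"
        self.min_calls = min_calls   # do not judge on too few samples
        self.calls = deque()         # (timestamp, failed) in arrival order

    def _evict(self, now: float) -> None:
        # Drop calls that have aged out of the window.
        while self.calls and self.calls[0][0] <= now - self.window_seconds:
            self.calls.popleft()

    def record(self, now: float, failed: bool) -> None:
        self.calls.append((now, failed))
        self._evict(now)

    def should_open(self, now: float) -> bool:
        self._evict(now)
        total = len(self.calls)
        if total < self.min_calls:
            return False
        failures = sum(1 for _, failed in self.calls if failed)
        return failures / total > self.threshold
```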
Failure Scenarios
Model what happens when a dependency hangs.
Lesson 1073Bulkhead Sizing: Balancing Isolation and Utilization
Failure sensitivity
How many failures re-open (often 1)
Lesson 1052Circuit Breaker Reset Logic
Failure threshold
How many consecutive failures trigger removal (e.
Lesson 103Marking Servers Unhealthy
Lesson 106Health Check False Positives and Flapping
failure thresholds
that define when enough errors have occurred to warrant opening the circuit.
Lesson 1048Failure Thresholds and Detection
Lesson 1066Tuning for Production Workloads
Failures
If a middle replica fails, the chain reconnects around it (A → C).
Lesson 1362Chain Replication
Failures spread
slow backends tie up connections, exhausting resources across the system
Lesson 105Graceful Degradation and Circuit Breaking
Fair resource allocation
Power users running expensive operations don't get the same treatment as those making lightweight calls.
Lesson 992Cost-Based Rate Limiting
Fair usage
Ensures one client can't monopolize resources
Lesson 955What is Rate Limiting?
Fallback Mechanisms
WebSockets can fail.
Lesson 1672WebSocket Architecture for Live Updates
Familiar SQL interface
for developers
Lesson 332The NewSQL Value Proposition
Fan-out
One published message reaches many subscribers
Lesson 656Pub-Sub Pattern Fundamentals
Lesson 662Fan-Out with Pub-Sub
Fan-out broadcasting
One message reaches all interested services
Lesson 663Hybrid Patterns: Topic + Queue
Fan-out on write
Pre-compute and distribute celebrity updates across shards when they post, rather than having everyone query one shard.
Lesson 257Celebrity Problem in Social Graphs
Fan-out scenarios
One event triggers multiple downstream actions
Lesson 654When to Use Async vs Sync
Fanout Completion Time
measures how long it takes from post creation to the last follower receiving it in their feed.
Lesson 1657Measuring Fanout Performance
Fanout-on-Read
(also called the **Pull Model**), when a user creates a post, you simply store it once in a central location (like a `posts` table).
Lesson 1647Fanout-on-Read (Pull Model)
Lesson 1648Hybrid Fanout Strategy
Lesson 1665Feed Ranking Fundamentals
Fast burn (1-hour window)
If you see 1% errors over 1 hour, you're burning budget 10× faster than sustainable—alert immediately
Lesson 1289Multi-Window and Multi-Burn-Rate Alerting
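The "10× faster" figure follows from the burn-rate formula: observed error rate divided by the error rate the SLO allows. The sketch below assumes a 99.9% SLO, which is what makes 1% errors come out to a burn rate of about 10. The alerting threshold in the usage note is hypothetical.

```python
def burn_rate(observed_error_rate: float, slo: float) -> float:
    """How many times faster than sustainable the error budget is being spent.

    A burn rate of 1.0 spends exactly the whole budget over the SLO window.
    """
    allowed_error_rate = 1 - slo
    return observed_error_rate / allowed_error_rate

def should_page(observed_error_rate: float, slo: float,
                threshold: float) -> bool:
    """Alert when the burn rate in the measured window meets the threshold."""
    return burn_rate(observed_error_rate, slo) >= threshold
```

For instance, `should_page(0.01, 0.999, threshold=5.0)` would fire, while a 0.1% error rate against the same SLO burns at roughly 1× and would not.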
Fast failure detection
You want to know immediately when a server goes down so traffic stops routing there
Lesson 100Health Check Intervals and Timeouts
Fast prefix matching
Walk down the prefix path once
Lesson 1758Trie Data Structure for Prefix Matching
Fast reads, slower writes
Set W=N, R=1 (write to all, read from one)
Lesson 365Tunable Consistency with Quorum Reads and Writes
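Both this setting and its mirror image (W=1, R=N) satisfy the usual quorum overlap rule, R + W > N, which guarantees that every read set shares at least one replica with the most recent write set. A small sketch of that check, with function and parameter names chosen only for illustration:

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True if every read quorum must intersect every write quorum."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return w + r > n
```

With N=3, the pairs (W=3, R=1), (W=1, R=3), and (W=2, R=2) all overlap, while (W=1, R=1) does not, so a read there may miss the latest write.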
Fast single-user queries
All posts for one user are co-located
Lesson 1661Timeline Schema Design
Fast startup
Quickly spin up instances for development or testing
Lesson 734NATS Streaming
Fast user experience
Post creation returns in milliseconds, not seconds
Lesson 1651Asynchronous Fanout Processing
Fast user feedback
Upload returns success instantly; thumbnails appear later
Lesson 1595Thumbnail and Preview Generation Trigger
Fast writes, slower reads
Set W=1, R=N (write to one, read from all)
Lesson 365Tunable Consistency with Quorum Reads and Writes
Faster CPUs
handle complex queries more efficiently
Lesson 54Scaling Databases: Special Considerations
Faster decisions
No constant cross-team coordination for every change
Lesson 788Organizational Alignment: Conway's Law
Faster failover
Promote a replica to primary quickly to restore service for that shard's users
Lesson 266Shard Failure and Partial Outages
Faster iteration
You can quickly sketch the basic architecture and then evolve it based on your estimated load, performance requirements, or constraints
Lesson 34Start Simple: The Minimum Viable Design
Lesson 820When a Monolith is the Right Choice
Lesson 906BFF Ownership and Team Structure
Faster load times
Users get files from nearby edge locations
Lesson 173Content Types Suited for CDNs
Faster operations
No system-wide reshuffling required
Lesson 1461Removing Nodes Gracefully
Faster recovery
If a server fails, its load spreads across many others instead of overwhelming a single neighbor
Lesson 363Virtual Nodes and Load Distribution
Faster releases
No waiting for other teams to be "ready"
Lesson 786Independent Deployability of Microservices
Faster than full backups
Only changed data is backed up
Lesson 1404Differential Backups
Fastest backup time
smallest data volume each run
Lesson 1422Incremental Backup Strategy
Fastest restore times
– everything's in one place
Lesson 1402Full Backups
FATAL
Critical failures that force the application to terminate
Lesson 1141Log Levels in Structured Logs
Father (weekly backups)
Keep 4-5 weekly backups (typically the last backup from each week).
Lesson 1431Backup Retention Policies
Fault injection
is the practice of intentionally causing failures in production-like environments to test whether your fault-tolerant design holds up under real conditions.
Lesson 1342Testing Redundancy with Fault Injection
Fault tolerance increases
You can survive hardware failures without data loss
Lesson 68What is Data Replication?
Favor boring technology
Proven, well-understood tools beat novel ones
Lesson 1315Simplicity as a Core Value
Feature Needs
Do you need simple routing and rate limiting, or enterprise features like monetization, developer portals, and advanced analytics?
Lesson 901Choosing the Right API Gateway Technology
Feature store
Pre-compute slow features offline
Lesson 1781Machine Learning for Ranking
Federated token refresh
Store mappings between your refresh tokens and external IdP refresh tokens, allowing seamless session extension across federated boundaries.
Lesson 932Multi-Tenant OAuth2 and Identity Federation
Federation
lets one Prometheus scrape metrics from other Prometheus servers.
Lesson 1206Metrics Federation and Long-Term Storage
Fetch
Download the page content via HTTP/HTTPS
Lesson 1732Crawling and Document Collection
Fetch the destination URL
when a short link is created
Lesson 1538Link Preview and Metadata
Fetcher Pool
Distributed workers downloading pages
Lesson 1732Crawling and Document Collection
Fewer round-trips
by aggregating related data in one query
Lesson 1910GraphQL Fundamentals and Query Language
Field information
title vs body vs metadata
Lesson 1735Inverted Index Structure
Field mapping
Rename or restructure response fields for client convenience
Lesson 882Request and Response Transformation
Field traversal
when logging complex nested objects
Lesson 1143Performance Impact of Structured Logging
Field validation
Only allow sorting on indexed fields to avoid performance issues
Lesson 1894Sorting Query Parameters
Field-level indexes
for filterable attributes (category, brand, price_bucket, etc.
Lesson 1775Faceted Search and Filters
Field-level redaction
Before logging, replace sensitive values with placeholders like `[REDACTED]` or hash them irreversibly.
Lesson 1145Sensitive Data in Structured Logs
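A minimal sketch of redaction applied to a log record before it is emitted. The field names in `REDACT_FIELDS` and `HASH_FIELDS` are hypothetical; a real service would define its own lists.

```python
import hashlib

REDACT_FIELDS = {"password", "credit_card", "ssn"}  # assumed field names
HASH_FIELDS = {"email"}                             # assumed field names

def redact(record: dict) -> dict:
    """Return a copy of record that is safe to log; the input is not modified."""
    clean = {}
    for key, value in record.items():
        if isinstance(value, dict):
            clean[key] = redact(value)  # recurse into nested objects
        elif key in REDACT_FIELDS:
            clean[key] = "[REDACTED]"
        elif key in HASH_FIELDS:
            # One-way hash keeps the value correlatable without exposing it.
            clean[key] = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
        else:
            clean[key] = value
    return clean
```

A plain unsalted hash of a low-entropy value such as an email address can be guessed by brute force, so a keyed hash (for example HMAC with a secret) is likely the safer choice in practice.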
FIFO
just needs a simple queue—blazing fast but accuracy suffers since it ignores access patterns entirely.
Lesson 154Implementation Tradeoffs
FIFO queues
guarantee exactly-once processing and strict ordering within a message group.
Lesson 669Amazon SQS Architecture
File integrity scans
Detecting bit rot or storage media degradation
Lesson 1408Backup Verification and Testing
File Type Validation
Verify the file extension *and* MIME type match expected formats (JPEG, PNG, MP4, etc.
Lesson 1592Upload Validation and Virus Scanning
File Type Verification
checks that uploads match allowed formats.
Lesson 1599Upload Validation and Virus Scanning
Filter first, paginate second
Apply `WHERE` clauses before `LIMIT/OFFSET` to reduce the working set
Lesson 1897Performance Considerations and Limits
Filters unnecessary fields
to reduce payload size
Lesson 905BFF Implementation Patterns
Finally, evaluate failure scenarios
Lesson 1364Choosing a Replication Mode
Financial ledger
You need consistency during partitions (PC) and likely during normal ops too (EC) → PC/EC system like traditional RDBMS with strong replication
Lesson 520Practical PACELC Analysis for Design Decisions
Financial losses
for customers
Lesson 1002The Double-Charge Problem
Financial operations
Money movement, account balances, billing
Lesson 322Transaction Requirements and Trade-offs
Financial systems
requiring ACID transactions across distributed data (banking, trading platforms, payment processors)
Lesson 337When to Choose NewSQL
Find a user's rank
Near-instant lookup
Lesson 359Redis for Leaderboards and Counting
Find hidden dependencies
that break when a component fails
Lesson 1343What is Chaos Engineering?
Find Responsible Node
Walk clockwise on the ring until you hit a node; that node owns this URL's deduplication state
Lesson 1854Distributed URL Deduplication
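The clockwise walk is usually implemented as a binary search over sorted ring positions. The sketch below assumes one ring position per node (no virtual nodes) and uses the first 8 bytes of SHA-256 as the ring hash; both are simplifications for illustration.

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    """Map a string to a position on the ring (assumed: 64-bit SHA-256 prefix)."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes):
        # Sorted (position, node) pairs form the ring.
        self._ring = sorted((ring_hash(node), node) for node in nodes)
        self._points = [point for point, _ in self._ring]

    def node_for(self, url: str) -> str:
        """First node at or after the URL's position, wrapping past the end."""
        idx = bisect.bisect_left(self._points, ring_hash(url))
        return self._ring[idx % len(self._ring)][1]
```

Every crawler worker that builds the ring from the same node list routes a given URL to the same owner, so its "seen" state lives in exactly one place.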
Find the divergence point
The leader maintains a `nextIndex` for each follower (initially set to the leader's last log index + 1).
Lesson 629Log Inconsistencies and Repair
Find the node
Look up which Redis node owns that hash range
Lesson 1806Rate Limiting with Consistent Hashing
Firewall timeouts
may close connections after periods of inactivity
Lesson 271Connection Validation and Stale Connections
First attempt
Token is new → process the request, store the token and result
Lesson 1027Idempotency Tokens in Retry LogicLesson 1711Idempotency Keys for Notifications
First check
You see $500 (replica A, up-to-date)
Lesson 535Monotonic Reads
First level (Range)
Partition by `order_date` into monthly buckets
Lesson 1453Composite Partitioning
First Normal Form (1NF)
Eliminate repeating groups—each cell contains a single atomic value, not lists
Lesson 302Normalization Fundamentals
First read
Hits Replica B (caught up through transaction #150)
Lesson 1360Monotonic Reads Across Replicas
first-class citizens
stored explicitly as data structures with their own identity, properties, and direct pointers between nodes.
Lesson 454When Relationships Are First-Class CitizensLesson 472Social Networks and Friend-of-Friend Queries
First-party fraud rings
Networks of accounts controlled by one person making coordinated purchases
Lesson 474Fraud Detection Through Pattern Matching
Fixed delays
cause synchronized retry storms
Lesson 1023Exponential Backoff Fundamentals
Fixed limits
require guessing capacity in advance — set them too high and you risk overload during incidents; too low and you waste capacity during healthy periods.
Lesson 972Adaptive Rate Limiting
Fixed Number of Partitions
Create many more partitions than nodes from the start (e.
Lesson 1485Rebalancing Partitions
Fixed size
Any content → exactly 256 bits (32 bytes)
Lesson 1852Content Fingerprinting with Hashing
fixed window counter
algorithm splits time into equal, non-overlapping intervals (windows) — say, 1-minute chunks.
Lesson 967Fixed Window CounterLesson 968Sliding Window LogLesson 975Algorithm Selection CriteriaLesson 1813Memory Footprint per User and Limits
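A fixed window counter fits in a few lines. The sketch below is an in-memory, single-process illustration; the class name, the `allow` method, and the injectable `now` parameter are assumptions, and a real limiter would also evict counters for old windows.

```python
import time
from typing import Optional


class FixedWindowCounter:
    """Hypothetical in-memory limiter: at most `limit` requests per window."""

    def __init__(self, limit: int, window_seconds: int = 60):
        self.limit = limit
        self.window_seconds = window_seconds
        self._counts = {}  # (client_id, window_index) -> requests seen so far

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Integer division assigns every timestamp to exactly one window.
        window_index = int(now // self.window_seconds)
        key = (client_id, window_index)
        if self._counts.get(key, 0) >= self.limit:
            return False
        self._counts[key] = self._counts.get(key, 0) + 1
        return True
```

Because the count resets at each window boundary, a client can send up to `limit` requests just before the boundary and another `limit` just after it, which is the burst weakness that sliding-window variants address.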
Flat-rate tiers
(some Cloudflare plans): predictable monthly cost
Lesson 191CDN Provider Feature Comparison
Flattening
the hierarchy at query time or cache-load time, so you store the complete permission set
Lesson 934RBAC Implementation Patterns
Flexible Content Negotiation
Use `Accept` headers to let clients request JSON, Protocol Buffers, or XML.
Lesson 1919API Design for Polyglot Clients and Backwards Compatibility
Flexible execution
The framework decides how to optimize—chunking data like batch processing when appropriate, or flowing continuously when needed
Lesson 756Hybrid and Modern Alternatives
Flexible exploration
Data scientists and analysts can experiment with raw data without ETL bottlenecks
Lesson 758Data Lake Fundamentals
Flink
implements **true event-by-event streaming**: each event flows through the processing pipeline immediately upon arrival.
Lesson 771Flink vs Spark for StreamingLesson 772Apache Beam Programming Model
Flooding
represents resource exhaustion spreading across your system
Lesson 1068The Ship Bulkhead Analogy: Containing Damage
FlowFiles
(metadata + content wrapper).
Lesson 775Apache NiFi for Data Flow
FLP impossibility
no deterministic algorithm can guarantee consensus in a purely asynchronous system if even one process may crash.
Lesson 600Why Consensus Is Hard
FLP Impossibility Result
(named after Fischer, Lynch, and Paterson, 1985) is a foundational theorem in distributed systems.
Lesson 601The FLP Impossibility Result
Flush to disk (SSTable)
When the in-memory structure fills up, it's flushed to disk as an immutable sorted file called an SSTable (Sorted String Table).
Lesson 415Write Path and LSM Trees
Flush to HFiles
When the MemStore fills up (typically 128MB-256MB), it's flushed to disk as an immutable HFile.
Lesson 436HBase Write Path and WAL
Follower Reach Rate
tracks what percentage of a user's followers successfully received the post within your SLA (e.
Lesson 1657Measuring Fanout Performance
Follower replicas
Other brokers maintain synchronized copies by continuously pulling updates from the leader
Lesson 705Replication and Fault Tolerance
Followers apply changes
→ Eventually have the same data
Lesson 71Single-Leader Replication Model
Follows REST principles
Leverages HTTP's built-in content negotiation mechanism
Lesson 1902Content Negotiation with Media Types
Follows-From
A looser relationship indicating that one operation was triggered by another, but the parent doesn't wait for completion.
Lesson 1232Span Relationships and Hierarchy
Follows-from links
show asynchronous fire-and-forget patterns
Lesson 1232Span Relationships and Hierarchy
For data keys
Apply the hash function to the key itself.
Lesson 1458Mapping Keys and Nodes to the Ring
For each SSTable
(newest to oldest):
Lesson 416Read Path and Bloom Filters
For nodes
Apply the hash function to a node identifier (IP, hostname, UUID).
Lesson 1458Mapping Keys and Nodes to the Ring
For time-series data
(logs, metrics, sensor readings, financial transactions), most queries focus on recent data.
Lesson 249Time-Based Sharding
Force breaking changes
alienate users, break production systems, damage trust
Lesson 1898Why API Versioning Matters
Foreign key constraints
Does this reference point to an existing record?
Lesson 305Consistency Guarantees
Foreign keys
(relationships between tables) reference primary keys to maintain referential integrity
Lesson 299Primary Keys and Entity Integrity
Fork Process
Redis calls the `fork()` system call, creating a child process that shares the same memory view
Lesson 350Redis Persistence: RDB Snapshots
Format adaptation
Convert service responses to client-friendly formats
Lesson 882Request and Response Transformation
Format conversion
Transform REST payloads to match service contracts
Lesson 882Request and Response TransformationLesson 1601Video Transcoding Fundamentals
Format Integrity
Parse file headers to ensure the file isn't corrupted or malformed.
Lesson 1592Upload Validation and Virus Scanning
Format patterns
Does an email look like an email?
Lesson 886Request Validation
Forward compatible
Old consumers can read new messages (add fields or remove optional fields)
Lesson 725Schema Registry and Evolution
Forward recovery
Continue execution with retries or alternative paths
Lesson 585Alternative: Saga Pattern Introduction
Forwards
the request to the backend cluster running version 2 of the API
Lesson 1907Gateway-Level Version Routing
Free and battle-tested
widely used by companies from startups to tech giants
Lesson 111NGINX as a Load Balancer
Free tiers
Cloudflare offers basic CDN free; others provide limited trial credits
Lesson 191CDN Provider Feature Comparison
Frequency caps
maximum 5 emails per day
Lesson 1702User Preferences Lookup
Frequent multi-collection queries
(orders with products and users)
Lesson 405When Joins Are Required
Frequently accessed data
that changes infrequently (e.
Lesson 888Caching at the Gateway
Fresh content sources
News aggregators, forums, and social platforms constantly publish new content and links, helping your crawler stay current.
Lesson 1828Seed URLs and Starting Point
Freshness priority
Discovers important pages near the seed quickly—great for news sites or finding high-value content fast
Lesson 1830Breadth-First vs Depth-First Crawling
freshness requirements
or natural expiration, TTL (Time-To-Live) ensures stale data doesn't linger.
Lesson 153Choosing an Eviction PolicyLesson 1844Front Queue: Priority Management
Freshness signals
(news sites need frequent recrawling)
Lesson 1839FIFO vs Priority-Based Frontier
Friend relationships
(if distinct from followers)
Lesson 1653Selective Fanout Optimization
From document stores
Documents store hierarchical JSON-like objects with nested structures.
Lesson 410What is a Wide-Column Store?
From key-value stores
While key-value stores map one key to one value, wide-column stores map a row key to many columns, letting you retrieve individual columns or column groups without fetching the entire row.
Lesson 410What is a Wide-Column Store?
From relational databases
In SQL, a table has a fixed schema.
Lesson 410What is a Wide-Column Store?
front queue
is where you make these decisions *before* URLs flow into the politeness-controlled back queues.
Lesson 1844Front Queue: Priority ManagementLesson 1845Back Queue: Politeness Enforcement
front queues
managing priority and multiple **back queues** enforcing per-host politeness.
Lesson 1846Queue Router and Host MappingLesson 1849URL Frontier Persistence and Recovery
Frontend proxy
The first point of contact for client requests
Lesson 112HAProxy Overview
Frozen
Almost never accessed (S3 Glacier Deep Archive)
Lesson 1623Cold Storage and Archival
Full compatible
Both directions work (add/remove only optional fields)
Lesson 725Schema Registry and Evolution
Full historical detail
when you need every raw event, not just summaries
Lesson 762Query Performance Tradeoffs
Full rebuild
Rare nuclear option that regenerates everything
Lesson 777Workflow Orchestration Patterns
Full restores
Rebuild entire systems from scratch
Lesson 1408Backup Verification and Testing
Full search
happens when a user submits a complete query and expects comprehensive, highly ranked results.
Lesson 1757Typeahead vs Full Search
Full snapshots
store each version completely.
Lesson 1577Paste Editing and Version History
Full SQL interface
for queries and schema definition
Lesson 331What NewSQL Is
Full table scans
are a graph database's Achilles heel.
Lesson 478When Graphs Underperform: Aggregations and Scans
Full-stack feature teams
work best: a team owns the web client *and* its BFF, communicating directly with backend microservices.
Lesson 906BFF Ownership and Team Structure
Functional
"Serve customers food," "Take payment," "Provide a menu"—these are the *actions* the restaurant performs
Lesson 9Functional vs Non-Functional: Core Distinction
Functional requirements
describe *what* the system must do—the actual features and behaviors users interact with.
Lesson 9Functional vs Non-Functional: Core Distinction

G

G-Counter
(grow-only counter): Each replica maintains its own count; totals are summed
Lesson 1384Conflict-Free Replicated Data Types (CRDTs)
G-Set
(grow-only set): Items can only be added, never removed
Lesson 1384Conflict-Free Replicated Data Types (CRDTs)
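The two grow-only CRDTs above can be sketched as follows. The class and method names are illustrative assumptions. Merging a G-Counter by per-replica maximum is the standard approach; it is what makes merges safe to repeat and to apply in any order.

```python
class GCounter:
    """Grow-only counter: one slot per replica, total is the sum of the slots."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> count contributed by that replica

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Per-replica max keeps the merge idempotent, commutative, and associative.
        for replica_id, count in other.counts.items():
            self.counts[replica_id] = max(self.counts.get(replica_id, 0), count)


class GSet:
    """Grow-only set: items can be added, and merge is set union."""

    def __init__(self):
        self.items = set()

    def add(self, item) -> None:
        self.items.add(item)

    def merge(self, other: "GSet") -> None:
        self.items |= other.items
```

Two replicas that increment independently and then merge in either order end up with the same total, with no coordination needed at write time.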
Game Day
is a scheduled, controlled event where your team intentionally breaks things in a safe environment to practice incident response.
Lesson 1345Starting with Game Days
Game servers
tracking player positions and actions
Lesson 56What Makes a Service Stateful
Gaming leaderboards
Temporary score inconsistencies won't ruin the experience
Lesson 137Write-Behind: Risks and Use Cases
Gateway → Client (HTTP)
Lesson 874Protocol Translation
Gateway → Service (gRPC)
Lesson 874Protocol Translation
gauge
is a metric type that represents a **point-in-time measurement** that can both increase and decrease.
Lesson 1175Gauge MetricsLesson 1179Aggregation and Roll-UpsLesson 1184Gauge MetricsLesson 1200Grafana for Metrics Visualization
GCP integration, global scale
Pub/Sub
Lesson 735Choosing a Streaming Platform
GDPR
(Europe) and **CAN-SPAM** (US) require verifiable records showing when users opted out, what they opted out of, and that you stopped messaging them accordingly.
Lesson 1728Opt-Out and Compliance Tracking
Generate
a correlation ID when the request enters your system (e.
Lesson 1158Correlation IDs Across ServicesLesson 1512Random String Generation
Generate client libraries
in multiple languages automatically
Lesson 1885API Documentation with OpenAPI/Swagger
Generate once, cache aggressively
Lesson 1539QR Code Generation
Generating unique idempotency keys
– Typically a UUID or similar globally unique identifier
Lesson 1007Idempotency and Client Responsibilities
Geo + Hash
First shard by geographic region to keep data close to users, then hash-shard within each region for even distribution.
Lesson 250Hybrid Sharding Strategies
Geo-based
optimizes location-based queries
Lesson 253Evaluating Sharding Strategy Tradeoffs
Geo-Distributed Replication
Data is replicated across multiple nodes (typically 3+ copies) using a consensus protocol (Raft).
Lesson 334CockroachDB and Distributed SQL
Geo-distribution
is required with low-latency reads/writes
Lesson 337When to Choose NewSQL
Geographic and Network Metrics
show where requests originate and connection quality.
Lesson 1628Usage Analytics and Metrics
Geographic burst
→ Activate Proximity-Based Routing
Lesson 97Dynamic Algorithm Selection
Geographic latency
Users far from the data center experience delays
Lesson 1791Single Data Center vs Distributed SetupLesson 1862Why Distribute a Web Crawler
Geographic location
Derived from IP address (country, city).
Lesson 1505Analytics and Tracking Requirements
Geographic proximity
Weight nearby servers higher for latency
Lesson 86Weighted Round RobinLesson 180DNS-Based Request Routing
Geographic redundancy
means deploying your system components across multiple physical locations, often hundreds or thousands of miles apart, so that a disaster in one location doesn't destroy your entire service.
Lesson 1334Geographic Redundancy and Multi-Region
Geographic region
(North America is typically cheaper than Asia-Pacific)
Lesson 30CDN Bandwidth and Cost Estimation
Geographic rollouts
When expanding to a new region, warm the local PoPs with your most-accessed content.
Lesson 184Cache Warming and Preloading
Geolocation-aware crawling
means routing requests through IP addresses physically near the target server.
Lesson 1860IP Address Rotation and Geolocation
Geolocation-based storage
means distributing your original media files across multiple regional storage clusters, placing them nearest to where your primary audience lives.
Lesson 1626Geolocation-Based Storage
Get score ranges
"All players between 1000-2000 points"
Lesson 359Redis for Leaderboards and Counting
Get top N players
Instantly retrieve any range
Lesson 359Redis for Leaderboards and Counting
GetFile
/ **PutFile**: Read/write local files
Lesson 775Apache NiFi for Data Flow
GitOps workflows
and infrastructure-as-code practices.
Lesson 846Control Plane: API and User Interface
Global alone
Fair users suffer when bad actors consume all capacity.
Lesson 991Hierarchical Rate Limiting
Global default
All queries timeout after X seconds
Lesson 285Query Timeout and Statement Limits
Global Distributed Indexes (Term-Partitioned)
Lesson 1455Secondary Indexes and Partitioning
Global distribution
with correctness guarantees
Lesson 336NewSQL Tradeoffs
Global HTTP(S) Load Balancer
Layer 7, distributes traffic across multiple regions automatically
Lesson 114Cloud Load Balancers (GCP and Azure)
Global limits
Protect overall system capacity
Lesson 1697API Layer and Rate Limiting
Global Merge-Sort
Combine results from all shards, re-rank by score, and return the global top-K
Lesson 1780Distributed Query Coordination
Global multi-region applications
where users expect consistent data regardless of location
Lesson 337When to Choose NewSQL
Global query view
across all Prometheus instances
Lesson 1206Metrics Federation and Long-Term Storage
Global Rate Limiting
Use a centralized token bucket or distributed rate limiter to cap total fanout throughput across all workers.
Lesson 1654Fanout Rate Limiting
Global SSL Proxy/TCP Proxy
Layer 4 with TLS termination
Lesson 114Cloud Load Balancers (GCP and Azure)
Global system limits
act as a backstop when total traffic exceeds infrastructure capacity (e.
Lesson 973Multi-Tier Rate Limiting
Good (idempotent)
`if transaction_id not in processed_set: balance += 100; mark transaction_id as processed`
Lesson 679At-Least-Once Delivery
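The idempotent check above can be written out as a small class. This is a sketch only: the `Account` class and `credit` method are hypothetical names, and the processed set lives in memory, whereas a real consumer would persist it atomically with the balance.

```python
class Account:
    """Hypothetical account that applies each transaction ID at most once."""

    def __init__(self, balance: int = 0):
        self.balance = balance
        self._processed = set()  # transaction IDs already applied

    def credit(self, transaction_id: str, amount: int) -> bool:
        """Apply the credit once; return False for a redelivered message."""
        if transaction_id in self._processed:
            return False
        self.balance += amount
        self._processed.add(transaction_id)
        return True
```

Under at-least-once delivery the broker may hand over the same message twice; the second `credit("tx-1", 100)` call returns `False` and leaves the balance unchanged.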
Good shard key
`user_id` — distributes users evenly and queries for one user hit one shard
Lesson 232Shard Key Selection
Google Cloud Trace
, and commercial offerings like Datadog APM provide:
Lesson 1251Choosing a Tracing System
Google Kubernetes Engine (GKE)
Lesson 1244Google Cloud Trace
Google Pub/Sub with subscriptions
, and **Kafka consumer groups**.
Lesson 663Hybrid Patterns: Topic + Queue
Google Spanner
provides strict serializability across globally distributed data centers.
Lesson 530Strong Consistency in PracticeLesson 576When 2PC is Used in Practice
gossip protocol
they periodically "chat" with random peers to share information about the entire cluster's state.
Lesson 430Gossip Protocol and Failure DetectionLesson 983Gossip Protocols for Rate Limit SyncLesson 1805Gossip Protocols for Approximate Limits
Governance challenges
Hard to track what's sensitive, who owns it, or if it's compliant
Lesson 764Data Governance and Quality
Graceful node changes
When nodes join or leave, only adjacent replicas are affected
Lesson 1466Replication with Consistent Hashing
Gradual circuit breaker resets
that slowly increase allowed traffic
Lesson 1081Thundering Herd After Recovery
Gradual evolution
as your data patterns change over time
Lesson 1480Hybrid Partitioning Approaches
Gradual migration
old and new versions can briefly coexist
Lesson 165Versioned Cache KeysLesson 1907Gateway-Level Version Routing
Gradual rebalancing
– spreads load as usage patterns emerge
Lesson 1475Dynamic Range Splitting
Grandfather (monthly backups)
Keep 12 monthly backups.
Lesson 1431Backup Retention Policies
Grant or Deny
Whether the permission is allowed or blocked
Lesson 937Access Control Lists (ACLs)
Granular but practical
(not hundreds of micro-permissions)
Lesson 930OAuth2 Scopes and Consent
Graph approach
Your friend's contact card has their friends' phone numbers directly written on it.
Lesson 476Graph Query Performance Characteristics
graph database
is a database system optimized for storing and querying data where **relationships between entities are just as important as the entities themselves**.
Lesson 451What is a Graph Database?Lesson 452Graph Model: Nodes and Edges
Graph databases
work fundamentally differently.
Lesson 476Graph Query Performance Characteristics
Graph/Time Series
Line charts showing metrics over time (perfect for request rates, CPU usage)
Lesson 1200Grafana for Metrics Visualization
Graphite
provides simpler but less flexible querying.
Lesson 1208Choosing a Metrics System for Your Scale
Graphite-web
The web application for querying and rendering graphs
Lesson 1202Graphite Time-Series Database
GraphQL subscriptions
establish a persistent, bidirectional connection between client and server—typically using **WebSockets**.
Lesson 1915GraphQL Subscriptions for Real-Time Data
GraphQL-style arguments
Clients specify filters as structured objects:
Lesson 1893Complex Filtering with Query Languages
Green light for innovation
Deploy new features aggressively
Lesson 1281Error Budget Policies
Gremlin
(used by Apache TinkerPop-compatible databases) is more like giving step-by-step walking directions through your graph.
Lesson 456Graph Query Languages: Cypher and Gremlin
Group related operations
from the same session
Lesson 1140Contextual Fields
Grouped logically
by resource type
Lesson 930OAuth2 Scopes and Consent
Grouping
works similarly—you organize cache keys into logical collections (groups) so you can operate on the entire collection at once.
Lesson 164Cache Tagging and GroupingLesson 1194Time-Series Queries and PromQL
GSSAPI/Kerberos
Enterprise-grade, ticket-based authentication
Lesson 727Kafka Security: Authentication and Encryption
Guarantee
You still write to N nodes, but they might not be the "right" N
Lesson 366Sloppy Quorums and Hinted Handoff

H

H.265
involve repetitive mathematical operations across millions of pixels—perfect for GPU parallelization.
Lesson 1607GPU Acceleration for Encoding
H.265 (HEVC)
or **AV1** instead of H.
Lesson 1621Compression and Format Optimization
HA pairs
Two HAProxy instances using keepalived or similar for redundancy
Lesson 112HAProxy Overview
Hadoop
built on MapReduce, adding:
Lesson 743Batch Processing Frameworks
Half-Open → Closed
Send successful requests during half-open state, verify full recovery
Lesson 1065Testing Circuit Breaker Behavior
Half-Open → Open
Send failures during half-open state, verify the breaker re-opens
Lesson 1065Testing Circuit Breaker Behavior
Half-open circuit
Limited test requests (possibly with one retry) to check health
Lesson 1030Combining Retries with Circuit Breakers
Half-open success rate
whether recovery attempts succeed
Lesson 1055Circuit Breaker Observability
Half-open testing windows
should align with your dependency's recovery patterns.
Lesson 1066Tuning for Production Workloads
Hamming distance
(number of differing bits) between two Simhash values correlates with content similarity.
Lesson 1855Near-Duplicate Detection with Simhash
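Computing the Hamming distance between two fingerprints is an XOR followed by a bit count. In the sketch below the function names and the default threshold of 3 bits are assumptions chosen for illustration; an appropriate threshold depends on the fingerprint width and the corpus.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: int, b: int, max_bits: int = 3) -> bool:
    """Treat two fingerprints as near-duplicates when few bits differ (threshold is hypothetical)."""
    return hamming_distance(a, b) <= max_bits
```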
Handle collisions gracefully
If taken, suggest alternatives (`nike2`, `nike-official`) or reject the request.
Lesson 1531Custom Aliases and Vanity URLs
Handle conflicts gracefully
Use application logic or CRDTs to resolve discrepancies
Lesson 583Alternative: Best Effort with Eventual ConsistencyLesson 1514Custom Short URL Support
Handle disagreements
through consensus protocols
Lesson 526The Cost of Strong Consistency
Handle failures
Default to generic preview if fetch fails
Lesson 1538Link Preview and Metadata
Handle failures gracefully
network issues, timeouts, broken links
Lesson 1826What is a Web Crawler
Handle massive parallelism
crawling millions of pages per second
Lesson 1826What is a Web Crawler
Handle missing deadlines
(apply sensible defaults)
Lesson 1113Cross-Protocol Deadline Handling
Handle rate limits gracefully
Back off when you receive 429 (Too Many Requests) or 503 (Service Unavailable) responses.
Lesson 1831Robots.txt and Crawl Etiquette
Handle schema changes
Coordinate table creation, deletion, and column family modifications
Lesson 447Master Server and Metadata Management
Handles client-specific error formatting
Lesson 905BFF Implementation Patterns
HAProxy
(High Availability Proxy) is a free, open-source software load balancer that specializes in distributing traffic across multiple servers with exceptional performance and reliability.
Lesson 112HAProxy Overview
Hard bounces
(permanent): Invalid address, domain doesn't exist
Lesson 1686Email Notifications
Hard purge
Immediately deletes cached content; next request must go to origin
Lesson 185Purging and Cache Invalidation Strategies
Hardware uniformity
All identical servers?
Lesson 226Load Distribution Across Replicas
Hash approach
Assign books by random number → shelves are equally full, but finding all mystery novels requires checking every shelf
Lesson 1454Partitioning Tradeoffs: Distribution vs Query Efficiency
Hash by hostname
Extract the hostname from each URL and route it to a dedicated queue
Lesson 1841Single-Host Queue Pattern
Hash Matching
Compare content hashes against databases of known malicious files (like VirusTotal)
Lesson 1581Abuse Prevention and Content ModerationLesson 1629Content Moderation at Scale
Hash the key
Apply your hash function to get a position on the ring (e.
Lesson 1459Clockwise Key Assignment RuleLesson 1806Rate Limiting with Consistent Hashing
Hash the long URL
Apply MD5 (128 bits) or SHA-256 (256 bits)
Lesson 1508Hash-Based Generation Approach
Hash the URL
Apply a hash function to the normalized URL: `hash("http://example.
Lesson 1854Distributed URL DeduplicationLesson 1867Distributed Deduplication with Bloom Filters
hash-based sharding
instead of ranges to scatter sequential IDs uniformly
Lesson 248Avoiding Hotspots in ShardingLesson 1769Horizontal Scaling of Search Infrastructure
HDFS
(Hadoop Distributed File System) for storing massive files across machines
Lesson 743Batch Processing Frameworks
head
to **tail**, with each replica forwarding the write to the next.
Lesson 1362Chain ReplicationLesson 1373Chain Replication
Head processes
→ Applies update, forwards to next node
Lesson 1373Chain Replication
Head-based sampling
means making the decision to keep or discard a trace at the moment it begins—at the "head" or root span of the request.
Lesson 1253Head-Based Sampling
Header forwarding
is the most common approach.
Lesson 945Token Propagation Across Services
Header manipulation
Add authentication tokens, remove sensitive client headers, inject tracing IDs
Lesson 882Request and Response Transformation
Header validation
Are required headers (like `Content-Type`) present and correct?
Lesson 886Request Validation
Header versioning
`Accept: application/vnd.
Lesson 809Versioning and Backward Compatibility
Header-based routing
lets you route based on metadata: send requests from your mobile app to `api-v2`, while web traffic stays on `api-v1`.
Lesson 848Traffic Management and Routing
Header-based versioning
uses HTTP headers:
Lesson 892API Versioning and Routing
Health tracking
Knowledge of which servers are healthy vs unavailable
Lesson 83Client-Side Load Balancing
Health-aware
Skip unhealthy instances entirely
Lesson 880Request Routing and Load Balancing
Health-Aware Routing
The mesh only routes to instances that have passed health checks, automatically excluding failed or unhealthy ones
Lesson 832Service Discovery in a Mesh
Healthcare records
– patient data and prescriptions demand accuracy
Lesson 318When to Choose ACID or BASE
Heaps
Both insert and extract-min are O(log n), balancing writes and reads beautifully.
Lesson 1847Heap-Based Priority Queue Implementation
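Python's standard `heapq` module provides the O(log n) push and pop described above. The `PriorityFrontier` wrapper below is a hypothetical sketch; it assumes a lower number means higher priority and uses a running counter so that URLs with equal priority come out in insertion order.

```python
import heapq
import itertools


class PriorityFrontier:
    """Hypothetical min-heap of URLs keyed by priority (lower value pops first)."""

    def __init__(self):
        self._heap = []
        self._sequence = itertools.count()  # tie-breaker for equal priorities

    def push(self, priority: int, url: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._sequence), url))

    def pop(self) -> str:
        _priority, _sequence, url = heapq.heappop(self._heap)
        return url

    def __len__(self) -> int:
        return len(self._heap)
```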
Heartbeat mechanism
Client sends periodic "I'm alive" pings (every 30–60 seconds) over the existing WebSocket/SSE connection
Lesson 1676Presence Detection and User Status
Heat maps
revealing traffic patterns throughout the day
Lesson 1152The ELK Stack: Kibana
Heatmap
Density visualization for histogram buckets
Lesson 1200Grafana for Metrics Visualization
Heterogeneous hardware
A server with 64GB RAM might get 200 vnodes, while a 32GB server gets 100 vnodes
Lesson 363Virtual Nodes and Load DistributionLesson 372Consistent Hashing in Dynamo
Heterogeneous node weights
solve this by varying the number of virtual nodes assigned to each physical server based on its capacity.
Lesson 1465Heterogeneous Node Weights
Heterogeneous partitioning
Mix hash-based and range-based assignments freely
Lesson 1476Directory Partitioning Fundamentals
Heuristic-based detection
analyzes code patterns, keywords, and structure.
Lesson 1575Syntax Highlighting and Language Detection
Hidden dependencies
Discover undocumented service calls you didn't know existed.
Lesson 1229Service Dependency Graphs
Hide internal complexity
by presenting simplified, consistent APIs
Lesson 882Request and Response Transformation
Hierarchical
Combine multiple levels: `tenant:acme:user:42:payment:abc-123`
Lesson 1017Idempotency Key Scope and Namespacing
Hierarchical data
`region + store_id + transaction_id`
Lesson 413Row Keys and Clustering
High (minutes)
Push notification + in-app (e.
Lesson 1688Channel Selection Strategy
High accuracy critical
Use **Sliding Window Log** (perfect tracking) or **Sliding Window Counter** (near-perfect with less memory)
Lesson 975Algorithm Selection Criteria
High accuracy, high latency
Centralized counter with strong consistency guarantees (locks, transactions).
Lesson 985Trade-offs: Accuracy vs Latency
High availability requirements
strongly favor horizontal scaling.
Lesson 51When to Choose Horizontal Scaling
High availability, eventual consistency
`N=3, R=1, W=1`
Lesson 558N, R, W Configuration Trade-offs
High availability, low reliability
The ATM is always powered on (available), but occasionally dispenses the wrong amount of cash or debits your account incorrectly.
Lesson 1322Availability vs Reliability: Key Differences
High burst tolerance needed
Use **Token Bucket** (allows bursting up to bucket size) or **Leaky Bucket** (smooths bursts over time)
Lesson 975Algorithm Selection Criteria
High business impact
For critical user journeys (checkout, payment processing), the cost of downtime far exceeds tracing expenses.
Lesson 1260Cost-Benefit Analysis
High cache hit ratio
The same file serves millions of users
Lesson 173Content Types Suited for CDNs
High cohesion
means everything inside a service is closely related and works toward the same purpose.
Lesson 818High Cohesion, Low Coupling in Service Design
High consistency
W=3, R=1 (all writes succeed everywhere before returning)
Lesson 373Replication and Quorum in Dynamo
High consistency, slower writes
`N=3, R=1, W=3`
Lesson 558N, R, W Configuration Trade-offs
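One way to compare N, R, W configurations such as the two in this glossary is the quorum overlap rule: if R + W > N, every read set intersects every write set, so a read contacts at least one replica that holds the latest acknowledged write. The helper name below is an assumption.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when every read quorum must share at least one replica with every write quorum."""
    return r + w > n
```

`quorums_overlap(3, 1, 3)` is true, matching the high-consistency setting, while `quorums_overlap(3, 1, 1)` is false, which is why `N=3, R=1, W=1` gives only eventual consistency.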
High cost
$10K–$100K+ upfront, plus maintenance
Lesson 108Hardware vs Software Load Balancers
High failure rates
even when closed (threshold might be misconfigured)
Lesson 1064Monitoring and Metrics
High latency
A user in Tokyo requesting a video from your Virginia data center faces network round-trip delays of 150-200ms
Lesson 1609Why CDNs Are Essential for Media Hosting
High priority
Fast uploads (performance)
Lesson 18Prioritizing Requirements Under Constraints
High priority queue
Direct messages, mentions, friend requests
Lesson 1700Priority Queues and Urgency Levels
High read throughput
The tail can handle many concurrent reads without coordinating with other replicas
Lesson 1362Chain Replication
High reliability
requires careful error handling, data validation, testing, and monitoring.
Lesson 14Availability and Reliability Requirements
High reliability, low availability
When the ATM works, it's always accurate, but it's frequently offline for maintenance.
Lesson 1322Availability vs Reliability: Key Differences
High resolution, short retention
1-second intervals kept for 24 hours—perfect for debugging live incidents
Lesson 1270Monitoring Resolution and Retention Tradeoffs
High throughput, complex streaming
Kafka
Lesson 735Choosing a Streaming Platform
High traffic variability
Need load smoothing to handle spikes
Lesson 654When to Use Async vs Sync
High traffic variance
Black Friday traffic shouldn't starve routine operations
Lesson 1076Bulkhead Tradeoffs: Complexity and Resource Overhead
High write throughput
IoT devices generate massive volumes of time-series data.
Lesson 404Mobile and IoT Backend StorageLesson 640Performance Characteristics of Consensus
High-availability systems
More frequent checks with shorter timeouts
Lesson 100Health Check Intervals and Timeouts
High-frequency routine operations
Don't log every successful cache hit or health check ping—you'll drown in noise and hurt performance.
Lesson 1129What to Log vs What Not to Log
High-lag replicas
might still serve less critical queries or get fewer requests until they catch up
Lesson 218Lag-Aware Load Balancing
High-risk scenarios
where random generation is critical:
Lesson 1515Short URL Predictability Tradeoffs
High-throughput transactional workloads
that have outgrown single-server SQL databases
Lesson 337When to Choose NewSQL
High-volume streaming
Kafka (covered later), NATS
Lesson 676Choosing Between Message Broker Technologies
Higher availability
If 3 nodes are down, you can still read (R=2) and write (W=2) successfully
Lesson 560Eventual Consistency with Quorums
Higher complexity
Self-hosted ActiveMQ, multi-datacenter setups
Lesson 676Choosing Between Message Broker Technologies
Higher costs
for infrastructure you don't need
Lesson 36YAGNI: You Aren't Gonna Need It
Higher CPU/memory cost
on the client (maintaining time windows and sorting)
Lesson 1186Summary Metrics
Higher durability
= later ack = higher latency per message
Lesson 682Producer Acknowledgments
Higher last term wins
If the logs end with different terms, the one with the higher term is more up-to-date
Lesson 628Election Restriction: Up-to-Date Check
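The full up-to-date comparison in Raft has two steps: compare the terms of the last log entries, and if they are equal, the longer log wins. The sketch below shows the check a voter would run before granting its vote; the function and parameter names are assumptions.

```python
def candidate_is_up_to_date(
    candidate_last_term: int,
    candidate_last_index: int,
    voter_last_term: int,
    voter_last_index: int,
) -> bool:
    """Return True if the candidate's log is at least as up-to-date as the voter's."""
    if candidate_last_term != voter_last_term:
        # Different last terms: the higher term wins regardless of log length.
        return candidate_last_term > voter_last_term
    # Same last term: the log that is at least as long wins.
    return candidate_last_index >= voter_last_index
```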
Higher latency
Every write waits for coordination
Lesson 526The Cost of Strong Consistency
Higher operational complexity
More servers to monitor, upgrade, and debug
Lesson 639Consensus Cluster Sizing Tradeoffs
Higher RTO
Failover takes time (DNS updates, scaling up standbys)
Lesson 1436Active-Passive vs Active-Active DR
Higher throughput
– The primary isn't bottlenecked by the slowest replica
Lesson 1356Asynchronous Replication: Speed and Risk
Higher write latency
(consensus coordination)
Lesson 530Strong Consistency in Practice
Hint storage
The temporary nodes store a "hint" that this data belongs elsewhere
Lesson 1372Sloppy Quorums and Hinted Handoff
Histogram
Combine buckets from multiple measurements
Lesson 1179Aggregation and Roll-UpsLesson 1185Histogram Metrics
Histograms
Request latency distribution (buckets of response times)
Lesson 1172What Are Metrics and Why They Matter
Histograms and Summaries
demand `quantile()` for latency analysis.
Lesson 1193Aggregation Functions
Historical reprocessing is expensive
If replaying your entire event log through a stream processor would take days or weeks, Lambda's batch layer can handle full historical processing more efficiently.
Lesson 755When to Choose Lambda vs Kappa
Hit rate
is the percentage of requests served from cache.
Lesson 166Monitoring Cache Performance
Hit Ratio
= Hits / (Hits + Misses)
Lesson 129Cache Hit Ratio Optimization
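The formula translates directly into code. This is only a sketch: the function name is hypothetical, and returning 0.0 before any requests have been recorded is an assumption, not something the formula defines.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Hit Ratio = Hits / (Hits + Misses)."""
    total = hits + misses
    if total == 0:
        # Assumption: report 0.0 when no requests have been seen yet.
        return 0.0
    return hits / total
```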
Hits
successful cache retrievals
Lesson 129Cache Hit Ratio Optimization
HLS (HTTP Live Streaming)
Breaks video into small segments (~10 seconds each), with a playlist file telling the player which quality versions exist
Lesson 193CDN for Video Streaming
Lesson 1602Adaptive Bitrate Streaming (ABR)
Lesson 1613HLS and DASH Protocols
Lesson 1625Adaptive Bitrate Streaming
HMAC
with a secret key only your application and CDN know.
Lesson 1627Access Control and Signed URLs
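One way an HMAC-signed URL might look, using Python's standard `hmac` and `hashlib` modules. The key value, the `expires` and `sig` query parameter names, and the message layout are all assumptions for illustration; real CDNs each define their own signing format.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-secret"  # hypothetical shared secret known to the app and the CDN

def sign_url(path: str, expires_at: int, key: bytes = SECRET_KEY) -> str:
    """Append an expiry and an HMAC-SHA256 signature to a path."""
    message = f"{path}?expires={expires_at}".encode()
    signature = hmac.new(key, message, hashlib.sha256).hexdigest()
    return f"{path}?expires={expires_at}&sig={signature}"

def verify_url(path: str, expires_at: int, signature: str, now: int,
               key: bytes = SECRET_KEY) -> bool:
    """Reject expired links, then compare signatures in constant time."""
    if now > expires_at:
        return False
    message = f"{path}?expires={expires_at}".encode()
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

Because the path and expiry are both covered by the signature, changing either one invalidates the link.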
Horizontal expansion
means adding more worker instances rather than making one worker bigger (vertical scaling).
Lesson 1708Scalability and Horizontal Expansion
Horizontal Partitioning (Sharding)
means splitting your table by **rows**.
Lesson 231Vertical Partitioning vs Horizontal Partitioning
Horizontal partitioning/sharding
helps when:
Lesson 231Vertical Partitioning vs Horizontal Partitioning
Horizontal Scaling Becomes Trivial
You can spin up 5, 50, or 500 gateway instances instantly.
Lesson 878Stateless Gateway Design
Horizontal scaling complexity
Need sticky sessions or distributed session store
Lesson 916Session vs Token Tradeoffs
Host extraction
Parse the URL to identify its hostname (e.
Lesson 1845Back Queue: Politeness Enforcement
Host-based routing
Different domains routed to different backends
Lesson 113Cloud Load Balancers (AWS ELB/ALB)
Host-Based Sharding
Assign each host to a specific worker using consistent hashing.
Lesson 1868Coordinating Politeness Across Workers
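A rough sketch of host-to-worker assignment with a consistent hash ring. The class name, worker names, and the choice of 100 virtual nodes per worker are assumptions; the point is only that every URL for a given host lands on the same worker, which can then enforce that host's crawl delay locally.

```python
import bisect
import hashlib

class HostRing:
    """Consistent hash ring mapping hostnames to crawler workers."""

    def __init__(self, workers, vnodes=100):
        # Each worker is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{worker}#{i}"), worker)
            for worker in workers
            for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def worker_for(self, host: str) -> str:
        # Walk clockwise to the first ring point after the host's hash.
        idx = bisect.bisect_right(self._keys, self._hash(host)) % len(self._ring)
        return self._ring[idx][1]
```

When a worker leaves the ring, only the hosts it owned are reassigned; every other host keeps its worker.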
Hot
Frequently accessed (S3 Standard, GCS Standard)
Lesson 1623Cold Storage and Archival
Hot data
is recent timeline content (e.
Lesson 1663Hot and Cold Timeline Data
Hot spots
High request volumes on certain edges indicate load concentration.
Lesson 1229Service Dependency Graphs
hot standby
is a fully-running, continuously synchronized backup system that can take over almost instantly when the primary fails.
Lesson 1417Hot Standby vs Cold Standby
Lesson 1443DR Cost Optimization
Hot storage (0-7 days)
Keep all sampled traces with full fidelity.
Lesson 1246Trace Data Retention Policies
Hot tier (0-7 days)
Fast SSD-backed search indexes (Elasticsearch, Splunk).
Lesson 1165Log Retention Policies
Hotspot cascades
Overloaded partitions trigger the problems you learned earlier
Lesson 1491Data Skew and Cardinality Issues
Hotspot mitigation
by splitting hot ranges into smaller partitions
Lesson 1480Hybrid Partitioning Approaches
Hotspots emerge
Server A might handle 40% of your data while Server B handles only 5%
Lesson 1462The Uneven Distribution Problem
Hours (1-24)
Standard approach for most notifications
Lesson 1712Deduplication Windows and Storage
How to measure effectively
Lesson 40Measure Before Optimizing
HTML parser
(like BeautifulSoup, jsoup, or native DOM parsers) to identify all `<a>` tags with `href` attributes, plus other link sources like `<img>`, `<script>`, and `<link>` tags.
Lesson 1829URL Discovery and Extraction
HTTP
Custom headers like `X-Timeout-Ms` or `Request-Timeout`
Lesson 1113Cross-Protocol Deadline Handling
HTTP → WebSocket
Upgrading connections or proxying events from backend message streams
Lesson 881Protocol Translation
HTTP headers
Mobile user-agents route to mobile-optimized servers
Lesson 110Layer 7 (Application) Load Balancing
Lesson 854Request-Level Authorization
HTTP methods
`POST` requests might go to write-heavy servers
Lesson 110Layer 7 (Application) Load Balancing
HTTP-native
Direct CDN integration and pre-signed URL support
Lesson 1588Object Storage vs Block Storage
HTTP/2
for transport, delivering significantly better performance.
Lesson 1917gRPC: Protocol Buffers and Binary RPC
HTTP/2 or HTTP/3
with multiplexing to reduce round-trip overhead
Lesson 1618Optimizing for Mobile Networks
HTTP/REST
Simple, browser-friendly, widespread
Lesson 874Protocol Translation
HTTP/REST → gRPC
Gateway receives JSON over HTTP, marshals it into Protocol Buffer format, makes a gRPC call
Lesson 881Protocol Translation
Hub-and-Spoke
Regional clusters replicate to a central aggregation cluster
Lesson 726Multi-Datacenter Replication
Human intervention time
If automated retries fail, a human might investigate and manually retry within a business day
Lesson 1012Idempotency Key Expiration Strategy
Human-readable
means clear messages, logical ordering, and context that makes sense at a glance:
Lesson 1166Human-Readable vs Machine-Parseable
Human-Readable Message
A clear explanation for developers debugging the issue (e.
Lesson 1883Error Response Structure and Consistency
hybrid approaches
cache tokens in Redis with database backup, providing speed for cache hits and durability for misses.
Lesson 1040Idempotency Token Storage Strategies
Lesson 1667Real-Time vs Precomputed Ranking
Lesson 1752Index Compression Techniques
Hybrid approaches work
Many systems use **strong consistency for critical writes** (payment processing) while accepting **eventual consistency for reads** (product reviews).
Lesson 553Choosing Consistency Levels
Hybrid Model
recognizes that not all users are equal: some have millions of followers (celebrities), others are highly active, and some log in rarely.
Lesson 1639Hybrid (Pull-Push) Feed Model
Lesson 1644Feed Personalization and Ranking Requirements
Lesson 1645What is Fanout in Social Media Systems
Hybrid models
exist where platform teams provide BFF frameworks and templates, but feature teams customize their instances independently.
Lesson 906BFF Ownership and Team Structure
Hybrid needs
→ Consider using both (polyglot persistence)
Lesson 319Decision Framework: Data Model First
Hybrid patterns
let you use sessions where they work best (user-facing web apps with browsers) and tokens where *they* excel (microservices, mobile apps, API access).
Lesson 919Hybrid Session-Token Patterns
Hybrid sharding
means applying different sharding techniques together—either in layers or combined into a composite key—to balance multiple goals simultaneously.
Lesson 250Hybrid Sharding Strategies
HyperLogLog
is Redis's probabilistic data structure that estimates cardinality (unique counts) with ~0.
Lesson 359Redis for Leaderboards and Counting
Hystrix
(Netflix's now-deprecated library) and **Resilience4j** (the modern alternative).
Lesson 1075Implementing Bulkheads in Practice: Hystrix and Resilience4j

I

I/O (Input/Output)
Disk read/write speeds and network bandwidth also plateau.
Lesson 46Hardware Limits of Vertical Scaling
I/O load
Reading entire datasets can impact production system performance
Lesson 1421Full Backup Strategy
Idempotency is critical
Since both requests might complete, your operation must be idempotent (remember idempotency keys from earlier lessons)
Lesson 1031Hedged Requests and Speculative Execution
idempotency keys
, **server-side state tracking**, and **time windows** (concepts you've already learned) to transform non-idempotent operations into idempotent ones.
Lesson 1006Natural Idempotency vs Engineered Idempotency
Lesson 1009HTTP Methods and Natural Idempotency
Lesson 1033Idempotency Keys in Payment Systems
Lesson 1711Idempotency Keys for Notifications
Idempotency time windows
limit how long the server remembers an idempotency key.
Lesson 1005Idempotency Time Windows
idempotency token
(or key) is a unique identifier the client generates and includes with each request.
Lesson 1027Idempotency Tokens in Retry Logic
Lesson 1036Request Token Generation and Management
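A small sketch of how a server might use such a token to make a deposit safe to retry. The `PaymentService` class and its in-memory dictionary are hypothetical stand-ins for a real service and its token store.

```python
import uuid

class PaymentService:
    """Toy service that replays the stored result for a repeated token."""

    def __init__(self):
        self._results = {}  # idempotency token -> result of the first attempt
        self.balance = 0

    def deposit(self, token: str, amount: int) -> int:
        if token in self._results:
            # Retry of a request already processed: no second side effect.
            return self._results[token]
        self.balance += amount
        self._results[token] = self.balance
        return self.balance

# The client generates the token once and reuses it for every retry.
token = str(uuid.uuid4())
```

Sending the same token twice leaves the balance unchanged after the first call, while a new token is treated as a new deposit.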
Identical behavior
means any load balancer can route requests randomly
Lesson 57Scaling Stateless Services Horizontally
Identifies best channels
per segment (some users never open email but always read SMS)
Lesson 1729Analytics-Driven Optimization
Identifies the tenant/user
(already covered in lesson 1818)
Lesson 1824Tiered Rate Limiting
Identify
a slice of functionality to migrate first (often starting small)
Lesson 822The Strangler Fig Pattern for Migration
Identify candidates
Media older than 6-12 months with minimal access
Lesson 1623Cold Storage and Archival
Identify noisy logs
Which log statements fire constantly but provide little value during incidents?
Lesson 1171Log Review and Alert Fatigue
Identify the bottleneck
Review your QPS estimate, storage growth, or latency targets.
Lesson 35Iterate Based on Constraints
Identify the latest version
using timestamps or version vectors
Lesson 559Strong Consistency with Quorums
Identify your critical path
(covered in lesson 1082): determine which features are absolutely essential.
Lesson 1083Graceful Degradation Strategies
Identify yourself
Set a proper User-Agent header so site owners know who's crawling and can contact you if needed.
Lesson 1831Robots.txt and Crawl Etiquette
Identity
Who requested access (user ID, service account, API key)
Lesson 944Auditing and Compliance for Authorization
Identity federation
means delegating authentication to external identity providers (IdPs) while maintaining your own authorization layer.
Lesson 932Multi-Tenant OAuth2 and Identity Federation
Idle timeout
How long a connection can sit unused in the pool before being closed.
Lesson 272Connection Timeouts and Limits
Idle users
Optional middle tier—defer updates briefly, then pull
Lesson 1676Presence Detection and User Status
If
Partition, choose Availability or Consistency; **Else** (no partition), choose Latency or Consistency.
Lesson 516The 'Else' Clause: Normal Operation Tradeoffs
Lesson 1015Conditional Writes for Idempotency
If a coordinator crashes
after sending PREPARE but before COMMIT, it reads its log on restart.
Lesson 574Recovery Protocols and Logs
If a participant crashes
after voting YES, it reads its log, sees it's waiting for a decision, and contacts the coordinator (or other participants) to learn the outcome.
Lesson 574Recovery Protocols and Logs
If all succeed
transition to **closed** (fully restored)
Lesson 1060Half-Open State Testing
If any fail
transition back to **open** (still broken, reset timer)
Lesson 1060Half-Open State Testing
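The two half-open outcomes above can be sketched as a single decision function. The `State` enum and function name are hypothetical, and staying half-open while no trial request has finished is an assumption.

```python
from enum import Enum

class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

def next_state_after_probes(probe_results: list[bool]) -> State:
    """Decide the next breaker state from the half-open trial requests."""
    if not probe_results:
        # Assumption: remain half-open until at least one trial completes.
        return State.HALF_OPEN
    if all(probe_results):
        return State.CLOSED  # every trial succeeded: fully restored
    return State.OPEN        # any failure: still broken, reset the timer
```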
If anyone said "no"
Coordinator tells everyone to abort
Lesson 568Two-Phase Commit (2PC) Overview
If collision detected
, generate a new random string and retry
Lesson 1512Random String Generation
If duplicate
, skip processing; **if new**, index and store the fingerprint
Lesson 1852Content Fingerprinting with Hashing
If everyone said "yes"
Coordinator tells all participants to commit permanently
Lesson 568Two-Phase Commit (2PC) Overview
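The coordinator's decision rule in 2PC can be sketched in a few lines. The function name is hypothetical, and treating an empty vote list as an abort is an assumption made for safety.

```python
def coordinator_decision(votes: list[bool]) -> str:
    """Commit only if every participant voted yes; otherwise abort."""
    if votes and all(votes):
        return "COMMIT"
    # Any "no" vote (or, by assumption, no votes at all) aborts the transaction.
    return "ABORT"
```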
If it exists
Skip storage entirely, just create a new metadata record pointing to the existing file
Lesson 1622Deduplication Strategies
If it's new
Store the file once and record its hash
Lesson 1622Deduplication Strategies
If no
Store the file once and record its hash and location
Lesson 1591Deduplication Using Content Hashing
If not cached
(cache miss), the edge fetches it once from your origin, caches it, then serves it
Lesson 192CDN for Static Asset Delivery
If the bucket overflows
(queue is full), new requests are rejected
Lesson 965Leaky Bucket Algorithm
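A minimal leaky-bucket sketch showing the overflow rule. The class and method names are hypothetical; in a real limiter `leak` would be driven by a timer at the configured outflow rate.

```python
from collections import deque

class LeakyBucket:
    """Bounded queue: requests wait in the bucket and drain at a fixed rate."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.queue = deque()

    def offer(self, request) -> bool:
        if len(self.queue) >= self.capacity:
            return False  # bucket overflows: reject the new request
        self.queue.append(request)
        return True

    def leak(self, n: int = 1) -> list:
        """Drain up to n requests in arrival order."""
        drained = []
        while self.queue and len(drained) < n:
            drained.append(self.queue.popleft())
        return drained
```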
If they fail
The service is still unhealthy, so the breaker returns to **Open** and waits longer before trying again
Lesson 1047The Three States: Half-Open
If they succeed
The service appears healthy, so the breaker transitions back to **Closed**
Lesson 1047The Three States: Half-Open
If yes
Don't store the file again—just create a new metadata record pointing to the existing storage location
Lesson 1591Deduplication Using Content Hashing
Immediate consistency
The paste URL is instantly valid
Lesson 1559Write Path: Synchronous vs Asynchronous Storage
Immediate requeue
Put the message back at the front of the queue
Lesson 684Negative Acknowledgments and Redelivery
Immediate response required
The client needs data right now (e.
Lesson 654When to Use Async vs Sync
Immediately after
, application deletes `user:123:profile` from cache
Lesson 157Active Invalidation on Write
Immutability
Logs cannot be altered or deleted by users
Lesson 944Auditing and Compliance for Authorization
Impact
What SLOs were violated, how many users affected, duration
Lesson 1304Blameless Postmortems
Impact Assessment
Duration of outage, users affected, revenue lost, SLO/error budget burn
Lesson 1350What is a Postmortem?
Impact estimate
"Affecting 2,500 users in us-east-1"
Lesson 1293Alert Context and Enrichment
Impact Metrics
Quantifiable damage—error rate, affected users, revenue loss, MTTR.
Lesson 1352Postmortem Structure and Action Items
Implement fallback logic
If push delivery fails (user offline, device unreachable), automatically retry or escalate to SMS.
Lesson 1689Multi-Channel Delivery
Implement scrubbing
Use logging middleware or filters that detect and redact sensitive patterns before writing logs.
Lesson 1163Avoid Logging Sensitive Data
Implementation
Store all short codes in lowercase in your database, then lowercase user input during redirects: `SELECT long_url FROM urls WHERE short_code = LOWER($input)`
Lesson 1518Case Sensitivity Considerations
Lesson 1534Rate Limiting for URL Creation
Implements
compensating actions if it receives failure events
Lesson 590Choreography-Based Sagas
Important but not critical
Product recommendations, related items, reviews
Lesson 1082Critical Path Identification
Important requests
(browsing, search): moderate throttling during spikes
Lesson 995Graceful Degradation Through Throttling
Improve iteratively
learning from incidents through blameless postmortems
Lesson 1307What is Site Reliability Engineering (SRE)?
Improved Throughput
With dedicated resources for each operation type, you can serve more total requests per second.
Lesson 220Read-Write Splitting Fundamentals
Improves availability
by routing around failed servers
Lesson 76What Is a Load Balancer?
Improves cache hit ratio
through consolidation
Lesson 1614Origin Shield Pattern
Improves user experience
by catching limits early
Lesson 1789Client-Side vs Server-Side Rate Limiting
In interviews
, estimations demonstrate:
Lesson 19Why Back-of-the-Envelope Estimation Matters
In real-world engineering
, they help you:
Lesson 19Why Back-of-the-Envelope Estimation Matters
In transit
means data moving across networks—from your database server to backup storage, or when retrieving backups for restoration.
Lesson 1409Backup Encryption and Security
In-app
Rich notification object with action buttons and images
Lesson 1692Channel-Specific Formatting
In-memory buffer
– New documents first go into a small, fast in-memory index structure.
Lesson 1754Real-Time Indexing and Updates
in-memory caches
for high-volume, low-value operations where brief inconsistency is acceptable.
Lesson 1040Idempotency Token Storage Strategies
Lesson 1712Deduplication Windows and Storage
In-memory stores
like Redis keep all data in RAM.
Lesson 340In-Memory vs Persistent Key-Value Stores
In-place updates
Directly modify existing posting lists.
Lesson 1737Index Building and Updates
In-Sync Replica (ISR)
set is Kafka's list of replicas—including the leader—that are fully caught up with the latest messages.
Lesson 707In-Sync Replicas (ISR)
Inability to Scale Horizontally
Lesson 77Why Load Balancers Are Necessary
Inactive accounts
(user hasn't logged in for months)
Lesson 1653Selective Fanout Optimization
Inbound policies
handle authentication, rate limiting, and request transformation before reaching backends
Lesson 899Azure API Management Features
Incident Command System
is a structured framework borrowed from emergency response that assigns specific roles to coordinate your response effectively.
Lesson 1300Incident Command System (ICS)
Incident Commander (IC)
The single decision-maker who owns the incident.
Lesson 1300Incident Command System (ICS)
Incident visibility
Capture more data exactly when you need it most
Lesson 1255Adaptive Sampling
Include `Retry-After`
so clients know exactly when to retry
Lesson 960Rate Limit Response Codes
Incomplete metadata
that prevents proper restoration
Lesson 1430Backup Verification and Testing
Inconsistency Risk
Even with careful engineering, subtle differences in floating-point math, aggregation order, or time zone handling can make the two layers produce slightly different results.
Lesson 751Lambda Architecture Tradeoffs
Increase cache size
(more data fits)
Lesson 129Cache Hit Ratio Optimization
Increase costs
dramatically (you're storing and querying far more data)
Lesson 1258Cardinality Explosion
Increase MTBF
(make failures rarer) — better hardware, redundancy, graceful degradation
Lesson 1325Availability Formula: MTBF and MTTR Relationship
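The relationship behind this entry is the standard formula Availability = MTBF / (MTBF + MTTR). A short sketch, with a hypothetical function name and hours as an assumed unit:

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)
```

With the repair time held constant, a larger MTBF pushes the result closer to 1, which is why making failures rarer raises availability.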
Increased complexity
that makes the system harder to maintain
Lesson 36YAGNI: You Aren't Gonna Need It
Increased latency
All requests wait for slow database responses
Lesson 159Cache Stampede Problem
Lesson 336NewSQL Tradeoffs
Increased operational complexity
requiring more expertise
Lesson 1413The Cost-Availability Tradeoff
Increased propagation delay
– Changes must traverse multiple hops; a 3-level tree means 3 sequential replication steps
Lesson 1374Tree Replication Topology
Incremental backfill
Process data in small, controlled batches
Lesson 777Workflow Orchestration Patterns
incremental backup
captures only the data that has changed since the *last backup* — whether that was a full or another incremental.
Lesson 1403Incremental Backups
Lesson 1422Incremental Backup Strategy
Lesson 1424Backup Scheduling and Frequency
Incremental suffix
Append an increasing counter: try `hash(url)`, then `hash(url + "1")`, `hash(url + "2")`, etc.
Lesson 1509Handling Hash Collisions
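A sketch of the incremental-suffix retry loop. The function name, the use of SHA-256, the 7-character code length, and the in-memory `store` dictionary (standing in for the URL table) are all assumptions for illustration.

```python
import hashlib

def assign_short_code(url: str, store: dict[str, str], length: int = 7) -> str:
    """Try hash(url), then hash(url + "1"), hash(url + "2"), ... until a code is free."""
    attempt = 0
    while True:
        salted = url if attempt == 0 else f"{url}{attempt}"
        code = hashlib.sha256(salted.encode()).hexdigest()[:length]
        owner = store.get(code)
        if owner is None:
            store[code] = url  # free slot: claim it
            return code
        if owner == url:
            return code        # this URL already owns the code
        attempt += 1           # collision with a different URL: bump the suffix
```

Shortening the same URL twice walks the same sequence of suffixes and returns the code it was given the first time.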
Incremental updates
Build delta indexes for new documents, merge periodically
Lesson 1746Index Construction at Scale
Lesson 1772Real-Time Index Updates
Incrementing counters
`UPDATE account SET balance = balance + 100` — repeating adds $100 each time
Lesson 1006Natural Idempotency vs Engineered Idempotency
Indefinite
Never delete (storage permitting)
Lesson 695Stream Retention and Replay
Independence
Subscribers consume at their own pace without affecting others
Lesson 656Pub-Sub Pattern Fundamentals
Independent Deployment
You can deploy Service B's new version without touching Service A.
Lesson 648Decoupling Through Messaging
Lesson 781What are Microservices?
Lesson 796Faster Development Cycles
Independent Development
Teams can build, test, and modify their services without coordinating schedules.
Lesson 648Decoupling Through Messaging
Independent evolution
Teams can modify their services without organizational bottlenecks
Lesson 788Organizational Alignment: Conway's Law
Lesson 904BFF vs Single Gateway Tradeoffs
Independent timeline caches
for faster local reads
Lesson 1682Scaling to Billions of Daily Active Users
index
creates a sorted lookup structure that points directly to the data, letting the database jump straight to relevant rows.
Lesson 307Indexes and Query Optimization
Lesson 623Log Structure and Entries
Lesson 628Election Restriction: Up-to-Date Check
Index alignment matters
Combine filter columns and sort columns in composite indexes: `INDEX(genre, published_date, id)` supports filtering by genre, sorting by date, and stable ordering by ID.
Lesson 1896Combining Pagination, Filtering, and Sorting
Index efficiently
Log management systems can index each field separately
Lesson 1137What is Structured Logging
Index foreign keys
used in joins
Lesson 278Index Strategy for Large Tables
Index Merge
Use separate indexes on `user_id` and `status`, fetch matching rows from each, then intersect the results.
Lesson 280Index Merge and Multi-Column Indexes
Index Node Cache
Even at the index level, individual shard results for common terms can be cached to avoid disk I/O.
Lesson 1771Query Caching Strategies
Index Position
The entry's position in the log (1, 2, 3, .
Lesson 623Log Structure and Entries
Index replication
creates copies of each index shard on different physical servers.
Lesson 1770Index Replication for Availability
Index Size Metrics
track the growth of your inverted index and posting lists.
Lesson 1777Query Performance Monitoring
Index structure
What indexes already exist or can be added efficiently?
Lesson 1895Default Sorting and Index Alignment
Index-friendly
Leverages existing indexes on the ordering column(s), making queries efficient even with millions of rows
Lesson 1890Keyset Pagination
Indexed queries
Ensure `expires_at` is indexed for fast lookups
Lesson 1568Scheduled Cleanup Job Design
Indexers
continuously process crawled documents, perform tokenization and text analysis, build inverted indexes with posting lists, and push completed index segments to storage.
Lesson 1742Search System Architecture Overview
Indexing Speed vs Completeness
Indexing every field makes queries fast but slows ingestion and bloats storage (indexes consume 30-50% of raw log size).
Lesson 1159Log Aggregation Performance Considerations
Indexing Strategy
Only public pastes should be indexed.
Lesson 1582Search and Discovery
Individual metrics dashboards
showing query latency, storage usage, and request rates
Lesson 1492Operational Complexity of Partitioning
Inference
If "Alice mentors Bob" and "Bob mentors Carol," you can infer transitive knowledge relationships
Lesson 475Knowledge Graphs and Semantic Networks
Inflexibility
Schema changes require ETL pipeline updates
Lesson 759Schema-on-Write vs Schema-on-Read
Inflexible
Can't quickly spin up new instances
Lesson 108Hardware vs Software Load Balancers
Influence analysis
Measuring reach in social networks
Lesson 464Traversal Queries: Friends of Friends
InfluxDB
offers strong write performance for IoT-scale ingestion.
Lesson 1208Choosing a Metrics System for Your Scale
INFO
Normal operational events (user logged in, job completed)
Lesson 1141Log Levels in Structured Logs
Info (P4/P5)
Informational alerts document noteworthy events for awareness and investigation, but require no immediate action.
Lesson 1291Alert Severity Levels
Information about the file
(metadata) – owner, size, upload date, permissions – structured data perfect for relational databases
Lesson 1590Metadata Database Design
Infrastructure Burden
Two separate systems mean double the monitoring, alerting, debugging tools, and operational expertise.
Lesson 751Lambda Architecture Tradeoffs
Infrastructure changes
(migrations, upgrades)
Lesson 1279Error Budgets: The Core Concept
Infrastructure costs
More disks, more cloud storage fees
Lesson 409Data Size and Storage Considerations
Infrastructure management
More load balancers, more network policies, more databases, more certificates to rotate
Lesson 803Operational Overhead
Ingestion servers
receive the raw stream from the broadcaster (typically using RTMP or WebRTC protocols).
Lesson 1630Live Streaming Architecture
Ingests streaming data
from message queues or event streams (Kafka, Kinesis)
Lesson 749Lambda Architecture: Speed Layer
Ingress (incoming)
Data arriving when users request shortened URLs
Lesson 1499Bandwidth Requirements for Redirects
Ingress-Only Mesh
Apply mesh features only at the cluster boundary—your ingress gateway.
Lesson 869Alternatives to Full Service Mesh
Initial construction
typically uses batch processing:
Lesson 1737Index Building and Updates
Initial scoping
– "We have ~5 million DAU" is clearer than 4,847,293
Lesson 32Rounding and Approximation Techniques
Initiate the transaction
The coordinator receives the transaction request from a client or application.
Lesson 569The Coordinator Role in 2PC
Initiation
Client requests an upload ID from the server
Lesson 1586Multipart Upload for Large Files
Inject
trace context into outgoing requests
Lesson 1223Instrumentation Basics
Inject artificial delays
at various points: add sleep statements in test environments, use proxy tools like Toxiproxy to introduce network latency, or leverage service mesh features to inject faults.
Lesson 1125Timeout Testing and Chaos Engineering
Inject synthetic failures
Temporarily degrade a non-production service to cross an SLO threshold
Lesson 1295Testing Alerts and Dry Runs
Injecting latency
adds artificial delays to network calls or internal operations.
Lesson 1347Common Chaos Experiments
Inline
simplifies application logic but the cache becomes a critical component in the data path (potential bottleneck or failure point).
Lesson 142Look-Aside vs Inline Cache Topologies
Innovation stops
Teams avoid risk entirely
Lesson 1310Embracing Risk: The 100% Availability Trap
Input/output data
from each step for idempotency checks
Lesson 597Saga State Management and Persistence
Insert
the new short URL mapping once unique
Lesson 1512Random String Generation
INSERT IF NOT EXISTS
Prevents duplicate record creation.
Lesson 1015Conditional Writes for Idempotency
Inspection
→ Engineers examine DLQ messages to diagnose issues
Lesson 687Dead Letter Queues
Install on lagging followers
If a follower is too far behind, the leader sends the snapshot via `InstallSnapshot RPC` instead of individual log entries
Lesson 632Log Compaction: Snapshotting
Instance termination
Kill a primary server and verify the passive takes over
Lesson 1342Testing Redundancy with Fault Injection
Instant Cache Hits
Popular URLs are ready immediately after cache restarts
Lesson 1529Preloading Hot URLs into Cache
Instant distribution
No coordination needed—every server generates independently
Lesson 1516Counter-Based vs UUID Approaches
Instant Purge (Hard Invalidation)
Lesson 1617Cache Invalidation and Purging
Instant revocation
Delete the session record, user is logged out immediately
Lesson 916Session vs Token Tradeoffs
Instantaneous effect
Each operation takes effect at a single, atomic point in time during its execution
Lesson 523Linearizability Defined
Int32/Int64
Explicit integer types (JSON only has generic "number")
Lesson 390BSON Format and Data Types
Integration testing
requires multiple services running simultaneously.
Lesson 806Testing Complexity
Integration with notification processing
Lesson 1728Opt-Out and Compliance Tracking
Intelligent routing
based on time, severity, and team schedules
Lesson 1305On-Call Tooling and Automation
Interactive workflows
User is waiting for confirmation (e.
Lesson 654When to Use Async vs Sync
Interchangeable servers
Any server instance can handle any request because there's no "sticky" data tied to a specific server
Lesson 55What Makes a Service Stateless
Intermediate node failure risk
– If a middle node fails, its entire subtree loses updates until recovery
Lesson 1374Tree Replication Topology
Internal Load Balancer
Private load balancing within VPCs
Lesson 114Cloud Load Balancers (GCP and Azure)
Internal responder channel
Technical details, debugging output, hypotheses
Lesson 1301War Rooms and Communication Channels
Internal RPC
Context objects with deadline fields
Lesson 1113Cross-Protocol Deadline Handling
Intersection logic
that combines multiple active filters using AND/OR operations
Lesson 1775Faceted Search and Filters
Introduce randomness
into your shard key distribution:
Lesson 248Avoiding Hotspots in Sharding
Invalid
Gateway returns `401 Unauthorized` immediately—request never reaches backend
Lesson 883Authentication at the Gateway
Invalidate
the cache entry, forcing next read to rebuild (simpler, slightly slower)
Lesson 1664Timeline Caching Strategies
Lesson 1722Real-Time Preference Updates
Invalidate immediately
(remove/update the cache entry)
Lesson 155Cache Invalidation Problem
Invalidate the application cache
(delete the key from Redis/Memcached)
Lesson 163Multi-Level Cache Invalidation
Invalidate the CDN cache
(purge or mark the edge content stale)
Lesson 163Multi-Level Cache Invalidation
Invalidate the database cache
(clear query result buffers)
Lesson 163Multi-Level Cache Invalidation
Invalidation
When data changes, mark cached copies as invalid (delete them).
Lesson 128Cache Coherence Across Layers
Invalidation strategy
When permissions change, proactively invalidate affected cache entries rather than waiting for TTL expiration.
Lesson 951Caching Authorization Decisions
Inventory
Tracks stock levels, reservations
Lesson 815Domain-Driven Design and Bounded Contexts
Inventory management
Stock levels that must stay accurate
Lesson 322Transaction Requirements and Trade-offs
Inventory systems
must prevent overselling
Lesson 317ACID vs BASE Tradeoffs
Inventory updates
Decrementing stock twice causes incorrect counts
Lesson 1001Side Effects and Idempotency
Inverse Document Frequency (IDF)
How rare is this term across all documents?
Lesson 1740TF-IDF Scoring Fundamentals
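One common textbook form of IDF is log(N / df), where N is the total number of documents and df is the number containing the term. The sketch below uses that form; search engines often apply smoothed variants, so treat the exact formula and the function name as assumptions.

```python
import math

def inverse_document_frequency(total_docs: int, docs_with_term: int) -> float:
    """IDF = log(N / df): rarer terms get higher scores."""
    if docs_with_term <= 0:
        raise ValueError("term must appear in at least one document")
    return math.log(total_docs / docs_with_term)
```

A term that appears in every document scores 0, while a term found in only a handful of documents scores much higher.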
inverted index
is essentially a giant lookup table that flips the traditional document-to-words relationship upside down.
Lesson 1735Inverted Index Structure
Lesson 1740TF-IDF Scoring Fundamentals
Lesson 1743What Is an Inverted Index
Inverted indexes
It creates lookup tables mapping terms to document locations (like a book's index)
Lesson 1150The ELK Stack: Elasticsearch
IP address
Routes based on the client's IP (similar to IP Hash algorithm we covered earlier)
Lesson 94Session Affinity (Sticky Sessions)
Lesson 1783Functional Requirements for Rate Limiter
IP address rotation
distributes requests across multiple source IPs, making your crawler appear as many independent clients and spreading the load.
Lesson 1860IP Address Rotation and Geolocation
IP Hash
can create imbalances if client IPs aren't evenly distributed.
Lesson 96Algorithm Selection Tradeoffs
IP Hash algorithm
is a load balancing strategy that uses the client's IP address to determine which backend server handles their request.
Lesson 89IP Hash Algorithm
Lesson 90Consistent Hashing for Load Balancing
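A sketch of IP-hash selection. The function name and backend names are hypothetical, and SHA-256 is just one reasonable choice of hash; production load balancers typically use cheaper hash functions.

```python
import hashlib
import ipaddress

def pick_backend(client_ip: str, backends: list[str]) -> str:
    """Hash the client's IP and map it onto the backend list."""
    ip_bytes = ipaddress.ip_address(client_ip).packed  # works for IPv4 and IPv6
    digest = hashlib.sha256(ip_bytes).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]
```

The same client IP always maps to the same backend while the backend list is unchanged. Adding or removing a backend changes the modulus and remaps most clients, which is the weakness consistent hashing addresses.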
Isolate
transactions using local locks
Lesson 567The ACID Problem in Distributed Systems
Isolate failures
Don't let one component's failure cascade to others
Lesson 1336Graceful Degradation
Isolated data stores
Each service owns its data (no shared databases)
Lesson 791Independent Deployability
Isolated failures
Problems don't cascade to healthy shards
Lesson 266Shard Failure and Partial Outages
Isolation level implementations
(what "serializable" actually means varies)
Lesson 582Transaction Isolation Across Systems
Issuing certificates
for each service identity
Lesson 851Mutual TLS (mTLS) Authentication
Iterate Based on Constraints
when the requirements actually change.
Lesson 36YAGNI: You Aren't Gonna Need It

J

Jaeger
and **Zipkin** offer full control, zero licensing costs, and community support.
Lesson 1251Choosing a Tracing System
Jaeger Agent
acts as a local daemon on each host (typically a sidecar).
Lesson 1241Jaeger Architecture and Components
Jaeger Collector
receives traces from multiple agents, validates them, runs processing pipelines (enrichment, indexing), and writes them to persistent storage.
Lesson 1241Jaeger Architecture and Components
Jaeger Query Service
provides the API and UI for retrieving and visualizing traces.
Lesson 1241Jaeger Architecture and Components
Jaeger wins on scalability
its separated components (agent, collector, query) provide better horizontal scaling and fault isolation for large-scale systems.
Lesson 1242Zipkin Architecture and Design
Java Message Service (JMS)
specification—a Java API standard for messaging middleware.
Lesson 671ActiveMQ and Traditional Enterprise Messaging
Jittered reconnection delays
so clients don't all retry at once
Lesson 1081Thundering Herd After Recovery
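A sketch of one way to jitter reconnection delays. Combining the jitter with exponential backoff ("full jitter") is an assumption beyond what this entry states, as are the function name and the default `base` and `cap` values.

```python
import random

def reconnect_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Pick a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    ceiling = min(cap, base * (2 ** attempt))
    # Randomizing across the whole interval spreads clients out in time.
    return random.uniform(0.0, ceiling)
```

Because each client draws its own random delay, reconnections after an outage arrive spread over the interval rather than in a single burst.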
Job distribution
happens automatically: each worker polls the queue and claims the next available job.
Lesson 1605Distributed Worker Architecture
JobManager
acts as the conductor, coordinating the entire data flow application.
Lesson 770Apache Flink Architecture
Join operations
Combining preferences with user profiles or subscription tiers.
Lesson 1721Preference Storage Strategy
Joins
Combine two streams or a stream with a table (e.
Lesson 722Kafka Streams API
Journey ends here
no origin contact needed.
Lesson 175CDN Request Flow: DNS to Edge to Origin
JSON:API filter specification
More flexible JSON structures passed as query values or request bodies.
Lesson 1893Complex Filtering with Query Languages
JWT (JSON Web Token)
is the most common format.
Lesson 61Stateless Authentication with Tokens
JWT Tokens
embed tenant information in signed tokens.
Lesson 1818Tenant Identification and Context

K

K accesses
for each cache entry, not just one.
Lesson 151LRU-K and Advanced LRU Variants
Kafka is self-managed infrastructure
(even on AWS MSK, you manage clusters), while **Kinesis is a fully-managed AWS service**.
Lesson 729Kinesis vs Kafka Tradeoffs
Kafka Streams
is a lightweight library that runs within your application (no separate cluster needed).
Lesson 744Stream Processing Frameworks
Keep cold content centralized
Store rarely-accessed media in fewer regions to minimize costs
Lesson 1631Multi-Region Replication Strategy
Keep it light
This check must happen in milliseconds.
Lesson 1533Access Control and Private URLs
Keep it lightweight
Push complex logic to services when possible
Lesson 877The API Gateway Bottleneck Risk
Keep this trace
Sample flag = true
Lesson 1238Span Sampling Decisions
Keeping all error traces
regardless of sampling rate
Lesson 1254Tail-Based Sampling
Keeps hot URLs cached
Frequently accessed links naturally get touched recently, staying in cache
Lesson 1525Cache Eviction Policy for URL Shortener
Key construction
Create a key combining the user ID and current time window (e.
Lesson 1794Redis-Based Rate Limiting with INCR
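As an illustration of the key construction described above, the sketch below builds a fixed-window key from a user ID and the start of the current window. The `rate:<user>:<window_start>` format is a hypothetical choice, and `FakeCounterStore` is an in-memory stand-in for a Redis `INCR` call so the example runs without a server.

```python
import time


def rate_limit_key(user_id, window_seconds=60, now=None):
    """Combine the user ID with the start of the current time window."""
    now = time.time() if now is None else now
    window_start = int(now // window_seconds) * window_seconds
    return f"rate:{user_id}:{window_start}"  # hypothetical key format


class FakeCounterStore:
    """In-memory stand-in for an atomic increment such as Redis INCR."""

    def __init__(self):
        self.counts = {}

    def incr(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]


def allow(store, user_id, limit, window_seconds=60, now=None):
    """Increment the window counter and compare it with the limit."""
    key = rate_limit_key(user_id, window_seconds, now)
    return store.incr(key) <= limit
```

Because the window start is part of the key, a new window automatically starts a fresh counter; a real deployment would also set an expiry on each key so old windows are cleaned up.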
Key difference
HLS is Apple-native and simpler; DASH is codec-agnostic and more flexible.
Lesson 1625Adaptive Bitrate Streaming
Key-Shared
Maintains ordering per key while distributing load
Lesson 731Pulsar's Unique Features
Keyword/Pattern Detection
Flag common spam phrases, known malware signatures, or phishing patterns
Lesson 1581Abuse Prevention and Content Moderation
Killing instances
is the simplest experiment.
Lesson 1347Common Chaos Experiments
Kinesis Data Streams
that act like Kafka topics—ordered sequences of data records that multiple applications can read independently.
Lesson 728AWS Kinesis Overview
Knowledge gaps
the expert who wrote the procedure is unavailable
Lesson 1441Runbooks and Automation
KRaft
(Kafka Raft) is the newer approach that eliminates ZooKeeper by implementing consensus directly within Kafka itself, using a Raft-based protocol among controller brokers.
Lesson 704Brokers and Cluster ArchitectureLesson 715ZooKeeper vs KRaft Mode
Kubernetes-native
chaos engineering platform built on custom resources and operators.
Lesson 1348Chaos Engineering Tools

L

L1 cache
(~32-64 KB): Fastest, closest to CPU cores, accessed in ~1 nanosecond
Lesson 127CPU and Disk Caching Layers
L2 cache
(~256 KB-1 MB): Slightly slower, often per-core or shared between pairs
Lesson 127CPU and Disk Caching Layers
L3 cache
(~8-32 MB): Slowest CPU cache, shared across all cores, but still several times faster than RAM
Lesson 127CPU and Disk Caching Layers
Label
ground truth relevance (explicit ratings or implicit signals like clicks, engagement)
Lesson 1756Machine Learning for Ranking (Learning to Rank)
Lag-aware load balancing
means your load balancer (or routing layer) actively monitors each replica's current lag and makes intelligent decisions about where to send read queries.
Lesson 218Lag-Aware Load Balancing
Lakes
Highly compressed files (Parquet with Snappy/Gzip) minimize storage bills.
Lesson 763Cost and Storage Efficiency
Language freedom
Teams choose the best tool for their domain without networking constraints
Lesson 833Polyglot Microservices Support
Language lock-in
Libraries require per-language implementations; service meshes work with any language
Lesson 830Service Mesh vs Library-Based Solutions
Language-Specific Tokenization
Different languages break words differently.
Lesson 1768Typeahead for Multi-Language Support
Large scale (10,000 req/sec)
180ms × 10,000 = 1,800 seconds (30 minutes!)
Lesson 276Why Query Optimization Matters at Scale
Large server
(8 cores, 16GB RAM): 200 virtual nodes
Lesson 1465Heterogeneous Node Weights
Large videos
(~hundreds of MB or GB): Asynchronous processing is essential.
Lesson 1598Synchronous vs Asynchronous Processing
Large-scale analytics workloads
If you need both operational random access AND analytical batch processing on the same data, HBase/BigTable bridge that gap.
Lesson 442When to Use HBase or BigTable
Larger payload
JWT can be hundreds of bytes vs small session ID
Lesson 916Session vs Token Tradeoffs
Last fetch time
Timestamp of the most recent successful request
Lesson 1848Politeness Table and Per-Host State
Last-Write-Wins (LWW)
is the simplest: each write carries a timestamp, and during conflict detection, the system keeps the write with the most recent timestamp and discards the others.
Lesson 1380Last-Write-Wins (LWW) Strategy
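The rule above can be sketched in a few lines. This assumes each conflicting write is represented as a `(timestamp, value)` pair; the function name `lww_merge` is hypothetical.

```python
def lww_merge(writes):
    """Keep the value of the write with the most recent timestamp.

    writes: non-empty iterable of (timestamp, value) pairs.
    On equal timestamps, max() keeps the first pair it saw, so real
    systems usually add a deterministic tiebreaker such as a node ID.
    """
    return max(writes, key=lambda w: w[0])[1]


if __name__ == "__main__":
    conflicting = [(1001, "alice@old.example"), (1007, "alice@new.example")]
    print(lww_merge(conflicting))
```

Every write except the winner is silently discarded, which is why LWW is simple but can lose data when clocks are skewed.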
Latency delays
Add 5 seconds to 10% of requests to a specific service
Lesson 858Fault Injection for Testing
Latency improvement
Faster access for end users
Lesson 1631Multi-Region Replication Strategy
Latency is invisible
CAP treats availability as binary—either you respond or you don't—but says nothing about *how fast* you respond.
Lesson 492Limitations of CAP as a Framework
Latency patterns
Sample more slow requests to investigate performance degradation
Lesson 1255Adaptive Sampling
Latency percentiles
lower limits when response times degrade
Lesson 972Adaptive Rate Limiting
Latency requirements
How quickly must insights or results be available?
Lesson 746Choosing Batch vs Stream
Latency SLO
measures how fast requests complete.
Lesson 1278Multiple SLOs for Comprehensive Coverage
Latency spike
The celebrity's post takes much longer to complete than a normal user's
Lesson 1640Celebrity Problem in Push Models
Latency Targets
Users expect feed content instantly.
Lesson 1633Non-Functional Requirements: Scale and Performance
Latency Tradeoffs
Cross-region network calls can add 50-200ms depending on distance.
Lesson 1435Multi-Region Architecture for DR
Latency vs Consistency
Centralized stores add network hops but guarantee single source of truth
Lesson 947Distributed Session Management
Launch events
Before releasing a new video, product page, or software update, push it to all relevant edge servers so the flood of initial requests hits warm caches.
Lesson 184Cache Warming and Preloading
Layer 4 (Transport Layer)
and **Layer 7 (Application Layer)** of the OSI model.
Lesson 80Layer 4 vs Layer 7 Load Balancing
Layer 7 (Application Layer)
of the OSI model.
Lesson 80Layer 4 vs Layer 7 Load Balancing
Layer 7 capable
can inspect HTTP headers, URLs, and cookies for intelligent routing
Lesson 111NGINX as a Load Balancer
lazy deletion
approach prevents runtime overhead during redirects while keeping the database lean.
Lesson 1532Expiration and Time-to-LiveLesson 1567Lazy vs Eager Deletion StrategiesLesson 1812Lazy Deletion and Background Cleanup
Lazy Deletion (Delete-on-Access)
The system checks expiration only when someone tries to read a paste.
Lesson 1567Lazy vs Eager Deletion Strategies
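A minimal sketch of delete-on-access, assuming an in-memory dictionary in place of the real paste store. The class name `LazyExpiringStore` and the injectable `clock` parameter are hypothetical, added so the behavior can be exercised without waiting.

```python
import time


class LazyExpiringStore:
    """Expiration is checked only when a key is read."""

    def __init__(self, clock=time.time):
        self._items = {}      # key -> (value, expires_at)
        self._clock = clock

    def put(self, key, value, ttl_seconds):
        self._items[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: remove it now, on access, and report a miss.
            del self._items[key]
            return None
        return value
```

Entries that are never read again stay in storage, which is why lazy deletion is usually paired with a background cleanup job.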
Lazy evaluation
Only format log messages if they'll actually be written (check log level first).
Lesson 1133Logging Performance Impact
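One way to check the log level first, sketched with Python's standard `logging` module. The logger name `lazy_demo` and the helper `expensive_summary` are hypothetical; `calls` only exists to show how often the expensive formatting runs.

```python
import logging

logger = logging.getLogger("lazy_demo")
logger.setLevel(logging.INFO)

calls = {"count": 0}


def expensive_summary(payload):
    """Stand-in for costly formatting work."""
    calls["count"] += 1
    return ", ".join(f"{k}={v}" for k, v in sorted(payload.items()))


def log_payload(payload):
    # Check the level first so the summary is built only when the
    # DEBUG message would actually be written.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("payload: %s", expensive_summary(payload))
```

Passing arguments as `logger.debug("... %s", value)` rather than pre-building the string also defers the final string interpolation, but it does not avoid computing `value` itself, which is what the explicit level check is for.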
Lazy expiration
Let old keys expire naturally rather than actively cleaning them
Lesson 977Algorithm Implementation Patterns
LCS
Better read performance, use when reads dominate and disk space is limited
Lesson 428Compaction Strategies
Leader Completeness Property
is Raft's fundamental safety rule: *if a log entry is committed in a given term, that entry will be present in the logs of all leaders for all higher-numbered terms.*
Lesson 627Safety: Leader Completeness PropertyLesson 628Election Restriction: Up-to-Date CheckLesson 629Log Inconsistencies and Repair
Leader election complexity
Multi-Paxos requires efficient leader election, which isn't fully specified in basic Paxos
Lesson 617Why Paxos Is Difficult in Practice
Leader fails
→ Term effectively ends, new election starts a new term
Lesson 620Terms: Logical Time in Raft
Leader logs the change
– The write is recorded (often as a replication log entry)
Lesson 1365Single-Leader Replication Topology
Leader processes the write
→ Updates its local database
Lesson 71Single-Leader Replication Model
Leader propagates changes
→ Sends the update to all followers
Lesson 71Single-Leader Replication Model
Leader replica
One broker holds the primary copy and handles all reads and writes for that partition
Lesson 705Replication and Fault Tolerance
Leader-based replication
Like Raft, one server is the leader handling writes; followers replicate
Lesson 633ZooKeeper: Coordination Service Built on Consensus
Leader's term
So followers can verify authority
Lesson 624AppendEntries RPC: Replication Mechanism
Leaderboards
in games that update within a known window
Lesson 549Bounded Staleness
Leaderless (Dynamo-style) replication
Any replica can accept writes directly
Lesson 1377What Are Replication Conflicts?
leaderless replication
model (also called peer-to-peer replication), there is no designated leader node.
Lesson 73Leaderless Replication ModelLesson 1371Leaderless Replication (Dynamo-Style)
Leading column matters
index `(A, B)` helps queries filtering on `A` alone, but not `B` alone
Lesson 278Index Strategy for Large Tables
Leaf nodes
Each key-value pair (or range of keys) is hashed
Lesson 376Anti-Entropy with Merkle Trees
Learn your domain
Early in a project, service boundaries are unclear.
Lesson 825Starting with a Modular Monolith
Learners
discover which value was chosen once a quorum of Acceptors agrees.
Lesson 610The Three Roles in Paxos
Learning curve
for your team
Lesson 37Prefer Boring Technology
Learning to Rank (LTR)
uses supervised machine learning to train models on historical click data, optimizing for the ranking order that maximizes user satisfaction rather than adhering to fixed mathematical formulas.
Lesson 1756Machine Learning for Ranking (Learning to Rank)
Learning-to-rank models
Historical clicks become training labels—clicked results are positive signals, skipped ones may be negative
Lesson 1779Search Analytics and Click Tracking
Least Connections Algorithm
makes routing decisions based on *real-time server load*.
Lesson 87Least Connections Algorithm
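A rough sketch of the selection step, assuming the balancer keeps a live count of open connections per server. The function name and the server names in the usage example are hypothetical.

```python
def pick_least_connections(active):
    """Return the server with the fewest active connections.

    active: dict mapping server name -> current open connection count.
    Ties go to whichever server min() encounters first.
    """
    return min(active, key=active.get)


if __name__ == "__main__":
    active = {"app-1": 12, "app-2": 4, "app-3": 9}
    target = pick_least_connections(active)
    active[target] += 1  # the new request now counts against that server
    print(target, active)
```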
Least Frequently Used (LFU)
eviction policy removes cache items based on how *often* they're accessed, not how *recently*.
Lesson 147Least Frequently Used (LFU)
Least Recently Used (LRU)
removes the item that hasn't been accessed (read or written) for the longest time.
Lesson 146Least Recently Used (LRU)Lesson 1525Cache Eviction Policy for URL Shortener
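A compact sketch of LRU eviction using `collections.OrderedDict`, where both reads and writes count as an access. The class name `LRUCache` is illustrative, not taken from the course.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the entry that has gone longest without a read or write."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()  # oldest access first, newest last

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # a read counts as an access
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)  # a write counts as an access
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used
```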
Least Response Time
algorithm routes traffic to the server that can respond fastest, combining both response speed and active load.
Lesson 92Least Response Time AlgorithmLesson 96Algorithm Selection Tradeoffs
Leave
remaining functionality in the monolith temporarily
Lesson 822The Strangler Fig Pattern for Migration
Legitimate traffic continues flowing
Because the CDN has cache hits and established routing patterns, real users still access your content from edge caches while attack traffic gets filtered.
Lesson 189DDoS Protection and Security at CDN Edge
Lends
an available connection to a request that needs database access
Lesson 267What is Connection Pooling
Length 8
62^8 = ~218 trillion unique URLs
Lesson 1501Collision Probability and Namespace Size
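The figure above follows from a 62-character alphabet (a-z, A-Z, 0-9) raised to the code length. A short check, with the helper name `namespace_size` being hypothetical:

```python
ALPHABET_SIZE = 62  # 26 lowercase + 26 uppercase + 10 digits


def namespace_size(length):
    """Number of distinct codes of the given length over the alphabet."""
    return ALPHABET_SIZE ** length


if __name__ == "__main__":
    for length in (6, 7, 8):
        print(length, f"{namespace_size(length):,}")
```

For length 8 this gives 218,340,105,584,896, which is the "~218 trillion" quoted above.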
Length constraints
(SMS truncation, push title limits)
Lesson 1692Channel-Specific Formatting
Length enforcement
String types have maximum sizes
Lesson 301Schema Enforcement and Type Safety
Length penalty
Standard UUIDs are 36 characters; even truncated versions need ~16-22 characters to maintain low collision probability
Lesson 1516Counter-Based vs UUID Approaches
Less duplication
Write authentication, rate limiting, and logging logic once
Lesson 904BFF vs Single Gateway Tradeoffs
Less mature tooling
for monitoring and debugging
Lesson 37Prefer Boring Technology
Lessons Learned
Broader insights about your system's behavior
Lesson 1350What is a Postmortem?
Level discipline
Use DEBUG/TRACE sparingly; keep INFO logs minimal in hot paths.
Lesson 1133Logging Performance Impact
Leveled Compaction (LCS)
Lesson 428Compaction Strategies
LFU (Least Frequently Used)
Removes rarely-accessed items—better for popularity-based caching
Lesson 355Redis as a CacheLesson 1525Cache Eviction Policy for URL Shortener
Library-Based Solutions
Instead of sidecar proxies, embed retry logic, circuit breaking, and observability directly into application code using libraries (like Resilience4j, Hystrix, or Polly).
Lesson 869Alternatives to Full Service Mesh
Lifecycle policies
Automate transitions after N days without access
Lesson 1623Cold Storage and Archival
Lightweight
Don't overload the system with expensive checks
Lesson 1339Health Checks and Failure Detection
Lightweight and fast
handles tens of thousands of concurrent connections efficiently
Lesson 111NGINX as a Load Balancer
Limit high-cardinality dimensions
Never use unbounded labels like `user_id`, `session_id`, or `ip_address`.
Lesson 1210Cardinality Management
Limit non-critical features
disable recommendation engines, advanced search filters, or analytics tracking while keeping core purchase flows operational.
Lesson 963Graceful Degradation with Rate Limits
Limit tag sets
to meaningful dimensions for debugging (service, endpoint, status_code)
Lesson 1258Cardinality Explosion
Limitations
Cache exists per application instance only: it doesn't scale across multiple servers, is memory-constrained, and loses its data on restart
Lesson 122Application-Level In-Memory CachingLesson 1373Chain ReplicationLesson 1791Single Data Center vs Distributed Setup
Limited Engineering Resources
Lesson 239When Not to Shard
Limited impact
If you have 8 shards and one fails, roughly 1/8 of users are affected
Lesson 266Shard Failure and Partial Outages
Limited intelligence
Cannot route based on URLs, headers, or content
Lesson 109Layer 4 (Transport) Load Balancing
Limited network traffic
Only one node receives data during removal
Lesson 1461Removing Nodes Gracefully
Limited Operations Team
Service meshes require expertise to configure, troubleshoot, and operate.
Lesson 835When You Don't Need a Service Mesh
Limited scalability
More replicas = longer write times
Lesson 1355Synchronous Replication: Guarantees and Costs
Limited scale
One data center's capacity is your ceiling
Lesson 1791Single Data Center vs Distributed Setup
Limited test requests
are permitted (often just one, or a small percentage)
Lesson 1047The Three States: Half-Open
Limited write throughput
(single-leader bottleneck)
Lesson 530Strong Consistency in Practice
Line charts
showing error rate over time
Lesson 1152The ELK Stack: Kibana
Lineage tracking
Clear documentation of where data came from and how it was transformed
Lesson 764Data Governance and Quality
Linear scalability
Add nodes as traffic grows
Lesson 1822Scaling Rate Limiter Horizontally
Linear scaling
becomes possible: double the instances, roughly double the capacity
Lesson 57Scaling Stateless Services HorizontallyLesson 1854Distributed URL Deduplication
Linearizability or strong consistency
Every read reflects the most recent write across all nodes
Lesson 493CP Systems: Prioritizing Consistency
Linearizable reads
from the leader: requires a heartbeat round trip to confirm leadership
Lesson 640Performance Characteristics of Consensus
Link Analysis
Extract URLs and check them against blacklists
Lesson 1581Abuse Prevention and Content Moderation
Link Extractor
Parses pages to find new URLs
Lesson 1732Crawling and Document Collection
List partitioning
groups related data but may have uneven sizes
Lesson 1453Composite Partitioning
List T₁
Recently-used items (seen once)
Lesson 152Adaptive Replacement Cache (ARC)
List T₂
Frequently-used items (seen multiple times)
Lesson 152Adaptive Replacement Cache (ARC)
Live notifications
Publish user-specific events to channels like `user:123:notifications`
Lesson 357Redis Pub/Sub for Real-Time Messaging
load balancers
, or **edge services**—the first components that handle incoming traffic from clients or external systems.
Lesson 1239Root Span and Entry PointsLesson 1321Redundancy and Parallel AvailabilityLesson 1674Connection Management at Scale
Load balancing strategies
Round-robin, least-connections, etc.
Lesson 842Control Plane: Configuration Management
Load Leveling
The queue acts as a shock absorber, smoothing out bursts of incoming messages so consumers can process them at a steady, sustainable rate.
Lesson 647Message Queue BasicsLesson 659Queue Use Cases: Work Distribution
Load shedding
at the recovering service (accept only partial load initially)
Lesson 1081Thundering Herd After Recovery
Load smoothing
means using a message queue as a buffer between producers (incoming requests) and consumers (processing services).
Lesson 649Load Smoothing and Backpressure
local cache
of recently received notification IDs (typically a hash set or sliding window buffer).
Lesson 1714Client-Side DeduplicationLesson 1801Local Caching for Performance
Local Communication
Sidecar-to-service communication uses localhost, avoiding physical network latency
Lesson 841Data Plane: Performance and Latency Overhead
Local development
Pretty-printed or line-formatted output for human consumption
Lesson 1166Human-Readable vs Machine-Parseable
Local disk
Fast append-only log files or snapshots
Lesson 1849URL Frontier Persistence and Recovery
Local numbering
Better delivery using local phone numbers vs international
Lesson 1685SMS Notifications
Local read replicas
of the follow graph
Lesson 1682Scaling to Billions of Daily Active Users
Local Transaction 1
Reserve hotel room → **Compensation**: Cancel hotel
Lesson 589Saga Fundamentals: Local Transactions and Compensations
Local Transaction 2
Book flight → **Compensation**: Cancel flight
Lesson 589Saga Fundamentals: Local Transactions and Compensations
Local Transaction 3
Charge credit card → **Compensation**: Refund card
Lesson 589Saga Fundamentals: Local Transactions and Compensations
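The three local transactions above, each paired with its compensation, can be sketched as a small runner that undoes completed steps in reverse order when a later step fails. This is a simplified, in-process illustration; `run_saga` and the step names are hypothetical, and a real saga would persist its progress between steps.

```python
def run_saga(steps):
    """Run steps in order; on failure, compensate completed steps in reverse.

    steps: list of (name, action, compensation) tuples, where action and
    compensation are zero-argument callables.
    Returns (succeeded, log).
    """
    log = []
    completed = []
    for name, action, compensation in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            return False, log
        log.append(f"done:{name}")
        completed.append((name, compensation))
    return True, log


if __name__ == "__main__":
    def decline_card():
        raise RuntimeError("card declined")

    ok, log = run_saga([
        ("reserve_hotel", lambda: None, lambda: None),
        ("book_flight", lambda: None, lambda: None),
        ("charge_card", decline_card, lambda: None),
    ])
    print(ok, log)
```

The failed step itself is not compensated, since its local transaction never committed.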
Local write
"Confirm as soon as one replica accepts it"
Lesson 1398Consistency Level Per-Operation
Localization
Automatically serve the right language based on user preferences without worker-level logic.
Lesson 1701Template Service for Content
Localization variants
(English, Spanish, French versions of the same template)
Lesson 1701Template Service for Content
Localized logging
means writing logs directly to disk on the server where your application runs.
Lesson 1169Centralized vs Localized Logging
locally
using shared public keys, eliminating the need to call the auth service for every request.
Lesson 950Auth Service Single Point of FailureLesson 1804Multi-Region Rate Limiting Challenges
Location
Typing "pizza" near downtown might prioritize "Pizza Palace on 5th Street" over generic "pizza recipes," using GPS coordinates or IP geolocation.
Lesson 1767Personalized Typeahead
Lock contention
when multiple threads need to update the same tracking structures
Lesson 154Implementation TradeoffsLesson 509Latency: The Hidden Cost of CAP
Lock resources
if voting YES, the participant promises not to roll back unilaterally
Lesson 570Phase 1: Prepare Phase
Lock timeout policies
(one system might abort while another waits)
Lesson 582Transaction Isolation Across Systems
Lock-based concurrency control
preventing dirty reads
Lesson 308Strong Consistency by Default
Locking mechanisms
(pessimistic vs optimistic)
Lesson 582Transaction Isolation Across Systems
Log audits
Periodically scan stored logs for sensitive patterns
Lesson 1163Avoid Logging Sensitive Data
Log buffering
acts like a holding area between your application and the aggregator.
Lesson 1155Log Buffering and Backpressure
Log context
about what was attempted for debugging
Lesson 1115Deadline Exceeded Error Handling
Log entries
New commands to replicate (if any)
Lesson 624AppendEntries RPC: Replication Mechanism
Log in background
Write click metadata to a **message queue** (Kafka, RabbitMQ) or fast write buffer
Lesson 1530Analytics and Click Tracking
Log length as tiebreaker
Longer log = more up-to-date
Lesson 627Safety: Leader Completeness Property
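In Raft, the up-to-date comparison looks at the term of the last log entry first and falls back to log length only when those terms are equal. A small sketch of that check; the function and parameter names are illustrative.

```python
def candidate_is_up_to_date(cand_last_term, cand_last_index,
                            my_last_term, my_last_index):
    """Return True if the candidate's log is at least as up-to-date as ours.

    The term of the last entry decides first; the log length
    (last index) is only the tiebreaker when the terms match.
    """
    if cand_last_term != my_last_term:
        return cand_last_term > my_last_term
    return cand_last_index >= my_last_index
```

A voter grants its vote only when this check passes, which keeps a candidate that is missing committed entries from winning an election.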
Log positions
Each consensus decision corresponds to a numbered slot in the replicated log (slot 1, slot 2, etc.
Lesson 616Multi-Paxos for Log Replication
Log replication
How the leader distributes entries to followers
Lesson 618Raft Overview: Understandability as a Design Goal
Log sanitization pipelines
Process logs through a scrubbing layer before they reach storage systems.
Lesson 1145Sensitive Data in Structured Logs
Log Shippers/Agents
Lightweight processes running on each service host that collect logs and forward them (e.
Lesson 1148Centralized Logging Architecture
Log the cancellation
for observability
Lesson 1094Timeout Cancellation and Cleanup
Log the incoming timeout
what budget did this service receive?
Lesson 1106Timeout Propagation Observability
Log the outgoing timeout
what budget did we pass to downstream services?
Lesson 1106Timeout Propagation Observability
Log4j
(Java): The veteran framework with hierarchical loggers, multiple appenders (file, console, syslog), and extensive configuration options.
Lesson 1136Logging Libraries and Standards
Logging & Monitoring
The gateway captures request/response metadata, timing, and errors consistently across all services without each team implementing their own logging format.
Lesson 876API Gateway as a Cross-Cutting Concern Hub
Logging and analytics
Losing a few log entries during a crash is acceptable
Lesson 137Write-Behind: Risks and Use Cases
Logical separation
Different document types don't mix
Lesson 383Collections and Databases
Logical shards
are data partitions defined by ranges or hash buckets of your shard key—they exist as a concept independent of physical hardware.
Lesson 235Logical vs Physical Shards
Login
Client sends credentials to the server
Lesson 912Token-Based Authentication Fundamentals
Logout is tricky
Need token blacklists (reintroducing state!
Lesson 916Session vs Token Tradeoffs
Long intervals
(every 30-60 seconds) reduce overhead but mean users might hit a dead server for longer before the load balancer notices.
Lesson 100Health Check Intervals and Timeouts
Long retention periods
Often 7 years for financial compliance
Lesson 944Auditing and Compliance for Authorization
Long timeouts
(10s) are more forgiving but delay detection of truly failed servers.
Lesson 100Health Check Intervals and Timeouts
Long TTL possible
Files can be cached for hours, days, or weeks
Lesson 173Content Types Suited for CDNs
Long-lived sessions
(WebSocket connections, streaming)
Lesson 87Least Connections Algorithm
Long-running operations
Tasks take seconds or minutes (e.
Lesson 654When to Use Async vs Sync
Long-running processes
Transactions span minutes to hours (e.
Lesson 598Saga Frameworks and Real-World Adoption
Long-running tasks
Data exports, PDF generation
Lesson 659Queue Use Cases: Work Distribution
Long-running transactions or workflows
that span multiple steps over minutes or hours—like filling out a multi-page form with real-time validation, or a collaborative editing session—benefit from keeping that context alive in memory rather than constantly retrieving it from external st...
Lesson 62When Stateful Services Are Necessary
Long-term access patterns
Popularity doesn't change rapidly
Lesson 147Least Frequently Used (LFU)
Long-term quota tracking
Decrement counters across larger windows (daily/monthly)
Lesson 994Quota Management and Burst Allowances
Longer intervals
(1–5s): Lower load, risk of burst violations
Lesson 1802Synchronization Strategies for Local Caches
Longer recovery time
RTO increases with chain length
Lesson 1422Incremental Backup Strategy
Longer TTLs (15-60 minutes)
Better performance but higher security risk if permissions change frequently.
Lesson 942Caching Authorization Decisions
Look up keys quickly
to avoid slowing down requests
Lesson 1011Idempotency Key Storage and Lookup
Look-Aside
gives you full control but requires more application code.
Lesson 142Look-Aside vs Inline Cache Topologies
Looks up
routing rules in its configuration
Lesson 1907Gateway-Level Version Routing
Looks up their tier
from tenant configuration (lesson 1819)
Lesson 1824Tiered Rate Limiting
Loose coordination
Accept 10-20% over-limit as acceptable error in exchange for <10ms local decisions.
Lesson 1804Multi-Region Rate Limiting Challenges
Loose coupling
means components don't know (or care) about each other's internal details.
Lesson 38Design for ChangeLesson 791Independent Deployability
Loss of customer trust
and potential legal liability
Lesson 1002The Double-Charge Problem
LOUDS
encodes the trie structure as two bit vectors: one for tree shape (using level-order traversal), another for labels.
Lesson 1759Trie Space Optimization Techniques
Low coupling
means services depend minimally on each other, communicating through well-defined interfaces rather than sharing internal details.
Lesson 818High Cohesion, Low Coupling in Service Design
Low Latency Priority
Results appear in milliseconds or seconds, not hours.
Lesson 737What is Stream Processing?
Low network overhead
Each health check consumes bandwidth and server resources
Lesson 100Health Check Intervals and Timeouts
Low performance impact
during snapshot creation
Lesson 1426Snapshot-Based Backups
Low priority queue
Marketing emails, digests, recommendations
Lesson 1700Priority Queues and Urgency Levels
Low resolution, long retention
5-minute or hourly aggregates kept for 1+ years—captures long-term trends cheaply
Lesson 1270Monitoring Resolution and Retention Tradeoffs
Low-lag replicas
get priority for latency-sensitive reads
Lesson 218Lag-Aware Load Balancing
Low-latency inference
Models must score documents in milliseconds
Lesson 1781Machine Learning for Ranking
Low-resolution (1 hour)
Keep 1-2 years for long-term capacity planning
Lesson 1213Metric Retention Policies
Low-risk scenarios
where sequential IDs work fine:
Lesson 1515Short URL Predictability Tradeoffs
Lower accuracy, lower latency
Local counters with periodic sync via gossip protocols.
Lesson 985Trade-offs: Accuracy vs Latency
Lower infrastructure costs
Fewer running services
Lesson 904BFF vs Single Gateway Tradeoffs
Lower memory footprint
No dedicated threads sitting idle
Lesson 1070Semaphore-Based Bulkheads: Limiting Concurrent Requests
Lower Operational Complexity
Your operations team manages one deployment unit.
Lesson 783Deployment Simplicity: Monolith Advantage
Lower priority
Sub-millisecond global consistency (nice-to-have)
Lesson 18Prioritizing Requirements Under Constraints
Lower storage cost
on the monitoring backend (only sending a few percentile values)
Lesson 1186Summary Metrics
Lower write latency
– Users see faster response times
Lesson 1356Asynchronous Replication: Speed and Risk
Lowercasing
"Search" → "search" (so "Search" matches "search")
Lesson 1733Document Processing Pipeline
Lowers origin bandwidth costs
significantly
Lesson 1614Origin Shield Pattern
Lowest latency path
Writes propagate directly between any two leaders without routing through intermediaries
Lesson 1369Multi-Leader Topologies: All-to-All
LRU
(Least Recently Used) tracks *when* items were last used, while **LFU** tracks *how many times* each item has been accessed.
Lesson 147Least Frequently Used (LFU)Lesson 152Adaptive Replacement Cache (ARC)Lesson 154Implementation Tradeoffs
LRU (Least Recently Used)
Removes items not accessed recently—great for hot data
Lesson 355Redis as a Cache
LRU cache
to automatically evict old entries.
Lesson 1714Client-Side Deduplication
LRU-K
tracks the last **K accesses** for each cache entry, not just one.
Lesson 151LRU-K and Advanced LRU Variants
LSH tables
to quickly find candidates with low Hamming distance.
Lesson 1855Near-Duplicate Detection with Simhash
LSM tree
separates writes from reads in time.
Lesson 415Write Path and LSM Trees
LSM tree-based storage
.
Lesson 433What is HBase?

M

Machine ID
(10 bits): which server generated this
Lesson 1511Distributed ID Generation
Machine Learning Model Training
Lesson 738Batch Processing Use Cases
Machine-parseable
means consistent field names, predictable types, and structured formats (JSON, structured text).
Lesson 1166Human-Readable vs Machine-Parseable
Mailgun
that handle the heavy lifting:
Lesson 1686Email Notifications
Main index on disk
– The larger, durable index already built from existing documents.
Lesson 1754Real-Time Indexing and Updates
Maintain backward compatibility
when service contracts change
Lesson 882Request and Response Transformation
Maintain consistency
one source of truth for ownership and permissions, even if storage tiers change
Lesson 1590Metadata Database Design
Maintain leadership
Proves the leader is operational
Lesson 624AppendEntries RPC: Replication Mechanism
Maintains open connections
that preserve state (WebSocket connections, database transactions)
Lesson 56What Makes a Service Stateful
Maintains performance
– prevents individual partitions from becoming bottlenecks
Lesson 1475Dynamic Range Splitting
Maintenance burden
Libraries need updates in every service; proxies update centrally through control plane
Lesson 830Service Mesh vs Library-Based Solutions
Maintenance window limits
Often capped at X hours per month or quarter
Lesson 1328Scheduled Maintenance and Availability Accounting
Maintenance windows
Restart a consumer service without blocking producers
Lesson 650Temporal Decoupling
MAJOR
Breaking changes—clients *must* update their code
Lesson 1906Semantic Versioning for APIs
Major compaction
merges all SSTables for a tablet into one file, removing deleted entries and old versions
Lesson 449Read Path and Compaction
majority agreement
if more than half the nodes agree, that's enough.
Lesson 605Quorums and Majority AgreementLesson 636Consensus for Leader Election
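"More than half" translates into a simple formula. The helper names below are hypothetical.

```python
def quorum_size(n):
    """Smallest number of nodes that is strictly more than half of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many nodes can fail while a majority can still be formed."""
    return n - quorum_size(n)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

A 4-node cluster needs 3 votes and tolerates 1 failure, the same as a 3-node cluster, which is one reason odd cluster sizes are common.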
Make additive changes first
Add new columns as nullable or with defaults
Lesson 265Schema Changes in Sharded Environments
Make the call anyway
with a minimal timeout (e.
Lesson 1102Handling Zero or Negative Timeouts
Making optional fields required
Requests missing the now-mandatory field get rejected
Lesson 1905Breaking vs Non-Breaking Changes
Making required fields optional
Relaxing requirements never breaks callers
Lesson 1905Breaking vs Non-Breaking Changes
Managed
Pay-per-request or tiered pricing.
Lesson 900Open-Source vs Managed Gateway Tradeoffs
Managed control plane
You don't operate the control plane infrastructure yourself.
Lesson 864AWS App Mesh and Cloud-Native Meshes
Managed gateways
offload this burden to the cloud provider.
Lesson 900Open-Source vs Managed Gateway Tradeoffs
Managed services
provide polished, production-ready features out-of-the-box—often with better integrations for their cloud ecosystem (serverless functions, IAM, logging).
Lesson 900Open-Source vs Managed Gateway Tradeoffs
Managed/serverless
(zero ops): SQS, SNS, Google Cloud Pub/Sub, Azure Service Bus
Lesson 676Choosing Between Message Broker Technologies
manifest file
that tells the player: "Here are all available quality options—pick the best one based on current bandwidth.
Lesson 1602Adaptive Bitrate Streaming (ABR)Lesson 1625Adaptive Bitrate Streaming
Manifest generation
Build HLS/DASH file listing all qualities with bandwidth requirements
Lesson 1602Adaptive Bitrate Streaming (ABR)
manual
(a human promotes a replica) or **automatic** (software detects failure and promotes automatically).
Lesson 207Replica Promotion and Failover BasicsLesson 1311Toil: The Enemy of Scale
Manual cleanup
Expired sessions pile up and waste memory
Lesson 356Redis as a Session Store
Manual commit
Your application explicitly tells Kafka "I've processed up to offset 150.
Lesson 710Offsets and Commit Strategies
Manual instrumentation
means you explicitly create spans in your code to track custom business logic, internal functions, or application-specific workflows that frameworks can't automatically detect.
Lesson 1224Automatic vs Manual Instrumentation
Manual review storage
Engineers can investigate patterns in failures
Lesson 1705Retry and Dead Letter Queues
Manual runbook
documented steps in a wiki
Lesson 1441Runbooks and Automation
Manual-ack
Consumer explicitly sends acknowledgment after processing (safer—enables at-least-once delivery)
Lesson 681Acknowledgment Mechanisms
Manual-ack (post-processing)
The consumer explicitly confirms after successfully processing the message
Lesson 683Consumer Acknowledgment Timing
Many applications tolerate
brief inconsistency (social feeds, caches, recommendations)
Lesson 532Why Eventual Consistency Exists
map
(like a dictionary or hash table), but with three coordinates instead of one:
Lesson 444Data Model: Sparse, Distributed, Multi-Dimensional MapLesson 743Batch Processing Frameworks
Map to Ring Position
This hash value corresponds to a point on the ring
Lesson 1854Distributed URL Deduplication
MapReduce
(the original framework) splits work into two phases:
Lesson 743Batch Processing Frameworks
MapReduce-style distributed processing
.
Lesson 1746Index Construction at Scale
Mark work as abandoned
so retries don't double-process
Lesson 1094Timeout Cancellation and Cleanup
Marketing analytics
RPO of hours or days (historical trends less time-sensitive)
Lesson 1411Defining Recovery Point Objective (RPO)
Massive bandwidth capacity
CDN networks handle petabytes of traffic daily across hundreds of PoPs.
Lesson 189DDoS Protection and Security at CDN EdgeLesson 195CDN for DDoS Protection
match
patterns within your data—all using a JSON-like syntax that mirrors the document structure itself.
Lesson 393MongoDB Query Language BasicsLesson 461Cypher Query Language Fundamentals
Materialized aggregates
When the same aggregations run repeatedly and slight delays are acceptable
Lesson 284Aggregation Query Optimization
Max lifetime
The absolute maximum age of any connection before forced retirement and replacement.
Lesson 272Connection Timeouts and Limits
Max retries reached
→ Message moved to DLQ
Lesson 687Dead Letter Queues
Max retry count
How many times to attempt redelivery
Lesson 684Negative Acknowledgments and Redelivery
Max retry limit
Prevent infinite loops
Lesson 687Dead Letter Queues
Maximum availability
Set W=1, R=1 (sacrifices consistency)
Lesson 1361Quorum-Based Replication
Maximum availability during failures
→ Leaderless with sloppy quorums
Lesson 1376Topology Selection Tradeoffs
Maximum compactness
A counter of 1 billion encoded in Base62 is just 6 characters (`15ftgG`)
Lesson 1516Counter-Based vs UUID Approaches
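A minimal sketch of the Base62 encoding behind that figure. The alphabet order (digits, then uppercase, then lowercase) is an assumption, chosen because it reproduces the `15ftgG` example; the name `base62_encode` is illustrative.

```python
import string

# Assumed alphabet order: 0-9, A-Z, a-z (this ordering reproduces the example above).
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62_encode(n: int) -> str:
    """Encode a non-negative integer counter as a Base62 string."""
    if n < 0:
        raise ValueError("counter must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, remainder = divmod(n, 62)
        chars.append(ALPHABET[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))


print(base62_encode(1_000_000_000))
```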
Maximum consistency
Set W=N, R=1 (becomes synchronous replication)
Lesson 1361Quorum-Based Replication
Maximum flexibility
You can assign partitions arbitrarily based on access patterns, data size, or node capacity
Lesson 1476Directory Partitioning Fundamentals
Maximum Retry Attempts
caps the number of tries:
Lesson 1025Maximum Retry Attempts and Timeout Budgets
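One way to cap the number of tries, as a hedged sketch. The helper name `call_with_retries` and the default of three attempts are assumptions for illustration; a production client would also add backoff and retry only errors known to be transient.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(operation: Callable[[], T], max_attempts: int = 3) -> T:
    """Run `operation`, retrying on failure until `max_attempts` tries are used."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:  # sketch only: real code should filter error types
            last_error = exc
    # Every attempt failed, so surface the most recent error to the caller.
    raise last_error
```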
MaxScale
automatically inspect incoming SQL queries and route them intelligently:
Lesson 222Proxy-Based Read-Write Splitting
Maybe retry with caution
500 Internal Server Error (could be transient *or* a bug)
Lesson 1026Retry on Which Errors
Meaningful
Verify real functionality, not just "process is running"
Lesson 1339Health Checks and Failure Detection
Measure actual recovery time
from failure detection to full service restoration
Lesson 1419Measuring and Testing RPO/RTO Compliance
Measure actual restore times
to validate your RTO assumptions
Lesson 1430Backup Verification and Testing
Measure everything
using metrics, logs, and traces to understand system behavior
Lesson 1307What is Site Reliability Engineering (SRE)?
Measure RPO/RTO
against your targets to validate feasibility
Lesson 1438DR Testing Strategies
Medium (hours)
Email + in-app (e.
Lesson 1688Channel Selection Strategy
Medium priority
Handle 10x growth (scalability)
Lesson 18Prioritizing Requirements Under Constraints
Medium priority queue
Likes, comments, follower updates
Lesson 1700Priority Queues and Urgency Levels
Medium resolution, medium retention
1-minute intervals kept for 30 days—balances detail with historical analysis
Lesson 1270Monitoring Resolution and Retention Tradeoffs
Medium scale (1,000 req/sec)
180ms × 1,000 = 180 seconds wasted/second
Lesson 276Why Query Optimization Matters at Scale
Medium server
(4 cores, 8GB RAM): 100 virtual nodes
Lesson 1465Heterogeneous Node Weights
Medium-resolution (1-5 minutes)
Keep 30-90 days for recent trend analysis
Lesson 1213Metric Retention Policies
Memory (RAM)
Motherboards have finite RAM slots.
Lesson 46Hardware Limits of Vertical Scaling
Memory allocations
for intermediate string buffers
Lesson 1143Performance Impact of Structured Logging
Memory available
Use **Sliding Window Log** (stores every request timestamp)
Lesson 975Algorithm Selection Criteria
Memory efficiency
Only needs to track the current path, not all discovered URLs at a level
Lesson 1830Breadth-First vs Depth-First Crawling
Memory efficient
(one counter per window)
Lesson 967Fixed Window Counter
Memory exhaustion
in your metrics system (Prometheus stores every series in RAM)
Lesson 1211Avoiding High-Cardinality LabelsLesson 1887Why Pagination Is Essential at Scale
Memory footprint
Each sidecar process requires its own memory allocation, usually 50–100MB per pod at minimum.
Lesson 865Performance Overhead: Latency and Throughput
Memory limit reached
The cache has used its allocated RAM
Lesson 145What Are Cache Eviction Policies?
Memory overhead
to track access patterns (timestamps, counters, history lists)
Lesson 154Implementation Tradeoffs
Memory per server
– The difference between 8 GB and 16 GB affects instance type selection
Lesson 32Rounding and Approximation Techniques
Memory Safety
Rust's compile-time guarantees eliminate entire classes of bugs common in C/C++, resulting in fewer crashes and security vulnerabilities.
Lesson 862Linkerd: Lightweight Service Mesh
Memory upgrade
8GB RAM → 64GB RAM to cache more data
Lesson 43What is Vertical Scaling?
Memory write (MemTable)
Writes go immediately to an in-memory structure (often a sorted tree).
Lesson 415Write Path and LSM Trees
Memory-heavy operations
Processing large file uploads
Lesson 971Concurrency Limiter Pattern
Memory-mapped indexes
Load the entire compressed trie into memory using mmap, enabling sub-millisecond lookups without serialization overhead.
Lesson 1776Typeahead Index Optimization
Memory/storage grows
with active user count
Lesson 909Session-Based Authentication Fundamentals
MemStore
After the WAL confirms the write, data is written to the MemStore, an in-memory buffer for each column family.
Lesson 436HBase Write Path and WAL
Memtable (In-Memory Buffer)
Simultaneously, the write is also stored in an in-memory structure called a **memtable**.
Lesson 426Write Path and Commit LogLesson 448Write Path: MemTable and Commit Logs
Merge
Combine and rank both sources in real-time before returning the feed
Lesson 1655Celebrity Follower CachingLesson 1772Real-Time Index Updates
Merge in recent posts
from celebrities they follow (pull operation)
Lesson 1639Hybrid (Pull-Push) Feed Model
Merge results
combine data from multiple sources, keeping the most recent version
Lesson 429Read Path and Bloom FiltersLesson 449Read Path and Compaction
Message delays
Network packets can arrive late or out of order
Lesson 608The Problem Paxos Solves
Message durability
means writing messages to disk storage before acknowledging receipt, ensuring they persist beyond memory and process failures.
Lesson 651Message Durability
Message Expiration
Set time-to-live on messages
Lesson 671ActiveMQ and Traditional Enterprise Messaging
Message Groups
Ensure related messages go to the same consumer
Lesson 671ActiveMQ and Traditional Enterprise Messaging
Message Selectors
Filter messages using SQL-like syntax
Lesson 671ActiveMQ and Traditional Enterprise Messaging
Metadata checks
Ensuring backup catalogs are complete and accurate
Lesson 1408Backup Verification and Testing
Metadata drift
Object storage count diverging from database metadata count (indicates orphaned files)
Lesson 1574Monitoring Expiration and Storage Health
Metadata is shared
All nodes know about exchanges, bindings, users, and permissions
Lesson 668RabbitMQ Clustering and High Availability
Metadata linking
Store references in your metadata database so retrieval services know where to find each size
Lesson 1624Thumbnail and Preview Generation
Metadata requirements
(email headers, push categories, deep link schemes)
Lesson 1692Channel-Specific Formatting
Metadata tracking
Update your database to flag archived status
Lesson 1623Cold Storage and Archival
Metadata-rich
Store content-type, encoding, resolution tags with each file
Lesson 1588Object Storage vs Block Storage
Metric documentation
solves this by creating a searchable catalog where each metric includes:
Lesson 1216Metric Documentation and Discovery
Metric retention policies
define how long you keep data at different resolutions.
Lesson 1213Metric Retention Policies
Metrics SDK
specifically handles creating, collecting, and exporting metrics using a vendor-agnostic API.
Lesson 1205OpenTelemetry Metrics SDK
Microsecond latency
Critical when you're checking limits on every request
Lesson 1807In-Memory vs Persistent Storage for Rate Limiting
Microservices
enable independent scaling but increase operational overhead
Lesson 39Trade-offs Over Best Practices
Microservices authorization
Service A needs to call Service B on behalf of a user with specific scopes/permissions.
Lesson 920OAuth2 Fundamentals and Use Cases
Microservices-friendly
Pass token between services without shared state
Lesson 916Session vs Token Tradeoffs
Middle ground
Redis with short TTLs and accept slight over-limit bursts.
Lesson 985Trade-offs: Accuracy vs Latency
Migrate data
– move appropriate rows to the new partition
Lesson 1475Dynamic Range Splitting
Migration scripts
or codemods when possible
Lesson 1909Client SDK Versioning and Distribution
MIME type
(Multipurpose Internet Mail Extensions).
Lesson 1833Content Type Detection
Minimal data movement
Only affected key ranges move, not entire partitions
Lesson 372Consistent Hashing in Dynamo
Minimal perfect hashing
for leaf nodes
Lesson 1759Trie Space Optimization Techniques
Minimal protocol overhead
lightweight TCP connections with simple text-based commands
Lesson 673NATS and Lightweight Messaging
Minimal reshuffling
Adding/removing nodes only affects adjacent ranges
Lesson 1854Distributed URL Deduplication
Minimal Resource Footprint
The Linkerd proxy typically uses 10-20MB of memory per instance (compared to Envoy's 50-100MB), making it dramatically cheaper to run at scale.

Lesson 862Linkerd: Lightweight Service Mesh
Minimal RTO
Traffic instantly reroutes to healthy region
Lesson 1436Active-Passive vs Active-Active DR
Minimal storage
only changes since last backup
Lesson 1422Incremental Backup Strategy
MINOR
New backward-compatible features—clients *can* benefit but aren't forced to change
Lesson 1906Semantic Versioning for APIs
Minor compaction
merges a few recent SSTables
Lesson 449Read Path and Compaction
Minutes (5-15)
For transient errors and immediate retries
Lesson 1712Deduplication Windows and Storage
Mirrored queues
(classic feature) or **quorum queues** (modern, Raft-based) replicate queue contents across multiple nodes:
Lesson 668RabbitMQ Clustering and High Availability
MirrorMaker 2
(MM2) is Kafka's built-in tool for replicating topics between clusters.
Lesson 726Multi-Datacenter Replication
Miss
If not found, fetch from database
Lesson 355Redis as a Cache
Miss rate
is the opposite—requests that went to the backend.
Lesson 166Monitoring Cache Performance
Misses
requests that went to the source
Lesson 129Cache Hit Ratio Optimization
Missing alerts
for unmonitored failure modes
Lesson 1296Post-Incident Alert Review
Missing dependencies
like encryption keys or configuration files
Lesson 1430Backup Verification and Testing
Missing entries
(lagging behind the new leader)
Lesson 629Log Inconsistencies and Repair
Mission-critical applications
where data correctness cannot be compromised, even under scale
Lesson 337When to Choose NewSQL
Mobile app
requests `/user/profile` → Gateway fetches full user data but returns only `{id, name, avatarUrl}`
Lesson 875Client-Specific API Composition
Mobile BFF
Optimizes for limited bandwidth, touch interfaces, and offline capabilities
Lesson 902Backend-for-Frontend (BFF) Pattern Overview
Mobile/SPA clients
Native apps and single-page applications that can't securely store credentials use OAuth2 flows to obtain tokens.
Lesson 920OAuth2 Fundamentals and Use Cases
Moderate burn (6-hour window)
If you see 0.
Lesson 1289Multi-Window and Multi-Burn-Rate Alerting
Moderate complexity
RabbitMQ clusters, Redis
Lesson 676Choosing Between Message Broker Technologies
Modern Protocol Support
Layer 7 load balancing for HTTP/2, gRPC, and WebSocket connections, plus advanced features like automatic retries and traffic shadowing.
Lesson 115Envoy Proxy Architecture
Modern Protocols
Native HTTP/2, gRPC, and WebSocket support
Lesson 840Data Plane: Envoy Proxy Fundamentals
modular monolith
is a single deployable application that's internally organized into well-defined, loosely-coupled modules with clear boundaries.
Lesson 790Modular Monoliths as Middle GroundLesson 825Starting with a Modular Monolith
Modular reuse
through references (`{{ ref('other_model') }}`)
Lesson 774dbt for Analytics Engineering
Monetizes
your API by making higher limits a paid feature
Lesson 990Tiered Rate Limits for Different User Classes
Monitor actively
Track when you're running degraded so you can fix root causes
Lesson 1336Graceful Degradation
Monitor effectiveness
Track cache hit ratios after warming
Lesson 161Cache Warming Strategies
Monitor expiration
Verify older backups in your retention period still work
Lesson 1408Backup Verification and Testing
Monitor health
Detect when tablet servers fail and reassign their tablets to healthy servers
Lesson 447Master Server and Metadata Management
Monitor health signals
queue depth, response times, error rates, circuit breaker states
Lesson 1084Load Shedding Under Cascading Failure
Monitor partition size
– track row count, disk space, or request load per partition
Lesson 1475Dynamic Range Splitting
Monitor their outcomes
closely
Lesson 1060Half-Open State Testing
Monitor your cardinality
Track the number of active time series.
Lesson 1210Cardinality Management
Monitoring and alerting
Detect shard failures quickly to minimize user impact
Lesson 266Shard Failure and Partial OutagesLesson 1656Fanout Failure Handling
Monitoring and Alerting Isolation
Lesson 1790Multi-Tenancy Considerations
Monitoring and observability tools
must now track hundreds of metrics across dozens of services instead of one application.
Lesson 811Infrastructure and Tooling Costs
Monitoring and testing
to validate recovery capabilities
Lesson 1413The Cost-Availability Tradeoff
Monitoring complexity
Instead of watching one database's CPU, memory, disk I/O, and query performance, you must monitor all shards independently.
Lesson 264Operational Complexity of Sharded Systems
Monitoring happens silently
– The breaker tracks successes and failures in the background
Lesson 1045The Three States: Closed
Monitoring signal
High DLQ volume indicates systemic problems
Lesson 1705Retry and Dead Letter Queues
Monitoring sprawl
You need distributed tracing to follow requests across services, aggregated logging to debug issues, and service-specific metrics
Lesson 803Operational Overhead
Monolithic Applications
A mesh is designed for inter-service communication.
Lesson 835When You Don't Need a Service Mesh
Monotonic increase
They never decrease during normal operation
Lesson 1174Counter Metrics
Monotonic Read Consistency
guarantees that once a client reads a particular version of data, all subsequent reads will return that version or a newer one—never an older one.
Lesson 1391Monotonic Read Consistency
Monotonic read violations
Reading from different replicas shows inconsistent timelines
Lesson 1358Replication Lag in Async Systems
Monotonic Reads Consistency
guarantees that once a client reads a particular version of data, all future reads by that same client will return that version or a newer one—never an older version.
Lesson 543Monotonic Reads Consistency
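A client-side sketch of how this guarantee might be enforced, assuming each replica response carries a monotonically increasing version number. The class `MonotonicReader` and the `(version, value)` tuple shape are hypothetical.

```python
class MonotonicReader:
    """Tracks the newest version this client has seen and refuses older reads."""

    def __init__(self) -> None:
        self.last_seen_version = 0

    def read(self, replica_responses):
        """Return the value from the first replica that is not behind this client.

        `replica_responses` is a list of (version, value) tuples, one per replica.
        """
        for version, value in replica_responses:
            if version >= self.last_seen_version:
                self.last_seen_version = version
                return value
        raise LookupError("no replica has caught up to the last version read")
```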
Monotonic write consistency
ensures that if a client performs write W1 followed by write W2, any replica that applies W2 has already applied W1.
Lesson 1392Monotonic Write Consistency
Monotonic writes
is a consistency guarantee stating that if a single client performs multiple write operations, those writes will be applied to all replicas *in the same order* they were issued.
Lesson 536Monotonic WritesLesson 537Writes-Follow-Reads ConsistencyLesson 544Monotonic Writes Consistency
Monotonic Writes Consistency
ensures that a single client's write operations are applied to all replicas in the exact order they were issued, preventing out-of-order updates.
Lesson 544Monotonic Writes Consistency
Month 0
Announce deprecation, publish migration guide
Lesson 1903Version Deprecation Strategies
Month 12
Fully retire old version
Lesson 1903Version Deprecation Strategies
Month 6
Add deprecation warnings to responses
Lesson 1903Version Deprecation Strategies
Month 9
Restrict new client registrations to new version
Lesson 1903Version Deprecation Strategies
Monthly backups
(grandfathers): Keep 12+ months
Lesson 1406Backup Retention Policies
More network traffic
Leader sends AppendEntries to all followers
Lesson 639Consensus Cluster Sizing Tradeoffs
Move clockwise
Scan in the clockwise direction
Lesson 1459Clockwise Key Assignment Rule
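The clockwise scan can be sketched with a sorted list of node positions and a binary search. The hash function (SHA-256 truncated to 32 bits) and the names `ring_position` and `owner_for_key` are assumptions for illustration, and virtual nodes are omitted.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a position on a 2**32-slot ring."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def owner_for_key(key: str, nodes: list[str]) -> str:
    """Return the first node found when scanning clockwise from the key."""
    if not nodes:
        raise ValueError("ring has no nodes")
    ring = sorted((ring_position(node), node) for node in nodes)
    positions = [position for position, _ in ring]
    index = bisect.bisect_left(positions, ring_position(key))
    # Past the highest position, wrap around to the first node on the ring.
    return ring[index % len(ring)][1]
```

Removing a node that does not own a key leaves that key's owner unchanged, which is the property that keeps reshuffling small.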
Moving backwards in time
Refreshing shows older data
Lesson 1358Replication Lag in Async Systems
MTBF
= how long it stays on before burning out (e.
Lesson 1325Availability Formula: MTBF and MTTR Relationship
MTTR
= how long it takes you to replace it (e.
Lesson 1325Availability Formula: MTBF and MTTR Relationship
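The two terms combine in the standard steady-state formula, availability = MTBF / (MTBF + MTTR). A small sketch, with hypothetical numbers:

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a component is up: MTBF / (MTBF + MTTR)."""
    if mtbf_hours < 0 or mttr_hours < 0 or mtbf_hours + mttr_hours == 0:
        raise ValueError("MTBF and MTTR must be non-negative and not both zero")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical component: runs 999 hours between failures, takes 1 hour to repair.
print(f"{availability(999, 1):.3%}")
```

Halving MTTR improves availability about as much as doubling MTBF, which is why fast recovery is often the cheaper lever.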
Multi-AZ deployments
in cloud environments
Lesson 1321Redundancy and Parallel Availability
Multi-backend support
Export to multiple platforms simultaneously
Lesson 1205OpenTelemetry Metrics SDK
Multi-burn-rate
means calculating *how fast* you're burning your error budget relative to your SLO target.
Lesson 1289Multi-Window and Multi-Burn-Rate Alerting
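Burn rate is commonly computed as the observed error rate divided by the error rate the SLO allows. The sketch below assumes that definition; the function name and sample values are hypothetical.

```python
def burn_rate(observed_error_rate: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are being spent.

    A burn rate of 1.0 uses up the budget exactly at the end of the SLO period.
    """
    allowed_error_rate = 1.0 - slo_target
    if allowed_error_rate <= 0:
        raise ValueError("an SLO of 100% leaves no error budget to burn")
    return observed_error_rate / allowed_error_rate


# Hypothetical: 1% of requests failing against a 99.9% SLO.
print(round(burn_rate(0.01, 0.999), 2))
```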
Multi-Column (Composite) Index
A single index on `(user_id, status)` together.
Lesson 280Index Merge and Multi-Column Indexes
multi-datacenter replication
, provides tunable consistency knobs, and uses vector clocks (or dotted version vectors in newer versions) for conflict resolution.
Lesson 370Distributed Key-Value Store Architectures in PracticeLesson 726Multi-Datacenter Replication
Multi-entity atomicity
Operations affecting several related records (orders + inventory + payments)
Lesson 322Transaction Requirements and Trade-offs
Multi-hop reasoning
"Find medications that treat diseases caused by viruses discovered after 2000"
Lesson 475Knowledge Graphs and Semantic Networks
Multi-key indexes
handle array fields.
Lesson 385Indexing in Document Stores
Multi-leader replication
breaks this constraint—multiple nodes can independently accept writes at the same time, then synchronize changes with each other.
Lesson 1367Multi-Leader Replication FundamentalsLesson 1377What Are Replication Conflicts?
Multi-level caches
dramatically reduce latency:
Lesson 1742Search System Architecture Overview
Multi-Provider
Route high-priority emails through one vendor, bulk through another
Lesson 1690Channel Provider Abstraction
Multi-region replication
with network egress fees
Lesson 1413The Cost-Availability Tradeoff
Multi-Subscriber Support
Unlike point-to-point queues, multiple independent consumers must read the same stream without interfering.
Lesson 699Event Streaming Platform Requirements
Multi-tenancy
where CustomerA and CustomerB share infrastructure but must never access each other's data
Lesson 860Multi-Cluster and Multi-Tenancy
Multi-Tenancy Built-In
Pulsar natively supports hierarchical organization:
Lesson 730Apache Pulsar Architecture
Multi-tenancy, geo-replication
Pulsar
Lesson 735Choosing a Streaming Platform
Multi-tenant app
Shard key = `(tenant_id, user_id)` → keeps tenant data together while balancing users within.
Lesson 245Composite Shard Keys
Multi-Tier
or **Priority Queues**
Lesson 975Algorithm Selection Criteria
Multi-tier caching
introduces multiple levels of cache between users and your origin storage, each with a specific purpose:
Lesson 1611Multi-Tier Caching Architecture
Multi-tier setups
HAProxy instances at different layers (edge, internal services)
Lesson 112HAProxy Overview
Multi-version maintenance
You must maintain parallel codebases or clever abstraction layers
Lesson 1899URI Versioning (Path-Based)
Multi-window
means observing your error budget consumption across *several* time periods simultaneously (e.
Lesson 1289Multi-Window and Multi-Burn-Rate Alerting
Multiple client types
(web, mobile, IoT) needing different data shapes or protocols from the same backend services
Lesson 879When to Introduce an API Gateway
Multiple clusters
in different AWS regions for disaster recovery
Lesson 860Multi-Cluster and Multi-Tenancy
Multiple consumer instances
connect to the same queue
Lesson 661Competing Consumers Pattern
Multiple consumers
can read the same events without interference
Lesson 694Producers and Consumers
Multiple database replicas
with automatic failover
Lesson 1321Redundancy and Parallel Availability
Multiple dimensions
(read vs write limits, per-endpoint quotas)
Lesson 1824Tiered Rate Limiting
Multiple independent systems
need to react to the same event
Lesson 664Choosing Between Queue and Pub-Sub
Multiple perspectives
Different consumers can process the same events at different times for different purposes—one for real-time alerts, another for weekly reports.
Lesson 695Stream Retention and Replay
Multiple Script Support
Store romanized versions (transliterations) alongside native scripts.
Lesson 1768Typeahead for Multi-Language Support
Multiple sizes
Create several variants (small, medium, large) to support different UI contexts—grid views, detail pages, mobile screens
Lesson 1624Thumbnail and Preview Generation
Multiple SSTables
on disk — potentially slow
Lesson 416Read Path and Bloom Filters
Must deliver
Multi-channel redundancy (SMS + Push + Email)
Lesson 1688Channel Selection Strategy
Must-have
Users can upload and view photos (functional requirement)
Lesson 18Prioritizing Requirements Under Constraints
Mutations
Write operations for creating or modifying data (e.
Lesson 1912GraphQL Schema and Resolvers
Muted relationships
(follower muted this user)
Lesson 1653Selective Fanout Optimization
Mutual TLS
for secure service-to-service communication
Lesson 838Data Plane: Sidecar Proxy Pattern
Mutual TLS (mTLS)
extends standard TLS by requiring *both* client and server to present valid certificates and verify each other's identity.
Lesson 851Mutual TLS (mTLS) AuthenticationLesson 953Service-to-Service Authentication
MySQL
(simple, familiar relational DB)
Lesson 1242Zipkin Architecture and Design

N

N+1 query problem
multiplies network round-trips and database operations.
Lesson 405When Joins Are Required
N+1 redundancy
means if you need **N** components to handle your workload, you provision **N+1** — one extra.
Lesson 1333N+1 and N+2 Redundancy
N+2
components, surviving up to two simultaneous failures.
Lesson 1333N+1 and N+2 Redundancy
N+2 redundancy
takes it further: you provision **N+2** components, surviving up to two simultaneous failures.
Lesson 1333N+1 and N+2 Redundancy
Namespace Reuse
A 6-character base62 code gives you 56 billion combinations.
Lesson 1504Link Expiration and Retention Policies
Namespacing
means combining the idempotency key with additional scope identifiers to create a unique composite key:
Lesson 1017Idempotency Key Scope and Namespacing
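A minimal sketch of such a composite key. The choice of scope identifiers (tenant and endpoint) and all names here are assumptions; a tuple is used instead of string concatenation so that separator characters inside a value cannot cause collisions.

```python
def scoped_key(tenant_id: str, endpoint: str, client_key: str) -> tuple[str, str, str]:
    """Combine the client's idempotency key with its scope identifiers."""
    return (tenant_id, endpoint, client_key)


# Hypothetical store of previously returned responses, keyed by the composite key.
responses: dict[tuple[str, str, str], str] = {}
responses[scoped_key("tenant-a", "POST /payments", "key-123")] = "charge-1"
responses[scoped_key("tenant-b", "POST /payments", "key-123")] = "charge-2"

# The same client-supplied key in two tenants maps to two separate entries.
print(len(responses))
```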
Naming conventions
(already established) ensure consistency
Lesson 1216Metric Documentation and Discovery
Native format storage
Data stays in its original form (Parquet, JSON, logs, images)
Lesson 758Data Lake Fundamentals
native graph database
, meaning graphs aren't simulated on top of tables or documents—they're the fundamental storage structure.
Lesson 460Neo4j Architecture OverviewLesson 477Index-Free Adjacency and Physical Storage
Native Graph Storage Engine
stores nodes, relationships, and properties as separate, fixed-size records on disk.
Lesson 460Neo4j Architecture Overview
Native integration
App Mesh integrates directly with AWS CloudMap for service discovery, AWS Certificate Manager for mTLS certificates, and CloudWatch for metrics—no separate components to wire together.
Lesson 864AWS App Mesh and Cloud-Native Meshes
NATS
focuses on simplicity and performance for cloud-native applications, offering both request-reply and pub-sub with minimal overhead.
Lesson 665Overview of Message Broker LandscapeLesson 673NATS and Lightweight Messaging
Natural data locality
Related keys (timestamps, alphabetical names) cluster together
Lesson 1451Range-Based Partitioning
Natural keys
(`user_id`, `order_id`): Per-key ordering, risk of skew
Lesson 703Partitioning Strategies and Key Selection
Natural ordering
Applications that need sorted iteration (leaderboards, time-series analysis, pagination) benefit enormously.
Lesson 1471Range Partitioning Fundamentals
Natural pagination
"Give me 20 posts before timestamp X"
Lesson 1661Timeline Schema Design
Natural testing
Both regions constantly validated under real load
Lesson 1436Active-Passive vs Active-Active DR
Natural TTL support
Keys expire automatically, no manual cleanup needed
Lesson 1807In-Memory vs Persistent Storage for Rate Limiting
Navigate
from root → 's' → 'e' (2 operations)
Lesson 1758Trie Data Structure for Prefix Matching
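The navigation step can be sketched as one child lookup per prefix character, followed by a walk of the subtree to collect completions. Class and function names are illustrative, and ranking of suggestions is left out.

```python
class TrieNode:
    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.is_word = False


def insert(root: TrieNode, word: str) -> None:
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_word = True


def words_with_prefix(root: TrieNode, prefix: str) -> list[str]:
    node = root
    for ch in prefix:  # e.g. root -> 's' -> 'e' is two lookups for "se"
        node = node.children.get(ch)
        if node is None:
            return []
    results = []
    stack = [(node, prefix)]
    while stack:  # collect every complete word below the prefix node
        current, text = stack.pop()
        if current.is_word:
            results.append(text)
        for ch, child in current.children.items():
            stack.append((child, text + ch))
    return sorted(results)
```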
Nearly instantaneous creation
(seconds, not hours)
Lesson 1426Snapshot-Based Backups
Need user-priority differentiation
**Multi-Tier** or **Priority Queues**
Lesson 975Algorithm Selection Criteria
Neither dominates
The writes were concurrent.
Lesson 562Version Vectors and Conflict Detection
Nest resources
when the sub-resource **cannot exist without** or is **tightly owned by** the parent:
Lesson 1878Nested Resources and Sub-Resources
Netflix Conductor
is an orchestration engine that defines sagas as JSON workflows with tasks.
Lesson 598Saga Frameworks and Real-World Adoption
Network
60% bandwidth used, retransmit queue growing (saturation), packet drops (errors)
Lesson 1264USE Method: Utilization, Saturation, Errors
Network calls are unreliable
What was once a guaranteed in-memory function call can now timeout, fail mid-request, or succeed but never return a response.
Lesson 802Distributed System Complexity
Network congestion
Transferring gigabytes of JSON over HTTP takes minutes, not milliseconds.
Lesson 1887Why Pagination Is Essential at Scale
Network delay
Changes must travel over the network from primary to replicas.
Lesson 208Replication Lag: What It Is and Why It Happens
Network delays
CDN purges can take seconds to propagate globally
Lesson 163Multi-Level Cache Invalidation
Network Dependency
Unlike standalone SQL databases, NewSQL systems rely heavily on network reliability and bandwidth.
Lesson 336NewSQL Tradeoffs
Network locality
– Child nodes can be geographically grouped near their parent
Lesson 1374Tree Replication Topology
Network partition recovery
Even extended outages typically resolve within hours
Lesson 1012Idempotency Key Expiration Strategy
Network partition tolerance
If your primary region loses internet connectivity, you can still restore from a remote backup location.
Lesson 1429Geographic Backup Distribution
Network round-trip time
to/from the downstream service
Lesson 1098Per-Hop Timeout Budgets
Network saturation
Each span might be 1-5KB.
Lesson 1259Network and Agent Overhead
Network Serialization
Data must be marshaled from your service into the proxy, then unmarshaled, then re-marshaled to send to the next hop.
Lesson 841Data Plane: Performance and Latency Overhead
Network timeouts
Request takes too long, but the service is actually working
Lesson 1020Why Retries Are Necessary in Distributed Systems
Network/resource heavy
– reading and writing all data stresses infrastructure
Lesson 1402Full Backups
Never break anything
accumulate technical debt forever, supporting legacy behaviors that slow innovation
Lesson 1898Why API Versioning Matters
Never log
passwords, authentication tokens, full credit card numbers, encryption keys, or session tokens.
Lesson 1160Security and Access Control for Logs
Never log tokens
Even debug logs can expose credentials
Lesson 931OAuth2 Security Best Practices
New consumers
Launch a new analytics service?
Lesson 695Stream Retention and Replay
New documents
accumulate in an in-memory buffer
Lesson 1772Real-Time Index Updates
New entry requested
Application tries to cache additional data
Lesson 145What Are Cache Eviction Policies?
News feed
You can tolerate stale reads for speed.
Lesson 520Practical PACELC Analysis for Design Decisions
News feed generation
Eventual consistency (W=ONE, R=ONE) prioritizes speed over freshness
Lesson 563Tunable Consistency in Practice
Next allowed fetch time
Computed as `last_fetch_time + crawl_delay`, prevents fetching too soon
Lesson 1848Politeness Table and Per-Host State
Next fetch timestamp
(earliest time first)
Lesson 1847Heap-Based Priority Queue Implementation
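The two entries above fit together as a min-heap keyed on `last_fetch_time + crawl_delay`, so the host that becomes eligible earliest is popped first. This is a sketch using Python's `heapq`; the class `HostSchedule` and its method names are hypothetical.

```python
import heapq


class HostSchedule:
    """Min-heap of (next_allowed_fetch_time, host) pairs."""

    def __init__(self) -> None:
        self._heap: list[tuple[float, str]] = []

    def record_fetch(self, host: str, last_fetch_time: float, crawl_delay: float) -> None:
        # The next allowed fetch time is the last fetch plus the politeness delay.
        heapq.heappush(self._heap, (last_fetch_time + crawl_delay, host))

    def pop_ready(self, now: float):
        """Return the earliest host whose delay has elapsed, or None if none is ready."""
        if self._heap and self._heap[0][0] <= now:
            return heapq.heappop(self._heap)[1]
        return None
```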
Next read
fetches fresh data from the database
Lesson 158Event-Based Invalidation
no
, either wait briefly for it to catch up, or route to the primary (or a more up-to-date replica)
Lesson 216Timestamp-Based Consistency ChecksLesson 351Redis Persistence: AOF LogsLesson 957Rate Limiting vs Throttling
No boundary problem
A user can't exploit window edges to exceed limits (100 requests at 12:59:59, then 100 more at 13:00:00)
Lesson 968Sliding Window Log
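A minimal single-process sketch of a sliding window log, showing why the window-edge trick fails: the limit applies to the trailing window ending at each request, not to a fixed clock interval. The class name and the explicit `now` parameter are assumptions made to keep the example deterministic.

```python
from collections import deque


class SlidingWindowLog:
    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self._timestamps = deque()  # one entry per accepted request

    def allow(self, now: float) -> bool:
        # Discard timestamps that have fallen out of the trailing window.
        while self._timestamps and self._timestamps[0] <= now - self.window:
            self._timestamps.popleft()
        if len(self._timestamps) < self.limit:
            self._timestamps.append(now)
            return True
        return False
```

With a limit of 100 per 60 seconds, a burst of 100 requests at second 59 blocks further requests at second 60, because those 100 timestamps are still inside the trailing window.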
No bursts allowed
Use **Fixed Window Counter** or **Concurrency Limiter** for strict limits
Lesson 975Algorithm Selection Criteria
No cascade operations
Deleting a customer won't automatically delete their orders on another shard
Lesson 262Referential Integrity Across Shards
No causality awareness
LWW can't distinguish "B happened after seeing A" from "A and B happened independently"
Lesson 1381Limitations of Last-Write-Wins
No clear pattern
→ FIFO or Random (simpler, often "good enough")
Lesson 153Choosing an Eviction Policy
No complex join queries
at read time
Lesson 1638Push (Write-Time) Feed Model
No complex routing
pure subject-based pub-sub (like `orders.
Lesson 673NATS and Lightweight Messaging
No configuration guesswork
about bucket boundaries
Lesson 1186Summary Metrics
No consistency
(local memory): fastest, but limits multiply by server count
Lesson 976Rate Limiting State Storage
No consistency compromises
like eventual consistency
Lesson 332The NewSQL Value Proposition
No coordination needed
Consumers don't need to know about each other
Lesson 661Competing Consumers Pattern
No coordination overhead
means no performance penalty for adding instances
Lesson 57Scaling Stateless Services Horizontally
No coordination tax
Everyone works in the same codebase with shared context
Lesson 820When a Monolith is the Right Choice
No distributed coordination
– Each node tracks its own users independently
Lesson 982Sticky Sessions and Rate Limiting
No distributed locking
Each step commits immediately in its own database
Lesson 585Alternative: Saga Pattern Introduction
No distributed locks needed
Since events are immutable and append-only, there's no cross-system coordination during writes
Lesson 586Alternative: Event Sourcing for Consistency
No downtime
The existing index remains available during updates
Lesson 1772Real-Time Index Updates
No duplicate cross-service work
Each service processes the event exactly once (though retries within a service are possible)
Lesson 663Hybrid Patterns: Topic + Queue
No duplicates allowed
Each identifier must be unique across all records
Lesson 299Primary Keys and Entity Integrity
No duplication
of authentication code across dozens of services
Lesson 883Authentication at the Gateway
No enduring value
the problem returns—fixing it once doesn't prevent future occurrences
Lesson 1311Toil: The Enemy of Scale
No expiration
(permanent until manually deleted)
Lesson 1565Expiration Requirements and TTL Basics
No expired entries
TTL-based removal isn't enough to free space
Lesson 145What Are Cache Eviction Policies?
No fanout computation
– no need to write to millions of feed timelines
Lesson 1647Fanout-on-Read (Pull Model)
No guarantee
`R = 2`, `W = 2` → `2 + 2 = 4 ≤ 5`
Lesson 557The Quorum Condition: R + W > N
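The arithmetic in this entry can be checked mechanically. A minimal sketch, assuming a cluster of N = 5 replicas (implied by the `≤ 5` above); the function name `quorum_overlaps` is hypothetical:

```python
def quorum_overlaps(r: int, w: int, n: int) -> bool:
    """Return True when every read quorum must intersect every write quorum."""
    # R + W > N forces at least one replica to belong to both sets.
    return r + w > n


# Assumed cluster size of 5 replicas, matching the example above.
print(quorum_overlaps(2, 2, 5))  # 2 + 2 = 4 <= 5: read and write sets may miss each other
print(quorum_overlaps(3, 3, 5))  # 3 + 3 = 6 > 5: read and write sets always overlap
```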
No health awareness
DNS doesn't know if a server is down; clients may receive IPs of failed servers
Lesson 116DNS-Based Load Balancing
No horizontal scaling
Adding servers means losing session data
Lesson 356Redis as a Session Store
No hotspots
Unlike range-based partitioning, celebrity URLs don't concentrate on one shard
Lesson 1541Sharding and Database Scaling
No key (null)
Maximum parallelism, no ordering
Lesson 703Partitioning Strategies and Key Selection
No library maintenance
No need to update SDKs in five different languages when you change retry policy
Lesson 833Polyglot Microservices Support
No manual deletion logic
needed across your codebase
Lesson 165Versioned Cache Keys
No manual expiration needed
The algorithm self-adjusts based on actual access patterns
Lesson 1525Cache Eviction Policy for URL Shortener
No message persistence
by default—messages exist only while in flight
Lesson 673NATS and Lightweight Messaging
No metadata to synchronize
Every server can independently compute `jump_hash(key, num_nodes)` and get the same answer.
Lesson 1467Jump Hash: Stateless Alternative
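One way `jump_hash(key, num_nodes)` could look is sketched below, following the published jump consistent hash algorithm. It assumes the key is already a 64-bit integer; string keys would need to be hashed to an integer first, a step not shown here.

```python
def jump_hash(key: int, num_nodes: int) -> int:
    """Map a 64-bit integer key to a node index in [0, num_nodes)."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    key &= 0xFFFFFFFFFFFFFFFF
    b, j = -1, 0
    while j < num_nodes:
        b = j
        # 64-bit linear congruential step; the mask emulates uint64 overflow.
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        # Jump forward to the next candidate bucket.
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b
```

Because the function depends only on its two arguments, any server that knows the node count computes the same placement. When a node is added, a key either stays where it was or moves to the new node.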
No nulls allowed
Every record must have a valid identifier
Lesson 299Primary Keys and Entity Integrity
No perfect failure detection
You can't tell if a node crashed or is just slow
Lesson 608The Problem Paxos Solves
No personalization needed
Everyone gets identical content
Lesson 173Content Types Suited for CDNs
No predefined schema
You don't declare columns upfront.
Lesson 380Document Structure and Schema Flexibility
No query optimization
Document stores lack the sophisticated query planners found in relational databases that can rewrite queries, choose optimal join strategies, or parallelize operations efficiently.
Lesson 408Query Performance Limitations
No random page access
You can't jump directly to page 47; you must traverse sequentially
Lesson 1890Keyset Pagination
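The sequential traversal can be illustrated with a small sketch. The `posts` table, its columns, and the `fetch_page` helper are hypothetical, and an in-memory SQLite database stands in for a production store:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO posts (id, title) VALUES (?, ?)",
    [(i, f"post {i}") for i in range(1, 11)],
)


def fetch_page(after_id: int, page_size: int) -> list[tuple[int, str]]:
    """Return the next page of rows whose id is greater than the cursor."""
    # The WHERE clause seeks through the primary-key index instead of
    # skipping rows with OFFSET.
    return conn.execute(
        "SELECT id, title FROM posts WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()


page1 = fetch_page(after_id=0, page_size=4)
page2 = fetch_page(after_id=page1[-1][0], page_size=4)  # cursor = last id seen
```

Each page needs the last key of the previous page as its cursor, which is why jumping straight to an arbitrary page is not possible.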
No session awareness
Can't implement sticky sessions easily
Lesson 116DNS-Based Load Balancing
No shared memory
means no conflicts between instances
Lesson 57Scaling Stateless Services Horizontally
No single coordinator
Each shard is a separate database that can't directly enforce atomicity with other shards
Lesson 261Distributed Transactions Across Shards
No stale data
You'll never see outdated information, even during failures
Lesson 493CP Systems: Prioritizing Consistency
No stale reads
You never read old data after a write has been acknowledged
Lesson 484Consistency in CAP Context
No synchronization overhead
Direct in-memory counter updates
Lesson 1791Single Data Center vs Distributed Setup
No upfront validation
Any format, any source, dumped into storage
Lesson 764Data Governance and Quality
No write amplification
– When Bob posts, the system doesn't pre-generate feeds for his 1M followers
Lesson 1637Pull (Read-Time) Feed Model
Node added
Existing nodes must reduce their quotas; the new node claims its share
Lesson 984Quota Sharding Across Nodes
Node C
might lose network connection and not know what happened
Lesson 567The ACID Problem in Distributed Systems
Node failures
Servers can crash at any moment
Lesson 608The Problem Paxos Solves
Node removed
Remaining nodes must increase their quotas to maintain the global limit
Lesson 984Quota Sharding Across Nodes
Non-breaking changes
allow old clients to continue functioning without modification while new clients can adopt enhancements.
Lesson 1905Breaking vs Non-Breaking Changes
Non-critical data
(user timelines, recommendations, view counts): Use AP strategies.
Lesson 502Mixed Strategies: Hybrid Systems
Non-critical requests
(analytics, recommendations, notifications): aggressive throttling or temporary rejection
Lesson 995Graceful Degradation Through ThrottlingLesson 1084Load Shedding Under Cascading Failure
Non-functional
"Serve food within 15 minutes," "Handle 100 customers simultaneously," "Stay open 24/7," "Keep food costs under budget"—these describe *how well* it performs
Lesson 9Functional vs Non-Functional: Core Distinction
Non-functional requirements
describe *how well* the system performs its job—the quality attributes that matter but aren't features themselves.
Lesson 9Functional vs Non-Functional: Core DistinctionLesson 13Scalability Requirements: Growth Expectations
Non-idempotent operations
Lesson 998What is Idempotency?
Non-retriable failures
400 Bad Request, 401 Unauthorized—these are *your* fault, not the downstream service's.
Lesson 1057Failure Detection and Counting
Normal operation
Your distributed database can be both consistent *and* available
Lesson 504Why 'Choose Two' is OversimplifiedLesson 517PA/EL Systems: Availability and Latency First
Normal traffic
→ Use Round Robin for simplicity
Lesson 97Dynamic Algorithm Selection
Normalization
reduces data redundancy but may slow down read-heavy queries
Lesson 39Trade-offs Over Best PracticesLesson 1738Query Processing Flow
Normalize status codes
into standard states: `sent`, `delivered`, `read`, `failed`, `bounced`
Lesson 1693Delivery Receipt Tracking
Normalized
Optimizes for writes and data integrity—one update affects one row
Lesson 289Normalized vs Denormalized Schema Design
Not allowed
"Find all users where age > 30"
Lesson 342No Secondary Indexes or Query Language
Not CAP Available
= A branch closes its doors (refuses requests) until the courier connection is restored
Lesson 485Availability in CAP Context
Not complete safety
Still vulnerable if both primary and that one replica fail simultaneously
Lesson 217Semi-Synchronous Replication Trade-offs
Not idempotent
`POST /users` creating a new user without idempotency keys — each call creates a duplicate user with a new ID.
Lesson 1008What Makes an API IdempotentLesson 1875HTTP Methods: GET, POST, PUT, DELETE Semantics
Not idempotent (naturally)
`POST /wallet/charge` for $10 — each retry charges another $10.
Lesson 1008What Makes an API Idempotent
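A charge like this can be made safe to retry with an idempotency key. The sketch below is illustrative only: the in-memory dictionaries stand in for durable storage, and the names `charge`, `balances`, and `processed` are hypothetical.

```python
balances = {"wallet-1": 100}
processed: dict[str, int] = {}  # idempotency key -> balance recorded after the charge


def charge(wallet: str, amount: int, idempotency_key: str) -> int:
    """Deduct `amount` once per key; retries return the stored result."""
    if idempotency_key in processed:
        return processed[idempotency_key]  # retry: no second deduction
    balances[wallet] -= amount
    processed[idempotency_key] = balances[wallet]
    return balances[wallet]
```

A real service would also need to store the key and the balance change atomically, which this sketch does not attempt.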
NOT Operation
Set difference
Lesson 1739Boolean Query Operators
Not retry immediately
the operation took too long, so instant retry will likely fail again
Lesson 1115Deadline Exceeded Error Handling
Not strictly RESTful
Purists argue the resource (`/users/123`) shouldn't change identity based on representation
Lesson 1899URI Versioning (Path-Based)
Notification delivery
Does PagerDuty/Slack/email actually arrive?
Lesson 1295Testing Alerts and Dry Runs
Notification payload
Include deep-link data so tapping opens the specific post
Lesson 1681Mobile Push Notification Integration
Notification state tracking
means storing each notification's lifecycle status in a database so you can monitor, debug, and audit your entire notification pipeline.
Lesson 1706Notification State Tracking
Notification types
security alerts only, no marketing
Lesson 1702User Preferences Lookup
Notifications
are informational messages that provide awareness but don't require urgent response.
Lesson 1285Alert vs Notification
NTP synchronization
Keep all servers synchronized using Network Time Protocol, reducing skew to milliseconds.
Lesson 949Clock Skew and Token ValidationLesson 1114Clock Skew and Time Synchronization
Number of network hops
between routers
Lesson 169The Latency Problem CDNs Solve

O

O(1) average-case lookup time
.
Lesson 344Performance Characteristics
O(1) memory per user
instead of O(N) for every request, while maintaining ~99% accuracy of the full sliding window log.
Lesson 1797Sliding Window Counter with Redis
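The constant-memory idea can be sketched with two counters per user. This is an in-memory approximation under assumed names (`SlidingWindowCounter`, `allow`), not the Redis implementation the lesson covers; the caller supplies `now` in seconds.

```python
class SlidingWindowCounter:
    """Approximate a sliding window using two fixed-window counters."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.window_index = 0  # which fixed window curr_count belongs to
        self.curr_count = 0
        self.prev_count = 0

    def allow(self, now: float) -> bool:
        index = int(now // self.window)
        if index != self.window_index:
            # Roll forward; the old count only matters for the adjacent window.
            adjacent = index == self.window_index + 1
            self.prev_count = self.curr_count if adjacent else 0
            self.curr_count = 0
            self.window_index = index
        # Weight the previous window by how much of it still overlaps.
        elapsed = (now % self.window) / self.window
        estimated = self.prev_count * (1 - elapsed) + self.curr_count
        if estimated >= self.limit:
            return False
        self.curr_count += 1
        return True
```

Only three numbers are stored per user regardless of request volume, whereas a sliding window log stores one timestamp per request.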
O(depth)
rather than O(rows × joins).
Lesson 460Neo4j Architecture Overview
O(m)
where m = prefix length, independent of total vocabulary size.
Lesson 1758Trie Data Structure for Prefix Matching
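A minimal trie sketch shows where the O(m) bound comes from: the lookup visits one node per prefix character and never touches the rest of the vocabulary. Class and method names here are illustrative.

```python
class TrieNode:
    def __init__(self):
        self.children: dict[str, "TrieNode"] = {}
        self.is_word = False


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def has_prefix(self, prefix: str) -> bool:
        """Walk one node per character: O(m) for a prefix of length m."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return False
        return True
```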
O(page_size)
constant and predictable.
Lesson 1887Why Pagination Is Essential at Scale
O(total_records)
to **O(page_size)** — constant and predictable.
Lesson 1887Why Pagination Is Essential at Scale
OAuth scopes
(read:profile, write:orders)
Lesson 884Authorization and Policy Enforcement
OAUTHBEARER
Token-based authentication for modern systems
Lesson 727Kafka Security: Authentication and Encryption
Object
(the value or another thing)
Lesson 453Property Graphs vs RDF Triples
Object Count
How many items will you store?
Lesson 25Storage Estimation Basics
Object detection
identifies weapons, drugs, or prohibited items
Lesson 1629Content Moderation at Scale
Object Storage (S3/Blob)
stores the actual paste content using the paste ID as the key
Lesson 1556Hybrid Storage: Metadata + Content References
Object Storage (S3/GCS)
Cheapest for long-term retention but slow query performance; good for compliance archives
Lesson 1245Trace Storage Backends
ObjectId
A 12-byte unique identifier MongoDB generates automatically for document `_id` fields
Lesson 390BSON Format and Data Types
Observability Integration
The mesh automatically emits metrics when timeouts fire, correlating them with service topology and request traces.
Lesson 1126Timeout Configuration in Service Mesh
Observability tools
(monitoring, logging, alerting for their services)
Lesson 794Team Autonomy and Ownership
OData query syntax
Microsoft's protocol uses URL-encoded expressions:
Lesson 1893Complex Filtering with Query Languages
Off-peak scheduling
Run during low-traffic hours
Lesson 1568Scheduled Cleanup Job Design
Off-peak timing
Maintenance during low-traffic hours minimizes user impact even if excluded from SLA
Lesson 1328Scheduled Maintenance and Availability Accounting
On cache hit
Return the target URL immediately (fast path!
Lesson 1524Cache-Aside Pattern for URL Lookups
On cache miss
Fetch from database, populate the cache with the result, then return to user
Lesson 1524Cache-Aside Pattern for URL Lookups
On subsequent requests
The cached entry serves future requests instantly
Lesson 1524Cache-Aside Pattern for URL Lookups
On write
Update database immediately, then invalidate or update the cache entry.
Lesson 1722Real-Time Preference Updates
Onboarding
Assign one role instead of configuring dozens of permissions
Lesson 933Role-Based Access Control (RBAC) Fundamentals
One dominates
(all counters ≥ the other): The higher one happened after.
Lesson 562Version Vectors and Conflict Detection
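The dominance check can be written as a short comparison. In this sketch a version vector is a dictionary from node ID to counter, a missing node is treated as 0, and the function names are illustrative.

```python
def dominates(a: dict[str, int], b: dict[str, int]) -> bool:
    """True when every counter in a is >= the matching counter in b."""
    nodes = a.keys() | b.keys()
    return all(a.get(n, 0) >= b.get(n, 0) for n in nodes)


def compare(a: dict[str, int], b: dict[str, int]) -> str:
    a_ge, b_ge = dominates(a, b), dominates(b, a)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a after b"
    if b_ge:
        return "b after a"
    return "concurrent"  # neither dominates: a conflict to resolve
```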
One Primary database
handles ALL write operations (INSERT, UPDATE, DELETE)
Lesson 199Primary-Replica Architecture
One-time reads
Data accessed once but never again still gets treated as "recently used"
Lesson 151LRU-K and Advanced LRU Variants
Online Analytical Processing (OLAP)
.
Lesson 760Data Warehouse Architecture
Online resharding
migrates data while the system continues serving traffic:
Lesson 259Resharding Strategies: Stop-the-World vs Online
Online users
Send real-time WebSocket updates immediately
Lesson 1676Presence Detection and User Status
Only one succeeds
Due to consensus, only one client can create a specific lock resource
Lesson 637Distributed Locks via Consensus
OPA (Open Policy Agent)
Uses Rego language—more developer-friendly, JSON/YAML-compatible.
Lesson 936ABAC Policy Engines
Open → Half-Open
Wait through the timeout window, confirm the breaker allows test requests
Lesson 1065Testing Circuit Breaker Behavior
Open circuit
All requests fail immediately—**no retries attempted**
Lesson 1030Combining Retries with Circuit Breakers
Open content
No sensitive data behind the link
Lesson 1515Short URL Predictability Tradeoffs
Open Graph metadata
is a protocol (popularized by Facebook) where websites embed meta tags in their HTML:
Lesson 1538Link Preview and Metadata
Open rate
by time-of-day and day-of-week
Lesson 1729Analytics-Driven Optimization
Open rates
Email might cost 10× more than push but deliver 5× better engagement
Lesson 1694Channel Costs and Economics
Open-source solutions
offer deep customization through plugins, Lua scripting, or custom filters.
Lesson 900Open-Source vs Managed Gateway Tradeoffs
Opened
The user interacted with the notification by opening or viewing it.
Lesson 1724Notification Analytics Events
Opens
to stop the flow (rejects requests immediately)
Lesson 1044The Electrical AnalogyLesson 1064Monitoring and Metrics
OpenTelemetry
with vendor backends minimizes self-hosting burden.
Lesson 1208Choosing a Metrics System for Your ScaleLesson 1240OpenTelemetry Overview
OpenTelemetry Logs
An emerging standard that unifies logs with traces and metrics, providing consistent correlation IDs and context propagation across your observability stack.
Lesson 1136Logging Libraries and Standards
OpenTelemetry SDKs
(recommended modern approach)
Lesson 1244Google Cloud Trace
Operation status
is it processing, complete, or failed?
Lesson 1004Server-Side State for Idempotency
Operation-based CRDTs (CmRDTs)
Replicas send operations (add, remove) that are commutative—order doesn't matter.
Lesson 538Conflict-Free Replicated Data Types (CRDTs)Lesson 1384Conflict-Free Replicated Data Types (CRDTs)
Operational burden
Routine tasks like checking replication lag, rotating credentials, tuning query performance, or investigating alerts multiply by the number of shards.
Lesson 264Operational Complexity of Sharded SystemsLesson 803Operational Overhead
Operational control
Libraries require redeployment for policy changes; service meshes update configurations without redeploying apps
Lesson 830Service Mesh vs Library-Based Solutions
Operational knowledge
Teams must understand Kubernetes, service meshes, API gateways, and other distributed system tools
Lesson 803Operational Overhead
Operational learnings
After a few incidents, you realize your 99.
Lesson 1284Iterating on SLIs and SLOs
Operational needs
How far back do you realistically investigate incidents?
Lesson 1165Log Retention Policies
Operational Separation
Rate limiting requires fast, shared state (who made how many requests?
Lesson 1782Rate Limiter Service Overview
Operational Transformation
solves this by transforming each user's operation against concurrent operations from others, preserving everyone's *intent* even when the document state has changed.
Lesson 1385Operational Transformation
Operational Transformation (OT)
Transforms operations based on concurrent changes.
Lesson 1579Collaborative Editing and Real-Time Updates
Opsgenie
, and **VictorOps** centralize alerting, escalation, and communication—but the real power comes from *automation*.
Lesson 1305On-Call Tooling and Automation
Opt-out triggers
(too many notifications in short window)
Lesson 1729Analytics-Driven Optimization
Optimistic Locking
Use Redis `WATCH` to detect concurrent modifications and retry.
Lesson 981Race Conditions in Distributed Counters
Optimistic UI update
The client immediately shows your post in the feed (before server confirmation), then reconciles if the server response differs.
Lesson 1678Read-After-Write Consistency
Optimize
it (choose indexes, join order)
Lesson 286Prepared Statements and Query Caching
Optimized Proxy Implementations
Modern proxies like Envoy are written in C++ and highly optimized for throughput
Lesson 841Data Plane: Performance and Latency Overhead
Option A (CP)
Wait 5 seconds while the system ensures all 2 billion users see consistent data before confirming your post
Lesson 497Social Media and Content Feeds (AP)
Option B (AP)
Your post succeeds instantly, and it gradually propagates to followers over the next few seconds
Lesson 497Social Media and Content Feeds (AP)
Optional
Personalized recommendations, reviews, related products
Lesson 1083Graceful Degradation Strategies
Optional Features
Expiration times, syntax highlighting, privacy controls
Lesson 1542Pastebin System Overview
Orchestrates multiple backend calls
in parallel or sequence
Lesson 905BFF Implementation Patterns
Orchestration
centralizes the workflow in one orchestrator service.
Lesson 592Choreography vs Orchestration Tradeoffs
Orchestrator initiates
the first local transaction
Lesson 591Orchestration-Based Sagas
Order Management
Handles customer orders, validation, pricing
Lesson 815Domain-Driven Design and Bounded Contexts
Order processing acceptable later
Eventual consistency meets your needs
Lesson 654When to Use Async vs Sync
Ordered
New entries append to the end, improving B-tree index performance
Lesson 1520Primary Key Selection: Auto-Increment vs UUID
ordering
and **durability**—if a replica crashes, it can resume from its last known position without missing changes.
Lesson 206Replication Logs and MechanismsLesson 693The Commit Log Abstraction
Ordering (often)
Many queues preserve message order, delivering them in the sequence they were sent.
Lesson 647Message Queue Basics
Ordering and Prioritization
Not all URLs are equal.
Lesson 1838URL Frontier: Definition and Purpose
Orders of magnitude checks
– Round to nearest power of 10 (100, 1,000, 10,000)
Lesson 32Rounding and Approximation Techniques
Orders Service
owns order records—it's responsible for order lifecycle and status
Lesson 817Identifying Service Boundaries by Data Ownership
Organization ID
(B2B collaboration tools)
Lesson 244Entity-Based Sharding
Organizational clarity
Related data stays together
Lesson 383Collections and Databases
Organizational Features
Allow users to tag pastes, create folders, or assign categories.
Lesson 1578User Accounts and Paste Management
Organizational Maturity
Do you have mature DevOps practices?
Lesson 826Decision Framework for Microservices Adoption
Organizational trust
(freedom to make decisions within guidelines)
Lesson 794Team Autonomy and Ownership
Origin cache
(in-memory or SSD layer before object storage) handles the final backstop, reducing actual disk/object-store reads.
Lesson 1611Multi-Tier Caching Architecture
Origin caches
protect your object storage from direct requests
Lesson 1611Multi-Tier Caching Architecture
Origin offloading
The cryptographic operations (encryption, decryption, certificate validation) are CPU-intensive.
Lesson 187SSL/TLS Termination at the EdgeLesson 1609Why CDNs Are Essential for Media Hosting
Origin overload
Millions of requests for popular media crush your origin infrastructure
Lesson 1609Why CDNs Are Essential for Media Hosting
Origin servers
are your application's home base—they hold the original, authoritative version of your content (images, videos, JavaScript files, etc.
Lesson 170CDN Architecture: Edge Servers and OriginLesson 1630Live Streaming Architecture
Origin Shield Pattern
the shield validates tokens before hitting origin storage, preventing unauthorized access from reaching deeper layers.
Lesson 1615Signed URLs and Token-Based Access
OS page cache
sits between your application and the disk.
Lesson 127CPU and Disk Caching Layers
Out-of-memory crashes
in your monitoring system
Lesson 1192Cardinality and Label Explosion
Outages and downtime
Your server crashes because it can't handle the traffic spike
Lesson 2Why System Design Matters
Outbound policies
transform responses, add headers, or cache results
Lesson 899Azure API Management Features
Outbox pattern
Instead of publishing directly, write the message to an "outbox" table in the same database transaction as your business data.
Lesson 688Transactional Semantics
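A minimal sketch of the outbox write, using an in-memory SQLite database. The `orders` and `outbox` tables and the `place_order` helper are hypothetical; a separate relay process (not shown) would read unpublished rows and send them to the broker.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.execute(
    "CREATE TABLE outbox ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT, published INTEGER DEFAULT 0)"
)


def place_order(order_id: int, total: float) -> None:
    """Write the business row and its outgoing message in one transaction."""
    # `with conn` commits both inserts together, or rolls both back on error.
    with conn:
        conn.execute("INSERT INTO orders (id, total) VALUES (?, ?)", (order_id, total))
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"event": "order_placed", "order_id": order_id}),),
        )
```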
Outlier detection
(also called "ejection") monitors success rates and response times.
Lesson 852Circuit Breaking at the Mesh Level
Over-isolation
Creating 20 separate thread pools for similar internal services fragments your available resources
Lesson 1076Bulkhead Tradeoffs: Complexity and Resource Overhead
Over-provisioning required
Must buy capacity for peak traffic, even if idle 99% of the time
Lesson 108Hardware vs Software Load Balancers
Overage buffers
(from the previous lesson): Allow small temporary violations locally, knowing global state will converge.
Lesson 987Multi-Region Rate Limiting Challenges
Overhead
Meetings, training, planning
Lesson 1312Measuring and Reducing Toil
Overload
Too many devices drawing power simultaneously
Lesson 1044The Electrical Analogy
Overwhelm database write capacity
, causing latency spikes or timeouts
Lesson 1654Fanout Rate Limiting
Overwhelm storage backends
with millions of unique span combinations
Lesson 1258Cardinality Explosion
Overwrite conflicts
Once the divergence point is found, the leader sends all missing entries from that point forward.
Lesson 629Log Inconsistencies and Repair
Owned
Every action has a name attached
Lesson 1352Postmortem Structure and Action Items
Ownership metadata
(team tags embedded in service configs)
Lesson 1292Alert Routing and Escalation
Ownership Tracking
Add a `user_id` foreign key to your paste metadata table.
Lesson 1578User Accounts and Paste Management

P

P0 (Critical)
Complete service outage or major revenue loss.
Lesson 1298Incident Severity Levels and Escalation
P1 (High)
Significant degradation affecting many users.
Lesson 1298Incident Severity Levels and Escalation
P2 (Medium)
Partial feature failure or isolated user impact.
Lesson 1298Incident Severity Levels and Escalation
P4 (Trivial)
Cosmetic issues, documentation fixes.
Lesson 1298Incident Severity Levels and Escalation
P50
(median): Half your requests complete faster
Lesson 1093The P99 Problem with Timeouts
P95 latency
(not average response time) → reveals tail latencies affecting users
Lesson 1215Avoiding Vanity Metrics
P95 or P99 latencies
from production metrics (as covered in Adaptive Timeouts Based on Historical Latency).
Lesson 1118Per-Operation Timeout Configuration
P99
99% complete faster — but 1 in 100 naturally takes longer
Lesson 1093The P99 Problem with TimeoutsLesson 1188Percentiles and Tail Latencies
P99 or P999 metrics
under normal load.
Lesson 1093The P99 Problem with Timeouts
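For reference, the percentile entries above can be computed from raw latency samples with the nearest-rank method. This is one of several percentile definitions, and the helper name `percentile` is illustrative.

```python
def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Integer ceiling of p/100 * n, avoiding floating-point rounding.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


latencies_ms = list(range(1, 101))  # assumed sample: 1 ms through 100 ms
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
```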
PA/EC
Some DNS systems (available during partition, but consistent when stable)
Lesson 515PACELC Framework ExplainedLesson 519PA/EC Systems: Mixed Strategies
PA/EL
Cassandra, DynamoDB (available during partition, low-latency during normal ops)
Lesson 515PACELC Framework ExplainedLesson 517PA/EL Systems: Availability and Latency First
PACELC
gives you the full picture:
Lesson 515PACELC Framework Explained
PACELC framework
, a **PC/EC system** makes consistency its top priority in *both* scenarios:
Lesson 518PC/EC Systems: Consistency Always
PageRank or link popularity
(more incoming links = higher priority)
Lesson 1839FIFO vs Priority-Based Frontier
PagerDuty
, **Opsgenie**, and **VictorOps** centralize alerting, escalation, and communication—but the real power comes from *automation*.
Lesson 1305On-Call Tooling and Automation
Panels
are individual visualization blocks.
Lesson 1200Grafana for Metrics Visualization
Parallel Execution
Workers independently fetch each batch, then write the post reference to each follower's feed cache or storage.
Lesson 1652Fanout Worker Parallelization
Parallel processing
Score multiple documents simultaneously across CPU cores
Lesson 1741Search Latency and Response Time
Parallel Upload
Each part uploads with its part number and upload ID
Lesson 1586Multipart Upload for Large Files
Parallelism
Multiple consumers can read from different partitions simultaneously, dramatically increasing throughput.
Lesson 701Topics and Partitions
Parent nodes
Pairs of child hashes are combined and hashed together
Lesson 376Anti-Entropy with Merkle Trees
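The pairing step can be sketched as follows. SHA-256 and the duplicate-the-last-hash rule for odd levels are assumptions chosen for illustration; real systems differ on both.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Hash leaves, then combine pairs of child hashes until one root remains."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # assumed convention for an odd count
        # Each parent is the hash of its two children concatenated.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

Two replicas with identical data produce identical roots, so a single comparison can rule out divergence before any subtree is examined.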
Parent-Child (ChildOf)
The most common relationship.
Lesson 1232Span Relationships and Hierarchy
Parent-child links
show synchronous dependencies (blocking calls)
Lesson 1232Span Relationships and Hierarchy
Pareto Principle
or **80-20 rule**: roughly 80% of your requests hit only 20% of your URLs.
Lesson 1502Cache Memory Requirements
Parse provider-specific payloads
to extract status information
Lesson 1693Delivery Receipt Tracking
Parse the HTML
for Open Graph tags (og:title, og:description, og:image)
Lesson 1538Link Preview and Metadata
Parses
the version indicator (`v2` in the path)
Lesson 1907Gateway-Level Version Routing
Partial availability
recognizes that systems can be partially functional.
Lesson 1329Partial Availability and Graceful Degradation
Partial Failure Handling
If one shard times out or fails, decide whether to return incomplete results or fail the query
Lesson 1780Distributed Query Coordination
Partial failures become normal
In a monolith, the whole system is either up or down.
Lesson 802Distributed System Complexity
Partial restores
Recover individual files or database records
Lesson 1408Backup Verification and Testing
Partial Result Collection
Each shard returns its local top-K results with scores
Lesson 1780Distributed Query Coordination
Partial success tracking
mark completed tasks so retries skip them
Lesson 777Workflow Orchestration Patterns
partition
(network split), these systems continue accepting reads and writes from all sides of the split, even though different replicas might temporarily disagree.
Lesson 517PA/EL Systems: Availability and Latency FirstLesson 1806Rate Limiting with Consistent HashingLesson 1865Distributed URL Frontier Architecture
Partition by request type
Route celebrity writes through a specialized write path
Lesson 1483Celebrity User Problem
Partition Configuration
More partitions enable higher parallelism and throughput, but increase coordination overhead and rebalancing time.
Lesson 724Kafka Performance Tuning
Partition Followers
When a post is created, the fanout service queries the follow graph and splits the follower list into chunks (e.
Lesson 1652Fanout Worker Parallelization
Partition Keys
Shard counters by node ID, aggregate periodically—trades accuracy for reduced contention.
Lesson 981Race Conditions in Distributed Counters
Partition pruning
organizing data by time ranges so queries only scan relevant chunks
Lesson 760Data Warehouse ArchitectureLesson 1473Range Partitioning Benefits
Partition quotas
Divide the 100 requests/minute into 40 for US, 30 for EU, 30 for Asia.
Lesson 1804Multi-Region Rate Limiting Challenges
Partition Tolerance is non-negotiable
in practice.
Lesson 481What CAP Theorem States
Partition-Availability / Else-Consistency
systems that choose availability when partitions occur, but prioritize consistency over low latency when the network is healthy.
Lesson 519PA/EC Systems: Mixed Strategies
Partition-aware backfill
Only reprocess affected date/time partitions
Lesson 777Workflow Orchestration Patterns
Partition-specific alerts
(Partition 7's disk is 90% full, but Partition 3 is fine)
Lesson 1492Operational Complexity of Partitioning
Partitioned
into logical chunks (user IDs 1-1000, 1001-2000)
Lesson 1447Partitioning vs Sharding vs Replication
Partitions are common
in geo-distributed systems
Lesson 532Why Eventual Consistency Exists
Partner API
requests `/user/profile` → Gateway maps internal field names (`user_id` → `externalId`) and adds partner-specific metadata
Lesson 875Client-Specific API Composition
Partner API BFF
Enforces third-party rate limits and compliance requirements
Lesson 902Backend-for-Frontend (BFF) Pattern Overview
Pass reduced timeout
Forward the smaller budget in request headers/context
Lesson 1119Timeout Budget Management Across Service Chains
Pass the reduced budget
to downstream services
Lesson 1098Per-Hop Timeout Budgets
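A sketch of how a service might compute the budget it forwards downstream. The `reserve` parameter (time kept back for local work) and the function name are assumptions; `deadline` and `now` are timestamps in seconds on the same clock.

```python
def remaining_budget(deadline: float, now: float, reserve: float = 0.05) -> float:
    """Seconds a downstream call may use, keeping `reserve` for local work."""
    # Never return a negative budget; 0.0 means the deadline is already spent.
    return max(0.0, deadline - now - reserve)
```

A caller that receives `0.0` can fail fast rather than start a downstream request that cannot finish in time.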
Passive checks
have zero overhead since they only monitor existing traffic.
Lesson 99Active vs Passive Health Checks
Passive health checks
are more observational—the load balancer watches actual user requests as they happen.
Lesson 99Active vs Passive Health Checks
Password reset tokens
Sequential IDs make brute-force attacks trivial
Lesson 1515Short URL Predictability Tradeoffs
Paste views
Track read counts per paste using counters.
Lesson 1583Analytics and Usage Metrics
Path or method
"Service A can POST to /orders but not DELETE"
Lesson 854Request-Level Authorization
Path-based models
Store resource paths like `/org/proj/repo` and match prefixes
Lesson 939Permission Inheritance and Hierarchies
Path-based routing
Route `/api/*` to one set of servers, `/images/*` to another
Lesson 113Cloud Load Balancers (AWS ELB/ALB)
Path-based versioning
embeds the version in the URL:
Lesson 892API Versioning and Routing
Pay-per-use
(CloudFront, Fastly): charge per GB transferred and per request
Lesson 191CDN Provider Feature Comparison
Payment processing
(CP): Block and wait for cross-region confirmation, even if it takes 500ms, to ensure no double-charges
Lesson 510Real Systems: Multi-Region Trade-offsLesson 1001Side Effects and Idempotency
Payments
team can deploy their fraud detection improvements on Tuesday morning while the **User Profile** team deploys avatar updates Thursday afternoon—completely independently.
Lesson 791Independent Deployability
PC
During a partition, it sacrifices **availability** to maintain consistency
Lesson 518PC/EC Systems: Consistency Always
PC/EC
Traditional RDBMS with sync replication (consistent always, higher latency)
Lesson 515PACELC Framework Explained
PC/EC system
makes consistency its top priority in *both* scenarios:
Lesson 518PC/EC Systems: Consistency Always
PCollections
Immutable datasets (bounded for batch, unbounded for streams)
Lesson 772Apache Beam Programming Model
PDP evaluates policies
using RBAC, ABAC, or other models
Lesson 941Policy Decision Points (PDP) and Enforcement Points (PEP)
Peak traffic multiplier
Add 2-3× buffer for traffic spikes
Lesson 1499Bandwidth Requirements for Redirects
Peak traffic multipliers
help you account for these surges.
Lesson 24Peak Traffic MultipliersLesson 26Bandwidth Estimation from Data Size
PEP extracts context
user identity, requested resource, action
Lesson 941Policy Decision Points (PDP) and Enforcement Points (PEP)
PEP queries PDP
"Can user X perform action Y on resource Z?
Lesson 941Policy Decision Points (PDP) and Enforcement Points (PEP)
Per-account alone
Doesn't prevent one rogue user within that account from monopolizing shared quota.
Lesson 991Hierarchical Rate Limiting
Per-client limits
Each API key gets 5000 requests/day
Lesson 885Rate Limiting and Throttling
Per-dependency circuit breakers
maintain **separate, independent circuit breakers for each downstream service**.
Lesson 1063Per-Dependency Circuit Breakers
Per-Domain Cache
Store one parsed robots.
Lesson 1861Robots.txt Caching and Parsing
Per-endpoint limits
The expensive `/search` endpoint allows 100 requests/minute, while `/health` is unlimited
Lesson 885Rate Limiting and Throttling
Per-IP
Rate limit based on the client's source IP address
Lesson 989Per-User vs Per-IP Rate Limiting
Per-key sequential consistency
means that all operations on a *single key* appear to execute in some sequential order that all clients agree on, but operations on *different keys* can be reordered independently.
Lesson 552Per-Key Sequential Consistency
Per-Operation-Type
Different operations (payment, refund, transfer) use separate namespaces, allowing key reuse across different actions.
Lesson 1017Idempotency Key Scope and Namespacing
Per-partition ordering
lets you scale horizontally—use multiple partitions with multiple consumers, each processing their partition in order.
Lesson 685Message Ordering Guarantees
Per-region quotas
Divide the global limit by number of regions (1000 req/min → 333/region).
Lesson 987Multi-Region Rate Limiting Challenges
Per-Route Granularity
Define different timeouts for specific endpoints (e.
Lesson 1126Timeout Configuration in Service Mesh
Per-service limits
Control how many notifications each calling service can send (e.
Lesson 1697API Layer and Rate Limiting
Per-shard metrics
Identify if specific database shards are slower
Lesson 1657Measuring Fanout Performance
Per-Tenant
In multi-tenant systems, scope by tenant ID to isolate entire organizations.
Lesson 1017Idempotency Key Scope and Namespacing
Per-user alone
One wealthy account with 1,000 users could overwhelm your system if each user maxes out their limit simultaneously.
Lesson 991Hierarchical Rate Limiting
Per-user limits
Prevent spamming individual users (e.
Lesson 1697API Layer and Rate Limiting
Per-Worker Rate Limiting
Each fanout worker limits its own write throughput (e.
Lesson 1654Fanout Rate Limiting
Percentile analysis
If 99% of users make <100 req/min but your limit is 50, you're blocking normal behavior
Lesson 997Testing and Monitoring Rate Limiters
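A check like the one described can be sketched by comparing a high percentile of observed per-user request rates against the configured limit. The nearest-rank percentile below and the sample data are assumptions chosen for simplicity; a production system would more likely read percentiles from its metrics store.

```python
import math


def percentile(values, pct: int):
    """Nearest-rank percentile: smallest value with >= pct% of data at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def limit_blocks_normal_users(per_user_rates, limit: int, pct: int = 99) -> bool:
    """True if users at the given percentile would exceed the limit."""
    return percentile(per_user_rates, pct) > limit


# Hypothetical sample: 100 users making 1..100 requests per minute.
rates = list(range(1, 101))
too_strict = limit_blocks_normal_users(rates, limit=50)
```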
Perfect collision avoidance
Sequential IDs never collide
Lesson 1516Counter-Based vs UUID Approaches
Perfect for horizontal scaling
Any server can verify any token independently
Lesson 916Session vs Token Tradeoffs
Performance anomalies
Slow queries, timeout warnings, resource exhaustion signals.
Lesson 1129What to Log vs What Not to Log
Performance benefits
Not waiting for cross-region coordination means lower latency
Lesson 532Why Eventual Consistency ExistsLesson 891SSL/TLS Termination
Performance ceiling
Limited by host OS and general-purpose CPU
Lesson 108Hardware vs Software Load Balancers
Performance considerations
matter.
Lesson 945Token Propagation Across Services
Performance cost
Coordinating across shards requires multiple round-trips and locks, destroying the performance benefits you sharded for in the first place
Lesson 261Distributed Transactions Across Shards
Performance impact
Migration consumes I/O, CPU, and network resources
Lesson 258Resharding and Data Migration
Performance optimizations
Naive implementations perform poorly; optimizations add layers of complexity
Lesson 617Why Paxos Is Difficult in Practice
Performance penalty
Multi-partition transactions are 10-100x slower than single-partition operations
Lesson 1489Cross-Partition Transactions
performance requirements
(which focus on *how fast* your system responds right now), scalability asks: *"What happens when we go from 100 users to 100,000 users?"*
Lesson 13Scalability Requirements: Growth ExpectationsLesson 19Why Back-of-the-Envelope Estimation Matters
Performance SLAs
(latency commitments)
Lesson 191CDN Provider Feature Comparison
Performance tuning
Mobile BFF can implement aggressive caching; web BFF can prioritize real-time updates
Lesson 904BFF vs Single Gateway Tradeoffs
Performance/Latency
How fast must operations complete?
Lesson 553Choosing Consistency Levels
Performant
(loads quickly, even for users following thousands of accounts)
Lesson 1632Functional Requirements: Core Feed Features
Performs
its local transaction when triggered
Lesson 590Choreography-Based Sagas
Periodic polling
Every 30-60 seconds while app is active
Lesson 1671Real-Time Requirements for Social Feeds
Periodic Reports and Analytics
Lesson 738Batch Processing Use Cases
Periodically sync to Redis
Flush accumulated counts every 100-500ms, or when local count reaches a threshold
Lesson 1801Local Caching for Performance
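The flush rule in this entry (sync on a timer or when a local count crosses a threshold) might look like the sketch below. To keep it self-contained, a plain callable stands in for the Redis increment; `LocalCounter`, `fake_incrby`, and the parameter values are hypothetical.

```python
import time


class LocalCounter:
    """Accumulates counts in memory and flushes them to a shared store."""

    def __init__(self, flush_fn, max_pending=100, interval_s=0.2,
                 clock=time.monotonic):
        self.flush_fn = flush_fn        # stand-in for a Redis increment call
        self.max_pending = max_pending  # flush when any key reaches this count
        self.interval_s = interval_s    # or when this much time has passed
        self.clock = clock
        self.pending = {}
        self.last_flush = clock()

    def incr(self, key, n=1):
        self.pending[key] = self.pending.get(key, 0) + n
        due = self.clock() - self.last_flush >= self.interval_s
        if self.pending[key] >= self.max_pending or due:
            self.flush()

    def flush(self):
        for key, n in self.pending.items():
            self.flush_fn(key, n)
        self.pending.clear()
        self.last_flush = self.clock()


shared = {}  # plays the role of the distributed store in this sketch


def fake_incrby(key, n):
    shared[key] = shared.get(key, 0) + n


# A long interval isolates the threshold rule: every third hit flushes.
counter = LocalCounter(fake_incrby, max_pending=3, interval_s=3600)
for _ in range(7):
    counter.incr("tenant-a")
```

After seven increments, six have been flushed in two batches and one is still held locally, which shows the accuracy traded away for fewer round-trips.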
Permanent errors
indicate something fundamentally wrong that won't fix itself:
Lesson 1026Retry on Which Errors
Permission
What they can do (read, write, delete, execute)
Lesson 937Access Control Lists (ACLs)
Permissions
Specific actions on resources (e.
Lesson 933Role-Based Access Control (RBAC) Fundamentals
Persist results
to return them for duplicate requests
Lesson 1011Idempotency Key Storage and Lookup
Persist to disk
Temporary overflow storage
Lesson 1155Log Buffering and Backpressure
Persistence
The coordinator writes its decision to durable storage **before** announcing it, ensuring recovery is possible if it crashes.
Lesson 569The Coordinator Role in 2PCLesson 698Streaming vs Message Queues
Persistent connections
(database connection pools, chat servers)
Lesson 87Least Connections AlgorithmLesson 893WebSocket and Long-Polling Support
Persistent stores
like RocksDB write data to disk first.
Lesson 340In-Memory vs Persistent Key-Value Stores
Physical distance
signals must travel
Lesson 169The Latency Problem CDNs Solve
Physical shards
(or nodes) are the actual database servers that host those logical shards.
Lesson 235Logical vs Physical Shards
Physical storage location
(which server holds your data via partitioning)
Lesson 413Row Keys and Clustering
Pick a safe environment
(staging preferred initially)
Lesson 1345Starting with Game Days
Pie charts
breaking down log levels by service
Lesson 1152The ELK Stack: Kibana
PLAIN
Username/password (use only over SSL)
Lesson 727Kafka Security: Authentication and Encryption
Planning maintenance windows
– Schedule proactive work before predicted failure
Lesson 1323Mean Time Between Failures (MTBF)
Player logic
Client measures network speed, requests appropriate quality segments
Lesson 1602Adaptive Bitrate Streaming (ABR)
Plural for accumulating metrics
Counters that grow should use plural nouns: `http_requests_total`, not `http_request_total`.
Lesson 1182Metric Naming Conventions
plus
a conflict resolution mechanism (like read-repair with version vectors) that always returns and propagates the latest value.
Lesson 559Strong Consistency with QuorumsLesson 1140Contextual Fields
PN-Counter
(positive-negative counter): Separate increment and decrement counters
Lesson 1384Conflict-Free Replicated Data Types (CRDTs)
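A PN-Counter can be sketched as two per-node maps, one for increments and one for decrements, merged by taking the per-node maximum. The class below is a minimal in-memory illustration with hypothetical node IDs, not a networked implementation.

```python
class PNCounter:
    """Positive-negative counter CRDT: value = sum(increments) - sum(decrements)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.p = {}  # per-node increment totals
        self.n = {}  # per-node decrement totals

    def increment(self, amount=1):
        self.p[self.node_id] = self.p.get(self.node_id, 0) + amount

    def decrement(self, amount=1):
        self.n[self.node_id] = self.n.get(self.node_id, 0) + amount

    def value(self):
        return sum(self.p.values()) - sum(self.n.values())

    def merge(self, other):
        # Per-node max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order.
        for mine, theirs in ((self.p, other.p), (self.n, other.n)):
            for node, count in theirs.items():
                mine[node] = max(mine.get(node, 0), count)


a, b = PNCounter("a"), PNCounter("b")
a.increment(5)
b.increment(2)
b.decrement(3)
a.merge(b)
b.merge(a)
```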
Point of Presence (PoP)
is a physical data center location where a CDN provider has installed servers and networking equipment.
Lesson 171Points of Presence (PoPs) and Edge Locations
Point-in-time consistency
across partitions requires coordination
Lesson 1492Operational Complexity of Partitioning
Pointer compression
(use 32-bit offsets instead of 64-bit pointers)
Lesson 1759Trie Space Optimization Techniques
Policy Decision Point (PDP)
is a dedicated service that evaluates authorization policies and returns allow/deny decisions.
Lesson 941Policy Decision Points (PDP) and Enforcement Points (PEP)
Policy enforcement
Translates high-level policies into Envoy-compatible configurations
Lesson 861Istio: Architecture and Components
Policy Enforcement Point (PEP)
is the component in your application or gateway that intercepts requests, asks the PDP for a decision, and enforces it.
Lesson 941Policy Decision Points (PDP) and Enforcement Points (PEP)
Politeness Queues
(Back-End): Per-host queues ensuring that at most one request per host is in flight at a time.
Lesson 1843Multi-Queue Frontier Architecture
politeness table
is an in-memory data structure that maintains crawling metadata for each host your crawler interacts with.
Lesson 1848Politeness Table and Per-Host StateLesson 1849URL Frontier Persistence and Recovery
Polyglot Microservices Environments
Lesson 868When Service Mesh Adds Value
Polyglot Persistence Pattern
means using different types of databases within a single application, rather than forcing all your data into one database system.
Lesson 327Polyglot Persistence Pattern
Poor Key Distribution
If your partition key is a timestamp and everyone writes at the current time, all writes hit one partition.
Lesson 1482The Hot Partition Problem
Poor scalability
You can't reliably predict capacity when adding hardware
Lesson 1462The Uneven Distribution Problem
Poor user experience
– Users wait seconds for feeds to load
Lesson 1637Pull (Read-Time) Feed ModelLesson 1647Fanout-on-Read (Pull Model)
Popular content
Rank pastes by view count, creation date, or trending velocity (views per hour).
Lesson 1583Analytics and Usage Metrics
Popular data naturally cached
Frequently requested data stays in cache, while rarely-used data doesn't waste cache space
Lesson 131Cache-Aside (Lazy Loading) Pattern
Popular high-authority sites
Start with well-known domains like major news outlets, Wikipedia, or social media platforms.
Lesson 1828Seed URLs and Starting Point
Popular URLs Stay Hot
Viral links remain in cache, never touching the database after initial load
Lesson 1523Caching Layer Architecture
Popularity matters most
→ LFU
Lesson 153Choosing an Eviction Policy
Populate
Store result in Redis with TTL
Lesson 355Redis as a Cache
Position offsets
where exactly in the document
Lesson 1735Inverted Index Structure
Positional information
Where exactly in the document the term appears (optional, for phrase queries)
Lesson 1736Posting Lists and Document IDs
Positions
List of word positions within the document (for phrase queries)
Lesson 1745Posting Lists and Document IDs
Post-hoc governance
Tools like data catalogs and metadata layers added later to organize the chaos
Lesson 764Data Governance and Quality
Post-recovery validation
automated smoke tests
Lesson 1441Runbooks and Automation
Postgres-Compatible SQL
You can use standard SQL queries, transactions, and tools.
Lesson 334CockroachDB and Distributed SQL
PostgreSQL
for order transactions (needs ACID guarantees)
Lesson 327Polyglot Persistence Pattern
posting list
the ordered sequence of document IDs where that term appears.
Lesson 1743What Is an Inverted IndexLesson 1745Posting Lists and Document IDs
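To make the posting-list idea concrete, the sketch below builds a tiny inverted index in which each term maps to a sorted list of document IDs. The whitespace tokenizer and the sample documents are simplifying assumptions for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to its posting list: the sorted IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical three-document corpus.
docs = {1: "cache miss", 2: "cache hit", 3: "hit rate"}
index = build_inverted_index(docs)
```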
Posting list cache
Cache frequently accessed index segments
Lesson 1742Search System Architecture Overview
Potential cascading failure
Backend systems may crash under load
Lesson 159Cache Stampede Problem
Pre-aggregated metrics
"total revenue by region" when you've already rolled up the data
Lesson 762Query Performance Tradeoffs
Pre-computation strategy
The ratio justifies pre-generating feeds asynchronously rather than computing them on-demand.
Lesson 1636Capacity Estimation: Feed Reads vs Writes
Pre-creates
a set of database connections when your application starts
Lesson 267What is Connection Pooling
Pre-Generated Key Pools
means creating millions of keys in advance and storing them in a dedicated table.
Lesson 1551Key Generation Strategy
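A key pool can be sketched as a set of unused random keys from which the service takes one per request. Here an in-memory set stands in for the dedicated table, and the 7-character alphanumeric format is an assumption rather than something the lesson specifies.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def generate_pool(size, length=7):
    """Pre-generate `size` unique random keys."""
    pool = set()
    while len(pool) < size:
        pool.add("".join(secrets.choice(ALPHABET) for _ in range(length)))
    return pool


class KeyPool:
    """Hands out each pre-generated key at most once."""

    def __init__(self, keys):
        self.unused = set(keys)
        self.used = set()

    def take(self):
        key = self.unused.pop()  # raises KeyError when the pool is exhausted
        self.used.add(key)
        return key


pool = KeyPool(generate_pool(1000))
first, second = pool.take(), pool.take()
```

In a real deployment the move from unused to used would have to be atomic (for example, a transactional update on the key table) so that two servers never receive the same key.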
Pre-screening
Before creating the short URL, query the threat API with the destination URL
Lesson 1540Spam and Malicious Link Detection
Precision
Every request counted exactly where it occurred
Lesson 968Sliding Window Log
Precomputed ranking
means maintaining a pre-sorted feed in storage.
Lesson 1667Real-Time vs Precomputed Ranking
Precondition checks
verify the system state before execution
Lesson 1441Runbooks and Automation
Predictability risk
Users can guess `abc123` → `abc124`, revealing information about your traffic volume
Lesson 1516Counter-Based vs UUID Approaches
Predictable failure mode
retry loop handles edge cases
Lesson 1512Random String Generation
Predictable freshness
without paying the latency cost of synchronous replication
Lesson 1397Bounded Staleness Consistency
Predictable impact
You know exactly which keys move
Lesson 1461Removing Nodes Gracefully
Predictable lookup
Any node can calculate where replicas should be by performing the same clockwise walk
Lesson 1466Replication with Consistent Hashing
Predictable lookups
Given a short code, you always know which shard to query
Lesson 1541Sharding and Database Scaling
Predictable queries
dashboards, monthly reports, KPIs you run repeatedly
Lesson 762Query Performance Tradeoffs
Predictable read latency
same for 10 or 10 million followers viewing
Lesson 1638Push (Write-Time) Feed Model
Predictable read performance
Read latency is constant regardless of how many people a user follows.
Lesson 1646Fanout-on-Write (Push Model)
Predictable traffic patterns
If analytics show certain content becomes popular at specific times (news sites at 6am, streaming shows at 8pm), preload those assets beforehand.
Lesson 184Cache Warming and Preloading
Predictable Update Patterns
Lesson 290When to Denormalize
Predictive
Use analytics to anticipate viral content
Lesson 1631Multi-Region Replication Strategy
Preemption handling
If a higher-numbered prepare arrives, the current leader steps down, and the protocol reverts to full Paxos
Lesson 616Multi-Paxos for Log Replication
Prefect
and **Dagster** represent the next generation.
Lesson 773Prefect and Dagster for Modern Workflows
Preferences
Stored user preferences (language, past clicks, categories of interest) influence which suggestions surface first.
Lesson 1767Personalized Typeahead
Prefetching
means resolving DNS records *before* you actually need them.
Lesson 1858DNS Prefetching and Batch Resolution
Prefetching thumbnails
and initial video segments during idle time
Lesson 1618Optimizing for Mobile Networks
Prefix compression
stores common prefixes once.
Lesson 1776Typeahead Index Optimization
Premature rejection
Tokens rejected as expired when they're still valid
Lesson 949Clock Skew and Token Validation
Premium storage tiers
with faster access
Lesson 1413The Cost-Availability Tradeoff
Prepare phase
It sends a `PREPARE` message to all participant nodes, asking "Can you commit this transaction?"
Lesson 569The Coordinator Role in 2PCLesson 5752PC Performance CharacteristicsLesson 612The Two-Phase ProtocolLesson 613The Prepare Phase
Prepared
After Phase 1, the participant has voted "yes" and locked resources, but hasn't committed yet
Lesson 572Participant State Transitions
Presence detection
lets you distinguish between "online and active" vs "offline" vs "connected but inactive," so you can optimize real-time delivery accordingly.
Lesson 1676Presence Detection and User Status
Presence service
Tracks active user IDs in a fast key-value store (Redis) with TTL (time-to-live) expiration
Lesson 1676Presence Detection and User StatusLesson 1681Mobile Push Notification Integration
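The TTL behaviour described here can be approximated in memory: each heartbeat pushes a user's expiry forward, and a user counts as online only until that expiry passes. `PresenceStore` and its injectable clock are hypothetical stand-ins for a key-value store such as Redis with key expiration.

```python
import time


class PresenceStore:
    """Tracks online users; entries expire `ttl_s` seconds after the last heartbeat."""

    def __init__(self, ttl_s=60.0, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._expires = {}  # user_id -> expiry timestamp

    def heartbeat(self, user_id):
        self._expires[user_id] = self.clock() + self.ttl_s

    def is_online(self, user_id):
        expiry = self._expires.get(user_id)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expires[user_id]  # lazily drop the expired entry
            return False
        return True


# A controllable clock makes the expiry visible without sleeping.
now = [0.0]
store = PresenceStore(ttl_s=30, clock=lambda: now[0])
store.heartbeat("u1")
now[0] = 29
online_before = store.is_online("u1")
now[0] = 31
online_after = store.is_online("u1")
```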
Present meaningful feedback
to end users rather than cryptic error messages
Lesson 1115Deadline Exceeded Error Handling
Preserve capacity
for critical operations or users that can still be served
Lesson 1084Load Shedding Under Cascading Failure
Prevent new elections
Followers reset their election timeouts when receiving valid heartbeats
Lesson 624AppendEntries RPC: Replication Mechanism
Prevent resource exhaustion
by freeing up threads/connections that would otherwise hang forever
Lesson 1086What Timeouts Are and Why They Matter
Preventing duplicate writes
becomes possible when you can uniquely identify each entity
Lesson 299Primary Keys and Entity Integrity
Prevention
Always close connections in `finally` blocks or use automatic resource management patterns to guarantee cleanup, even when errors occur.
Lesson 275Common Pooling Anti-Patterns
Prevents
paying customers from being starved by free users
Lesson 990Tiered Rate Limits for Different User Classes
Prevents cascading failures
– If Service A keeps hammering failing Service B, both services may collapse under the load
Lesson 1046The Three States: Open
Prevents orphans
You can't insert an order with `user_id=999` if user 999 doesn't exist
Lesson 300Foreign Keys and Referential Integrity
Prevents overload
by ensuring no single server handles too many requests
Lesson 76What Is a Load Balancer?
Prevents repeat failures
By fixing underlying system issues rather than "fixing" people, you address the real problems: missing guardrails, inadequate testing, confusing interfaces, or knowledge silos.
Lesson 1351Blameless Postmortem Culture
Prevents Thundering Herd
Avoids simultaneous cache misses for the same popular URLs
Lesson 1529Preloading Hot URLs into Cache
Previous log information
To ensure consistency with the follower's log
Lesson 624AppendEntries RPC: Replication Mechanism
Pricing
Pay per GB processed and per hourly rule configured, with no charge for the load balancer itself when idle.
Lesson 114Cloud Load Balancers (GCP and Azure)Lesson 728AWS Kinesis Overview
Primary access patterns
How do users typically need the data ordered?
Lesson 1895Default Sorting and Index Alignment
Primary connection pool
A set of pre-established database connections to the primary server, used exclusively for INSERT, UPDATE, DELETE operations
Lesson 221Application-Level Connection Management
Primary delivery pipeline
– processes the notification and sends it
Lesson 1725Analytics Pipeline Architecture
primary key
is a column (or set of columns) that uniquely identifies each row in a table.
Lesson 299Primary Keys and Entity IntegrityLesson 423Primary Key Components
Primary key constraints
Is this identifier unique and not null?
Lesson 305Consistency Guarantees
Primary processes the write
→ Stores the data and records the change
Lesson 199Primary-Replica Architecture
Primary sends changes
→ Replicates updates to all Replicas asynchronously or synchronously
Lesson 199Primary-Replica Architecture
Primary-Replica architecture
(also called Master-Slave), you have:
Lesson 199Primary-Replica Architecture
Principal
Who (user ID, group, service account)
Lesson 937Access Control Lists (ACLs)
Prioritize Availability (AP)
Allow transactions to proceed with uncertainty.
Lesson 580CAP Theorem Impact on Distributed Transactions
Prioritize Consistency (CP)
Block and wait indefinitely until the partition heals.
Lesson 580CAP Theorem Impact on Distributed Transactions
Prioritize critical requests
using request classification.
Lesson 963Graceful Degradation with Rate Limits
Prioritize URLs
based on relevance, freshness, or importance
Lesson 1826What is a Web Crawler
Prioritized
Critical fixes first, nice-to-haves later
Lesson 1352Postmortem Structure and Action Items
Prioritizer Queues
(Front-End): Multiple queues (e.
Lesson 1843Multi-Queue Frontier Architecture
Prioritizing Requirements Under Constraints
?
Lesson 19Why Back-of-the-Envelope Estimation Matters
Priority
Crawl important or frequently-changing pages first
Lesson 1732Crawling and Document Collection
Priority handling
Premium users or smaller images can jump the queue
Lesson 1595Thumbnail and Preview Generation Trigger
Priority level
Critical alerts (security warnings) might override user preferences and use multiple channels.
Lesson 1703Channel Routing Logic
priority queue
where each URL has a "next crawl time" calculated from these factors.
Lesson 1835Crawl Freshness RequirementsLesson 1844Front Queue: Priority Management
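A priority queue keyed on next crawl time can be sketched with Python's `heapq`, where the earliest timestamp is always at the front. The `CrawlScheduler` class, the URLs, and the integer timestamps are assumptions for illustration; how the next crawl time is calculated is left out.

```python
import heapq


class CrawlScheduler:
    """Min-heap of (next_crawl_time, url); the earliest-due URL is popped first."""

    def __init__(self):
        self._heap = []

    def schedule(self, url, next_crawl_time):
        heapq.heappush(self._heap, (next_crawl_time, url))

    def pop_due(self, now):
        """Return every URL whose next crawl time has arrived, earliest first."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[1])
        return due


scheduler = CrawlScheduler()
scheduler.schedule("https://example.com/news", 100)
scheduler.schedule("https://example.com/about", 500)
scheduler.schedule("https://example.com/", 50)
due = scheduler.pop_due(now=120)
```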
Priority queuing
Process regular tenants' requests before hot tenants during contention
Lesson 1823Hot Tenant Problem
Priority sampling
High-value transactions (premium users, checkout flows, critical B2B partners) carry a priority flag.
Lesson 1256Priority and Debug Sampling
Priority score
(lower = more urgent), or
Lesson 1847Heap-Based Priority Queue Implementation
Priority tiers
Premium clients get higher limits than free-tier users
Lesson 885Rate Limiting and ThrottlingLesson 995Graceful Degradation Through Throttling
Priority-based scheduling
High-value or frequently-changing pages get recrawled more often.
Lesson 1873Handling Recrawls and Freshness
Priority-Based Throttling
Apply stricter limits to celebrity fanouts while allowing normal users faster processing.
Lesson 1654Fanout Rate Limiting
Privacy compliance
Sensitive data doesn't linger indefinitely
Lesson 1565Expiration Requirements and TTL Basics
Private
pastes require authentication.
Lesson 1576Access Control and Privacy Settings
Problem
If thousands of edge servers all miss the cache simultaneously, they stampede your origin server with identical requests.
Lesson 182Cache Hierarchies and Tiered Caching
Problem statement
"Users share photos with followers.
Lesson 10Identifying Functional Requirements
Process all historical events
through the new logic
Lesson 754Event Log Replay in Kappa
Process later
Consumer services pull from the queue and persist to an analytics database
Lesson 1530Analytics and Click Tracking
Processes incrementally
using stream processing frameworks (Storm, Flink, Spark Streaming)
Lesson 749Lambda Architecture: Speed Layer
Processing
Generate thumbnails, transcode videos to multiple formats/bitrates
Lesson 1584Image/Video Hosting: Problem Definition and Scale
Processing Latency
tracks end-to-end time from API submission to channel delivery.
Lesson 1707Processing Pipeline Monitoring
Processing load
Analyzing every trace overwhelms backend systems
Lesson 1228Trace Sampling Fundamentals
Processing time
when Flink sees it (arrival at warehouse)
Lesson 770Apache Flink Architecture
Producer acknowledgments
(or "acks") are the broker's way of saying "Yes, I received your message."
Lesson 682Producer AcknowledgmentsLesson 707In-Sync Replicas (ISR)
Producer registers schema
Before sending data, the producer submits the schema (e.
Lesson 725Schema Registry and Evolution
producers
create and send messages independently of **consumers** who receive and process them.
Lesson 646The Producer-Consumer ModelLesson 671ActiveMQ and Traditional Enterprise MessagingLesson 694Producers and Consumers
Product catalog API
Cache for 5 minutes—inventory doesn't change every second
Lesson 194CDN for API Acceleration
Product catalogs
– item descriptions rarely change instantly; eventual sync is fine
Lesson 318When to Choose ACID or BASE
Product teams
traditionally prioritize velocity and customer features.
Lesson 1282Error Budget as a Shared Currency
Production aggregation
Pure structured JSON sent to your centralized logging system
Lesson 1166Human-Readable vs Machine-Parseable
PROFILE
runs the query *and* shows performance metrics:
Lesson 469Indexes and Query Performance
Progressive health checks
that signal "ready but limited capacity"
Lesson 1081Thundering Herd After Recovery
Progressive rollout
if canaries look healthy, expand to 25%, then 50%, then 100%
Lesson 1314Release Engineering and Safe Deployment
Progressive Technology Adoption
means you can introduce new tools, languages, or platforms incrementally by creating new microservices or refactoring individual services—while the rest of your system continues running unchanged.
Lesson 799Progressive Technology Adoption
Project work
New features, migrations
Lesson 1312Measuring and Reducing Toil
Projections compute state
Different services read the event log and build their own view of current state
Lesson 586Alternative: Event Sourcing for Consistency
Prometheus
excels at hundreds of thousands of time series but may need federation beyond single-datacenter deployments.
Lesson 1208Choosing a Metrics System for Your Scale
Prometheus's metadata API
, internal wikis, or dedicated platforms like **Datadog's metric summaries** to maintain these catalogs.
Lesson 1216Metric Documentation and Discovery
promise
"I won't accept any proposal lower than yours.
Lesson 612The Two-Phase ProtocolLesson 613The Prepare Phase
Promotion
The selected replica is reconfigured to accept writes
Lesson 207Replica Promotion and Failover Basics
PromQL
in Prometheus offers powerful ad-hoc queries, while **Graphite** provides simpler but less flexible querying.
Lesson 1208Choosing a Metrics System for Your Scale
Propagate
it through HTTP headers, message queues, or RPC metadata
Lesson 1158Correlation IDs Across Services
Propagates automatically
Each hop receives and can read this deadline
Lesson 1104gRPC Timeout Propagation
Propagation strategies
determine how updates flow through cache layers:
Lesson 128Cache Coherence Across Layers
Properties
Key-value pairs attached to both nodes and edges
Lesson 451What is a Graph Database?Lesson 452Graph Model: Nodes and Edges
Proportional to Node Count
Maintain a fixed number of partitions per node (using virtual nodes in consistent hashing).
Lesson 1485Rebalancing Partitions
Propose a targeted solution
Address *only* that constraint.
Lesson 35Iterate Based on Constraints
Proposers
suggest values for the consensus decision.
Lesson 610The Three Roles in Paxos
Protection at Scale
When you're handling millions of requests per second (like search engines, APIs, or notification systems), every service needs protection.
Lesson 1782Rate Limiter Service Overview
Protection from noisy neighbors
in multi-tenant scenarios
Lesson 859Rate Limiting at Service Boundaries
Protects on delete
By default, you can't delete a user if they have orders (you must handle the orders first)
Lesson 300Foreign Keys and Referential Integrity
Protects origin
during traffic spikes or viral content
Lesson 1614Origin Shield Pattern
Protocol Buffers
(protobuf) for binary serialization and **HTTP/2** for transport, delivering significantly better performance.
Lesson 1917gRPC: Protocol Buffers and Binary RPC
Protocol Enhancements
CDNs use optimized protocols between edge and origin (like HTTP/2, HTTP/3, or custom protocols) that reduce overhead and handle packet loss better than standard internet connections.
Lesson 186Dynamic Content Acceleration
Protocol overhead
Account for TCP/IP headers (typically 5-10% extra)
Lesson 1499Bandwidth Requirements for Redirects
Protocol translation
becomes the client's responsibility
Lesson 870What is an API Gateway?
Protocol-agnostic
Works with any TCP/UDP traffic (HTTP, HTTPS, FTP, databases, gaming)
Lesson 109Layer 4 (Transport) Load Balancing
Prove the concept first
Focus on product-market fit, not infrastructure complexity
Lesson 820When a Monolith is the Right Choice
Provide predictable latency bounds
for your service's response times
Lesson 1086What Timeouts Are and Why They Matter
Provider-specific fields
(FCM vs APNS token formats)
Lesson 1692Channel-Specific Formatting
Provisions certificates
– When a new sidecar proxy starts, the control plane generates a unique X.
Lesson 844Control Plane: Certificate Management
Proxy-Based Read-Write Splitting
moves that responsibility to a middleware layer—a database proxy that sits between your application and your databases.
Lesson 222Proxy-Based Read-Write Splitting
ProxySQL
(for MySQL) or **MaxScale** automatically inspect incoming SQL queries and route them intelligently:
Lesson 222Proxy-Based Read-Write Splitting
Prune unnecessary logs
Remove debug-level logs left in production code, redundant information, or logs that duplicate what metrics already capture.
Lesson 1171Log Review and Alert Fatigue
Pruning
Ignore low-scoring documents early using approximate scoring
Lesson 1741Search Latency and Response Time
Psychological safety
means people can admit errors without fear, knowing the team will analyze systems, not scapegoat individuals.
Lesson 1317Blameless Culture and Learning from Failure
PTransforms
Operations that transform data (like `Map`, `Filter`, `GroupByKey`)
Lesson 772Apache Beam Programming Model
Pub-sub
email service, analytics, CRM, and welcome workflow all need to know independently.
Lesson 664Choosing Between Queue and Pub-Sub
Pub/Sub
implements the **pub-sub pattern** for event broadcasting.
Lesson 672Redis as a Lightweight Message BrokerLesson 735Choosing a Streaming Platform
Public
pastes are indexed, searchable, and accessible to anyone with the URL.
Lesson 1576Access Control and Privacy Settings
Public marketing campaigns
URLs are meant to be shared anyway
Lesson 1515Short URL Predictability Tradeoffs
Public status page
Customer-facing updates (e.
Lesson 1301War Rooms and Communication Channels
Public URLs
Fast path with full caching (no auth checks)
Lesson 1533Access Control and Private URLs
Publish-Subscribe
(Pub-Sub) messaging solves this by introducing a central message broker.
Lesson 1675Pub-Sub for Real-Time Distribution
PublishKafka
Integrate with Kafka
Lesson 775Apache NiFi for Data Flow
Pull (fanout-on-read)
makes sense for **celebrities with >100K followers**.
Lesson 1658Fanout Strategy Selection Criteria
Pull (Lazy Caching)
The CDN fetches content from your origin server only when a user requests it.
Lesson 1610Push vs Pull CDN Models for Media
Pull distributed state
On cache miss or window expiration, fetch the authoritative count from Redis
Lesson 1801Local Caching for Performance
Pull for celebrities
When a celebrity with 50 million followers posts, don't fan-out to everyone—that's too expensive.
Lesson 1639Hybrid (Pull-Push) Feed Model
Pull for inactive users
Don't maintain pre-built feeds for users who haven't logged in for months; generate their feed on-demand if they return.
Lesson 1639Hybrid (Pull-Push) Feed Model
Pull Model (Read-Time Fanout)
No immediate distribution.
Lesson 1645What is Fanout in Social Media Systems
Pull subscriptions
Your application actively requests messages from Pub/Sub.
Lesson 674Google Cloud Pub/Sub Architecture
Pull-on-open
Only when user actively opens the app (most common)
Lesson 1671Real-Time Requirements for Social Feeds
Pulsar
self-hosted gives you control.
Lesson 735Choosing a Streaming Platform
Purge Strategy
When a paste expires or is deleted, you need to purge it from the CDN using their API to prevent serving stale content.
Lesson 1569CDN Integration for Paste Delivery
Push (fanout-on-write)
works best when users have **fewer than ~1,000-5,000 followers**.
Lesson 1658Fanout Strategy Selection Criteria
Push (Pre-Population)
*You* upload content directly to CDN edge locations before anyone requests it.
Lesson 1610Push vs Pull CDN Models for Media
Push a lightweight notification
to active/online users only (via WebSocket or SSE)
Lesson 1679Hybrid Pull-Push Model
Push for active users
When a regular user posts, fan-out their content immediately to all followers' pre-built feeds (fast reads, manageable fan-out).
Lesson 1639Hybrid (Pull-Push) Feed Model
Push Model (Write-Time Fanout)
When Alice publishes, the system immediately writes her post to all followers' feed storage.
Lesson 1645What is Fanout in Social Media Systems
Push subscriptions
Pub/Sub delivers messages via HTTP POST to your webhook endpoint.
Lesson 674Google Cloud Pub/Sub Architecture
PutFile
Read/write local files
Lesson 775Apache NiFi for Data Flow
PyBreaker
(Python): Simple but effective
Lesson 1062Circuit Breaker Libraries and Frameworks
PythonOperator
Executes Python functions
Lesson 767Airflow Operators and Executors

Q

QPS
(queries per second) and understand how much traffic a single server can handle, you need to figure out: *How many servers do I actually need?*
Lesson 28Server Count EstimationLesson 120Caching Hierarchy Overview
QR code
a scannable 2D barcode that encodes the URL.
Lesson 1539QR Code Generation
Quality assurance
Bad data is rejected at write time
Lesson 759Schema-on-Write vs Schema-on-Read
Quality checks
ETL processes validate, clean, and transform data upfront
Lesson 764Data Governance and Quality
Quality issues
Garbage in, garbage out—discovered only when queried
Lesson 764Data Governance and Quality
Quality uncertainty
Bad data only discovered when accessed
Lesson 759Schema-on-Write vs Schema-on-Read
Quarantine mechanism
Temporarily disable suspicious short URLs pending manual review
Lesson 1540Spam and Malicious Link Detection
Quarantine system
for suspicious content pending manual review
Lesson 1581Abuse Prevention and Content Moderation
Queries
Read operations clients can perform (e.
Lesson 1912GraphQL Schema and Resolvers
Query Alignment
Select keys that match your access patterns.
Lesson 1472Range Partition Key Selection
Query Broadcasting
Send the identical search request to all shards in parallel
Lesson 1780Distributed Query Coordination
Query complexity
Mixed fast and slow queries?
Lesson 226Load Distribution Across Replicas
Query complexity analysis
assigns a cost to each field before execution.
Lesson 1916Rate Limiting and Complexity Analysis in GraphQL
Query Complexity Hurts Performance
Lesson 290When to Denormalize
Query efficiency
Use clustering columns that match your read patterns, so related data sits together and sorted
Lesson 423Primary Key ComponentsLesson 1480Hybrid Partitioning Approaches
Query efficiently
"Show all videos uploaded by user X" without scanning massive files
Lesson 1590Metadata Database Design
Query Engine
Executes PromQL queries against stored data
Lesson 1198Prometheus Architecture and Data Model
Query examples
How to use it in PromQL or other query languages
Lesson 1216Metric Documentation and Discovery
Query expansion
rewrites or augments queries with synonyms.
Lesson 1774Spell Correction and Query Expansion
Query Flow
User submits search → API queries search index → Index returns ranked paste IDs → Fetch paste metadata from database/cache → Return results
Lesson 1582Search and Discovery
Query followers
`SELECT follower_id WHERE followee_id = ?
Lesson 1643Follow Graph Storage
Query following
`SELECT followee_id WHERE follower_id = ?
Lesson 1643Follow Graph Storage
Query local cache
(OS or browser level)
Lesson 1856DNS Resolution Fundamentals for Crawlers
Query locality
Queries often filter by multiple dimensions.
Lesson 245Composite Shard Keys
Query parameter versioning
`/api/orders?
Lesson 809Versioning and Backward Compatibility
Query Pattern Alignment
is crucial.
Lesson 247Choosing the Right Shard Key
Query precisely
"Find all login failures from this IP in the last hour" becomes a simple filter on `event="login_failed"` and `ip_address` fields
Lesson 1137What is Structured Logging
Query routers (mongos)
Direct client queries to the correct shard(s)
Lesson 396Sharding in MongoDB
Query routing layer
A lightweight service maps prefix ranges to server addresses.
Lesson 1764Distributed Trie Architecture
Query scope
Queries run against a specific collection, not the entire database
Lesson 383Collections and Databases
Query servers
(or "search servers") receive user queries, parse them using boolean operators, fetch relevant posting lists from indexes, compute TF-IDF or other scoring functions, rank results, and return the top matches—all within 100-300ms.
Lesson 1742Search System Architecture Overview
Query Service
API for retrieving traces
Lesson 1242Zipkin Architecture and Design
Query slowdowns
Aggregations must scan thousands or millions of series
Lesson 1207Metrics Cardinality and Performance Impact
Query targeting
(queries hit specific shards)
Lesson 397Shard Key Selection
Query that partition's filter
(via RPC or cache lookup)
Lesson 1867Distributed Deduplication with Bloom Filters
Query the directory service
"Where is `user_id=12345` stored?
Lesson 242Directory-Based Sharding
Query throughput
increases as load distributes
Lesson 65What is Data Partitioning?
Query timeout
Maximum time a query can run before being automatically killed
Lesson 285Query Timeout and Statement Limits
Query timeouts
and **statement limits** are safety mechanisms that prevent individual queries from consuming excessive resources.
Lesson 285Query Timeout and Statement Limits
Query understanding
Frequent misspellings guide spell correction; low-CTR queries highlight gaps in your index
Lesson 1779Search Analytics and Click Tracking
Query-aligned
Matches your most common query patterns to minimize cross-shard lookups
Lesson 232Shard Key Selection
querying
(query service)—Jaeger handles millions of spans per second.
Lesson 1241Jaeger Architecture and ComponentsLesson 1730What is a Search Engine?
QueryRecord
Transform data formats (JSON, CSV, Avro)
Lesson 775Apache NiFi for Data Flow
Queue assignment
Route the URL to the dedicated queue for that host
Lesson 1845Back Queue: Politeness Enforcement
Queue buildup
Failed requests leave behind tasks that still run to completion
Lesson 1096Why Timeouts Must Propagate
Queue conflicting updates
for later reconciliation
Lesson 494AP Systems: Prioritizing Availability
Queue depths
Backlog growing in message queues or thread pools
Lesson 993Adaptive Rate LimitingLesson 1871Monitoring Crawler Fleet Performance
Queue lower-priority operations
for later processing rather than rejecting them outright.
Lesson 963Graceful Degradation with Rate Limits
Queue mirroring
(superseded by **quorum queues** in modern RabbitMQ) replicates queue data across nodes, ensuring messages aren't lost when a node crashes.
Lesson 668RabbitMQ Clustering and High Availability
Queue Partitioning
Split your notification queue into multiple partitions (e.
Lesson 1708Scalability and Horizontal Expansion
Queue/Buffer
Temporarily stores messages in order
Lesson 646The Producer-Consumer Model
Queues
provide point-to-point communication with competing consumers.
Lesson 675Azure Service Bus Features
Quiet hours
no notifications between 10 PM–8 AM
Lesson 1702User Preferences Lookup
Quorum Availability
Whether a majority of nodes can communicate.
Lesson 643Monitoring and Operating Consensus Clusters
Quorum satisfied
`R = 3`, `W = 3` → `3 + 3 = 6 > 5`
Lesson 557The Quorum Condition: R + W > N
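The quorum condition in the entry above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical helper named `quorums_overlap` (it does not come from the course); it only tests the inequality `R + W > N` and says nothing about how replicas are actually contacted.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True when every read quorum must share a node with every write quorum.

    n: total number of replicas
    r: replicas that must answer a read
    w: replicas that must acknowledge a write
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must each be between 1 and n")
    # If r + w > n, any r nodes and any w nodes must have at least one in common.
    return r + w > n


# The glossary's example: N = 5, R = 3, W = 3 gives 3 + 3 = 6 > 5.
print(quorums_overlap(5, 3, 3))
```

With `R = 2` and `W = 3` on the same five nodes the sum is exactly 5, so a read could miss the latest write and the helper returns `False`.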
Quorum waits
Systems must contact and wait for multiple nodes before returning results
Lesson 509Latency: The Hidden Cost of CAP
Quorum write
"Confirm once a majority of replicas acknowledge"
Lesson 1398Consistency Level Per-Operation
Quota
Your monthly spending limit ($10,000)
Lesson 994Quota Management and Burst Allowances
Quota sharding
means splitting the total allowed quota across your nodes.
Lesson 984Quota Sharding Across Nodes
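One simple way to split a quota is an even division with the remainder spread over the first few nodes. This is a sketch under that assumption only; the function name `shard_quota` is hypothetical, and real systems may weight shares by traffic instead.

```python
def shard_quota(total_quota: int, node_count: int) -> list[int]:
    """Split total_quota into node_count integer shares that sum to the total."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    if total_quota < 0:
        raise ValueError("total_quota must not be negative")
    base, remainder = divmod(total_quota, node_count)
    # The first `remainder` nodes each take one extra unit so nothing is lost.
    return [base + (1 if i < remainder else 0) for i in range(node_count)]


print(shard_quota(1000, 4))  # four equal shares of 250
print(shard_quota(10, 3))    # uneven split that still sums to 10
```

The trade-off of a static split like this is that a node receiving more than its share of traffic rejects requests while other nodes still have unused quota.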

R

R (read quorum)
How many replicas must respond to a read
Lesson 1361Quorum-Based Replication
R = 3
(must read from 3 nodes)
Lesson 555What is a Quorum?
RabbitMQ
is a versatile broker built on the AMQP protocol.
Lesson 665Overview of Message Broker Landscape
Race conditions
Multiple services detecting expiration simultaneously can trigger parallel refresh attempts.
Lesson 946Token Refresh in Distributed SystemsLesson 977Algorithm Implementation Patterns
Raft consensus
(which you learned earlier) to ensure all servers agree on which services are healthy and available.
Lesson 635Consul: Service Discovery with Raft Consensus
Random
has nearly zero overhead (just pick any item) but evicts blindly.
Lesson 154Implementation Tradeoffs
Random Generation
creates identifiers on-the-fly using random characters from a character set.
Lesson 1551Key Generation Strategy
Random Replacement
simply picks an entry at random and evicts it.
Lesson 150Random Replacement
Random Selection
are dead simple—minimal computation, easy to understand and debug.
Lesson 96Algorithm Selection Tradeoffs
Random, individual file retrieval
(not batch processing)
Lesson 1593Distributed File System Considerations
Range
Assigns contiguous partition ranges per topic
Lesson 716Consumer Groups and Partition Assignment
Range approach
Group by genre → finding mysteries is trivial, but the "Romance" section might overflow while "Gardening" sits empty
Lesson 1454Partitioning Tradeoffs: Distribution vs Query Efficiency
Range limits
Numeric types have min/max boundaries
Lesson 301Schema Enforcement and Type Safety
Range Partitioning Vulnerability
As you learned with range partitioning hotspots (lesson 1474), sequential keys naturally create this problem—newest data in one partition, all recent queries hammering that same partition.
Lesson 1482The Hot Partition Problem
Range queries by time
become extremely efficient.
Lesson 418Time-Series and Time-Ordered Data
Range-based
supports range queries efficiently
Lesson 253Evaluating Sharding Strategy Tradeoffs
Range-based sharding
Partition documents by ranges (e.
Lesson 1769Horizontal Scaling of Search Infrastructure
Ranked feeds
reorder posts using engagement signals:
Lesson 1644Feed Personalization and Ranking Requirements
Ranking
results by relevance, popularity, or other signals
Lesson 1730What is a Search Engine?
Ranking Algorithm
Results need ordering.
Lesson 1582Search and Discovery
Ranking signals
are measurable properties or metadata about posts and users that indicate how relevant or interesting a post might be.
Lesson 1666Ranking Signals and Features
Rapid updates
Score updates, status changes, progress bars
Lesson 1713Provider-Side Deduplication
Rarely changing reference data
→ Application in-memory cache
Lesson 130Choosing the Right Caching Layer
Rate
How many requests per second is your service handling?
Lesson 1190The RED MethodLesson 1265RED Method: Rate, Errors, Duration
Rate drops
→ upstream service stopped calling you, or clients are timing out
Lesson 1265RED Method: Rate, Errors, Duration
Rate limit complex queries
Track cost (filters + sorts) per user
Lesson 1897Performance Considerations and Limits
Rate limit spikes
Momentary quota breach that resets quickly
Lesson 1020Why Retries Are Necessary in Distributed Systems
Rate Limiting and Politeness
You can't hammer a single website with thousands of requests per second.
Lesson 1838URL Frontier: Definition and Purpose
Rate limiting and throttling
to protect backends
Lesson 870What is an API Gateway?
Rate limiting counters
(API request counts)
Lesson 141Cache-as-SoR (System of Record) Pattern
Rate-based
Allow N debug logs per second
Lesson 1164Sampling for High-Volume Logs
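"Allow N debug logs per second" can be approximated with a fixed one-second window counter. The sketch below assumes that approach; `PerSecondSampler` is a hypothetical name, and the caller passes the current time explicitly so the behavior is easy to test.

```python
class PerSecondSampler:
    """Admit at most max_per_second log records in each whole-second window."""

    def __init__(self, max_per_second: int):
        self.max_per_second = max_per_second
        self._window = None  # whole second currently being counted
        self._count = 0      # records admitted in that second

    def allow(self, now: float) -> bool:
        window = int(now)
        if window != self._window:
            # A new second has started, so reset the counter.
            self._window = window
            self._count = 0
        if self._count < self.max_per_second:
            self._count += 1
            return True
        return False


sampler = PerSecondSampler(max_per_second=2)
print([sampler.allow(t) for t in (10.0, 10.2, 10.9, 11.0)])
```

A fixed window can admit up to twice the limit across a window boundary; a token bucket smooths that out at the cost of a little more state.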
Rate-Based Adjustment
If processing rate falls despite a full queue, scale up; if workers are idle, scale down
Lesson 1872Dynamic Scaling Based on Queue Depth
Rate-limited operations
API calls to third-party services
Lesson 659Queue Use Cases: Work Distribution
Ratios and percentages
– Cache hit rates of 70% vs 95% drastically change design
Lesson 32Rounding and Approximation Techniques
Raw data
Keep 1-second resolution for 1 day
Lesson 1179Aggregation and Roll-Ups
Raw storage
365 billion × 1 KB = **365 TB**
Lesson 1498Storage Capacity Estimation
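The arithmetic behind the raw storage figure can be reproduced in a few lines. This sketch assumes decimal units (1 KB = 1,000 bytes and 1 TB = 10^12 bytes); the variable names are illustrative only.

```python
RECORDS = 365 * 10**9       # 365 billion records
BYTES_PER_RECORD = 10**3    # 1 KB per record, decimal units (assumption)
BYTES_PER_TB = 10**12       # 1 TB in decimal units (assumption)

total_bytes = RECORDS * BYTES_PER_RECORD
total_tb = total_bytes / BYTES_PER_TB

print(f"{total_tb:.0f} TB")
```

With binary units (1 KiB = 1,024 bytes) the result differs by a few percent, which is within the usual tolerance for back-of-the-envelope estimates.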
Raw/high-resolution (1-15 seconds)
Keep 1-7 days for debugging active incidents
Lesson 1213Metric Retention Policies
RDF (Resource Description Framework)
structures data as **triples**:
Lesson 453Property Graphs vs RDF Triples
re-encryption
(terminate at load balancer, then re-encrypt to backends).
Lesson 118SSL/TLS Termination at Load BalancersLesson 891SSL/TLS Termination
Re-evaluate
Check if new bottlenecks emerged.
Lesson 35Iterate Based on Constraints
Re-hash with modified input
Use the collision result itself as input: `hash(hash(url))`.
Lesson 1509Handling Hash Collisions
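The re-hash strategy above can be sketched as follows. Everything here is an assumption for illustration: the helper names `short_code` and `assign_code`, the use of a truncated SHA-256 hex digest as the short code, and a plain dictionary standing in for the database.

```python
import hashlib


def short_code(text: str, length: int = 7) -> str:
    """Derive a short code from text (truncated SHA-256 hex, an assumed scheme)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:length]


def assign_code(url: str, store: dict, max_attempts: int = 5) -> str:
    """Store url under a free short code, re-hashing the code on each collision."""
    code = short_code(url)
    for _ in range(max_attempts):
        existing = store.get(code)
        if existing is None:
            store[code] = url       # free slot: claim it
            return code
        if existing == url:
            return code             # same URL already stored: reuse its code
        code = short_code(code)     # collision: feed the result back in, hash(hash(url))
    raise RuntimeError("no free short code found within max_attempts")


store = {}
code = assign_code("https://example.com/a", store)
print(code, store[code])
```

A production system would also need the check-and-insert to be atomic (for example through a unique constraint), since two writers could otherwise claim the same code at once.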
Read `robots.txt` directives
Many sites specify a `Crawl-delay: N` value (in seconds) telling crawlers to wait N seconds between requests
Lesson 1842Politeness Budget and Crawl Delay
Read availability
Multiple nodes can serve the same key simultaneously, spreading read traffic across replicas.
Lesson 364Replication in Distributed Key-Value Stores
Read bandwidth
10-100x write bandwidth due to viral content
Lesson 1584Image/Video Hosting: Problem Definition and Scale
Read Committed
You only see data that's been committed.
Lesson 312Isolation Levels and Concurrent Transactions
Read concern
controls what data you see when reading:
Lesson 395Read and Write Concerns
Read consistency
All reads come from the leader (in basic configurations), guaranteeing you see the latest committed data
Lesson 706Leaders and Followers
Read from the leader
For data a user modified, always route their reads to the primary replica (which has the latest write).
Lesson 542Read-Your-Writes Consistency
Read matching SSTables
scan files that likely contain your data
Lesson 429Read Path and Bloom Filters
Read performance scales
Multiple copies mean more servers can handle read requests simultaneously
Lesson 68What is Data Replication?
Read queries
(`SELECT`) → Read replicas
Lesson 222Proxy-Based Read-Write Splitting
Read repair
fixes this during read operations.
Lesson 431Hinted Handoff and Read Repair
Read scaling
– Add replicas to handle more read traffic without impacting write performance
Lesson 1365Single-Leader Replication Topology
read time
, pull recent posts from celebrities the user follows and merge them with their pre-computed feed.
Lesson 1640Celebrity Problem in Push ModelsLesson 1647Fanout-on-Read (Pull Model)
Read timeout
controls how long your client will wait to *receive* data from the server after the connection succeeds.
Lesson 1089Read Timeout and Write Timeout
Read Uncommitted
Allows reading data that other transactions haven't committed yet ("dirty reads").
Lesson 312Isolation Levels and Concurrent Transactions
read-after-write consistency
for individual users without forcing *all* reads to the primary, which would defeat the purpose of read replicas.
Lesson 214Session-Based Routing for Read-After-WriteLesson 224Read-After-Write ConsistencyLesson 1671Real-Time Requirements for Social FeedsLesson 1678Read-After-Write Consistency
Read-from-leader for own writes
After a write, that client reads from the leader (not replicas) for a period of time or until replication catches up.
Lesson 1390Read-Your-Writes Consistency
Read-Heavy Access Patterns
Lesson 290When to Denormalize
Read-through
Cache manages its own population
Lesson 133Read-Through Caching Pattern
Read-Through + Write-Behind
Lesson 139Combining Cache Patterns
Read-Through + Write-Through
Lesson 139Combining Cache Patterns
Read-through caching
simplifies this by making the cache layer itself handle missing data.
Lesson 133Read-Through Caching Pattern
Read-to-write ratio: 50,000:1
(or more conservatively, 100:1 to 1000:1)
Lesson 1636Capacity Estimation: Feed Reads vs Writes
Read-Your-Writes Consistency
solves this by guaranteeing that once a client writes data, any subsequent reads *by that same client* will reflect that write (or a newer version).
Lesson 542Read-Your-Writes ConsistencyLesson 543Monotonic Reads ConsistencyLesson 1390Read-Your-Writes Consistency
Read-your-writes violation
You can't see your own changes
Lesson 1358Replication Lag in Async Systems
Read:write ratio
100:1 (users view 200 photos/day)
Lesson 33Putting It All Together: Worked Example
Reading data
`GET /user/456` — fetching repeatedly has no side effects
Lesson 1006Natural Idempotency vs Engineered Idempotency
Reads can fan out
– Clients can read from leader or any replica, distributing query load
Lesson 1365Single-Leader Replication Topology
Reads dominate writes
(mappings change rarely), so cache aggressively at clients.
Lesson 1477Directory Service Architecture
Reads may also block
to prevent stale balance information
Lesson 511Banking Systems: Consistency Over Availability
Ready
(initial state) — The participant is idle, ready to receive a prepare request
Lesson 572Participant State Transitions
Real example
Cassandra is PA/EL (available during partitions, prioritizes low latency during normal operation).
Lesson 514Beyond CAP: The Need for PACELC
Real Scale
Popular Pastebin services handle millions of pastes and hundreds of millions of reads, making partitioning, caching, and CDN strategies essential.
Lesson 1542Pastebin System Overview
Real-time accuracy critical
→ Skip caching or use very short invalidation windows
Lesson 130Choosing the Right Caching Layer
Real-time constraint
If operation A completes before operation B begins (in wall-clock time), A must appear before B in that sequence
Lesson 523Linearizability Defined
Real-time dashboards
Backend publishes metrics updates; dashboards subscribe for live data
Lesson 357Redis Pub/Sub for Real-Time MessagingLesson 739Stream Processing Use Cases
Real-time feel
Active users get instant awareness through push notifications.
Lesson 1679Hybrid Pull-Push Model
Real-time leaderboards
(gaming scores)
Lesson 141Cache-as-SoR (System of Record) Pattern
Real-time metrics
displaying request rates, latency, and error rates
Lesson 846Control Plane: API and User Interface
Real-time pub-sub
→ WebSocket → instant delivery
Lesson 1677Selective Push Strategies
Real-time ranking
means running your ranking model every time a user requests their feed.
Lesson 1667Real-Time vs Precomputed Ranking
Real-time reporting
across entities
Lesson 405When Joins Are Required
Real-time verdict
The API returns a risk score or classification (safe, suspicious, malicious)
Lesson 1540Spam and Malicious Link Detection
Real-time views
– Recent data processed by the speed layer
Lesson 750Lambda Architecture: Serving Layer
Real-world analogy
It's like having library branches that each manage their own inventory.
Lesson 262Referential Integrity Across Shards
Rebalancing detection
to spot when partitions become skewed
Lesson 1492Operational Complexity of Partitioning
Recent changes
Last deployment timestamp, config changes
Lesson 1293Alert Context and Enrichment
Recommendation systems
traverse "user liked → movie → similar_to → movie" paths
Lesson 458Use Cases: Fraud Detection and Knowledge Graphs
Reconcile
them using domain-specific logic (merge, pick latest timestamp, keep both, etc.
Lesson 377Eventual Consistency and Application Reconciliation
Reconcile asynchronously
Background jobs detect and fix mismatches
Lesson 583Alternative: Best Effort with Eventual Consistency
Reconciliation Overhead
The serving layer must merge results from both layers.
Lesson 751Lambda Architecture Tradeoffs
Record metadata
The snapshot includes the last included index and term—the log position it represents
Lesson 632Log Compaction: Snapshotting
Record timeout exhaustion events
when a request times out, log how much time was actually used
Lesson 1106Timeout Propagation Observability
Records
the event ID after successful processing
Lesson 1035Idempotency in Event Processing
Recovery
If a consumer crashes or has a bug, replay from before the problem occurred and reprocess correctly.
Lesson 695Stream Retention and ReplayLesson 1330What is Fault Tolerance?
Recovery challenges
requires careful design to handle failures
Lesson 136Write-Behind (Write-Back) Caching Pattern
Recovery mechanism
Messages can be replayed after fixing issues
Lesson 1705Retry and Dead Letter Queues
Recovery Point Objective (RPO)
is non-zero – You lose seconds/minutes of data
Lesson 1356Asynchronous Replication: Speed and RiskLesson 1411Defining Recovery Point Objective (RPO)
Recovery protocols
Bringing a failed node back online requires careful state reconciliation
Lesson 617Why Paxos Is Difficult in Practice
Recovery time expectations
How long does your dependency typically need to recover?
Lesson 1059Timeout Windows and Reset Logic
Recovery Time Objective (RTO)
is the maximum acceptable amount of time your system can be down after a failure before business impact becomes unacceptable.
Lesson 1412Defining Recovery Time Objective (RTO)
Recursive queries
Database queries that traverse parent relationships
Lesson 939Permission Inheritance and Hierarchies
Recycling short codes
back into the pool after a grace period (prevents accidental reuse)
Lesson 1532Expiration and Time-to-Live
RED method
(Rate, Errors, Duration) or **Four Golden Signals** helps focus on metrics that matter because each directly maps to user impact and system health.
Lesson 1215Avoiding Vanity Metrics
REDACTED
*` or `creditCard=1234-****-****-5678`.
Lesson 1163Avoid Logging Sensitive Data
Redeliver to different consumer
Let another instance try
Lesson 684Negative Acknowledgments and Redelivery
Redelivery
→ Consumer nacks or doesn't acknowledge (**negative acknowledgments**)
Lesson 687Dead Letter Queues
Redelivery headers
Metadata tracking attempt count and timestamps
Lesson 684Negative Acknowledgments and Redelivery
Redirect first
Look up the destination URL in cache or database and return the HTTP 301/302 immediately
Lesson 1530Analytics and Click Tracking
Redirection
Your application is pointed to the new primary
Lesson 207Replica Promotion and Failover Basics
Redis (with Streams/Pub-Sub)
offers lightweight messaging built into your caching layer.
Lesson 665Overview of Message Broker Landscape
Redis pipelining
, you can batch operations like:
Lesson 1811Batch Operations to Reduce Network Calls
Redis Pub-Sub
works well for ephemeral, low-latency messaging where losing occasional messages is acceptable.
Lesson 1675Pub-Sub for Real-Time Distribution
Redis Sentinel
is a separate process (or cluster of processes) that acts as a watchdog for your Redis deployment.
Lesson 353Redis Sentinel for High Availability
Redis/Memcached
Fast, in-memory key-value stores perfect for session data
Lesson 59Externalizing State with Shared Storage
Reduce
Workers combine their results into a final answer
Lesson 743Batch Processing Frameworks
Reduce configuration surface area
Every config flag is a potential production incident
Lesson 1315Simplicity as a Core Value
Reduce fear
of production incidents by making failure practice routine
Lesson 1345Starting with Game Days
Reduce human error
Code executes consistently; humans make mistakes when tired or rushed
Lesson 1308The SRE Philosophy: Treating Operations as Software
Reduced attack window
Stolen access tokens are useless after minutes
Lesson 915Token Expiration and Refresh Tokens
Reduced availability
If any replica is down, writes may fail or block
Lesson 1355Synchronous Replication: Guarantees and Costs
Reduced availability during partitions
(CP in CAP terms)
Lesson 530Strong Consistency in Practice
Reduced bandwidth
by eliminating over-fetching
Lesson 1910GraphQL Fundamentals and Query Language
Reduced coordination
No cross-team approval needed for BFF changes
Lesson 906BFF Ownership and Team Structure
Reduced correlated failures
Regional power outages, natural disasters, or ISP issues affect one location but not geographically distant ones.
Lesson 1429Geographic Backup Distribution
Reduced Downtime
When you know something's wrong immediately, you can fix it faster.
Lesson 1262What is Monitoring and Why It Matters
Reduced Latency
Read queries return faster because replicas aren't competing with write operations for CPU, memory, and disk I/O.
Lesson 220Read-Write Splitting FundamentalsLesson 887API Composition and Aggregation
Reduced replication lag risk
At least one replica is guaranteed up-to-date
Lesson 217Semi-Synchronous Replication Trade-offs
Reduced resilience
– Can't easily redirect traffic away from struggling nodes
Lesson 982Sticky Sessions and Rate Limiting
Reduced risk
A bad deployment affects only one service, not everything
Lesson 786Independent Deployability of Microservices
Reduced throughput
Limited by slowest node in the coordination group
Lesson 526The Cost of Strong Consistency
Reduces leader load
– The root node handles far fewer replication connections
Lesson 1374Tree Replication Topology
Reduces origin load
by 80-95% for popular content
Lesson 1614Origin Shield Pattern
Reduces wasted network calls
when you know you'll be rejected
Lesson 1789Client-Side vs Server-Side Rate Limiting
Reducing notification fatigue
Prevents spam from transient states
Lesson 1713Provider-Side Deduplication
Redundancy
Multiple copies of critical components (recall active-active and active-passive patterns)
Lesson 1330What is Fault Tolerance?Lesson 1335Failover Mechanisms
Redundancy for failures
Add extra servers so if one fails, the others can absorb its load.
Lesson 28Server Count Estimation
Redundancy multiplier
Never run at 100% capacity.
Lesson 28Server Count Estimation
Redundant information
If your framework already logs HTTP requests, don't duplicate it.
Lesson 1129What to Log vs What Not to Log
Redundant infrastructure
running continuously
Lesson 1413The Cost-Availability Tradeoff
Redundant instances
across multiple availability zones
Lesson 1784Non-Functional Requirements: Latency and Availability
Reference
documents from other documents (similar to foreign keys, though not enforced)
Lesson 382Document IDs and Primary Keys
Reference counting
is critical: only delete the physical file when *all* metadata references are gone.
Lesson 1622Deduplication Strategies
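A minimal in-memory sketch of reference counting for deduplicated content is shown below. The class name `DedupStore`, the SHA-256 content key, and the dictionary-backed storage are all assumptions for illustration, not the course's design.

```python
import hashlib


class DedupStore:
    """Store each distinct blob once and count how many references point to it."""

    def __init__(self):
        self.blobs = {}      # content hash -> bytes (the "physical file")
        self.refcounts = {}  # content hash -> number of metadata references

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = data  # first copy: store the bytes
        self.refcounts[digest] = self.refcounts.get(digest, 0) + 1
        return digest

    def release(self, digest: str) -> bool:
        """Drop one reference; return True only if the blob itself was deleted."""
        count = self.refcounts[digest] - 1
        if count > 0:
            self.refcounts[digest] = count
            return False
        # Last reference gone: now it is safe to delete the physical data.
        del self.refcounts[digest]
        del self.blobs[digest]
        return True


store = DedupStore()
first = store.put(b"same bytes")
second = store.put(b"same bytes")
print(first == second, len(store.blobs))
```

In a real system the count update and the delete must happen atomically, otherwise a concurrent upload of the same content could reference a file that is being removed.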
Referential integrity
is the guarantee that relationships between tables remain valid.
Lesson 300Foreign Keys and Referential Integrity
Referrer
which website/app drove the click
Lesson 1530Analytics and Click Tracking
Referrer data
Log the HTTP `Referer` header to see where traffic originates—social media, search engines, or direct links.
Lesson 1583Analytics and Usage Metrics
Referrer source
Where did the clicker come from?
Lesson 1505Analytics and Tracking Requirements
Refine alert thresholds
If an alert fires 20 times per day but only matters once per week, adjust the threshold or add context filters.
Lesson 1171Log Review and Alert Fatigue
Refresh periodically
as new data arrives and old data ages out
Lesson 1117Adaptive Timeouts Based on Historical Latency
Refresh-Ahead
that you've already learned.
Lesson 140Cache Warming Strategies
region
(geo), then by **hash(customer_id)** within each region.
Lesson 250Hybrid Sharding StrategiesLesson 435HBase Regions and Region Servers
Regional Data Partitioning
Each region owns specific user data exclusively
Lesson 1435Multi-Region Architecture for DR
Regional failover
(if one region goes down, others remain)
Lesson 53Geographic Distribution Benefits
Regional fanout workers
processing feed updates
Lesson 1682Scaling to Billions of Daily Active Users
Regional isolation
to prevent cascading failures
Lesson 1334Geographic Redundancy and Multi-Region
Regional Network Load Balancer
Layer 4 for regional deployments
Lesson 114Cloud Load Balancers (GCP and Azure)
Regional Tier
(mid-level aggregation)
Lesson 182Cache Hierarchies and Tiered Caching
RegionServer
is a worker node that hosts multiple regions.
Lesson 435HBase Regions and Region Servers
Registered claims
(standardized):
Lesson 913JWT Structure and Claims
Registers
Last-write-wins registers using timestamps
Lesson 538Conflict-Free Replicated Data Types (CRDTs)
Registry validates compatibility
The registry checks if the new schema is compatible with existing versions using rules (backward, forward, full compatibility)
Lesson 725Schema Registry and Evolution
Regular posts
30 seconds to 2 minutes is typically acceptable
Lesson 1671Real-Time Requirements for Social Feeds
Regular users (< threshold)
Use **fanout-on-write**.
Lesson 1648Hybrid Fanout Strategy
Reject the request immediately
without calling downstream services.
Lesson 1102Handling Zero or Negative Timeouts
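The check described above can be sketched as a guard that runs before any downstream call. The names `call_with_deadline` and `DeadlineExceeded`, and the convention that the downstream callable accepts a `timeout` keyword, are assumptions made for this example.

```python
import time


class DeadlineExceeded(Exception):
    """Raised when no time budget remains for a downstream call."""


def call_with_deadline(deadline: float, downstream, now=time.monotonic):
    """Call downstream with the remaining budget, or fail fast if none is left.

    deadline: absolute time on the same clock as `now`
    downstream: callable accepting a `timeout` keyword argument (assumed convention)
    """
    remaining = deadline - now()
    if remaining <= 0:
        # Zero or negative budget: reject immediately, never touch the downstream.
        raise DeadlineExceeded(f"deadline passed {-remaining:.3f}s ago")
    return downstream(timeout=remaining)


def fake_service(timeout: float) -> str:
    return f"ok within {timeout:.1f}s"


print(call_with_deadline(10.0, fake_service, now=lambda: 4.0))
```

Injecting the clock through `now` keeps the guard deterministic in tests; in production the default monotonic clock avoids jumps caused by wall-clock adjustments.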
Rejected alternatives
What did you consider but not choose, and why?
Lesson 42Document Your Decisions
Rejection count
requests blocked while open
Lesson 1055Circuit Breaker Observability
Related data stays together
When you query a user's timeline or a tenant's orders, all that data lives on one shard, with no cross-shard joins needed.
Lesson 244Entity-Based Sharding
Related logs
Deep links to log queries for the time window and affected hosts
Lesson 1293Alert Context and Enrichment
Relational approach
Open the entire phone book, find your friend, then open it again to find *their* friends listed somewhere else.
Lesson 476Graph Query Performance Characteristics
Relational database (PostgreSQL)
Lesson 1819Per-Tenant Configuration Storage
Relationship Complexity Is High
Lesson 459Graph vs Relational Trade-offs
Relationship features
how often you interact with this author
Lesson 1668Machine Learning for Feed Ranking
Relative timeout (alternative)
Lesson 1112HTTP Header-Based Propagation
Relatively static data
Data that doesn't change every second
Lesson 124Database Query Result Caching
Release locks
they don't know the final outcome
Lesson 573The Blocking Problem in 2PC
Release locks and connections
to prevent resource exhaustion
Lesson 1115Deadline Exceeded Error Handling
Release resources immediately
(connection pool slots, memory buffers, file handles)
Lesson 1094Timeout Cancellation and Cleanup
Relevance
Sophisticated ranking to surface the best matches
Lesson 1730What is a Search Engine?
Relevance metrics
Click-through rate (CTR), mean reciprocal rank (MRR), and normalized discounted cumulative gain (NDCG) tell you if top results match user intent
Lesson 1779Search Analytics and Click Tracking
Reliability per service
If Service A is down, its queue holds messages until it recovers—Service B is unaffected
Lesson 663Hybrid Patterns: Topic + Queue
Reliable
Don't depend on another potentially failing service
Lesson 1061Fallback StrategiesLesson 1402Full Backups
Reliable ID reservation
No race conditions on duplicate paste keys
Lesson 1559Write Path: Synchronous vs Asynchronous Storage
Remove old columns later
in a separate migration phase
Lesson 265Schema Changes in Sharded Environments
Removing fields
from responses: Clients expecting `user.
Lesson 1905Breaking vs Non-Breaking Changes
Removing or renaming endpoints
Old URLs return 404 errors
Lesson 1905Breaking vs Non-Breaking Changes
Renaming fields
`firstName` becoming `first_name` severs existing references
Lesson 1905Breaking vs Non-Breaking Changes
Rendering layer
UI components that display notification badges and lists
Lesson 1687In-App Notifications
Repair outdated replicas
(read-repair) by writing the latest value back to nodes with stale data
Lesson 559Strong Consistency with Quorums
Repair time
Implementing the fix (rollback, restart, patch)
Lesson 1324Mean Time To Repair (MTTR)
Repeatable Read
Guarantees that re-reading the same rows returns identical data within your transaction, even if others commit changes.
Lesson 312Isolation Levels and Concurrent Transactions
Repetitive
same steps over and over
Lesson 1311Toil: The Enemy of Scale
Replace old views
with newly computed ones
Lesson 748Lambda Architecture: Batch Layer
Replay capability
Consumers can restart from any position in the stream
Lesson 697Push vs Pull Consumption Models
Replayability
Consumers can rewind to any offset and re-process events
Lesson 693The Commit Log Abstraction
Replayable Event Log
An immutable, persistent log (like Kafka) that stores all events and allows replay from any offset
Lesson 752Kappa Architecture Overview
Replica connection pool(s)
Separate connection pools to one or more read replicas, used for SELECT queries
Lesson 221Application-Level Connection Management
Replica processing time
Replicas must apply the changes they receive.
Lesson 208Replication Lag: What It Is and Why It Happens
Replica promotion
is the process of elevating one of your read replicas to become the new primary database when the original primary fails.
Lesson 207Replica Promotion and Failover Basics
Replica reads
Route celebrity profile reads to dedicated read replicas with extra capacity.
Lesson 257Celebrity Problem in Social Graphs
Replicas receive updates
– Followers pull or receive the changelog
Lesson 1365Single-Leader Replication Topology
Replicas serve reads
→ Answer queries from their local copy of data
Lesson 199Primary-Replica Architecture
Replicate hot content
Copy popular files to regions with high demand
Lesson 1631Multi-Region Replication Strategy
Replicated
so each shard has backup copies
Lesson 1447Partitioning vs Sharding vs Replication
Replication mode
Asynchronous replication inherently creates lag because the primary doesn't wait for replicas to confirm receipt before completing the write.
Lesson 208Replication Lag: What It Is and Why It Happens
Replication per shard
Each shard has its own replicas (covered earlier), so a single server failure doesn't kill the shard
Lesson 266Shard Failure and Partial Outages
Reprocessing = replaying streams
from earlier offsets
Lesson 753Kappa Architecture: Single Processing Path
Reproducibility
Version-controlled pipeline definitions
Lesson 766Apache Airflow Fundamentals
Reproducing production environments locally
becomes nearly impossible.
Lesson 806Testing Complexity
Request aggregation
from multiple services into one response
Lesson 870What is an API Gateway?
Request and Response Transformation
is the gateway's ability to rewrite, reshape, and adapt messages bidirectionally.
Lesson 882Request and Response Transformation
Request arrives
at your service or API gateway (the PEP)
Lesson 941Policy Decision Points (PDP) and Enforcement Points (PEP)
Request attributes
"Only GET requests allowed from external-facing services"
Lesson 854Request-Level Authorization
Request context
(IP address, time of day)
Lesson 884Authorization and Policy Enforcement
Request count
volume passing through the breaker
Lesson 1055Circuit Breaker Observability
Request ID
Unique identifier for the incoming request
Lesson 1161Context-Rich Logging
Request IDs
Similar to user IDs—practically unbounded.
Lesson 1211Avoiding High-Cardinality Labels
Request IDs or UUIDs
each request creates a new series
Lesson 1178Metric Cardinality and Labels
Request metadata
enough details to verify it's the same request (method, endpoint, user ID)
Lesson 1004Server-Side State for Idempotency
Request quota
(requests per time window)
Lesson 1824Tiered Rate Limiting
Request rate
(requests per second)
Lesson 26Bandwidth Estimation from Data Size
Request template
by ID and locale: `template_id="order_shipped"`, `locale="es"`
Lesson 1701Template Service for Content
Request timeout
(sometimes called "read timeout" or "socket timeout") limits the total time for the entire request-response cycle *after* the connection is established.
Lesson 1088Connection Timeout vs Request Timeout
Request timeouts
ensure calls don't hang indefinitely.
Lesson 852Circuit Breaking at the Mesh Level
Request Transformation
happens before forwarding to backend services:
Lesson 882Request and Response Transformation
Request Validation
Basic checks (malformed JSON, missing headers, invalid content types) happen at the gateway, keeping invalid traffic from consuming service resources.
Lesson 876API Gateway as a Cross-Cutting Concern Hub
request-driven services
web servers, APIs, microservices, and RPC handlers.
Lesson 1190The RED MethodLesson 1265RED Method: Rate, Errors, Duration
Request-Reply
Temporary queues for synchronous-style responses
Lesson 671ActiveMQ and Traditional Enterprise Messaging
Request-response
You ask for a glass of water and wait until someone brings it
Lesson 690What is Event Streaming?
Request/response bodies by default
Logging full payloads on every request kills performance and bloats storage.
Lesson 1129What to Log vs What Not to Log
Request/response transformation
(e.
Lesson 870What is an API Gateway?
Requests are processed
(leak out) at a **constant rate**—say, 100 requests per second
Lesson 965Leaky Bucket Algorithm
Requests arrive
at unpredictable rates (bursty traffic)
Lesson 965Leaky Bucket Algorithm
Requests enter the bucket
(a queue) if there's room
Lesson 965Leaky Bucket Algorithm
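The three leaky-bucket entries above (bursty arrivals, a bounded queue, a constant drain rate) can be sketched together as follows. This is a minimal illustration, not the course's implementation: the class name `LeakyBucket`, the `capacity` and `leak_rate` parameters, and the injectable `clock` are all assumptions.

```python
import time
from collections import deque


class LeakyBucket:
    """Illustrative leaky bucket: a bounded queue drained at a constant rate."""

    def __init__(self, capacity, leak_rate, clock=time.monotonic):
        self.capacity = capacity    # max queued requests (hypothetical parameter)
        self.leak_rate = leak_rate  # requests processed per second
        self.clock = clock          # injectable so the sketch can be tested
        self.queue = deque()
        self.last_leak = clock()

    def _leak(self):
        """Remove as many queued requests as the elapsed time allows."""
        elapsed = self.clock() - self.last_leak
        leaked = int(elapsed * self.leak_rate)
        if leaked > 0:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()
            # Advance by whole requests only, so fractional time is not lost.
            self.last_leak += leaked / self.leak_rate

    def offer(self, request):
        """Queue the request if there is room; return False if it is rejected."""
        self._leak()
        if len(self.queue) < self.capacity:
            self.queue.append(request)
            return True
        return False
```

A request that finds the bucket full is rejected rather than queued, which is what smooths a bursty arrival pattern into a steady outflow.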
Requests flow normally
– Every call goes directly to the downstream service
Lesson 1045The Three States: Closed
Requests vs Limits
Guaranteed minimums (requests) and hard caps (limits)
Lesson 1072CPU and Memory Bulkheads: Resource Quotas
RequestVote RPC
(Remote Procedure Call) to all other servers.
Lesson 621Leader Election: RequestVote RPC
RequestVote RPCs
, and the new leader has all committed entries.
Lesson 634etcd: Distributed Key-Value Store with Raft
Required external services
Can we reach the payment API, authentication service, etc.
Lesson 101Health Check Endpoints
Requirements evolve
as businesses grow.
Lesson 7The Iterative Nature of Design
Requires full chain
you can't skip intermediate backups
Lesson 1422Incremental Backup Strategy
Requires stable, unique ordering
The keyset column must be indexed and provide a deterministic sort order
Lesson 1890Keyset Pagination
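As a rough sketch of why the ordering column must be unique and indexed, the query below resumes from the last `id` seen instead of using an offset. The `posts` table, its columns, and the `fetch_page` helper are hypothetical names chosen for illustration.

```python
import sqlite3


def fetch_page(conn, after_id, page_size):
    """Return the next page of rows whose id is greater than after_id."""
    return conn.execute(
        "SELECT id, title FROM posts WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()


# Hypothetical data set for illustration; the primary key is unique and indexed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO posts (title) VALUES (?)",
    [("p%d" % i,) for i in range(1, 6)],
)

page1 = fetch_page(conn, after_id=0, page_size=2)
# Resume after the last id seen on the previous page.
page2 = fetch_page(conn, after_id=page1[-1][0], page_size=2)
```

If two rows could share the same sort value, rows at a page boundary might be skipped or repeated, which is why a deterministic order matters here.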
Reserve buffer time
for network latency and your own processing (typically 10-20%)
Lesson 1098Per-Hop Timeout Budgets
Reserve keywords
Block system routes like `admin`, `api`, `stats`, or future features.
Lesson 1531Custom Aliases and Vanity URLs
Reserve local processing time
Subtract your expected work duration
Lesson 1119Timeout Budget Management Across Service Chains
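The two budgeting entries above (reserving buffer time and reserving local processing time) might be combined as in this sketch. The function name and the choice to apply the buffer to what remains after local work are assumptions; the 15% default simply falls inside the 10-20% range mentioned above.

```python
def downstream_timeout_ms(remaining_ms, local_work_ms, buffer_percent=15):
    """Timeout to hand to the next hop, in whole milliseconds."""
    available = remaining_ms - local_work_ms  # reserve local processing time
    if available <= 0:
        return 0  # nothing left: the caller should fail fast instead of calling
    # Hold back a safety buffer for network latency and overhead.
    return available * (100 - buffer_percent) // 100
```

For example, with 1000 ms remaining and 200 ms of expected local work, the next hop would receive 680 ms under the default buffer.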
Reservoir sampling
Maintain a fixed-size sample from an unbounded stream
Lesson 1217Sampling for Expensive Metrics
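One common way to keep a fixed-size sample from an unbounded stream is the classic "Algorithm R", sketched below. The function name and the `rng` parameter are illustrative choices, not part of the course material.

```python
import random


def reservoir_sample(stream, k, rng=random):
    """Keep a uniform random sample of up to k items from a stream of unknown length."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)    # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item   # replace with probability k / (i + 1)
    return sample
```

Memory use stays at `k` items no matter how long the stream runs, which is what makes the technique suitable for expensive metrics.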
Reset behavior
They start at zero when a service restarts
Lesson 1174Counter Metrics
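Because a counter drops back to zero on restart, a naive "last minus first" calculation would undercount. A minimal sketch of reset-aware accounting, assuming that any decrease between samples signals a restart (the function name is hypothetical):

```python
def total_increase(samples):
    """Sum the growth of a counter across samples, tolerating resets to zero."""
    increase = 0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            increase += cur - prev  # normal monotonic growth
        else:
            increase += cur         # counter restarted: count from zero again
    return increase
```

For the samples `[10, 15, 3, 8]` this yields 13 (5 before the restart, 3 and 5 after it), whereas subtracting the first sample from the last would give -2.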
Resilience Improves Dramatically
If one gateway instance crashes, requests simply flow to healthy instances.
Lesson 878Stateless Gateway Design
Resilience patterns
like retries, timeouts, and circuit breaking
Lesson 838Data Plane: Sidecar Proxy Pattern
Resist over-engineering
Build for today's scale, not imaginary future scale
Lesson 1315Simplicity as a Core Value
Resist that urge
The best system designs always start with the simplest possible solution that satisfies the functional and non-functional requirements you've identified.
Lesson 34Start Simple: The Minimum Viable Design
Resolution
→ Fix code/data, then manually replay or discard messages
Lesson 687Dead Letter QueuesLesson 1270Monitoring Resolution and Retention Tradeoffs
Resolution ladder
Create multiple sizes from the same source:
Lesson 1601Video Transcoding Fundamentals
Resolvers
are the implementation functions that fulfill that contract by actually fetching or manipulating data.
Lesson 1912GraphQL Schema and ResolversLesson 1913The N+1 Query Problem in GraphQL
Resource allocation
Budget far more servers and memory for read replicas and cache layers than for write handling.
Lesson 1636Capacity Estimation: Feed Reads vs Writes
Resource attributes
Classification, owner, sensitivity level
Lesson 935Attribute-Based Access Control (ABAC) Introduction
Resource blocking
The API server is tied up during the entire fanout process
Lesson 1651Asynchronous Fanout Processing
Resource efficiency
State storage remains bounded and predictable
Lesson 1005Idempotency Time WindowsLesson 1679Hybrid Pull-Push Model
Resource ownership
(can only access their own data)
Lesson 884Authorization and Policy Enforcement
Resource protection
Prevents CPU, memory, or network exhaustion
Lesson 955What is Rate Limiting?
Resource quotas
act as bulkheads at the infrastructure level.
Lesson 1072CPU and Memory Bulkheads: Resource Quotas
Resource Tuning
Properly sizing proxy CPU/memory limits prevents resource contention
Lesson 841Data Plane: Performance and Latency Overhead
Resource types
Topics, consumer groups, clusters
Lesson 727Kafka Security: Authentication and Encryption
Resource waste
Some servers sit idle while others are overwhelmed
Lesson 1462The Uneven Distribution Problem
Resources
Database connections, queue workers, and write capacity get overwhelmed
Lesson 1649The Celebrity Problem in FanoutLesson 1901Header-Based Versioning
Respect crawl-delay
Don't hammer servers with rapid-fire requests.
Lesson 1831Robots.txt and Crawl Etiquette
Respect quiet periods
Aggregate multiple low-priority notifications before sending, preventing notification fatigue across all channels.
Lesson 1689Multi-Channel Delivery
Respect robots.txt
Some sites block automated scraping
Lesson 1538Link Preview and MetadataLesson 1826What is a Web Crawler
Response
Returns the generated paste key (or confirms custom key) and full URL.
Lesson 1546API Design for Paste Operations
Response latency
P95 or P99 request duration
Lesson 993Adaptive Rate Limiting
Response time
How quickly each server answers requests (milliseconds per request)
Lesson 92Least Response Time Algorithm
Response Transformation
happens before returning to clients:
Lesson 882Request and Response Transformation
Responsibility
Owns the protected data and grants permission to access it.
Lesson 921OAuth2 Roles: Resource Owner, Client, Server
RESTful API
You query using simple JSON over HTTP
Lesson 1150The ELK Stack: Elasticsearch
Restoration requirement
All backups in order (Day 1 → Day 2 → Day 3 → Day 4)
Lesson 1422Incremental Backup Strategy
Result consistency
across retries (may return same response or updated metadata)
Lesson 1008What Makes an API Idempotent
Resume from failure point
Execute only remaining steps
Lesson 1016Idempotency for Multi-Step Operations
Retention and Compliance
requires storing logs immutably for regulatory periods (often 90 days to 7 years), with tamper-proof guarantees.
Lesson 954Distributed Auth Audit Logging
Retention Period
How long do you keep data?
Lesson 25Storage Estimation Basics
Retention policy
Keep posts for 5 years (1,825 days)
Lesson 29Database Size Growth Projection
Retriable errors
(5xx range, timeouts):
Lesson 1018Error Handling in Idempotent APIs
Retriable failures
Timeouts, 503 Service Unavailable, network errors—suggest temporary downstream problems.
Lesson 1057Failure Detection and Counting
Retries
If a request fails, the proxy can automatically retry it (with configurable backoff strategies) without the calling service needing retry logic.
Lesson 839Data Plane: Proxy ResponsibilitiesLesson 1234Span Events and Logs
Retries with Exponential Backoff
Lesson 1656Fanout Failure Handling
Retrieval
Serve media quickly via direct links or embedded players
Lesson 1584Image/Video Hosting: Problem Definition and Scale
Retrieval tiers
Choose speed vs cost (instant, 3-5 hours, 12+ hours)
Lesson 1623Cold Storage and Archival
Retrieve
a specific document directly
Lesson 382Document IDs and Primary Keys
Retrieve pending URLs
The coordinator looks up which URLs were assigned to that worker (often stored in a "worker_id → URL_list" mapping in Redis or a database)
Lesson 1866Worker Health Monitoring and Failover
Retrieve positional postings
for both terms
Lesson 1751Phrase Queries and Positional Indexes
Retrieve template
"Hola {name}, tu pedido {orderId} ha sido enviado"
Lesson 1701Template Service for Content
Retroactive scanning
of popular pastes
Lesson 1581Abuse Prevention and Content Moderation
Retry after timeout
See key `notif_abc123` already exists with status "sent"—skip sending, return success
Lesson 1711Idempotency Keys for Notifications
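The retry behavior described above can be sketched as a status lookup before sending. The in-memory dictionary stands in for a shared store, and `send_notification` and `deliver` are hypothetical names; a real system would also need the check and the write to be atomic.

```python
# Stand-in for a shared key-value store (assumption for this sketch).
sent_status = {}


def send_notification(key, deliver):
    """Deliver at most once per idempotency key; retries return success."""
    if sent_status.get(key) == "sent":
        return "success"       # already delivered: skip sending
    deliver()                  # hand off to the channel provider
    sent_status[key] = "sent"  # record the outcome for future retries
    return "success"
```

Calling this twice with the key `notif_abc123` would invoke `deliver` only once.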
Retry amplification
Clients retry after timeout, but the original request is still processing, doubling the load
Lesson 1096Why Timeouts Must Propagate
Retry and timeout policies
How proxies should handle failures
Lesson 842Control Plane: Configuration Management
Retry complexity
Operations frequently fail and need multiple attempts
Lesson 654When to Use Async vs Sync
Retry counter increments
after each failure
Lesson 687Dead Letter Queues
Retry rate
How often do fanout operations need retries?
Lesson 1657Measuring Fanout Performance
Retry requests
made in that same window
Lesson 1029Retry Budgets and Rate Limiting
Retry with timestamp
Append the current timestamp in milliseconds and rehash: `hash(url + timestamp)`.
Lesson 1509Handling Hash Collisions
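A minimal sketch of the timestamp retry, assuming SHA-256 as the hash, a 7-character code, and a set of already-taken codes; none of these specifics come from the course material, and `short_code` is a hypothetical helper.

```python
import hashlib
import time


def short_code(url, taken, length=7,
               clock=lambda: int(time.time() * 1000), max_attempts=5):
    """Derive a short code from the URL, salting with a timestamp on collision."""
    data = url
    for _ in range(max_attempts):
        code = hashlib.sha256(data.encode("utf-8")).hexdigest()[:length]
        if code not in taken:
            return code
        data = url + str(clock())  # hash(url + timestamp) on the next attempt
    raise RuntimeError("could not find a free code")
```

The attempt limit is a safeguard added for the sketch so that a persistent collision cannot loop forever.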
Return
Application closes/releases connection → Pool marks as idle → Connection rejoins the pool for reuse
Lesson 270Connection Lifecycle in a PoolLesson 355Redis as a Cache
Return immediately
(lower latency, eventual consistency)
Lesson 514Beyond CAP: The Need for PACELC
Return only documents
where positions satisfy this adjacency constraint
Lesson 1751Phrase Queries and Positional Indexes
Return partial results
(show 10 products instead of 100)
Lesson 1083Graceful Degradation Strategies
Return rendered content
"Hola Carlos, tu pedido 54321 ha sido enviado"
Lesson 1701Template Service for Content
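The request, retrieve, and render steps for the template service can be sketched with a lookup keyed by template ID and locale. The `TEMPLATES` dictionary and `render` function are illustrative assumptions; only the template text and the example values come from the entries above.

```python
# Hypothetical in-memory template store keyed by (template_id, locale).
TEMPLATES = {
    ("order_shipped", "es"): "Hola {name}, tu pedido {orderId} ha sido enviado",
}


def render(template_id, locale, **variables):
    """Look up a template by ID and locale, then fill in its placeholders."""
    template = TEMPLATES[(template_id, locale)]
    return template.format(**variables)


message = render("order_shipped", "es", name="Carlos", orderId=54321)
```

Here `message` holds the rendered Spanish text for Carlos's order.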
Return that latest value
to the client
Lesson 559Strong Consistency with Quorums
Return the data
Give it back to the user
Lesson 131Cache-Aside (Lazy Loading) Pattern
Return the final result
Whether fresh or from the previous attempt
Lesson 1016Idempotency for Multi-Step Operations
Return to frontier
Those URLs are pushed back into the distributed URL frontier as if never assigned
Lesson 1866Worker Health Monitoring and Failover
Return to user
Send the content back
Lesson 1558Read Path: Cache-Aside Pattern
Returning
the top matches in milliseconds
Lesson 1730What is a Search Engine?
Returns
the connection to the pool after the query completes (not closing it)
Lesson 267What is Connection PoolingLesson 1907Gateway-Level Version Routing
Returns partial data
from available nodes
Lesson 315Basically Available: Prioritizing Uptime
Returns tier-specific information
in responses (e.
Lesson 1824Tiered Rate Limiting
Reuses
the same physical connection for subsequent requests
Lesson 267What is Connection Pooling
Revenue per request
Direct financial impact of each transaction
Lesson 1196Business vs Technical Metrics
Review eviction policies
(LRU vs LFU)
Lesson 129Cache Hit Ratio Optimization
Revocation becomes possible
Store refresh tokens in a database; you can invalidate them immediately
Lesson 915Token Expiration and Refresh Tokens
Revocation Lists (Deny Lists)
Maintain a shared, fast-lookup store (Redis, Memcached) containing revoked token IDs (the `jti` claim).
Lesson 948Token Revocation at Scale
Revocation Speed
Stateless tokens can't be instantly invalidated without introducing state
Lesson 947Distributed Session Management
Revokes certificates
– Invalidates compromised or outdated certificates instantly
Lesson 844Control Plane: Certificate Management
Revoking compromised
certificates instantly
Lesson 851Mutual TLS (mTLS) Authentication
Riak
implemented Dynamo's design nearly verbatim, including vector clocks for conflict resolution and active anti-entropy.
Lesson 378Dynamo's Influence on Modern SystemsLesson 521PACELC Tradeoffs in Real Systems
Rich context
Built-in correlation with traces and logs
Lesson 1205OpenTelemetry Metrics SDK
Rich data structures
Can efficiently support lists, sets, sorted sets in memory
Lesson 349Redis In-Memory Storage Model
Rich Observability
Built-in support for metrics, logging, and distributed tracing out of the box
Lesson 840Data Plane: Envoy Proxy Fundamentals
Risk
If your consumer crashes mid-processing, the message is already gone from the queue, so **message loss** occurs.
Lesson 683Consumer Acknowledgment TimingLesson 1102Handling Zero or Negative Timeouts
Risk acceptance
Very late retries (beyond the window) might re-execute
Lesson 1005Idempotency Time Windows
Risk of stale data
Since your application code updates the database but not the cache immediately, the cache can serve outdated information until expiration (TTL) or manual invalidation.
Lesson 132Cache-Aside: Pros and Cons
RocksDB
a high-performance key-value store.
Lesson 723Kafka Streams State Stores
Role-Based Access Control (RBAC)
groups permissions into **roles**, then assigns users to those roles.
Lesson 933Role-Based Access Control (RBAC) FundamentalsLesson 1160Security and Access Control for Logs
Roll back partial transactions
to maintain consistency
Lesson 1115Deadline Exceeded Error Handling
rollback
the database uses those logs to undo every change made during the transaction, restoring the data to its pre-transaction state.
Lesson 304Transaction Atomicity in PracticeLesson 1303Incident Mitigation vs Fix
Rollback complexity
If migration fails halfway, how do you recover?
Lesson 258Resharding and Data Migration
Rollback procedures
undo if recovery fails halfway
Lesson 1441Runbooks and Automation
Rollback simplicity
Revert just the problematic service, not the entire system
Lesson 786Independent Deployability of Microservices
Rolling upgrades
Give new servers lower weights initially
Lesson 86Weighted Round Robin
Root cause analysis
The combination of error logs, context, and timing lets you pinpoint why failures occurred.
Lesson 1127What is Logging and Why It MattersLesson 1350What is a Postmortem?
root span
(no parent) represents the entry point of your request
Lesson 1232Span Relationships and HierarchyLesson 1239Root Span and Entry Points
Rotate participants
so multiple team members can execute recovery
Lesson 1438DR Testing Strategies
Rotate refresh tokens
Issue new refresh tokens on each use and invalidate old ones
Lesson 931OAuth2 Security Best Practices
Rotates certificates
– Automatically refreshes certificates before expiration (often every few hours or days) without service downtime
Lesson 844Control Plane: Certificate Management
Rotating certificates
automatically before expiration (often every 24 hours)
Lesson 851Mutual TLS (mTLS) Authentication
Rotation fairness
Distribute difficult shifts equitably; avoid perpetually assigning nights to junior engineers
Lesson 1297On-Call Fundamentals and Rotation Models
Round Robin
would send every 5th customer to desk #5, regardless of whether that desk is still helping someone from 30 minutes ago
Lesson 87Least Connections AlgorithmLesson 96Algorithm Selection TradeoffsLesson 98What Are Health Checks?
Route
requests for that feature to the new service via a proxy/gateway
Lesson 822The Strangler Fig Pattern for Migration
Route Check Request
Send the "have we seen this URL?
Lesson 1854Distributed URL Deduplication
Route Optimization
Instead of your request traveling the public internet's unpredictable path to the origin server, the CDN routes it through its private, optimized backbone network.
Lesson 186Dynamic Content Acceleration
Route the request
Direct `INCR`, Lua scripts, or sliding window operations to that specific node
Lesson 1806Rate Limiting with Consistent Hashing
Routers
automatically forward it to the "nearest" server based on BGP routing metrics
Lesson 176Geographic Routing and Anycast
Routing
The proxy determines where traffic should go based on rules from the control plane.
Lesson 839Data Plane: Proxy ResponsibilitiesLesson 1295Testing Alerts and Dry Runs
Routing complexity
across internet backbone providers
Lesson 169The Latency Problem CDNs Solve
Routing logic
Clients or proxy layers route requests to the correct node
Lesson 360What Makes a Key-Value Store Distributed
RPO
(Recovery Point Objective) and **RTO** (Recovery Time Objective) targets, you're essentially promising faster recovery with less data loss.
Lesson 1413The Cost-Availability Tradeoff
RPO and RTO requirements
guide this decision.
Lesson 1425Hot vs Cold vs Warm Backups
RPO and RTO targets
, backup procedures, failover mechanisms, and infrastructure redundancy.
Lesson 1433Disaster Recovery vs Business Continuity
RPO approaches zero
You cannot afford to lose even minutes of data
Lesson 1427Continuous Data Protection
RPO requirement
to replication mode, then verify distance works with latency constraints.
Lesson 1439Data Replication for DR
RPO Requirements
set the ceiling.
Lesson 1424Backup Scheduling and Frequency
RPO Zero
means that your Recovery Point Objective is literally zero — you cannot afford to lose *any* data, not even one millisecond's worth of writes.
Lesson 1414RPO Zero: Synchronous ReplicationLesson 1415Near-Zero RPO with Asynchronous Replication
RTO
(Recovery Time Objective) targets, you're essentially promising faster recovery with less data loss.
Lesson 1413The Cost-Availability TradeoffLesson 1417Hot Standby vs Cold Standby
RTO = days
Might accept rebuilding from archives
Lesson 1412Defining Recovery Time Objective (RTO)
RTO = hours
Could rely on manual restoration from backups
Lesson 1412Defining Recovery Time Objective (RTO)
RTO = minutes
May use active-passive with quick automated failover
Lesson 1412Defining Recovery Time Objective (RTO)
RTO = seconds
Requires active-active multi-region setups with automatic failover
Lesson 1412Defining Recovery Time Objective (RTO)
Rule of thumb
More connections don't mean better performance.
Lesson 275Common Pooling Anti-Patterns
Runaway storage
Storage growing faster than historical rate
Lesson 1574Monitoring Expiration and Storage Health
Runbook clarity
Can someone follow your runbook without confusion?
Lesson 1295Testing Alerts and Dry Runs
Runbook link
Direct pointer to diagnostic steps
Lesson 1293Alert Context and Enrichment

S

Sacrifice availability
(reject requests until partition heals) to maintain consistency (CP choice)
Lesson 506CAP in Normal Operation vs Partition
Sacrifice consistency
(allow divergent data) to remain available (AP choice)
Lesson 506CAP in Normal Operation vs Partition
Safe to retry
without fear of duplication
Lesson 1008What Makes an API Idempotent
Safer failover
You have a known-good replica for promotion
Lesson 217Semi-Synchronous Replication Trade-offs
Safety guarantees
so two leaders never exist simultaneously
Lesson 636Consensus for Leader Election
Safety window
Protects against common retry scenarios (network blips, client crashes, reasonable user behavior)
Lesson 1005Idempotency Time Windows
Saga
splits a distributed transaction into a sequence of **local transactions**, where each local transaction updates data within a single service.
Lesson 585Alternative: Saga Pattern IntroductionLesson 588The Saga Pattern: Motivation and DefinitionLesson 589Saga Fundamentals: Local Transactions and Compensations
Same request parameters
→ Same final system state
Lesson 1008What Makes an API Idempotent
Same-thread execution
Request runs on the caller's thread, reducing context switches
Lesson 1070Semaphore-Based Bulkheads: Limiting Concurrent Requests
Sample aggressively
Log 1 in every 100 successful requests, but capture all errors—you'll still get actionable insights without drowning in data.
Lesson 1170Performance Impact of Logging
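The "1 in 100 successes, all errors" rule could look roughly like this. The function name, the `sample_rate` default, and the injectable `rng` are assumptions for the sketch.

```python
import random


def should_log(is_error, sample_rate=0.01, rng=random.random):
    """Always log errors; log successful requests with probability sample_rate."""
    return is_error or rng() < sample_rate
```

With `sample_rate=0.01`, about one successful request in a hundred is logged on average, while every error still reaches the log.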
Sample more aggressively
for high-cardinality scenarios (as covered in your sampling lessons)
Lesson 1258Cardinality Explosion
Sample size
How many requests to allow through Half-Open (small, like 5–10)
Lesson 1052Circuit Breaker Reset Logic
Sample when necessary
For debugging scenarios, sample high-cardinality data (e.
Lesson 1210Cardinality Management
Sampling decisions
determine whether a trace will be collected or discarded.
Lesson 1238Span Sampling Decisions
Sanitize before logging
by running all log data through filters that detect and mask patterns like credit card numbers or tokens.
Lesson 1131Logging Sensitive Data: Security Concerns
Saturate network bandwidth
between services
Lesson 1654Fanout Rate Limiting
Saturation percentage
(not raw resource usage) → indicates when to scale
Lesson 1215Avoiding Vanity Metrics
Scalability bottleneck
The directory can become a performance choke point
Lesson 242Directory-Based Sharding
Scale changes
as your user base grows.
Lesson 7The Iterative Nature of Design
Scale horizontally
when you hit hardware limits or need redundancy
Lesson 52Hybrid Scaling StrategiesLesson 443BigTable Overview and Motivation
Scale independently
relational metadata database can be optimized differently than object storage
Lesson 1590Metadata Database Design
Scale operations effort sublinearly
One automation handles thousands of instances
Lesson 1308The SRE Philosophy: Treating Operations as Software
Scale-Down Trigger
If queue depth drops below a minimum (e.
Lesson 1872Dynamic Scaling Based on Queue Depth
Scale-Up Trigger
If queue depth exceeds a threshold (e.
Lesson 1872Dynamic Scaling Based on Queue Depth
Scales fan-out
– Can reach thousands of nodes without overwhelming the leader
Lesson 1374Tree Replication Topology
Scaling
Manual shard splitting/merging (Kinesis) vs. adding brokers and reassigning partitions (Kafka)
Lesson 728AWS Kinesis Overview
Scaling challenges
Vertical scaling (more RAM) hits limits faster than horizontal disk scaling
Lesson 349Redis In-Memory Storage Model
Scaling characteristics
reveal how performance changes with growth.
Lesson 677Message Broker Performance Characteristics
Scaling complications
– Adding/removing nodes disrupts existing session assignments
Lesson 982Sticky Sessions and Rate Limiting
Scaling costs
Write-heavy workloads are harder and more expensive to scale than read-heavy ones
Lesson 296Write Amplification Costs
Scaling Limitations
You can't scale individual features independently.
Lesson 785When Monoliths Become Problematic
Scan operations
Sequential reads are fast because data is ordered
Lesson 1451Range-Based Partitioning
Scenario
Proposer A starts with proposal number 10, while Proposer B simultaneously starts with proposal number 15.
Lesson 615Handling Conflicts and Preemption
Scenario A (Real-World)
You have months, a team of specialists, detailed requirements from the homeowner, and the ability to research materials, consult experts, and iterate on your design multiple times.
Lesson 5Real-World vs Interview System Design
Scenario B (Interview)
You have 45 minutes, minimal information about what the homeowner wants, no internet access, and you must sketch the entire house from foundation to roof while explaining your reasoning out loud.
Lesson 5Real-World vs Interview System Design
Schedule dry runs
Monthly or quarterly, simulate an incident and walk through the response
Lesson 1295Testing Alerts and Dry Runs
Schedule it
give teams advance notice
Lesson 1345Starting with Game Days
Schedule regular restore drills
Monthly or quarterly full restoration tests
Lesson 1408Backup Verification and TestingLesson 1430Backup Verification and Testing
Scheduled job
scans the metadata database for old pastes
Lesson 1557Hot vs Cold Storage Tiering
Scheduled maintenance
is planned downtime where you intentionally take systems offline for upgrades, patches, or infrastructure changes.
Lesson 1328Scheduled Maintenance and Availability Accounting
Scheduled Messages
let you enqueue a message now but delay its availability until a future timestamp.
Lesson 675Azure Service Bus Features
Scheduled warming
runs during low-traffic periods (like 3 AM) to refresh or preload data without impacting peak users.
Lesson 140Cache Warming Strategies
Schema compliance
Does the JSON/XML match expected structure?
Lesson 886Request Validation
Schema enforcement
Only data matching the defined schema gets in
Lesson 764Data Governance and Quality
Schema evolution
When requirements change, just start writing documents with new fields.
Lesson 380Document Structure and Schema Flexibility
Schema ID embedding
Messages include a small schema ID instead of the full schema, saving bandwidth
Lesson 725Schema Registry and Evolution
schema-on-read
you decide how to interpret the data only when you actually read and analyze it.
Lesson 758Data Lake FundamentalsLesson 759Schema-on-Write vs Schema-on-ReadLesson 1154Alternative: Splunk Architecture
schema-on-write
you define the structure *before* loading data.
Lesson 757Data Warehouse FundamentalsLesson 759Schema-on-Write vs Schema-on-Read
Scoped appropriately
(per user, per account, etc.
Lesson 1036Request Token Generation and Management
Scopes
are string identifiers that represent specific permissions.
Lesson 930OAuth2 Scopes and Consent
SCRAM
Salted Challenge Response (more secure than PLAIN)
Lesson 727Kafka Security: Authentication and Encryption
Scrape Interval Matters
Most monitoring systems (like Prometheus) scrape metrics periodically.
Lesson 1187Rate Calculations from Counters
Scraper
Pulls metrics from configured endpoints at fixed intervals (e.
Lesson 1198Prometheus Architecture and Data Model
Scribe
Documents everything in real-time—timeline of events, decisions made, actions taken.
Lesson 1300Incident Command System (ICS)
Scriptable runbook
commands copied into shell scripts
Lesson 1441Runbooks and Automation
Scrubbing and redaction
should happen *before* data reaches your logging pipeline.
Lesson 1160Security and Access Control for Logs
Search engine
If personalized ranking fails, fall back to generic relevance scoring
Lesson 1336Graceful DegradationLesson 1730What is a Search Engine?
Search engines
use knowledge graphs to understand entities and context
Lesson 458Use Cases: Fraud Detection and Knowledge GraphsLesson 1826What is a Web Crawler
Search functionality
by name, tag, or owner
Lesson 1216Metric Documentation and Discovery
Search HFiles on disk
– If the data isn't in memory, HBase must check potentially many immutable HFiles created by previous flushes
Lesson 437HBase Read Path and Bloom Filters
Search Index
Think of this as a library catalog.
Lesson 1582Search and Discovery
Second check
You see $300 (replica B, lagging behind, showing old state)
Lesson 535Monotonic Reads
Second level (Hash)
Within each month, hash by `customer_id`
Lesson 1453Composite Partitioning
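A sketch of how the hash level might sit inside a month-based first level. The first level being a range partition by month is inferred from "within each month"; the bucket count, the use of CRC32, and the `partition_for` name are assumptions.

```python
import zlib
from datetime import date


def partition_for(order_date, customer_id, buckets=4):
    """Return a partition label: month range first, then a hash bucket."""
    month = order_date.strftime("%Y-%m")  # first level: range by month
    # Second level: a stable hash of customer_id. CRC32 is used because
    # Python's built-in hash() of a string varies between processes.
    bucket = zlib.crc32(str(customer_id).encode("utf-8")) % buckets
    return f"{month}/bucket-{bucket}"
```

All of a customer's rows for a given month land in the same bucket, while different customers spread across the buckets within that month.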
Second Normal Form (2NF)
Remove partial dependencies—non-key attributes depend on the entire primary key
Lesson 302Normalization Fundamentals
Second read
Hits Replica A (only caught up through transaction #148)
Lesson 1360Monotonic Reads Across Replicas
Secure Token Storage
Tokens stay server-side, never exposed to the browser or user device.
Lesson 922Authorization Code Flow
Security boundaries
require care.
Lesson 945Token Propagation Across Services
Security policies
Which services can talk to which, mTLS settings, authentication rules
Lesson 842Control Plane: Configuration Management
Security requirements
demanding a single enforcement point before internal systems
Lesson 879When to Introduce an API Gateway
Security requirements matter
High-security operations (financial transactions, admin actions) may skip caching entirely.
Lesson 951Caching Authorization Decisions
Seed URLs
Start with a known list (popular sites, sitemaps, user submissions)
Lesson 1732Crawling and Document CollectionLesson 1828Seed URLs and Starting Point
Segmentation
Each quality version is split into small chunks (~2-10 seconds)
Lesson 1602Adaptive Bitrate Streaming (ABR)
Segments users
by behavior patterns (morning openers, evening readers, weekday vs weekend responders)
Lesson 1729Analytics-Driven Optimization
Selection
A healthy replica is chosen (usually the one with the most up-to-date data)
Lesson 207Replica Promotion and Failover Basics
Selection logic
An algorithm to pick one server from the list (round-robin, random, least-connections, etc.
Lesson 83Client-Side Load Balancing
Selective Application
Apply heavyweight idempotency only where consequences are severe (payments, orders).
Lesson 1042Idempotency vs Performance Tradeoffs
Selective Broadcasting
The WebSocket gateway subscribes to relevant channels and pushes updates only to connected users who should see that content (based on their follow graph).
Lesson 1672WebSocket Architecture for Live Updates
Selective fan-out
Only push to highly engaged followers; others get pull-based delivery.
Lesson 1640Celebrity Problem in Push Models
Selective push
means making smart choices about *who* gets *what* updates *when*, based on user activity, relationship strength, and content importance.
Lesson 1677Selective Push Strategies
Selective Service Meshing
Apply the mesh pattern only to critical services that truly benefit—perhaps those handling payments, authentication, or high-value transactions.
Lesson 869Alternatives to Full Service Mesh
Self-contained requests
Each request includes everything needed to process it—authentication tokens, user IDs, necessary parameters
Lesson 55What Makes a Service Stateless
Self-healing automation
system detects and recovers without human intervention
Lesson 1441Runbooks and Automation
Self-healing behavior
without manual intervention
Lesson 972Adaptive Rate Limiting
Self-hosted
RabbitMQ, NATS, Redis when you need control or hybrid deployments
Lesson 676Choosing Between Message Broker TechnologiesLesson 900Open-Source vs Managed Gateway Tradeoffs
Self-hosted gateways
require you to manage infrastructure, monitoring, scaling, patching, and high availability.
Lesson 900Open-Source vs Managed Gateway Tradeoffs
Self-hosted vs managed
is your first decision.
Lesson 735Choosing a Streaming Platform
Self-managed complexity
You handle updates, monitoring, and troubleshooting (unless using managed services)
Lesson 108Hardware vs Software Load Balancers
Semantic Lock Pattern
places an application-level flag or status field on data that a saga is currently processing.
Lesson 595Semantic Lock Pattern
Semaphore Bulkheads
Limit concurrent calls without separate threads
Lesson 1075Implementing Bulkheads in Practice: Hystrix and Resilience4j
Semaphore-based bulkheads
offer a lighter alternative.
Lesson 1070Semaphore-Based Bulkheads: Limiting Concurrent Requests
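A minimal sketch of a semaphore-based bulkhead, assuming callers are rejected immediately when the limit is reached. The class name and the choice of `RuntimeError` are illustrative.

```python
import threading


class SemaphoreBulkhead:
    """Cap concurrent calls to a dependency without a dedicated thread pool."""

    def __init__(self, max_concurrent):
        self._sem = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, **kwargs):
        # Reject instead of waiting so callers are never queued behind a slow dependency.
        if not self._sem.acquire(blocking=False):
            raise RuntimeError("bulkhead full")
        try:
            return fn(*args, **kwargs)  # runs on the caller's own thread
        finally:
            self._sem.release()
```

Because the work runs on the calling thread, there is no hand-off to a separate pool, which is where the lighter footprint comes from.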
Semi-synchronous
offers a middle ground.
Lesson 1364Choosing a Replication Mode
Semi-synchronous replication
sits right in the middle—it requires *at least one* replica to acknowledge the write before confirming success to the client.
Lesson 205Semi-Synchronous Replication
SendGrid
, **Amazon SES**, or **Mailgun** that handle the heavy lifting:
Lesson 1686Email Notifications
Sending notifications
Each retry sends another email
Lesson 1006Natural Idempotency vs Engineered Idempotency
Sends the message
to the broker managing that partition
Lesson 702Producers and Message Publishing
Sensitive data
Never log passwords, credit card numbers, PII, or API keys.
Lesson 1129What to Log vs What Not to Log
SensorOperator
Waits for conditions (like file arrival)
Lesson 767Airflow Operators and Executors
Sent
The notification left your system and was handed off to the channel provider (APNs, FCM, Twilio, SendGrid, etc.
Lesson 1724Notification Analytics Events
Sentinel
(Java): Alibaba's comprehensive resilience framework
Lesson 1062Circuit Breaker Libraries and Frameworks
Separate deployment pipelines
Each service has its own build, test, and release process
Lesson 791Independent Deployability
Separate Routers, Shared Services
Lesson 1904Maintaining Multiple API Versions
Separate service pools
Direct high-volume tenants to dedicated rate limiter nodes
Lesson 1823Hot Tenant Problem
Separate storage tier
Move celebrity profiles to a specialized datastore optimized for read-heavy workloads.
Lesson 257Celebrity Problem in Social Graphs
Sequence
(12 bits): counter within the same millisecond
Lesson 1511Distributed ID Generation
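To show where a 12-bit sequence fits, here is a hedged sketch of a Snowflake-style ID generator. Only the 12-bit sequence comes from the entry above; the 10-bit machine ID, the millisecond timestamp in the high bits, and the `IdGenerator` name are assumptions made for illustration.

```python
import time

SEQUENCE_BITS = 12                        # from the entry above
MACHINE_BITS = 10                         # assumption for this sketch
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1   # 4095 IDs per millisecond per machine


class IdGenerator:
    def __init__(self, machine_id, clock=lambda: int(time.time() * 1000)):
        self.machine_id = machine_id
        self.clock = clock
        self.last_ms = -1
        self.sequence = 0

    def next_id(self):
        now = self.clock()
        if now == self.last_ms:
            # Same millisecond: bump the counter.
            self.sequence = (self.sequence + 1) & MAX_SEQUENCE
            if self.sequence == 0:
                # Counter exhausted: spin until the clock advances.
                while now <= self.last_ms:
                    now = self.clock()
        else:
            self.sequence = 0
        self.last_ms = now
        return ((now << (MACHINE_BITS + SEQUENCE_BITS))
                | (self.machine_id << SEQUENCE_BITS)
                | self.sequence)
```

The sketch does not guard against the clock moving backwards, which a production generator would need to handle.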
Sequence Numbers (Position-Based)
Lesson 212Measuring Replication Lag
sequential consistency
and **linearizability** are strong consistency models, but they differ in one critical way: **real-time guarantees**.
Lesson 524Sequential Consistency vs LinearizabilityLesson 541The Consistency Spectrum
Sequential processing per host
Each hostname queue is processed by one worker at a time
Lesson 1841Single-Host Queue Pattern
Sequential scans
When someone runs a large report that reads thousands of records once, those records flood the cache and evict frequently-used data
Lesson 151LRU-K and Advanced LRU Variants
Serializable
Strongest isolation—transactions execute as if they ran one after another, with no concurrency at all.
Lesson 312Isolation Levels and Concurrent Transactions
Serializes
the message (converts your data to bytes)
Lesson 702Producers and Message Publishing
Serve cached data
instead of fresh queries when read limits are hit.
Lesson 963Graceful Degradation with Rate Limits
Serve it
when platforms request preview data
Lesson 1538Link Preview and Metadata
Server 1
Handles prefixes `a-d`
Lesson 1764Distributed Trie Architecture
Server 2
Handles prefixes `e-m`
Lesson 1764Distributed Trie Architecture
Server 3
Handles prefixes `n-z`
Lesson 1764Distributed Trie Architecture
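The three prefix ranges above amount to a small routing table. A minimal sketch, assuming routing is decided by the first character of the typed prefix and using hypothetical server names:

```python
# (low, high, server) ranges taken from the Server 1-3 entries above.
SHARD_RANGES = [
    ("a", "d", "server-1"),
    ("e", "m", "server-2"),
    ("n", "z", "server-3"),
]


def route_prefix(prefix: str) -> str:
    """Return the trie server responsible for the given typeahead prefix."""
    if not prefix:
        raise ValueError("empty prefix")
    first = prefix[0].lower()
    for low, high, server in SHARD_RANGES:
        if low <= first <= high:
            return server
    raise ValueError(f"no shard covers prefix {prefix!r}")
```

Prefixes starting with digits or non-Latin characters fall outside these ranges and would need their own shard or a fallback.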
Server A
Weight = 3, Current connections = 6 → Score = 6/3 = **2**
Lesson 88Weighted Least Connections
Server B
Weight = 1, Current connections = 1 → Score = 1/1 = **1**
Lesson 88Weighted Least Connections
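The Server A / Server B scores above follow directly from dividing active connections by weight and picking the lowest result. A short sketch (the `pick_server` name and dict layout are illustrative):

```python
def score(weight: int, connections: int) -> float:
    """Weighted least connections score: lower means more spare capacity."""
    return connections / weight


def pick_server(servers: dict) -> str:
    """servers maps name -> (weight, current_connections)."""
    return min(servers, key=lambda name: score(*servers[name]))


servers = {"A": (3, 6), "B": (1, 1)}
# A scores 6/3 = 2, B scores 1/1 = 1, so B receives the next request.
chosen = pick_server(servers)
```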
Server counts
– 847 servers → 850 or even 1,000 (you'll add buffer capacity anyway)
Lesson 32Rounding and Approximation Techniques
Server degradation
→ Switch to Least Response Time to route around slow servers
Lesson 97Dynamic Algorithm Selection
Server errors
(500, 502, 503) → count toward threshold
Lesson 1048Failure Thresholds and Detection
Server looks up session
by ID to verify authentication and retrieve user context
Lesson 909Session-Based Authentication Fundamentals
Server nodes
(your actual machines or instances)
Lesson 1458Mapping Keys and Nodes to the Ring
Server pushes
Resolver sends update through WebSocket to all subscribed clients
Lesson 1915GraphQL Subscriptions for Real-Time Data
Server registers
Maps subscription to event stream (e.
Lesson 1915GraphQL Subscriptions for Real-Time Data
Server restarts
store in a database or distributed cache, not just memory
Lesson 1004Server-Side State for Idempotency
Server sends session ID
to the client as an HTTP cookie
Lesson 909Session-Based Authentication Fundamentals
Server state required
Every instance needs access to shared session storage
Lesson 916Session vs Token Tradeoffs
Server stickiness
Users must return to the same server that has their session
Lesson 356Redis as a Session Store
Server stores session
in memory, Redis, or a database with a unique session ID
Lesson 909Session-Based Authentication Fundamentals
Server validates
credentials against the database
Lesson 909Session-Based Authentication Fundamentals
Server verification
Server hashes the received verifier and compares it to the stored challenge
Lesson 923PKCE: Proof Key for Code Exchange
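The verification step can be sketched as follows, assuming the `S256` challenge method from RFC 7636 (SHA-256 of the verifier, base64url-encoded without padding). The entry above only says the server "hashes" the verifier, so the specific method is an assumption.

```python
import base64
import hashlib
import hmac
import secrets


def make_verifier() -> str:
    """Client side: generate a high-entropy code verifier."""
    return secrets.token_urlsafe(32)


def make_challenge(verifier: str) -> str:
    """S256 transform: BASE64URL(SHA256(verifier)) with padding stripped."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def server_verify(received_verifier: str, stored_challenge: str) -> bool:
    """Server side: hash the received verifier and compare to the stored challenge."""
    return hmac.compare_digest(make_challenge(received_verifier), stored_challenge)
```

`hmac.compare_digest` is used so the comparison takes constant time regardless of where the strings differ.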
server-side
(every keystroke triggers a backend query), or use a **hybrid approach**.
Lesson 1762Client-Side vs Server-Side TypeaheadLesson 1789Client-Side vs Server-Side Rate Limiting
Server-side rendering
processes syntax highlighting during paste creation, storing pre-rendered HTML with CSS classes.
Lesson 1575Syntax Highlighting and Language Detection
Server-side timeout
The maximum time a server allows itself to process a request before abandoning work
Lesson 1090Client-Side vs Server-Side Timeouts
Server-side timeouts
protect backend resources from runaway operations and ensure fair resource allocation among all callers.
Lesson 1123Client-Side vs Server-Side Timeout Enforcement
Serves slightly stale data
from a cache or replica
Lesson 315Basically Available: Prioritizing Uptime
Service → Gateway (gRPC)
→ Gateway translates → **Gateway → Client (HTTP)**
Lesson 874Protocol Translation
Service accounts
give each service a unique identity—think of them as machine users with their own credentials.
Lesson 953Service-to-Service Authentication
Service C
depends on B and experiences the same fate
Lesson 1077What is a Cascading Failure
Service discovery
A way to learn which backend servers are available (often from a registry like we'll cover later, but for now imagine a configuration file or API that lists server addresses)
Lesson 83Client-Side Load BalancingLesson 861Istio: Architecture and ComponentsLesson 1197Pull vs Push Metrics Collection Models
Service discovery data
Which instances of a service are currently available and healthy
Lesson 842Control Plane: Configuration Management
Service entry points
When a request arrives (HTTP endpoint, message consumer, RPC handler), create a new span or continue an existing trace using the incoming trace ID and span ID.
Lesson 1223Instrumentation Basics
Service exit points
When making outgoing calls (HTTP clients, message producers, database calls), create a child span and inject the trace context into the outbound request headers.
Lesson 1223Instrumentation Basics
Service Level Indicator (SLI)
is a carefully chosen quantifiable metric that measures a specific aspect of your service's quality from the user's perspective.
Lesson 1272What Are Service Level Indicators (SLIs)
service mesh
solves this by extracting all that communication logic into a separate infrastructure layer.
Lesson 827What is a Service Mesh?Lesson 1126Timeout Configuration in Service Mesh
Service Mesh Foundation
Envoy runs as a **sidecar proxy** alongside each microservice instance.
Lesson 115Envoy Proxy Architecture
Service mesh technologies
add another layer of infrastructure to handle service-to-service communication, security, and observability.
Lesson 811Infrastructure and Tooling Costs
Service meshes
automate this complexity.
Lesson 953Service-to-Service Authentication
Service name
Which component produced this log
Lesson 1161Context-Rich Logging
Service or component
(database alerts → database team)
Lesson 1292Alert Routing and Escalation
Service quality
Spot tenants experiencing high rejection rates who might need help optimizing their integration
Lesson 1825Monitoring and Analytics Per Tenant
Service Registry
The log replicated via Raft, containing all service and health data
Lesson 635Consul: Service Discovery with Raft Consensus
Service Registry Integration
The mesh's control plane connects to a service registry (like Consul, etcd, or Kubernetes' built-in registry) that maintains the current list of all healthy service instances
Lesson 832Service Discovery in a Mesh
Service tiers
Premium customers on high-performance nodes, free tier on standard infrastructure
Lesson 1452List-Based Partitioning
Service topology graphs
showing how services communicate
Lesson 846Control Plane: API and User Interface
Service unavailable
(complete outages)
Lesson 1286Symptoms vs Causes
Service versioning
to track which versions are compatible
Lesson 810Deployment Complexity
Service-to-Service (Internal)
Between different backend services within your system
Lesson 78Load Balancer Placement in Architecture
Services need message replay
New instances can catch up on historical events
Lesson 734NATS Streaming
Session
Read-your-writes within a client session
Lesson 554Consistency Model Examples in Real Systems
session affinity
) solve this by configuring your load balancer to remember which server initially handled a user's request.
Lesson 60Sticky Sessions and Load Balancer AffinityLesson 89IP Hash AlgorithmLesson 543Monotonic Reads Consistency
Session affinity (sticky sessions)
Route all requests from a user to the same replica, ensuring it has seen all their writes.
Lesson 1390Read-Your-Writes Consistency
Session consistency
(strong consistency within a user session, weaker globally)
Lesson 541The Consistency Spectrum
Session identifiers
Full session tokens (use truncated versions if needed)
Lesson 1163Avoid Logging Sensitive Data
Session management
User A's login on Server 1 must work on Server 2
Lesson 49Application Complexity Trade-offsLesson 343Time-to-Live and Expiration
Session replication delays
With session-based auth, copying session data across continents introduces lag.
Lesson 952Cross-Region Authentication
Session stickiness
Pin a user's session to one replica that has their writes.
Lesson 542Read-Your-Writes ConsistencyLesson 1360Monotonic Reads Across Replicas
Session storage
(fast lookups by session ID)
Lesson 338What is a Key-Value Store?
Session-aware routing
Tag requests with a session or user token.
Lesson 1678Read-After-Write Consistency
Sessions
group related messages together, ensuring they're processed in order by the same consumer.
Lesson 675Azure Service Bus FeaturesLesson 916Session vs Token Tradeoffs
Set and enforce SLOs
defining acceptable reliability targets and error budgets (concepts you've already learned)
Lesson 1307What is Site Reliability Engineering (SRE)?
Set baseline metrics
before making changes so you know if improvements worked
Lesson 40Measure Before Optimizing
Set expiration
Use `EXPIRE` to automatically clean up the key after the window passes
Lesson 1794Redis-Based Rate Limiting with INCR
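A hedged sketch of the INCR-then-EXPIRE step for a fixed-window limiter. To stay self-contained it uses `InMemoryCounterStore`, a hypothetical stand-in that mimics only the `incr` and `expire` commands; a real Redis client object exposing those two methods could be passed in instead.

```python
import time


class InMemoryCounterStore:
    """Hypothetical stand-in for Redis supporting only INCR and EXPIRE semantics."""

    def __init__(self, clock=time.time):
        self._data = {}  # key -> [count, expires_at or None]
        self._clock = clock

    def incr(self, key):
        entry = self._data.get(key)
        if entry is None or (entry[1] is not None and entry[1] <= self._clock()):
            entry = [0, None]  # missing or expired keys start from zero
            self._data[key] = entry
        entry[0] += 1
        return entry[0]

    def expire(self, key, seconds):
        if key in self._data:
            self._data[key][1] = self._clock() + seconds


def allow_request(store, user_id, limit, window_seconds, now=None):
    now = time.time() if now is None else now
    window = int(now // window_seconds)
    key = f"rate:{user_id}:{window}"
    count = store.incr(key)
    if count == 1:
        # First hit in this window: schedule automatic cleanup of the key.
        store.expire(key, window_seconds)
    return count <= limit
```

Issuing INCR and EXPIRE as two separate commands is not atomic; if the process dies between them the key never expires, which is why production setups often wrap both in a script or transaction.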
Set membership
`_in=val1,val2,val3`
Lesson 1892Filtering Query Parameters
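One way to read the `_in=val1,val2,val3` convention is as a suffix on the field name, for example `status_in=active,pending`. The parser below is a sketch under that assumption; `parse_in_filters` is a hypothetical helper.

```python
from urllib.parse import parse_qs


def parse_in_filters(query_string: str) -> dict:
    """Extract set-membership filters (<field>_in=a,b,c) from a query string."""
    filters = {}
    for key, values in parse_qs(query_string).items():
        if key.endswith("_in") and len(key) > len("_in"):
            field = key[: -len("_in")]
            # Use the last occurrence and drop empty items such as "a,,b".
            filters[field] = [v for v in values[-1].split(",") if v]
    return filters
```

The resulting lists map naturally onto an SQL `IN (...)` clause, provided the values are passed as bound parameters rather than concatenated into the query.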
Set timeout
= `P(chosen percentile) × multiplier` (e.
Lesson 1117Adaptive Timeouts Based on Historical Latency
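The formula above can be sketched with a nearest-rank percentile over recent latency samples. The default percentile, multiplier, and the floor and ceiling clamps are assumptions for illustration, since the original example values are cut off.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def adaptive_timeout(latencies_ms, pct=99, multiplier=1.5,
                     floor_ms=50, ceiling_ms=5000):
    """timeout = P(chosen percentile) x multiplier, clamped to sane bounds."""
    raw = percentile(latencies_ms, pct) * multiplier
    return min(max(raw, floor_ms), ceiling_ms)
```

The clamps keep a quiet period from producing an unrealistically tight timeout and a latency spike from producing an unbounded one.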
Setting a value
`SET user_status = "active"` — repeating this produces the same result
Lesson 1006Natural Idempotency vs Engineered Idempotency
Severity level
(critical → on-call engineer immediately, warning → team Slack channel)
Lesson 1292Alert Routing and Escalation
Severity tuning
Not everything warrants paging someone at 3 AM.
Lesson 1171Log Review and Alert Fatigue
SHA-256 hash
a unique cryptographic fingerprint of the file's contents.
Lesson 1591Deduplication Using Content Hashing
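A minimal sketch of content-hash deduplication: the SHA-256 digest of the bytes becomes the storage key, so identical uploads collapse into one stored blob. `ContentStore` is a hypothetical in-memory stand-in for real object storage.

```python
import hashlib


class ContentStore:
    def __init__(self):
        self.blobs = {}  # hex digest -> file bytes

    def put(self, data: bytes) -> str:
        """Store data once per unique content and return its fingerprint."""
        digest = hashlib.sha256(data).hexdigest()
        # setdefault leaves an existing blob untouched, so duplicates cost no extra space.
        self.blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self.blobs[digest]
```

For large files, the hash would normally be computed in chunks with `hashlib.sha256().update(...)` rather than loading the whole file into memory.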
Shadow traffic
Route reads to old shard, writes to both, gradually shift reads
Lesson 258Resharding and Data Migration
Shallow checks
are fast, lightweight, and won't overload your infrastructure.
Lesson 102Shallow vs Deep Health Checks
Shard 0
might have copies on Server A (primary), Server B (replica 1), and Server C (replica 2)
Lesson 1770Index Replication for Availability
Shard 2
(users G-M): Primary + 2 replicas
Lesson 70Partitioning and Replication Together
Shard 3
(users N-Z): Primary + 2 replicas
Lesson 70Partitioning and Replication Together
Shard Key Immutability
(lesson 251) matters so much.
Lesson 263Shard Key Immutability Problem
shard map
is essentially a directory or lookup table that tracks which shard key values live on which physical shard.
Lesson 236Shard Mapping and RoutingLesson 1541Sharding and Database Scaling
Shard routing
often uses the primary key to determine which shard stores a record
Lesson 299Primary Keys and Entity Integrity
Shared
Multiple consumers share messages (queue-like behavior)
Lesson 731Pulsar's Unique Features
Shared Database
All components typically read from and write to the same database schema.
Lesson 779What is a Monolithic Architecture?
Shared databases
Multiple services directly querying the same database tables violates data ownership boundaries.
Lesson 824Avoiding Distributed Monoliths
Shared identifiers
50 accounts all using the same device fingerprint or billing address
Lesson 474Fraud Detection Through Pattern Matching
Shared timelines
that depend on multiple teams' sprint cycles
Lesson 808Team Coordination Overhead
Shield caches
act as a middle layer between edge and origin
Lesson 1611Multi-Tier Caching Architecture
Shipping
Manages carriers, tracking, delivery
Lesson 815Domain-Driven Design and Bounded Contexts
Shopping cart
Availability matters (don't block purchases), but you want low latency too → PA/EL with conflict resolution
Lesson 520Practical PACELC Analysis for Design DecisionsLesson 553Choosing Consistency Levels
Shopping cart services
that never want to reject an "add to cart" operation
Lesson 494AP Systems: Prioritizing Availability
Shopping cart updates
might use eventual consistency with conflict resolution, accepting temporary divergence
Lesson 488CAP as a Spectrum, Not Binary
Short circuit
A dangerous direct path allowing excessive current flow
Lesson 1044The Electrical Analogy
Short intervals
(checking every 1-2 seconds) detect failures quickly but generate lots of traffic.
Lesson 100Health Check Intervals and Timeouts
Short timeouts
(500ms) catch stuck servers quickly but may flag slow-but-healthy servers as down.
Lesson 100Health Check Intervals and Timeouts
Short TTLs (1-5 minutes)
Minimize staleness risk but still capture significant performance gains for burst access patterns.
Lesson 942Caching Authorization Decisions
Short-lived access tokens
Limit blast radius if tokens leak
Lesson 931OAuth2 Security Best Practices
Short-lived tokens
Use 5-15 minute expiration times, making clock skew less impactful relative to token lifetime.
Lesson 949Clock Skew and Token Validation
Short-Lived Tokens with Refresh
Issue access tokens with very short lifespans (5-15 minutes).
Lesson 948Token Revocation at Scale
Short-term burst limits
Apply token bucket or sliding window for seconds/minutes
Lesson 994Quota Management and Burst Allowances
Shorter intervals
(50–100ms): Better accuracy, higher Redis load
Lesson 1802Synchronization Strategies for Local Caches
Side effects happen once
(database writes, external API calls, charges)
Lesson 1008What Makes an API Idempotent
Side-by-side examples
(old vs new patterns)
Lesson 1909Client SDK Versioning and Distribution
Sidecar Interception
When Service A makes a request to "Service B," its sidecar proxy intercepts it
Lesson 832Service Discovery in a Mesh
sidecar proxies
to manage network communication, there are two ways those proxies can intercept traffic between services:
Lesson 831Transparent vs Explicit ProxyingLesson 833Polyglot Microservices SupportLesson 850Service Discovery Integration
Signal cancellation
to the underlying operation (close connections, interrupt threads, send cancel RPCs)
Lesson 1094Timeout Cancellation and Cleanup
Signal preservation
You still get representative traffic patterns and catch all critical issues
Lesson 1164Sampling for High-Volume Logs
Signature
(cryptographic proof it's authentic)
Lesson 1627Access Control and Signed URLs
Signed URLs
solve this by embedding cryptographic proof that the request is authorized and valid only for a limited time.
Lesson 1615Signed URLs and Token-Based Access
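To make the "cryptographic proof valid for a limited time" idea concrete, here is a sketch using an HMAC-SHA256 signature over the path and an expiry timestamp. The query parameter names (`expires`, `sig`), the message layout, and the hard-coded `SECRET` are assumptions; real CDNs each define their own format.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode

SECRET = b"demo-secret"  # illustrative only; load from secure config in practice


def _signature(path: str, expires_at: int, secret: bytes) -> str:
    message = f"{path}:{expires_at}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(path: str, expires_at: int, secret: bytes = SECRET) -> str:
    """Append an expiry timestamp and a signature covering path + expiry."""
    query = urlencode({"expires": expires_at, "sig": _signature(path, expires_at, secret)})
    return f"{path}?{query}"


def verify_url(url: str, now: int, secret: bytes = SECRET) -> bool:
    """Accept only unexpired URLs whose signature matches."""
    path, _, query = url.partition("?")
    params = parse_qs(query)
    try:
        expires_at = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    return hmac.compare_digest(_signature(path, expires_at, secret), sig)
```

Because the expiry is part of the signed message, a client cannot extend the lifetime by editing the `expires` parameter.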
Silent failures
Users never know their data was discarded
Lesson 1381Limitations of Last-Write-Wins
Silent timeout bugs
Does your cleanup logic actually fire when timeouts trigger?
Lesson 1125Timeout Testing and Chaos Engineering
SIMD operations
Use vectorized CPU instructions to compare multiple characters simultaneously during trie traversal.
Lesson 1776Typeahead Index Optimization
Simhash
from lesson 1855: for near-duplicates (pages differing by ads/timestamps), use locality-sensitive hashing to cluster similar content and store only canonical versions.
Lesson 1870Content Storage and Deduplication
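A compact sketch of the Simhash idea: each token votes on every bit of the fingerprint, and near-duplicate pages end up with fingerprints that differ in only a few bits. Whitespace tokenization, equal token weights, and SHA-256 as the per-token hash are simplifying assumptions.

```python
import hashlib


def simhash(text: str, bits: int = 64) -> int:
    """Return a locality-sensitive fingerprint of the text."""
    weights = [0] * bits
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i, w in enumerate(weights) if w > 0)


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Pages whose fingerprints fall within a small Hamming distance of each other would be treated as near-duplicates; the right threshold has to be tuned on real data.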
Similar to URL Shortener
You've just studied URL shorteners (lessons 1494-1541).
Lesson 1542Pastebin System Overview
Simple architecture
Fewer moving parts mean less complexity
Lesson 734NATS Streaming
Simple Architectures
When your services communicate in straightforward patterns without complex routing, retries, or circuit breaking needs, standard HTTP libraries and basic load balancers suffice.
Lesson 835When You Don't Need a Service MeshLesson 1260Cost-Benefit Analysis
Simple consistency
One source of truth for counters
Lesson 1791Single Data Center vs Distributed Setup
Simple domains
with tightly coupled business logic gain nothing from service boundaries.
Lesson 814When Complexity Outweighs Benefits
Simple hash-based
is easiest to implement and understand
Lesson 253Evaluating Sharding Strategy Tradeoffs
Simple implementation
just random number generation + DB check
Lesson 1512Random String Generation
Simple logic
– No complex fan-out background jobs
Lesson 1637Pull (Read-Time) Feed Model
Simple operations, predictable behavior
→ Single-leader
Lesson 1376Topology Selection Tradeoffs
Simple queries
Lookups by short code are fast, cacheable, and index-friendly
Lesson 1522Read-Heavy Workload and Database Scaling
Simple reads
"Get user X's preferences" — one key, one document, millisecond response.
Lesson 1721Preference Storage Strategy
Simple reasoning
The linear structure makes it easier to reason about ordering and consistency
Lesson 1362Chain Replication
Simple Recovery
Immutable files are naturally crash-safe.
Lesson 427SSTables and Immutable Storage
Simple request-response
The operation is fast and unlikely to fail
Lesson 654When to Use Async vs Sync
Simple to reason about
One source of truth for writes prevents conflicts
Lesson 71Single-Leader Replication Model
Simpler backend services
that can focus on business logic, not identity verification
Lesson 883Authentication at the Gateway
Simpler broker design
The platform doesn't need complex flow control
Lesson 697Push vs Pull Consumption Models
Simpler Clients
Mobile apps and web frontends don't need complex orchestration logic
Lesson 887API Composition and Aggregation
Simpler deployment
One build, one deploy process—leveraging the deployment simplicity advantage
Lesson 820When a Monolith is the Right Choice
Simpler implementation
– Standard in-memory algorithms work perfectly
Lesson 982Sticky Sessions and Rate LimitingLesson 1830Breadth-First vs Depth-First Crawling
Simpler networking
No service discovery needed
Lesson 1197Pull vs Push Metrics Collection Models
Simpler origin infrastructure
Your origin can potentially communicate with the CDN over plain HTTP or use simpler TLS configurations, reducing complexity.
Lesson 187SSL/TLS Termination at the Edge
Simpler recovery
Clear ownership transfer path
Lesson 1461Removing Nodes Gracefully
Simpler recovery than incrementals
You need just two backups to restore—the last full backup plus the last differential
Lesson 1404Differential Backups
Simpler Refactoring
When each service has a narrow responsibility, you can rewrite, restructure, or upgrade it without touching other services—as long as the API contract remains stable.
Lesson 797Improved Code Maintainability
Simpler service code
Services just expose a `/metrics` endpoint
Lesson 1197Pull vs Push Metrics Collection Models
Simpler services
Participants don't need to know about the overall flow
Lesson 591Orchestration-Based Sagas
Simpler tuning
One number (max permits) instead of pool sizes, queue depths, and rejection policies
Lesson 1070Semaphore-Based Bulkheads: Limiting Concurrent Requests
Simplicity matters
Small teams prefer less operational overhead than Kafka
Lesson 734NATS Streaming
Simplified Backend
Backend servers can be simpler, lighter applications without SSL libraries and certificate handling code.
Lesson 118SSL/TLS Termination at Load Balancers
Simplified CI/CD Pipeline
Your continuous integration and deployment pipeline has one clear job: build the monolith, run tests, and deploy it.
Lesson 783Deployment Simplicity: Monolith Advantage
Simplified code
just change the version identifier
Lesson 165Versioned Cache Keys
Simplified operations
Automatic upgrades, built-in monitoring, and IAM-based security reduce operational burden compared to self-hosted meshes.
Lesson 864AWS App Mesh and Cloud-Native Meshes
Simplified retention policies
Delete 2023's data by dropping its partition
Lesson 1473Range Partitioning Benefits
Simplified service code
Backend services don't need SSL libraries or certificate configuration.
Lesson 891SSL/TLS Termination
Simulate realistic failure scenarios
(region outage, database corruption, full datacenter loss)
Lesson 1419Measuring and Testing RPO/RTO Compliance
Simulate realistic traffic patterns
, not just brute-force requests.
Lesson 997Testing and Monitoring Rate Limiters
Simulating network partitions
creates the split-brain scenarios you learned about.
Lesson 1347Common Chaos Experiments
Simultaneously
, a background process fetches fresh data from the source
Lesson 162Stale-While-Revalidate Pattern
Single Codebase
All functionality exists in one repository.
Lesson 779What is a Monolithic Architecture?
Single Deployment
When you make any change—even a minor bug fix in one feature—you must rebuild and redeploy the entire application.
Lesson 779What is a Monolithic Architecture?
Single endpoint
instead of multiple versioned REST paths
Lesson 1910GraphQL Fundamentals and Query Language
Single Entry Point Pattern
solves this by placing an API gateway as the sole entry point for all client requests.
Lesson 871The Single Entry Point Pattern
Single Responsibility Principle
you've learned and respects **Bounded Contexts** from Domain-Driven Design.
Lesson 817Identifying Service Boundaries by Data Ownership
Single Shared Cache
Use a distributed cache like Redis shared by all servers.
Lesson 167Cache Coherence in Distributed Systems
Single-digit milliseconds
→ Distributed cache like Redis
Lesson 130Choosing the Right Caching Layer
Single-field indexes
accelerate queries on one field: indexing `email` speeds up lookups by email address.
Lesson 385Indexing in Document Stores
Single-Host Queue Pattern
maintains a separate queue for each hostname in your URL frontier.
Lesson 1841Single-Host Queue Pattern
Single-leader architecture
(in primary-replica setups) where writes go to one source of truth
Lesson 308Strong Consistency by Default
Single-system image
The distributed database behaves as if it's one atomic system
Lesson 484Consistency in CAP Context
Singular for instantaneous states
Gauges showing current values use singular: `memory_usage_bytes`, `active_connection_count`.
Lesson 1182Metric Naming Conventions
Sink Connectors
Read data *from* Kafka topics and push it *into* external systems (databases, data warehouses, search indexes)
Lesson 721Kafka Connect Framework
Site completion
Better for focused crawls that want to exhaust one domain completely before moving on
Lesson 1830Breadth-First vs Depth-First Crawling
Sitemap endpoints
Many sites publish XML sitemaps (`example.
Lesson 1828Seed URLs and Starting Point
Sitemap locations
Optional hints for URL discovery
Lesson 1861Robots.txt Caching and Parsing
Size Limits
Enforce maximum file sizes to prevent storage abuse and resource exhaustion.
Lesson 1592Upload Validation and Virus ScanningLesson 1599Upload Validation and Virus Scanning
Size-aware eviction
if URL metadata varies significantly
Lesson 1525Cache Eviction Policy for URL Shortener
Size-based
Keep the last 100 GB of data
Lesson 695Stream Retention and Replay
Size-based retention
Keep the most recent messages up to a certain total size (e.
Lesson 711Message Retention and Log Segments
Size-Tiered Compaction (STCS)
Lesson 428Compaction Strategies
Sizes
`<resource>_size_bytes` → `cache_size_bytes`
Lesson 1182Metric Naming Conventions
skewed access patterns
a small set of items accessed far more often than others—LFU (Least Frequently Used) shines.
Lesson 153Choosing an Eviction PolicyLesson 256Hotspots and Uneven Data Distribution
Skewed data distribution
happens when certain shard key values are far more common than others.
Lesson 234Data Distribution and HotspotsLesson 256Hotspots and Uneven Data Distribution
Skip completed steps
Don't re-charge that card
Lesson 1016Idempotency for Multi-Step Operations
Skip entirely
– Ignore images, videos, executables
Lesson 1833Content Type Detection
Skip overloaded nodes
(those at or above the bound)
Lesson 1468Bounded Loads Extension
Skip pointers
Jump over irrelevant sections of posting lists
Lesson 1741Search Latency and Response Time
Skipped steps
rushing through complex sequences
Lesson 1441Runbooks and Automation
Skips
processing if it's a duplicate
Lesson 1035Idempotency in Event Processing
Skyrocketing costs
You throw expensive hardware at problems that better design would solve cheaply
Lesson 2Why System Design Matters
SLA guarantees
on data freshness (e.
Lesson 1397Bounded Staleness Consistency
SLAs (Service Level Agreements)
are *external* contracts with customers that include financial or legal consequences when breached.
Lesson 1283SLOs vs SLAs: The Critical Difference
Sliding window counter
hybrids use two counters, doubling fixed window memory but staying manageable.
Lesson 970Fixed vs Sliding Window TradeoffsLesson 975Algorithm Selection CriteriaLesson 1813Memory Footprint per User and Limits
Sliding window counters
add weighted calculations but remain simpler than full logs.
Lesson 970Fixed vs Sliding Window Tradeoffs
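The "two counters plus a weighted calculation" approach can be written in a few lines. This sketch assumes the common formulation in which the previous window's count is weighted by how much of it still overlaps the sliding window; the function names are illustrative.

```python
def sliding_window_estimate(prev_count, curr_count, elapsed_in_window, window_seconds):
    """Estimate requests in the trailing window from two fixed-window counters."""
    overlap = (window_seconds - elapsed_in_window) / window_seconds
    return prev_count * overlap + curr_count


def allow(prev_count, curr_count, elapsed_in_window, window_seconds, limit):
    """Admit the request only if the estimate is still under the limit."""
    estimate = sliding_window_estimate(prev_count, curr_count,
                                       elapsed_in_window, window_seconds)
    return estimate < limit
```

For example, 15 seconds into a 60-second window with 100 requests in the previous window and 30 in the current one, the estimate is 100 × 0.75 + 30 = 105.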
Sliding window logs
require maintaining sorted timestamps, pruning old entries, and careful data structure management.
Lesson 970Fixed vs Sliding Window TradeoffsLesson 1808Redis Data Structures for Rate Limiting
Sliding windows
(log or counter) eliminate this edge case entirely.
Lesson 970Fixed vs Sliding Window TradeoffsLesson 1053Sliding Window vs Fixed Window
Slight replication lag
between regions (typically seconds)
Lesson 202Why Replicate: Geographic Distribution
SLIs
measure reliability from the user's perspective
Lesson 1313Monitoring and Observability for SRE
SLO setting
– MTBF informs realistic availability targets (remember: availability and reliability are different but related)
Lesson 1323Mean Time Between Failures (MTBF)
SLOs (Service Level Objectives)
are your *internal* targets—the reliability goals your team commits to achieving.
Lesson 1283SLOs vs SLAs: The Critical Difference
Slow for popular users
– If you follow 5,000 people, querying their posts takes time
Lesson 1637Pull (Read-Time) Feed Model
Slow page loads
(high latency)
Lesson 1286Symptoms vs Causes
Slow Query Logs
are built into most databases (MySQL, PostgreSQL, etc.)
Lesson 287Monitoring Slow QueriesLesson 1777Query Performance Monitoring
Slow reads
– if you follow 1,000 people, you're querying 1,000 users' posts
Lesson 1647Fanout-on-Read (Pull Model)
Slow response times
The user waits while thousands (or millions) of feeds update
Lesson 1651Asynchronous Fanout Processing
Slower delivery
of what users actually need now
Lesson 36YAGNI: You Aren't Gonna Need It
Slower reads
Interpretation happens at query time
Lesson 759Schema-on-Write vs Schema-on-Read
Sluggish performance
Pages take 30 seconds to load instead of milliseconds
Lesson 2Why System Design Matters
Small Deployments
If you have 3-5 microservices, manually configuring communication is straightforward.
Lesson 835When You Don't Need a Service Mesh
Small images
(~few MB): Synchronous processing works well.
Lesson 1598Synchronous vs Asynchronous Processing
Small operational footprint
Easy to deploy and maintain
Lesson 734NATS Streaming
Small or Medium-Sized Datasets
Lesson 239When Not to Shard
Small scale (100 req/sec)
180ms × 100 = 18 seconds wasted/second
Lesson 276Why Query Optimization Matters at Scale
Small server
(2 cores, 4GB RAM): 50 virtual nodes
Lesson 1465Heterogeneous Node Weights
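Heterogeneous weights translate into different virtual-node counts on the hash ring. In the sketch below only the 50 virtual nodes for the small server come from the entry above; the count for any larger server, the `HashRing` class, and the `node#i` labeling scheme are assumptions.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, vnode_counts: dict):
        """vnode_counts maps node name -> number of virtual nodes (its weight)."""
        points = sorted(
            (_hash(f"{node}#{i}"), node)
            for node, count in vnode_counts.items()
            for i in range(count)
        )
        self.hashes = [h for h, _ in points]
        self.nodes = [n for _, n in points]

    def lookup(self, key: str) -> str:
        """Walk clockwise from the key's position to the next virtual node."""
        idx = bisect.bisect_right(self.hashes, _hash(key)) % len(self.hashes)
        return self.nodes[idx]
```

A server given twice as many virtual nodes owns roughly twice the key space on average, which is how capacity differences are expressed.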
Small team
If you have 3 engineers, maintaining 3+ BFFs drains resources better spent elsewhere.
Lesson 908When to Use BFF Pattern
Smaller client payload
Just a random session ID
Lesson 916Session vs Token Tradeoffs
Smaller community
when you need help at 3 AM
Lesson 37Prefer Boring Technology
Smart Routing
Circuit breaking and locality-aware load balancing reduce unnecessary hops
Lesson 841Data Plane: Performance and Latency Overhead
Smart TV BFF
Handles remote control navigation patterns and low-resolution constraints
Lesson 902Backend-for-Frontend (BFF) Pattern Overview
Smooth migration path
When a module becomes a bottleneck or needs independent scaling, you can extract it as a microservice because the boundaries already exist.
Lesson 825Starting with a Modular Monolith
Smooth scaling
Adding capacity doesn't create hot spots
Lesson 372Consistent Hashing in Dynamo
Smoother scaling
When you add a new server, its virtual nodes "steal" small chunks from many existing servers, not just neighbors
Lesson 363Virtual Nodes and Load Distribution
Smoothing
Single-interval rates can be noisy.
Lesson 1187Rate Calculations from Counters
SMTP
(Simple Mail Transfer Protocol) is the foundational protocol for sending email.
Lesson 1686Email Notifications
Snapshots
are **instantaneous, space-efficient copies** of data at a specific moment.
Lesson 1401Backup vs Replication vs Snapshots
Snapshotting
is Raft's log compaction mechanism.
Lesson 632Log Compaction: Snapshotting
Snowflake Schema
normalizes dimensions further—breaking a customer dimension into separate customer, city, and country tables.
Lesson 760Data Warehouse Architecture
Social media feed
→ Eventual consistency (stale likes won't break anything)
Lesson 553Choosing Consistency LevelsLesson 1336Graceful Degradation
Social media metrics
If a "like" counter is temporarily off, it's not catastrophic
Lesson 137Write-Behind: Risks and Use Cases
Social recommendations
"Friends who liked this post" combines friendship and interaction edges
Lesson 457Use Cases: Social Networks and RecommendationsLesson 464Traversal Queries: Friends of Friends
Soft bounces
(temporary): Mailbox full, server temporarily down
Lesson 1686Email Notifications
Soft purge
Marks content as stale but keeps it available as a backup while fetching fresh content
Lesson 185Purging and Cache Invalidation Strategies
Soft state
Data state may change over time, even without new input
Lesson 314BASE Properties OverviewLesson 316Soft State and Eventual Consistency
Solution
Insert intermediate cache tiers between edge and origin.
Lesson 182Cache Hierarchies and Tiered Caching
Solution 1 - JOIN
Fetch everything in one query using table joins:
Lesson 282Avoiding N+1 Query Problems
Solution approaches
Use headless browsers (like Puppeteer/Selenium) that execute JavaScript, or detect and call the underlying APIs directly.
Lesson 1834HTML Parsing Challenges
Somewhere in between
(user profiles, notifications): **Semi-synchronous** offers a middle ground.
Lesson 1364Choosing a Replication Mode
Son (daily backups)
Keep the most recent 7 daily backups.
Lesson 1431Backup Retention Policies
Sorted arrays
Insertion is O(n) — too slow when millions of URLs arrive.
Lesson 1847Heap-Based Priority Queue Implementation
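For contrast with the O(n) sorted-array insertion above, a binary heap gives O(log n) pushes and pops. A minimal sketch using Python's `heapq`; `UrlFrontier` is a hypothetical name, and lower numbers are assumed to mean higher priority.

```python
import heapq
import itertools


class UrlFrontier:
    def __init__(self):
        self._heap = []
        # Monotonic counter breaks ties so equal-priority URLs pop in insertion order.
        self._counter = itertools.count()

    def push(self, url: str, priority: int) -> None:
        """O(log n) insert; heapq is a min-heap, so the smallest priority pops first."""
        heapq.heappush(self._heap, (priority, next(self._counter), url))

    def pop(self) -> str:
        """O(log n) removal of the highest-priority URL."""
        _, _, url = heapq.heappop(self._heap)
        return url

    def __len__(self) -> int:
        return len(self._heap)
```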
Sorted by document ID
for faster intersection operations during multi-term queries
Lesson 1745Posting Lists and Document IDs
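Keeping posting lists sorted by document ID is what makes the classic two-pointer intersection possible. A short sketch, assuming each list is sorted ascending with no duplicates:

```python
def intersect(a, b):
    """Return document IDs present in both sorted posting lists in O(len(a) + len(b))."""
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # advance whichever list is behind
        else:
            j += 1
    return result
```

For a multi-term query, intersecting the shortest lists first usually keeps the intermediate results small.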
Source Connectors
Pull data *from* external systems (databases, files, APIs) and write it *into* Kafka topics
Lesson 721Kafka Connect Framework
span
is one runner's segment—when they received the baton, when they passed it, and how long they ran.
Lesson 855Observability: Distributed TracingLesson 1221Traces, Spans, and Parent-Child Relationships
Span attributes
(also called **tags** or **labels**) let you attach key-value metadata to spans, transforming them from skeletal timing information into rich, searchable debugging narratives.
Lesson 1225Span Attributes and TagsLesson 1229Service Dependency Graphs
span events
are timestamped snapshots of interesting moments *within* that span.
Lesson 1226Span Events and LogsLesson 1234Span Events and Logs
Span Sampling Decisions
(lesson 1238)—the mechanism that determines whether to record a trace.
Lesson 1252Sampling Strategies Overview
Sparse posting lists
(rare terms): Delta + variable-byte encoding
Lesson 1752Index Compression Techniques
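Delta plus variable-byte encoding can be sketched as follows: store the gaps between consecutive document IDs, then write each gap in 7-bit groups. The byte layout chosen here (low-order group first, high bit set on the final byte of each number) is one common convention and is an assumption.

```python
def encode(doc_ids):
    """Delta-encode sorted doc IDs, then variable-byte encode each gap."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev
        prev = doc_id
        while gap >= 128:
            out.append(gap & 0x7F)  # continuation byte: high bit clear
            gap >>= 7
        out.append(gap | 0x80)      # final byte of this number: high bit set
    return bytes(out)


def decode(data):
    """Invert encode(): rebuild gaps, then prefix-sum back to doc IDs."""
    doc_ids, prev, gap, shift = [], 0, 0, 0
    for byte in data:
        if byte & 0x80:
            gap |= (byte & 0x7F) << shift
            prev += gap
            doc_ids.append(prev)
            gap, shift = 0, 0
        else:
            gap |= byte << shift
            shift += 7
    return doc_ids
```

Small gaps fit in one byte each, which is why gap-encoding the IDs first pays off.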
Special character handling
decide how to treat punctuation, numbers
Lesson 1733Document Processing Pipeline
Specialized regional requirements
→ Hybrid topology
Lesson 1376Topology Selection Tradeoffs
Specific
"Add timeout to database query X" not "improve monitoring"
Lesson 1352Postmortem Structure and Action Items
spectrum
, adjusting their behavior based on the operation type, business priority, and acceptable risk.
Lesson 488CAP as a Spectrum, Not BinaryLesson 507Consistency is a Spectrum in Practice
Speed counts
NATS's low latency fits real-time service-to-service communication
Lesson 734NATS Streaming
speed layer
handles only the most recent data—streaming it incrementally to provide near-instant views.
Lesson 749Lambda Architecture: Speed LayerLesson 750Lambda Architecture: Serving Layer
Speed mismatch
Producers can work faster or slower than consumers without blocking
Lesson 646The Producer-Consumer Model
Speed requirements
sub-second executive dashboards
Lesson 762Query Performance Tradeoffs
Speed vs. Consistency
Your writes are fast because you're not waiting on replicas, but there's a window where replicas are "behind" the primary.
Lesson 204Asynchronous Replication Explained
Spike detected
→ Switch to Least Connections to prevent overload
Lesson 97Dynamic Algorithm Selection
SPL (Search Processing Language)
to query logs, create dashboards, and build alerts.
Lesson 1154Alternative: Splunk Architecture
Split logically
Store celebrity data separately from regular users entirely
Lesson 1483Celebrity User Problem
split-brain
two nodes both thinking they're the leader, causing data corruption.
Lesson 603Consensus Use CasesLesson 636Consensus for Leader Election
Split-brain scenarios
need resolution mechanisms
Lesson 1338Stateless vs Stateful Redundancy
Splunk
is an enterprise-grade, proprietary platform built specifically for machine-generated data at massive scale.
Lesson 1154Alternative: Splunk Architecture
SQL Database
holds paste metadata: `paste_id`, `created_at`, `expiration_time`, `user_id`, `language`, `views`, and crucially, a **reference pointer** to where the actual content lives
Lesson 1556Hybrid Storage: Metadata + Content References
SQL databases
come with decades of battle-tested tooling, extensive documentation, and a vast pool of experienced database administrators (DBAs).
Lesson 326Operational Complexity ConsiderationsLesson 327Polyglot Persistence Pattern
SQL for management
(admin dashboards, user settings pages with complex filters) and **cache preferences in Redis/Memcached** for the hot path.
Lesson 1721Preference Storage Strategy
SSL/TLS pass-through
(load balancer forwards encrypted traffic without decrypting) or **re-encryption** (terminate at load balancer, then re-encrypt to backends).
Lesson 118SSL/TLS Termination at Load Balancers
SSTable
(Sorted String Table) is an immutable, on-disk file that stores sorted key-value pairs.
Lesson 427SSTables and Immutable StorageLesson 446SSTable and GFS Dependencies
SSTables
(Sorted String Tables)—immutable, sorted files containing key-value pairs.
Lesson 439Google BigTable ArchitectureLesson 446SSTable and GFS Dependencies
Stability
New inserts don't shift your pages; the keyset anchors your position in the dataset
Lesson 1890Keyset Pagination
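The stability property can be illustrated with a small sketch. The `items` table, its `id` key, and the `fetch_page` helper are hypothetical; the point is that the cursor is the last key seen, not a row offset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(i, f"item-{i}") for i in (10, 20, 30, 40)],
)

def fetch_page(conn, after_id, page_size):
    """Return up to page_size rows whose id is strictly greater than after_id."""
    return conn.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()

page1 = fetch_page(conn, after_id=0, page_size=2)

# A row inserted ahead of the cursor would shift an OFFSET-based page,
# but the keyset cursor (the last id seen) is unaffected.
conn.execute("INSERT INTO items VALUES (5, 'item-5')")
page2 = fetch_page(conn, after_id=page1[-1][0], page_size=2)
```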
Stable, well-understood systems
If your service's behavior is predictable and issues are rare, aggressive sampling or trace-on-demand strategies work better than always-on tracing.
Lesson 1260Cost-Benefit Analysis
Stack trace
– The call chain showing exactly which line of code threw the error and how execution reached it.
Lesson 1142Logging Exceptions and Stack Traces
Stackdriver Trace API
(legacy, direct integration)
Lesson 1244Google Cloud Trace
Stage 1
(`$match`): Keep only `status: "completed"` orders
Lesson 399Aggregation Pipeline
Stage 2
(`$group`): Group by `customer`, sum the `amount` fields
Lesson 399Aggregation Pipeline
stages
, each represented by an operator starting with `$`.
Lesson 399Aggregation PipelineLesson 768Apache Spark Overview
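A two-stage pipeline of this kind can be written as a list of stage documents. The sketch below shows such a pipeline as plain data and, alongside it, a pure-Python equivalent of what the stages compute. The sample orders and the `total` output field are assumptions, and nothing here talks to a real database.

```python
from collections import defaultdict

# The pipeline as it might be passed to an aggregate call
pipeline = [
    {"$match": {"status": "completed"}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
]

# Hypothetical sample documents
orders = [
    {"customer": "ana", "status": "completed", "amount": 30},
    {"customer": "ana", "status": "pending", "amount": 99},
    {"customer": "ben", "status": "completed", "amount": 15},
    {"customer": "ana", "status": "completed", "amount": 20},
]

def run_equivalent(orders):
    """Plain-Python version of the two stages above."""
    # Stage 1 ($match): keep only completed orders
    matched = [o for o in orders if o["status"] == "completed"]
    # Stage 2 ($group): group by customer and sum the amount field
    totals = defaultdict(int)
    for order in matched:
        totals[order["customer"]] += order["amount"]
    return dict(totals)

totals = run_equivalent(orders)
```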
Stagger warming
Avoid overwhelming your database by spreading loads over time
Lesson 161Cache Warming Strategies
Staging environments
identical to production, where you verify changes under realistic conditions
Lesson 1314Release Engineering and Safe Deployment
Stakeholder channel
Sanitized updates for leadership, customer support, sales
Lesson 1301War Rooms and Communication Channels
Stale data is unacceptable
medical records, legal documents
Lesson 518PC/EC Systems: Consistency Always
Stale reads
from any follower: instant, but might be outdated
Lesson 640Performance Characteristics of Consensus
Stale-While-Revalidate
pattern solves a common caching dilemma: what happens when cached data expires?
Lesson 162Stale-While-Revalidate Pattern
Standard queues
prioritize throughput and availability.
Lesson 669Amazon SQS Architecture
Standardized APIs and SDKs
for creating spans, propagating context, and recording telemetry
Lesson 1240OpenTelemetry Overview
Standardized naming
Consistent metric semantics across tools
Lesson 1205OpenTelemetry Metrics SDK
Star Schema
is the simplest: one central fact table surrounded by dimension tables, like planets around a sun.
Lesson 760Data Warehouse Architecture
Start at that position
Locate where `87` sits on the 0–359 ring
Lesson 1459Clockwise Key Assignment Rule
Start vertical
when your system is new and traffic is manageable
Lesson 52Hybrid Scaling Strategies
Start with application requirements
Lesson 553Choosing Consistency Levels
Start with business requirements
What's the maximum acceptable latency for your users?
Lesson 1091Default Timeout Pitfalls
Start with consistency requirements
Lesson 1364Choosing a Replication Mode
Startup Speed
Proxies initialize in milliseconds, not seconds, enabling faster pod scaling and deployments.
Lesson 862Linkerd: Lightweight Service Mesh
Startup warming
loads critical data when your application or cache service boots up—before accepting traffic.
Lesson 140Cache Warming Strategies
Starve other operations
, making the entire system unresponsive
Lesson 1654Fanout Rate Limiting
Stat
Big numbers with sparklines (total errors in last hour)
Lesson 1200Grafana for Metrics Visualization
State changes
track transitions between healthy and unhealthy.
Lesson 107Monitoring Health Check MetricsLesson 1234Span Events and Logs
State checkpointing
resume from last known-good state
Lesson 777Workflow Orchestration Patterns
State machine replication
What's the next operation all replicas should execute?
Lesson 599What Is Distributed Consensus?
State management
Maintaining computation state across events
Lesson 744Stream Processing FrameworksLesson 756Hybrid and Modern Alternatives
State query
"Has step N completed for workflow X?"
Lesson 1037Idempotency in Distributed Workflows
State Synchronization
The backup must have up-to-date state—often achieved through replication or shared storage—so users don't lose data mid-failover.
Lesson 1335Failover Mechanisms
State transition history
how often states change
Lesson 1055Circuit Breaker Observability
State transitions
Order status changes, workflow progressions, job completions.
Lesson 1129What to Log vs What Not to Log
State-based CRDTs (CvRDTs)
Replicas send their entire state to each other and merge using a commutative, associative, idempotent merge function.
Lesson 538Conflict-Free Replicated Data Types (CRDTs)Lesson 1384Conflict-Free Replicated Data Types (CRDTs)
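One of the simplest state-based CRDTs is a grow-only counter, sketched below under the assumption that each replica's state is a dict mapping a node name to that node's local count. The `merge` and `value` helper names are chosen for this example.

```python
def merge(state_a, state_b):
    """Element-wise maximum: commutative, associative, and idempotent."""
    return {
        node: max(state_a.get(node, 0), state_b.get(node, 0))
        for node in state_a.keys() | state_b.keys()
    }

def value(state):
    """The counter's value is the sum of every node's contribution."""
    return sum(state.values())

replica_a = {"n1": 3, "n2": 1}
replica_b = {"n1": 2, "n2": 4, "n3": 1}
merged = merge(replica_a, replica_b)
```

Because the merge takes a per-node maximum, replicas can exchange full states in any order, any number of times, and still converge.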
stateful
each server must either store sessions locally (breaking horizontal scaling) or share session storage (adding complexity and latency).
Lesson 61Stateless Authentication with TokensLesson 1338Stateless vs Stateful Redundancy
Stateful algorithms
(Least Connections, Least Response Time) require tracking server metrics, adding complexity and requiring coordination between load balancer instances.
Lesson 96Algorithm Selection Tradeoffs
Stateful interactions
The bug might only appear with specific data flowing through the system
Lesson 807Debugging and Troubleshooting
Stateless algorithms
(Round Robin, Random, IP Hash) make decisions using only current request data—easier to scale and make highly available.
Lesson 96Algorithm Selection Tradeoffs
Stateless server
no queues to manage, no acknowledgments to track
Lesson 673NATS and Lightweight Messaging
Stateless servers
Any application server can handle any request since session data lives externally.
Lesson 356Redis as a Session Store
Stateless Sessions (Token-Based)
Lesson 947Distributed Session Management
Stateless verification
No database lookup needed, just cryptographic check
Lesson 916Session vs Token Tradeoffs
Statement level
Individual queries can override defaults
Lesson 285Query Timeout and Statement Limits
Static analysis
Tools that flag potential credential logging
Lesson 1163Avoid Logging Sensitive Data
Static assets
are files that remain the same for all users and don't change on every request.
Lesson 173Content Types Suited for CDNs
static content
images, videos, CSS, JavaScript, fonts—anything that doesn't change per user.
Lesson 125CDN as Edge Caching LayerLesson 130Choosing the Right Caching Layer
Static sizing pain
Pre-allocating 50 threads per bulkhead when average usage is 5 wastes memory
Lesson 1076Bulkhead Tradeoffs: Complexity and Resource Overhead
StatsD with Graphite
handles moderate loads well, while **Prometheus** excels at hundreds of thousands of time series but may need federation beyond single-datacenter deployments.
Lesson 1208Choosing a Metrics System for Your Scale
Status = completed
Return the cached result from the first request
Lesson 1013Handling In-Progress Requests
Status = in-progress
Return `HTTP 409 Conflict` or `HTTP 202 Accepted` with a message like "Request is being processed"
Lesson 1013Handling In-Progress Requests
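The two status branches can be sketched as follows, assuming an in-memory dict as the idempotency store and a hypothetical `handle_request` helper. A real service would need an atomic check-and-insert, which this single-threaded sketch does not provide.

```python
def handle_request(store, idempotency_key, operation):
    """Return (http_status, body) for a request carrying an idempotency key."""
    record = store.get(idempotency_key)
    if record is None:
        # First time this key is seen: mark it in progress, then do the work.
        store[idempotency_key] = {"status": "in-progress"}
        result = operation()
        store[idempotency_key] = {"status": "completed", "result": result}
        return 200, result
    if record["status"] == "completed":
        # Duplicate of a finished request: replay the cached result.
        return 200, record["result"]
    # Duplicate that arrived while the original is still running.
    return 409, "Request is being processed"

calls = []
def charge_card():
    calls.append("charged")
    return "receipt-1"

store = {}
first = handle_request(store, "key-abc", charge_card)
retry = handle_request(store, "key-abc", charge_card)

store["key-xyz"] = {"status": "in-progress"}
concurrent = handle_request(store, "key-xyz", charge_card)
```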
Status updates
When a user goes idle (no heartbeat), their presence key expires automatically
Lesson 1676Presence Detection and User Status
STCS
Default, good for mixed or write-heavy workloads
Lesson 428Compaction Strategies
Steady, predictable traffic
**Leaky Bucket** or **Fixed Window**
Lesson 975Algorithm Selection Criteria
Stemming/Lemmatization
Reducing words to root forms using language rules.
Lesson 1778Multi-Language Search Support
Sticky
Minimizes partition movement during rebalancing
Lesson 716Consumer Groups and Partition Assignment
Sticky routing
Route users consistently to one region when possible, falling back to others only during failures.
Lesson 987Multi-Region Rate Limiting Challenges
Sticky Sessions (Session Affinity)
Lesson 947Distributed Session Management
Stock tickers
showing prices within seconds (not milliseconds) of market changes
Lesson 549Bounded Staleness
Stop at first node
Assign the key to the first node you encounter
Lesson 1459Clockwise Key Assignment Rule
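The clockwise rule can be sketched with a binary search over sorted node positions. The 0–359 ring follows the lesson's example; the node positions, the `assign_node` name, and the choice that a node sitting exactly on the key's position owns that key are assumptions.

```python
import bisect

def assign_node(key_position, node_positions):
    """Walk clockwise from the key and return the first node encountered."""
    ring = sorted(node_positions)
    index = bisect.bisect_left(ring, key_position)
    # Past the last node, wrap around to the first node on the ring.
    return ring[index % len(ring)]

nodes = [30, 120, 250]
owner_of_87 = assign_node(87, nodes)
owner_of_300 = assign_node(300, nodes)
```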
Stop ongoing work immediately
to free resources
Lesson 1115Deadline Exceeded Error Handling
Stop word removal
Optionally drop common words like "the" or "a"
Lesson 1738Query Processing Flow
Stop Words
Common words to filter differ per language ("the," "a" in English; "le," "la" in French).
Lesson 1778Multi-Language Search Support
Stop-the-world migration
Shut down, move data, restart (rarely acceptable)
Lesson 258Resharding and Data Migration
Stop-word removal
drop common words like "the", "is", "a"
Lesson 1733Document Processing Pipeline
Storage and Indexing
A centralized database optimized for log search and retention (e.
Lesson 1148Centralized Logging Architecture
Storage bloat
with massive disk usage
Lesson 1211Avoiding High-Cardinality Labels
Storage budget
Cold storage costs ~1% of hot storage per GB.
Lesson 1165Log Retention Policies
Storage corruption
over time without errors being reported
Lesson 1430Backup Verification and Testing
Storage Cost
Replication multiplies storage requirements linearly with replicas
Lesson 947Distributed Session Management
Storage Cost Tiers
Offer permanent storage as a premium feature.
Lesson 1573Handling Never-Expiring Pastes
Storage explosion
Disk usage grows multiplicatively as label combinations increase
Lesson 1207Metrics Cardinality and Performance Impact
Storage growth rate
Monitor total object storage size and metadata database size over time.
Lesson 1574Monitoring Expiration and Storage Health
Storage Patterns
monitor upload frequency, file size distributions, format preferences, and deduplication savings.
Lesson 1628Usage Analytics and Metrics
Storage requirements
You're not duplicating unchanged files repeatedly
Lesson 1403Incremental Backups
Storage savings
Automatically reclaim space from temporary content
Lesson 1565Expiration Requirements and TTL Basics
Storage scalability
Distribute terabytes or petabytes across hundreds of nodes
Lesson 1446What is Data Partitioning?
Storage upgrade
HDD → SSD for faster database reads
Lesson 43What is Vertical Scaling?
Storage utilization
Set thresholds on storage tier capacity (hot vs cold from lesson 1557).
Lesson 1574Monitoring Expiration and Storage Health
Storage vs Retention
Keeping logs forever for compliance or deep analysis requires storage that keeps growing without bound.
Lesson 1159Log Aggregation Performance Considerations
Storage-intensive
– redundantly stores unchanged data every time
Lesson 1402Full Backups
Store
Save the raw document in a distributed storage system
Lesson 1732Crawling and Document Collection
Store in object storage
(S3, Azure Blob) and keep only metadata in the database
Lesson 1553Object Storage vs Database for Paste Content
Store in the database
alongside metadata (title, expiration, owner)
Lesson 1553Object Storage vs Database for Paste Content
Store metadata
– Record that the URL exists but don't parse it
Lesson 1833Content Type Detection
Store receipt records
in your database, linked to the original notification ID
Lesson 1693Delivery Receipt Tracking
Store the hash
(32 bytes for SHA-256) in a seen-content set
Lesson 1852Content Fingerprinting with Hashing
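A minimal sketch of a seen-content set, assuming an in-memory Python `set` and a hypothetical `is_new_content` helper. A crawler at scale would likely use a shared or persistent store instead.

```python
import hashlib

seen_hashes = set()

def is_new_content(body: bytes) -> bool:
    """Record the SHA-256 digest of body; return False if it was seen before."""
    digest = hashlib.sha256(body).digest()  # 32 raw bytes
    if digest in seen_hashes:
        return False
    seen_hashes.add(digest)
    return True

first_visit = is_new_content(b"<html>hello</html>")
duplicate = is_new_content(b"<html>hello</html>")
different = is_new_content(b"<html>goodbye</html>")
```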
Store the key
(receipt number) with the operation outcome
Lesson 1011Idempotency Key Storage and Lookup
Store this metadata
alongside the short URL mapping
Lesson 1538Link Preview and Metadata
Stores only recent results
old data is discarded once the batch layer catches up
Lesson 749Lambda Architecture: Speed Layer
Stores session data
in the server's memory (user preferences, authentication tokens, shopping cart contents)
Lesson 56What Makes a Service Stateful
Storing the key locally
for potential retries
Lesson 1007Idempotency and Client Responsibilities
Strangler Fig Pattern
is named after a tropical plant that grows around a host tree, eventually replacing it.
Lesson 822The Strangler Fig Pattern for Migration
Stream
demands immediacy (seconds or less).
Lesson 746Choosing Batch vs Stream
Stream processing
is like instant messaging—every message arrives immediately, but requires constant connectivity and resources.
Lesson 746Choosing Batch vs StreamLesson 1726Aggregation and Reporting
Stream Processing Layer
All data flows through a stream processing framework that handles both real-time and historical data
Lesson 752Kappa Architecture Overview
Stream processing minimizes latency
by handling records immediately as they arrive.
Lesson 740Latency vs Throughput Tradeoffs
Streaming applications
(video calls, live audio, gaming) need the server to maintain buffers, connection quality metrics, and synchronization state for each active session.
Lesson 62When Stateful Services Are Necessary
Streamlined accepts
The leader sends accept requests for new log entries without repeating prepare, as long as it remains unchallenged
Lesson 616Multi-Paxos for Log Replication
Stress-induced mistakes
typos in critical commands
Lesson 1441Runbooks and Automation
Strict consistency needed
(financial transactions, inventory): Lean toward **synchronous** or **quorum-based replication**.
Lesson 1364Choosing a Replication Mode
Strict consistency rules
Database constraints that must never be violated
Lesson 322Transaction Requirements and Trade-offs
Strict serializability
is the marriage of both: transactions execute in a serial order *that respects real-time*, meaning if transaction T1 commits before T2 begins in wall-clock time, T1 must appear before T2 in the serial order.
Lesson 525Strict Serializability
String formatting
for timestamps, numbers, and escaped characters
Lesson 1143Performance Impact of Structured Logging
Strong ACID guarantees
that NoSQL doesn't provide
Lesson 336NewSQL Tradeoffs
Strong consistency is non-negotiable
→ Single-leader or chain replication
Lesson 1376Topology Selection Tradeoffs
Strong consistency needed
→ Database internal caching (buffer pool)
Lesson 130Choosing the Right Caching Layer
Strong consistency needed immediately
You must know the outcome before proceeding
Lesson 654When to Use Async vs Sync
Strong guarantees
(bank balances, inventory counts, distributed locks) → **CP**
Lesson 503Choosing Between CP and AP
Strong read
"Give me data only after checking all replicas"
Lesson 1398Consistency Level Per-Operation
Strong referential integrity
enforcement
Lesson 405When Joins Are Required
Strong Transactional Guarantees
Lesson 320When SQL Is the Right Choice
Strongly consistent reads
Guarantee read-your-writes, at the cost of higher latency and lower availability during partitions
Lesson 554Consistency Model Examples in Real Systems
structure
of your entire graph to answer questions like "Who's most influential?"
Lesson 468Graph Algorithms: PageRank and CentralityLesson 1034Database Patterns for Idempotency
Structured Format
ensures consistency.
Lesson 954Distributed Auth Audit Logging
Structured Streaming
(the newer API) can use either micro-batching or **continuous processing** for true low-latency streaming.
Lesson 769Spark Streaming and Structured Streaming
Sub-millisecond needs
→ In-memory application cache (fastest)
Lesson 130Choosing the Right Caching Layer
Subscriber 1
Email Service sends confirmation
Lesson 662Fan-Out with Pub-Sub
Subscriber 2
Inventory Service updates stock
Lesson 662Fan-Out with Pub-Sub
Subscriber 3
Analytics Service records metrics
Lesson 662Fan-Out with Pub-Sub
Subscriber 4
Loyalty Service awards points
Lesson 662Fan-Out with Pub-Sub
Subscribes
to relevant events from other services
Lesson 590Choreography-Based Sagas
Substitute variables
Pass `{name: "Carlos", orderId: "54321"}`
Lesson 1701Template Service for Content
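Variable substitution of this kind can be sketched with Python's built-in string formatting. The template text and the `render` helper are hypothetical; only the variable names and values come from the entry above.

```python
def render(template, variables):
    """Fill {placeholder} slots in the template from a dict of variables."""
    return template.format(**variables)

# Hypothetical template text, for illustration only
template = "Hi {name}, your order {orderId} has shipped."
message = render(template, {"name": "Carlos", "orderId": "54321"})
```

A missing variable raises `KeyError` here, which may be preferable to silently sending a notification with an unfilled placeholder.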
Subtract
to get remaining budget
Lesson 1110Calculating Remaining Time
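The subtraction can be sketched as below, assuming the deadline arrives as an absolute Unix timestamp in seconds and that clocks across services are reasonably synchronized. The function name is chosen for this example.

```python
def remaining_budget(deadline_epoch_s: float, now_epoch_s: float) -> float:
    """Seconds left before the deadline, clamped at zero once it has passed."""
    return max(0.0, deadline_epoch_s - now_epoch_s)

# Deadline is 2.5 seconds in the future relative to "now"
still_time = remaining_budget(1_700_000_002.5, 1_700_000_000.0)
# Deadline already passed
expired = remaining_budget(1_700_000_000.0, 1_700_000_003.0)
```

In a live service the second argument would typically come from `time.time()`.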
Success count
during recovery attempts
Lesson 1056Circuit Breaker State Machine
Success rates
show what percentage of health checks are passing.
Lesson 107Monitoring Health Check Metrics
Suffix notation
(popular, readable):
Lesson 1892Filtering Query Parameters
Support contract-first design
where you define the API before implementation
Lesson 1885API Documentation with OpenAPI/Swagger
Support multiple client types
(mobile, web, partners) with different needs
Lesson 882Request and Response Transformation
Supported media types
(plain text vs HTML vs rich media)
Lesson 1692Channel-Specific Formatting
Supports ranking
Store metadata (frequency, popularity) at terminal nodes to rank suggestions
Lesson 1758Trie Data Structure for Prefix Matching
Survivability
If a node fails, the system automatically promotes replicas and continues operating without data loss—true high availability without manual intervention.
Lesson 334CockroachDB and Distributed SQL
Sustained load
Ensure long-running traffic doesn't cause memory leaks or state drift
Lesson 997Testing and Monitoring Rate Limiters
sweet spot
between what users need, what you can afford, and what your system can reliably deliver.
Lesson 1276Setting Realistic SLOsLesson 1652Fanout Worker Parallelization
Switch to simpler alternatives
(serve cached data instead of fresh queries)
Lesson 1083Graceful Degradation Strategies
Switchover Logic
Once failure is detected, the mechanism updates routing (DNS, load balancer configuration, virtual IP reassignment) to point traffic to the backup.
Lesson 1335Failover Mechanisms
Symptoms vs Causes
alerting fires on what users feel, not what machines think
Lesson 1313Monitoring and Observability for SRE
Sync
Validate file format immediately, return success/error
Lesson 654When to Use Async vs Sync
Synchronization delay
Propagating counter updates takes milliseconds to seconds
Lesson 1791Single Data Center vs Distributed Setup
Synchronized deployments
If you must deploy services A, B, and C simultaneously or risk breaking things, they're a distributed monolith.
Lesson 824Avoiding Distributed Monoliths
Synchronous I/O operations
that block your application threads while waiting for logs to be written.
Lesson 1170Performance Impact of Logging
Synchronous logging
blocks your application thread until the log message is written to disk or sent over the network.
Lesson 1134Synchronous vs Asynchronous LoggingLesson 1143Performance Impact of Structured Logging
Synchronous processing
means the upload request waits until all operations complete—validation, virus scanning, thumbnail generation, and storage.
Lesson 1598Synchronous vs Asynchronous Processing
synchronous replication
waits for *all* replicas to confirm writes (slow but safe), while **asynchronous replication** doesn't wait for any confirmations (fast but risky)?
Lesson 205Semi-Synchronous ReplicationLesson 217Semi-Synchronous Replication Trade-offsLesson 509Latency: The Hidden Cost of CAPLesson 1354Synchronous vs Asynchronous ReplicationLesson 1414RPO Zero: Synchronous Replication
System identifies all followers
(from the follow graph)
Lesson 1646Fanout-on-Write (Push Model)
System Load Patterns
dictate timing.
Lesson 1424Backup Scheduling and Frequency
System resources
CPU utilization, memory pressure, connection pool saturation
Lesson 993Adaptive Rate Limiting
System understanding
Logs reveal actual runtime behavior versus what you *think* the code does.
Lesson 1127What is Logging and Why It Matters

T

Table
Structured data display for multiple series
Lesson 1200Grafana for Metrics Visualization
Table bloat
Large TEXT/BLOB columns make tables enormous, slowing down backups and maintenance
Lesson 1550Object Storage for Paste Content
Tablet servers
are the workers that manage tablets.
Lesson 445Tablet Architecture and Distribution
Tactical
interrupt-driven, not strategic
Lesson 1311Toil: The Enemy of Scale
tags
(indexed metadata like `host=server1`) and **fields** (measured values like `cpu=45.
Lesson 1203InfluxDB and Time-Series DatabasesLesson 1225Span Attributes and TagsLesson 1233Span Tags and Attributes
tail
, with each replica forwarding the write to the next.
Lesson 1362Chain ReplicationLesson 1373Chain Replication
Tail acknowledges
→ Only when tail applies the update does it respond to the client
Lesson 1373Chain Replication
tail latency
those rare but painful slow requests that hurt user experience.
Lesson 1031Hedged Requests and Speculative ExecutionLesson 1188Percentiles and Tail Latencies
Take a snapshot
Periodically, a server serializes its current state machine state (e.
Lesson 632Log Compaction: Snapshotting
Tamper-evident storage
Cryptographic proof logs weren't modified
Lesson 944Auditing and Compliance for Authorization
Task Distribution
A producer adds work items to the queue—image resize jobs, email sends, report generation, anything time-consuming.
Lesson 659Queue Use Cases: Work Distribution
TaskManagers
are the worker nodes that execute the actual stream processing logic.
Lesson 770Apache Flink Architecture
TCP Optimization
The edge server is geographically close to you, so the initial TCP connection completes quickly.
Lesson 186Dynamic Content Acceleration
Team Capabilities
Can you support multiple independent teams?
Lesson 826Decision Framework for Microservices Adoption
Team size
Small teams benefit from managed solutions.
Lesson 1251Choosing a Tracing System
Team structure supports it
You have separate frontend teams with backend capabilities who can own their BFFs end-to-end.
Lesson 908When to Use BFF Pattern
Technical Lead(s)
The hands-on problem solvers.
Lesson 1300Incident Command System (ICS)
Technology Flexibility
Different services can use different programming languages, databases, or frameworks based on what fits best for that specific capability.
Lesson 781What are Microservices?
technology heterogeneity
the freedom for each service to use different programming languages, frameworks, and databases optimized for its particular use case.
Lesson 787Technology Heterogeneity in MicroservicesLesson 792Technology HeterogeneityLesson 799Progressive Technology Adoption
Technology Lock-In
The entire system typically uses one language and framework.
Lesson 785When Monoliths Become Problematic
Template definitions
with variable placeholders like `{username}` or `{amount}`
Lesson 1701Template Service for Content
Template Service
acts as the central repository for all notification content across channels (push, email, SMS).
Lesson 1701Template Service for Content
Temporal
is a workflow orchestration platform that treats long-running sagas as durable workflows.
Lesson 598Saga Frameworks and Real-World Adoption
Temporal complexity
Events happen in distributed time, not sequential order
Lesson 807Debugging and Troubleshooting
Temporal decoupling
Services don't need to be available simultaneously
Lesson 654When to Use Async vs Sync
Temporal hotspots
occur when recent data gets disproportionate access.
Lesson 234Data Distribution and Hotspots
temporal locality
the pattern where recently accessed data is likely to be accessed again soon.
Lesson 146Least Recently Used (LRU)Lesson 153Choosing an Eviction Policy
Temporal Ordering
Every event in a stream has a strict position in time.
Lesson 692Streams vs Traditional Databases
Temporal patterns
With time-based sharding, the "current" shard (today's data) receives all writes while historical shards sit cold.
Lesson 256Hotspots and Uneven Data DistributionLesson 997Testing and Monitoring Rate LimitersLesson 1482The Hot Partition Problem
Temporary inconsistencies
between nodes
Lesson 494AP Systems: Prioritizing Availability
Temporary Storage
Messages persist in the queue until successfully consumed, surviving consumer crashes or temporary unavailability.
Lesson 647Message Queue Basics
Temporary tokens
Authentication tokens that self-destruct after a time window
Lesson 343Time-to-Live and Expiration
Temporary unavailability
Service restarting or experiencing a brief GC pause
Lesson 1020Why Retries Are Necessary in Distributed Systems
Tenant A
might use Google Workspace, **Tenant B** relies on Azure AD, and **Tenant C** manages users internally.
Lesson 932Multi-Tenant OAuth2 and Identity FederationLesson 1819Per-Tenant Configuration Storage
Tenant affinity
All requests for a tenant hit the same node and Redis shard
Lesson 1822Scaling Rate Limiter Horizontally
Tenant isolation
is enforced through namespaces and authorization rules
Lesson 860Multi-Cluster and Multi-Tenancy
Tenant-level limits
protect multi-tenant systems where one organization shouldn't affect others (e.
Lesson 973Multi-Tier Rate Limiting
Tenant-specific IdP mapping
Store configurations per tenant—which IdP they use, OAuth2 client credentials, redirect URIs, and custom claim mappings.
Lesson 932Multi-Tenant OAuth2 and Identity Federation
Term begins
→ Election starts
Lesson 620Terms: Logical Time in Raft
Term comparison first
Higher term number = more up-to-date
Lesson 627Safety: Leader Completeness Property
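The comparison a voter makes in Raft can be sketched as follows: terms are compared first, and the log index breaks ties only when the last terms are equal. The function and parameter names are assumptions made for this example.

```python
def candidate_log_is_up_to_date(
    candidate_last_term, candidate_last_index,
    voter_last_term, voter_last_index,
):
    """True if the candidate's log is at least as up-to-date as the voter's."""
    if candidate_last_term != voter_last_term:
        # Higher last term wins, regardless of log length.
        return candidate_last_term > voter_last_term
    # Same last term: the longer (or equal-length) log wins.
    return candidate_last_index >= voter_last_index

higher_term_shorter_log = candidate_log_is_up_to_date(5, 3, 4, 10)
same_term_shorter_log = candidate_log_is_up_to_date(4, 7, 4, 10)
```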
Term frequency (TF)
How many times the term appears in that document
Lesson 1736Posting Lists and Document IDsLesson 1740TF-IDF Scoring Fundamentals
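Raw term frequency can be sketched with a counter, assuming a deliberately naive tokenizer (lowercase, split on whitespace). Real pipelines would apply the tokenization and normalization steps described elsewhere in the course.

```python
from collections import Counter

def term_frequencies(text: str) -> Counter:
    """Count how many times each term appears in one document."""
    return Counter(text.lower().split())

tf = term_frequencies("The cat saw the other cat")
```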
Term-based sharding
partitions by terms themselves—each shard handles a specific range of vocabulary (e.
Lesson 1753Distributed Index Sharding
termination
(the decision completes eventually)—all while nodes crash and networks partition.
Lesson 599What Is Distributed Consensus?Lesson 608The Problem Paxos Solves
Test at different times
including off-hours when fewer experts are available
Lesson 1438DR Testing Strategies
Test different failure scenarios
(single file, full database, entire region)
Lesson 1430Backup Verification and Testing
Test realistic scenarios
Simulate actual disaster conditions, not just happy paths
Lesson 1408Backup Verification and Testing
Test under failure
Simulate downstream delays and measure resource consumption at various timeout values.
Lesson 1091Default Timeout Pitfalls
Test-on-borrow
Execute a lightweight query (like `SELECT 1`) before giving the connection to the application.
Lesson 271Connection Validation and Stale Connections
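Test-on-borrow can be sketched with SQLite so the example is self-contained. The `is_alive` and `borrow` helpers and the list-based pool are assumptions for illustration, not a production pool.

```python
import sqlite3

def is_alive(conn) -> bool:
    """Run a lightweight query; a stale or closed connection raises an error."""
    try:
        conn.execute("SELECT 1").fetchone()
        return True
    except sqlite3.Error:
        return False

def borrow(idle_connections, open_new):
    """Hand out the first idle connection that passes validation."""
    while idle_connections:
        conn = idle_connections.pop()
        if is_alive(conn):
            return conn
        # Stale connection: drop it and try the next one.
    return open_new()

good = sqlite3.connect(":memory:")
stale = sqlite3.connect(":memory:")
stale.close()

pool = [good, stale]
borrowed = borrow(pool, lambda: sqlite3.connect(":memory:"))
```

The validation query adds a round trip to every borrow, which is the cost of never handing the application a dead connection.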
Test-on-return
Validate when returning a connection to the pool, marking bad ones for removal.
Lesson 271Connection Validation and Stale Connections
Test-while-idle
Periodically check idle connections in the background.
Lesson 271Connection Validation and Stale Connections
Testing alerts
means deliberately triggering conditions that should generate alerts, verifying they reach the right people, and confirming responders know how to act.
Lesson 1295Testing Alerts and Dry Runs
Testing and debugging
Replay production events in a test environment to reproduce issues safely.
Lesson 695Stream Retention and Replay
Testing challenges
Hard to verify failover readiness
Lesson 1436Active-Passive vs Active-Active DR
Testing overhead
to ensure nothing breaks
Lesson 328Migration and Legacy System Constraints
Thanos
extends Prometheus by uploading data blocks to cheap object storage (S3, GCS).
Lesson 1206Metrics Federation and Long-Term Storage
Then acknowledges
to the producer that the message is safely stored
Lesson 651Message Durability
Then consider latency tolerance
Lesson 1364Choosing a Replication Mode
Theoretically unlimited growth
Keep adding machines as needed
Lesson 44What is Horizontal Scaling?
They multiply load
When one service fails, retries from upstream services create even more pressure on remaining healthy instances
Lesson 1077What is a Cascading Failure
They're hard to trace
The root cause may be buried under layers of dependent failures
Lesson 1077What is a Cascading Failure
Think like this
An e-commerce shopping cart can tolerate eventual consistency (items may briefly appear/disappear across sessions).
Lesson 1399Consistency Pattern Tradeoffs in Practice
Third Normal Form (3NF)
Remove transitive dependencies—non-key attributes depend only on the primary key, not on other non-key attributes
Lesson 302Normalization Fundamentals
Third-party integrations
Your app needs to post to users' Twitter accounts without storing their Twitter passwords.
Lesson 920OAuth2 Fundamentals and Use Cases
Thread Pool Bulkheads
Fully isolated execution (via CompletableFuture)
Lesson 1075Implementing Bulkheads in Practice: Hystrix and Resilience4j
Thread pool size
(threads in use at this instant)
Lesson 1175Gauge MetricsLesson 1184Gauge Metrics
Threshold checking
– After each failure, it checks: "Have we hit our failure threshold yet?"
Lesson 1045The Three States: Closed
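The threshold check in the Closed state can be sketched as below. The class name, the consecutive-failure counting policy, and the reset on success are assumptions; real libraries often use sliding windows or failure-rate thresholds instead.

```python
class CircuitBreaker:
    """Minimal breaker that trips after N consecutive failures."""

    def __init__(self, failure_threshold: int):
        self.failure_threshold = failure_threshold
        self.failure_count = 0
        self.state = "closed"

    def record_success(self):
        # A success while closed clears the consecutive-failure count.
        self.failure_count = 0

    def record_failure(self):
        self.failure_count += 1
        # Threshold check after each failure
        if self.failure_count >= self.failure_threshold:
            self.state = "open"

breaker = CircuitBreaker(failure_threshold=3)
breaker.record_failure()
breaker.record_failure()
state_after_two = breaker.state
breaker.record_failure()
state_after_three = breaker.state
```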
Threshold-based
Replicate after N requests from a specific region
Lesson 1631Multi-Region Replication Strategy
Threshold-based switching
Automatically transition users to pull-based delivery once they exceed a follower threshold (e.
Lesson 1640Celebrity Problem in Push Models
throttling
may slow down or delay requests that exceed thresholds rather than rejecting them outright.
Lesson 885Rate Limiting and ThrottlingLesson 957Rate Limiting vs Throttling
Throughput increase
Multiple servers handle queries in parallel
Lesson 1446What is Data Partitioning?
Throughput Metrics
measure crawling velocity: pages crawled per second/minute, bytes downloaded, and URLs processed per worker.
Lesson 1871Monitoring Crawler Fleet Performance
Throughput Needs
With millions of daily active users (DAU), you must handle peak traffic.
Lesson 1633Non-Functional Requirements: Scale and Performance
Thumbnails
are scaled-down versions of images (e.
Lesson 1624Thumbnail and Preview Generation
Tier 1 (Critical)
RPO < 5 minutes, RTO < 15 minutes → synchronous replication, multi-region active-active
Lesson 1420Business Impact Analysis for RPO/RTO
Tier 1 (Hot)
Active users get WebSocket connections with sub-second updates
Lesson 1683Cost Optimization for Real-Time Features
Tier 1 (Local/In-Memory)
Fastest but smallest.
Lesson 143Multi-Tier Caching Pattern
Tier 2 (Distributed/Remote)
Slower but shared across services.
Lesson 143Multi-Tier Caching Pattern
Tier 2 (Important)
RPO < 1 hour, RTO < 4 hours → asynchronous replication, hot standby
Lesson 1420Business Impact Analysis for RPO/RTO
Tier 2 (Warm)
Recently active users receive Server-Sent Events (SSE) with delayed updates
Lesson 1683Cost Optimization for Real-Time Features
Tier 3 (Cold)
Inactive users get no real-time connection; they pull on next app open
Lesson 1683Cost Optimization for Real-Time Features
Tier 3 (Database/Origin)
Slowest but authoritative source.
Lesson 143Multi-Tier Caching Pattern
Tier 3 (Standard)
RPO < 24 hours, RTO < 24 hours → daily backups, cold standby
Lesson 1420Business Impact Analysis for RPO/RTO
Tier/plan
(free, premium, enterprise)
Lesson 1819Per-Tenant Configuration Storage
Tiered rate limiting
means assigning different rate limit policies based on user class—typically determined by subscription level, authentication status, or organizational size.
Lesson 990Tiered Rate Limits for Different User Classes
Tight memory budget
Use **Fixed Window Counter** (single counter) or **Leaky Bucket** (timestamp + counter)
Lesson 975Algorithm Selection Criteria
Time buckets
Minute, hour, day, week, month
Lesson 1726Aggregation and Reporting
Time elapsed
since state changes
Lesson 1056Circuit Breaker State Machine
Time is critical
and you can't afford to wait
Lesson 1021Immediate Retry vs Delayed Retry
Time of day
(route to regional on-call during business hours)
Lesson 1292Alert Routing and Escalation
Time To Live (TTL)
tells clients how long to cache a DNS response.
Lesson 116DNS-Based Load Balancing
Time travel
Replay events to see what state was at any point in the past
Lesson 691Events as First-Class Citizens
Time zone differences
A batch job in one region processes messages created hours earlier in another region
Lesson 650Temporal Decoupling
Time-Based Expiration (TTL)
concept you learned earlier, but instead of blocking on expiration, it serves stale data gracefully.
Lesson 162Stale-While-Revalidate Pattern
Time-based expiry (TTL)
Accept eventual consistency—caches expire after a set time, guaranteeing freshness within that window.
Lesson 128Cache Coherence Across Layers
Time-based retention
Keep messages for a specified duration (e.
Lesson 711Message Retention and Log Segments
Time-based row keys
are the secret weapon.
Lesson 418Time-Series and Time-Ordered Data
Time-based windows
group events by temporal boundaries:
Lesson 741Windowing in Stream Processing
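As a rough illustration of grouping by temporal boundaries, the sketch below assigns events to fixed-size, non-overlapping (tumbling) windows. The function name, the `(timestamp, value)` event shape, and the choice of tumbling windows are assumptions made for this example, not something the lesson prescribes.

```python
from collections import defaultdict

def tumbling_windows(events, size_seconds):
    """Group (timestamp_seconds, value) pairs into fixed-size windows.

    Returns a dict mapping each window's start time to the values in it.
    """
    windows = defaultdict(list)
    for ts, value in events:
        # Round the timestamp down to the start of its window.
        start = (ts // size_seconds) * size_seconds
        windows[start].append(value)
    return dict(windows)
```

With 60-second windows, events at seconds 0 and 59 share a window, while an event at second 60 starts the next one.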
Time-bound
Reads lag by at most 10 seconds
Lesson 1397Bounded Staleness Consistency
Time-consuming
– backing up terabytes fully each night may exceed your maintenance window
Lesson 1402Full Backups
Time-of-Day Patterns
If timeouts spike every morning at 9 AM, you have a capacity problem during peak load, not a random failure.
Lesson 1124Timeout Metrics and Anomaly Detection
Time-ordered results
Newest posts first without expensive sorting
Lesson 1661Timeline Schema Design
Time-series data
`sensor_id + timestamp`
Lesson 413Row Keys and Clustering
Time-series data at scale
When tracking billions of sensor readings, user events, or financial ticks with random access patterns, these systems excel.
Lesson 442When to Use HBase or BigTable
Time-series database
(InfluxDB, TimescaleDB) for efficient time-based queries
Lesson 1530Analytics and Click Tracking
Time-series logs
Shard key = `(device_id, timestamp)` → groups device logs together, spreads load across devices.
Lesson 245Composite Shard Keys
Time-to-first-byte (TTFB)
benchmarks for your target regions
Lesson 191CDN Provider Feature Comparison
Time-to-Live (TTL)
is a mechanism in key-value stores that automatically expires and deletes keys after a defined duration.
Lesson 343Time-to-Live and ExpirationLesson 1523Caching Layer ArchitectureLesson 1810Counter Expiration and TTL Management
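A minimal sketch of the idea, assuming a toy in-memory store with lazy deletion on read. `TTLStore` and its injectable `clock` parameter are hypothetical names for illustration; production key-value stores typically combine lazy checks like this with background cleanup.

```python
import time

class TTLStore:
    """Toy in-memory key-value store with per-key expiry (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable clock, which makes expiry easy to test
        self._data = {}       # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]   # lazy deletion: expired keys are removed on read
            return default
        return value
```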
Time-Window Compaction (TWCS)
Lesson 428Compaction Strategies
Timed drills
Measure how long restoration actually takes (crucial for meeting RTO targets)
Lesson 1408Backup Verification and Testing
Timeline
Minute-by-minute sequence of events (when alerts fired, what actions were taken, what commands were run)
Lesson 1304Blameless PostmortemsLesson 1350What is a Postmortem?Lesson 1352Postmortem Structure and Action Items
Timeline cache/DB
→ retrieved on next pull
Lesson 1677Selective Push Strategies
Timeout budget management
means intelligently dividing the *remaining time* among all downstream dependencies so every hop has a realistic chance to succeed.
Lesson 1119Timeout Budget Management Across Service Chains
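One way to picture dividing the remaining time is sketched below. The even split across the calls still to be made and the small local `reserve` are assumptions of this example; a real service might weight the split by each dependency's expected latency instead.

```python
import time

def remaining_budget(deadline, now=None):
    """Seconds left before an absolute deadline on the monotonic clock (never negative)."""
    now = time.monotonic() if now is None else now
    return max(0.0, deadline - now)

def per_call_timeout(deadline, calls_left, reserve=0.05, now=None):
    """Split the remaining time evenly across the calls still to be made.

    `reserve` holds back a little time for local work after the last call returns.
    """
    usable = max(0.0, remaining_budget(deadline, now) - reserve)
    return usable / calls_left if calls_left > 0 else 0.0
```

Recomputing the timeout before every hop means a slow early call automatically shrinks the budget of the calls that follow it.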
Timeout Distribution by Endpoint
Different operations have different normal timeout rates.
Lesson 1124Timeout Metrics and Anomaly Detection
Timeout Duration Patterns
Are timeouts happening exactly at your configured limit?
Lesson 1124Timeout Metrics and Anomaly Detection
Timeout exceeded
← This is your early warning system
Lesson 1049Timeout as a Failure Signal
Timeout limits
External sites may be slow; cap fetch time at 5-10 seconds
Lesson 1538Link Preview and Metadata
Timeout Management
Set aggressive per-shard timeouts (e.
Lesson 1780Distributed Query Coordination
Timeout propagation failures
Do downstream timeouts respect the remaining deadline budget?
Lesson 1125Timeout Testing and Chaos Engineering
Timeout propagation overhead
itself
Lesson 1098Per-Hop Timeout Budgets
Timeout Rate
The percentage of requests that time out.
Lesson 1124Timeout Metrics and Anomaly Detection
Timeout simulation
Inject artificial delays that exceed your timeout threshold
Lesson 1065Testing Circuit Breaker Behavior
Timeout strategy
Use circuit breakers per shard.
Lesson 1780Distributed Query Coordination
Timeouts
Proxies enforce maximum wait times for responses, preventing services from hanging indefinitely when downstream dependencies slow down.
Lesson 839Data Plane: Proxy Responsibilities
Timers/Histograms
calculates percentiles, mean, max, min
Lesson 1201StatsD and Metric Aggregation Daemons
Timestamp-based keys
Log entries with `timestamp` as the partition key
Lesson 1474Hotspot Problems in Range Partitioning
Timestamp-based routing
Tag writes with timestamps.
Lesson 1359Read-Your-Writes Consistency with Replicas
Timing
Some layers may invalidate faster than others, creating temporary inconsistencies
Lesson 163Multi-Level Cache InvalidationLesson 1624Thumbnail and Preview Generation
Timing matters
Trigger the hedge too early and you waste resources; too late and you've already suffered the delay
Lesson 1031Hedged Requests and Speculative Execution
To improve availability
, you have two levers:
Lesson 1325Availability Formula: MTBF and MTTR Relationship
Together
Bulkheads limit blast radius through isolation, while circuit breakers detect failures quickly and stop wasting resources.
Lesson 1074Bulkheads vs Circuit Breakers: Complementary PatternsLesson 1354Synchronous vs Asynchronous Replication
Token claim normalization
External IdPs return different claim structures.
Lesson 932Multi-Tenant OAuth2 and Identity Federation
Token exchange
Client sends the original `code_verifier` (not the hash)
Lesson 923PKCE: Proof Key for Code Exchange
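The sketch below shows why the client sends the original verifier rather than the hash: the server recomputes the challenge from the verifier and compares it with the one stored at the start of the flow. It assumes the `S256` challenge method from RFC 7636; the function names are hypothetical.

```python
import base64
import hashlib
import secrets

def make_verifier():
    """Random code_verifier: 32 bytes become a 43-character URL-safe string."""
    return base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode("ascii")

def s256_challenge(verifier):
    """code_challenge = base64url(SHA-256(code_verifier)), without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

def server_verifies(stored_challenge, presented_verifier):
    """Authorization-server side: recompute the challenge and compare in constant time."""
    return secrets.compare_digest(stored_challenge, s256_challenge(presented_verifier))
```

An attacker who intercepts only the authorization code and the challenge cannot complete the exchange, because recovering the verifier would require inverting SHA-256.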
Token introspection
means the resource server asks the authorization server "Is this token still valid?
Lesson 927Token Introspection and Validation
Token Issuance
Server validates credentials and generates a signed token (often a JWT—JSON Web Token)
Lesson 912Token-Based Authentication Fundamentals
Token propagation delays
In service-to-service calls, ensure refreshed tokens cascade properly through the call chain, typically by updating headers in flight.
Lesson 946Token Refresh in Distributed Systems
Token Request
Your backend server exchanges this code for actual access tokens by making a **server-to-server request**, including your app's secret credentials (client ID + client secret).
Lesson 922Authorization Code Flow
Token revocation propagation
When you revoke a token (user logs out or password changes), that revocation must reach all regions.
Lesson 952Cross-Region Authentication
Token Scoping
Not all operations need system-wide uniqueness.
Lesson 1042Idempotency vs Performance Tradeoffs
Token validation across regions
If you use JWT tokens, every region needs access to the same signing keys.
Lesson 952Cross-Region Authentication
Tokenization
transforms unstructured text into discrete, normalized units (tokens) that can be indexed and matched against user queries.
Lesson 1734Tokenization and Text AnalysisLesson 1738Query Processing FlowLesson 1744Building an Inverted Index: TokenizationLesson 1778Multi-Language Search Support
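A minimal sketch of the transformation, assuming a deliberately simple pipeline: lowercase, strip diacritics, split on anything that is not a letter or digit. Real analyzers usually add steps such as stemming and stop-word removal; the function name and these normalization choices are illustrative assumptions.

```python
import re
import unicodedata

def tokenize(text):
    """Turn raw text into lowercase ASCII alphanumeric tokens."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Everything that is not a-z or 0-9 acts as a separator.
    return re.findall(r"[a-z0-9]+", no_marks.lower())
```

Applying the same function to documents at index time and to queries at search time is what lets "Café" match a query for "cafe".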
Tolerate small skew
Add safety margins to deadlines (e.
Lesson 1114Clock Skew and Time Synchronization
Tombstones
are deletion markers that accumulate when you frequently delete data.
Lesson 432Data Modeling Best Practices
Too few labels
"Which region is slow?
Lesson 1214Tagging Strategy for Filtering
Too many labels
Millions of unique series crash your metrics system.
Lesson 1214Tagging Strategy for Filtering
Tooling compatibility
Some API frameworks and generators expect specific conventions
Lesson 1877Singular vs Plural Resource Names
Top-K pre-computation
Instead of scoring suggestions at query time, pre-compute and store the top-K results (e.
Lesson 1776Typeahead Index Optimization
Topic subscriptions with filtering
let each subscriber define rules about which messages they actually want to receive.
Lesson 658Topic Subscriptions and Filtering
Topic-based routing
Messages are categorized by topic, not destination
Lesson 656Pub-Sub Pattern Fundamentals
Total daily queries
= DAU × actions per user per day
Lesson 23QPS and Daily Active Users Estimation
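The formula can be turned into an average queries-per-second figure by dividing by the seconds in a day. The inputs below (10 million DAU, 20 actions each) are made-up numbers for illustration only, and the result is an average that says nothing about peak traffic.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def daily_queries(dau, actions_per_user_per_day):
    """Total daily queries = DAU x actions per user per day."""
    return dau * actions_per_user_per_day

def average_qps(dau, actions_per_user_per_day):
    """Average queries per second, spread evenly over the day."""
    return daily_queries(dau, actions_per_user_per_day) / SECONDS_PER_DAY

# Hypothetical inputs: 10 million DAU, 20 actions per user per day.
print(daily_queries(10_000_000, 20))       # 200000000
print(round(average_qps(10_000_000, 20)))  # 2315
```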
Total database queries executed
Lesson 1174Counter Metrics
Total errors
encountered across all services
Lesson 1174Counter Metrics
Total messages processed
from a queue
Lesson 1174Counter Metrics
Total order
All operations appear to happen in a single, global sequence
Lesson 523Linearizability DefinedLesson 633ZooKeeper: Coordination Service Built on Consensus
Total ordering
All operations appear to happen in a strict global order
Lesson 484Consistency in CAP Context
Total per URL
~607 bytes, rounded up to **1 KB** to allow for indexes and overhead
Lesson 1498Storage Capacity Estimation
Total requests served
by an API endpoint
Lesson 1174Counter Metrics
Total storage
112 GB across all backups
Lesson 1422Incremental Backup Strategy
Total URLs
100M/day × 365 days × 10 years = **365 billion URLs**
Lesson 1498Storage Capacity Estimation
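Carrying the estimate one step further, the sketch below multiplies the URL count by the rounded 1 KB per-URL figure from the same lesson. Treating 1 KB as 1,000 bytes and reporting decimal terabytes are assumptions of this example.

```python
urls_per_day = 100_000_000
days_per_year = 365
years = 10

total_urls = urls_per_day * days_per_year * years  # 365,000,000,000 (365 billion)

bytes_per_url = 1_000  # the rounded 1 KB, taken here as 1,000 bytes
total_tb = total_urls * bytes_per_url / 1e12       # decimal terabytes

print(total_urls)  # 365000000000
print(total_tb)    # 365.0
```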
Total write time
10ms × 10M = ~28 hours of sequential work
Lesson 1640Celebrity Problem in Push Models
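The arithmetic behind that figure, plus a rough look at how it changes with parallelism. The `hours_with_workers` helper assumes perfectly parallel, equally fast workers, which is an idealization added for this sketch.

```python
followers = 10_000_000
write_ms = 10  # time for one timeline write, in milliseconds

total_seconds = followers * write_ms / 1000  # 100,000 seconds
total_hours = total_seconds / 3600           # about 27.8 hours

def hours_with_workers(workers):
    """Idealized fanout time if the writes are split evenly across workers."""
    return total_hours / workers
```

Even under this idealized split, 100 workers would still need roughly 17 minutes for one post, which is why the push model struggles with celebrity accounts.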
Tower of Hanoi
schemes or custom policies like:
Lesson 1431Backup Retention Policies
Trace context
is the set of identifiers that gets passed along with each request, enabling this linkage.
Lesson 1230Trace Context FundamentalsLesson 1237Baggage and Cross-Cutting ConcernsLesson 1238Span Sampling Decisions
trace IDs
(which identify the whole request) and **span IDs** (which identify individual operations).
Lesson 1219What is Distributed Tracing?Lesson 1249Integrating Traces with Logs and Metrics
Trace sampling
means intentionally recording only a subset of traces—say 1% or 0.
Lesson 1228Trace Sampling Fundamentals
Traces → Logs
When instrumenting your application, inject the current trace ID and span ID into your logging context.
Lesson 1249Integrating Traces with Logs and Metrics
Traces → Metrics
Add trace ID or service-level identifiers as metric labels (carefully, to avoid cardinality explosion).
Lesson 1249Integrating Traces with Logs and Metrics
Track
migration status to handle failures gracefully
Lesson 1572Storage Tier Migration
Track a single request
across multiple services
Lesson 1140Contextual Fields
Track access patterns
Monitor where requests originate and which files are frequently accessed
Lesson 1631Multi-Region Replication Strategy
Track client writes
The client remembers the timestamp/version of its last write and only reads from replicas caught up to at least that point.
Lesson 542Read-Your-Writes Consistency
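A minimal sketch of that routing rule, assuming each replica can report the highest version it has applied. The `Replica` class and `pick_replica` function are hypothetical names for illustration.

```python
class Replica:
    def __init__(self, name, applied_version):
        self.name = name
        self.applied_version = applied_version  # highest write version applied so far

def pick_replica(replicas, last_write_version):
    """Return the first replica caught up to the client's last write, or None.

    A None result means no replica is fresh enough; the caller would
    typically fall back to reading from the primary.
    """
    for replica in replicas:
        if replica.applied_version >= last_write_version:
            return replica
    return None
```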
Track metadata
Keep records of where each tablet lives and which tablet servers are active
Lesson 447Master Server and Metadata Management
Track operation status
(pending, completed, failed)
Lesson 1011Idempotency Key Storage and Lookup
Track recent latencies
in a sliding time window (e.
Lesson 1117Adaptive Timeouts Based on Historical Latency
Track remaining time
Calculate deadline minus current time
Lesson 1119Timeout Budget Management Across Service Chains
Track write timestamps
The client remembers when they last wrote.
Lesson 1390Read-Your-Writes Consistency
Tracked
Added to your project management system with deadlines
Lesson 1352Postmortem Structure and Action Items
Tracks elapsed time
as the request travels hop-to-hop
Lesson 1101Timeout Propagation in Service Meshes
Trade-off: precision vs efficiency
You lose exact values but gain efficient storage and queryability across millions of requests.
Lesson 1185Histogram Metrics
Trade-offs
More complexity, potential for stale data across tiers, and cache coherence challenges.
Lesson 143Multi-Tier Caching Pattern
Trade-offs Over Best Practices
from earlier—there's no universal "best")
Lesson 42Document Your Decisions
Traditional hashing
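(modulo-based): nearly all keys move. Going from `N` to `N+1` nodes relocates roughly `N/(N+1)` of the `K` total keys, versus roughly `K/N` keys under consistent hashing, so almost everything must be rehashed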
Lesson 1460Adding Nodes with Minimal Disruption
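To see what "rehashing almost everything" looks like under modulo placement, the sketch below counts how many keys land on a different node when the node count changes. The helper names and the synthetic `key-<i>` keys are assumptions for illustration.

```python
import hashlib

def stable_hash(key):
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

def fraction_moved(num_keys, old_nodes, new_nodes):
    """Fraction of keys whose `hash % node_count` placement changes."""
    moved = 0
    for i in range(num_keys):
        h = stable_hash(f"key-{i}")
        if h % old_nodes != h % new_nodes:
            moved += 1
    return moved / num_keys
```

Calling `fraction_moved(20_000, 10, 11)` should come out close to 10/11, meaning about nine in ten keys change nodes when a single node is added to a ten-node cluster.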
Traditional RDBMS
(PostgreSQL, MySQL with strong settings):
Lesson 518PC/EC Systems: Consistency Always
Traditional relational databases
with synchronous replication (like PostgreSQL with synchronous standby)
Lesson 493CP Systems: Prioritizing Consistency
Traffic Filtering
Edge servers analyze incoming requests and block suspicious patterns (unusual request rates, malformed packets, known bad actors) before forwarding legitimate traffic to your origin.
Lesson 195CDN for DDoS Protection
Traffic management
to gradually shift load to new versions (blue-green, canary deployments)
Lesson 810Deployment ComplexityLesson 827What is a Service Mesh?
Traffic Manager
DNS-based global routing (similar to Route 53)
Lesson 114Cloud Load Balancers (GCP and Azure)
Traffic policies
(retries, circuit breaking) apply uniformly
Lesson 860Multi-Cluster and Multi-Tenancy
Traffic rerouting
Redirect requests away from failing components to healthy ones
Lesson 1303Incident Mitigation vs Fix
Traffic Shifts
New user requests resolve to the backup location
Lesson 1440DNS and Traffic Management in DR
Train new team members
on incident response procedures
Lesson 1345Starting with Game Days
Train teams
to respond to real incidents
Lesson 1343What is Chaos Engineering?
Train the model
Use algorithms like LambdaMART, RankNet, or gradient-boosted trees to learn which feature combinations predict clicks
Lesson 1781Machine Learning for Ranking
Training data
comes from historical engagement:
Lesson 1668Machine Learning for Feed Ranking
Transaction challenges
What if the move fails halfway through?
Lesson 263Shard Key Immutability Problem
Transaction commit
Should we commit or abort this distributed transaction?
Lesson 599What Is Distributed Consensus?
Transaction complexity
Keeping denormalized data consistent requires larger transactions or eventual consistency patterns
Lesson 296Write Amplification Costs
Transaction count
Number of operations queued
Lesson 1358Replication Lag in Async Systems
Transaction isolation levels
coordinating concurrent operations
Lesson 308Strong Consistency by Default
Transactional coupling
between message acknowledgment and side effects
Lesson 680Exactly-Once Delivery
Transactional guarantees
Depending on your broker and database, you might leverage distributed transactions (expensive) or design around eventual consistency with compensating actions (as you learned with Sagas).
Lesson 688Transactional Semantics
Transactional insert
that fails fast if the code exists
Lesson 1514Custom Short URL Support
Transactional Outbox Pattern
solves this by treating notification events as data.
Lesson 1716Transactional Outbox Pattern
Transactional semantics
means coordinating database operations and message operations so they succeed or fail together, maintaining consistency.
Lesson 688Transactional Semantics
Transactions with mixed operations
Even a `SELECT` inside a transaction that will later `UPDATE` should stay on primary for consistency
Lesson 223Detecting Read vs Write Queries
Transfer direction
(upload, download, or both)
Lesson 26Bandwidth Estimation from Data Size
Transform
field values (normalize timestamps, convert data types)
Lesson 1151The ELK Stack: Logstash
Transformations
Apply functions to each event (e.
Lesson 722Kafka Streams APILesson 768Apache Spark Overview
Transforms data structures
to match client-specific needs
Lesson 905BFF Implementation Patterns
Transient errors
are temporary hiccups that might succeed if you try again:
Lesson 1026Retry on Which ErrorsLesson 1048Failure Thresholds and Detection
transient failures
favor forward recovery, **permanent failures** require backward recovery.
Lesson 596Forward Recovery vs Backward RecoveryLesson 1020Why Retries Are Necessary in Distributed Systems
Translate
to the outgoing protocol's format
Lesson 1113Cross-Protocol Deadline Handling
Transparent proxying
automatically captures all outbound network traffic from your application without any code changes.
Lesson 831Transparent vs Explicit Proxying
transparently
you configure policies once in the control plane, and every sidecar enforces them automatically.
Lesson 852Circuit Breaking at the Mesh LevelLesson 855Observability: Distributed Tracing
Transport Layer
of the OSI networking model—where TCP and UDP protocols operate.
Lesson 109Layer 4 (Transport) Load BalancingLesson 1148Centralized Logging Architecture
Tree Replication Topology
, data replication follows a hierarchical structure similar to an organizational chart or family tree.
Lesson 1374Tree Replication Topology
trie
(pronounced "try," from re**trie**val) is a tree-like data structure where each node represents a single character of a string.
Lesson 1758Trie Data Structure for Prefix MatchingLesson 1767Personalized Typeahead
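A minimal sketch of such a structure, with one node per character and a prefix lookup that collects every stored word beneath the prefix. The class and method names are illustrative; a production typeahead index would also store popularity scores and cap the number of results.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_word = False  # True if a stored word ends at this node

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def starts_with(self, prefix):
        """Return all stored words beginning with `prefix`, sorted alphabetically."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)
```

Reaching the prefix node takes one step per typed character, regardless of how many words the trie holds.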
Trigger actions
based on status (retry on failure, update analytics, alert on high bounce rates)
Lesson 1693Delivery Receipt Tracking
Triggers
or application logic that update summaries when new data arrives
Lesson 294Aggregation Tables
Troubleshooting tools
to trace request paths and identify failures
Lesson 846Control Plane: API and User Interface
TrueTime API
backed by atomic clocks and GPS receivers in every Google data center.
Lesson 333Google Spanner Architecture
Truly distributed
No single point of contention for ID generation
Lesson 1520Primary Key Selection: Auto-Increment vs UUID
Trust
Clear boundaries reduce security anxiety
Lesson 930OAuth2 Scopes and Consent
TSDB
Stores time-series data efficiently on disk in compressed chunks
Lesson 1198Prometheus Architecture and Data Model
TTL balancing
Shorter TTLs (5-60 seconds) reduce staleness risk but increase PDP load.
Lesson 951Caching Authorization Decisions
TTL Support
Automatic cleanup of expired time windows
Lesson 980Redis-Based Distributed Rate Limiting
tunable consistency
(ONE, QUORUM, ALL), and configurable replication strategies.
Lesson 370Distributed Key-Value Store Architectures in PracticeLesson 563Tunable Consistency in Practice
TWCS
Ideal for time-series data with TTL expiration
Lesson 428Compaction Strategies
Two-phase commit (2PC)
distributes the transaction across multiple nodes.
Lesson 5772PC vs Single-Node TransactionsLesson 1489Cross-Partition Transactions
Type checking
Each column accepts only its declared type (integers, strings, dates, etc.
Lesson 301Schema Enforcement and Type SafetyLesson 773Prefect and Dagster for Modern Workflows
Type constraints
Is this value the right data type?
Lesson 305Consistency Guarantees
Type/Label
Describes the relationship (e.
Lesson 452Graph Model: Nodes and Edges
Typeahead
happens as the user is *still typing*.
Lesson 1757Typeahead vs Full Search
Types
Object structures with fields (e.
Lesson 1912GraphQL Schema and Resolvers
Typical range
5-20% based on your sync frequency and acceptable variance
Lesson 986Local Rate Limiting with Overage Buffers

U

Ubiquitous Language
Each bounded context uses consistent terminology understood by both developers and domain experts
Lesson 815Domain-Driven Design and Bounded Contexts
UI
web interface for visualization
Lesson 1242Zipkin Architecture and Design
Ultra-lightweight, low latency
NATS Streaming
Lesson 735Choosing a Streaming Platform
Unbounded convergence
means the system *will* converge, but provides no upper limit on how long it takes.
Lesson 533Convergence Guarantees
Unbounded Data
Streams have no defined "end"—they keep flowing.
Lesson 737What is Stream Processing?
Uncertainty
about long-term support
Lesson 37Prefer Boring Technology
Unclear boundaries
You don't yet know where service boundaries *should* be—premature splitting leads to the distributed monolith anti-pattern
Lesson 820When a Monolith is the Right Choice
Underutilization
Idle resources in one bulkhead can't help an overwhelmed bulkhead next door
Lesson 1076Bulkhead Tradeoffs: Complexity and Resource Overhead
Uneven distribution
If traffic is imbalanced, some nodes may reject requests while others sit idle
Lesson 979Centralized vs Decentralized ApproachesLesson 1451Range-Based Partitioning
Uneven load distribution
– Popular users all route to the same node, creating hotspots
Lesson 982Sticky Sessions and Rate Limiting
Unified APIs
Write your processing logic once, then run it on historical data (batch mode) or real-time streams (streaming mode)
Lesson 756Hybrid and Modern Alternatives
Unified Runtime
The application runs as one process (or a few identical copies for scaling).
Lesson 779What is a Monolithic Architecture?
Unified technology stack
Team members can move freely across the codebase without context-switching
Lesson 820When a Monolith is the Right Choice
Uniform behavior
All services retry the same way, report metrics identically
Lesson 833Polyglot Microservices Support
Uniform load distribution
is the primary benefit.
Lesson 1450Hash-Based Partitioning
Unique constraint
on the short code column in your database
Lesson 1514Custom Short URL Support
Unique constraints
Will this create a duplicate where none is allowed?
Lesson 305Consistency Guarantees
Unique Feature
GCP's load balancers use Google's global network backbone, routing traffic along optimal paths *before* reaching your servers.
Lesson 114Cloud Load Balancers (GCP and Azure)
Units
milliseconds, bytes, requests/second, percentage
Lesson 1216Metric Documentation and Discovery
Universal Parser Support
Every programming language has mature, fast JSON libraries.
Lesson 1138JSON as Log Format
Unknown failure modes
you haven't encountered yet
Lesson 37Prefer Boring Technology
Unlimited scalability
No capacity planning needed; add petabytes without provisioning volumes
Lesson 1588Object Storage vs Block Storage
Unlisted
pastes are accessible to anyone *with the URL*, but aren't indexed or listed in user profiles.
Lesson 1576Access Control and Privacy Settings
Unpredictability
Random identifiers prevent enumeration attacks
Lesson 1516Counter-Based vs UUID Approaches
Unpredictable dependencies
Third-party APIs with unstable behavior
Lesson 1076Bulkhead Tradeoffs: Complexity and Resource Overhead
Unpredictable load spikes
**Adaptive Rate Limiting**
Lesson 975Algorithm Selection Criteria
Unpredictable questions
ad-hoc queries you can't anticipate
Lesson 762Query Performance Tradeoffs
Unpredictable variance
Adding or removing a node shifts load unpredictably
Lesson 1462The Uneven Distribution Problem
Unreliable
(occasionally returns stale data due to replication lag or returns corrupt results from partial disk failures)
Lesson 1322Availability vs Reliability: Key Differences
Unsorted queues
Finding the minimum is O(n) — unacceptable per fetch.
Lesson 1847Heap-Based Priority Queue Implementation
Untested procedures
that contain logical errors
Lesson 1430Backup Verification and Testing
Update metadata
– inform the routing layer about the new boundary
Lesson 1475Dynamic Range SplittingLesson 1557Hot vs Cold Storage Tiering
Updates
invalidate cache entry when user changes settings
Lesson 1702User Preferences Lookup
Updates real-time views
in fast-access stores (Redis, Cassandra, in-memory databases)
Lesson 749Lambda Architecture: Speed Layer
Upload
Accept images (JPEG, PNG) and videos (MP4, MOV) from users
Lesson 1584Image/Video Hosting: Problem Definition and Scale
Upload bandwidth
User posts a 2 MB photo → your servers receive it
Lesson 26Bandwidth Estimation from Data Size
Upload Service
The workhorse layer.
Lesson 1585Upload Flow Architecture Overview
Uptime
is the actual duration your system was available during a measurement period.
Lesson 1318Defining Availability and Uptime
URL bloat
Every endpoint carries version prefix
Lesson 1899URI Versioning (Path-Based)
URL length
How many characters in each short code?
Lesson 1500URL Length and Encoding Constraints
URL paths
`/api/users` goes to the user service, `/api/orders` goes to the order service
Lesson 110Layer 7 (Application) Load Balancing
URL versioning
`/api/v1/orders` vs `/api/v2/orders` — explicit and visible, but creates route proliferation.
Lesson 809Versioning and Backward Compatibility
Usability through refresh tokens
Lesson 926Access Tokens vs Refresh Tokens
Usage
Application executes queries while connection stays in active state
Lesson 270Connection Lifecycle in a Pool
Usage examples
showing common queries and dashboards
Lesson 1216Metric Documentation and Discovery
Use an idempotency key
to guarantee the core operation (like order creation) happens exactly once
Lesson 1038Side Effect Management
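A minimal in-memory sketch of the idea: the first request with a given key creates the order, and any retry carrying the same key gets the original result back. `OrderService` and its dictionaries are hypothetical stand-ins for a durable store, and a real implementation would also need to handle concurrent requests with the same key.

```python
class OrderService:
    def __init__(self):
        self._orders = {}      # order id -> item
        self._seen_keys = {}   # idempotency key -> order id
        self._next_id = 1

    def create_order(self, idempotency_key, item):
        """Create the order once; replays with the same key return the same id."""
        if idempotency_key in self._seen_keys:
            return self._seen_keys[idempotency_key]
        order_id = self._next_id
        self._next_id += 1
        self._orders[order_id] = item
        self._seen_keys[idempotency_key] = order_id
        return order_id

    def order_count(self):
        return len(self._orders)
```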
Use async logging frameworks
that write to memory buffers and flush to disk on background threads.
Lesson 1170Performance Impact of Logging
Use base units
Prefer seconds over milliseconds, bytes over kilobytes.
Lesson 1182Metric Naming Conventions
Use case
Last 7-30 days of backups for quick restores
Lesson 1405Backup Storage TiersLesson 1439Data Replication for DR
Use consistent structure
Most teams follow a hierarchical pattern like `<namespace>_<subsystem>_<metric>_<unit>`.
Lesson 1182Metric Naming Conventions
Use profiling tools
to see which parts of your code consume the most resources
Lesson 40Measure Before Optimizing
Use query timeouts
Cancel queries exceeding a threshold (e.
Lesson 1897Performance Considerations and Limits
Use relative timeouts
Propagate "seconds remaining" instead of absolute timestamps when possible
Lesson 1114Clock Skew and Time Synchronization
Use span events
for contextual data instead of creating new tag dimensions
Lesson 1258Cardinality Explosion
Use stale data
temporarily (yesterday's inventory counts)
Lesson 1083Graceful Degradation Strategies
Use testing modes
Many alerting systems support "test alerts" that don't page anyone
Lesson 1295Testing Alerts and Dry Runs
Use that budget
for downstream calls and local operations
Lesson 1110Calculating Remaining Time
Use version/logical clocks
Pass a version token with writes; only read from replicas that have processed at least that version.
Lesson 1390Read-Your-Writes Consistency
Use when
You have a single data center and simple infrastructure.
Lesson 424Replication Strategy and FactorLesson 1425Hot vs Cold vs Warm Backups
User A
inserts "Hi" at position 0
Lesson 1385Operational Transformation
User activity
`user_id + event_date`
Lesson 413Row Keys and Clustering
User affinity
(how often you interact with a specific user)
Lesson 1644Feed Personalization and Ranking Requirements
User authentication
Mixed strategy (PA/EC): stay available during partitions but maintain consistency during normal ops for security
Lesson 520Practical PACELC Analysis for Design Decisions
User Authorization
Your app redirects the user to the authorization server (e.
Lesson 922Authorization Code Flow
User B
(unaware) inserts "Yo" at position 0
Lesson 1385Operational Transformation
User behavior patterns
What you've engaged with historically
Lesson 1665Feed Ranking Fundamentals
User creates a post
→ triggers fanout process
Lesson 1646Fanout-on-Write (Push Model)
User engagement
Click-through rates, dwell time (how long users stay), and bounce rates reveal what actually satisfies users.
Lesson 1755Relevance Tuning: Boosting and Signals
User Expectations
Start by understanding what actually impacts your users.
Lesson 1276Setting Realistic SLOsLesson 1565Expiration Requirements and TTL Basics
User experience
(faster load times, less buffering)
Lesson 1621Compression and Format Optimization
User experience suffers
requests time out or return errors
Lesson 105Graceful Degradation and Circuit Breaking
User features
your past likes, follows, typical engagement time
Lesson 1668Machine Learning for Feed Ranking
User feedback
Direct customer complaints or support tickets reveal what *actually* frustrates users.
Lesson 1284Iterating on SLIs and SLOs
User History
If you've searched for "coffee beans" ten times this month, that phrase gets a ranking boost when you type "coff" — even if globally it's less popular than "coffee shop.
Lesson 1767Personalized Typeahead
User installs your app
→ App requests notification permission
Lesson 1684Push Notifications: Mobile and Web
User intent
Merge non-overlapping changes; prioritize certain fields
Lesson 1383Application-Level Conflict Resolution
User logs in
with credentials (username/password)
Lesson 909Session-Based Authentication Fundamentals
User Profile
team deploys avatar updates Thursday afternoon—completely independently.
Lesson 791Independent Deployability
User profile API
Don't cache (or cache per-user with vary headers)
Lesson 194CDN for API Acceleration
User profile reads
(AP): Show slightly stale data from the local region rather than wait 200ms for a cross-ocean consistency check
Lesson 510Real Systems: Multi-Region Trade-offs
User profile updates
Strong consistency (W=ALL, R=ONE) ensures no stale reads after writes
Lesson 563Tunable Consistency in Practice
User-agent rules
Which paths are `Disallow`ed or `Allow`ed for your crawler
Lesson 1861Robots.txt Caching and Parsing
User-class segmentation
Premium tiers may need higher thresholds than free users
Lesson 997Testing and Monitoring Rate Limiters
User-level limits
prevent individual users from monopolizing resources (e.
Lesson 973Multi-Tier Rate Limiting
User-specific data
(session info, personalized content) → Application-level cache or distributed cache
Lesson 130Choosing the Right Caching Layer
User-specified language
is simpler: let users tag their paste explicitly.
Lesson 1575Syntax Highlighting and Language Detection
User-to-Role Assignment
Mapping which users have which roles
Lesson 933Role-Based Access Control (RBAC) Fundamentals
UserInfo endpoint
an API that returns extended profile information when presented with a valid access token.
Lesson 929ID Tokens and the UserInfo Endpoint
Users don't notice
The difference between 99.
Lesson 1310Embracing Risk: The 100% Availability Trap
Users expect responsiveness
a loading spinner that never completes is worse than slightly stale data
Lesson 532Why Eventual Consistency Exists
Users request
these assets via a CDN URL (e.
Lesson 192CDN for Static Asset Delivery
UUIDs
offer global uniqueness and easy distribution but create much longer identifiers.
Lesson 1516Counter-Based vs UUID ApproachesLesson 1520Primary Key Selection: Auto-Increment vs UUID

V

Vacuum operations
in PostgreSQL or compaction in NoSQL to reclaim disk space
Lesson 1532Expiration and Time-to-Live
Valid
The gateway may add user context (user ID, roles) to the request headers, then forwards the request to the appropriate service
Lesson 883Authentication at the Gateway
Validate assumptions
about how your system behaves during failures
Lesson 1343What is Chaos Engineering?
Lesson 1345Starting with Game Days
Validate format
Ensure the alias is alphanumeric, of an appropriate length, and free of special characters (or define the allowed patterns)
Lesson 1514Custom Short URL Support
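A minimal sketch of the format check described above. The 4 to 32 character range and the alphanumeric-only rule are assumptions chosen for illustration, and `is_valid_alias` is a hypothetical helper name.

```python
import re

# Assumed rule for illustration: 4 to 32 ASCII letters or digits.
ALIAS_PATTERN = re.compile(r"[A-Za-z0-9]{4,32}")


def is_valid_alias(alias: str) -> bool:
    """Return True when the whole alias matches the allowed pattern."""
    return ALIAS_PATTERN.fullmatch(alias) is not None
```

Using `fullmatch` rather than `match` keeps trailing characters such as `!` or a space from slipping past the check.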
Validate input
Enforce length limits, character restrictions (alphanumeric only?
Lesson 1531Custom Aliases and Vanity URLs
Validate parameter combinations
Reject requests where sort fields aren't in allowed filter contexts, preventing expensive full-table scans.
Lesson 1896Combining Pagination, Filtering, and Sorting
Validate redirect URIs strictly
Don't allow wildcards; exact-match registered URIs only
Lesson 931OAuth2 Security Best Practices
Validate sort fields
Only allow sorting on indexed columns; reject arbitrary field names
Lesson 1897Performance Considerations and Limits
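One way to sketch the allowlist idea, assuming a `-field` prefix means descending order. The column names and the `parse_sort` helper are hypothetical.

```python
# Hypothetical set of indexed columns that are safe to sort on.
ALLOWED_SORT_FIELDS = {"created_at", "price", "name"}


def parse_sort(param: str) -> tuple[str, str]:
    """Turn 'price' or '-price' into (field, direction), rejecting unknown fields."""
    if param.startswith("-"):
        field, direction = param[1:], "DESC"
    else:
        field, direction = param, "ASC"
    if field not in ALLOWED_SORT_FIELDS:
        raise ValueError(f"unsupported sort field: {field!r}")
    return field, direction
```

Because the field is checked against a fixed set before it reaches any query builder, an arbitrary client-supplied name is rejected rather than sorted on.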
validity
(the agreed value was actually proposed by someone), and **termination** (the decision completes eventually)—all while nodes crash and networks partition.
Lesson 599What Is Distributed Consensus?
Lesson 608The Problem Paxos Solves
Value frequency
describes how often each key value appears.
Lesson 1491Data Skew and Cardinality Issues
Value ranges
Does a number fall within acceptable limits?
Lesson 886Request Validation
Values
can be anything: strings, JSON objects, binary data, or serialized structures
Lesson 338What is a Key-Value Store?
Vanity metrics
lack decision context:
Lesson 1215Avoiding Vanity Metrics
Variable latency patterns
When percentile metrics (p99, p95) spike but you can't pinpoint the cause, traces reveal the specific code paths or service interactions responsible.
Lesson 1260Cost-Benefit Analysis
Variable request duration
(some queries take 10ms, others 10 seconds)
Lesson 87Least Connections Algorithm
Variable-length paths
let you specify a range of relationship traversals in a single pattern using special syntax like `[*1.
Lesson 465Variable-Length Paths
Velocity anomalies
Single entity involved in unusually many transactions in short time
Lesson 474Fraud Detection Through Pattern Matching
Vendor Flexibility
Swap providers with configuration changes, not code rewrites
Lesson 1690Channel Provider Abstraction
Vendor portability
Switch backends without reinstrumentation
Lesson 1205OpenTelemetry Metrics SDK
Vendor support
Professional support contracts and guarantees
Lesson 108Hardware vs Software Load Balancers
Verification time
Confirming the system is healthy again
Lesson 1324Mean Time To Repair (MTTR)
Verify data completeness
by checking how much data was lost
Lesson 1419Measuring and Testing RPO/RTO Compliance
Verify each batch
before proceeding to the next
Lesson 265Schema Changes in Sharded Environments
Version compatibility matrices
documenting which versions work together
Lesson 810Deployment Complexity
Version control strategies
(MVCC implementations differ)
Lesson 582Transaction Isolation Across Systems
Version mismatches
between backup creation and restore environments
Lesson 1430Backup Verification and Testing
version vector
(closely related to a vector clock) is like a scoreboard where each replica tracks how many writes it has seen from every replica in the system.
Lesson 562Version Vectors and Conflict Detection
Lesson 1382Version Vectors and Causality
Version vectors
Track what version the client has seen; reject reads from lagging replicas
Lesson 535Monotonic Reads
Lesson 559Strong Consistency with Quorums
Lesson 1382Version Vectors and Causality
Version Vectors/Timestamps
Your application tracks which version of data it's working with.
Lesson 219Application-Level Consistency Patterns
Version-based
"Reads lag by at most K updates behind the primary"
Lesson 549Bounded Staleness
Version-bound
Reads lag by at most 100 write operations
Lesson 1397Bounded Staleness Consistency
Versioning
is the practice of explicitly marking API changes, and **backward compatibility** means new versions still support old clients.
Lesson 809Versioning and Backward Compatibility
Lesson 1015Conditional Writes for Idempotency
Vertical Partitioning
means splitting your table by **columns**.
Lesson 231Vertical Partitioning vs Horizontal Partitioning
vertical scaling
(upgrading a single machine) and **horizontal scaling** (adding more machines), you're also choosing between two very different cost models.
Lesson 45Comparing Cost Structures
Lesson 229What is Sharding?
VictorOps
centralize alerting, escalation, and communication—but the real power comes from *automation*.
Lesson 1305On-Call Tooling and Automation
Video streaming
services managing active playback sessions
Lesson 56What Makes a Service Stateful
Video transcoding
produces multiple bitrates (covered previously)
Lesson 1602Adaptive Bitrate Streaming (ABR)
View and Engagement Metrics
track video plays, view duration, completion rates, and pause/skip patterns.
Lesson 1628Usage Analytics and Metrics
Virtual Network integration
for private connectivity to backend services
Lesson 899Azure API Management Features
Virtual nodes (vnodes)
split each physical server into multiple "virtual" positions on the hash ring.
Lesson 363Virtual Nodes and Load Distribution
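A rough sketch of a hash ring with virtual nodes, assuming MD5 as the position hash and 100 vnodes per server. The `HashRing` class and the `server#i` naming scheme are illustrative choices, not taken from the lesson.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring (MD5 is an assumed choice)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, servers, vnodes=100):
        # Each physical server is placed on the ring `vnodes` times.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]

    def lookup(self, key: str) -> str:
        """Walk clockwise from the key's position to the next virtual node."""
        idx = bisect.bisect_right(self._positions, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

With many small positions per server, each server owns many short arcs of the ring instead of one long arc, which tends to even out the key distribution.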
Visibility
See the entire pipeline status at a glance
Lesson 766Apache Airflow Fundamentals
Visual snapshots
Links to pre-filtered dashboard views or embedded graphs
Lesson 1293Alert Context and Enrichment
Voice calls
Very expensive (~$0.
Lesson 1694Channel Costs and Economics
Volatility risk
Without persistence, a server restart means data loss
Lesson 349Redis In-Memory Storage Model
Volume
Each service generates its own logs, overwhelming traditional search tools
Lesson 807Debugging and Troubleshooting
Lesson 1257Storage and Retention Costs
Volume of writes
Heavy write traffic creates a backlog.
Lesson 208Replication Lag: What It Is and Why It Happens
Vote YES or NO
and send that vote back to the coordinator
Lesson 570Phase 1: Prepare Phase

W

W (write quorum)
How many replicas must acknowledge a write
Lesson 1361Quorum-Based Replication
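As a small illustration of how W interacts with the read quorum R and the replica count N, the usual overlap rule for strict quorums is R + W > N. The helper name `quorums_overlap` is hypothetical.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True when every read quorum must intersect every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    # If r + w > n, any r replicas and any w replicas share at least one node.
    return r + w > n
```

For example, `quorums_overlap(3, 2, 2)` holds, while `quorums_overlap(3, 1, 1)` does not, so a read in the second configuration may miss the latest acknowledged write.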
W = 3
(3 nodes must confirm writes)
Lesson 555What is a Quorum?
Wait for acknowledgments
from a quorum or all replicas
Lesson 526The Cost of Strong Consistency
Wait for expiration
(let TTL handle it)
Lesson 155Cache Invalidation Problem
Waits for confirmation
from that service
Lesson 591Orchestration-Based Sagas
Wall-clock time
(system time) can jump backward due to NTP corrections, leap seconds, or admin changes.
Lesson 1114Clock Skew and Time Synchronization
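A brief sketch of the practical consequence: measure durations with a monotonic clock rather than wall-clock time. `run_with_timing` is a hypothetical helper. `time.monotonic()` is the standard-library clock that cannot go backward.

```python
import time


def run_with_timing(fn):
    """Call fn() and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.monotonic()
    result = fn()
    elapsed = time.monotonic() - start  # unaffected by system clock adjustments
    return result, elapsed
```

Wall-clock time (`time.time()`) remains the right choice for timestamps shown to people. The monotonic clock is only meaningful for differences within one process.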
Warehouses
Compression is balanced with indexing and columnar organization for fast scans.
Lesson 763Cost and Storage Efficiency
warm standby
might be provisioned but not actively running, requiring startup and sync time (minutes to hours RTO).
Lesson 1417Hot Standby vs Cold Standby
Lesson 1443DR Cost Optimization
Warm storage (7-30 days)
Retain only high-value traces: errors, slow requests (P99 latency), or specific user-flagged transactions.
Lesson 1246Trace Data Retention Policies
Warm tier
Older logs (days to weeks) moved to slower, cheaper storage.
Lesson 1156Indexing Strategies and Retention
Lesson 1620Storage Tiering for Cost Optimization
Warm tier (8-90 days)
Compressed storage with slower query performance.
Lesson 1165Log Retention Policies
Warm up caches
preemptively with popular data
Lesson 129Cache Hit Ratio Optimization
WARN
Potentially problematic situations that don't stop execution
Lesson 1141Log Levels in Structured Logs
Warning (P2/P3)
These signal degraded performance or impending problems that need attention within hours, not minutes.
Lesson 1291Alert Severity Levels
Wasted capacity
Standby resources provide no value during normal operation
Lesson 1436Active-Passive vs Active-Active DR
Wasted development time
on unused features
Lesson 36YAGNI: You Aren't Gonna Need It
Wasted effort
computing feeds for inactive users who may never read them
Lesson 1638Push (Write-Time) Feed Model
Wasted resources
Redundant work computing the same result hundreds of times
Lesson 159Cache Stampede Problem
Watch for changes
Get notified when data updates
Lesson 633ZooKeeper: Coordination Service Built on Consensus
Weather API
Cache for 30 minutes per location
Lesson 194CDN for API Acceleration
Web BFF
Aggregates data for rich dashboards and complex UI requirements
Lesson 902Backend-for-Frontend (BFF) Pattern Overview
Web dashboard
requests `/user/profile` → Gateway aggregates user data + recent activity from two services into one enriched response
Lesson 875Client-Specific API Composition
Web Push
Uses VAPID keys and service workers.
Lesson 1684Push Notifications: Mobile and Web
Web servers
are the classic example.
Lesson 51When to Choose Horizontal Scaling
Web tier
5-10 second intervals, 2-3 second timeouts
Lesson 100Health Check Intervals and Timeouts
WebP
or **AVIF** instead of JPEG (30–50% smaller)
Lesson 1621Compression and Format Optimization
WebSocket
Bidirectional, real-time streaming
Lesson 874Protocol Translation
WebSocket → HTTP
Long-lived WebSocket connections from clients translate to individual HTTP requests per message
Lesson 881Protocol Translation
WebSocket Gateway
A specialized service layer that maintains open connections with active users.
Lesson 1672WebSocket Architecture for Live Updates
WebSocket/long-polling
Push updates instantly (resource-intensive)
Lesson 1671Real-Time Requirements for Social Feeds
Weekly backups
(fathers): Keep 4–8 weeks
Lesson 1406Backup Retention Policies
Weighted Least Connections
shine.
Lesson 96Algorithm Selection Tradeoffs
Weighted Round Robin
solves this by assigning each server a weight that reflects its capacity.
Lesson 86Weighted Round Robin
Lesson 88Weighted Least Connections
Lesson 96Algorithm Selection Tradeoffs
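A deliberately naive sketch of weighted round robin: each server is repeated in the rotation according to its weight. The server names and weights are made up, and real load balancers typically interleave picks more smoothly than this.

```python
import itertools


def weighted_round_robin(weights: dict[str, int]):
    """Yield server names forever, each appearing `weight` times per cycle."""
    expanded = [server for server, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)
```

With `{"big": 3, "small": 1}`, every cycle of four requests sends three to `big` and one to `small`.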
Well-understood patterns
Decades of proven operational experience
Lesson 71Single-Leader Replication Model
What is wrong
(The symptom, not just a metric name)
Lesson 1287Actionability: Every Alert Needs a Runbook
What Went Well
Effective responses and mitigations during the incident
Lesson 1350What is a Postmortem?
What's the acceptable compromise
Can users tolerate 100ms extra latency if it means 99.
Lesson 18Prioritizing Requirements Under Constraints
What's the business priority
A startup might prioritize fast launch over perfect reliability—you can improve later.
Lesson 18Prioritizing Requirements Under Constraints
When assigning a key
, walk clockwise around the hash ring as usual
Lesson 1468Bounded Loads Extension
When combined with authentication
The short URL just redirects; actual access requires login
Lesson 1515Short URL Predictability Tradeoffs
When to escalate
If the incident isn't resolved within the SLO response time for that severity, or if the on-call engineer needs expertise they don't have.
Lesson 1298Incident Severity Levels and Escalation
Where to cache
At the Policy Enforcement Point (PEP) in each service, at the API Gateway, or in a shared cache like Redis.
Lesson 951Caching Authorization Decisions
Whisper
The fixed-size database format that stores time-series data on disk
Lesson 1202Graphite Time-Series Database
Whitebox Monitoring
instruments your system internally, exposing detailed metrics, logs, and traces.
Lesson 1266Blackbox vs Whitebox Monitoring
Who to page
Start with the primary on-call for the affected service.
Lesson 1298Incident Severity Levels and Escalation
Why does it matter
(Impact on users or business)
Lesson 1287Actionability: Every Alert Needs a Runbook
Why eventual consistency exists
is the direct answer to this constraint.
Lesson 532Why Eventual Consistency Exists
Why graphs shine here
Fraudsters often create rings of fake accounts that share subtle connections—same phone number, overlapping IP addresses, or circular fund transfers.
Lesson 458Use Cases: Fraud Detection and Knowledge Graphs
Why it matters
Crash-stop failures can be tolerated with simpler algorithms (like Raft or Paxos).
Lesson 602Crash-Stop vs Byzantine Failures
Why it works
Indexes are sorted and compressed, making scanning and grouping much faster.
Lesson 284Aggregation Query Optimization
Why this helps
Two replicas can compare root hashes first.
Lesson 369Anti-Entropy and Merkle Trees
Why this matters
All URLs from `example.
Lesson 1865Distributed URL Frontier Architecture
Wide rows
occur when a single partition key accumulates too many clustering columns—imagine millions of columns in one row.
Lesson 432Data Modeling Best Practices
Wide-column stores
(like Cassandra, HBase) organize data by row keys with sparse, flexible columns.
Lesson 419Wide-Column vs Document Stores
WiredTiger
as its default storage engine since version 3.
Lesson 400Storage Engines: WiredTiger
With correlation IDs
, you can filter all logs by that ID and see the complete story: the authentication attempt, the slow database query that caused it, and how it cascaded to the order service.
Lesson 1132Correlation IDs and Request Tracing
With Origin Shield
All 500 stores call one regional warehouse (shield), which calls the factory once → 1 request hits the factory, warehouse serves the 500 stores
Lesson 1614Origin Shield Pattern
Without Origin Shield
500 retail stores (edge nodes) each call the factory (origin) directly for the same product → 500 requests hit the factory
Lesson 1614Origin Shield Pattern
Without shielding
, every edge location missing content would hit your origin directly.
Lesson 1611Multi-Tier Caching Architecture
Work Distribution
In distributed crawlers, the frontier distributes URLs across multiple worker machines efficiently.
Lesson 1838URL Frontier: Definition and Purpose
Worker Health
includes CPU/memory utilization, active connections, and heartbeat signals.
Lesson 1871Monitoring Crawler Fleet Performance
Worker Parallelization
Deploy multiple identical worker instances that all pull from the same queue.
Lesson 1708Scalability and Horizontal Expansion
Worker Pools
Multiple identical worker processes run in parallel.
Lesson 659Queue Use Cases: Work Distribution
Worker utilization
Are your fanout workers idle or maxed out?
Lesson 1657Measuring Fanout Performance
Worker-Level DNS Cache
Each crawler worker maintains its own DNS cache (building on the caching strategies from lesson 1857).
Lesson 1869Scaling DNS Resolution
Workers
Multiple independent machines that receive URL assignments, perform HTTP requests, parse content, extract links, and report results back to the coordinator.
Lesson 1863Coordinator-Worker Pattern for Crawling
Working set size
Fewer documents fit in RAM, reducing cache effectiveness
Lesson 409Data Size and Storage Considerations
Works behind firewalls
or NAT without inbound connectivity
Lesson 1197Pull vs Push Metrics Collection Models
Works with retries
Combines well with the retry budgets and circuit breakers you've already learned about
Lesson 1031Hedged Requests and Speculative Execution
Write amplification
A single post triggers millions of database writes
Lesson 1640Celebrity Problem in Push Models
Lesson 1649The Celebrity Problem in Fanout
Write arrives
→ Client sends write to the **head**
Lesson 1373Chain Replication
Write back
the resolved version, which becomes the new authoritative state
Lesson 377Eventual Consistency and Application Reconciliation
Write bottleneck
– The single leader can become overwhelmed
Lesson 1365Single-Leader Replication Topology
Write concern
specifies how many replicas must acknowledge a write before MongoDB reports success:
Lesson 395Read and Write Concerns
Write distribution
(no single hot shard)
Lesson 397Shard Key Selection
Write efficiency
Only new data is processed, not the entire corpus
Lesson 1772Real-Time Index Updates
Write events, not state
When something happens (order placed, payment received), you write an immutable event describing what happened
Lesson 586Alternative: Event Sourcing for Consistency
Write latency
Slower than async (must wait for network + replica write)
Lesson 217Semi-Synchronous Replication Trade-offs
Lesson 296Write Amplification Costs
Write latency increases
because you must wait for both the cache write *and* the slower database write before responding.
Lesson 134Write-Through Caching Pattern
write path
handles user submissions, validates content, generates unique IDs, and persists data.
Lesson 1548Read vs Write Path Architecture
Lesson 1562Content Compression and Encoding
Write phase
Data is written to N *available* nodes (not necessarily the "right" ones)
Lesson 1372Sloppy Quorums and Hinted Handoff
Write queries
(`INSERT`, `UPDATE`, `DELETE`) → Primary database
Lesson 222Proxy-Based Read-Write Splitting
Write replication
New URL creation in one region propagates to others asynchronously
Lesson 1535Multi-Region Deployment
Write time
User A posts → store post in database with `user_id` and `timestamp`
Lesson 1647Fanout-on-Read (Pull Model)
Write timeout
controls how long your client will wait to *send* data to the server before giving up.
Lesson 1089Read Timeout and Write Timeout
Write to page cache
Kafka appends the message to an in-memory buffer (the OS page cache) immediately
Lesson 713Kafka's Write Path and Durability
write-ahead log
(called HLog) and **memstores** that flush to immutable files (called HFiles, similar to SSTables).
Lesson 433What is HBase?
Lesson 574Recovery Protocols and Logs
Write-ahead logging
ensuring durability
Lesson 308Strong Consistency by Default
Write-Ahead Logging (WAL)
Before modifying any data in memory or on disk, the database first writes the change to a sequential log file.
Lesson 313Durability: Surviving System Failures
Lesson 470Transaction Model and ACID in Neo4j
Write-Back
pattern (also called **Write-Behind**): writes are immediately accepted by the cache and acknowledged to the client *before* being persisted to the database.
Lesson 136Write-Behind (Write-Back) Caching Pattern
Lesson 1528Write-Through vs Write-Back for URL Creation
Write-Behind
(also called **Write-Back**) pattern, writes are immediately accepted by the cache and acknowledged to the client *before* being persisted to the database.
Lesson 136Write-Behind (Write-Back) Caching Pattern
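The ordering described above (acknowledge first, persist later) can be sketched with an in-memory queue. `WriteBehindCache` is a hypothetical class, the `db` dictionary stands in for a real database, and a real system would flush in the background and handle failures.

```python
from collections import deque


class WriteBehindCache:
    def __init__(self, db: dict):
        self._db = db            # stand-in for the slower durable store
        self._cache = {}
        self._pending = deque()  # writes acknowledged but not yet persisted

    def put(self, key, value):
        """Accept the write in the cache and return before touching the database."""
        self._cache[key] = value
        self._pending.append((key, value))

    def get(self, key):
        return self._cache[key]

    def flush(self):
        """Persist queued writes in order; would normally run asynchronously."""
        while self._pending:
            key, value = self._pending.popleft()
            self._db[key] = value
```

Anything still in `_pending` when the process dies is lost, which is the durability risk this pattern trades for lower write latency.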
Write-heavy optimization
`N=5, R=3, W=2`
Lesson 558N, R, W Configuration Trade-offs
Write-Through Caching
(which you learned in the previous lesson), every write operation updates *both* the cache and the underlying database before returning success to the client.
Lesson 135Write-Through: Latency and Consistency Tradeoffs
Writer publishes event
to message queue (e.
Lesson 158Event-Based Invalidation
Writer updates data
in the database
Lesson 158Event-Based Invalidation
Writes are blocked
to prevent conflicting updates
Lesson 511Banking Systems: Consistency Over Availability
Writes per day
10 million new posts daily
Lesson 29Database Size Growth Projection
Writes-follow-reads
consistency (also called "session causality") ensures that if a client reads some data and then performs a write, that write is guaranteed to happen *after* the values the client observed during the read.
Lesson 537Writes-Follow-Reads Consistency
Lesson 545Writes-Follow-Reads Consistency
Writes-follow-reads consistency
(also called *session causality*) guarantees that if a client reads some data and then performs a write, that write will be applied to a system state that includes the data the client just read—or a later state.
Lesson 1393Writes-Follow-Reads Consistency

X

X happened before Y
if every counter in X's vector is ≤ the corresponding counter in Y's vector (and at least one is strictly less)
Lesson 1382Version Vectors and Causality
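The comparison rule can be sketched as follows, assuming each version vector is a dictionary from replica ID to counter and that a missing replica counts as 0. The `compare` function and its return labels are illustrative.

```python
def compare(x: dict[str, int], y: dict[str, int]) -> str:
    """Classify two version vectors as 'equal', 'before', 'after', or 'concurrent'."""
    replicas = x.keys() | y.keys()
    x_le_y = all(x.get(r, 0) <= y.get(r, 0) for r in replicas)
    y_le_x = all(y.get(r, 0) <= x.get(r, 0) for r in replicas)
    if x_le_y and y_le_x:
        return "equal"
    if x_le_y:
        return "before"      # X happened before Y
    if y_le_x:
        return "after"       # Y happened before X
    return "concurrent"      # neither dominates: a conflict to resolve
```

For instance, `compare({"A": 2}, {"A": 1, "B": 1})` is `"concurrent"` because each vector is ahead of the other on one replica.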
XA transaction managers
in Java EE) implement 2PC across these services.
Lesson 576When 2PC is Used in Practice

Y

Yearly backups
Keep 3–7 years for compliance
Lesson 1406Backup Retention Policies
You avoid false positives
High CPU might be fine if users are happy
Lesson 1313Monitoring and Observability for SRE
You lose parallelism
With one partition, only one consumer can process messages at a time while maintaining order.
Lesson 685Message Ordering Guarantees
You upload
your static assets (images, CSS, JS) to your origin server or directly to the CDN
Lesson 192CDN for Static Asset Delivery
You value operational simplicity
Fewer moving parts mean less infrastructure to maintain, monitor, and debug.
Lesson 755When to Choose Lambda vs Kappa
Your comment isn't visible
You panic and click "Post" again, creating duplicates
Lesson 209Read-After-Write Consistency Problem
Your feed
A blend of pre-computed posts (from regular users you follow) plus real-time queries (for celebrities you follow).
Lesson 1648Hybrid Fanout Strategy
Your IP address
to determine your approximate location
Lesson 176Geographic Routing and Anycast
Your queries are simple
If real-time and batch processing use similar logic (counting events, simple aggregations), Kappa's replay capability handles both without complexity.
Lesson 755When to Choose Lambda vs Kappa
Your service
(the "client") has been pre-registered with the authorization server and given two pieces of secret information: a `client_id` and a `client_secret`
Lesson 925Client Credentials Flow
Your service's processing time
(parsing responses, business logic)
Lesson 1098Per-Hop Timeout Budgets

Z

ZAB (ZooKeeper Atomic Broadcast)
, which works similarly to Raft.
Lesson 633ZooKeeper: Coordination Service Built on Consensus
Zero application changes
Legacy apps can gain read-write splitting without code modifications—just point them to the proxy.
Lesson 222Proxy-Based Read-Write Splitting
Zero coordination
Any server can generate UUIDs independently
Lesson 1520Primary Key Selection: Auto-Increment vs UUID
Zero infrastructure
No servers to provision or manage
Lesson 895AWS API Gateway and Serverless Integration
Zero-Trust Security Requirements
Lesson 868When Service Mesh Adds Value
Zipkin
offer full control, zero licensing costs, and community support.
Lesson 1251Choosing a Tracing System
Zipkin wins on simplicity
single binary deployment works for many use cases.
Lesson 1242Zipkin Architecture and Design
Zone failures
Simulate an entire availability zone going dark
Lesson 1342Testing Redundancy with Fault Injection