System Design Glossary
Key terms from the System Design course, linked to the lesson that introduces each one.
5,528 terms.
#
- 1. Access Patterns
- Lesson 75 — Choosing Partition and Replication Strategies; Lesson 130 — Choosing the Right Caching Layer
- 6 characters
- 62^6 ≈ **56.8 billion** unique URLs
- Lesson 1500 — URL Length and Encoding Constraints; Lesson 1501 — Collision Probability and Namespace Size; Lesson 1507 — Computing Short URL Length
- 7 characters
- 62^7 ≈ **3.5 trillion** unique URLs
- Lesson 1500 — URL Length and Encoding Constraints; Lesson 1501 — Collision Probability and Namespace Size
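The namespace sizes in the two entries above follow from raising the alphabet size to the key length. A minimal sketch, assuming a 62-character alphabet (a-z, A-Z, 0-9); the function names are illustrative and not taken from the course:

```python
BASE62_ALPHABET_SIZE = 62  # assumed alphabet: a-z, A-Z, 0-9


def namespace_size(length: int, alphabet_size: int = BASE62_ALPHABET_SIZE) -> int:
    """Number of distinct keys of exactly `length` characters."""
    return alphabet_size ** length


def min_key_length(required_urls: int, alphabet_size: int = BASE62_ALPHABET_SIZE) -> int:
    """Smallest key length whose namespace holds at least `required_urls` keys."""
    length = 1
    while namespace_size(length, alphabet_size) < required_urls:
        length += 1
    return length


print(namespace_size(6))  # 56800235584 (about 56.8 billion)
print(namespace_size(7))  # 3521614606208 (about 3.5 trillion)
```

For example, a service that expects 100 billion URLs would need 7 characters, since 6 characters top out near 56.8 billion.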
- 99.9% (three nines)
- ~8.76 hours of downtime per year
- Lesson 1318 — Defining Availability and Uptime; Lesson 1319 — The Nine Nines: Understanding Availability SLAs
- 99.99% (four nines)
- ~52.6 minutes of downtime per year
- Lesson 1318 — Defining Availability and Uptime; Lesson 1319 — The Nine Nines: Understanding Availability SLAs
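The downtime budget behind an availability target can be derived directly from the percentage. A minimal sketch, assuming a 365-day year and that all downtime counts against the target; the helper name is illustrative:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Yearly downtime budget, in minutes, for a given availability target."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for target in (99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% -> {minutes:.1f} min/year ({minutes / 60:.2f} hours)")
```

Under these assumptions, three nines allows roughly 8.76 hours per year and four nines roughly 52.6 minutes per year.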
A
- ABAC policy engine
- is a dedicated component that evaluates authorization policies written in a specialized language.
- Lesson 936 — ABAC Policy Engines
- Aborted
- The transaction was rolled back (could happen from any state)
- Lesson 572 — Participant State Transitions
- Absolute deadline (recommended)
- Lesson 1112 — HTTP Header-Based Propagation
- Absolutely forbidden
- Lesson 1131 — Logging Sensitive Data: Security Concerns
- Abuse detection
- Identify tenants hitting limits frequently or attempting to bypass restrictions
- Lesson 1825 — Monitoring and Analytics Per Tenant
- Accelerates learning
- Without defensive behavior, teams uncover root causes faster and identify systemic weaknesses that might affect other areas.
- Lesson 1351 — Blameless Postmortem Culture
- Accent-Insensitive Indexing
- Build tries that strip diacritics for matching but preserve them for display.
- Lesson 1768 — Typeahead for Multi-Language Support
- Accept Bounded Overage
- Design for a 10-20% overage buffer; adjust advertised limits accordingly (limit=90, enforce at 100).
- Lesson 981 — Race Conditions in Distributed Counters
- Accept Header Pattern
- Lesson 1901 — Header-Based Versioning
- Accept phase
- In this phase, the proposer sends its value (or a value it adopted in Phase 1) to acceptors.
- Lesson 612 — The Two-Phase Protocol
- Accept requests
- and allow temporary inconsistency (AP system)
- Lesson 532 — Why Eventual Consistency Exists
- Accept temporary inconsistency
- Data might be out of sync for seconds, minutes, or longer
- Lesson 583 — Alternative: Best Effort with Eventual Consistency
- Accept webhook requests
- from providers at dedicated endpoints (e.
- Lesson 1693 — Delivery Receipt Tracking
- Acceptable Degradation
- Define what "graceful" means.
- Lesson 1073 — Bulkhead Sizing: Balancing Isolation and Utilization
- Acceptable Eventual Consistency
- Lesson 290 — When to Denormalize; Lesson 598 — Saga Frameworks and Real-World Adoption
- Acceptors
- form the voting body that decides which proposal to accept.
- Lesson 610 — The Three Roles in Paxos
- Access control
- You can grant permissions at the database or collection level
- Lesson 383 — Collections and Databases
- Access Control List
- is a table attached to a resource that specifies exactly which users or groups have what permissions on that specific resource.
- Lesson 937 — Access Control Lists (ACLs)
- Access logs
- capture the who, what, and when of each request: timestamps, endpoints hit, HTTP methods, status codes, response times, and client identifiers.
- Lesson 890 — Logging and Metrics Collection; Lesson 1135 — Log Retention and Volume Management
- Access pattern
- Most users view the latest version.
- Lesson 1577 — Paste Editing and Version History; Lesson 1663 — Hot and Cold Timeline Data
- Access token
- = your room keycard (expires daily)
- Lesson 915 — Token Expiration and Refresh Tokens; Lesson 925 — Client Credentials Flow; Lesson 926 — Access Tokens vs Refresh Tokens; Lesson 930 — OAuth2 Scopes and Consent
- access tokens
- that expire quickly (minutes to hours) for API requests, paired with **refresh tokens** that last much longer (days to weeks) but are only used to obtain new access tokens.
- Lesson 915 — Token Expiration and Refresh Tokens; Lesson 1615 — Signed URLs and Token-Based Access
- Access-Based Extension
- Implement a "last accessed" timestamp.
- Lesson 1573 — Handling Never-Expiring Pastes
- Account balance checks
- Quorum reads (R=QUORUM) balance consistency and availability
- Lesson 563 — Tunable Consistency in Practice
- Account balance updates
- (CP): Require strong consistency across all regions before confirming transaction
- Lesson 510 — Real Systems: Multi-Region Trade-offs
- accuracy
- , and **implementation complexity**.
- Lesson 970 — Fixed vs Sliding Window Tradeoffs; Lesson 978 — Why Distributed Rate Limiting Is Hard; Lesson 985 — Trade-offs: Accuracy vs Latency; Lesson 1789 — Client-Side vs Server-Side Rate Limiting; Lesson 1793 — Centralized vs Distributed Rate Limiting
- Accuracy is non-negotiable
- If your business requires absolutely correct results (financial reporting, compliance audits, regulatory data), Lambda's batch layer provides a "source of truth" that corrects any errors from the speed layer.
- Lesson 755 — When to Choose Lambda vs Kappa
- Accurate expiration tracking
- TTL policies work from the moment of creation
- Lesson 1559 — Write Path: Synchronous vs Asynchronous Storage
- Accurate percentiles
- for that specific service instance (no bucketing error)
- Lesson 1186 — Summary Metrics
- ACID
- stands for **Atomicity, Consistency, Isolation, and Durability** — four properties that ensure your database transactions are safe, predictable, and reliable even when things go wrong.
- Lesson 303 — ACID Properties Overview
- ACID guarantees
- Strong consistency when preferences affect billing or compliance.
- Lesson 1721 — Preference Storage Strategy
- ACID properties
- (Atomicity, Consistency, Isolation, Durability) across these boundaries becomes extraordinarily complex.
- Lesson 1489 — Cross-Partition Transactions
- ACID transactions
- to guarantee that either all changes happen together or none do.
- Lesson 261 — Distributed Transactions Across Shards; Lesson 322 — Transaction Requirements and Trade-offs; Lesson 331 — What NewSQL Is; Lesson 332 — The NewSQL Value Proposition; Lesson 470 — Transaction Model and ACID in Neo4j
- Acknowledgment flows
- track these stages through a series of confirmations from various components in the delivery chain.
- Lesson 1718 — Acknowledgment and Confirmation Flows
- ACLs
- excel when you need per-resource control—like giving specific users access to specific documents without creating roles for every combination.
- Lesson 937 — Access Control Lists (ACLs)
- Acquisition
- Thread requests connection → Pool checks for idle connection → Validates if configured → Marks as active → Returns to application
- Lesson 270 — Connection Lifecycle in a Pool
- Across regions
- Lesson 1375 — Hybrid Topologies
- Action
- What operation they attempted (read, write, delete)
- Lesson 944 — Auditing and Compliance for Authorization
- Action attributes
- What operation is being attempted
- Lesson 935 — Attribute-Based Access Control (ABAC) Introduction
- Action Items
- Concrete follow-ups with owners and deadlines—"Add canary deployment step" or "Increase monitoring coverage for X"
- Lesson 1304 — Blameless Postmortems; Lesson 1350 — What is a Postmortem?; Lesson 1352 — Postmortem Structure and Action Items
- Actionable
- Should trigger automatic remediation (failover, alerts)
- Lesson 1339 — Health Checks and Failure Detection
- Actionable alerts only
- Every alert should have a clear action.
- Lesson 1171 — Log Review and Alert Fatigue
- Activation
- Promote read replicas, restore from backups if needed
- Lesson 1437 — Failover and Failback Procedures
- Active
- While active, the span can collect attributes, events, and status updates
- Lesson 1231 — Span Lifecycle and Structure
- Active connection count
- Number of in-flight HTTP requests to prevent overwhelming the host with concurrent connections
- Lesson 1848 — Politeness Table and Per-Host State
- Active connections
- How many requests each server is currently handling
- Lesson 92 — Least Response Time Algorithm; Lesson 1175 — Gauge Metrics; Lesson 1184 — Gauge Metrics
- Active health checks
- work like a heartbeat monitor—the load balancer regularly sends test requests to each server (like pinging "Are you alive?
- Lesson 99 — Active vs Passive Health Checks; Lesson 180 — DNS-Based Request Routing
- Active invalidation on write
- means your system explicitly deletes or updates cache entries the instant you modify the source data.
- Lesson 157 — Active Invalidation on Write
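Active invalidation on write can be illustrated with a small in-memory model. This is a sketch under stated assumptions: the `UserStore` class and its two dictionaries are hypothetical stand-ins for a real database and cache, and the example deletes the cache entry on write rather than updating it.

```python
class UserStore:
    """Toy store: `db` stands in for the source of truth, `cache` for a cache tier."""

    def __init__(self):
        self.db = {}
        self.cache = {}

    def read(self, key):
        # Serve from cache when possible; otherwise load from the source and cache it.
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key, value):
        # Modify the source data first, then invalidate the cache entry right away
        # so the next read cannot return the old value.
        self.db[key] = value
        self.cache.pop(key, None)


store = UserStore()
store.write("user:1", "Ada")
store.read("user:1")            # populates the cache
store.write("user:1", "Grace")  # invalidates the cached entry
print(store.read("user:1"))     # Grace
```

Deleting the entry keeps the write path simple; updating the entry in place instead would save one cache miss at the cost of extra write work.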
- Active State
- Lesson 270 — Connection Lifecycle in a Pool
- Active users
- Engagement and retention indicators
- Lesson 1196 — Business vs Technical Metrics; Lesson 1681 — Mobile Push Notification Integration
- Active vs Idle Connections
- Lesson 273 — Connection Pool Monitoring
- Active-Active
- Lesson 81 — Single Point of Failure: Load Balancer HA; Lesson 726 — Multi-Datacenter Replication; Lesson 1327 — Active-Active vs Active-Passive Availability; Lesson 1332 — Active-Active vs Active-Passive Redundancy; Lesson 1334 — Geographic Redundancy and Multi-Region; Lesson 1335 — Failover Mechanisms; Lesson 1435 — Multi-Region Architecture for DR
- Active-Active (Dual-Operation)
- Both regions actively serve traffic simultaneously.
- Lesson 1436 — Active-Passive vs Active-Active DR
- Active-Passive
- One primary cluster, replicas for failover only
- Lesson 726 — Multi-Datacenter Replication; Lesson 1327 — Active-Active vs Active-Passive Availability; Lesson 1332 — Active-Active vs Active-Passive Redundancy; Lesson 1335 — Failover Mechanisms; Lesson 1435 — Multi-Region Architecture for DR
- Active-Passive (Primary-Secondary)
- Lesson 81 — Single Point of Failure: Load Balancer HA
- Active-Passive (Standby Failover)
- Your primary region handles all traffic.
- Lesson 1436 — Active-Passive vs Active-Active DR
- Acyclic
- No circular dependencies (Task A can't depend on Task B if B depends on A)
- Lesson 766 — Apache Airflow Fundamentals
- Adaptation
- If bandwidth drops, the player seamlessly switches to a lower quality mid-stream
- Lesson 1602 — Adaptive Bitrate Streaming (ABR)
- Adaptive bitrate streaming
- encodes the same video at multiple quality levels (240p, 480p, 720p, 1080p, etc.
- Lesson 193 — CDN for Video Streaming; Lesson 1630 — Live Streaming Architecture
- Adaptive Format Serving
- Lesson 1621 — Compression and Format Optimization
- Adaptive freshness
- If a page hasn't changed in 10 consecutive crawls, exponentially back off the recrawl frequency.
- Lesson 1873 — Handling Recrawls and Freshness
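The adaptive freshness rule can be sketched as a small scheduling function. The doubling factor, the 24-hour base interval, and the 30-day cap are assumptions chosen for illustration; only the "10 consecutive unchanged crawls" trigger comes from the entry above.

```python
def next_recrawl_interval(base_interval_hours: int,
                          consecutive_unchanged: int,
                          unchanged_threshold: int = 10,
                          max_interval_hours: int = 24 * 30) -> int:
    """Return the hours to wait before the next crawl of a page.

    Below the threshold the base interval is used. From the threshold onward,
    the interval doubles for each additional unchanged crawl, up to a cap.
    """
    if consecutive_unchanged < unchanged_threshold:
        return base_interval_hours
    doublings = consecutive_unchanged - unchanged_threshold + 1
    return min(base_interval_hours * 2 ** doublings, max_interval_hours)


print(next_recrawl_interval(24, 9))   # 24
print(next_recrawl_interval(24, 10))  # 48
print(next_recrawl_interval(24, 12))  # 192
```

A crawler using this would reset `consecutive_unchanged` to zero as soon as a page's content changes, which returns it to the base interval.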
- Adaptive Rate Limiting
- Lesson 975 — Algorithm Selection Criteria; Lesson 993 — Adaptive Rate Limiting; Lesson 995 — Graceful Degradation Through Throttling; Lesson 1654 — Fanout Rate Limiting
- Adaptive sampling
- Increase sample rate when detecting anomalies or errors (sample failures more aggressively than successes)
- Lesson 1217 — Sampling for Expensive Metrics
- Adaptive sync
- Increase frequency when approaching limits
- Lesson 1802 — Synchronization Strategies for Local Caches
- Adaptive timeouts
- continuously measure actual request latencies and adjust timeout values based on **percentile calculations**—typically P95 or P99.
- Lesson 1117 — Adaptive Timeouts Based on Historical Latency
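One way to picture percentile-based adaptive timeouts is a sliding window of recent latencies whose P99 is scaled by a safety margin. This is a sketch, not a prescribed implementation: the class name, the 1.5x multiplier, the window size, and the fallback timeout are all assumptions.

```python
import math
from collections import deque


class AdaptiveTimeout:
    """Derives a timeout from a percentile of recently observed latencies."""

    def __init__(self, percentile=99, multiplier=1.5, window=1000,
                 default_timeout=1.0, min_samples=20):
        self.percentile = percentile
        self.multiplier = multiplier
        self.default_timeout = default_timeout
        self.min_samples = min_samples
        self.samples = deque(maxlen=window)  # oldest samples fall out automatically

    def record(self, latency_seconds):
        self.samples.append(latency_seconds)

    def current_timeout(self):
        # With too little history, fall back to a fixed default.
        if len(self.samples) < self.min_samples:
            return self.default_timeout
        ordered = sorted(self.samples)
        # Nearest-rank percentile: the smallest value covering `percentile`% of samples.
        rank = math.ceil(self.percentile * len(ordered) / 100)
        return ordered[max(rank, 1) - 1] * self.multiplier


timeouts = AdaptiveTimeout()
for i in range(1, 101):
    timeouts.record(i / 100)       # latencies from 0.01s to 1.00s
print(timeouts.current_timeout())  # about 1.485 (P99 of 0.99s times 1.5)
```

Sorting on every call is fine for a sketch; a production system would more likely use a streaming quantile estimate.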
- Add logging and monitoring
- to track latency, throughput, and error rates
- Lesson 40 — Measure Before Optimizing
- Add smart filtering
- Use correlation IDs to group related failures, avoiding duplicate alerts for the same incident.
- Lesson 1171 — Log Review and Alert Fatigue
- Adding new endpoints
- Existing routes continue unchanged
- Lesson 1905 — Breaking vs Non-Breaking Changes
- Adding optional fields
- to requests: Old clients omit them; new clients can include them
- Lesson 1905 — Breaking vs Non-Breaking Changes
- Adding RAM
- improves caching (more data in memory = faster queries)
- Lesson 54 — Scaling Databases: Special Considerations
- Additive changes only
- Add new optional fields; never remove or rename existing ones
- Lesson 809 — Versioning and Backward Compatibility; Lesson 1919 — API Design for Polyglot Clients and Backwards Compatibility
- Adds metadata
- Automatically includes it in the `grpc-timeout` header
- Lesson 1104 — gRPC Timeout Propagation
- Adjusts frequency caps
- dynamically (reduce from 5/day to 2/day if engagement drops)
- Lesson 1729 — Analytics-Driven Optimization
- Advance notice requirements
- SLAs typically specify how far ahead you must announce maintenance (e.
- Lesson 1328 — Scheduled Maintenance and Availability Accounting
- Advanced features
- Proxies can provide connection pooling, query caching, lag-aware routing, and even query rewriting.
- Lesson 222 — Proxy-Based Read-Write Splitting
- Advantage
- Strong consistency — if the primary fails immediately after a write, the replica has the exact same data.
- Lesson 203 — Synchronous Replication Explained; Lesson 1534 — Rate Limiting for URL Creation
- Advantages
- Lesson 79 — Hardware vs Software Load Balancers; Lesson 122 — Application-Level In-Memory Caching; Lesson 158 — Event-Based Invalidation; Lesson 242 — Directory-Based Sharding; Lesson 289 — Normalized vs Denormalized Schema Design; Lesson 349 — Redis In-Memory Storage Model; Lesson 350 — Redis Persistence: RDB Snapshots; Lesson 769 — Spark Streaming and Structured Streaming (+17 more)
- Advantages of summaries
- Lesson 1186 — Summary Metrics
- Affected scope
- Which service instances, regions, or user segments
- Lesson 1293 — Alert Context and Enrichment
- After compaction
- Lesson 720 — Log Compaction
- Agent overhead
- consuming precious CPU and memory on production hosts
- Lesson 1252 — Sampling Strategies Overview
- Agent resource usage
- Tracing agents running on each host consume CPU and memory to process, buffer, and forward spans.
- Lesson 1259 — Network and Agent Overhead
- Aggregate metrics
- Count events by type, group by user, calculate percentiles automatically
- Lesson 1137 — What is Structured Logging
- Aggregate related events
- Instead of alerting on every individual timeout, alert when timeout rate exceeds 5% over 5 minutes.
- Lesson 1171 — Log Review and Alert Fatigue
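The "alert on a rate, not on each event" idea can be sketched as a sliding-window check. The 5% threshold and 5-minute window come from the entry above; the class, its method names, and the use of plain numeric timestamps in seconds are illustrative assumptions.

```python
from collections import deque


class TimeoutRateAlert:
    """Fires when the share of timed-out requests in a time window exceeds a threshold."""

    def __init__(self, threshold=0.05, window_seconds=300):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp_seconds, timed_out) in arrival order

    def record(self, timestamp, timed_out):
        self.events.append((timestamp, timed_out))
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events that have aged out of the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            self.events.popleft()

    def should_alert(self, now):
        self._evict(now)
        if not self.events:
            return False
        timeouts = sum(1 for _, timed_out in self.events if timed_out)
        return timeouts / len(self.events) > self.threshold


alert = TimeoutRateAlert()
for second in range(100):
    alert.record(second, timed_out=(second < 4))  # 4 timeouts in 100 requests
print(alert.should_alert(now=100))  # False: 4% is under the 5% threshold
```

A single alert on the aggregate replaces what would otherwise be one notification per timeout.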
- Aggregation
- replaces repetitive logs with summaries.
- Lesson 1135 — Log Retention and Volume Management; Lesson 1167 — Avoid Log Explosion; Lesson 1179 — Aggregation and Roll-Ups; Lesson 1218 — Testing Metric Pipelines; Lesson 1505 — Analytics and Tracking Requirements; Lesson 1652 — Fanout Worker Parallelization
- Aggregation accuracy
- Test that roll-ups, percentiles, and rate calculations produce correct results.
- Lesson 1218 — Testing Metric Pipelines
- Aggregation and Roll-Ups
- (lesson 1179) and **Metrics Federation and Long-Term Storage** (lesson 1206).
- Lesson 1270 — Monitoring Resolution and Retention Tradeoffs
- Aggregation overhead
- Computing sums, averages, or counts across large datasets requires reading many documents into memory.
- Lesson 408 — Query Performance Limitations
- Aggregation structures
- that count documents matching each facet value
- Lesson 1775 — Faceted Search and Filters
- Aggregation tables
- (also called **summary tables** or **rollup tables**) store pre-computed metrics and summaries.
- Lesson 294 — Aggregation Tables; Lesson 297 — Denormalization in Practice
- Aggregations
- Compute running totals, averages, or counts over time windows.
- Lesson 722 — Kafka Streams API
- Aggregations Across Shards
- Lesson 238 — Query Limitations in Sharded Systems
- Aggressive caching
- Use local caches more heavily for hot tenants to reduce Redis load
- Lesson 1823 — Hot Tenant Problem; Lesson 1875 — HTTP Methods: GET, POST, PUT, DELETE Semantics
- agreement
- , **validity** (the agreed value was actually proposed by someone), and **termination** (the decision completes eventually)—all while nodes crash and networks partition.
- Lesson 599 — What Is Distributed Consensus?; Lesson 608 — The Problem Paxos Solves
- Alert enrichment
- solves this by packaging critical context *inside* the alert itself.
- Lesson 1293 — Alert Context and Enrichment; Lesson 1295 — Testing Alerts and Dry Runs
- alert fatigue
- , and it's dangerous because real incidents get missed in the noise.
- Lesson 1171 — Log Review and Alert Fatigue; Lesson 1287 — Actionability: Every Alert Needs a Runbook
- Alert firing
- Does the condition correctly trigger the alert?
- Lesson 1295 — Testing Alerts and Dry Runs
- Alert quality matters
- Actionable, low-noise alerts (remember alert fatigue?
- Lesson 1297 — On-Call Fundamentals and Rotation Models
- alerting
- (will someone know when things break?
- Lesson 1218 — Testing Metric Pipelines; Lesson 1410 — Backup Monitoring and Alerting
- Alerting logic
- Trigger threshold violations deliberately and confirm alerts fire within expected timeframes.
- Lesson 1218 — Testing Metric Pipelines
- Alerting Systems
- Lesson 739 — Stream Processing Use Cases
- Alertmanager
- (separate component): Handles alert routing and notifications
- Lesson 1198 — Prometheus Architecture and Data Model
- Alerts
- are urgent, actionable signals that require immediate human intervention to prevent or mitigate user impact.
- Lesson 1285 — Alert vs Notification
- Alerts become actionable
- "Users can't log in" is clearer than "Database CPU at 87%"
- Lesson 1313 — Monitoring and Observability for SRE
- all
- replicas fail, you can fall back to reading from the primary database (accepting the increased load) or return degraded service errors, depending on your requirements.
- Lesson 227 — Handling Replica Failures in Routing; Lesson 425 — Tunable Consistency Levels; Lesson 877 — The API Gateway Bottleneck Risk; Lesson 1321 — Redundancy and Parallel Availability
- all conflicting versions
- to the client.
- Lesson 367 — Vector Clocks and Conflict Detection; Lesson 377 — Eventual Consistency and Application Reconciliation
- all relevant shards
- (or sometimes all shards if you can't determine which ones hold relevant data).
- Lesson 255 — Scatter-Gather Pattern; Lesson 1780 — Distributed Query Coordination
- Allowed
- `GET user:12345` retrieves a user by their ID
- Lesson 342 — No Secondary Indexes or Query Language
- Allowlisting
- Define which fields are safe to log; drop everything else by default.
- Lesson 1145 — Sensitive Data in Structured Logs
- Alphabet reduction
- (map rare characters to shared slots)
- Lesson 1759 — Trie Space Optimization Techniques
- Also Layer 4
- can operate at the transport layer for pure TCP/UDP load balancing
- Lesson 111 — NGINX as a Load Balancer
- Alternative Service Paths
- Route to a backup service or simplified version.
- Lesson 1061 — Fallback Strategies
- always
- see the updated balance.
- Lesson 308 — Strong Consistency by Default; Lesson 351 — Redis Persistence: AOF Logs
- Always available
- DNS servers respond to queries even if they haven't received the latest updates yet
- Lesson 500 — DNS Systems (AP)
- Always fresh
- – You see the latest content immediately
- Lesson 1637 — Pull (Read-Time) Feed Model; Lesson 1647 — Fanout-on-Read (Pull Model)
- Always use HTTPS
- Tokens in transit over HTTP are trivial to intercept
- Lesson 931 — OAuth2 Security Best Practices
- Always-On Protection
- Unlike scrambling to respond after an attack begins, CDN-based protection is continuously active at the edge, closest to attack sources.
- Lesson 195 — CDN for DDoS Protection
- Amazon DynamoDB
- is the managed service evolution of the original paper.
- Lesson 378 — Dynamo's Influence on Modern Systems; Lesson 554 — Consistency Model Examples in Real Systems
- Amazon SQS
- or **Apache Kafka** distribute file processing tasks across worker pools, ensuring reliable, scalable, and fault-tolerant job handling for image and video operations.
- Lesson 1604 — Message Queue for Processing Jobs
- Analogy
- Like splitting a massive library into multiple buildings when one building's shelves are full.
- Lesson 66 — Why Partition Data?; Lesson 95 — Geographic/Proximity-Based Routing; Lesson 174 — CDN Benefits Beyond Latency Reduction; Lesson 176 — Geographic Routing and Anycast; Lesson 177 — When Not to Use a CDN; Lesson 183 — Pull vs Push CDN Models; Lesson 197 — When NOT to Use a CDN; Lesson 212 — Measuring Replication Lag (+95 more)
- Analytics and metrics
- – view counts or "likes" don't need instant global accuracy
- Lesson 318 — When to Choose ACID or BASE
- Analytics events
- Recording the same action multiple times skews metrics
- Lesson 1001 — Side Effects and Idempotency
- Analytics payloads
- If logging metadata, increase per-request size accordingly
- Lesson 1499 — Bandwidth Requirements for Redirects
- Analytics pipeline
- – writes the event to a data warehouse for reporting
- Lesson 1725 — Analytics Pipeline Architecture
- analytics service
- subscribes with *no filter* (receives everything)
- Lesson 658 — Topic Subscriptions and Filtering; Lesson 1067 — Bulkhead Pattern: Isolating Resources to Prevent Total Failure
- Analyze real traffic patterns
- from production (not just synthetic tests)
- Lesson 40 — Measure Before Optimizing
- Answer
- Yes, because Bob → member_of → Team Y → can_view → Document X
- Lesson 938 — Relationship-Based Access Control (ReBAC)
- Anti-entropy
- is a continuous background reconciliation process where replicas compare their data and synchronize differences.
- Lesson 369 — Anti-Entropy and Merkle Trees
- Anti-pattern
- Drawing service boundaries on a whiteboard without considering your org chart.
- Lesson 819 — Team Structure and Conway's Law
- any
- rule is violated, the transaction fails and rolls back completely (thanks to atomicity).
- Lesson 311 — Consistency: Maintaining Database Invariants; Lesson 569 — The Coordinator Role in 2PC; Lesson 1770 — Index Replication for Availability
- Anycast
- is a network addressing method where multiple edge servers share the *same IP address* across different locations worldwide.
- Lesson 181 — Anycast Routing for CDNs; Lesson 1616 — Geographic Routing and DNS
- AOF rewriting
- compacts the log by creating a minimal command set that produces the same final state.
- Lesson 351 — Redis Persistence: AOF Logs
- AP approach
- "Withdrawal processed!
- Lesson 483 — The CAP Tradeoff During Partitions; Lesson 513 — Hybrid Approaches: Different Guarantees Per Operation; Lesson 532 — Why Eventual Consistency Exists
- AP response
- Keep taking orders in both cities, reconcile conflicts later (preserve availability)
- Lesson 505 — The Partition Question: When, Not If
- AP system
- (prioritizing availability) might allow both customers to complete their purchase during a network partition, discovering the problem only when you try to fulfill orders.
- Lesson 499 — Inventory Management (CP); Lesson 500 — DNS Systems (AP); Lesson 502 — Mixed Strategies: Hybrid Systems
- AP systems
- (like Cassandra with eventual consistency): Always respond, accepting temporary inconsistencies
- Lesson 481 — What CAP Theorem States; Lesson 494 — AP Systems: Prioritizing Availability; Lesson 512 — Social Media: Availability Over Consistency
- Apache Cassandra
- adopted Dynamo's core architecture almost wholesale: consistent hashing with virtual nodes, tunable consistency via quorum reads/writes, and gossip-based membership.
- Lesson 378 — Dynamo's Influence on Modern Systems; Lesson 554 — Consistency Model Examples in Real Systems
- Apache Flink
- is purpose-built for stream processing with true event-time processing and stateful computations.
- Lesson 744 — Stream Processing Frameworks; Lesson 756 — Hybrid and Modern Alternatives
- Apache Kafka
- is designed for high-throughput event streaming and log-based messaging.
- Lesson 665 — Overview of Message Broker Landscape; Lesson 1604 — Message Queue for Processing Jobs
- Apache Spark
- modernized the approach:
- Lesson 743 — Batch Processing Frameworks; Lesson 756 — Hybrid and Modern Alternatives
- Apache ZooKeeper
- coordinates distributed systems with linearizable operations.
- Lesson 530 — Strong Consistency in Practice
- API access delegation
- Users grant your analytics dashboard read-only access to their Stripe data.
- Lesson 920 — OAuth2 Fundamentals and Use Cases
- API composition needs
- where clients require aggregated data from multiple services
- Lesson 879 — When to Introduce an API Gateway
- API contract
- defines the structured conversation between the client and rate limiter.
- Lesson 1786 — API Contract: Request and Response Format
- API Gateway
- is a server that acts as a centralized front door for all client requests before they reach your backend services.
- Lesson 870 — What is an API Gateway?; Lesson 872 — API Gateway vs Reverse Proxy; Lesson 1132 — Correlation IDs and Request Tracing; Lesson 1585 — Upload Flow Architecture Overview
- API Gateway + Observability
- Combine a robust API gateway with distributed tracing (via OpenTelemetry) and centralized logging.
- Lesson 869 — Alternatives to Full Service Mesh
- API Gateway Cache
- Frequently-requested queries (top 5%) cached before reaching your application servers.
- Lesson 1771 — Query Caching Strategies
- API Gateway/Load Balancer
- Token bucket or sliding window algorithms reject excess requests before they hit your servers
- Lesson 1596 — Upload Rate Limiting and Quotas
- API gateways
- and **reverse proxies** sit in front of backend services and route incoming requests, but they serve different architectural needs.
- Lesson 872 — API Gateway vs Reverse Proxy; Lesson 1239 — Root Span and Entry Points
- API management platform
- beyond basic gateway features.
- Lesson 898 — Apigee and Enterprise API Management
- API monetization
- you can create pricing tiers (free, basic, premium), enforce usage quotas, generate invoices, and integrate with payment systems.
- Lesson 898 — Apigee and Enterprise API Management
- API version
- evolves independently from your **SDK version**.
- Lesson 1909 — Client SDK Versioning and Distribution
- Apigee
- (now Google Cloud Apigee) positions itself as a full **API management platform** beyond basic gateway features.
- Lesson 898 — Apigee and Enterprise API Management
- App servers
- handle business logic: validating input, generating short keys, and coordinating between storage layers.
- Lesson 1552 — Initial Architecture Diagram
- Append a salt
- Add a random string to the original URL before hashing again: `hash(original_url + random_salt)`.
- Lesson 1509 — Handling Hash Collisions
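The salt-and-rehash approach to collisions can be sketched as follows. Everything beyond `hash(original_url + random_salt)` is an assumption made for illustration: SHA-256 as the hash, a 7-character base-62 key, a plain dictionary standing in for the key-value store, and the retry limit.

```python
import hashlib
import secrets
import string

ALPHABET = string.digits + string.ascii_letters  # 62 characters


def short_key(text: str, length: int = 7) -> str:
    """Derive a base-62 key of `length` characters from a SHA-256 digest."""
    number = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")
    chars = []
    for _ in range(length):
        number, index = divmod(number, len(ALPHABET))
        chars.append(ALPHABET[index])
    return "".join(chars)


def create_short_url(original_url: str, store: dict, length: int = 7,
                     max_attempts: int = 5) -> str:
    """Store `original_url` under a short key, re-hashing with a salt on collision."""
    candidate = short_key(original_url, length)
    for _ in range(max_attempts):
        existing = store.get(candidate)
        if existing is None:
            store[candidate] = original_url
            return candidate
        if existing == original_url:
            return candidate  # same URL was shortened before; reuse its key
        # Collision with a different URL: append a random salt and hash again.
        salt = secrets.token_hex(4)
        candidate = short_key(original_url + salt, length)
    raise RuntimeError("could not find a free key within max_attempts")


store = {}
key = create_short_url("https://example.com/some/long/path", store)
print(key, "->", store[key])
```

In a real store, the check-then-write step would need to be atomic (for example, an insert that fails if the key exists) to stay correct under concurrent writers.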
- Append-Only Logs
- Streams never update or delete events—they only append new ones.
- Lesson 692 — Streams vs Traditional Databases; Lesson 1737 — Index Building and Updates
- Append-only workloads
- play to the strengths of LSM trees (the write path we covered earlier).
- Lesson 418 — Time-Series and Time-Ordered Data
- AppendEntries RPC
- the fundamental replication mechanism that the leader sends periodically to all followers.
- Lesson 624 — AppendEntries RPC: Replication Mechanism; Lesson 635 — Consul: Service Discovery with Raft Consensus
- AppendEntries RPCs
- as both a probe and a repair mechanism:
- Lesson 629 — Log Inconsistencies and Repair; Lesson 634 — etcd: Distributed Key-Value Store with Raft
- Application cache
- Shared across users, flexible
- Lesson 120 — Caching Hierarchy Overview; Lesson 126 — Database Internal Caching (Buffer Pool); Lesson 1771 — Query Caching Strategies
- Application Cache Layer
- Between your API gateway and trie cluster, maintain an in-memory cache (Redis, Memcached) of the hottest queries.
- Lesson 1766 — Caching Suggestions at Multiple Layers
- Application code is simpler
- less defensive validation needed
- Lesson 301 — Schema Enforcement and Type Safety
- Application controls everything
- Your code explicitly manages cache reads and writes
- Lesson 131 — Cache-Aside (Lazy Loading) Pattern
- Application Gateway
- Layer 7 with WAF (Web Application Firewall) capabilities
- Lesson 114 — Cloud Load Balancers (GCP and Azure)
- Application impact
- Heavy instrumentation slows down services
- Lesson 1228 — Trace Sampling Fundamentals
- Application Insights
- for built-in telemetry and distributed tracing
- Lesson 899 — Azure API Management Features
- Application latency
- Serializing span data and sending it synchronously blocks request threads.
- Lesson 1259 — Network and Agent Overhead
- application layer
- your code explicitly routes the query, not some middleware or proxy.
- Lesson 221 — Application-Level Connection Management; Lesson 1596 — Upload Rate Limiting and Quotas
- Application logic burden
- Your app needs special code to handle these moves
- Lesson 263 — Shard Key Immutability Problem
- Application reads data
- → Load balancer distributes read requests across Replicas
- Lesson 199 — Primary-Replica Architecture
- Application rewrites
- if query patterns differ significantly
- Lesson 328 — Migration and Legacy System Constraints
- Application-Layer Validation Burden
- Lesson 407 — Schema Flexibility Trade-offs
- Application-level
- Each microservice maintains its own connection pool configuration
- Lesson 1071 — Connection Pool Bulkheads: Database and Service Isolation
- Application-Level Connection Management
- (lesson 221), your application code had to decide whether each query should go to the primary or a replica.
- Lesson 222 — Proxy-Based Read-Write Splitting
- Application-level coordination
- Handle multi-step operations in application code with compensation logic
- Lesson 261 — Distributed Transactions Across Shards
- Application-Level Retry Logic
- If a read-after-write fails (you write to primary, then read stale data from replica), your code detects this mismatch and retries the read against the primary.
- Lesson 219 — Application-Level Consistency Patterns
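The retry-against-primary pattern needs some way to detect a stale read. A minimal sketch, assuming each record carries a monotonically increasing `version` and that the caller remembers the version it just wrote; the dictionaries are hypothetical stand-ins for replica and primary connections.

```python
def read_after_write(key, min_version, replica, primary):
    """Read from the replica, falling back to the primary if the replica is stale.

    `min_version` is the version the caller wrote most recently. A missing
    record or a lower version on the replica is treated as replication lag.
    """
    record = replica.get(key)
    if record is None or record["version"] < min_version:
        record = primary.get(key)  # retry the read against the primary
    return record


primary = {"user:1": {"version": 2, "name": "Grace"}}
replica = {"user:1": {"version": 1, "name": "Ada"}}  # has not caught up yet

print(read_after_write("user:1", 2, replica, primary)["name"])  # Grace
```

Once the replica catches up to version 2, the same call is served from the replica and the primary sees no extra read load.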
- Apply filters before sorting
- Filters reduce the dataset; sorting operates on that reduced set.
- Lesson 1896 — Combining Pagination, Filtering, and Sorting
- Appropriate log levels
- Use `debug` for verbose loops, not `info` or `error`
- Lesson 1167 — Avoid Log Explosion
- Approximate counting
- Use probabilistic data structures (HyperLogLog) when exact counts aren't critical
- Lesson 977 — Algorithm Implementation Patterns; Lesson 1785 — Non-Functional Requirements: Accuracy vs Performance
- Approximate limits
- The actual global limit may be exceeded (up to N×local_limit in worst case)
- Lesson 979 — Centralized vs Decentralized Approaches
- Approximate limits acceptable
- Use **Fixed Window Counter** (suffers from boundary issues but very efficient)
- Lesson 975 — Algorithm Selection Criteria
- Approximation
- Apply less precise algorithms (gossip-based) for tenants exceeding thresholds
- Lesson 1823 — Hot Tenant Problem
- Architecture
- The big-picture structure (like deciding if your city is a grid or has winding streets)
- Lesson 1 — What Is System Design?
- Archival becomes trivial
- move `logs_2023_06` to cold storage without touching active shards.
- Lesson 249 — Time-Based Sharding
- Archiving analytics
- before deletion (preserves click history)
- Lesson 1532 — Expiration and Time-to-Live
- Array-based representations
- replace pointer-heavy node structures with packed arrays, improving cache locality.
- Lesson 1776 — Typeahead Index Optimization
- As you mature
- You discover that "success rate" hides important nuances.
- Lesson 1284 — Iterating on SLIs and SLOs
- Ask recursive resolver
- (ISP or public DNS like 8.
- Lesson 1856 — DNS Resolution Fundamentals for Crawlers
- Ask these questions
- Lesson 18 — Prioritizing Requirements Under Constraints; Lesson 1273 — Choosing Good SLIs
- Assign to healthy workers
- The frontier redistributes them to active workers in the next pull cycle
- Lesson 1866 — Worker Health Monitoring and Failover
- Async
- Queue thumbnail generation, image optimization, and backup to storage; the user doesn't wait
- Lesson 654 — When to Use Async vs Sync
- Async flush to disk
- The OS eventually flushes data from page cache to physical disk in the background
- Lesson 713 — Kafka's Write Path and Durability
- Async Processing
- For non-critical operations, accept the request immediately and process it asynchronously.
- Lesson 1042 — Idempotency vs Performance Tradeoffs
- Async reconciliation
- Regions track locally, sync periodically (every 5-10 seconds) to redistribute quotas.
- Lesson 987 — Multi-Region Rate Limiting Challenges
- asynchronous
- (as you learned earlier), there's a delay — **replication lag** — before data reaches replicas.
- Lesson 209 — Read-After-Write Consistency ProblemLesson 1134 — Synchronous vs Asynchronous LoggingLesson 1354 — Synchronous vs Asynchronous ReplicationLesson 1365 — Single-Leader Replication Topology
- Asynchronous checking
- Don't slow down legitimate URL creation—queue suspicious URLs for deeper analysis
- Lesson 1540 — Spam and Malicious Link Detection
- Asynchronous logging
- Use buffered, non-blocking loggers that write in background threads, preventing I/O from blocking request handling.
- Lesson 1133 — Logging Performance ImpactLesson 1134 — Synchronous vs Asynchronous LoggingLesson 1143 — Performance Impact of Structured Logging
- Asynchronous processing
- means you immediately acknowledge the upload, store the raw file, and queue processing tasks (like transcoding, thumbnail generation) for background workers to handle later.
- Lesson 1598 — Synchronous vs Asynchronous ProcessingLesson 1698 — Message Queue for Decoupling
- asynchronous replication
- , when your application writes data to the primary database, the primary acknowledges the write as "successful" immediately—without waiting for replica databases to confirm they've copied the data.
- Lesson 204 — Asynchronous Replication ExplainedLesson 205 — Semi-Synchronous ReplicationLesson 217 — Semi-Synchronous Replication Trade-offsLesson 1354 — Synchronous vs Asynchronous ReplicationLesson 1356 — Asynchronous Replication: Speed and RiskLesson 1364 — Choosing a Replication Mode
- Asynchronous transmission
- Never block request threads waiting for trace data to be sent.
- Lesson 1259 — Network and Agent Overhead
- At creation time
- , store an `expires_at` timestamp alongside each URL.
- Lesson 1532 — Expiration and Time-to-Live
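A minimal sketch of storing an `expires_at` timestamp at creation time and checking it on lookup. The record layout and the names `new_url_record` and `is_expired` are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone


def new_url_record(short_code, long_url, ttl_days, now=None):
    """Build a URL record that carries its own expiry timestamp."""
    now = now or datetime.now(timezone.utc)
    return {
        "short_code": short_code,
        "long_url": long_url,
        "created_at": now,
        # Computed once, at creation time, and stored alongside the URL.
        "expires_at": now + timedelta(days=ttl_days),
    }


def is_expired(record, now=None):
    """Return True once the current time has reached expires_at."""
    now = now or datetime.now(timezone.utc)
    return now >= record["expires_at"]
```

A redirect handler could call `is_expired` before serving the link, while a background job deletes or archives expired rows later.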
- at least one
- replica to confirm the write before declaring success to the client.
- Lesson 217 — Semi-Synchronous Replication Trade-offsLesson 1357 — Semi-Synchronous Replication
- At rest
- means data sitting in storage—on disk, tape, or cloud object storage.
- Lesson 1409 — Backup Encryption and Security
- At-least-once
- fits most production workloads: order processing, notifications, event logging.
- Lesson 689 — Choosing Delivery SemanticsLesson 710 — Offsets and Commit StrategiesLesson 1709 — At-Most-Once, At-Least-Once, Exactly-Once SemanticsLesson 1710 — Why Exactly-Once Is Hard in Notifications
- at-least-once delivery
- .
- Lesson 657 — Message Ownership in QueuesLesson 663 — Hybrid Patterns: Topic + QueueLesson 681 — Acknowledgment MechanismsLesson 687 — Dead Letter QueuesLesson 734 — NATS StreamingLesson 1035 — Idempotency in Event Processing
- At-least-once with durability
- RabbitMQ, SQS, Azure Service Bus
- Lesson 676 — Choosing Between Message Broker Technologies
- at-most-once
- delivery semantics: a message is delivered either zero or one time, never more.
- Lesson 673 — NATS and Lightweight MessagingLesson 689 — Choosing Delivery SemanticsLesson 710 — Offsets and Commit Strategies
- At-most-once (fast, lossy)
- Redis pub/sub, basic NATS
- Lesson 676 — Choosing Between Message Broker Technologies
- Atomic broadcast
- Updates either succeed everywhere or nowhere
- Lesson 633 — ZooKeeper: Coordination Service Built on Consensus
- Atomic increment
- Use `INCR` to increment the counter for that key
- Lesson 1794 — Redis-Based Rate Limiting with INCR
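The INCR-based counter can be sketched as a fixed-window limiter. To stay runnable without a server, this example uses a small in-memory stand-in for the two Redis commands involved (`INCR` and `EXPIRE`). `FakeRedis`, `allow_request`, and the `rate:<client>:<window>` key format are assumptions made for illustration, not part of any Redis client library.

```python
import time


class FakeRedis:
    """In-memory stand-in for the two Redis commands used here (INCR, EXPIRE)."""

    def __init__(self):
        self._data = {}  # key -> (count, expires_at or None)

    def incr(self, key, now):
        count, expires_at = self._data.get(key, (0, None))
        if expires_at is not None and now >= expires_at:
            count, expires_at = 0, None  # key has expired: start over
        count += 1
        self._data[key] = (count, expires_at)
        return count

    def expire(self, key, ttl, now):
        count, _ = self._data[key]
        self._data[key] = (count, now + ttl)


def allow_request(store, client_id, limit, window_seconds, now=None):
    """Count the request in the current window and compare against the limit."""
    now = time.time() if now is None else now
    window = int(now // window_seconds)
    key = f"rate:{client_id}:{window}"  # hypothetical key format
    count = store.incr(key, now)
    if count == 1:
        # First hit in this window: let the key clean itself up later.
        store.expire(key, window_seconds, now)
    return count <= limit
```

Against a real Redis server, the increment and the expiry would be two separate round trips, so a Lua script or a transaction is the usual way to make the pair atomic.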
- Atomic Lua Scripts
- Lesson 980 — Redis-Based Distributed Rate Limiting
- atomic operations
- (prevents race conditions), and handles **expiration automatically** (TTL cleans up old data).
- Lesson 358 — Redis for Rate LimitingLesson 977 — Algorithm Implementation PatternsLesson 980 — Redis-Based Distributed Rate LimitingLesson 981 — Race Conditions in Distributed Counters
- Atomic state update
- Mark complete in same transaction as the work
- Lesson 1037 — Idempotency in Distributed Workflows
- Atomic Write
- The snapshot is written to a temporary file, then atomically renamed to replace the old snapshot
- Lesson 350 — Redis Persistence: RDB Snapshots
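The write-to-temp-then-rename technique can be sketched in a few lines. This illustrates the general pattern only, not Redis's own implementation; the helper name `atomic_write` is hypothetical.

```python
import os
import tempfile


def atomic_write(path, data: bytes):
    """Replace the file at `path` with `data` so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic swap of old and new file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A reader that opens the path sees either the complete old snapshot or the complete new one, never a half-written file.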
- Atomically
- update all affected rows in one operation
- Lesson 567 — The ACID Problem in Distributed SystemsLesson 977 — Algorithm Implementation PatternsLesson 1795 — Redis Lua Scripts for Atomicity
- Atomicity
- means all-or-nothing execution.
- Lesson 309 — ACID Properties OverviewLesson 470 — Transaction Model and ACID in Neo4jLesson 567 — The ACID Problem in Distributed SystemsLesson 568 — Two-Phase Commit (2PC) OverviewLesson 1489 — Cross-Partition Transactions
- Attack mitigation
- Slows down brute-force attacks, DDoS attempts, and web scrapers
- Lesson 955 — What is Rate Limiting?
- attributes
- rather than just roles.
- Lesson 935 — Attribute-Based Access Control (ABAC) IntroductionLesson 1226 — Span Events and LogsLesson 1233 — Span Tags and Attributes
- Audit log requirements
- Lesson 1728 — Opt-Out and Compliance Tracking
- Audit trail
- Compliance record of what couldn't be delivered
- Lesson 1705 — Retry and Dead Letter Queues
- Audit trails
- Logs provide evidence of who did what and when — critical for security and compliance.
- Lesson 1127 — What is Logging and Why It MattersLesson 1807 — In-Memory vs Persistent Storage for Rate Limiting
- Auditing
- Easy to see "who has admin access?"
- Lesson 933 — Role-Based Access Control (RBAC) Fundamentals
- Auditing requires precision
- regulatory compliance systems
- Lesson 518 — PC/EC Systems: Consistency Always
- Authenticate
- the provider using signatures or tokens to prevent spoofing
- Lesson 1693 — Delivery Receipt Tracking
- Authenticated Requests
- Client includes the token in request headers (usually `Authorization: Bearer <token>`)
- Lesson 912 — Token-Based Authentication Fundamentals
- authentication
- (verifying identities), and **authorization** (controlling who can do what).
- Lesson 727 — Kafka Security: Authentication and EncryptionLesson 851 — Mutual TLS (mTLS) AuthenticationLesson 920 — OAuth2 Fundamentals and Use Cases
- Authentication & Authorization
- Instead of every service validating JWT tokens or checking API keys, the gateway handles it once.
- Lesson 876 — API Gateway as a Cross-Cutting Concern Hub
- Authentication Layer
- Implement OAuth2 or JWT-based authentication to verify user identity.
- Lesson 1578 — User Accounts and Paste Management
- Author relationship
- Posts from close friends vs distant connections
- Lesson 1665 — Feed Ranking FundamentalsLesson 1666 — Ranking Signals and Features
- authorization
- (controlling who can do what).
- Lesson 727 — Kafka Security: Authentication and EncryptionLesson 851 — Mutual TLS (mTLS) AuthenticationLesson 884 — Authorization and Policy EnforcementLesson 920 — OAuth2 Fundamentals and Use CasesLesson 928 — OpenID Connect (OIDC) Overview
- Authorization request
- Send `code_challenge` and `code_challenge_method=S256` with the auth request
- Lesson 923 — PKCE: Proof Key for Code Exchange
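A short sketch of how a client might derive the `code_challenge` for `code_challenge_method=S256`: the challenge is the unpadded base64url encoding of the SHA-256 hash of the code verifier. The function names are hypothetical.

```python
import base64
import hashlib
import secrets


def make_code_verifier():
    """Generate a random verifier (32 random bytes give a 43-character string)."""
    return secrets.token_urlsafe(32)


def make_code_challenge(code_verifier):
    """S256 challenge: base64url(SHA-256(verifier)) with the '=' padding removed."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The client keeps the verifier private, sends only the challenge with the authorization request, and reveals the verifier later when exchanging the code for tokens.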
- Auto-ack
- Message is acknowledged automatically upon delivery (risky: if the consumer crashes before processing it, the message is lost, so delivery is at-most-once)
- Lesson 681 — Acknowledgment Mechanisms
- Auto-ack (pre-processing)
- The broker considers the message delivered as soon as it sends it to the consumer
- Lesson 683 — Consumer Acknowledgment Timing
- Auto-commit
- Kafka automatically saves your progress at regular intervals (e.
- Lesson 710 — Offsets and Commit Strategies
- Auto-generate interactive docs
- (Swagger UI) where developers can make real API calls
- Lesson 1885 — API Documentation with OpenAPI/Swagger
- Auto-incrementing IDs
- User IDs like `user_10001`, `user_10002`, `user_10003`.
- Lesson 1474 — Hotspot Problems in Range PartitioningLesson 1515 — Short URL Predictability Tradeoffs
- Auto-scaling
- Handles one request or one million identically
- Lesson 895 — AWS API Gateway and Serverless IntegrationLesson 1708 — Scalability and Horizontal Expansion
- Auto-scaling integration
- Automatically adjusts to your EC2 Auto Scaling groups
- Lesson 113 — Cloud Load Balancers (AWS ELB/ALB)
- Auto-scaling policies
- take this further: based on metrics like CPU usage (>70%), request rate, or response latency, the infrastructure automatically provisions new redirect servers from a template.
- Lesson 1536 — Horizontal Scaling of Redirect Servers
- Automate integrity checks
- immediately after each backup completes
- Lesson 1430 — Backup Verification and Testing
- Automate Systematically
- Lesson 1312 — Measuring and Reducing Toil
- Automate toil
- repetitive manual work that doesn't provide lasting value
- Lesson 1307 — What is Site Reliability Engineering (SRE)?
- Automate verification
- Run checksum validation after every backup job
- Lesson 1408 — Backup Verification and Testing
- Automated extraction
- from metric systems using their metadata APIs
- Lesson 1216 — Metric Documentation and Discovery
- Automated failover
- uses health checks and monitoring to detect failures and trigger the switch without human intervention.
- Lesson 1437 — Failover and Failback Procedures
- Automated testing gates
- unit tests, integration tests, and end-to-end tests must pass before code proceeds
- Lesson 1314 — Release Engineering and Safe Deployment
- Automatic cleanup
- If the lock holder crashes, its session expires and the lock is automatically released
- Lesson 637 — Distributed Locks via Consensus
- Automatic expiration
- Use Time-to-Live (TTL) to automatically delete inactive sessions.
- Lesson 356 — Redis as a Session Store
- Automatic failover
- If one edge server goes down, routers naturally redirect traffic to the next-closest healthy server without DNS changes
- Lesson 181 — Anycast Routing for CDNsLesson 201 — Why Replicate: Availability and FailoverLesson 394 — Replica Sets for High AvailabilityLesson 706 — Leaders and FollowersLesson 1366 — Leader Election and Failover
- Automatic instrumentation
- means frameworks detect and wrap common libraries (HTTP clients, database drivers, message queues) to create spans without code changes.
- Lesson 1224 — Automatic vs Manual InstrumentationLesson 1240 — OpenTelemetry OverviewLesson 1244 — Google Cloud Trace
- Automatic load balancing
- The broker distributes messages across available consumers
- Lesson 661 — Competing Consumers Pattern
- Automatic replica distribution
- No central coordinator needed to decide where replicas live
- Lesson 1466 — Replication with Consistent Hashing
- Automatic Sharding
- Unlike manual sharding we studied earlier, CockroachDB automatically distributes your data across nodes using range-based sharding.
- Lesson 334 — CockroachDB and Distributed SQL
- Automatic Update
- DNS responses now return your DR site's IP address instead
- Lesson 1440 — DNS and Traffic Management in DR
- automatically
- without manual intervention.
- Lesson 104 — Failover FundamentalsLesson 1111 — gRPC Deadline Propagation
- Automation-friendly
- Configuration as code, CI/CD integration
- Lesson 108 — Hardware vs Software Load Balancers
- Autonomy
- Frontend teams deploy client and BFF together, moving at their own pace
- Lesson 906 — BFF Ownership and Team Structure
- availability
- and **reliability**.
- Lesson 14 — Availability and Reliability RequirementsLesson 70 — Partitioning and Replication TogetherLesson 77 — Why Load Balancers Are NecessaryLesson 98 — What Are Health Checks?Lesson 104 — Failover FundamentalsLesson 168 — What is a CDN and Why Use ItLesson 198 — What is Database Replication?Lesson 377 — Eventual Consistency and Application Reconciliation (+26 more)
- Availability Goals
- Social feeds are often the primary engagement driver.
- Lesson 1633 — Non-Functional Requirements: Scale and Performance
- Availability impact
- Typically higher.
- Lesson 1327 — Active-Active vs Active-Passive AvailabilityLesson 1489 — Cross-Partition Transactions
- Availability improves
- If one server goes down, others can still serve requests
- Lesson 68 — What is Data Replication?
- Availability isn't all-or-nothing
- CAP treats availability as "every request gets a response," but real systems degrade gracefully.
- Lesson 492 — Limitations of CAP as a Framework
- Availability loss during partitions
- System becomes unavailable in affected regions
- Lesson 526 — The Cost of Strong Consistency
- availability over consistency
- .
- Lesson 371 — The Dynamo Paper: Context and MotivationLesson 375 — Sloppy Quorum and Hinted Handoff
- Availability SLO
- tells you if your service responds at all—but not *how well*.
- Lesson 1278 — Multiple SLOs for Comprehensive Coverage
- available
- across all data centers during network partitions, accepting that different users might see slightly different views temporarily.
- Lesson 497 — Social Media and Content Feeds (AP)Lesson 508 — Availability is Also a SpectrumLesson 1318 — Defining Availability and UptimeLesson 1322 — Availability vs Reliability: Key DifferencesLesson 1513 — Pre-Generating Short Code Pools
- Avoid API gateways when
- Lesson 879 — When to Introduce an API Gateway
- Avoid high-cardinality labels entirely
- Lesson 1207 — Metrics Cardinality and Performance Impact
- Avoid Sequential Hotspots
- Auto-incrementing IDs or timestamps cause all new writes to hit the latest partition.
- Lesson 1472 — Range Partition Key Selection
- Avoid user-specific identifiers
- as span tags—use them as searchable attributes only when essential
- Lesson 1258 — Cardinality Explosion
- Avoiding false positives
- A momentary network hiccup shouldn't mark a healthy server as dead
- Lesson 100 — Health Check Intervals and Timeouts
- Avoiding hotspots
- If sharding by `timestamp` alone, recent data gets hammered.
- Lesson 245 — Composite Shard Keys
- Avoiding Server Overload
- Lesson 1840 — Politeness Requirements for Web Crawling
- AWS App Mesh
- is Amazon's managed service mesh that works seamlessly with AWS services like ECS, EKS, EC2, and Fargate.
- Lesson 864 — AWS App Mesh and Cloud-Native Meshes
- AWS SNS → SQS
- , **Google Pub/Sub with subscriptions**, and **Kafka consumer groups**.
- Lesson 663 — Hybrid Patterns: Topic + Queue
- AWS X-Ray
- , **Google Cloud Trace**, and commercial offerings like Datadog APM provide:
- Lesson 1251 — Choosing a Tracing System
- Azure Active Directory
- for OAuth2/OpenID Connect authentication
- Lesson 899 — Azure API Management Features
- Azure Cosmos DB
- offers bounded staleness as a consistency level, popular for scenarios needing "fresh enough" data
- Lesson 549 — Bounded StalenessLesson 554 — Consistency Model Examples in Real Systems
- Azure Front Door
- Global Layer 7 with CDN integration
- Lesson 114 — Cloud Load Balancers (GCP and Azure)
- Azure Load Balancer
- Layer 4 only, regional or cross-zone
- Lesson 114 — Cloud Load Balancers (GCP and Azure)
B
- back queues
- enforcing per-host politeness.
- Lesson 1846 — Queue Router and Host MappingLesson 1849 — URL Frontier Persistence and Recovery
- Backend Duplication
- Lesson 907 — BFF Anti-Patterns and Pitfalls
- Backend for Frontend (BFF)
- , dramatically reduces client-side complexity and network round trips.
- Lesson 873 — Request Routing and Aggregation
- Backend Independence
- Services can change without forcing client updates
- Lesson 887 — API Composition and Aggregation
- Backend service changes
- frequently impacting multiple clients
- Lesson 879 — When to Introduce an API Gateway
- Backend services are protected
- from unauthenticated requests entirely
- Lesson 883 — Authentication at the Gateway
- Backfilling
- is reprocessing historical data after fixing bugs or adding new logic.
- Lesson 777 — Workflow Orchestration Patterns
- Background jobs
- Sending notifications, updating search indexes
- Lesson 659 — Queue Use Cases: Work DistributionLesson 1504 — Link Expiration and Retention Policies
- Background users
- (app open but not viewing): trigger in-app badge updates
- Lesson 1681 — Mobile Push Notification Integration
- background workers
- that:
- Lesson 1532 — Expiration and Time-to-LiveLesson 1651 — Asynchronous Fanout Processing
- Backoff intervals
- Exponential delays between retries (1s, 2s, 4s.
- Lesson 684 — Negative Acknowledgments and Redelivery
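A minimal sketch of an exponential backoff schedule. The doubling factor matches the 1s, 2s, 4s progression; the cap and the retry count are assumptions added for illustration, as is the name `backoff_delays`.

```python
def backoff_delays(base_seconds=1.0, factor=2.0, max_retries=5, cap_seconds=30.0):
    """Yield the wait before each retry, growing by `factor` up to a cap."""
    delay = base_seconds
    for _ in range(max_retries):
        yield min(delay, cap_seconds)
        delay *= factor
```

Production systems often add random jitter to each delay so that many consumers do not retry in lockstep.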
- Backpressure
- occurs when the queue communicates back to producers: "I'm getting full; slow down!"
- Lesson 649 — Load Smoothing and BackpressureLesson 1080 — Queue Saturation and Backpressure LossLesson 1155 — Log Buffering and BackpressureLesson 1680 — Backpressure and Rate Limiting Updates
- Backpressure control
- Consumers naturally regulate their consumption speed
- Lesson 697 — Push vs Pull Consumption Models
- backup
- is a point-in-time copy of your data stored separately from the primary system.
- Lesson 1400 — What Are Backups and Why They MatterLesson 1401 — Backup vs Replication vs SnapshotsLesson 1444 — Communication Plans During Disasters
- Backup 2
- Weekly snapshots uploaded to cloud object storage (different region)
- Lesson 1407 — The 3-2-1 Backup Rule
- Backup and disaster recovery
- Replicas serve as live backups
- Lesson 198 — What is Database Replication?
- Backup challenges
- Each shard needs its own backup strategy.
- Lesson 264 — Operational Complexity of Sharded Systems
- Backup monitoring
- tracks the health of your backup processes continuously, while **alerting** notifies teams immediately when something goes wrong.
- Lesson 1410 — Backup Monitoring and Alerting
- Backup storage tiers
- match different storage technologies to different access patterns, balancing speed and cost.
- Lesson 1405 — Backup Storage TiersLesson 1429 — Geographic Backup Distribution
- Backup window time
- Copying 100 MB of changes beats copying 10 TB every night
- Lesson 1403 — Incremental Backups
- Backups
- are point-in-time copies of data designed for **disaster recovery**.
- Lesson 1401 — Backup vs Replication vs Snapshots
- backward compatibility
- means new versions still support old clients.
- Lesson 809 — Versioning and Backward CompatibilityLesson 1898 — Why API Versioning Matters
- Backward compatible
- New consumers can read old messages (add optional fields)
- Lesson 725 — Schema Registry and Evolution
- Backward recovery
- Use compensating transactions to undo completed steps
- Lesson 585 — Alternative: Saga Pattern Introduction
- Backward-compatible changes
- APIs must evolve without breaking existing consumers
- Lesson 791 — Independent Deployability
- Backward-compatible versioning
- to avoid breaking other services
- Lesson 808 — Team Coordination Overhead
- Bad (not idempotent)
- `balance = balance + 100` (running twice adds $200!)
- Lesson 679 — At-Least-Once Delivery
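One common way to make such an update safe under at-least-once delivery is to remember which message IDs have already been applied. A minimal sketch, in which the `Account` class and the `message_id` field are hypothetical:

```python
class Account:
    """Applies each credit message at most once, even if it is delivered twice."""

    def __init__(self, balance=0):
        self.balance = balance
        self._applied = set()  # IDs of messages already applied

    def apply_credit(self, message_id, amount):
        if message_id in self._applied:
            return False  # duplicate delivery: ignore it
        self.balance += amount
        self._applied.add(message_id)
        return True
```

In a real system the balance change and the record of the applied ID would need to be committed in the same transaction; otherwise a crash between the two steps reintroduces the duplicate.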
- Bad shard key
- `created_date` — recent dates create hotspots as all new writes go to one shard
- Lesson 232 — Shard Key Selection
- BadgerDB/LevelDB
- Embedded databases for small-scale or single-node deployments
- Lesson 1245 — Trace Storage Backends
- Balance needed
- Configure connection timeout, idle timeout, and max lifetime appropriately for your workload patterns.
- Lesson 275 — Common Pooling Anti-Patterns
- Balanced
- Set W = ⌊N/2⌋ + 1 and R = ⌊N/2⌋ + 1 (a majority on both sides, so R + W > N)
- Lesson 365 — Tunable Consistency with Quorum Reads and WritesLesson 373 — Replication and Quorum in DynamoLesson 556 — Read and Write Quorums
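The majority setting can be checked with a few lines of arithmetic. This sketch assumes integer division for N/2; the helper names `majority` and `quorums_overlap` are hypothetical.

```python
def majority(n):
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def quorums_overlap(n, w, r):
    """True when every read quorum shares at least one replica with every write quorum."""
    return w + r > n
```

For N = 3, `majority(3)` is 2, and W = R = 2 gives 2 + 2 > 3, so any read contacts at least one replica that saw the latest acknowledged write.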
- Bandwidth
- Lesson 33 — Putting It All Together: Worked ExampleLesson 1159 — Log Aggregation Performance ConsiderationsLesson 1584 — Image/Video Hosting: Problem Definition and Scale
- Bandwidth ceiling
- Even a high-speed connection (1 Gbps) tops out at roughly 125 MB per second.
- Lesson 1862 — Why Distribute a Web Crawler
- Bandwidth cost savings
- Once cached at the edge, the same file serves millions of users without touching your origin.
- Lesson 1609 — Why CDNs Are Essential for Media Hosting
- bandwidth costs
- when transferring pastes between regions or to CDN edge locations.
- Lesson 1562 — Content Compression and EncodingLesson 1621 — Compression and Format Optimization
- Bandwidth optimization
- Regional peering agreements reduce transit costs
- Lesson 1616 — Geographic Routing and DNS
- Bandwidth Savings
- Less data travels from your servers, reducing bandwidth costs.
- Lesson 168 — What is a CDN and Why Use ItLesson 173 — Content Types Suited for CDNsLesson 887 — API Composition and Aggregation
- Bandwidth vs Volume
- Sending full, unsampled logs gives you complete visibility but can saturate network links.
- Lesson 1159 — Log Aggregation Performance Considerations
- Banking and payments
- – transferring money requires exact balances, no partial updates
- Lesson 318 — When to Choose ACID or BASE
- BASE
- offers a more relaxed approach designed for systems that prioritize availability and partition tolerance over immediate consistency.
- Lesson 314 — BASE Properties Overview
- Base58
- Removes confusing characters like `0/O` and `l/I` = 58 characters
- Lesson 1500 — URL Length and Encoding Constraints
- Base62
- `[a-zA-Z0-9]` = 62 characters (alphanumeric, case-sensitive)
- Lesson 1500 — URL Length and Encoding Constraints
- Base62 encoding
- to create a URL-safe short code.
- Lesson 1508 — Hash-Based Generation ApproachLesson 1515 — Short URL Predictability Tradeoffs
- Base62 Encoding of IDs
- converts sequential database IDs (like auto-increment values) into short strings using alphanumeric characters (a-z, A-Z, 0-9).
- Lesson 1551 — Key Generation Strategy
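A minimal sketch of Base62 encoding for numeric IDs. The alphabet order (digits, then lowercase, then uppercase) is an assumption; any fixed ordering of the 62 characters works as long as encoder and decoder agree.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def base62_encode(n):
    """Convert a non-negative integer ID into a Base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, remainder = divmod(n, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def base62_decode(s):
    """Convert a Base62 string back into the integer ID."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Because the mapping is reversible, sequential IDs produce predictable codes, which is the tradeoff covered under short URL predictability.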
- Base64
- Adds `+` and `/` = 64 characters (URL-unsafe without encoding)
- Lesson 1500 — URL Length and Encoding Constraints
- Basic Pattern (Fixed Window)
- Lesson 980 — Redis-Based Distributed Rate Limiting
- Basically Available
- The system guarantees availability, even if some parts fail
- Lesson 314 — BASE Properties Overview
- Batch
- tolerates delays (minutes to hours).
- Lesson 746 — Choosing Batch vs StreamLesson 1704 — Batching for EfficiencyLesson 1858 — DNS Prefetching and Batch Resolution
- Batch export job
- 30-second timeout (legitimately slow)
- Lesson 1118 — Per-Operation Timeout Configuration
- Batch jobs
- that run periodically (hourly, daily)
- Lesson 294 — Aggregation TablesLesson 1726 — Aggregation and Reporting
- batch layer
- is your source of absolute truth.
- Lesson 748 — Lambda Architecture: Batch LayerLesson 750 — Lambda Architecture: Serving Layer
- Batch notifications
- Don't send 50 pushes for 50 new posts; aggregate: "15 new posts in your feed"
- Lesson 1681 — Mobile Push Notification Integration
- Batch operations
- Update multiple counters in one Redis pipeline instead of separate calls
- Lesson 977 — Algorithm Implementation Patterns
- Batch processing
- Converting uploaded videos, generating thumbnails
- Lesson 659 — Queue Use Cases: Work DistributionLesson 690 — What is Event Streaming?Lesson 736 — What is Batch Processing?Lesson 746 — Choosing Batch vs Stream
- Batch processing maximizes throughput
- by collecting data into large groups before processing.
- Lesson 740 — Latency vs Throughput Tradeoffs
- Batch Resolution
- Workers collect batches of hostnames from their URL frontier and resolve them in parallel using async DNS libraries.
- Lesson 1869 — Scaling DNS Resolution
- Batch views
- – Complete, accurate datasets processed by the batch layer
- Lesson 750 — Lambda Architecture: Serving Layer
- Batch-ack
- Multiple messages acknowledged together for performance
- Lesson 681 — Acknowledgment Mechanisms
- Batching
- Producers don't send every message immediately.
- Lesson 724 — Kafka Performance TuningLesson 1250 — Trace Collector and Agent PatternsLesson 1259 — Network and Agent OverheadLesson 1914 — DataLoader and Batching Solutions
- BC asks
- "How does customer support continue serving clients while the data center is down?"
- Lesson 1433 — Disaster Recovery vs Business Continuity
- Bearer Token Transport
- The client explicitly includes the token in the `Authorization` header (e.
- Lesson 918 — Cookie vs Bearer Token Transport
- Bearer tokens
- in headers require **explicit JavaScript handling**; because the browser never attaches them automatically, they are not exposed to classic CSRF attacks.
- Lesson 918 — Cookie vs Bearer Token Transport
- Before accepting a write
- , a replica checks: "Have I seen all the operations this write depends on?"
- Lesson 548 — Causal Consistency Implementation
- Before running any experiment
- Lesson 1346 — Blast Radius and Safety Controls
- Begin transaction
- Mark the start of atomic operations
- Lesson 310 — Atomicity: All-or-Nothing Transactions
- Benefit
- Your system remains available for writes even during network partitions or node failures
- Lesson 366 — Sloppy Quorums and Hinted HandoffLesson 683 — Consumer Acknowledgment TimingLesson 717 — Rebalancing Protocol and Strategies
- Benefits
- Lesson 72 — Multi-Leader Replication ModelLesson 136 — Write-Behind (Write-Back) Caching PatternLesson 143 — Multi-Tier Caching PatternLesson 288 — Why Denormalization?Lesson 293 — Duplicate Critical FieldsLesson 583 — Alternative: Best Effort with Eventual ConsistencyLesson 1357 — Semi-Synchronous ReplicationLesson 1363 — Statement-Based vs Row-Based Replication (+2 more)
- Benefits of pull
- Lesson 697 — Push vs Pull Consumption Models
- Best for
- High-throughput Java applications where every millisecond counts.
- Lesson 274 — Connection Pool TechnologiesLesson 386 — Embedded Documents vs ReferencesLesson 849 — Load Balancing Strategies in Service MeshLesson 1245 — Trace Storage BackendsLesson 1439 — Data Replication for DRLesson 1762 — Client-Side vs Server-Side Typeahead
- Better availability
- than strong consistency during network partitions
- Lesson 1397 — Bounded Staleness Consistency
- Better balance
- With 150+ virtual nodes per server, data distributes more evenly across the ring
- Lesson 363 — Virtual Nodes and Load Distribution
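A compact sketch of a consistent hash ring with virtual nodes. The `HashRing` class, the `server#index` naming of virtual nodes, and the use of SHA-256 are assumptions made for illustration; real systems differ in hash function and placement details.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a position on the ring (a 64-bit integer)."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, servers, vnodes=150):
        # Each server is placed on the ring `vnodes` times under different labels.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, key):
        """Return the server owning the first ring position clockwise from the key."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

With many virtual nodes per server, each server owns many small arcs of the ring instead of one large arc, which is what evens out the distribution.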
- Better bandwidth efficiency
- Your servers send data once to the CDN, which distributes it thousands of times
- Lesson 125 — CDN as Edge Caching Layer
- Better cache hit rates
- Origin Shield maintains a larger, warmer cache
- Lesson 179 — Origin Shield: Protecting Origin Servers
- Better distribution
- A simple `user_id` hash might still produce hotspots if some users generate far more data than others.
- Lesson 245 — Composite Shard Keys
- Better for ephemeral/batch jobs
- that finish quickly
- Lesson 1197 — Pull vs Push Metrics Collection Models
- Better geo-distribution
- – Distance to replicas doesn't slow down writes
- Lesson 1356 — Asynchronous Replication: Speed and Risk
- Better monetization
- You can offer tiered pricing that reflects actual infrastructure costs, not arbitrary request counts.
- Lesson 992 — Cost-Based Rate Limiting
- Better monitoring
- Track saga progress through orchestrator state
- Lesson 591 — Orchestration-Based Sagas
- Better partition tolerance
- The system stays operational during network splits
- Lesson 560 — Eventual Consistency with Quorums
- Better storage (SSDs)
- reduces disk I/O bottlenecks
- Lesson 54 — Scaling Databases: Special Considerations
- Better throughput
- especially for write-heavy workloads
- Lesson 136 — Write-Behind (Write-Back) Caching Pattern
- Better tradeoff decisions
- When you know the simple solution, you can justify why each added complexity is necessary
- Lesson 34 — Start Simple: The Minimum Viable Design
- Betweenness centrality
- Measures how often a node sits on shortest paths between others—who's the essential bridge?
- Lesson 468 — Graph Algorithms: PageRank and Centrality
- BFS
- for general-purpose crawlers that need fresh, diverse content across many domains (like search engines).
- Lesson 1830 — Breadth-First vs Depth-First Crawling
- BFS advantages
- Lesson 1830 — Breadth-First vs Depth-First Crawling
- BGP routing decides
- Internet routers use Border Gateway Protocol (BGP) to determine the "shortest" path, measured mainly by how many autonomous systems (AS hops) a route crosses
- Lesson 181 — Anycast Routing for CDNs
- Bidirectional Navigation
- Your observability UI should let you:
- Lesson 1249 — Integrating Traces with Logs and Metrics
- BigTable advantages
- Lesson 441 — HBase vs BigTable Design Differences
- Billing accuracy
- Track exactly how many requests each tenant made to charge appropriately
- Lesson 1825 — Monitoring and Analytics Per Tenant
- Binary blobs
- let you store images, serialized objects, or encrypted data.
- Lesson 341 — Data Types and Value Complexity
- Binary Data
- Raw bytes for storing files, encrypted data, or binary content
- Lesson 390 — BSON Format and Data Types
- Binary Log (binlog)
- Lesson 206 — Replication Logs and Mechanisms
- Bitrate variants
- Even at the same resolution, encode at different bitrates (e.
- Lesson 1601 — Video Transcoding Fundamentals
- Blackbox Monitoring
- observes your system as an external user would.
- Lesson 1266 — Blackbox vs Whitebox Monitoring
- blameless culture
- means treating system failures as organizational learning moments rather than individual mistakes deserving punishment.
- Lesson 1317 — Blameless Culture and Learning from FailureLesson 1350 — What is a Postmortem?
- Blazing fast reads
- feeds are pre-built, just fetch and display
- Lesson 1638 — Push (Write-Time) Feed Model
- Blazing fast writes
- no database latency in the critical path
- Lesson 136 — Write-Behind (Write-Back) Caching Pattern
- Blob Store
- Use distributed object storage (S3, HDFS) to hold actual HTML/content, keyed by hash
- Lesson 1870 — Content Storage and Deduplication
- Block full scans
- Reject requests without filters on massive tables
- Lesson 1897 — Performance Considerations and Limits
- Block or flag
- Reject malicious URLs immediately or flag suspicious ones for review
- Lesson 1540 — Spam and Malicious Link Detection
- Block producers
- Slow down the application (dangerous—can freeze critical paths)
- Lesson 1155 — Log Buffering and Backpressure
- Block reserved words
- Reject codes like `api`, `admin`, `delete`, `stats` that conflict with your service's routes
- Lesson 1514 — Custom Short URL Support
- Block storage
- works like a traditional hard drive attached to a server.
- Lesson 1588 — Object Storage vs Block Storage
- Blocking
- If the crawl delay hasn't elapsed, the queue remains blocked until the timer expires
- Lesson 1845 — Back Queue: Politeness Enforcement
- Blocking producers
- creates a domino effect: if Service A blocks waiting for queue space, its own queues fill, forcing *its* callers to block.
- Lesson 1080 — Queue Saturation and Backpressure Loss
- bloom filter
- is a space-efficient probabilistic data structure that answers: "Is this key *definitely not* in this SSTable?"
- Lesson 416 — Read Path and Bloom FiltersLesson 429 — Read Path and Bloom FiltersLesson 1853 — Bloom Filters for URL Seen Checking
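A minimal Bloom filter sketch showing why the answer is either "definitely not present" or "possibly present". The sizes, the SHA-256-based hashing scheme, and the class and method names are assumptions chosen for readability rather than efficiency.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        """Derive num_hashes bit positions by salting the key with an index."""
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos] for pos in self._positions(key))
```

A read path can skip any SSTable whose filter returns False for the key, and only pays for a disk lookup when the filter returns True.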
- Bloom filters
- for efficient reads.
- Lesson 433 — What is HBase?Lesson 437 — HBase Read Path and Bloom FiltersLesson 1533 — Access Control and Private URLs
- Body modification
- Convert JSON to XML, rename fields (`clientId` → `customer_id`), filter unnecessary data
- Lesson 882 — Request and Response Transformation
- Booking systems
- – preventing double-bookings for flights, hotels, or appointments
- Lesson 318 — When to Choose ACID or BASE
- BookKeeper
- (storage layer): Distributed log storage system whose storage nodes, called "bookies," durably store messages
- Lesson 730 — Apache Pulsar Architecture
- books
- (blobs) sit on shelves (object storage), while the **library catalog** (metadata database) tells you what books exist, who checked them out, and where to find them.
- Lesson 1590 — Metadata Database DesignLesson 1874 — What REST Means: Resource-Oriented Architecture
- both
- active checks provide baseline monitoring, while passive checks catch issues that only appear under real traffic patterns (like a server that handles pings fine but chokes on complex queries).
- Lesson 99 — Active vs Passive Health ChecksLesson 134 — Write-Through Caching PatternLesson 157 — Active Invalidation on WriteLesson 318 — When to Choose ACID or BASELesson 331 — What NewSQL IsLesson 337 — When to Choose NewSQLLesson 351 — Redis Persistence: AOF LogsLesson 596 — Forward Recovery vs Backward Recovery (+12 more)
- Bottleneck
- The central store becomes a single point of failure and a throughput limit.
- Lesson 1793 — Centralized vs Distributed Rate Limiting
- Bottleneck potential
- The shared store becomes a performance constraint at scale
- Lesson 979 — Centralized vs Decentralized Approaches
- Bottlenecks
- If 80% of traces show Service B taking longest (from **critical path analysis**), it's your slowest link.
- Lesson 1229 — Service Dependency Graphs
- bounded contexts
- each service focuses on a specific business capability with clear boundaries.
- Lesson 801 — Easier Onboarding and ContextLesson 817 — Identifying Service Boundaries by Data OwnershipLesson 824 — Avoiding Distributed MonolithsLesson 825 — Starting with a Modular Monolith
- Bounded convergence
- provides a concrete promise: "All replicas will converge within X milliseconds after the last write."
- Lesson 533 — Convergence Guarantees
- bounded staleness
- provides a middle ground between strong and eventual consistency by guaranteeing a maximum lag for replicas.
- Lesson 549 — Bounded StalenessLesson 554 — Consistency Model Examples in Real Systems
- Breadth coverage
- Ensures you sample widely across the web early
- Lesson 1830 — Breadth-First vs Depth-First Crawling
- Breadth-First Search (BFS)
- crawls all pages at one "level" before moving deeper.
- Lesson 1830 — Breadth-First vs Depth-First Crawling
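As an illustration of level-by-level crawling, here is a small sketch over an in-memory link graph (the graph, page names, and function name are made up for the example; a real crawler would fetch pages and extract links instead):

```python
from collections import deque


def bfs_crawl_order(link_graph, seed):
    """Return pages in breadth-first order starting from `seed`."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue:
        page = queue.popleft()  # FIFO queue keeps shallower pages first
        order.append(page)
        for link in link_graph.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Hypothetical site structure: each page maps to the links found on it.
link_graph = {
    "home": ["about", "blog"],
    "about": ["team"],
    "blog": ["post-1", "post-2"],
}
crawl_order = bfs_crawl_order(link_graph, "home")
```

Both level-one pages (`about`, `blog`) are visited before any level-two page, which is what distinguishes this from a depth-first crawl.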
- Breaking changes
- in backend services directly impact clients
- Lesson 870 — What is an API Gateway?Lesson 1905 — Breaking vs Non-Breaking Changes
- Broadcasts
- the query to all relevant shards (often all of them)
- Lesson 1769 — Horizontal Scaling of Search Infrastructure
- broker
- is a Kafka server that stores data and serves client requests.
- Lesson 700 — Kafka Overview and Core ComponentsLesson 704 — Brokers and Cluster Architecture
- Brokers
- (compute layer): Stateless servers that handle client connections, message routing, and subscription management
- Lesson 730 — Apache Pulsar Architecture
- Browser cache
- Instant, but only for one user
- Lesson 120 — Caching Hierarchy OverviewLesson 1766 — Caching Suggestions at Multiple Layers
- Browser-friendly
- Works seamlessly in browsers, cURL, and documentation
- Lesson 1899 — URI Versioning (Path-Based)
- Budget consistently underused
- Your SLO might be too conservative; consider tightening it or investing saved engineering effort elsewhere
- Lesson 1279 — Error Budgets: The Core Concept
- Budget Constraints
- Open-source solutions (Kong Community, Nginx) minimize licensing costs but increase operational overhead.
- Lesson 901 — Choosing the Right API Gateway TechnologyLesson 1260 — Cost-Benefit Analysis
- Budget depleted
- Freeze risky changes, focus on reliability improvements
- Lesson 1279 — Error Budgets: The Core Concept
- Buffering
- Queue spans temporarily when the backend is slow or unreachable
- Lesson 1250 — Trace Collector and Agent PatternsLesson 1259 — Network and Agent OverheadLesson 1698 — Message Queue for Decoupling
- Bugs surface immediately
- during development, not in production
- Lesson 301 — Schema Enforcement and Type Safety
- Build
- a new microservice that implements that feature
- Lesson 822 — The Strangler Fig Pattern for Migration
- Build muscle memory
- for your runbooks and playbooks without customer impact
- Lesson 1345 — Starting with Game Days
- Build verification
- compile-time checks, linting, security scanning
- Lesson 1314 — Release Engineering and Safe Deployment
- Built-in atomic operations
- `INCR`, `EXPIRE`, Lua scripts prevent race conditions
- Lesson 1807 — In-Memory vs Persistent Storage for Rate Limiting
- Built-in data structures
- Store complex session data like shopping carts using Redis hashes or lists.
- Lesson 356 — Redis as a Session Store
- Built-in health checks
- Automated monitoring of target health
- Lesson 113 — Cloud Load Balancers (AWS ELB/ALB)
- Built-in Observability
- Envoy generates detailed metrics, logs, and distributed traces out-of-the-box.
- Lesson 115 — Envoy Proxy Architecture
- Built-in Proxy
- A lightweight, native Go proxy that comes with Consul.
- Lesson 863 — Consul Connect: HashiCorp's Approach
- Bulk Operations
- Enable users to delete multiple pastes, change privacy settings in batch, or export their content, all requiring authorization checks to prevent unauthorized access.
- Lesson 1578 — User Accounts and Paste Management
- Bulkhead Pattern
- isolates different parts of your system into separate resource pools—just like the watertight compartments (bulkheads) in a ship's hull.
- Lesson 1337 — Bulkhead Pattern for Fault Isolation
- Bulkheads
- prevent resource exhaustion by isolating resource pools.
- Lesson 1074 — Bulkheads vs Circuit Breakers: Complementary PatternsLesson 1078 — Cascading Failure Propagation MechanicsLesson 1080 — Queue Saturation and Backpressure Loss
- Bulkheads provide resource isolation
- Even if one circuit breaker trips, the bulkhead ensures that failure is contained to a specific resource pool (thread pool, connection pool, or semaphore).
- Lesson 1085 — Preventing Cascades with Circuit Breakers and Bulkheads
- Burst allowance
- You can spend $2,000 in one day, but it still counts against your monthly total
- Lesson 994 — Quota Management and Burst AllowancesLesson 1824 — Tiered Rate Limiting
- Burst credits
- Allow exceeding the steady-state rate temporarily if quota headroom exists
- Lesson 994 — Quota Management and Burst Allowances
- Burst scenarios
- Send traffic that rapidly hits the limit to verify bucket/counter algorithms respond correctly
- Lesson 997 — Testing and Monitoring Rate Limiters
- Bursty traffic
- Accept thousands of requests instantly, process them gradually over time
- Lesson 650 — Temporal Decoupling
- Business analytics
- track API usage patterns: which clients call which endpoints most, geographic distribution, peak usage times, and feature adoption rates.
- Lesson 890 — Logging and Metrics Collection
- Business Continuity
- is the broader organizational strategy for keeping critical business operations running during *and* after any disruption—including non-technical issues like pandemics, supply chain failures, or key personnel losses.
- Lesson 1433 — Disaster Recovery vs Business Continuity
- Business impact
- Shorter windows mean faster recovery attempts but more probing traffic during actual outages.
- Lesson 1059 — Timeout Windows and Reset Logic
- Business Impact Analysis (BIA)
- is the structured process of identifying what each service's downtime and data loss actually *cost* your organization, so you can set appropriate RPO/RTO targets and justify the investment needed to meet them.
- Lesson 1420 — Business Impact Analysis for RPO/RTO
- Business impact drives urgency
- A P0 for a payments service during Black Friday demands instant all-hands response.
- Lesson 1298 — Incident Severity Levels and Escalation
- Business Insights
- Beyond technical health, monitoring can track business metrics like transaction volumes or user sign-ups.
- Lesson 1262 — What is Monitoring and Why It Matters
- Business Logic Creep
- Lesson 907 — BFF Anti-Patterns and Pitfalls
- Business Metrics
- Lesson 1825 — Monitoring and Analytics Per Tenant
- Business priorities
- Admin edits override user edits
- Lesson 1383 — Application-Level Conflict Resolution
- Business priority rules
- VIP customers, loyalty tiers, regulatory requirements
- Lesson 1387 — Custom Merge Functions
- Business rules
- Always replicate content from premium users or verified creators
- Lesson 1631 — Multi-Region Replication Strategy
- Business units
- Separate data by department, brand, or subsidiary
- Lesson 1452 — List-Based Partitioning
- Business-critical events
- User sign-ups, purchases, authentication successes/failures, permission changes.
- Lesson 1129 — What to Log vs What Not to Log
- Business-Focused Boundaries
- Services map to business domains (inventory, shipping, recommendations), not technical layers.
- Lesson 781 — What are Microservices?
C
- cache
- first.
- Lesson 6 — Components of a System Design SolutionLesson 142 — Look-Aside vs Inline Cache TopologiesLesson 1552 — Initial Architecture Diagram
- Cache control issues
- You can't force clients to refresh cached DNS records
- Lesson 116 — DNS-Based Load Balancing
- Cache frequent queries
- Expensive aggregations or reports should be pre-computed or cached
- Lesson 1897 — Performance Considerations and Limits
- Cache hit
- Data is in memory → ultra-fast return
- Lesson 126 — Database Internal Caching (Buffer Pool)Lesson 131 — Cache-Aside (Lazy Loading) PatternLesson 172 — Cache Hit vs Cache Miss at the EdgeLesson 1523 — Caching Layer ArchitectureLesson 1539 — QR Code GenerationLesson 1558 — Read Path: Cache-Aside PatternLesson 1569 — CDN Integration for Paste DeliveryLesson 1702 — User Preferences Lookup
- Cache Hit Rates
- measure how often results come from cache versus requiring expensive index lookups.
- Lesson 1777 — Query Performance Monitoring
- cache hit ratio
- is the percentage of requests served from cache.
- Lesson 27 — Memory Requirements for CachingLesson 129 — Cache Hit Ratio OptimizationLesson 172 — Cache Hit vs Cache Miss at the Edge
- Cache hit ratios
- (how often edge serves content without origin)
- Lesson 191 — CDN Provider Feature ComparisonLesson 1609 — Why CDNs Are Essential for Media Hosting
- cache invalidation
- (marking content as stale) come in.
- Lesson 185 — Purging and Cache Invalidation StrategiesLesson 660 — Pub-Sub Use Cases: Event Broadcasting
- Cache invalidation complexity
- You must decide when to remove or update cached entries.
- Lesson 132 — Cache-Aside: Pros and Cons
- Cache Invalidation Problem
- you learned about earlier by providing a simple, time-based invalidation strategy.
- Lesson 156 — Time-Based Expiration (TTL)
- Cache is optional
- The database is the source of truth; cache failures don't break your system
- Lesson 131 — Cache-Aside (Lazy Loading) Pattern
- Cache key structure
- Include user ID, action, and resource.
- Lesson 951 — Caching Authorization Decisions
- Cache miss
- Data must be loaded from disk → slower, but then cached for next time
- Lesson 126 — Database Internal Caching (Buffer Pool)Lesson 131 — Cache-Aside (Lazy Loading) PatternLesson 172 — Cache Hit vs Cache Miss at the EdgeLesson 179 — Origin Shield: Protecting Origin ServersLesson 1523 — Caching Layer ArchitectureLesson 1539 — QR Code GenerationLesson 1558 — Read Path: Cache-Aside PatternLesson 1569 — CDN Integration for Paste Delivery (+1 more)
- Cache results
- Store threat verdicts to avoid repeated API calls for the same domain
- Lesson 1540 — Spam and Malicious Link Detection
- Cache subscribers receive event
- across different services or regions
- Lesson 158 — Event-Based Invalidation
- Cache tagging
- means attaching labels (tags) to cache entries when you store them.
- Lesson 164 — Cache Tagging and Grouping
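One way tagging might look in practice is sketched below, assuming the common use of tags for group invalidation (the `TaggedCache` class and its method names are hypothetical, and a dictionary stands in for the cache):

```python
from collections import defaultdict


class TaggedCache:
    """In-memory cache that records which keys carry which tags."""

    def __init__(self):
        self.entries = {}
        self.keys_by_tag = defaultdict(set)

    def set(self, key, value, tags=()):
        self.entries[key] = value
        for tag in tags:
            self.keys_by_tag[tag].add(key)  # attach the label at store time

    def get(self, key):
        return self.entries.get(key)

    def invalidate_tag(self, tag):
        # Drop every entry that was stored with this tag.
        for key in self.keys_by_tag.pop(tag, set()):
            self.entries.pop(key, None)


cache = TaggedCache()
cache.set("page:/products/42", "<html>42</html>", tags=["product:42", "catalog"])
cache.set("page:/products", "<html>list</html>", tags=["catalog"])
cache.set("page:/about", "<html>about</html>", tags=["static"])
cache.invalidate_tag("catalog")
```

After the call to `invalidate_tag("catalog")`, both product pages are gone while the untagged-for-catalog `page:/about` entry remains.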
- Cache tags/keys
- Use consistent naming so one invalidation command affects all layers
- Lesson 163 — Multi-Level Cache Invalidation
- Cache Validation Results
- Lesson 950 — Auth Service Single Point of Failure
- Cache warming
- solves this problem by stocking your kitchen *before* opening the doors.
- Lesson 140 — Cache Warming StrategiesLesson 161 — Cache Warming StrategiesLesson 1611 — Multi-Tier Caching Architecture
- cache-aside pattern
- (also called lazy loading), your application code takes full responsibility for managing the cache.
- Lesson 131 — Cache-Aside (Lazy Loading) PatternLesson 133 — Read-Through Caching PatternLesson 1523 — Caching Layer ArchitectureLesson 1558 — Read Path: Cache-Aside PatternLesson 1722 — Real-Time Preference Updates
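A minimal sketch of the read path, with plain dictionaries standing in for both the cache and the database (the class name and the `db_reads` counter are illustrative only):

```python
class CacheAsideReader:
    """Application-managed cache: check cache, fall back to the database."""

    def __init__(self, database: dict):
        self.database = database  # source of truth (stand-in for a real DB)
        self.cache = {}           # stand-in for Redis or Memcached
        self.db_reads = 0

    def get(self, key):
        if key in self.cache:          # cache hit
            return self.cache[key]
        self.db_reads += 1             # cache miss: go to the database
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value    # populate the cache for next time
        return value


reader = CacheAsideReader({"abc123": "https://example.com/long/path"})
first = reader.get("abc123")   # miss, loads from the database
second = reader.get("abc123")  # hit, served from the cache
```

Note that the application code, not the cache, decides when to load and store; that is the defining trait of cache-aside compared with read-through caching.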
- Cache-Control
- is the modern workhorse.
- Lesson 121 — Browser Caching and HTTP HeadersLesson 1570 — CDN Cache Control Headers
- cache-control headers
- from backend responses to guide caching decisions.
- Lesson 888 — Caching at the GatewayLesson 1569 — CDN Integration for Paste Delivery
- Cached inheritance maps
- Pre-compute effective permissions for performance
- Lesson 939 — Permission Inheritance and Hierarchies
- Caching
- Lesson 33 — Putting It All Together: Worked ExampleLesson 39 — Trade-offs Over Best PracticesLesson 338 — What is a Key-Value Store?Lesson 343 — Time-to-Live and ExpirationLesson 1042 — Idempotency vs Performance TradeoffsLesson 1914 — DataLoader and Batching Solutions
- Caching aggressively
- Cache celebrity data at multiple levels (CDN, application cache) with longer TTLs.
- Lesson 257 — Celebrity Problem in Social Graphs
- Caching is critical
- With 99%+ operations being reads, aggressive caching at multiple levels (CDN, application cache, database query cache) becomes non-negotiable for performance and cost.
- Lesson 1636 — Capacity Estimation: Feed Reads vs Writes
- Caching Strategy
- Expired links must be evicted from cache (lesson 1502) to prevent serving dead redirects.
- Lesson 1504 — Link Expiration and Retention Policies
- Caching-friendly
- Different versions can have independent cache policies
- Lesson 1899 — URI Versioning (Path-Based)
- Calculate percentiles
- from this distribution (P50, P95, P99)
- Lesson 1117 — Adaptive Timeouts Based on Historical Latency
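For illustration, percentiles can be computed with the nearest-rank method, as in this sketch (the sample data is synthetic and the function name is an assumption; production systems usually rely on histograms or sketches rather than sorting raw samples):

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic latency distribution: 1 ms, 2 ms, ..., 100 ms.
latencies_ms = list(range(1, 101))
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```

An adaptive timeout could then be derived from a high percentile such as `p99`, possibly with a multiplier; the right choice depends on the service.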
- Calculate remaining budget
- after accounting for time already spent
- Lesson 1098 — Per-Hop Timeout Budgets
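The arithmetic is simple but easy to get wrong at the edges, so here is a hedged sketch (function names and the per-hop cap parameter are assumptions made for the example):

```python
def remaining_budget_ms(total_budget_ms: float, elapsed_ms: float) -> float:
    """Time left for downstream work; never negative."""
    return max(0.0, total_budget_ms - elapsed_ms)


def downstream_timeout_ms(total_budget_ms: float, elapsed_ms: float,
                          hop_cap_ms: float) -> float:
    """Give the next hop the smaller of its own cap and what is left."""
    return min(hop_cap_ms, remaining_budget_ms(total_budget_ms, elapsed_ms))
```

For example, with a 1000 ms request budget and 900 ms already spent, `downstream_timeout_ms(1000, 900, 500)` yields 100 ms rather than the hop's usual 500 ms cap.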
- Calculate split point
- – find the midpoint key in the partition's range
- Lesson 1475 — Dynamic Range Splitting
- Calculates remaining time
- Downstream services see how much budget is left
- Lesson 1104 — gRPC Timeout Propagation
- Calculation
- Lesson 25 — Storage Estimation Basics
- Calendar windows
- This week, this month, this quarter
- Lesson 1274 — SLI Measurement Windows and AggregationLesson 1277 — SLO Time Windows: Rolling vs Calendar
- Camunda
- provides a visual workflow designer and execution engine, popular in enterprise settings.
- Lesson 598 — Saga Frameworks and Real-World Adoption
- CAN-SPAM
- (US) require verifiable records showing when users opted out, what they opted out of, and that you stopped messaging them accordingly.
- Lesson 1728 — Opt-Out and Compliance Tracking
- Canary deployments
- gradually shift traffic to a new version.
- Lesson 848 — Traffic Management and RoutingLesson 1314 — Release Engineering and Safe Deployment
- Cancel the stale request
- for "ama"—its results are obsolete.
- Lesson 1763 — Debouncing and Request Optimization
- Cannot aggregate
- percentiles across multiple instances (you can't average p99s meaningfully)
- Lesson 1186 — Summary Metrics
- Cannot be trusted
- because clients can be modified, bypassed, or malicious
- Lesson 1789 — Client-Side vs Server-Side Rate Limiting
- Cannot revoke mid-lifetime
- Valid token stays valid until expiration
- Lesson 916 — Session vs Token Tradeoffs
- CAP Availability
- = Both branches stay open and answer questions using whatever information they have locally, even if it's outdated
- Lesson 485 — Availability in CAP Context
- CAP Theorem
- (also called Brewer's Theorem) states that any distributed database system can simultaneously guarantee **at most two** of these three properties:
- Lesson 481 — What CAP Theorem StatesLesson 484 — Consistency in CAP Context
- Capacity limits
- Your dataset must fit in available memory
- Lesson 349 — Redis In-Memory Storage ModelLesson 1491 — Data Skew and Cardinality Issues
- Capacity model
- | Pre-provisioned per shard | Broker disk/network based |
- Lesson 728 — AWS Kinesis Overview
- Capacity Planning
- Understanding resource utilization trends helps you scale proactively, not reactively during an outage.
- Lesson 1262 — What is Monitoring and Why It MattersLesson 1323 — Mean Time Between Failures (MTBF)Lesson 1825 — Monitoring and Analytics Per Tenant
- Carbon
- The listener daemon that receives metrics over the network (typically via plaintext protocol or pickle)
- Lesson 1202 — Graphite Time-Series Database
- Cardinality
- (enough unique values)
- Lesson 397 — Shard Key SelectionLesson 1178 — Metric Cardinality and LabelsLesson 1203 — InfluxDB and Time-Series DatabasesLesson 1258 — Cardinality ExplosionLesson 1269 — Time Series Databases for MetricsLesson 1491 — Data Skew and Cardinality Issues
- Cardinality limits
- Intentionally push high-cardinality metrics to ensure your system rejects or samples them appropriately—preventing production label explosions.
- Lesson 1218 — Testing Metric Pipelines
- Careful shard key selection
- Design so related data lives on the same shard (like keeping all data for a user together)
- Lesson 261 — Distributed Transactions Across Shards
- Carrier filtering
- Spam filters may block messages with certain keywords or patterns
- Lesson 1685 — SMS Notifications
- Cart updates are low-stakes
- If two data centers briefly disagree about your cart contents, no money has changed hands yet
- Lesson 498 — Shopping Cart Systems (AP)
- Cascade options
- You can configure what happens on delete—cascade the delete to child records, set foreign keys to NULL, or reject the operation
- Lesson 300 — Foreign Keys and Referential Integrity
- Cascading delays
- Slow downstream services block upstream callers, even when the top-level request already timed out
- Lesson 1096 — Why Timeouts Must Propagate
- Cascading effects
- Too short, and you risk repeatedly hammering a struggling service.
- Lesson 1059 — Timeout Windows and Reset LogicLesson 1125 — Timeout Testing and Chaos Engineering
- Cascading failures
- Service A's error might be caused by Service B, which is actually failing because of Service C
- Lesson 807 — Debugging and TroubleshootingLesson 1043 — What Is a Circuit Breaker?
- Cascading invalidation
- Application triggers invalidation at each layer sequentially
- Lesson 163 — Multi-Level Cache Invalidation
- Cassandra
- took BigTable's columnar model and storage engine (LSM trees, SSTables, compaction) but married them with Dynamo's decentralized, peer-to-peer architecture.
- Lesson 450 — BigTable's Influence on Modern SystemsLesson 494 — AP Systems: Prioritizing AvailabilityLesson 517 — PA/EL Systems: Availability and Latency FirstLesson 521 — PACELC Tradeoffs in Real SystemsLesson 533 — Convergence GuaranteesLesson 1242 — Zipkin Architecture and Design
- Cassandra, Elasticsearch, Kafka
- (as a buffer), and in-memory stores for testing.
- Lesson 1241 — Jaeger Architecture and Components
- Catch-up
- Other replicas begin replicating from the newly promoted primary
- Lesson 207 — Replica Promotion and Failover Basics
- Category diversity
- Include URLs spanning different topics, languages, and regions.
- Lesson 1828 — Seed URLs and Starting Point
- Causal consistency
- sits in the middle: if operation A causally influences operation B, everyone sees A before B.
- Lesson 507 — Consistency is a Spectrum in PracticeLesson 541 — The Consistency SpectrumLesson 607 — Consensus vs Consistency ModelsLesson 1396 — Implementing Consistency with Vector Clocks
- CDN
- strategies for static content delivery.
- Lesson 95 — Geographic/Proximity-Based RoutingLesson 120 — Caching Hierarchy Overview
- CDN Edge Cache
- Deploy regional edge servers that cache popular prefix responses.
- Lesson 1766 — Caching Suggestions at Multiple Layers
- CDN edge locations
- using the push model you studied earlier, minimizing propagation delay.
- Lesson 1630 — Live Streaming Architecture
- CDN integration
- , and **pennies-per-GB pricing**.
- Lesson 1550 — Object Storage for Paste ContentLesson 1593 — Distributed File System Considerations
- CDN offloading
- Popular links can be cached at edge, dramatically reducing origin bandwidth
- Lesson 1499 — Bandwidth Requirements for Redirects
- CDN provider
- (Cloudflare, AWS CloudFront, Fastly have different rates)
- Lesson 30 — CDN Bandwidth and Cost Estimation
- Celebrity engagement
- (how often they post) also matters.
- Lesson 1658 — Fanout Strategy Selection Criteria
- Celebrity Post Cache
- When a celebrity posts, store it in a dedicated, shared cache keyed by the celebrity's ID
- Lesson 1655 — Celebrity Follower Caching
- Celebrity post latency
- Track high-follower accounts separately—they're outliers
- Lesson 1657 — Measuring Fanout Performance
- Celebrity posts
- Often 1-5 seconds to reach millions (high priority)
- Lesson 1671 — Real-Time Requirements for Social Feeds
- Celebrity User Problem
- when one key generates disproportionate traffic that a single partition cannot handle, creating a **hot spot** that degrades performance for everyone hitting that partition.
- Lesson 1483 — Celebrity User Problem
- Celebrity/influencer effect
- creates hotspots when specific entities generate massive traffic.
- Lesson 234 — Data Distribution and Hotspots
- Centralization
- Update "order shipped" message once, affects all future notifications instantly—no code deploys needed.
- Lesson 1701 — Template Service for Content
- Centralized Aggregation
- uses log shipping tools (Fluentd, Logstash) or streaming platforms (Kafka) to funnel events to a single data store—often a time-series database or specialized SIEM (Security Information and Event Management) system.
- Lesson 954 — Distributed Auth Audit Logging
- Centralized control
- The monitoring system controls scrape intervals and targets
- Lesson 1197 — Pull vs Push Metrics Collection Models
- Centralized logging
- means all services send their logs to a single, unified system where you can search, filter, and analyze them together.
- Lesson 1148 — Centralized Logging ArchitectureLesson 1169 — Centralized vs Localized Logging
- Centralized logic
- Instead of every microservice implementing routing rules, one proxy handles it for all connections.
- Lesson 222 — Proxy-Based Read-Write SplittingLesson 591 — Orchestration-Based Sagas
- Centralized management
- Change a role's permissions once, affects all users with that role
- Lesson 933 — Role-Based Access Control (RBAC) Fundamentals
- Centralized policies
- Apply rate limiting, authentication differently per version
- Lesson 1907 — Gateway-Level Version Routing
- Centralized Policy Definition
- Set timeout rules in YAML or configuration APIs once, apply everywhere.
- Lesson 1126 — Timeout Configuration in Service Mesh
- Centralized Politeness Service
- Workers consult a shared politeness table (Redis or similar) before crawling.
- Lesson 1868 — Coordinating Politeness Across Workers
- Centralized Session Store
- Lesson 947 — Distributed Session Management
- Centralized State
- All servers query the same counters
- Lesson 980 — Redis-Based Distributed Rate Limiting
- Centralized validation
- Route token checks through a single service with one authoritative clock.
- Lesson 949 — Clock Skew and Token Validation
- Certificate Management
- You only need to install and renew SSL certificates on the load balancer, not across dozens or hundreds of backend servers.
- Lesson 118 — SSL/TLS Termination at Load BalancersLesson 861 — Istio: Architecture and Components
- Certificate management simplicity
- Instead of distributing SSL certificates to dozens of microservices, you manage them in one place: the gateway.
- Lesson 891 — SSL/TLS Termination
- Challenge at scale
- Comparing every new fingerprint against billions of stored ones is impractical.
- Lesson 1855 — Near-Duplicate Detection with Simhash
- Challenges
- Lesson 72 — Multi-Leader Replication ModelLesson 158 — Event-Based InvalidationLesson 1422 — Incremental Backup StrategyLesson 1452 — List-Based PartitioningLesson 1520 — Primary Key Selection: Auto-Increment vs UUIDLesson 1788 — High-Level Architecture: Edge vs Centralized
- Challenges with push
- Lesson 697 — Push vs Pull Consumption Models
- Change Data Capture
- monitors databases for INSERT, UPDATE, and DELETE operations, turning these changes into events that flow through streaming pipelines.
- Lesson 776 — Change Data Capture Tools
- Changing data types
- Switching `age` from integer to string breaks parsing logic
- Lesson 1905 — Breaking vs Non-Breaking Changes
- Channel Provider Abstraction
- defines a common contract that all vendors must implement:
- Lesson 1690 — Channel Provider AbstractionLesson 1695 — Fallback and Retry Logic
- Channel-specific formats
- (HTML for email, plain text for SMS)
- Lesson 1701 — Template Service for Content
- Channel-specific workers
- that handle the actual sending:
- Lesson 1696 — Notification System High-Level Architecture
- Chaos Monkey
- randomly terminates virtual machine instances in production.
- Lesson 1348 — Chaos Engineering Tools
- Character limits
- 160 characters for standard GSM; Unicode reduces to 70
- Lesson 1685 — SMS Notifications
- Character Normalization
- Store multiple normalized forms of each query.
- Lesson 1768 — Typeahead for Multi-Language Support
- Character set size
- How many distinct characters can you use?
- Lesson 1500 — URL Length and Encoding Constraints
- Characteristics
- Lesson 80 — Layer 4 vs Layer 7 Load BalancingLesson 183 — Pull vs Push CDN ModelsLesson 644 — Synchronous vs Asynchronous CommunicationLesson 1405 — Backup Storage Tiers
- Chat applications
- maintaining active WebSocket connections
- Lesson 56 — What Makes a Service StatefulLesson 357 — Redis Pub/Sub for Real-Time Messaging
- Chatty inter-service calls
- Service A calling services B, C, and D synchronously to complete a single user request creates tight coupling.
- Lesson 824 — Avoiding Distributed Monoliths
- Check availability
- Query your database to see if `ceo-blog` already exists
- Lesson 1514 — Custom Short URL SupportLesson 1531 — Custom Aliases and Vanity URLs
- Check before executing
- Each step first queries: "Did I already finish?"
- Lesson 1037 — Idempotency in Distributed Workflows
- Check cache first
- Look up the short URL key in your caching layer (Redis, Memcached)
- Lesson 1524 — Cache-Aside Pattern for URL LookupsLesson 1539 — QR Code GenerationLesson 1558 — Read Path: Cache-Aside PatternLesson 1664 — Timeline Caching Strategies
- Check limit
- If the returned value exceeds your threshold, reject the request
- Lesson 1794 — Redis-Based Rate Limiting with INCR
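The increment-then-compare step can be sketched without a Redis server by letting a dictionary play the role of the counters (the class, the fixed-window keying, and the explicit `now` parameter are assumptions for the example; against real Redis the increment would be an `INCR` on a per-window key):

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Fixed-window counter; the dict stands in for Redis keys."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counters = defaultdict(int)

    def allow(self, client_id: str, now: float) -> bool:
        window = int(now // self.window_seconds)
        key = (client_id, window)
        self.counters[key] += 1                    # plays the role of INCR
        return self.counters[key] <= self.limit    # reject once over threshold


limiter = FixedWindowLimiter(limit=3, window_seconds=60)
decisions = [limiter.allow("client-a", now=0) for _ in range(4)]
next_window = limiter.allow("client-a", now=60)
```

The fourth call in the same window is rejected, and the counter effectively resets when the next window begins.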
- Check locally first
- If the counter exists and is within the current time window, increment it in-memory (nanoseconds, not milliseconds)
- Lesson 1801 — Local Caching for Performance
- Check positions
- For each matching doc, verify "learning" appears exactly one position after "machine"
- Lesson 1751 — Phrase Queries and Positional Indexes
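A small sketch of the position check for a two-word phrase, assuming a positional index shaped as term → document ID → list of positions (the index contents and function name are invented for the example):

```python
def phrase_match_docs(positional_index, first, second):
    """Return doc IDs where `second` appears exactly one position after `first`."""
    first_postings = positional_index.get(first, {})
    second_postings = positional_index.get(second, {})
    matches = []
    # Only documents containing both terms can match the phrase.
    for doc_id in sorted(first_postings.keys() & second_postings.keys()):
        second_positions = set(second_postings[doc_id])
        if any(pos + 1 in second_positions for pos in first_postings[doc_id]):
            matches.append(doc_id)
    return matches


# Hypothetical index: doc 1 contains "machine learning", doc 2 has both words apart.
positional_index = {
    "machine": {1: [0, 7], 2: [3]},
    "learning": {1: [1], 2: [9]},
}
phrase_docs = phrase_match_docs(positional_index, "machine", "learning")
```

Document 2 contains both terms but is excluded because "learning" does not directly follow "machine".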
- Check robots.txt regularly
- Sites update their rules; cache the file but refresh it periodically (every 24 hours is common).
- Lesson 1831 — Robots.txt and Crawl Etiquette
- Check SSTables on disk
- immutable files stored in GFS, potentially many of them
- Lesson 449 — Read Path and Compaction
- Check the cache first
- When your app needs data, it looks in the cache
- Lesson 131 — Cache-Aside (Lazy Loading) Pattern
- Check the MemStore first
- – Since recent writes live in memory, HBase looks here before touching disk
- Lesson 437 — HBase Read Path and Bloom Filters
- Check the MemTable
- the in-memory structure holding recent writes
- Lesson 449 — Read Path and Compaction
- Check the memtable first
- (in-memory write buffer) — fastest lookup
- Lesson 429 — Read Path and Bloom Filters
- Checkout timeout
- (or *connection wait timeout*): How long a thread will wait to acquire a connection from the pool before giving up.
- Lesson 272 — Connection Timeouts and Limits
- Checkpoint after each step
- Store completion markers (e.
- Lesson 1037 — Idempotency in Distributed Workflows
- Checkpoint the frontier periodically
- to durable storage.
- Lesson 1849 — URL Frontier Persistence and Recovery
- Checksum validation
- Comparing cryptographic hashes of backed-up data against originals
- Lesson 1408 — Backup Verification and Testing
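As a rough sketch, the comparison might use SHA-256 digests (the function names are illustrative, and real backup tools typically hash in streaming chunks rather than loading whole files into memory):

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def backup_matches_original(original: bytes, backup: bytes) -> bool:
    """True when both byte strings hash to the same digest."""
    return sha256_hex(original) == sha256_hex(backup)


original = b"customer-table-dump-v1"
intact_copy = bytes(original)
corrupted_copy = b"customer-table-dump-v2"
```

Here `backup_matches_original(original, intact_copy)` holds while the corrupted copy fails the check.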
- Choose Availability (AP)
- Accept writes on both sides of the partition.
- Lesson 483 — The CAP Tradeoff During Partitions
- Choose Consistency (CP)
- Reject write requests until the partition heals.
- Lesson 483 — The CAP Tradeoff During Partitions
- Choose counters
- when you need maximum URL brevity, can handle centralized ID coordination, and predictability isn't a security concern (most URL shorteners).
- Lesson 1516 — Counter-Based vs UUID Approaches
- Choose document stores when
- Lesson 419 — Wide-Column vs Document Stores
- Choose GraphQL when
- Lesson 1918 — gRPC vs REST vs GraphQL: When to Use Each
- Choose gRPC when
- Lesson 1918 — gRPC vs REST vs GraphQL: When to Use Each
- Choose REST when
- Lesson 1918 — gRPC vs REST vs GraphQL: When to Use Each
- Choose UUIDs
- when you need truly distributed generation without coordination, security through obscurity matters, or you're operating at extreme scale where multiple data centers must generate IDs independently.
- Lesson 1516 — Counter-Based vs UUID Approaches
- Choose wide-column stores when
- Lesson 419 — Wide-Column vs Document Stores
- Choreography
- distributes logic across services—each listens for events and decides what to do next.
- Lesson 592 — Choreography vs Orchestration Tradeoffs
- Chronological feeds
- display posts in time order (newest first).
- Lesson 1644 — Feed Personalization and Ranking Requirements
- Chubby
- , a distributed lock service similar to ZooKeeper, for critical coordination tasks: discovering tablet servers, storing schema information, and managing master election.
- Lesson 439 — Google BigTable Architecture
- Chunking
- Data is organized into time-based blocks (e.
- Lesson 1269 — Time Series Databases for MetricsLesson 1586 — Multipart Upload for Large Files
- circuit breaker
- works like an electrical circuit breaker in your home.
- Lesson 105 — Graceful Degradation and Circuit BreakingLesson 1030 — Combining Retries with Circuit BreakersLesson 1043 — What Is a Circuit Breaker?
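To make the analogy concrete, here is a simplified sketch with closed, open, and half-open behavior (the thresholds, the explicit `now` argument, and the use of `RuntimeError` are assumptions chosen to keep the example deterministic; production libraries differ in detail):

```python
class CircuitBreaker:
    """Rejects calls after repeated failures, then allows a trial call later."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, now: float):
        if self.opened_at is not None:
            if now - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: call rejected")
            # Reset timeout elapsed: fall through and allow one trial call.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = now  # trip (or re-trip) the breaker
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers fail immediately instead of waiting on a dependency that is already struggling.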
- Circuit breaker integration
- Does a slow dependency correctly trip the breaker before exhausting resources?
- Lesson 1125 — Timeout Testing and Chaos Engineering
- Circuit breaker per dependency
- Payments failing won't flood the system with doomed retries
- Lesson 1085 — Preventing Cascades with Circuit Breakers and Bulkheads
- Circuit breakers
- prevent cascading failures by stopping requests to failing dependencies.
- Lesson 1074 — Bulkheads vs Circuit Breakers: Complementary PatternsLesson 1078 — Cascading Failure Propagation MechanicsLesson 1080 — Queue Saturation and Backpressure LossLesson 1784 — Non-Functional Requirements: Latency and Availability
- Circuit Breaking
- When a downstream service becomes unhealthy, the proxy can "trip" a circuit breaker, temporarily stopping requests to that service to prevent cascading failures, similar to an electrical circuit breaker protecting your home.
- Lesson 839 — Data Plane: Proxy ResponsibilitiesLesson 840 — Data Plane: Envoy Proxy FundamentalsLesson 877 — The API Gateway Bottleneck RiskLesson 1167 — Avoid Log Explosion
- Circuit Breaking and Fallbacks
- Lesson 950 — Auth Service Single Point of Failure
- Circular transactions
- Money moving A → B → C → A to legitimize stolen funds
- Lesson 474 — Fraud Detection Through Pattern Matching
- Classification models
- detect categories like violence, adult content, hate symbols
- Lesson 1629 — Content Moderation at Scale
- Clean separation
- Backend services focus on business logic, not version negotiation
- Lesson 1907 — Gateway-Level Version Routing
- Cleaner URLs
- Resource paths stay stable; `/users/123` is always the same user
- Lesson 1902 — Content Negotiation with Media Types
- Cleanup
- The hints are deleted from the temporary nodes
- Lesson 1372 — Sloppy Quorums and Hinted Handoff
- Cleanup after expiration
- old keys can be purged to save space
- Lesson 1004 — Server-Side State for Idempotency
- Cleanup job effectiveness
- Instrument your scheduled deletion tasks (from lesson 1568) to emit metrics like `pastes_deleted_per_run`, `cleanup_duration_ms`, and `failed_deletion_count`.
- Lesson 1574 — Monitoring Expiration and Storage Health
- Cleanup job failures
- Job didn't run, crashed, or deleted zero records when expired data exists
- Lesson 1574 — Monitoring Expiration and Storage Health
- Clear accountability
- The team that needs the feature builds it
- Lesson 906 — BFF Ownership and Team Structure
- Clear boundaries
- Each team's communication pattern becomes a service boundary
- Lesson 788 — Organizational Alignment: Conway's Law
- Clear interfaces
- Simple APIs with obvious behaviors reduce integration errors
- Lesson 1315 — Simplicity as a Core Value
- Clear popularity hierarchies
- Some items are consistently more popular
- Lesson 147 — Least Frequently Used (LFU)
- Clearer Dependencies
- In a monolith, modules can silently depend on each other in tangled ways.
- Lesson 797 — Improved Code Maintainability
- Click count
- The simplest metric—increment a counter per redirect.
- Lesson 1505 — Analytics and Tracking Requirements
- Clicked
- The user took action on a link or button within the notification (e.
- Lesson 1724 — Notification Analytics Events
- ClickHouse
- Columnar database offering excellent compression and fast analytical queries at lower cost than Elasticsearch
- Lesson 1245 — Trace Storage Backends
- client
- sends a request to a **server** (often through a **load balancer**).
- Lesson 6 — Components of a System Design SolutionLesson 921 — OAuth2 Roles: Resource Owner, Client, ServerLesson 1879 — HTTP Status Codes: Choosing the Right Response
- Client → Gateway (HTTP)
- → Gateway translates → **Gateway → Service (gRPC)**
- Lesson 874 — Protocol Translation
- Client automatically includes cookie
- in every subsequent request
- Lesson 909 — Session-Based Authentication Fundamentals
- Client creates code challenge
- Hash the verifier with SHA-256, then base64url-encode the digest (without `=` padding)
- Lesson 923 — PKCE: Proof Key for Code Exchange
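
The challenge derivation described in this entry can be sketched in a few lines. This is a minimal illustration of the S256 method, assuming the verifier is already a valid ASCII string; the function name `code_challenge` is hypothetical and not taken from the lesson.

```python
import base64
import hashlib


def code_challenge(verifier: str) -> str:
    """Derive a PKCE S256 code challenge from a code verifier."""
    # Hash the verifier's ASCII bytes with SHA-256.
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    # base64url-encode the raw digest and drop the '=' padding.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The client sends the challenge with the authorization request and later proves possession by sending the original verifier in the token request.
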
- Client Credentials Flow
- solves this by allowing a service to authenticate using its own identity.
- Lesson 925 — Client Credentials Flow
- Client diversity is high
- Your iOS, Android, and web apps have fundamentally different data needs, screen sizes, or performance constraints.
- Lesson 908 — When to Use BFF Pattern
- Client errors
- (400, 401, 404) → ignore, these aren't service health issues
- Lesson 1048 — Failure Thresholds and Detection
- Client initiates
- Opens WebSocket, sends subscription request
- Lesson 1915 — GraphQL Subscriptions for Real-Time Data
- Client needs are similar
- If all your clients consume roughly the same data and services, a shared gateway is simpler.
- Lesson 908 — When to Use BFF Pattern
- Client optimization
- Each BFF tailors responses perfectly for its client (mobile gets compact JSON, web gets richer data)
- Lesson 904 — BFF vs Single Gateway Tradeoffs
- Client overwhelm
- Even if the response arrives, the client's browser or application must parse and render massive datasets, freezing the UI and consuming device resources unnecessarily.
- Lesson 1887 — Why Pagination Is Essential at Scale
- Client Request
- Lesson 1882 — Content Negotiation and Accept Headers
- Client retry windows
- Most well-behaved clients retry failed requests within seconds to minutes, not days
- Lesson 1012 — Idempotency Key Expiration Strategy
- Client SLAs
- If you promise clients they can safely retry for 48 hours, honor that
- Lesson 1012 — Idempotency Key Expiration Strategy
- Client Storage
- Client stores the token (typically in memory, localStorage, or a cookie)
- Lesson 912 — Token-Based Authentication Fundamentals
- client-side
- (browser downloads data, filters locally), **server-side** (every keystroke triggers a backend query), or use a **hybrid approach**.
- Lesson 1762 — Client-Side vs Server-Side TypeaheadLesson 1789 — Client-Side vs Server-Side Rate Limiting
- Client-side enforcement
- is essential when:
- Lesson 1123 — Client-Side vs Server-Side Timeout Enforcement
- Client-side rendering
- delivers raw code to browsers, which apply highlighting via JavaScript libraries (`highlight.
- Lesson 1575 — Syntax Highlighting and Language Detection
- Client-side timeout
- The maximum time a client will wait for a response before giving up
- Lesson 1090 — Client-Side vs Server-Side Timeouts
- Client-side timeouts
- protect the caller from waiting indefinitely—they're about resource management and user experience.
- Lesson 1123 — Client-Side vs Server-Side Timeout Enforcement
- Client-side timestamps
- The client remembers the timestamp of its last write and includes it in read requests, ensuring it only reads from sufficiently up-to-date replicas.
- Lesson 1359 — Read-Your-Writes Consistency with Replicas
- Client-side tracking
- The client remembers the last-seen transaction ID or timestamp.
- Lesson 1360 — Monotonic Reads Across Replicas
- Client-to-Server (External)
- Between end users and your application servers
- Lesson 78 — Load Balancer Placement in Architecture
- clock drift
- , and **concurrent load**—conditions where subtle bugs hide.
- Lesson 988 — Testing Distributed Rate LimitersLesson 1381 — Limitations of Last-Write-Wins
- clock skew
- small time differences between machines.
- Lesson 949 — Clock Skew and Token ValidationLesson 977 — Algorithm Implementation PatternsLesson 1105 — Clock Skew and Timeout DriftLesson 1114 — Clock Skew and Time SynchronizationLesson 1381 — Limitations of Last-Write-WinsLesson 1799 — Handling Clock Skew Across Nodes
- Clock skew tolerance (leeway)
- Accept tokens within a grace period (e.
- Lesson 949 — Clock Skew and Token Validation
- Closed
- Lesson 1047 — The Three States: Half-OpenLesson 1050 — State Transition MechanicsLesson 1052 — Circuit Breaker Reset LogicLesson 1060 — Half-Open State TestingLesson 1803 — Handling Redis Failures and Fallbacks
- Closed → Open
- Send enough failures to breach the threshold, then verify the breaker opens and fast-fails subsequent requests
- Lesson 1065 — Testing Circuit Breaker Behavior
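
The Closed → Open check above could be exercised against a breaker like the following. This is a simplified sketch, assuming a consecutive-failure threshold and a fixed reset timeout; the class name, parameter names, and the use of `RuntimeError` for fast-fails are illustrative choices, not the course's implementation.

```python
import time


class CircuitBreaker:
    """Minimal breaker: opens after N consecutive failures, then fails fast."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so tests can control time
        self.failures = 0
        self.state = "closed"
        self.opened_at = None

    def call(self, fn):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fast-fail without touching the dependency.
                raise RuntimeError("circuit open: failing fast")
            # Timeout elapsed: allow one trial request through.
            self.state = "half-open"
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial, or too many consecutive failures, opens the circuit.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = self.clock()
            raise
        # Any success resets the count and closes the circuit.
        self.failures = 0
        self.state = "closed"
        return result
```

A test would call `breaker.call(...)` with a function that always raises until the threshold is reached, then assert that the next call raises the fast-fail error without invoking the function.
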
- Closed circuit
- Retries execute normally for transient errors
- Lesson 1030 — Combining Retries with Circuit Breakers
- Closed state
- Normal operation, requests flow through
- Lesson 105 — Graceful Degradation and Circuit BreakingLesson 889 — Circuit Breaking and FallbacksLesson 1045 — The Three States: Closed
- Closed to Open
- when failures exceed your configured threshold within a time window.
- Lesson 1050 — State Transition Mechanics
- Closeness centrality
- How quickly can someone reach everyone else?
- Lesson 468 — Graph Algorithms: PageRank and Centrality
- Cloud Functions
- Lesson 1244 — Google Cloud Trace
- Cloud Monitoring
- (formerly Stackdriver, GCP) are fully managed metrics services that automatically collect, store, and visualize metrics from your cloud resources.
- Lesson 1204 — Cloud-Native Metrics: CloudWatch and Stackdriver
- Cloud Run
- Lesson 1244 — Google Cloud Trace
- Cloud vs On-Premise
- Lesson 119 — Choosing Load Balancer TechnologyLesson 901 — Choosing the Right API Gateway Technology
- Cloud-native
- Use managed services (SQS/SNS, GCP Pub/Sub) for lower operational burden
- Lesson 676 — Choosing Between Message Broker Technologies
- Cloud-Native BigTable
- itself evolved into Google Cloud Bigtable, the managed service version.
- Lesson 450 — BigTable's Influence on Modern Systems
- Cloud-native options
- (CloudWatch, Stackdriver) scale elastically but at higher cost.
- Lesson 1208 — Choosing a Metrics System for Your Scale
- CloudWatch
- (AWS) and **Cloud Monitoring** (formerly Stackdriver, GCP) are fully managed metrics services that automatically collect, store, and visualize metrics from your cloud resources.
- Lesson 1204 — Cloud-Native Metrics: CloudWatch and Stackdriver
- Clustering
- connects multiple RabbitMQ nodes together so they act as one logical broker, sharing metadata and providing redundancy.
- Lesson 668 — RabbitMQ Clustering and High Availability
- Clustering columns
- optionally sort data *within* a partition.
- Lesson 420 — Cassandra Overview and Data ModelLesson 422 — Clustering Columns and Row OrderingLesson 423 — Primary Key Components
- CNAME
- Alias pointing to another domain (adds extra lookup!
- Lesson 1856 — DNS Resolution Fundamentals for Crawlers
- Coalescing
- Multiple identical pending requests merge into one, reducing redundant work.
- Lesson 1914 — DataLoader and Batching Solutions
- Coarse-grained
- Works at the connection level, not the request level
- Lesson 116 — DNS-Based Load Balancing
- Coarse-grained authorization
- makes broad decisions at high levels (e.
- Lesson 940 — Coarse-Grained vs Fine-Grained Authorization
- Code Exchange
- The authorization server sends back a temporary **authorization code** to your app's redirect URL.
- Lesson 922 — Authorization Code Flow
- Code maintainability
- Consistent patterns reduce mental overhead
- Lesson 1877 — Singular vs Plural Resource Names
- Code pollution
- Libraries clutter application code; service meshes keep business logic clean
- Lesson 830 — Service Mesh vs Library-Based Solutions
- Coding
- is like being the carpenter who hammers nails, installs drywall, and connects the plumbing pipes.
- Lesson 3 — System Design vs Coding
- Cognitive Load Reduction
- A developer can hold the entire service's logic in their head.
- Lesson 797 — Improved Code Maintainability
- cold standby
- is a backup that exists only as stored data—backups, snapshots, or archived configurations.
- Lesson 1417 — Hot Standby vs Cold StandbyLesson 1443 — DR Cost Optimization
- Cold storage
- (90+ days): Archive systems like S3 Glacier for compliance
- Lesson 1135 — Log Retention and Volume ManagementLesson 1428 — Backup Storage TiersLesson 1557 — Hot vs Cold Storage TieringLesson 1572 — Storage Tier MigrationLesson 1589 — Storage Tiering Strategy
- Cold tier
- Ancient logs (months to years) archived to object storage like S3.
- Lesson 1156 — Indexing Strategies and RetentionLesson 1620 — Storage Tiering for Cost OptimizationLesson 1663 — Hot and Cold Timeline Data
- Cold/archived (30-90 days)
- Keep only trace metadata and aggregated statistics for compliance or trend analysis.
- Lesson 1246 — Trace Data Retention Policies
- Collaborative Documents
- Lesson 540 — Use Cases for Eventual Consistency
- Collaborative editing
- → Causal consistency (preserve cause-effect relationships)
- Lesson 553 — Choosing Consistency Levels
- Collaborative filtering
- "Users who bought X also bought Y" traverses purchase edges
- Lesson 457 — Use Cases: Social Networks and Recommendations
- Collect training data
- Log queries, returned results, their positions, and which ones users clicked
- Lesson 1781 — Machine Learning for Ranking
- Collection
- = A specific section (Fiction, Non-Fiction, Reference)
- Lesson 383 — Collections and DatabasesLesson 389 — MongoDB Document Model and CollectionsLesson 1218 — Testing Metric Pipelines
- Collection scans
- Without the right index, queries must examine every document.
- Lesson 408 — Query Performance Limitations
- Collection validation
- Emit known metric values from test services and verify they appear in your metrics backend (Prometheus, InfluxDB, etc.
- Lesson 1218 — Testing Metric Pipelines
- Collector
- receives span data from instrumented applications via HTTP, Kafka, or RabbitMQ
- Lesson 1242 — Zipkin Architecture and Design
- Collision risk
- Truncation increases collision probability (two different URLs producing the same short code).
- Lesson 1508 — Hash-Based Generation Approach
- Collision-free
- Practically guaranteed uniqueness across all nodes
- Lesson 1520 — Primary Key Selection: Auto-Increment vs UUID
- Collision-resistant
- Virtually impossible for two different pages to produce the same hash
- Lesson 1852 — Content Fingerprinting with Hashing
- Column families
- Groups of related columns (defined at schema time)
- Lesson 410 — What is a Wide-Column Store?Lesson 411 — Column Families and Super ColumnsLesson 420 — Cassandra Overview and Data ModelLesson 433 — What is HBase?
- Column keys
- identify attributes (like "name" or "email")
- Lesson 444 — Data Model: Sparse, Distributed, Multi-Dimensional Map
- Column-oriented storage flips this
- it groups all values from the *same column* together on disk.
- Lesson 414 — Column-Oriented Storage Benefits
- Columnar layouts
- Values, timestamps, and tags are stored separately.
- Lesson 1269 — Time Series Databases for Metrics
- Columnar storage
- storing columns together instead of rows, perfect for aggregations
- Lesson 760 — Data Warehouse ArchitectureLesson 1530 — Analytics and Click Tracking
- columns
- .
- Lesson 231 — Vertical Partitioning vs Horizontal PartitioningLesson 298 — The Relational Model FoundationLesson 410 — What is a Wide-Column Store?Lesson 411 — Column Families and Super ColumnsLesson 420 — Cassandra Overview and Data ModelLesson 1203 — InfluxDB and Time-Series Databases
- commit
- does the database permanently apply all changes together.
- Lesson 304 — Transaction Atomicity in PracticeLesson 571 — Phase 2: Commit PhaseLesson 614 — The Accept Phase
- Commit index
- Tells followers which entries are safe to apply
- Lesson 624 — AppendEntries RPC: Replication MechanismLesson 626 — Commitment and the Commit Index
- Commit Latency
- Time between a write being proposed and committed.
- Lesson 643 — Monitoring and Operating Consensus Clusters
- Commit Log (Write-Ahead Log)
- The write is immediately appended to a sequential log file stored in GFS.
- Lesson 448 — Write Path: MemTable and Commit Logs
- Commit on their own
- another participant might have voted "no"
- Lesson 573 — The Blocking Problem in 2PC
- Commit or rollback
- Either save all changes permanently or undo everything
- Lesson 310 — Atomicity: All-or-Nothing Transactions
- Commit phase
- The coordinator sends the final decision (`COMMIT` or `ABORT`) to all participants, who then execute it.
- Lesson 569 — The Coordinator Role in 2PCLesson 575 — 2PC Performance Characteristics
- Committed
- After Phase 2, the participant has permanently applied the transaction
- Lesson 572 — Participant State TransitionsLesson 623 — Log Structure and EntriesLesson 626 — Commitment and the Commit IndexLesson 630 — Safety Argument: Committing Entries from Current TermLesson 632 — Log Compaction: Snapshotting
- Committed use discounts
- lower rates for guaranteed traffic volumes
- Lesson 191 — CDN Provider Feature Comparison
- Committing
- means saving your current offset position back to Kafka.
- Lesson 710 — Offsets and Commit Strategies
- Common bounds
- Lesson 549 — Bounded Staleness
- Common optimization strategies
- Lesson 129 — Cache Hit Ratio Optimization
- Common retention tiers
- Lesson 1213 — Metric Retention Policies
- Common schedule
- Full backup weekly, differential backups daily.
- Lesson 1423 — Differential Backup Strategy
- Common Schema
- Standards like ECS (Elastic Common Schema) define field names (`http.
- Lesson 1136 — Logging Libraries and Standards
- Communicate degradation
- Let users know when features are limited (optional but honest)
- Lesson 1336 — Graceful Degradation
- Communication Becomes Explicit
- Teams communicate through well-defined APIs instead of navigating a shared codebase.
- Lesson 798 — Organizational Alignment
- Communications Lead
- Manages all outbound communication—status updates to stakeholders, customer notifications, and executive briefings.
- Lesson 1300 — Incident Command System (ICS)
- Compact
- (typically 6-8 characters)
- Lesson 1494 — Functional Requirements for a URL ShortenerLesson 1520 — Primary Key Selection: Auto-Increment vs UUID
- Compaction
- Over time, multiple SSTables accumulate.
- Lesson 415 — Write Path and LSM TreesLesson 417 — Compaction StrategiesLesson 428 — Compaction StrategiesLesson 449 — Read Path and CompactionLesson 711 — Message Retention and Log Segments
- Compare-and-Set (CAS)
- operations like `WATCH` in Redis allow optimistic locking: watch a key, prepare a transaction, and execute only if the key hasn't changed.
- Lesson 1800 — Race Conditions and Concurrency Control
- Compare-and-Swap (CAS)
- Updates a record only if its current value matches what you expect.
- Lesson 1015 — Conditional Writes for Idempotency
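
As a rough illustration of the compare-and-swap idea, the sketch below uses an in-memory dictionary guarded by a lock. A real datastore would perform the check atomically on the server side; the class and method names here are hypothetical.

```python
import threading


class CasStore:
    """In-memory key-value store with an atomic compare-and-swap."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def compare_and_swap(self, key, expected, new):
        """Write `new` only if the current value equals `expected`."""
        with self._lock:
            if self._data.get(key) != expected:
                return False  # someone else changed it; caller must re-read
            self._data[key] = new
            return True
```

For idempotency, a handler might call `compare_and_swap(order_id, None, "processed")` and treat a `False` result as "already handled".
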
- compensating transactions
- to undo the work of preceding steps.
- Lesson 585 — Alternative: Saga Pattern IntroductionLesson 588 — The Saga Pattern: Motivation and DefinitionLesson 593 — Compensating Transactions: Design Principles
- Compensation
- Cancel hotel
- Lesson 589 — Saga Fundamentals: Local Transactions and CompensationsLesson 1297 — On-Call Fundamentals and Rotation Models
- Compensation stack
- tracking which rollbacks to execute if failure occurs
- Lesson 597 — Saga State Management and Persistence
- competing consumers
- within each service (queue's strength).
- Lesson 663 — Hybrid Patterns: Topic + QueueLesson 664 — Choosing Between Queue and Pub-Sub
- Competing Consumers Pattern
- solves this by adding multiple consumer instances that all read from the *same* queue.
- Lesson 661 — Competing Consumers Pattern
- Complete
- the span when work finishes, recording duration and metadata
- Lesson 1223 — Instrumentation Basics
- Complete audit trail
- You know *why* something is in its current state
- Lesson 691 — Events as First-Class Citizens
- Complete rewrites
- The codebase becomes so tangled that starting over feels easier than fixing it
- Lesson 2 — Why System Design Matters
- Completion
- When the operation finishes, the end time is recorded
- Lesson 1231 — Span Lifecycle and StructureLesson 1586 — Multipart Upload for Large Files
- Complex aggregation
- Combining partial edits from multiple sources intelligently
- Lesson 1387 — Custom Merge Functions
- Complex business logic
- Where partial success is unacceptable
- Lesson 322 — Transaction Requirements and Trade-offs
- Complex data synchronization
- Must handle multi-region writes and conflicts
- Lesson 1436 — Active-Passive vs Active-Active DR
- Complex filtering
- "Find all users who want email AND push for mentions BUT not marketing."
- Lesson 1721 — Preference Storage Strategy
- Complex Joins and Relationships
- Lesson 320 — When SQL Is the Right Choice
- Complex queries
- Join with user metadata, analyze patterns
- Lesson 1807 — In-Memory vs Persistent Storage for Rate Limiting
- Complex Queries and Filtering
- Lesson 346 — When Not to Use Key-Value Stores
- Complex routing logic
- where clients would otherwise need to know about many backend service locations
- Lesson 879 — When to Introduce an API Gateway
- Complex service meshes
- If a request touches 10+ microservices, tracing becomes essential.
- Lesson 1260 — Cost-Benefit Analysis
- Complex Traffic Management Needs
- Lesson 868 — When Service Mesh Adds Value
- Complexity
- Need to handle scenarios where no replica acknowledges in time
- Lesson 217 — Semi-Synchronous Replication Trade-offsLesson 263 — Shard Key Immutability ProblemLesson 1332 — Active-Active vs Active-Passive RedundancyLesson 1374 — Tree Replication TopologyLesson 1470 — Consistent Hashing vs Rendezvous HashingLesson 1489 — Cross-Partition TransactionsLesson 1791 — Single Data Center vs Distributed Setup
- Compliance
- (some countries require data to stay local)
- Lesson 53 — Geographic Distribution BenefitsLesson 764 — Data Governance and QualityLesson 930 — OAuth2 Scopes and ConsentLesson 1251 — Choosing a Tracing System
- Compliance and Legal Requirements
- Lesson 1420 — Business Impact Analysis for RPO/RTO
- Compliance features
- Lesson 1728 — Opt-Out and Compliance Tracking
- Compliance mandates
- GDPR, HIPAA, or industry regulations may require 1–7 years.
- Lesson 1165 — Log Retention Policies
- Compliance requirements
- often dictate minimum retention.
- Lesson 1135 — Log Retention and Volume ManagementLesson 1429 — Geographic Backup Distribution
- Compliance requires audit trails
- You need to reconstruct state at any historical moment
- Lesson 1427 — Continuous Data Protection
- Compliance windows
- Some industries require request deduplication for specific audit periods
- Lesson 1012 — Idempotency Key Expiration Strategy
- Components
- The individual parts that do specific jobs (like databases that store data, servers that handle requests, or caches that speed things up)
- Lesson 1 — What Is System Design?
- Composability
- means these services can be combined in different ways—like LEGO bricks—to build diverse experiences without rewriting code.
- Lesson 800 — Reusability and Composability
- Composite indexes
- for multi-column filters (`user_id, created_at`)
- Lesson 278 — Index Strategy for Large TablesLesson 1563 — Indexing for Ownership and Search
- Composite keys
- Balance distribution and ordering needs
- Lesson 703 — Partitioning Strategies and Key Selection
- Compound indexes
- cover multiple fields together, like indexing `(country, city)` to efficiently query "all users in Paris, France."
- Lesson 385 — Indexing in Document Stores
- Compress Content
- Lesson 1560 — Handling Large Pastes Efficiently
- Compressed image formats
- (WebP, AVIF) with aggressive optimization
- Lesson 1618 — Optimizing for Mobile Networks
- Compressed tries
- merge single-child node chains into edge labels, dramatically reducing memory footprint.
- Lesson 1776 — Typeahead Index Optimization
- Compression
- Enable `compression.
- Lesson 724 — Kafka Performance TuningLesson 760 — Data Warehouse ArchitectureLesson 1135 — Log Retention and Volume ManagementLesson 1259 — Network and Agent OverheadLesson 1269 — Time Series Databases for MetricsLesson 1745 — Posting Lists and Document IDsLesson 1746 — Index Construction at Scale
- Compute-intensive
- Each query may need to parse and validate data
- Lesson 759 — Schema-on-Write vs Schema-on-Read
- Computes optimal send windows
- (User X engages 5x more at 8 PM than 10 AM)
- Lesson 1729 — Analytics-Driven Optimization
- Con
- Larger index size, slower writes (more data to maintain)
- Lesson 279 — Covering IndexesLesson 1518 — Case Sensitivity Considerations
- concurrent
- a conflict exists
- Lesson 367 — Vector Clocks and Conflict DetectionLesson 374 — Vector Clocks for Conflict DetectionLesson 539 — Vector Clocks and CausalityLesson 547 — Causal Consistency FundamentalsLesson 1382 — Version Vectors and Causality
- Concurrent Conflicts
- Different distributed transactions might touch the same data in different orders across services, creating deadlocks.
- Lesson 566 — What is a Distributed Transaction?
- Concurrent Reads
- Multiple read operations can scan different SSTables simultaneously without coordination or locks.
- Lesson 427 — SSTables and Immutable Storage
- Concurrent writes
- Two users editing the same document from different datacenters—one entire edit vanishes without trace
- Lesson 1381 — Limitations of Last-Write-Wins
- concurrently
- (neither knew about the other).
- Lesson 367 — Vector Clocks and Conflict DetectionLesson 374 — Vector Clocks for Conflict Detection
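
One way to see when two versions count as concurrent is to compare their vector clocks entry by entry. The sketch below assumes each clock is a dictionary from node ID to counter, with missing nodes treated as zero; the function name and return labels are illustrative.

```python
def compare(a: dict, b: dict) -> str:
    """Classify the causal relationship between two vector clocks."""
    nodes = set(a) | set(b)
    # a <= b means every counter in a is no larger than the one in b.
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened before b
    if b_le_a:
        return "after"       # b happened before a
    return "concurrent"      # neither knew about the other: a conflict
```

When the result is `"concurrent"`, neither write descends from the other, so the system has to keep both versions or apply a merge rule.
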
- Config servers
- Store metadata about which data lives where
- Lesson 396 — Sharding in MongoDBLesson 398 — Config Servers and mongos Routers
- Configurable policies
- Your crawler maintains per-host settings—either from `robots.
- Lesson 1842 — Politeness Budget and Crawl Delay
- Configuration burden
- Each bulkhead needs tuning for size, timeouts, and thresholds—multiply this by dozens of dependencies
- Lesson 1076 — Bulkhead Tradeoffs: Complexity and Resource Overhead
- Configuration changes
- Should we add this new node to the cluster?
- Lesson 599 — What Is Distributed Consensus?Lesson 617 — Why Paxos Is Difficult in Practice
- Configuration distribution
- Pushes routing rules, policies, and traffic management settings to all Envoy proxies
- Lesson 861 — Istio: Architecture and Components
- Configuration Isolation
- Lesson 1790 — Multi-Tenancy Considerations
- Configuration Management
- Store service configs keyed by service name.
- Lesson 720 — Log CompactionLesson 846 — Control Plane: API and User Interface
- conflict
- Lesson 562 — Version Vectors and Conflict DetectionLesson 1368 — Multi-Leader Conflict Scenarios
- Conflict detected
- Lesson 562 — Version Vectors and Conflict Detection
- Conflict Detection
- Store metadata (like update timestamps or version numbers) with your data.
- Lesson 219 — Application-Level Consistency Patterns
- Conflict resolution
- Use distributed ID generation (from lesson 1511) to avoid collisions across regions
- Lesson 1535 — Multi-Region Deployment
- Conflict resolution complexity
- after partition heals
- Lesson 494 — AP Systems: Prioritizing Availability
- Conflicting entries
- (different commands at the same log index)
- Lesson 629 — Log Inconsistencies and Repair
- Confluent Schema Registry
- (for Kafka) and cloud-native equivalents like AWS Glue Schema Registry.
- Lesson 725 — Schema Registry and Evolution
- Connection Acquisition Patterns
- Lesson 273 — Connection Pool Monitoring
- Connection failures
- Simulate network partitions or complete service unavailability
- Lesson 858 — Fault Injection for Testing
- Connection level
- Specific applications get different limits
- Lesson 285 — Query Timeout and Statement Limits
- Connection limits
- cap how many simultaneous connections a service accepts.
- Lesson 852 — Circuit Breaking at the Mesh Level
- Connection persists
- Stays open until client disconnects or unsubscribes
- Lesson 1915 — GraphQL Subscriptions for Real-Time Data
- Connection pool configuration
- Libraries allow multiple named pools with different size limits
- Lesson 1071 — Connection Pool Bulkheads: Database and Service Isolation
- Connection Pooling
- The edge server maintains persistent, reusable connections to the origin.
- Lesson 186 — Dynamic Content AccelerationLesson 267 — What is Connection PoolingLesson 841 — Data Plane: Performance and Latency OverheadLesson 1674 — Connection Management at Scale
- Connection pools
- (dedicated database connections)
- Lesson 1067 — Bulkhead Pattern: Isolating Resources to Prevent Total Failure
- Connection refused
- Service briefly overloaded, recovering in seconds
- Lesson 1020 — Why Retries Are Necessary in Distributed Systems
- Connection Registry
- When a feed update arrives, lookup which server holds the user's connection and route the message accordingly
- Lesson 1674 — Connection Management at Scale
- Connection timeout
- limits how long you'll wait to establish a TCP connection with the remote service.
- Lesson 1088 — Connection Timeout vs Request Timeout
- Connection validation
- is the practice of testing a connection before handing it to application code.
- Lesson 271 — Connection Validation and Stale Connections
- Cons
- Lesson 108 — Hardware vs Software Load BalancersLesson 237 — Sharding Architecture PatternsLesson 259 — Resharding Strategies: Stop-the-World vs OnlineLesson 417 — Compaction StrategiesLesson 916 — Session vs Token TradeoffsLesson 927 — Token Introspection and ValidationLesson 943 — Authorization in MicroservicesLesson 983 — Gossip Protocols for Rate Limit Sync (+17 more)
- Consecutive error count
- "Open after 5 straight failures" — detects immediate, persistent issues.
- Lesson 1057 — Failure Detection and Counting
- consensus
- comes in.
- Lesson 527 — Consensus and Strong ConsistencyLesson 1366 — Leader Election and Failover
- consensus algorithms
- (like Zab or Raft) that guarantee: if you get an answer, it's the globally consistent answer.
- Lesson 501 — Distributed Locking Services (CP)Lesson 607 — Consensus vs Consistency Models
- Consider increasing timeout
- or breaking the request into smaller chunks
- Lesson 1115 — Deadline Exceeded Error Handling
- Consider your architecture
- Lesson 553 — Choosing Consistency Levels
- Consistency
- What happens when two servers update the same data simultaneously?
- Lesson 49 — Application Complexity Trade-offsLesson 155 — Cache Invalidation ProblemLesson 309 — ACID Properties OverviewLesson 470 — Transaction Model and ACID in Neo4jLesson 486 — Partition Tolerance ExplainedLesson 489 — CAP During Normal OperationLesson 491 — CAP Theorem's Original PaperLesson 493 — CP Systems: Prioritizing Consistency (+15 more)
- Consistency across services
- If your payments service uses `payment_processed_total`, don't let checkout use `checkout_txn_count`.
- Lesson 1182 — Metric Naming Conventions
- Consistency check
- Each AppendEntries includes the `prevLogIndex` and `prevLogTerm` of the entry immediately before the new ones.
- Lesson 629 — Log Inconsistencies and Repair
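
A follower's side of this consistency check might look roughly like the sketch below. It follows the rules from the Raft paper in simplified form: 1-based log indices, no term or commit-index handling, and hypothetical names (`Entry`, `append_entries`) chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    term: int
    command: str


def append_entries(log, prev_log_index, prev_log_term, entries):
    """Apply a leader's entries if the follower's log matches at prev_log_index.

    Indices are 1-based; prev_log_index == 0 means "before the first entry".
    Returns False when the check fails, so the leader can retry further back.
    """
    if prev_log_index > len(log):
        return False  # follower is missing the preceding entry
    if prev_log_index > 0 and log[prev_log_index - 1].term != prev_log_term:
        return False  # preceding entry exists but comes from a different term
    for offset, entry in enumerate(entries):
        pos = prev_log_index + offset  # 0-based slot for this entry
        if pos < len(log):
            if log[pos].term != entry.term:
                # Conflict: drop this entry and everything after it.
                del log[pos:]
                log.append(entry)
        else:
            log.append(entry)
    return True
```

A `False` result tells the leader to step back one entry and resend, which is how diverging follower logs are gradually repaired.
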
- Consistency complexity
- cache and database temporarily out of sync
- Lesson 136 — Write-Behind (Write-Back) Caching Pattern
- Consistency Example
- Lesson 1876 — Resource Naming and URI Design Best Practices
- consistency is non-negotiable
- .
- Lesson 511 — Banking Systems: Consistency Over AvailabilityLesson 518 — PC/EC Systems: Consistency Always
- Consistency isn't one-size-fits-all
- CAP's "consistency" means linearizability—the strongest guarantee.
- Lesson 492 — Limitations of CAP as a Framework
- Consistency models
- answer: *"What guarantees does the system provide about the order and visibility of reads and writes?"*
- Lesson 607 — Consensus vs Consistency Models
- Consistency options
- – You can choose sync/async per your needs (learned in lessons 1354-1357)
- Lesson 1365 — Single-Leader Replication Topology
- Consistency trade-offs
- You might over-allow requests during sync windows
- Lesson 1791 — Single Data Center vs Distributed Setup
- Consistency tradeoffs
- (synchronous cross-region replication adds latency)
- Lesson 1334 — Geographic Redundancy and Multi-Region
- Consistency vs Performance
- Lesson 1354 — Synchronous vs Asynchronous Replication
- Consistency with changing data
- If items are inserted/deleted during pagination, cursors keep you at the logical position.
- Lesson 1889 — Cursor-Based Pagination
- Consistent behavior
- Users experience uniform rate limiting regardless of which server handles their request
- Lesson 979 — Centralized vs Decentralized Approaches
- Consistent data formats
- (like the W3C Trace Context you've already learned)
- Lesson 1240 — OpenTelemetry Overview
- Consistent enforcement
- across all services without code changes
- Lesson 859 — Rate Limiting at Service Boundaries
- Consistent Hashing
- and **Least Response Time** require more complex state tracking and calculation.
- Lesson 96 — Algorithm Selection TradeoffsLesson 253 — Evaluating Sharding Strategy TradeoffsLesson 370 — Distributed Key-Value Store Architectures in PracticeLesson 372 — Consistent Hashing in DynamoLesson 1460 — Adding Nodes with Minimal DisruptionLesson 1470 — Consistent Hashing vs Rendezvous HashingLesson 1806 — Rate Limiting with Consistent HashingLesson 1815 — Sharding Rate Limit Data Across Redis Instances (+4 more)
- Consistent hashing helps
- Minimizes data movement when adding/removing nodes
- Lesson 258 — Resharding and Data Migration
- Consistent performance
- Whether you're on page 1 or page 10,000, the database performs an index seek—always fast
- Lesson 1890 — Keyset Pagination
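
The index-seek behavior can be illustrated with a small keyset query. The sketch below uses an in-memory SQLite table; the `items` table, its columns, and the `fetch_page` helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 11)],
)


def fetch_page(conn, after_id, page_size):
    """Return rows with id > after_id, plus the cursor for the next page."""
    rows = conn.execute(
        # The WHERE clause seeks directly into the primary-key index,
        # so the cost does not grow with how deep the page is.
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()
    next_cursor = rows[-1][0] if rows else None
    return rows, next_cursor
```

Each response hands back the last seen `id` as the cursor, so the next request resumes from that key instead of skipping rows with `OFFSET`.
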
- Consistent Policy Enforcement
- A user shouldn't be able to bypass limits by hitting different endpoints.
- Lesson 1782 — Rate Limiter Service Overview
- Consistent prefix
- Reads never see out-of-order writes
- Lesson 554 — Consistency Model Examples in Real Systems
- Consistent Reads
- Lesson 320 — When SQL Is the Right Choice
- Consistent security policies
- applied uniformly to all routes
- Lesson 883 — Authentication at the GatewayLesson 891 — SSL/TLS Termination
- Constraint boundaries
- – If your limit is 1,000 QPS and you estimate 980, don't round to 1,000
- Lesson 32 — Rounding and Approximation Techniques
- Constraints
- `NOT NULL` prevents missing values, `UNIQUE` prevents duplicates
- Lesson 301 — Schema Enforcement and Type Safety
- Consul
- , and **ZooKeeper** all use consensus-based leader election to coordinate distributed operations safely.
- Lesson 636 — Consensus for Leader ElectionLesson 638 — Configuration Management with Consensus
- Consul Clients
- Lightweight agents on each node, forward registrations to servers
- Lesson 635 — Consul: Service Discovery with Raft Consensus
- Consul Servers
- Run Raft consensus, maintain the service catalog
- Lesson 635 — Consul: Service Discovery with Raft Consensus
- Consult bloom filters
- for each SSTable — probabilistic check to skip entire files
- Lesson 429 — Read Path and Bloom Filters
- Consumer (Worker Pool)
- Multiple workers poll the queue, process files, then acknowledge completion.
- Lesson 1604 — Message Queue for Processing Jobs
- Consumer groups
- solve this by allowing multiple consumers to coordinate and share the workload.
- Lesson 708 — Consumer Groups and Parallel Consumption
- Consumer retrieves schema
- Consumers fetch the schema by ID to deserialize messages
- Lesson 725 — Schema Registry and Evolution
- consumers
- who receive and process them.
- Lesson 646 — The Producer-Consumer ModelLesson 671 — ActiveMQ and Traditional Enterprise MessagingLesson 694 — Producers and Consumers
- Consumption Model
- Lesson 698 — Streaming vs Message Queues
- Container orchestration platforms
- like Kubernetes don't run themselves.
- Lesson 811 — Infrastructure and Tooling Costs
- content
- (in object storage) is key.
- Lesson 1549 — Database Schema DesignLesson 1559 — Write Path: Synchronous vs Asynchronous Storage
- Content Aggregation
- News aggregators, job boards, or real estate platforms crawl multiple sources to compile listings in one place.
- Lesson 1826 — What is a Web Crawler
- Content delivery
- – blog posts, images, videos tolerate brief staleness
- Lesson 318 — When to Choose ACID or BASE
- Content Delivery Network (CDN)
- is a geographically distributed network of servers that cache and deliver static content (like images, videos, CSS, JavaScript files) from locations physically closer to your users.
- Lesson 168 — What is a CDN and Why Use It
- Content Delivery Networks (CDNs)
- Lesson 540 — Use Cases for Eventual Consistency
- Content discovery
- "Products similar to what you viewed" follows category and attribute relationships
- Lesson 457 — Use Cases: Social Networks and Recommendations
- Content features
- post type (video/text), topic category, recency
- Lesson 1668 — Machine Learning for Feed Ranking
- Content feed
- (AP): Display cached posts immediately; eventual consistency is acceptable
- Lesson 510 — Real Systems: Multi-Region Trade-offs
- content hashing
- generating a unique fingerprint of a file's actual content (not its filename or metadata).
- Lesson 1622 — Deduplication StrategiesLesson 1870 — Content Storage and Deduplication
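As a rough illustration of content hashing for deduplication, the sketch below fingerprints bytes with SHA-256 and stores each distinct payload once. The in-memory `store` dictionary and the `put` helper are hypothetical stand-ins for a real object store.

```python
import hashlib

store = {}  # digest -> content; stands in for object storage


def content_fingerprint(data: bytes) -> str:
    """Fingerprint the bytes themselves, ignoring filename and metadata."""
    return hashlib.sha256(data).hexdigest()


def put(filename: str, data: bytes):
    """Store content once per unique digest; return (digest, was_new)."""
    digest = content_fingerprint(data)
    was_new = digest not in store
    if was_new:
        store[digest] = data
    return digest, was_new


digest_a, new_a = put("report.txt", b"hello world")
digest_b, new_b = put("copy-of-report.txt", b"hello world")
```

Two uploads with different filenames but identical bytes map to the same digest, so the second upload adds no new stored content.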
- Content negotiation
- works the same way: the client tells the server which response format it prefers (JSON, XML, HTML, etc.
- Lesson 1882 — Content Negotiation and Accept Headers
- Content References
- Lesson 1642 — Post Metadata and Schema Design
- Content safety
- Does the notification meet policy guidelines?
- Lesson 1699 — Notification Processing Workers
- Content type
- (video vs text vs image)
- Lesson 1644 — Feed Personalization and Ranking RequirementsLesson 1666 — Ranking Signals and FeaturesLesson 1839 — FIFO vs Priority-Based Frontier
- Content type detection
- lets you filter out non-HTML content early and decide what's worth processing.
- Lesson 1833 — Content Type Detection
- Content type preferences
- Do you interact more with videos, photos, or text?
- Lesson 1665 — Feed Ranking Fundamentals
- Context
- IP address, request ID, policy version applied
- Lesson 944 — Auditing and Compliance for AuthorizationLesson 1317 — Blameless Culture and Learning from FailureLesson 1891 — Pagination Metadata and Links
- Context Maps
- Document how contexts relate and integrate with each other
- Lesson 815 — Domain-Driven Design and Bounded Contexts
- Context-aware
- Lambda receives HTTP details (headers, body, path parameters) as structured input
- Lesson 895 — AWS API Gateway and Serverless Integration
- Context-aware queries
- Understanding that "Washington" might mean a person, city, or state based on surrounding relationships
- Lesson 475 — Knowledge Graphs and Semantic Networks
- Contextual decisions
- Merge based on user roles, time zones, or external state
- Lesson 1387 — Custom Merge Functions
- Continue existing context
- if the caller already sent trace headers (distributed tracing across organizational boundaries)
- Lesson 1239 — Root Span and Entry Points
- Continued revenue
- Your system still processes 7/8 of transactions
- Lesson 266 — Shard Failure and Partial Outages
- Continuous Computation
- Your processing logic runs continuously, waiting for the next event rather than starting and stopping on a schedule.
- Lesson 737 — What is Stream Processing?
- Continuous Consumption
- Streams are designed for consumers to read continuously, processing events as they arrive.
- Lesson 692 — Streams vs Traditional Databases
- Continuous training
- Retrain regularly on fresh click data to adapt to changing user intent
- Lesson 1781 — Machine Learning for Ranking
- Contributing Factors
- Additional conditions that enabled or worsened the incident.
- Lesson 1352 — Postmortem Structure and Action Items
- control plane
- is the brain of the mesh.
- Lesson 837 — Service Mesh Architecture: Control vs Data PlaneLesson 842 — Control Plane: Configuration ManagementLesson 845 — Control Plane: Telemetry CollectionLesson 847 — Plane Separation: Scalability and ReliabilityLesson 848 — Traffic Management and RoutingLesson 850 — Service Discovery IntegrationLesson 856 — Observability: Metrics CollectionLesson 894 — Kong Gateway Architecture (+2 more)
- Control Plane: istiod
- Lesson 861 — Istio: Architecture and Components
- Controlled staleness
- → Distributed cache with short TTLs
- Lesson 130 — Choosing the Right Caching Layer
- Conversion rate
- Percentage of visitors who complete a desired action
- Lesson 1196 — Business vs Technical Metrics
- Conversion rates
- Calculate cost-per-action, not just cost-per-send
- Lesson 1694 — Channel Costs and Economics
- Converts to absolute time
- The deadline becomes a timestamp (not a duration)
- Lesson 1104 — gRPC Timeout Propagation
- Conway's Law
- observes that a system's architecture tends to mirror the communication structure of the organization that builds it; if you have multiple autonomous teams, a monolith fights against that.
- Lesson 821 — When to Transition from Monolith to Microservices
- Cookie Transport
- The server sets an HTTP cookie containing the token.
- Lesson 918 — Cookie vs Bearer Token Transport
- Cookies
- The load balancer sets a cookie containing the server identifier
- Lesson 94 — Session Affinity (Sticky Sessions)Lesson 110 — Layer 7 (Application) Load BalancingLesson 918 — Cookie vs Bearer Token Transport
- Cooperative (Incremental) Rebalancing
- Lesson 717 — Rebalancing Protocol and Strategies
- Coordinate delivery
- Use a notification orchestrator that sends requests to each channel's dedicated service (push notification service, SMS gateway, email sender, in-app storage).
- Lesson 1689 — Multi-Channel Delivery
- Coordination
- Nodes communicate to handle failures and rebalancing
- Lesson 360 — What Makes a Key-Value Store DistributedLesson 526 — The Cost of Strong ConsistencyLesson 984 — Quota Sharding Across Nodes
- Coordination costs
- Lesson 1791 — Single Data Center vs Distributed Setup
- Coordination overhead
- Multiple nodes must communicate and agree before responding
- Lesson 509 — Latency: The Hidden Cost of CAPLesson 566 — What is a Distributed Transaction?Lesson 785 — When Monoliths Become ProblematicLesson 1355 — Synchronous Replication: Guarantees and Costs
- Coordination overhead explodes
- Simple operations that once happened in a single transaction now require multiple services to coordinate.
- Lesson 802 — Distributed System Complexity
- Coordination required
- Nodes must agree before responding, which takes time
- Lesson 493 — CP Systems: Prioritizing Consistency
- coordinator
- that orchestrates two distinct phases:
- Lesson 568 — Two-Phase Commit (2PC) OverviewLesson 569 — The Coordinator Role in 2PCLesson 570 — Phase 1: Prepare PhaseLesson 1863 — Coordinator-Worker Pattern for Crawling
- Coordinator logs
- Lesson 574 — Recovery Protocols and Logs
- Coordinator-Worker Pattern
- splits your web crawler into two distinct roles:
- Lesson 1863 — Coordinator-Worker Pattern for Crawling
- Copy
- paste content from S3 Standard → S3 Glacier
- Lesson 1557 — Hot vs Cold Storage TieringLesson 1572 — Storage Tier Migration
- Copy-on-Write
- The parent process continues serving requests while the child writes the snapshot to disk.
- Lesson 350 — Redis Persistence: RDB Snapshots
- Corner cases everywhere
- Handling preemptions, retries, and failures creates a combinatorial explosion of edge cases
- Lesson 617 — Why Paxos Is Difficult in Practice
- Correctness
- matters more than **speed**
- Lesson 529 — When to Choose Strong ConsistencyLesson 1273 — Choosing Good SLIs
- correctness trumps speed
- when money is involved.
- Lesson 496 — Banking and Financial Systems (CP)Lesson 518 — PC/EC Systems: Consistency Always
- Correlate easily
- Join logs across services using `correlation_id` without parsing strings
- Lesson 1137 — What is Structured Logging
- Correlation
- Connect logs from multiple services using correlation IDs
- Lesson 1169 — Centralized vs Localized Logging
- correlation ID
- is a unique identifier (often a UUID) attached to a request when it enters your system.
- Lesson 1132 — Correlation IDs and Request TracingLesson 1158 — Correlation IDs Across ServicesLesson 1161 — Context-Rich Logging
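One way this might look at a service entry point is sketched below. The header name `X-Correlation-ID` is a common convention but is an assumption here, as is the `ensure_correlation_id` helper.

```python
import uuid

CORRELATION_HEADER = "X-Correlation-ID"  # assumed header name


def ensure_correlation_id(headers: dict) -> dict:
    """Reuse the caller's correlation ID if present, otherwise create one."""
    out = dict(headers)
    out.setdefault(CORRELATION_HEADER, str(uuid.uuid4()))
    return out


# A fresh request gets a new ID at the edge of the system.
incoming = ensure_correlation_id({"Accept": "application/json"})
# Downstream calls carry the same ID forward unchanged.
downstream = ensure_correlation_id(incoming)
```

Logging the same ID in every service that handles the request is what makes it possible to join log lines across services later.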
- CORS considerations
- Bearer tokens require proper CORS headers because JavaScript makes the request.
- Lesson 918 — Cookie vs Bearer Token Transport
- Cortex
- offers similar capabilities with a focus on multi-tenancy.
- Lesson 1206 — Metrics Federation and Long-Term Storage
- Cost
- Cache-only infrastructure can be cheaper for ephemeral data
- Lesson 141 — Cache-as-SoR (System of Record) PatternLesson 349 — Redis In-Memory Storage ModelLesson 366 — Sloppy Quorums and Hinted HandoffLesson 736 — What is Batch Processing?Lesson 763 — Cost and Storage EfficiencyLesson 1257 — Storage and Retention CostsLesson 1417 — Hot Standby vs Cold StandbyLesson 1428 — Backup Storage Tiers (+1 more)
- Cost Considerations
- Lesson 119 — Choosing Load Balancer TechnologyLesson 765 — Choosing Lake vs WarehouseLesson 1688 — Channel Selection StrategyLesson 1703 — Channel Routing Logic
- Cost Constraints
- Each additional "nine" of availability tends to multiply infrastructure costs rather than add to them.
- Lesson 1276 — Setting Realistic SLOs
- Cost control
- Limits expensive operations (database queries, third-party API calls)
- Lesson 955 — What is Rate Limiting?Lesson 1577 — Paste Editing and Version History
- Cost efficiency
- Pay only for the resources each service actually needs
- Lesson 795 — Independent ScalingLesson 1255 — Adaptive SamplingLesson 1332 — Active-Active vs Active-Passive RedundancyLesson 1588 — Object Storage vs Block Storage
- Cost flexibility
- Use smaller, cheaper commodity hardware instead of expensive high-end machines
- Lesson 44 — What is Horizontal Scaling?
- Cost inefficiency
- Database storage is typically 10-20× more expensive than object storage
- Lesson 1550 — Object Storage for Paste Content
- Cost Management
- Lesson 196 — Multi-CDN Strategies
- Cost reduction
- Storing 1% of debug logs instead of 100% can cut the storage and ingestion cost of those logs by roughly 99%
- Lesson 1164 — Sampling for High-Volume Logs
- Cost savings
- Less bandwidth and compute at the origin
- Lesson 179 — Origin Shield: Protecting Origin Servers
- Cost tradeoff
- You're trading increased backend load (~5-10% more requests) for better user-facing latency
- Lesson 1031 — Hedged Requests and Speculative Execution
- Cost-based rate limiting
- charges users differently based on what their requests actually cost your system.
- Lesson 992 — Cost-Based Rate Limiting
- Cost-effective at scale
- Built on cheap object storage (like AWS S3, Azure Data Lake Storage)
- Lesson 758 — Data Lake Fundamentals
- Cost-effective scaling
- Replicas are cheaper than sharding or massive vertical scaling
- Lesson 1522 — Read-Heavy Workload and Database Scaling
- Cost-sensitive calculations
- – Bandwidth costs compound; 10% error means real money
- Lesson 32 — Rounding and Approximation Techniques
- Costs
- Lesson 288 — Why Denormalization?Lesson 293 — Duplicate Critical FieldsLesson 583 — Alternative: Best Effort with Eventual ConsistencyLesson 1357 — Semi-Synchronous Replication
- Costs (rough AWS example)
- Lesson 33 — Putting It All Together: Worked Example
- Costs skyrocket
- Redundancy and safeguards have diminishing returns
- Lesson 1310 — Embracing Risk: The 100% Availability Trap
- counter
- is a metric type that represents a monotonically increasing value—it only goes up (or resets to zero).
- Lesson 1174 — Counter MetricsLesson 1179 — Aggregation and Roll-UpsLesson 1183 — Counter Metrics
- Counter Resets
- When a service restarts, counters reset to zero.
- Lesson 1187 — Rate Calculations from Counters
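A minimal sketch of reset-aware increase calculation, similar in spirit to how monitoring systems compute rates from counters. The sample values are made up, and the function assumes a counter restarts from zero.

```python
def counter_increase(samples):
    """Total increase across samples, treating any drop as a reset to zero."""
    total = 0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            # The counter restarted, so everything counted since then is `curr`.
            total += curr
    return total


# The service restarts between the third and fourth sample.
samples = [100, 130, 160, 20, 50]
increase = counter_increase(samples)
```

Subtracting the first sample from the last would give a negative number here; handling the drop explicitly keeps the computed increase non-negative.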
- Counters
- Increment-only counters (G-Counters) where each replica tracks its own increments and the total is the sum across replicas
- Lesson 538 — Conflict-Free Replicated Data Types (CRDTs)Lesson 1172 — What Are Metrics and Why They MatterLesson 1175 — Gauge MetricsLesson 1182 — Metric Naming ConventionsLesson 1184 — Gauge MetricsLesson 1193 — Aggregation FunctionsLesson 1201 — StatsD and Metric Aggregation DaemonsLesson 1516 — Counter-Based vs UUID Approaches
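For the CRDT sense of this term, a grow-only counter can be sketched as follows. The class and replica names are illustrative; merging takes the per-replica maximum, which is the usual rule for this data type.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged by per-slot maximum."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, amount: int = 1) -> None:
        # A replica only ever writes to its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())


a, b = GCounter("replica-a"), GCounter("replica-b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
a.merge(b)  # merging again changes nothing
```

Because merge is commutative and idempotent, replicas can exchange state in any order and any number of times and still converge on the same total.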
- Counters are maintained
- – It keeps a rolling window of recent call results (e.
- Lesson 1045 — The Three States: Closed
- covering index
- is an index that contains *all* the columns needed to satisfy a query.
- Lesson 279 — Covering IndexesLesson 284 — Aggregation Query Optimization
- Covering indexes
- When aggregations vary but use consistent columns
- Lesson 284 — Aggregation Query Optimization
- CP (Consistency over Availability)
- systems we learned about in the CAP theorem — you sacrifice some availability to maintain perfect consistency.
- Lesson 522 — What is Strong Consistency?
- CP approach
- "Sorry, we can't process your withdrawal right now—our systems are temporarily disconnected.
- Lesson 483 — The CAP Tradeoff During PartitionsLesson 513 — Hybrid Approaches: Different Guarantees Per OperationLesson 532 — Why Eventual Consistency Exists
- CP response
- Stop taking orders until communications restore (preserve consistency)
- Lesson 505 — The Partition Question: When, Not If
- CP system
- (prioritizing consistency) instead refuses to complete a purchase unless it can guarantee the inventory count is accurate and up-to-date.
- Lesson 499 — Inventory Management (CP)Lesson 502 — Mixed Strategies: Hybrid Systems
- CP systems
- (like MongoDB with strict settings): Reject requests rather than serve stale data
- Lesson 481 — What CAP Theorem StatesLesson 493 — CP Systems: Prioritizing ConsistencyLesson 496 — Banking and Financial Systems (CP)Lesson 511 — Banking Systems: Consistency Over AvailabilityLesson 606 — The CAP Trade-off in Consensus
- CPU
- Even the most powerful processors top out.
- Lesson 46 — Hardware Limits of Vertical ScalingLesson 867 — Resource Consumption at ScaleLesson 1189 — The USE MethodLesson 1264 — USE Method: Utilization, Saturation, Errors
- CPU and Memory
- Every sidecar proxy is a separate process consuming resources.
- Lesson 834 — Service Mesh Performance OverheadLesson 841 — Data Plane: Performance and Latency Overhead
- CPU and memory utilization
- tighten limits when resources are strained
- Lesson 972 — Adaptive Rate Limiting
- CPU caches (L1/L2/L3)
- are tiny, ultra-fast memory banks built directly into your processor:
- Lesson 127 — CPU and Disk Caching Layers
- CPU constraints
- Parsing HTML, extracting links, and computing content fingerprints are CPU-intensive.
- Lesson 1862 — Why Distribute a Web Crawler
- CPU cores × 2
- While one query waits for disk I/O, another can use the CPU
- Lesson 269 — Pool Size Configuration
- CPU cycles
- spent converting objects to JSON strings
- Lesson 1143 — Performance Impact of Structured Logging
- CPU limits
- Maximum CPU cores or time slices (e.
- Lesson 1072 — CPU and Memory Bulkheads: Resource Quotas
- CPU overhead
- to update metadata on every cache access
- Lesson 154 — Implementation TradeoffsLesson 865 — Performance Overhead: Latency and Throughput
- CPU savings
- QR generation involves matrix calculations and image encoding — expensive to repeat
- Lesson 1539 — QR Code Generation
- CPU upgrade
- 4 cores → 16 cores to handle more concurrent requests
- Lesson 43 — What is Vertical Scaling?
- Crash-stop
- (or fail-stop) failures are the "well-behaved" failures.
- Lesson 602 — Crash-Stop vs Byzantine Failures
- Crawl delay
- is the enforced wait time between consecutive requests to the same domain, ensuring you don't hammer servers with rapid-fire requests.
- Lesson 1842 — Politeness Budget and Crawl Delay
- Crawl freshness
- is about keeping your index up-to-date while respecting resource limits—you can't recrawl the entire web constantly.
- Lesson 1835 — Crawl Freshness Requirements
- Crawl-delay
- Minimum seconds between requests (integrates with your politeness budget)
- Lesson 1861 — Robots.txt Caching and Parsing
- Crawling
- is the process of systematically discovering and downloading web pages (or internal documents) so they can be indexed later.
- Lesson 1732 — Crawling and Document Collection
- Create
- inserts a new document into a collection, often auto-generating a document ID if you don't provide one.
- Lesson 387 — CRUD Operations on DocumentsLesson 1223 — Instrumentation BasicsLesson 1542 — Pastebin System Overview
- Create an ephemeral resource
- A client creates a special node/key with a **session/lease**
- Lesson 637 — Distributed Locks via Consensus
- Create cascading failures
- when dependent systems (cache clusters, storage nodes) become overloaded
- Lesson 1654 — Fanout Rate Limiting
- Create ephemeral nodes
- Automatically deleted when the client disconnects (useful for leader election)
- Lesson 633 — ZooKeeper: Coordination Service Built on Consensus
- Create new context
- if this is a fresh request (generate `trace_id` and `span_id`)
- Lesson 1239 — Root Span and Entry Points
- Create sequential nodes
- Auto-numbered for implementing distributed queues or locks
- Lesson 633 — ZooKeeper: Coordination Service Built on Consensus
- Create, Read, Update, Delete
- the four fundamental operations you perform on data.
- Lesson 387 — CRUD Operations on Documents
- Creating records
- `POST /orders` — repeating creates duplicate orders
- Lesson 1006 — Natural Idempotency vs Engineered Idempotency
- Creation
- A span is created when an operation starts, recording the start time and operation name
- Lesson 1231 — Span Lifecycle and Structure
- Credentials
- passwords, tokens, API keys, session IDs
- Lesson 1145 — Sensitive Data in Structured LogsLesson 1163 — Avoid Logging Sensitive Data
- Critical
- Payment processor, inventory service, user authentication
- Lesson 1082 — Critical Path IdentificationLesson 1083 — Graceful Degradation StrategiesLesson 1810 — Counter Expiration and TTL Management
- Critical data
- (payments, inventory counts, user passwords): Use CP strategies.
- Lesson 502 — Mixed Strategies: Hybrid Systems
- Critical financial transactions
- might favor strong consistency (CP-leaning), refusing to proceed if data sync is uncertain
- Lesson 488 — CAP as a Spectrum, Not Binary
- Critical operations
- 20% of total capacity, no rate limit
- Lesson 974 — Rate Limiting with Priority Queues
- critical path
- is a dependency chain whose failure would cause the most severe cascading damage to your system.
- Lesson 1082 — Critical Path IdentificationLesson 1442 — Dependency Mapping and Critical Path Analysis
- Critical Path Analysis
- examines your trace data to identify the *longest chain* of dependent spans — the bottleneck sequence that, if optimized, would actually reduce total response time.
- Lesson 1227 — Critical Path AnalysisLesson 1229 — Service Dependency GraphsLesson 1232 — Span Relationships and HierarchyLesson 1442 — Dependency Mapping and Critical Path Analysis
- Critical requests
- (login, checkout, emergency services): minimal or no throttling
- Lesson 995 — Graceful Degradation Through Throttling
- Critical vs non-critical paths
- Protecting payment processing from search analytics failures
- Lesson 1076 — Bulkhead Tradeoffs: Complexity and Resource Overhead
- Critical/Urgent queue
- Security alerts, payment failures, password resets
- Lesson 1700 — Priority Queues and Urgency Levels
- Cross-boundary queries
- If you allow fuzzy matching or typo correction, you might need to query multiple shards in parallel and merge results.
- Lesson 1764 — Distributed Trie Architecture
- Cross-collection operations
- Since document stores discourage joins, fetching related data often requires multiple round-trips or application-level logic, multiplying latency.
- Lesson 408 — Query Performance Limitations
- Cross-cutting concerns
- (authentication, rate limiting, logging) get duplicated across services
- Lesson 870 — What is an API Gateway?
- Cross-cutting concerns becoming duplicated
- across services (authentication, rate limiting, logging)
- Lesson 879 — When to Introduce an API Gateway
- Cross-Key Transactions
- Lesson 346 — When Not to Use Key-Value Stores
- Cross-Region Strategy
- Data center affinity vs global session replication
- Lesson 947 — Distributed Session Management
- Cross-Shard Joins
- Lesson 238 — Query Limitations in Sharded Systems
- Crystal clear
- Version is immediately visible in URLs and logs
- Lesson 1899 — URI Versioning (Path-Based)
- CSRF (Cross-Site Request Forgery)
- attacks because browsers auto-send them even from malicious sites.
- Lesson 918 — Cookie vs Bearer Token Transport
- CSS stylesheets
- Lesson 173 — Content Types Suited for CDNs
- Cumulative counts
- Many implementations store cumulative counts: "≤10ms", "≤50ms", etc.
- Lesson 1185 — Histogram Metrics
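To make the cumulative layout concrete, the sketch below counts observations into "less than or equal to" buckets. The bucket bounds and latency values are invented for illustration.

```python
import math


def cumulative_buckets(observations, upper_bounds):
    """Count, for each bound, how many observations are <= that bound."""
    bounds = sorted(upper_bounds) + [math.inf]  # final catch-all bucket
    return {bound: sum(1 for v in observations if v <= bound) for bound in bounds}


latencies_ms = [3, 12, 48, 75, 200]
buckets = cumulative_buckets(latencies_ms, upper_bounds=[10, 50, 100])
```

Each bucket includes everything in the buckets below it, so the count for a single range is the difference between two adjacent buckets.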
- Current metrics
- The actual value that triggered the alert (e.
- Lesson 1293 — Alert Context and Enrichment
- current state
- of entities in a database—like "User balance: $100.
- Lesson 691 — Events as First-Class CitizensLesson 1175 — Gauge Metrics
- Current window
- (10:01:00–ongoing): 40 requests
- Lesson 969 — Sliding Window CounterLesson 1797 — Sliding Window Counter with Redis
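A sliding window counter typically weights the previous window by how much of it still overlaps the sliding window. In the sketch below only the 40 requests in the current window come from the entry above; the previous-window count of 60, the 60-second window, the 15 seconds elapsed, and the limit of 100 are assumptions.

```python
def sliding_window_estimate(prev_count, curr_count, elapsed_s, window_s):
    """Estimate requests in the trailing window from two fixed-window counters."""
    overlap = 1 - (elapsed_s / window_s)  # share of previous window still in view
    return prev_count * overlap + curr_count


estimate = sliding_window_estimate(
    prev_count=60, curr_count=40, elapsed_s=15, window_s=60
)
allowed = estimate < 100  # assumed limit of 100 requests per window
```

Here three quarters of the previous window still count, so the estimate is 60 × 0.75 + 40 = 85 requests.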
- Cursor-based pagination
- replaces numeric offsets with **opaque tokens** (cursors) that encode a specific position in the dataset.
- Lesson 1889 — Cursor-Based Pagination
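One simple way to build such a token is to serialize the position and base64-encode it, as sketched below. The field names `last_id` and `created_at` are hypothetical, and this encoding is opaque but not tamper-proof; a real API might also sign or encrypt the cursor.

```python
import base64
import json


def encode_cursor(position: dict) -> str:
    """Serialize a position in the dataset into a URL-safe opaque token."""
    raw = json.dumps(position, sort_keys=True).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> dict:
    """Recover the position the server previously handed to the client."""
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))


position = {"last_id": 120, "created_at": "2024-12-15T10:00:00Z"}
cursor = encode_cursor(position)
restored = decode_cursor(cursor)
```

Clients pass the token back unchanged, which leaves the server free to change what the cursor encodes without breaking them.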
- Custom Conflict Resolution
- For shopping carts, you might merge conflicting versions (combine items).
- Lesson 219 — Application-Level Consistency Patterns
- Custom expiration
- (1 hour, 1 day, 1 week, 1 month, etc.
- Lesson 1565 — Expiration Requirements and TTL Basics
- Custom Header Pattern
- Lesson 1901 — Header-Based Versioning
- Custom logic
- You can use business rules, not just hash functions
- Lesson 242 — Directory-Based ShardingLesson 702 — Producers and Message Publishing
- Custom Partitioning
- Lesson 703 — Partitioning Strategies and Key Selection
- Custom signals
- (user interest, business priorities, content type)
- Lesson 1844 — Front Queue: Priority Management
- Customer Service
- owns user profiles—it manages authentication and preferences
- Lesson 817 — Identifying Service Boundaries by Data Ownership
- Cypher
- (used by Neo4j) is designed to look like the graph patterns you're searching for.
- Lesson 456 — Graph Query Languages: Cypher and GremlinLesson 465 — Variable-Length Paths
D
- Daily Active Users (DAU)
- How many unique users use your system per day
- Lesson 23 — QPS and Daily Active Users EstimationLesson 33 — Putting It All Together: Worked Example
- Daily bandwidth
- Lesson 30 — CDN Bandwidth and Cost Estimation
- Dangling references
- Creating records pointing to non-existent entities
- Lesson 262 — Referential Integrity Across Shards
- Dashboard Architecture
- Build a user dashboard that queries pastes by `user_id` with pagination.
- Lesson 1578 — User Accounts and Paste Management
- Dashboards
- are collections of panels organized into rows.
- Lesson 1200 — Grafana for Metrics Visualization
- Data Aggregation
- Collect streams from edge locations into a central analytics cluster
- Lesson 726 — Multi-Datacenter Replication
- data consistency
- .
- Lesson 54 — Scaling Databases: Special ConsiderationsLesson 134 — Write-Through Caching PatternLesson 258 — Resharding and Data Migration
- Data corruption
- can occur when the partition heals and nodes try to merge state
- Lesson 1340 — Split-Brain Problem
- Data distribution
- Choose partition keys that spread data evenly across nodes (avoid hot spots)
- Lesson 423 — Primary Key Components
- Data filtering
- Remove internal metadata or sensitive information
- Lesson 882 — Request and Response Transformation
- Data flow
- How information moves between these components (like planning how packages get from warehouses to customers)
- Lesson 1 — What Is System Design?
- Data keys
- (like user IDs, session tokens, cache keys)
- Lesson 1458 — Mapping Keys and Nodes to the Ring
- Data Lakes
- prioritize **flexibility** by storing raw, unprocessed data.
- Lesson 762 — Query Performance TradeoffsLesson 763 — Cost and Storage Efficiency
- Data Locality
- Comply with regulations requiring data processing in specific regions
- Lesson 726 — Multi-Datacenter Replication
- Data locality matters
- Accessing nearby memory addresses is dramatically faster
- Lesson 127 — CPU and Disk Caching Layers
- Data loss is guaranteed
- earlier writes disappear completely
- Lesson 1380 — Last-Write-Wins (LWW) Strategy
- Data loss occurs
- – Recent writes haven't reached replicas yet
- Lesson 1356 — Asynchronous Replication: Speed and Risk
- Data loss risk
- if the cache crashes before flushing, recent writes are lost
- Lesson 136 — Write-Behind (Write-Back) Caching Pattern
- Data Mining
- Companies crawl e-commerce sites, news outlets, or social platforms to gather pricing data, trends, or public sentiment.
- Lesson 1826 — What is a Web Crawler
- Data Operations
- Lesson 10 — Identifying Functional Requirements
- Data partitioning
- How do you split data across multiple databases?
- Lesson 49 — Application Complexity Trade-offsLesson 360 — What Makes a Key-Value Store DistributedLesson 1446 — What is Data Partitioning?
- data plane
- consists of the sidecar proxies deployed alongside each service instance.
- Lesson 837 — Service Mesh Architecture: Control vs Data PlaneLesson 838 — Data Plane: Sidecar Proxy PatternLesson 845 — Control Plane: Telemetry CollectionLesson 847 — Plane Separation: Scalability and ReliabilityLesson 848 — Traffic Management and RoutingLesson 852 — Circuit Breaking at the Mesh LevelLesson 853 — Retry Policies and Timeout ConfigurationLesson 894 — Kong Gateway Architecture (+1 more)
- Data Plane: Envoy Sidecars
- Lesson 861 — Istio: Architecture and Components
- Data quality is guaranteed
- queries can trust the data types
- Lesson 301 — Schema Enforcement and Type Safety
- Data Quality Issues
- Lesson 407 — Schema Flexibility Trade-offs
- Data replication
- means storing the same data on multiple servers (nodes) instead of keeping it in just one place.
- Lesson 68 — What is Data Replication?Lesson 1334 — Geographic Redundancy and Multi-RegionLesson 1338 — Stateless vs Stateful Redundancy
- Data Requiring Aggregations
- Lesson 346 — When Not to Use Key-Value Stores
- Data Residency and Compliance
- Many jurisdictions restrict where user data may be stored or transferred (GDPR in Europe, data localization rules in China).
- Lesson 1435 — Multi-Region Architecture for DR
- Data sprawl
- Multiple copies, outdated versions, unknown lineage
- Lesson 764 — Data Governance and Quality
- Data Structure
- Lesson 765 — Choosing Lake vs Warehouse
- data warehouse
- is a centralized repository designed specifically for analytical workloads, not day-to-day transactions.
- Lesson 757 — Data Warehouse FundamentalsLesson 1530 — Analytics and Click Tracking
- Data Warehouses
- use **pre-aggregation** and **indexing** to optimize query speed.
- Lesson 762 — Query Performance TradeoffsLesson 763 — Cost and Storage Efficiency
- database
- .
- Lesson 6 — Components of a System Design SolutionLesson 134 — Write-Through Caching PatternLesson 383 — Collections and DatabasesLesson 910 — Session Storage Options
- Database connection pool utilization
- Lesson 1175 — Gauge Metrics
- Database load
- More UPDATE queries mean higher CPU, I/O, and lock contention
- Lesson 296 — Write Amplification Costs
- Database load spikes
- – Every feed read hits the database with complex queries
- Lesson 1637 — Pull (Read-Time) Feed Model
- Database proxy layer
- Tools like PgBouncer or ProxySQL can enforce per-user or per-application connection limits
- Lesson 1071 — Connection Pool Bulkheads: Database and Service Isolation
- Database replication
- is the process of copying data from a **primary (master) database** to one or more **replica (slave) databases**.
- Lesson 198 — What is Database Replication?
- Database restarts
- invalidate all existing connections
- Lesson 271 — Connection Validation and Stale Connections
- Database strain
- – every feed refresh triggers complex joins and aggregations
- Lesson 1647 — Fanout-on-Read (Pull Model)
- Database tier
- 10-30 second intervals, 5-10 second timeouts (databases can have legitimate temporary slowdowns)
- Lesson 100 — Health Check Intervals and Timeouts
- Database-modifying functions
- `SELECT my_update_function()` looks like a read but isn't
- Lesson 223 — Detecting Read vs Write Queries
- Database-side connection limits
- might force-close old connections
- Lesson 271 — Connection Validation and Stale Connections
- Databases
- Traditional storage for persistent user state
- Lesson 59 — Externalizing State with Shared StorageLesson 383 — Collections and DatabasesLesson 1040 — Idempotency Token Storage StrategiesLesson 1712 — Deduplication Windows and Storage
- Datadog's metric summaries
- to maintain these catalogs.
- Lesson 1216 — Metric Documentation and Discovery
- DataFrames
- provide a higher-level abstraction with schema awareness (like database tables).
- Lesson 768 — Apache Spark Overview
- DataLoader
- solves this by collecting all data requests within a single execution tick, batching them into one efficient query, and caching results.
- Lesson 1914 — DataLoader and Batching Solutions
- Date
- True timestamp values, not just strings that look like dates
- Lesson 390 — BSON Format and Data Types
- Date-prefixed records
- Orders keyed as `2024-12-15-order-001`
- Lesson 1474 — Hotspot Problems in Range Partitioning
- Day 1
- Differential backup (5GB changed)
- Lesson 1404 — Differential BackupsLesson 1422 — Incremental Backup Strategy
- Day 2
- Differential backup (12GB changed since Day 0)
- Lesson 1404 — Differential BackupsLesson 1422 — Incremental Backup Strategy
- Day 3
- Differential backup (18GB changed since Day 0)
- Lesson 1404 — Differential BackupsLesson 1422 — Incremental Backup Strategy
- Day 4
- Incremental backup captures 4 GB (changes since Day 3)
- Lesson 1422 — Incremental Backup Strategy
- Dead letter destination
- Where permanently failed messages go
- Lesson 684 — Negative Acknowledgments and Redelivery
- Dead letter handling
- quarantine bad data without blocking the entire pipeline
- Lesson 777 — Workflow Orchestration Patterns
- Dead letter queue
- After N failures, route to a special queue for investigation
- Lesson 684 — Negative Acknowledgments and RedeliveryLesson 1705 — Retry and Dead Letter Queues
- Dead Letter Queue (DLQ)
- is a special holding queue where messages go after exhausting all retry attempts.
- Lesson 687 — Dead Letter QueuesLesson 1715 — Retry Strategies for Failed Deliveries
- Dead Letter Queues
- Redirect undeliverable messages
- Lesson 671 — ActiveMQ and Traditional Enterprise Messaging
- Dead Letter Queues (DLQ)
- catch poison messages.
- Lesson 1605 — Distributed Worker ArchitectureLesson 1656 — Fanout Failure Handling
- Deadline propagation
- solves this by passing an absolute deadline down the call chain instead of durations.
- Lesson 1108 — What is Deadline Propagation
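A small single-process sketch of passing one absolute deadline through a call chain. The function names are hypothetical, and `time.monotonic()` is only meaningful inside one process; across services the deadline would have to travel as a wall-clock timestamp or be converted at each hop.

```python
import time

def remaining(deadline):
    """Seconds left before the absolute deadline, never negative."""
    return max(0.0, deadline - time.monotonic())

def call_inventory(deadline):
    if remaining(deadline) == 0:
        raise TimeoutError("deadline exceeded before inventory call")
    return "in stock"

def handle_checkout(deadline):
    # Every hop receives the same absolute deadline rather than a fresh duration,
    # so time already spent upstream is automatically accounted for.
    if remaining(deadline) == 0:
        raise TimeoutError("deadline exceeded before checkout")
    return call_inventory(deadline)

status = handle_checkout(time.monotonic() + 5.0)
```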
- Debezium
- is an open-source CDC platform built on Kafka Connect.
- Lesson 776 — Change Data Capture Tools
- Debouncing
- means waiting for a brief pause in typing before sending a request.
- Lesson 1763 — Debouncing and Request Optimization
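One way to sketch debouncing is with asyncio timers: each keystroke cancels the previous pending request and starts a new countdown. `Debouncer` and the 0.2 second delay are assumptions for illustration.

```python
import asyncio

class Debouncer:
    def __init__(self, delay, send):
        self.delay = delay    # quiet period required before sending
        self.send = send      # callback that issues the request
        self._task = None

    def on_keystroke(self, text):
        if self._task is not None:
            self._task.cancel()   # the user is still typing: drop the pending request
        self._task = asyncio.get_running_loop().create_task(self._fire(text))

    async def _fire(self, text):
        await asyncio.sleep(self.delay)
        self.send(text)

sent = []

async def demo():
    debouncer = Debouncer(0.2, sent.append)
    for text in ["s", "sy", "sys"]:
        debouncer.on_keystroke(text)
        await asyncio.sleep(0.01)   # keystrokes arrive faster than the delay
    await asyncio.sleep(0.4)        # pause in typing: the last request fires

asyncio.run(demo())
```

Only the final prefix is sent, so three keystrokes cost one request.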
- DEBUG
- Detailed diagnostic information for troubleshooting during development
- Lesson 1141 — Log Levels in Structured Logs
- Debug logs
- 3-7 days (expensive, high volume, rarely needed after immediate troubleshooting)
- Lesson 1135 — Log Retention and Volume Management
- Debug logs in production
- Fine-grained debug statements should stay off unless actively troubleshooting—they create massive volumes and performance drag.
- Lesson 1129 — What to Log vs What Not to Log
- Debug sampling
- Engineers add a debug header (`X-Trace-Debug: 1`) to specific requests during investigation.
- Lesson 1256 — Priority and Debug Sampling
- Debugging
- When something breaks, logs tell you the sequence of events leading to failure.
- Lesson 1127 — What is Logging and Why It Matters
- Debugging is easier
- if data exists, it passed all checks
- Lesson 301 — Schema Enforcement and Type Safety
- Decimal128
- High-precision decimal numbers for financial calculations
- Lesson 390 — BSON Format and Data Types
- Decision
- Was access granted or denied?
- Lesson 944 — Auditing and Compliance for AuthorizationLesson 1437 — Failover and Failback Procedures
- Decorrelated Jitter
- Lesson 1024 — Adding Jitter to Prevent Thundering Herd
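A sketch of one widely cited formulation of decorrelated jitter, in which each delay is drawn uniformly between a base value and three times the previous delay, then capped. The parameter values are assumptions; the lesson itself may present a different variant.

```python
import random

def decorrelated_jitter(base, cap, attempts, rng=random):
    """Return a list of retry delays, each derived from the previous one."""
    delays = []
    previous = base
    for _ in range(attempts):
        previous = min(cap, rng.uniform(base, previous * 3))
        delays.append(previous)
    return delays

delays = decorrelated_jitter(base=0.1, cap=5.0, attempts=6, rng=random.Random(42))
```

Because each client draws its own random sequence, retries from many clients spread out instead of arriving in synchronized waves.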
- Decoupling
- Producers and consumers don't need to know about each other or be online simultaneously
- Lesson 646 — The Producer-Consumer ModelLesson 647 — Message Queue BasicsLesson 1698 — Message Queue for Decoupling
- Decrease MTTR
- (fix failures faster) — automated failover, better monitoring, faster deployments
- Lesson 1325 — Availability Formula: MTBF and MTTR Relationship
- Decrements the remaining budget
- automatically at each proxy
- Lesson 1101 — Timeout Propagation in Service Meshes
- Dedicated DNS Resolution Pool
- Instead of each worker handling DNS independently, deploy a cluster of specialized DNS resolver services.
- Lesson 1869 — Scaling DNS Resolution
- Dedicated sharding
- Route hot tenants to isolated Redis instances to prevent interference
- Lesson 1823 — Hot Tenant Problem
- Deduplicate intelligently
- Track which messages have been successfully delivered across channels.
- Lesson 1689 — Multi-Channel Delivery
- Deduplication
- using unique message IDs
- Lesson 680 — Exactly-Once DeliveryLesson 1294 — Rate Limiting and DeduplicationLesson 1305 — On-Call Tooling and AutomationLesson 1699 — Notification Processing WorkersLesson 1838 — URL Frontier: Definition and PurposeLesson 1914 — DataLoader and Batching Solutions
- Deduplication Layer
- Before saving, compute hash and check if it exists; if yes, just add new URL → hash reference
- Lesson 1870 — Content Storage and Deduplication
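The check-before-save step can be sketched with two in-memory maps. SHA-256 and the dictionary names are assumptions; a real crawler would back these with durable storage.

```python
import hashlib

content_by_hash = {}   # hash -> stored body (each distinct body saved once)
hash_by_url = {}       # url -> hash reference

def store_page(url, body: bytes):
    """Store the body only if its hash is new; always record the url -> hash link."""
    digest = hashlib.sha256(body).hexdigest()
    is_new = digest not in content_by_hash
    if is_new:
        content_by_hash[digest] = body
    hash_by_url[url] = digest
    return is_new

first = store_page("https://example.com/a", b"<html>same</html>")
second = store_page("https://example.com/b", b"<html>same</html>")
```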
- deduplication window
- track event IDs for a fixed time period (e.
- Lesson 1035 — Idempotency in Event ProcessingLesson 1681 — Mobile Push Notification Integration
- Deep checks
- provide more confidence that the service can handle real traffic, but they:
- Lesson 102 — Shallow vs Deep Health Checks
- Default Expiration Policy
- Instead of allowing infinite TTL, set a "very long" default—perhaps 1 year, 5 years, or 10 years.
- Lesson 1573 — Handling Never-Expiring Pastes
- Default sort order
- Define sensible defaults (often ID or creation timestamp)
- Lesson 1894 — Sorting Query Parameters
- Default values
- Make new fields optional with sensible defaults
- Lesson 809 — Versioning and Backward CompatibilityLesson 1061 — Fallback Strategies
- Defense in depth
- means implementing rate limits at every major boundary, so if one layer fails, others still protect your resources.
- Lesson 962 — Rate Limiting at Different LayersLesson 991 — Hierarchical Rate Limiting
- Defer costly decisions
- Once domain boundaries emerge naturally through growth, you can extract services intentionally
- Lesson 820 — When a Monolith is the Right ChoiceLesson 825 — Starting with a Modular Monolith
- Defer side effects
- until after the core operation completes successfully
- Lesson 1038 — Side Effect Management
- Define rejection criteria
- thresholds that indicate your system can't handle current load
- Lesson 1084 — Load Shedding Under Cascading Failure
- Degraded mode
- Return results from N-1 shards with a warning flag rather than failing completely; with 20 shards, for example, users still get 95% coverage instead of nothing.
- Lesson 1780 — Distributed Query Coordination
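A simplified sketch of a coordinator that tolerates a failed shard. The shard callables and the response shape (`partial`, `failed_shards`) are assumptions, and real coordinators would query shards in parallel with timeouts.

```python
def search_all(shards, query):
    """Query every shard; return whatever succeeded plus a flag for partial results."""
    hits, failed = [], []
    for name, search in shards.items():
        try:
            hits.extend(search(query))
        except Exception:
            failed.append(name)   # keep going instead of failing the whole request
    return {"hits": hits, "partial": bool(failed), "failed_shards": failed}

def broken_shard(query):
    raise ConnectionError("shard unreachable")

shards = {
    "shard-0": lambda q: [f"{q}-doc-1"],
    "shard-1": broken_shard,
    "shard-2": lambda q: [f"{q}-doc-7"],
}
response = search_all(shards, "cache")
```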
- Degrades functionality
- gracefully rather than failing completely
- Lesson 315 — Basically Available: Prioritizing Uptime
- Degrading dependencies
- makes external services slow, unavailable, or return errors.
- Lesson 1347 — Common Chaos Experiments
- Degree centrality
- Simply counts connections—who knows the most people?
- Lesson 468 — Graph Algorithms: PageRank and Centrality
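Counting connections takes only a few lines. This sketch assumes an undirected edge list with made-up names.

```python
from collections import Counter

def degree_centrality(edges):
    """Count how many edges touch each node in an undirected graph."""
    degrees = Counter()
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees

edges = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"), ("alice", "dave")]
degrees = degree_centrality(edges)
best_connected = degrees.most_common(1)[0]
```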
- Delay enforcement
- Before dequeuing for fetch, check if enough time has passed since the last request to that host
- Lesson 1845 — Back Queue: Politeness Enforcement
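The per-host check might look like the following sketch. `PolitenessGate` is a hypothetical name, and the clock is passed in explicitly so the behavior is easy to follow.

```python
class PolitenessGate:
    def __init__(self, min_delay_seconds):
        self.min_delay = min_delay_seconds
        self.last_fetch = {}   # host -> time of the last request to it

    def try_acquire(self, host, now):
        """Return True and record the fetch only if the host has rested long enough."""
        last = self.last_fetch.get(host)
        if last is not None and now - last < self.min_delay:
            return False       # too soon: leave the URL in the back queue
        self.last_fetch[host] = now
        return True

gate = PolitenessGate(min_delay_seconds=2.0)
decisions = [
    gate.try_acquire("example.com", now=0.0),   # first request to the host
    gate.try_acquire("example.com", now=1.0),   # only 1 s later
    gate.try_acquire("example.org", now=1.0),   # different host
    gate.try_acquire("example.com", now=2.5),   # 2.5 s after the last allowed fetch
]
```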
- Delayed expiration
- Tokens accepted when they should be expired
- Lesson 949 — Clock Skew and Token Validation
- Delayed retry
- Wait before making it available again (backoff strategy)
- Lesson 684 — Negative Acknowledgments and RedeliveryLesson 1021 — Immediate Retry vs Delayed Retry
- DELETE
- operation removes a key and its associated value from the store.
- Lesson 339 — Key-Value Store OperationsLesson 382 — Document IDs and Primary KeysLesson 387 — CRUD Operations on DocumentsLesson 1000 — Idempotent vs Non-Idempotent OperationsLesson 1009 — HTTP Methods and Natural IdempotencyLesson 1557 — Hot vs Cold Storage TieringLesson 1572 — Storage Tier MigrationLesson 1875 — HTTP Methods: GET, POST, PUT, DELETE Semantics (+1 more)
- Delete before adding
- Before introducing a new service, library, or pattern, ask if existing tools can solve the problem
- Lesson 1315 — Simplicity as a Core Value
- Deleted (90+ days)
- Unless regulatory requirements demand it, purge everything.
- Lesson 1246 — Trace Data Retention Policies
- Deleting a record
- `DELETE FROM orders WHERE id = 123` — deleting again changes nothing
- Lesson 1006 — Natural Idempotency vs Engineered Idempotency
- Deletion
- When messages expire per retention policy, Kafka simply deletes entire old segment files (fast!
- Lesson 711 — Message Retention and Log Segments
- Deletion (after retention period)
- Permanently remove logs that exceed legal and business requirements.
- Lesson 1165 — Log Retention Policies
- Delivered
- The provider successfully delivered the message to the user's device or inbox.
- Lesson 1724 — Notification Analytics Events
- Delivery Guarantee Requirements
- Lesson 1688 — Channel Selection Strategy
- Delivery guarantees
- High-priority notifications might require a fallback: try push first, then SMS if undelivered.
- Lesson 1703 — Channel Routing Logic
- Delta encoding
- for document IDs (store differences, not absolute values)
- Lesson 1745 — Posting Lists and Document IDs
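A short sketch of the encoding, assuming the posting list is sorted in ascending order. The gaps are small numbers, which is what lets a follow-up variable-length integer encoding save space.

```python
def delta_encode(doc_ids):
    """Sorted document IDs -> first ID followed by the gaps between neighbors."""
    previous = 0
    gaps = []
    for doc_id in doc_ids:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps

def delta_decode(gaps):
    """Rebuild the absolute document IDs with a running sum."""
    total = 0
    doc_ids = []
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids

postings = [1000, 1003, 1010, 1042]
encoded = delta_encode(postings)   # [1000, 3, 7, 32]
```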
- Delta/diff storage
- saves only the changes between versions.
- Lesson 1577 — Paste Editing and Version History
- Denormalization
- Duplicate data across shards to avoid cross-shard queries
- Lesson 261 — Distributed Transactions Across ShardsLesson 1519 — Database Schema for URL Shortener
- Denormalized
- Optimizes for reads—fetch everything in one query, but updates require touching multiple records
- Lesson 289 — Normalized vs Denormalized Schema Design
- Denormalized approach
- Store `seller_name` and `rating` directly in the Products table.
- Lesson 288 — Why Denormalization?
- Dependency failures
- Make a downstream service unavailable to test bulkhead isolation
- Lesson 1342 — Testing Redundancy with Fault Injection
- Dependency graphs
- automatically built from your SQL
- Lesson 774 — dbt for Analytics EngineeringLesson 810 — Deployment ComplexityLesson 1248 — Trace Visualization and UI
- Dependency mapping
- creates a visual graph showing which services rely on which, while **critical path analysis** determines the optimal restoration sequence to minimize total recovery time.
- Lesson 1442 — Dependency Mapping and Critical Path Analysis
- Deploy
- Replace or augment traditional scoring with ML predictions
- Lesson 1781 — Machine Learning for Ranking
- Deploy application code
- that works with both old and new schemas
- Lesson 265 — Schema Changes in Sharded Environments
- Deployment complexity
- Each service needs its own CI/CD pipeline, container orchestration config, and rollback strategy
- Lesson 803 — Operational Overhead
- Deployment coordination
- Schema changes, software upgrades, or configuration updates must roll out across all shards.
- Lesson 264 — Operational Complexity of Sharded Systems
- Deployment pipelines
- for each service with its own build, test, and release cycle
- Lesson 810 — Deployment Complexity
- deployment simplicity
- advantage of monoliths for network complexity, while losing the **independent deployability** promise of microservices.
- Lesson 789 — The Distributed Monolith Anti-PatternLesson 1864 — Stateless Worker Design
- Deployments Are Safer
- Rolling updates become straightforward—drain traffic from old instances, start new ones.
- Lesson 878 — Stateless Gateway Design
- Deprecation periods
- Announce breaking changes months in advance, giving teams time to migrate
- Lesson 809 — Versioning and Backward Compatibility
- Deprecation with Forwarding
- Lesson 1904 — Maintaining Multiple API Versions
- Depth limiting
- restricts how many levels deep a client can nest fields.
- Lesson 1916 — Rate Limiting and Complexity Analysis in GraphQL
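As an illustration only, a query's selection set can be modelled as nested dictionaries and rejected when it nests too deeply. A real GraphQL server would walk the parsed query AST; the dictionary shape and `MAX_DEPTH` are assumptions.

```python
MAX_DEPTH = 3  # assumed limit

def depth(selection):
    """Nesting depth of a selection set; leaf fields map to None."""
    if not selection:
        return 0
    return 1 + max(depth(child) for child in selection.values())

def is_allowed(selection, max_depth=MAX_DEPTH):
    return depth(selection) <= max_depth

shallow = {"user": {"name": None, "email": None}}
deep = {"user": {"posts": {"comments": {"author": None}}}}
```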
- Depth-First Search (DFS)
- follows one path as deeply as possible before backtracking.
- Lesson 1830 — Breadth-First vs Depth-First Crawling
- Description
- What the metric measures in plain language
- Lesson 1216 — Metric Documentation and Discovery
- Design for Change
- when requirements shift, documented decisions help you understand what's safe to modify.
- Lesson 42 — Document Your Decisions
- Design for failure
- building systems that anticipate and gracefully handle component failures
- Lesson 1307 — What is Site Reliability Engineering (SRE)?
- Design to avoid them
- Lesson 1487 — Cross-Partition Queries
- Details/Context
- Optional additional information like which field caused the problem, validation rules violated, or trace IDs for debugging.
- Lesson 1883 — Error Response Structure and Consistency
- Detect
- that multiple versions exist
- Lesson 377 — Eventual Consistency and Application ReconciliationLesson 1379 — Conflict Detection Mechanisms
- Detect threshold breach
- when a partition exceeds a configured limit (e.
- Lesson 1475 — Dynamic Range Splitting
- Detection
- Monitoring systems or consensus mechanisms detect the primary is unresponsive
- Lesson 207 — Replica Promotion and Failover BasicsLesson 1330 — What is Fault Tolerance?Lesson 1437 — Failover and Failback ProceduresLesson 1509 — Handling Hash Collisions
- Detection Speed vs Accuracy
- Fast detection means quicker failover, but too sensitive checks cause false positives.
- Lesson 1335 — Failover Mechanisms
- Detection strategies
- Lesson 1823 — Hot Tenant Problem
- Determine applicable channels
- Check user preferences, channel availability, and message urgency.
- Lesson 1689 — Multi-Channel Delivery
- Determinism
- Same URL always generates the same short code—no duplicates stored for identical links.
- Lesson 1508 — Hash-Based Generation Approach
- Deterministic
- Same content always produces the same hash
- Lesson 1852 — Content Fingerprinting with Hashing
- Developer intuition
- How quickly new API consumers understand your endpoints
- Lesson 1877 — Singular vs Plural Resource Names
- Development speed matters
- One codebase means faster iteration, easier testing, and fewer bugs from maintaining duplicate logic.
- Lesson 755 — When to Choose Lambda vs Kappa
- Device offline scenarios
- User reconnects after hours—sees only final state
- Lesson 1713 — Provider-Side Deduplication
- Device token registration
- Store tokens when users log in on mobile
- Lesson 1681 — Mobile Push Notification Integration
- Device-specific schemas
- Different IoT device models may report different metrics.
- Lesson 404 — Mobile and IoT Backend Storage
- DevOps tooling
- expands dramatically: CI/CD pipelines for each service, container registries, automated testing frameworks, deployment automation, and configuration management systems all require licenses, infrastructure, and maintenance.
- Lesson 811 — Infrastructure and Tooling Costs
- DFS
- for targeted crawls of specific sites or when memory is severely constrained.
- Lesson 1830 — Breadth-First vs Depth-First Crawling
- DFS advantages
- Lesson 1830 — Breadth-First vs Depth-First Crawling
- Diagnostic Commands
- Lesson 1299 — Runbooks and Playbooks
- Different environments
- Test restoring to alternate infrastructure
- Lesson 1408 — Backup Verification and Testing
- Different Questions Answered
- Databases answer "what is the state now?
- Lesson 692 — Streams vs Traditional Databases
- Different security rules
- Refresh endpoints can require additional checks (device fingerprinting, IP validation)
- Lesson 915 — Token Expiration and Refresh Tokens
- Different Storage Needs
- Unlike URL shorteners that store tiny key-value pairs, Pastebin stores variable-length text blobs (bytes to megabytes), introducing interesting storage and retrieval challenges.
- Lesson 1542 — Pastebin System Overview
- differential backup
- captures all changes made *since the last full backup*.
- Lesson 1404 — Differential BackupsLesson 1423 — Differential Backup StrategyLesson 1424 — Backup Scheduling and Frequency
- Difficult updates
- Changing rules requires redeploying applications
- Lesson 941 — Policy Decision Points (PDP) and Enforcement Points (PEP)
- Dimension and Duration Checks
- validate image resolution and video length to prevent edge cases that could crash processing systems or violate business rules.
- Lesson 1599 — Upload Validation and Virus Scanning
- Dimension Optimization
- Lesson 1621 — Compression and Format Optimization
- Dimension tables
- describe the "who, what, when, where"—customer details, product catalogs, dates.
- Lesson 760 — Data Warehouse Architecture
- Diminishing returns
- Going from 5 to 7 nodes adds significant cost for just one more tolerated failure
- Lesson 639 — Consensus Cluster Sizing Tradeoffs
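The arithmetic behind this, assuming a majority-quorum protocol: a cluster of n nodes needs a quorum of more than half and tolerates the remaining minority failing.

```python
def quorum(nodes):
    """Smallest majority of the cluster."""
    return nodes // 2 + 1

def tolerated_failures(nodes):
    """Nodes that can fail while a majority stays reachable."""
    return (nodes - 1) // 2

sizes = {n: (quorum(n), tolerated_failures(n)) for n in (3, 5, 7)}
```

Moving from 5 to 7 nodes raises the quorum from 3 to 4 and the tolerated failures from 2 to 3, while every write now has to reach one more node.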
- Direct database read
- When fetching *your own* feed, bypass caches and read directly from the authoritative source (Posts table) to include any content you just created.
- Lesson 1678 — Read-After-Write Consistency
- Direct Function Calls
- Lesson 784 — Development Velocity in Early Stages
- directed
- from Alice to Bob.
- Lesson 462 — Creating Nodes and RelationshipsLesson 766 — Apache Airflow Fundamentals
- Direction
- Points from one node to another (though some systems support undirected edges)
- Lesson 452 — Graph Model: Nodes and Edges
- Directory + Consistent Hashing
- Use a directory to map logical shards, but employ consistent hashing within the directory to minimize data movement when adding shards.
- Lesson 250 — Hybrid Sharding Strategies
- directory partitioning
- maintains an explicit lookup table—a "directory"—that records which partition key belongs to which physical node or partition.
- Lesson 1476 — Directory Partitioning FundamentalsLesson 1478 — Directory Partitioning FlexibilityLesson 1480 — Hybrid Partitioning ApproachesLesson 1481 — Range vs Directory Tradeoffs
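In its simplest form the directory is a lookup table, sketched here as a dictionary with made-up tenant and node names. In practice the directory would itself need to be replicated and cached.

```python
directory = {
    "tenant-a": "node-1",
    "tenant-b": "node-2",
    "tenant-c": "node-1",
}

def node_for(key):
    """Route a partition key by looking it up rather than by computing a hash."""
    return directory[key]

def move(key, new_node):
    """Rebalance by rewriting one directory entry (the data copy itself is omitted)."""
    directory[key] = new_node

before = node_for("tenant-c")
move("tenant-c", "node-3")
after = node_for("tenant-c")
```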
- Disable optional features
- entirely (turn off recommendations)
- Lesson 1083 — Graceful Degradation Strategies
- Disadvantage
- Higher latency — every write operation must wait for network round-trips and replica disk writes.
- Lesson 203 — Synchronous Replication Explained
- Disadvantages
- Lesson 79 — Hardware vs Software Load BalancersLesson 242 — Directory-Based ShardingLesson 289 — Normalized vs Denormalized Schema DesignLesson 349 — Redis In-Memory Storage ModelLesson 350 — Redis Persistence: RDB SnapshotsLesson 982 — Sticky Sessions and Rate LimitingLesson 1177 — Summary MetricsLesson 1181 — Push vs Pull Collection Models (+8 more)
- Disadvantages of summaries
- Lesson 1186 — Summary Metrics
- Disaster recovery
- Traffic automatically routes away from failed regions
- Lesson 117 — Global Server Load Balancing (GSLB)Lesson 726 — Multi-Datacenter ReplicationLesson 1401 — Backup vs Replication vs SnapshotsLesson 1433 — Disaster Recovery vs Business Continuity
- Disaster Recovery (DR) plan
- is your blueprint for restoring systems and data after a catastrophic event—whether that's a data center fire, ransomware attack, or natural disaster.
- Lesson 1434 — Disaster Recovery Planning Fundamentals
- Disaster resilience
- A fire in your primary data center won't destroy backups stored 1,000 miles away.
- Lesson 1429 — Geographic Backup Distribution
- Discard old log entries
- All entries up to and including that index can now be safely deleted
- Lesson 632 — Log Compaction: Snapshotting
- Discover gaps
- in monitoring, alerts, and documentation in a controlled setting
- Lesson 1345 — Starting with Game Days
- Discovery-First Feed
- Lesson 1634 — Feed Scope: What Content to Show
- Disk I/O saturation
- Writing crawled content and persisting frontier state creates I/O bottlenecks.
- Lesson 1862 — Why Distribute a Web Crawler
- Disk Seek (~10 milliseconds)
- Lesson 21 — Latency Numbers Every Engineer Should Know
- Disk Writes
- After logging, changes are eventually written to the actual data files on disk.
- Lesson 313 — Durability: Surviving System Failures
- Distribute to downstream calls
- Split remainder among dependencies (equally or weighted by expected latency)
- Lesson 1119 — Timeout Budget Management Across Service Chains
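A sketch of the weighted split, assuming the expected latency of each dependency is known in advance. Equal weights give the equal split. `split_budget` and the service names are hypothetical.

```python
def split_budget(remaining_ms, expected_latency_ms):
    """Divide the remaining budget in proportion to each dependency's expected latency."""
    total = sum(expected_latency_ms.values())
    return {
        name: remaining_ms * latency / total
        for name, latency in expected_latency_ms.items()
    }

budgets = split_budget(900, {"auth": 100, "db": 200})
```

Here the slower dependency receives 600 ms of the 900 ms remainder and the faster one 300 ms.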
- Distribute to Workers
- Each batch is sent to a different worker instance—typically via the message queue you set up for asynchronous fanout processing.
- Lesson 1652 — Fanout Worker Parallelization
- Distributed cache
- Production systems requiring both speed and horizontal scaling
- Lesson 910 — Session Storage Options
- distributed cache layer
- moves the cache outside your application servers into a specialized external service.
- Lesson 123 — Distributed Cache Layer (Redis/Memcached)Lesson 124 — Database Query Result Caching
- Distributed caches
- Systems designed for shared access across many servers
- Lesson 59 — Externalizing State with Shared Storage
- Distributed coordination services
- like ZooKeeper and etcd
- Lesson 493 — CP Systems: Prioritizing Consistency
- Distributed counters without coordination
- Accept that different servers might have slightly stale views
- Lesson 1785 — Non-Functional Requirements: Accuracy vs Performance
- Distributed Denial-of-Service (DDoS)
- attack floods your servers with overwhelming traffic from many sources, trying to make your service unavailable to legitimate users.
- Lesson 195 — CDN for DDoS Protection
- Distributed Denial-of-Service (DDoS) attack
- against target servers.
- Lesson 1840 — Politeness Requirements for Web Crawling
- distributed monolith
- occurs when you've adopted microservices architecture (multiple services, separate deployments) but these services remain **tightly coupled** behind the scenes.
- Lesson 789 — The Distributed Monolith Anti-PatternLesson 824 — Avoiding Distributed Monoliths
- Distributed ownership
- Multiple teams/services must coordinate (e.
- Lesson 598 — Saga Frameworks and Real-World Adoption
- Distributed Politeness Table
- Each worker maintains local politeness state but synchronizes with peers.
- Lesson 1868 — Coordinating Politeness Across Workers
- Distributed scenarios
- If using multiple nodes, verify counters sync properly and don't over-allow or over-restrict
- Lesson 997 — Testing and Monitoring Rate Limiters
- Distributed storage
- S3, HDFS, or database for multi-node crawlers
- Lesson 1849 — URL Frontier Persistence and Recovery
- Distributed Stores with TTLs
- Lesson 1040 — Idempotency Token Storage Strategies
- Distributed Transactions
- Lesson 823 — Signs You're Over-Decomposing Services
- Distributes certificates
- Securely pushes certificates to each sidecar proxy through encrypted channels
- Lesson 844 — Control Plane: Certificate Management
- Distribution challenge
- Requires coordination (like distributed ID generation from lesson 1511) to avoid duplicates across servers
- Lesson 1516 — Counter-Based vs UUID Approaches
- DMCA compliance workflow
- (disable pastes upon valid notices)
- Lesson 1581 — Abuse Prevention and Content Moderation
- DNS (Domain Name System)
- Lesson 540 — Use Cases for Eventual Consistency
- DNS failover
- monitors your primary site's health and automatically updates DNS records when problems arise.
- Lesson 1440 — DNS and Traffic Management in DR
- DNS pre-resolution
- and **connection pre-warming** to CDN edges
- Lesson 1618 — Optimizing for Mobile Networks
- DNS resolution
- happens for *every unique domain* you crawl.
- Lesson 1856 — DNS Resolution Fundamentals for Crawlers
- DNS resolution delays
- Temporary lookup failures
- Lesson 1020 — Why Retries Are Necessary in Distributed Systems
- DNS round-robin
- the DNS server has multiple IP addresses registered for one domain name and rotates through them in order.
- Lesson 82 — DNS-Based Load Balancing
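The rotation can be sketched with `itertools.cycle`. This is a simplification: real DNS servers usually return the full address list in rotated order, and client-side caching affects which address is used. The addresses come from the documentation range.

```python
from itertools import cycle

class RoundRobinRecord:
    """Hypothetical stand-in for one domain name with several registered addresses."""

    def __init__(self, addresses):
        self._rotation = cycle(addresses)

    def resolve(self):
        return next(self._rotation)

record = RoundRobinRecord(["192.0.2.10", "192.0.2.11", "192.0.2.12"])
answers = [record.resolve() for _ in range(4)]
```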
- DNS-Based Load Balancing
- (#116), but adds health awareness and geographic intelligence.
- Lesson 117 — Global Server Load Balancing (GSLB)
- DNS-based request routing
- to intelligently choose the *best* server for you.
- Lesson 180 — DNS-Based Request RoutingLesson 181 — Anycast Routing for CDNs
- DNS-Level Distribution
- Lesson 81 — Single Point of Failure: Load Balancer HA
- document
- typically JSON or similar formats — that can contain nested objects, arrays, and varying fields.
- Lesson 380 — Document Structure and Schema FlexibilityLesson 383 — Collections and Databases
- Document Count
- Lesson 1731 — Search Requirements and Scale Estimation
- Document everything
- during tests—actual times, issues encountered, and procedure gaps
- Lesson 1438 — DR Testing Strategies
- document ID
- and serves as the document's primary key.
- Lesson 382 — Document IDs and Primary KeysLesson 1736 — Posting Lists and Document IDsLesson 1745 — Posting Lists and Document IDs
- Document metadata cache
- Keep titles and snippets in memory
- Lesson 1742 — Search System Architecture Overview
- document sharding
- because:
- Lesson 1753 — Distributed Index ShardingLesson 1769 — Horizontal Scaling of Search Infrastructure
- document store
- is a type of NoSQL database that stores data as complete, self-contained **documents**—typically in formats like JSON, BSON (binary JSON), or XML.
- Lesson 379 — What Is a Document Store?Lesson 381 — Documents vs Rows in Relational Databases
- Document stores
- (like MongoDB) organize data as self-contained JSON-like documents.
- Lesson 419 — Wide-Column vs Document Stores
- Document the process
- Ensure your team can restore without the one person who "knows how"
- Lesson 1408 — Backup Verification and Testing
- documentation
- or **training** was missing?
- Lesson 1317 — Blameless Culture and Learning from FailureLesson 1894 — Sorting Query Parameters
- Documenting your decisions
- means writing down *what* you decided, *why* you chose it, and *what alternatives you rejected*.
- Lesson 42 — Document Your Decisions
- documents
- typically in formats like JSON, BSON (binary JSON), or XML.
- Lesson 379 — What Is a Document Store?Lesson 389 — MongoDB Document Model and CollectionsLesson 1743 — What Is an Inverted IndexLesson 1756 — Machine Learning for Ranking (Learning to Rank)
- Domain constraints
- Inventory can't go negative, appointments can't overlap
- Lesson 1387 — Custom Merge Functions
- Domain rules
- Inventory can't go negative; bids only increase
- Lesson 1383 — Application-Level Conflict Resolution
- Domain-Based
- identification extracts the tenant from the request domain (e.
- Lesson 1818 — Tenant Identification and Context
- Don't over-index
- each index slows down `INSERT`, `UPDATE`, and `DELETE` operations
- Lesson 278 — Index Strategy for Large Tables
- Don't retry
- 4xx errors (except 408 Request Timeout, 429 Too Many Requests)
- Lesson 1026 — Retry on Which Errors
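The rule translates into a small predicate. Treating 5xx responses as retryable is an assumption added here for completeness; the entry above only covers the 4xx range.

```python
RETRYABLE_CLIENT_ERRORS = {408, 429}  # Request Timeout, Too Many Requests

def should_retry(status):
    """Decide whether an HTTP status code is worth retrying."""
    if 400 <= status < 500:
        return status in RETRYABLE_CLIENT_ERRORS
    return status >= 500  # assumption: server errors are treated as transient

decisions = {code: should_retry(code) for code in (200, 400, 404, 408, 429, 503)}
```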
- Don't warm everything
- Only cache data with proven access patterns
- Lesson 161 — Cache Warming Strategies
- Done
- All needed data is already in the index
- Lesson 279 — Covering IndexesLesson 1638 — Push (Write-Time) Feed Model
- Download bandwidth
- Your servers send that photo to 1,000 viewers → much larger requirement
- Lesson 26 — Bandwidth Estimation from Data Size
- Downtime
- during cutover (or complex dual-write patterns)
- Lesson 328 — Migration and Legacy System Constraints
- Downtime risk
- The record might be temporarily unavailable during the move
- Lesson 263 — Shard Key Immutability Problem
- DR asks
- "How do we restore our database cluster after the data center floods?
- Lesson 1433 — Disaster Recovery vs Business Continuity
- Drawbacks
- Lesson 979 — Centralized vs Decentralized ApproachesLesson 989 — Per-User vs Per-IP Rate Limiting
- Dropping messages
- means data loss—unacceptable for critical operations like payments or orders.
- Lesson 1080 — Queue Saturation and Backpressure Loss
- Dry runs
- are practice incidents where teams rehearse their response procedures without actual customer impact.
- Lesson 1295 — Testing Alerts and Dry Runs
- Dual writes problem
- Writing to two systems separately (database, then broker) creates a consistency gap.
- Lesson 688 — Transactional Semantics
- Dual-read/write periods
- Applications may need to check both old and new locations temporarily
- Lesson 258 — Resharding and Data Migration
- Dual-write phase
- Write to both old and new shards while copying historical data
- Lesson 258 — Resharding and Data Migration
- Dual-Write Strategy
- Lesson 1908 — Database Schema Evolution with API Versions
- duplicate
- returns the stored result without re-executing
- Lesson 1003 — Idempotency KeysLesson 1010 — Idempotency Keys for POST Requests
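A minimal in-memory sketch of how a duplicate request is recognized and answered from the stored result. `charge` and its result shape are hypothetical; a production store would need expiry and protection against concurrent requests with the same key.

```python
results_by_key = {}   # idempotency key -> stored result
charges = []          # side effects actually performed

def charge(idempotency_key, amount):
    """Execute once per key; a repeated key returns the stored result."""
    if idempotency_key in results_by_key:
        return results_by_key[idempotency_key]   # duplicate: no re-execution
    charges.append(amount)
    result = {"charge_id": len(charges), "amount": amount}
    results_by_key[idempotency_key] = result
    return result

first = charge("key-123", 50)
retry = charge("key-123", 50)
```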
- Duplicate Critical Fields
- means intentionally copying certain data across multiple tables so you can retrieve everything you need without performing joins.
- Lesson 293 — Duplicate Critical FieldsLesson 297 — Denormalization in Practice
- Duplicate Detection
- automatically identifies and discards messages with the same `MessageId` within a configurable time window—critical when exactly-once processing matters.
- Lesson 675 — Azure Service Bus FeaturesLesson 1732 — Crawling and Document Collection
- Duplicate Logic
- Your "calculate daily revenue" logic exists in both the batch codebase and the streaming codebase.
- Lesson 751 — Lambda Architecture Tradeoffs
- Duplication
- Every service reimplements similar rules
- Lesson 941 — Policy Decision Points (PDP) and Enforcement Points (PEP)
- durability
- if a replica crashes, it can resume from its last known position without missing changes.
- Lesson 206 — Replication Logs and MechanismsLesson 217 — Semi-Synchronous Replication Trade-offsLesson 309 — ACID Properties OverviewLesson 446 — SSTable and GFS DependenciesLesson 448 — Write Path: MemTable and Commit LogsLesson 470 — Transaction Model and ACID in Neo4jLesson 693 — The Commit Log AbstractionLesson 699 — Event Streaming Platform Requirements (+5 more)
- Duration
- How long does each request take to complete?
- Lesson 1190 — The RED MethodLesson 1265 — RED Method: Rate, Errors, Duration
- Duration increases
- → service degrading, database slow, or resource contention
- Lesson 1265 — RED Method: Rate, Errors, Duration
- Durations
- `<operation>_duration_seconds` → `request_duration_seconds`
- Lesson 1182 — Metric Naming Conventions
- during a partition
- , physics forces you to sacrifice one.
- Lesson 483 — The CAP Tradeoff During PartitionsLesson 520 — Practical PACELC Analysis for Design Decisions
- During normal operation (Else)
- Should I optimize for lower latency or stronger consistency?
- Lesson 520 — Practical PACELC Analysis for Design Decisions
- During partition
- Now you must choose—wait for consistency (CP) or serve potentially stale data (AP)
- Lesson 504 — Why 'Choose Two' is Oversimplified
- Dynamic authorization code flow
- When a user from Tenant X logs in, redirect them to *their* configured IdP.
- Lesson 932 — Multi-Tenant OAuth2 and Identity Federation
- dynamic configuration
- it can update routing rules, health checks, and load balancing algorithms on-the-fly without restarts.
- Lesson 115 — Envoy Proxy ArchitectureLesson 840 — Data Plane: Envoy Proxy Fundamentals
- Dynamic Configuration via APIs
- Envoy's xDS (discovery service) APIs allow a central control plane to push configuration changes in real-time.
- Lesson 115 — Envoy Proxy Architecture
- Dynamic Partition Splitting
- Start with fewer partitions that automatically split when they grow too large.
- Lesson 1485 — Rebalancing Partitions
- Dynamic rebalancing
- through directory updates without full resharding
- Lesson 1480 — Hybrid Partitioning Approaches
- Dynamic Resolution
- The sidecar queries the control plane for current Service B instances
- Lesson 832 — Service Discovery in a Mesh
- Dynamic subscriptions
- Subscribers can join or leave without affecting publishers
- Lesson 656 — Pub-Sub Pattern Fundamentals
- Dynamic Updates
- Change timeout values without restarting services—the mesh propagates updates to all sidecars in real-time.
- Lesson 1126 — Timeout Configuration in Service Mesh
- DynamoDB
- (with eventual consistency settings)
- Lesson 494 — AP Systems: Prioritizing AvailabilityLesson 517 — PA/EL Systems: Availability and Latency FirstLesson 521 — PACELC Tradeoffs in Real Systems
E
- E-commerce checkout
- If the recommendation service fails, show a static "popular items" list instead of personalized suggestions—but keep the checkout flow working
- Lesson 1336 — Graceful Degradation
- E-commerce order processing
- inventory counts and order placement must be precise
- Lesson 318 — When to Choose ACID or BASE
- E-commerce orders
- Shard key = `(region, order_id)` → enables regional queries, avoids global scans.
- Lesson 245 — Composite Shard KeysLesson 1411 — Defining Recovery Point Objective (RPO)
- Eager deletion
- reclaims storage promptly and keeps the database clean.
- Lesson 1567 — Lazy vs Eager Deletion Strategies
- Eager Deletion (Scheduled Cleanup)
- Background jobs periodically scan for expired pastes and proactively remove them.
- Lesson 1567 — Lazy vs Eager Deletion Strategies
- Eager Rebalancing (Stop-the-World)
- Lesson 717 — Rebalancing Protocol and Strategies
- Early Detection
- Catch problems before they cascade.
- Lesson 1262 — What is Monitoring and Why It MattersLesson 1823 — Hot Tenant Problem
- Early rejection
- prevents wasted processing on doomed requests
- Lesson 859 — Rate Limiting at Service BoundariesLesson 886 — Request Validation
- Early stages
- You might start with basic SLIs like overall HTTP success rate because you lack granular data or understanding of user journeys.
- Lesson 1284 — Iterating on SLIs and SLOs
- Early termination
- Stop scoring after finding the top K results
- Lesson 1741 — Search Latency and Response Time
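One way to stop scoring early, sketched under a strong assumption: every candidate carries a cheap upper bound on its score, and candidates arrive sorted by that bound in descending order. Once the bound of the next candidate cannot beat the current K-th best score, the rest can be skipped. The names and numbers are illustrative.

```python
import heapq

def top_k(candidates, score, k):
    """candidates: (doc_id, upper_bound) pairs sorted by upper_bound, descending."""
    heap = []      # min-heap of (score, doc_id) holding the best k seen so far
    scored = 0
    for doc_id, upper_bound in candidates:
        if len(heap) == k and upper_bound <= heap[0][0]:
            break  # no remaining document can beat the current k-th best
        scored += 1
        heapq.heappush(heap, (score(doc_id), doc_id))
        if len(heap) > k:
            heapq.heappop(heap)
    return sorted(heap, reverse=True), scored

exact_scores = {"a": 0.85, "b": 0.7, "c": 0.45, "d": 0.1}
candidates = [("a", 0.9), ("b", 0.8), ("c", 0.5), ("d", 0.4)]
best, scored = top_k(candidates, exact_scores.get, k=2)
```

Only two of the four documents are fully scored before the loop exits.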
- Early-stage products
- need speed and flexibility, not rigidity.
- Lesson 814 — When Complexity Outweighs BenefitsLesson 835 — When You Don't Need a Service Mesh
- Easier data consistency
- No multi-region write coordination
- Lesson 1436 — Active-Passive vs Active-Active DR
- Easier Onboarding
- New team members can become productive quickly by focusing on one service rather than learning an entire monolith.
- Lesson 797 — Improved Code Maintainability
- Easier Rollbacks
- If something goes wrong in production, rolling back is straightforward—revert to the previous single artifact.
- Lesson 783 — Deployment Simplicity: Monolith Advantage
- Easier to change
- Modifications have predictable effects, reducing risk
- Lesson 1315 — Simplicity as a Core Value
- Easier to monitor
- Fewer moving parts means clearer signals and less noise
- Lesson 1315 — Simplicity as a Core Value
- Easier to recover
- Fewer failure modes and clearer recovery paths
- Lesson 1315 — Simplicity as a Core Value
- Easier to understand
- New team members ramp up faster; on-call engineers diagnose issues quickly
- Lesson 1315 — Simplicity as a Core Value
- Easy communication
- Simpler designs are easier to explain and discuss with interviewers or teammates
- Lesson 34 — Start Simple: The Minimum Viable Design
- Easy debugging
- All logs and metrics in one place
- Lesson 1791 — Single Data Center vs Distributed Setup
- Easy enumeration
- All completions live in one subtree
- Lesson 1758 — Trie Data Structure for Prefix Matching
- Easy rebalancing
- Just update the directory entries—no mathematical recalculation needed
- Lesson 1476 — Directory Partitioning Fundamentals
- Easy routing
- Most frameworks handle path-based routing naturally
- Lesson 1899 — URI Versioning (Path-Based)
- Easy state changes
- Update permissions in one place, affect all requests
- Lesson 916 — Session vs Token Tradeoffs
- Easy to configure
- uses simple configuration files, not complex GUIs
- Lesson 111 — NGINX as a Load Balancer
- Easy to understand
- The mental model is straightforward—everyone talks to everyone
- Lesson 1369 — Multi-Leader Topologies: All-to-All
- EC
- Even during normal operation (the "Else" clause), it sacrifices **latency** for consistency
- Lesson 518 — PC/EC Systems: Consistency Always
- Economic Defense
- Without rate limiting, a single misbehaving client (malicious or buggy) can rack up massive infrastructure costs or degrade service for everyone.
- Lesson 1782 — Rate Limiter Service Overview
- Edge caches
- sit closest to users (geographically distributed)
- Lesson 1611 — Multi-Tier Caching Architecture
- Edge filtering
- The CDN analyzes incoming requests at edge locations using rate limiting, pattern detection, and behavioral analysis.
- Lesson 189 — DDoS Protection and Security at CDN Edge
- Edge servers
- are distributed caching nodes deployed in many geographic locations around the world.
- Lesson 170 — CDN Architecture: Edge Servers and OriginLesson 171 — Points of Presence (PoPs) and Edge LocationsLesson 189 — DDoS Protection and Security at CDN Edge
- Edge Tier
- (closest to users)
- Lesson 182 — Cache Hierarchies and Tiered CachingLesson 1611 — Multi-Tier Caching Architecture
- Edges
- (also called relationships): Represent connections between nodes, like "follows," "purchased," or "located_in"
- Lesson 451 — What is a Graph Database?Lesson 452 — Graph Model: Nodes and EdgesLesson 453 — Property Graphs vs RDF TriplesLesson 938 — Relationship-Based Access Control (ReBAC)
- Edit distance
- measures how many single-character operations (insert, delete, substitute) transform one word into another.
- Lesson 1774 — Spell Correction and Query Expansion
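As an illustration, the classic dynamic-programming formulation (Levenshtein distance) counts exactly these three operations. This is a minimal sketch; production spell correctors usually add pruning or precomputed indexes on top.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, and substitutions."""
    # prev[j] = distance between the processed prefix of `a` and the first j chars of `b`.
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # turning i chars of `a` into an empty string takes i deletes
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,                      # delete char_a
                curr[j - 1] + 1,                  # insert char_b
                prev[j - 1] + substitution_cost,  # substitute, or match for free
            ))
        prev = curr
    return prev[-1]
```

For example, `edit_distance("kitten", "sitting")` is 3: two substitutions and one insertion.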
- Efficiency
- Process millions of records with optimized, parallelized operations
- Lesson 736 — What is Batch Processing?Lesson 1214 — Tagging Strategy for FilteringLesson 1817 — Multi-Tenant Rate Limiting Architecture
- Efficient for celebrities
- Avoids pushing to millions of followers
- Lesson 1647 — Fanout-on-Read (Pull Model)
- Efficient for frequent backups
- hourly backups become practical
- Lesson 1422 — Incremental Backup Strategy
- Efficient range queries
- Fetching all users born between 1990 and 1995 touches only one or two partitions
- Lesson 1451 — Range-Based PartitioningLesson 1471 — Range Partitioning Fundamentals
- Efficient reads
- GFS handles file serving, block caching, and network optimization
- Lesson 446 — SSTable and GFS Dependencies
- Efficient resource use
- All infrastructure serves production workload
- Lesson 1436 — Active-Passive vs Active-Active DR
- Egress (outgoing)
- Data leaving when your system responds with redirects
- Lesson 1499 — Bandwidth Requirements for Redirects
- Egress cost savings
- Cloud providers charge heavily for data leaving a region (egress fees).
- Lesson 1626 — Geolocation-Based Storage
- Elastic Load Balancer (ELB)
- and **Application Load Balancer (ALB)**.
- Lesson 113 — Cloud Load Balancers (AWS ELB/ALB)
- Elastic response
- Auto-scale hot services during traffic spikes without touching stable services
- Lesson 795 — Independent Scaling
- Elasticsearch
- for product search (full-text search optimization)
- Lesson 327 — Polyglot Persistence PatternLesson 1150 — The ELK Stack: ElasticsearchLesson 1242 — Zipkin Architecture and Design
- Election Frequency
- How often leadership changes occur.
- Lesson 643 — Monitoring and Operating Consensus Clusters
- Election Restriction
- ensures new leaders have all committed entries
- Lesson 630 — Safety Argument: Committing Entries from Current Term
- Else
- (no partition), choose Latency or Consistency.
- Lesson 516 — The 'Else' Clause: Normal Operation Tradeoffs
- Email
- HTML template with logo, tracking map, order details table
- Lesson 1692 — Channel-Specific FormattingLesson 1693 — Delivery Receipt TrackingLesson 1694 — Channel Costs and Economics
- Email addresses or UUIDs
- Guaranteed uniqueness means guaranteed cardinality explosion.
- Lesson 1211 — Avoiding High-Cardinality Labels
- Email notifications
- Sending duplicate emails creates poor user experience
- Lesson 1001 — Side Effects and Idempotency
- Email Services (SendGrid, SES)
- Lesson 1691 — Rate Limits per Channel
- embedding
- and **referencing**.
- Lesson 292 — Embedding vs Referencing in DocumentsLesson 386 — Embedded Documents vs References
- Emergency debugging
- Structured fields in a human-scannable format
- Lesson 1166 — Human-Readable vs Machine-Parseable
- Enable API validation tools
- to ensure requests/responses match the spec
- Lesson 1885 — API Documentation with OpenAPI/Swagger
- Enable faster incident response
- Automated remediation runs in seconds, not minutes
- Lesson 1308 — The SRE Philosophy: Treating Operations as Software
- Enables seamless scaling
- by adding/removing servers without changing client configuration
- Lesson 76 — What Is a Load Balancer?
- Encourages honesty
- When engineers fear punishment, they hide mistakes or provide incomplete information.
- Lesson 1351 — Blameless Postmortem Culture
- encryption
- (protecting data in transit), **authentication** (verifying identities), and **authorization** (controlling who can do what).
- Lesson 727 — Kafka Security: Authentication and EncryptionLesson 851 — Mutual TLS (mTLS) Authentication
- End-to-end latency
- Measure the time from metric emission to dashboard visibility.
- Lesson 1218 — Testing Metric Pipelines
- Endpoint-level limits
- protect expensive operations differently than cheap ones (e.
- Lesson 973 — Multi-Tier Rate Limiting
- Enforced delays
- Between requests from the same queue, insert a delay (e.
- Lesson 1841 — Single-Host Queue Pattern
- Enforcement
- Can actually reject requests and return 429 (Too Many Requests)
- Lesson 1789 — Client-Side vs Server-Side Rate Limiting
- Enforcement in the frontier
- Before dispatching a URL from a single-host queue, check the timestamp of the last request to that host.
- Lesson 1842 — Politeness Budget and Crawl Delay
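The timestamp check described in this entry could be sketched like this. The class name and the single fixed `crawl_delay_s` are assumptions for illustration; a real frontier would typically read per-host delays from `robots.txt` and persist the timestamps.

```python
from typing import Dict


class PolitenessGate:
    """Tracks the last request time per host and enforces a minimum gap."""

    def __init__(self, crawl_delay_s: float = 1.0) -> None:
        self.crawl_delay_s = crawl_delay_s
        self.last_request_at: Dict[str, float] = {}

    def try_dispatch(self, host: str, now: float) -> bool:
        """Return True and record `now` if the host may be fetched, else False."""
        last = self.last_request_at.get(host)
        if last is not None and now - last < self.crawl_delay_s:
            return False  # too soon: leave the URL in the host's queue
        self.last_request_at[host] = now
        return True
```

Passing `now` in explicitly keeps the check deterministic and easy to test; a caller would normally supply `time.monotonic()`.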
- Enforces per-hop timeouts
- without service code changes
- Lesson 1101 — Timeout Propagation in Service Meshes
- Engagement Metadata
- Lesson 1642 — Post Metadata and Schema Design
- Engagement Metrics
- Likes, comments, shares, saves, and click-through rates.
- Lesson 1666 — Ranking Signals and Features
- Engineering Feasibility
- Your SLOs must account for dependencies you can't control.
- Lesson 1276 — Setting Realistic SLOs
- Engineering teams
- traditionally want perfect reliability.
- Lesson 1282 — Error Budget as a Shared Currency
- Enqueue
- Add newly discovered URLs to a queue for future crawling
- Lesson 1732 — Crawling and Document Collection
- Ensure database indexes exist
- for all filterable and sortable fields.
- Lesson 1897 — Performance Considerations and Limits
- Enterprise data
- connects customers, products, suppliers, and regulations
- Lesson 458 — Use Cases: Fraud Detection and Knowledge Graphs
- Enterprise Integration Patterns (EIP)
- design patterns for system integration:
- Lesson 671 — ActiveMQ and Traditional Enterprise Messaging
- Enterprise users
- 10,000 requests/hour (or custom limits)
- Lesson 990 — Tiered Rate Limits for Different User Classes
- Entity + Range
- Shard by customer ID (entity-based) to keep all customer data together, then range-shard historical data by timestamp to archive old records efficiently.
- Lesson 250 — Hybrid Sharding Strategies
- Entity integrity
- is a fundamental database principle: the primary key of a table must never be null or duplicated.
- Lesson 299 — Primary Keys and Entity Integrity
- Entity-based
- keeps related data together for transactions
- Lesson 253 — Evaluating Sharding Strategy Tradeoffs
- entry point
- of your system where external requests first arrive.
- Lesson 1239 — Root Span and Entry PointsLesson 1696 — Notification System High-Level Architecture
- Environment attributes
- Time, location, IP address, device type
- Lesson 935 — Attribute-Based Access Control (ABAC) Introduction
- Envoy
- is an open-source, high-performance proxy designed for microservices and service mesh architectures.
- Lesson 115 — Envoy Proxy ArchitectureLesson 856 — Observability: Metrics Collection
- Envoy Integration
- For advanced scenarios, Consul Connect can configure Envoy as the sidecar proxy instead, giving you all of Envoy's sophisticated traffic management features while Consul handles service discovery and certificate management.
- Lesson 863 — Consul Connect: HashiCorp's Approach
- Envoy Proxy
- is a high-performance C++ proxy originally developed at Lyft.
- Lesson 897 — Envoy Proxy for API ManagementLesson 1062 — Circuit Breaker Libraries and Frameworks
- ERROR
- Serious problems that require attention but the service continues
- Lesson 1141 — Log Levels in Structured Logs
- error budget
- is the complement of your SLO (100% minus the target): it represents the amount of "failure" you can afford before breaking your reliability promise.
- Lesson 1279 — Error Budgets: The Core ConceptLesson 1280 — Calculating and Tracking Error Budget
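The arithmetic behind an error budget is simple enough to sketch. The function names and the 30-day default window below are assumptions chosen for illustration.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over the window."""
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent (negative if overspent)."""
    allowed_failures = (1.0 - slo) * total_requests
    if allowed_failures == 0:
        raise ValueError("an SLO of 100% (or zero traffic) leaves no error budget")
    return 1.0 - failed_requests / allowed_failures
```

With a 99.9% SLO over 30 days, the budget works out to 43.2 minutes of downtime; 250 failures out of 1,000,000 requests would leave about 75% of the budget.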
- Error Budget Policies
- are predefined agreements that answer these questions by establishing team behaviors and priorities tied to budget health.
- Lesson 1281 — Error Budget PoliciesLesson 1350 — What is a Postmortem?
- error budgets
- if your SLO is 99.
- Lesson 1275 — What Are Service Level Objectives (SLOs)Lesson 1310 — Embracing Risk: The 100% Availability Trap
- Error codes
- Application-specific codes that map to documentation or runbooks (e.
- Lesson 1142 — Logging Exceptions and Stack Traces
- Error injection
- Force downstream services to return errors that count toward your failure threshold
- Lesson 1065 — Testing Circuit Breaker Behavior
- Error logs
- 30-90 days typically suffice for investigating recent incidents
- Lesson 1135 — Log Retention and Volume Management
- Error message
- The human-readable description from the exception itself.
- Lesson 1142 — Logging Exceptions and Stack Traces
- Error percentage
- Lesson 1265 — RED Method: Rate, Errors, Duration
- Error policies
- define fallback behavior when things go wrong
- Lesson 899 — Azure API Management Features
- Error rate
- "Open when 50% of the last 20 requests fail" — better for intermittent problems mixed with successes.
- Lesson 1057 — Failure Detection and CountingLesson 1215 — Avoiding Vanity MetricsLesson 1272 — What Are Service Level Indicators (SLIs)Lesson 1278 — Multiple SLOs for Comprehensive Coverage
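The "50% of the last 20 requests" rule can be sketched with a fixed-size window. This is a simplified illustration: it covers only the decision to open and omits the half-open and reset behavior a full circuit breaker needs. The class and attribute names are hypothetical.

```python
from collections import deque
from typing import Deque


class ErrorRateBreaker:
    """Opens once the failure rate over the last `window_size` calls hits the threshold."""

    def __init__(self, window_size: int = 20, threshold: float = 0.5) -> None:
        self.window_size = window_size
        self.threshold = threshold
        # deque(maxlen=...) silently drops the oldest outcome when full.
        self.outcomes: Deque[bool] = deque(maxlen=window_size)  # True = failure
        self.is_open = False

    def record(self, failed: bool) -> None:
        self.outcomes.append(failed)
        # Wait for a full window so a single early failure cannot trip the breaker.
        if len(self.outcomes) == self.window_size:
            failure_rate = sum(self.outcomes) / self.window_size
            if failure_rate >= self.threshold:
                self.is_open = True
```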
- Error Rate SLO
- tracks the percentage of requests that fail.
- Lesson 1278 — Multiple SLOs for Comprehensive Coverage
- Error rates
- reduce traffic if errors spike (cascading failures, database timeouts)
- Lesson 972 — Adaptive Rate LimitingLesson 993 — Adaptive Rate LimitingLesson 1255 — Adaptive SamplingLesson 1707 — Processing Pipeline MonitoringLesson 1871 — Monitoring Crawler Fleet Performance
- Error responses
- Return HTTP 503 for 20% of calls to test retry logic
- Lesson 858 — Fault Injection for Testing
- Errors
- HTTP error codes (4xx, 5xx) and failed requests
- Lesson 856 — Observability: Metrics CollectionLesson 1189 — The USE MethodLesson 1190 — The RED MethodLesson 1263 — Four Golden Signals: Latency, Traffic, Errors, SaturationLesson 1264 — USE Method: Utilization, Saturation, ErrorsLesson 1265 — RED Method: Rate, Errors, Duration
- Errors and exceptions
- Stack traces, error codes, context about what the system was attempting.
- Lesson 1129 — What to Log vs What Not to Log
- Errors spike
- → bugs deployed, dependencies failing, or capacity exceeded
- Lesson 1265 — RED Method: Rate, Errors, Duration
- Escalation paths
- If the primary responder doesn't acknowledge, does it escalate?
- Lesson 1295 — Testing Alerts and Dry Runs
- Essential sections
- Lesson 1352 — Postmortem Structure and Action Items
- ETag
- (entity tag) is a unique identifier for a resource version.
- Lesson 121 — Browser Caching and HTTP HeadersLesson 1570 — CDN Cache Control Headers
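A rough sketch of how a server might use an ETag for conditional requests follows. Deriving the tag from a truncated SHA-256 of the body is one common choice, not a requirement, and this simplified version compares a single `If-None-Match` value exactly; real clients may send several tags or weak validators.

```python
import hashlib
from typing import Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted strong ETag from the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, str, bytes]:
    """Return (status, etag, payload), sending 304 when the client copy is current."""
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, etag, b""  # Not Modified: the client reuses its cached copy
    return 200, etag, body
```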
- etcd
- is a distributed key-value store used by Kubernetes for cluster coordination.
- Lesson 530 — Strong Consistency in PracticeLesson 636 — Consensus for Leader ElectionLesson 637 — Distributed Locks via ConsensusLesson 638 — Configuration Management with Consensus
- Even distribution
- Values appear with similar frequency
- Lesson 232 — Shard Key SelectionLesson 247 — Choosing the Right Shard KeyLesson 1472 — Range Partition Key SelectionLesson 1541 — Sharding and Database ScalingLesson 1854 — Distributed URL Deduplication
- Even load distribution
- Hash function spreads keys uniformly
- Lesson 1806 — Rate Limiting with Consistent Hashing
- Event broadcasting
- Publish an event that all cache layers listen to
- Lesson 163 — Multi-Level Cache InvalidationLesson 357 — Redis Pub/Sub for Real-Time Messaging
- Event Notifications
- Lesson 660 — Pub-Sub Use Cases: Event Broadcasting
- Event Publishing
- When someone posts, the fanout service doesn't just write to timelines—it also publishes an event (e.
- Lesson 1672 — WebSocket Architecture for Live Updates
- event sourcing
- records every state change as an immutable event in an append-only log.
- Lesson 586 — Alternative: Event Sourcing for ConsistencyLesson 720 — Log Compaction
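To make the idea concrete, here is a minimal sketch in which state is never stored directly and is instead rebuilt by replaying the log. The `Event` and `Account` types and the event names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)  # frozen: events are immutable once recorded
class Event:
    kind: str
    amount: int


class Account:
    def __init__(self) -> None:
        self._log: List[Event] = []  # append-only: no update or delete methods

    def append(self, event: Event) -> None:
        self._log.append(event)

    def events(self) -> Tuple[Event, ...]:
        return tuple(self._log)

    def balance(self) -> int:
        """Derive current state by replaying every event in order."""
        total = 0
        for event in self._log:
            if event.kind == "Deposited":
                total += event.amount
            elif event.kind == "Withdrawn":
                total -= event.amount
        return total
```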
- Event streaming
- A river flows continuously, and anyone can drink from it at any time
- Lesson 690 — What is Event Streaming?
- Event time
- when the event actually occurred (timestamp on the package)
- Lesson 770 — Apache Flink Architecture
- Event time semantics
- Handle late-arriving data gracefully regardless of processing mode
- Lesson 756 — Hybrid and Modern Alternatives
- Event-driven
- Each request triggers a function execution
- Lesson 895 — AWS API Gateway and Serverless Integration
- event-driven architectures
- where adding a new feature often means adding a new subscriber, not modifying existing services.
- Lesson 662 — Fan-Out with Pub-SubLesson 732 — Google Cloud Pub/Sub
- Event-Driven Revocation
- Publish revocation events to a message bus (Kafka, RabbitMQ).
- Lesson 948 — Token Revocation at Scale
- Event-driven warming
- triggers cache loading when certain events occur—a new product launch, a viral post, or a scheduled sale—preloading data you *know* will be requested heavily.
- Lesson 140 — Cache Warming Strategies
- Events are facts
- "OrderPlaced at 10:05am" cannot be undone, only compensated with a new event like "OrderCancelled"
- Lesson 586 — Alternative: Event Sourcing for Consistency
- eventual consistency
- all copies of data will match *eventually*, but there might be brief delays.
- Lesson 15 — Consistency Requirements and TradeoffsLesson 18 — Prioritizing Requirements Under ConstraintsLesson 137 — Write-Behind: Risks and Use CasesLesson 167 — Cache Coherence in Distributed SystemsLesson 261 — Distributed Transactions Across ShardsLesson 295 — Maintaining Data ConsistencyLesson 316 — Soft State and Eventual ConsistencyLesson 352 — Redis Replication Architecture (+21 more)
- Eventual consistency acceptable
- (social feeds, analytics): **Asynchronous replication** works beautifully—prioritize speed and availability over immediate consistency.
- Lesson 1364 — Choosing a Replication Mode
- Eventual consistency is acceptable
- A few milliseconds delay before a new short URL appears on replicas doesn't matter
- Lesson 1522 — Read-Heavy Workload and Database Scaling
- Eventual consistency OK
- → CDN or browser cache with longer TTLs
- Lesson 130 — Choosing the Right Caching Layer
- Eventual read
- "Give me whatever's available on the nearest replica"
- Lesson 1398 — Consistency Level Per-Operation
- Eventually consistent
- The system will become consistent eventually, but not immediately
- Lesson 314 — BASE Properties Overview
- Eventually consistent reads
- (default): Lower latency, may not reflect recent writes
- Lesson 554 — Consistency Model Examples in Real Systems
- Eventually consistent session stores
- Use globally replicated data stores (like DynamoDB Global Tables or Cassandra) that replicate session state, accepting brief inconsistency windows.
- Lesson 952 — Cross-Region Authentication
- Eventually-consistent data
- (social feeds, recommendations, DNS) → **AP**
- Lesson 503 — Choosing Between CP and AP
- every
- replica must apply W1 before W2
- Lesson 544 — Monotonic Writes ConsistencyLesson 569 — The Coordinator Role in 2PCLesson 996 — Rate Limit Headers and Client CommunicationLesson 1140 — Contextual Fields
- every resource
- in your system through three dimensions:
- Lesson 1189 — The USE MethodLesson 1264 — USE Method: Utilization, Saturation, Errors
- Evicts cold URLs automatically
- Links that haven't been accessed in a while get pushed out organically
- Lesson 1525 — Cache Eviction Policy for URL Shortener
- Evolve services independently
- without breaking existing clients
- Lesson 882 — Request and Response Transformation
- Exactly one leader
- is elected per term (logical time period)
- Lesson 636 — Consensus for Leader Election
- Exactly-once
- is necessary when duplicates are unacceptable and natural idempotency is hard: financial transactions, inventory updates, or billing events.
- Lesson 689 — Choosing Delivery SemanticsLesson 718 — Exactly-Once Semantics (EOS)
- exactly-once semantics
- each local transaction executes once, even across coordinator crashes.
- Lesson 597 — Saga State Management and PersistenceLesson 676 — Choosing Between Message Broker TechnologiesLesson 680 — Exactly-Once Delivery
- Example (conceptual)
- With RF=3, the coordinator places one copy on the primary node (determined by the partition key), then one each on the next two nodes clockwise around the ring.
- Lesson 424 — Replication Strategy and Factor
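The clockwise placement can be sketched on a toy ring. This assumes one token per node and uses MD5 purely as a stand-in hash for illustration; it is not the partitioner any particular database ships with, and it ignores racks and data centers.

```python
import bisect
import hashlib
from typing import List


def _token(value: str) -> int:
    """Map a string to a position on the ring (stand-in hash for illustration)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


def replicas_for(key: str, nodes: List[str], rf: int = 3) -> List[str]:
    """Primary node for the key, then the next rf-1 nodes clockwise."""
    if not nodes:
        raise ValueError("ring has no nodes")
    ring = sorted((_token(node), node) for node in nodes)
    tokens = [token for token, _ in ring]
    # First node whose token is >= the key's token; wrap around past the end.
    start = bisect.bisect_left(tokens, _token(key)) % len(ring)
    count = min(rf, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]
```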
- Example action items
- Lesson 1352 — Postmortem Structure and Action Items
- Example aggregation table schema
- Lesson 1726 — Aggregation and Reporting
- Example approach
- Lesson 1290 — Threshold Selection and Tuning
- Example flow
- Lesson 157 — Active Invalidation on WriteLesson 951 — Caching Authorization DecisionsLesson 1014 — Idempotent Response CachingLesson 1805 — Gossip Protocols for Approximate Limits
- Example hierarchy
- Lesson 939 — Permission Inheritance and Hierarchies
- Example mental model
- Lesson 292 — Embedding vs Referencing in Documents
- Example pattern
- Lesson 1705 — Retry and Dead Letter Queues
- Example query pattern
- Lesson 1890 — Keyset Pagination
- Example scenario
- Lesson 149 — Time-To-Live (TTL) ExpirationLesson 406 — Complex Transaction RequirementsLesson 670 — Amazon SNS and SNS-SQS IntegrationLesson 938 — Relationship-Based Access Control (ReBAC)Lesson 1034 — Database Patterns for IdempotencyLesson 1283 — SLOs vs SLAs: The Critical DifferenceLesson 1383 — Application-Level Conflict Resolution
- Excessive Network Calls
- Lesson 823 — Signs You're Over-Decomposing Services
- Exclude from SLA calculations
- Lesson 1328 — Scheduled Maintenance and Availability Accounting
- Exclusive
- Only one consumer may attach to the subscription (roughly like a Kafka consumer group with a single consumer)
- Lesson 731 — Pulsar's Unique Features
- Execute locally first
- Each system commits its part independently
- Lesson 583 — Alternative: Best Effort with Eventual Consistency
- Execute statements
- Perform inserts, updates, deletes
- Lesson 310 — Atomicity: All-or-Nothing Transactions
- Execution status
- of each transaction (pending, completed, failed, compensated)
- Lesson 597 — Saga State Management and Persistence
- Exhausting resources
- means consuming CPU, memory, disk I/O, or file descriptors until components struggle.
- Lesson 1347 — Common Chaos Experiments
- Expanding enums
- Adding new status values (if clients handle unknowns gracefully)
- Lesson 1905 — Breaking vs Non-Breaking Changes
- Expected Load
- Start with your baseline traffic patterns.
- Lesson 1073 — Bulkhead Sizing: Balancing Isolation and Utilization
- Expensive bandwidth
- Every gigabyte served directly from your origin incurs cloud egress fees (often $0.
- Lesson 1609 — Why CDNs Are Essential for Media Hosting
- Expensive reads
- Every paste retrieval loads megabytes through the database, wasting I/O bandwidth
- Lesson 1550 — Object Storage for Paste Content
- Expensive writes
- celebrities with millions of followers create massive fan-out
- Lesson 1638 — Push (Write-Time) Feed Model
- Expiration
- The indexed `expires_at` field enables efficient cleanup jobs that periodically delete or recycle expired links.
- Lesson 1519 — Database Schema for URL Shortener
- Expires
- is the older header specifying an absolute date/time when the resource becomes stale (mostly replaced by `Cache-Control`).
- Lesson 121 — Browser Caching and HTTP HeadersLesson 1570 — CDN Cache Control Headers
- Explicit Boundaries
- Clear interfaces between contexts prevent model confusion
- Lesson 815 — Domain-Driven Design and Bounded Contexts
- Explicit contracts
- Each media type can have its own documented schema
- Lesson 1902 — Content Negotiation with Media Types
- Explicit Defaults
- Return explicit values rather than relying on implicit behavior.
- Lesson 1919 — API Design for Polyglot Clients and Backwards Compatibility
- Explicit overrides
- Allow child resources to override inherited permissions when needed
- Lesson 939 — Permission Inheritance and Hierarchies
- Explicit proxying
- requires your application to be configured to send traffic directly to the proxy.
- Lesson 831 — Transparent vs Explicit Proxying
- Explicit Renewal Mechanism
- Require users to actively renew pastes beyond a certain period (e.
- Lesson 1573 — Handling Never-Expiring Pastes
- Exploratory analysis
- Try different schemas on the same raw data
- Lesson 759 — Schema-on-Write vs Schema-on-ReadLesson 762 — Query Performance Tradeoffs
- Exponential backoff
- Wait longer between retries (1s, 2s, 4s.
- Lesson 687 — Dead Letter QueuesLesson 853 — Retry Policies and Timeout ConfigurationLesson 1023 — Exponential Backoff FundamentalsLesson 1564 — Retrieval Error Handling and FallbacksLesson 1604 — Message Queue for Processing JobsLesson 1705 — Retry and Dead Letter QueuesLesson 1715 — Retry Strategies for Failed DeliveriesLesson 1859 — Handling DNS Failures and Timeouts
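The doubling schedule can be sketched as below. The cap and the parameter names are assumptions; production retry loops usually also add random jitter so that many clients do not retry in lockstep, which is left out here to keep the output deterministic.

```python
from typing import List


def backoff_delays(base_s: float = 1.0, factor: float = 2.0,
                   max_retries: int = 5, cap_s: float = 30.0) -> List[float]:
    """Delay before each retry: base, base*factor, base*factor^2, ... capped at cap_s."""
    return [min(cap_s, base_s * factor ** attempt) for attempt in range(max_retries)]
```

With the defaults, the first three delays are 1, 2, and 4 seconds, matching the sequence in the entry.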
- Export
- The completed span is sent to your tracing backend for storage and analysis
- Lesson 1231 — Span Lifecycle and Structure
- exporters
- to send data to any backend (Prometheus, Grafana, CloudWatch, etc.
- Lesson 1205 — OpenTelemetry Metrics SDKLesson 1240 — OpenTelemetry Overview
- Extensible
- Powerful filter chain architecture allows custom logic via WebAssembly or native extensions
- Lesson 840 — Data Plane: Envoy Proxy Fundamentals
- External API calls
- Third-party service limits concurrent connections
- Lesson 971 — Concurrency Limiter Pattern
- External system interactions
- API calls (with correlation IDs), database queries that fail, third-party service responses.
- Lesson 1129 — What to Log vs What Not to Log
- Extra lookup overhead
- Every request requires two hops (directory, then shard)
- Lesson 242 — Directory-Based Sharding
- Extra uncommitted entries
- (from a failed leader that never committed them)
- Lesson 629 — Log Inconsistencies and Repair
- Extract
- the deadline from the incoming protocol's format
- Lesson 1113 — Cross-Protocol Deadline HandlingLesson 1223 — Instrumentation Basics
- Extract content
- from the fetched page (after HTML parsing)
- Lesson 1852 — Content Fingerprinting with Hashing
- Extract features
- For each query-document pair, compute signals like BM25 score, document freshness, click-through rate, time-on-page, and domain authority
- Lesson 1781 — Machine Learning for Ranking
- Extract the deadline
- from incoming request context (header, metadata, etc.
- Lesson 1110 — Calculating Remaining Time
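Once the deadline is extracted, the remaining budget is a subtraction clamped at zero. In this sketch the header name `x-request-deadline`, its epoch-seconds format, and the default timeout are all hypothetical; the source does not name a specific header.

```python
from typing import Mapping

# Hypothetical header carrying an absolute deadline as epoch seconds.
DEADLINE_HEADER = "x-request-deadline"


def remaining_seconds(headers: Mapping[str, str], now_epoch_s: float,
                      default_s: float = 1.0) -> float:
    """Time left before the caller's deadline; 0.0 means it has already passed."""
    raw = headers.get(DEADLINE_HEADER)
    if raw is None:
        return default_s  # no deadline propagated: fall back to a local default
    return max(0.0, float(raw) - now_epoch_s)
```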
- Extracted functional requirements
- Lesson 10 — Identifying Functional Requirements
- Extreme performance
- Purpose-built chips can handle millions of connections
- Lesson 108 — Hardware vs Software Load Balancers
- Extremely lagging replicas
- can be temporarily removed from rotation
- Lesson 218 — Lag-Aware Load Balancing
F
- Fact tables
- contain the measurable events or transactions—sales amounts, click counts, temperatures.
- Lesson 760 — Data Warehouse Architecture
- Fail fast
- so you can take alternative action quickly
- Lesson 1086 — What Timeouts Are and Why They MatterLesson 1102 — Handling Zero or Negative Timeouts
- Fail-open defaults
- when rate limiter is unreachable
- Lesson 1784 — Non-Functional Requirements: Latency and Availability
- Failback
- is the reverse—returning operations from the DR site back to the restored primary site.
- Lesson 1437 — Failover and Failback Procedures
- Failed
- The notification could not be delivered—device token invalid, phone number inactive, email bounced, etc.
- Lesson 1724 — Notification Analytics Events
- failover
- is the process of automatically redirecting all traffic away from that failed server to remaining healthy servers.
- Lesson 104 — Failover FundamentalsLesson 201 — Why Replicate: Availability and FailoverLesson 731 — Pulsar's Unique FeaturesLesson 1332 — Active-Active vs Active-Passive RedundancyLesson 1338 — Stateless vs Stateful RedundancyLesson 1366 — Leader Election and FailoverLesson 1437 — Failover and Failback ProceduresLesson 1616 — Geographic Routing and DNS (+1 more)
- Failover is lossy
- Promoting a replica means accepting data loss
- Lesson 1356 — Asynchronous Replication: Speed and Risk
- Failover Mechanisms
- Use global load balancers or DNS-based routing (with health checks) to automatically redirect traffic when a region fails.
- Lesson 1435 — Multi-Region Architecture for DR
- Fails fast
- Users get instant error responses instead of hanging for 30+ seconds waiting for timeouts
- Lesson 1046 — The Three States: Open
- Failure correlation
- When Service D fails, which upstream services suffer?
- Lesson 1229 — Service Dependency Graphs
- Failure count threshold
- is the simplest approach: open the circuit after N consecutive failures (e.
- Lesson 1048 — Failure Thresholds and Detection
- Failure Detection
- If health checks fail repeatedly, the DNS service marks that endpoint as unhealthy
- Lesson 1440 — DNS and Traffic Management in DR
- Failure detection and counting
- determines *how* the circuit breaker recognizes problems, accumulates evidence, and decides when a downstream service is unhealthy enough to open the circuit.
- Lesson 1057 — Failure Detection and Counting
- Failure handling
- Server 3 crashed—how do other servers continue working?
- Lesson 49 — Application Complexity Trade-offs
- Failure isolation
- If a replica goes down, your primary pool remains unaffected
- Lesson 221 — Application-Level Connection ManagementLesson 648 — Decoupling Through Messaging
- Failure patterns
- reveal whether problems are isolated incidents or systemic.
- Lesson 107 — Monitoring Health Check Metrics
- Failure rate threshold
- is more sophisticated: open when the error rate exceeds a percentage over a time window (e.
- Lesson 1048 — Failure Thresholds and Detection
- Failure Scenarios
- Model what happens when a dependency hangs.
- Lesson 1073 — Bulkhead Sizing: Balancing Isolation and Utilization
- Failure threshold
- How many consecutive failures trigger removal (e.
- Lesson 103 — Marking Servers UnhealthyLesson 106 — Health Check False Positives and Flapping
- failure thresholds
- that define when enough errors have occurred to warrant opening the circuit.
- Lesson 1048 — Failure Thresholds and DetectionLesson 1066 — Tuning for Production Workloads
- Failures
- If a middle replica fails, the chain reconnects around it (A → C).
- Lesson 1362 — Chain Replication
- Failures spread
- slow backends tie up connections, exhausting resources across the system
- Lesson 105 — Graceful Degradation and Circuit Breaking
- Fair resource allocation
- Power users running expensive operations don't get the same treatment as those making lightweight calls.
- Lesson 992 — Cost-Based Rate Limiting
- Fairness
- True rolling enforcement
- Lesson 968 — Sliding Window LogLesson 1789 — Client-Side vs Server-Side Rate Limiting
- Fan-out
- One published message reaches many subscribers
- Lesson 656 — Pub-Sub Pattern FundamentalsLesson 662 — Fan-Out with Pub-Sub
- Fan-out broadcasting
- One message reaches all interested services
- Lesson 663 — Hybrid Patterns: Topic + Queue
- Fan-out on write
- Pre-compute and distribute celebrity updates across shards when they post, rather than having everyone query one shard.
- Lesson 257 — Celebrity Problem in Social Graphs
- Fan-out scenarios
- One event triggers multiple downstream actions
- Lesson 654 — When to Use Async vs Sync
- fanout
- (broadcast), and **headers** (attribute-based).
- Lesson 666 — RabbitMQ Architecture FundamentalsLesson 1645 — What is Fanout in Social Media Systems
- Fanout Completion Time
- measures how long it takes from post creation to the last follower receiving it in their feed.
- Lesson 1657 — Measuring Fanout Performance
- Fanout-on-Read
- (also called the **Pull Model**), when a user creates a post, you simply store it once in a central location (like a `posts` table).
- Lesson 1647 — Fanout-on-Read (Pull Model)Lesson 1648 — Hybrid Fanout StrategyLesson 1665 — Feed Ranking Fundamentals
- fanout-on-write
- .
- Lesson 1648 — Hybrid Fanout StrategyLesson 1649 — The Celebrity Problem in FanoutLesson 1665 — Feed Ranking Fundamentals
- Fast
- Minimal processing since it doesn't parse application protocols
- Lesson 109 — Layer 4 (Transport) Load BalancingLesson 148 — First In First Out (FIFO)Lesson 509 — Latency: The Hidden Cost of CAPLesson 1061 — Fallback StrategiesLesson 1339 — Health Checks and Failure DetectionLesson 1852 — Content Fingerprinting with Hashing
- Fast burn (1-hour window)
- If you see 1% errors over 1 hour against a 99.9% SLO, you're burning budget 10× faster than sustainable, so alert immediately
- Lesson 1289 — Multi-Window and Multi-Burn-Rate Alerting
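Burn rate is the observed error rate divided by the error rate the SLO allows. The sketch below assumes a 99.9% SLO, which is the target consistent with the "1% errors is 10×" figure in this entry; the function name is illustrative.

```python
def burn_rate(observed_error_rate: float, slo: float) -> float:
    """How many times faster than sustainable the error budget is being spent."""
    allowed_error_rate = 1.0 - slo
    if allowed_error_rate <= 0:
        raise ValueError("an SLO of 100% leaves no error budget to burn")
    return observed_error_rate / allowed_error_rate
```

A burn rate of 1 spends the budget exactly over the SLO window; anything well above 1 on a short window is a candidate for paging.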
- Fast failure detection
- You want to know immediately when a server goes down so traffic stops routing there
- Lesson 100 — Health Check Intervals and Timeouts
- Fast prefix matching
- Walk down the prefix path once
- Lesson 1758 — Trie Data Structure for Prefix Matching
- Fast reads
- Since data is pre-validated and structured, queries run quickly
- Lesson 759 — Schema-on-Write vs Schema-on-ReadLesson 1361 — Quorum-Based ReplicationLesson 1646 — Fanout-on-Write (Push Model)
- Fast reads, slower writes
- Set W=N, R=1 (write to all, read from one)
- Lesson 365 — Tunable Consistency with Quorum Reads and Writes
- Fast recovery
- A failed node's data is spread across many other nodes, so rebuilds run in parallel and finish sooner
- Lesson 372 — Consistent Hashing in DynamoLesson 446 — SSTable and GFS DependenciesLesson 730 — Apache Pulsar ArchitectureLesson 1426 — Snapshot-Based Backups
- Fast user experience
- Post creation returns in milliseconds, not seconds
- Lesson 1651 — Asynchronous Fanout Processing
- Fast user feedback
- Upload returns success instantly; thumbnails appear later
- Lesson 1595 — Thumbnail and Preview Generation Trigger
- Fast Writes
- Since SSTables are never updated in place, writes don't require seeking around on disk or locking files.
- Lesson 427 — SSTables and Immutable StorageLesson 759 — Schema-on-Write vs Schema-on-ReadLesson 1361 — Quorum-Based ReplicationLesson 1647 — Fanout-on-Read (Pull Model)
- Fast writes, slower reads
- Set W=1, R=N (write to one, read from all)
- Lesson 365 — Tunable Consistency with Quorum Reads and Writes
- Faster CPUs
- handle complex queries more efficiently
- Lesson 54 — Scaling Databases: Special Considerations
- Faster decisions
- No constant cross-team coordination for every change
- Lesson 788 — Organizational Alignment: Conway's Law
- Faster failover
- Promote a replica to primary quickly to restore service for that shard's users
- Lesson 266 — Shard Failure and Partial Outages
- Faster iteration
- You can quickly sketch the basic architecture and then evolve it based on your estimated load, performance requirements, or constraints
- Lesson 34 — Start Simple: The Minimum Viable DesignLesson 820 — When a Monolith is the Right ChoiceLesson 906 — BFF Ownership and Team Structure
- Faster load times
- Users get files from nearby edge locations
- Lesson 173 — Content Types Suited for CDNs
- Faster recovery
- If a server fails, its load spreads across many others instead of overwhelming a single neighbor
- Lesson 363 — Virtual Nodes and Load Distribution
- Faster releases
- No waiting for other teams to be "ready"
- Lesson 786 — Independent Deployability of Microservices
- Faster response times
- for users worldwide
- Lesson 53 — Geographic Distribution BenefitsLesson 125 — CDN as Edge Caching Layer
- FATAL
- Critical failures that force the application to terminate
- Lesson 1141 — Log Levels in Structured Logs
- Father (weekly backups)
- Keep 4-5 weekly backups (typically the last backup from each week).
- Lesson 1431 — Backup Retention Policies
- Fault injection
- is the practice of intentionally causing failures in production-like environments to test whether your fault-tolerant design holds up under real conditions.
- Lesson 1342 — Testing Redundancy with Fault Injection
- fault isolation
- you've already learned about.
- Lesson 794 — Team Autonomy and OwnershipLesson 1815 — Sharding Rate Limit Data Across Redis Instances
- fault tolerance
- is crucial for meeting availability requirements (like "99.
- Lesson 47 — Single Point of Failure in Vertical ScalingLesson 77 — Why Load Balancers Are NecessaryLesson 132 — Cache-Aside: Pros and ConsLesson 364 — Replication in Distributed Key-Value StoresLesson 661 — Competing Consumers PatternLesson 668 — RabbitMQ Clustering and High AvailabilityLesson 708 — Consumer Groups and Parallel ConsumptionLesson 744 — Stream Processing Frameworks (+5 more)
- Fault tolerance increases
- You can survive hardware failures without data loss
- Lesson 68 — What is Data Replication?
- Favor boring technology
- Proven, well-understood tools beat novel ones
- Lesson 1315 — Simplicity as a Core Value
- Feature flags
- that affect behavior across multiple services
- Lesson 1237 — Baggage and Cross-Cutting ConcernsLesson 1303 — Incident Mitigation vs FixLesson 1314 — Release Engineering and Safe Deployment
- Feature freeze
- Halt all non-critical releases
- Lesson 1281 — Error Budget PoliciesLesson 1309 — Error Budgets: Balancing Reliability and Velocity
- Feature Needs
- Do you need simple routing and rate limiting, or enterprise features like monetization, developer portals, and advanced analytics?
- Lesson 901 — Choosing the Right API Gateway Technology
- features
- in a machine learning model—numerical inputs that predict engagement probability.
- Lesson 1666 — Ranking Signals and FeaturesLesson 1668 — Machine Learning for Feed RankingLesson 1756 — Machine Learning for Ranking (Learning to Rank)
- Federated token refresh
- Store mappings between your refresh tokens and external IdP refresh tokens, allowing seamless session extension across federated boundaries.
- Lesson 932 — Multi-Tenant OAuth2 and Identity Federation
- Federation
- lets one Prometheus scrape metrics from other Prometheus servers.
- Lesson 1206 — Metrics Federation and Long-Term Storage
- Feed & Ranking Signals
- Lesson 1642 — Post Metadata and Schema Design
- feedback loops
- (popular posts get more exposure, becoming more popular)
- Lesson 1644 — Feed Personalization and Ranking RequirementsLesson 1779 — Search Analytics and Click Tracking
- Fewer Moving Parts
- Lesson 784 — Development Velocity in Early Stages
- Fewer round-trips
- by aggregating related data in one query
- Lesson 1910 — GraphQL Fundamentals and Query Language
- Field mapping
- Rename or restructure response fields for client convenience
- Lesson 882 — Request and Response Transformation
- Field traversal
- when logging complex nested objects
- Lesson 1143 — Performance Impact of Structured Logging
- Field validation
- Only allow sorting on indexed fields to avoid performance issues
- Lesson 1894 — Sorting Query Parameters
- Field-level indexes
- for filterable attributes (category, brand, price_bucket, etc.)
- Lesson 1775 — Faceted Search and Filters
- Field-level redaction
- Before logging, replace sensitive values with placeholders like `[REDACTED]` or hash them irreversibly.
- Lesson 1145 — Sensitive Data in Structured Logs
- FIFO
- just needs a simple queue, so it is very fast, but its hit rate suffers because it ignores access patterns entirely.
- Lesson 154 — Implementation Tradeoffs
- FIFO queues
- guarantee exactly-once processing and strict ordering within a message group.
- Lesson 669 — Amazon SQS Architecture
- File integrity scans
- Detecting bit rot or storage media degradation
- Lesson 1408 — Backup Verification and Testing
- File Type Validation
- Verify the file extension *and* MIME type match expected formats (JPEG, PNG, MP4, etc.)
- Lesson 1592 — Upload Validation and Virus Scanning
- File Type Verification
- checks that uploads match allowed formats.
- Lesson 1599 — Upload Validation and Virus Scanning
- Filter first, paginate second
- Apply `WHERE` clauses before `LIMIT/OFFSET` to reduce the working set
- Lesson 1897 — Performance Considerations and Limits
- Filtering
- drops logs matching specific criteria before they're shipped.
- Lesson 1157 — Log Sampling and FilteringLesson 1194 — Time-Series Queries and PromQLLesson 1582 — Search and Discovery
- Finally, evaluate failure scenarios
- Lesson 1364 — Choosing a Replication Mode
- Financial data
- credit card numbers, account balances
- Lesson 1145 — Sensitive Data in Structured LogsLesson 1163 — Avoid Logging Sensitive Data
- Financial ledger
- You need consistency during partitions (PC) and likely during normal ops too (EC) → PC/EC system like traditional RDBMS with strong replication
- Lesson 520 — Practical PACELC Analysis for Design Decisions
- Financial operations
- Money movement, account balances, billing
- Lesson 322 — Transaction Requirements and Trade-offs
- Financial systems
- requiring ACID transactions across distributed data (banking, trading platforms, payment processors)
- Lesson 337 — When to Choose NewSQL
- Financial transactions
- require exact balances (no "eventual" bank account balance!)
- Lesson 317 — ACID vs BASE TradeoffsLesson 553 — Choosing Consistency LevelsLesson 1411 — Defining Recovery Point Objective (RPO)
- Find Responsible Node
- Walk clockwise on the ring until you hit a node; that node owns this URL's deduplication state
- Lesson 1854 — Distributed URL Deduplication
- Find the divergence point
- The leader maintains a `nextIndex` for each follower (initially set to the leader's last log index + 1).
- Lesson 629 — Log Inconsistencies and Repair
- Find the node
- Look up which Redis node owns that hash range
- Lesson 1806 — Rate Limiting with Consistent Hashing
- Firewall timeouts
- may close connections after periods of inactivity
- Lesson 271 — Connection Validation and Stale Connections
- First attempt
- Token is new → process the request, store the token and result
- Lesson 1027 — Idempotency Tokens in Retry LogicLesson 1711 — Idempotency Keys for Notifications
- First level (Range)
- Partition by `order_date` into monthly buckets
- Lesson 1453 — Composite Partitioning
- first line of defense
- in caching hierarchies.
- Lesson 121 — Browser Caching and HTTP HeadersLesson 883 — Authentication at the GatewayLesson 962 — Rate Limiting at Different Layers
- First Normal Form (1NF)
- Eliminate repeating groups—each cell contains a single atomic value, not lists
- Lesson 302 — Normalization Fundamentals
- First read
- Hits Replica B (caught up through transaction #150)
- Lesson 1360 — Monotonic Reads Across Replicas
- First retry
- Wait 100ms
- Lesson 1564 — Retrieval Error Handling and FallbacksLesson 1695 — Fallback and Retry Logic
- first-class citizens
- stored explicitly as data structures with their own identity, properties, and direct pointers between nodes.
- Lesson 454 — When Relationships Are First-Class CitizensLesson 472 — Social Networks and Friend-of-Friend Queries
- First-party fraud rings
- Networks of accounts controlled by one person making coordinated purchases
- Lesson 474 — Fraud Detection Through Pattern Matching
- Fixed limits
- require guessing capacity in advance — set them too high and you risk overload during incidents; too low and you waste capacity during healthy periods.
- Lesson 972 — Adaptive Rate Limiting
- Fixed Number of Partitions
- Create many more partitions than nodes from the start (e.
- Lesson 1485 — Rebalancing Partitions
- Fixed size
- Any content → exactly 256 bits (32 bytes)
- Lesson 1852 — Content Fingerprinting with Hashing
- fixed window
- resets at regular intervals (e.
- Lesson 961 — Time Windows for Rate LimitsLesson 975 — Algorithm Selection CriteriaLesson 1053 — Sliding Window vs Fixed WindowLesson 1057 — Failure Detection and Counting
- fixed window counter
- algorithm splits time into equal, non-overlapping intervals (windows) — say, 1-minute chunks.
- Lesson 967 — Fixed Window CounterLesson 968 — Sliding Window LogLesson 975 — Algorithm Selection CriteriaLesson 1813 — Memory Footprint per User and Limits
- Fixed window counters
- are extremely memory-efficient.
- Lesson 970 — Fixed vs Sliding Window TradeoffsLesson 1808 — Redis Data Structures for Rate Limiting
- Fixed windows
- suffer from boundary problems.
- Lesson 970 — Fixed vs Sliding Window TradeoffsLesson 1053 — Sliding Window vs Fixed Window
- Flat-rate tiers
- (some Cloudflare plans): predictable monthly cost
- Lesson 191 — CDN Provider Feature Comparison
- Flattening
- the hierarchy at query time or cache-load time, so you store the complete permission set
- Lesson 934 — RBAC Implementation Patterns
- Flexibility
- Deploy on commodity servers, VMs, or containers
- Lesson 108 — Hardware vs Software Load BalancersLesson 132 — Cache-Aside: Pros and ConsLesson 242 — Directory-Based ShardingLesson 691 — Events as First-Class CitizensLesson 759 — Schema-on-Write vs Schema-on-ReadLesson 762 — Query Performance TradeoffsLesson 1214 — Tagging Strategy for FilteringLesson 1478 — Directory Partitioning Flexibility (+2 more)
- Flexible Content Negotiation
- Use `Accept` headers to let clients request JSON, Protocol Buffers, or XML.
- Lesson 1919 — API Design for Polyglot Clients and Backwards Compatibility
- Flexible execution
- The framework decides how to optimize—chunking data like batch processing when appropriate, or flowing continuously when needed
- Lesson 756 — Hybrid and Modern Alternatives
- Flexible exploration
- Data scientists and analysts can experiment with raw data without ETL bottlenecks
- Lesson 758 — Data Lake Fundamentals
- Flink
- implements **true event-by-event streaming**: each event flows through the processing pipeline immediately upon arrival.
- Lesson 771 — Flink vs Spark for StreamingLesson 772 — Apache Beam Programming Model
- Flooding
- represents resource exhaustion spreading across your system
- Lesson 1068 — The Ship Bulkhead Analogy: Containing Damage
- Flow
- Lesson 142 — Look-Aside vs Inline Cache TopologiesLesson 1843 — Multi-Queue Frontier Architecture
- FLP impossibility
- states that no deterministic protocol can guarantee consensus in a purely asynchronous system if even one process may fail.
- Lesson 600 — Why Consensus Is Hard
- FLP Impossibility Result
- (named after Fischer, Lynch, and Paterson, 1985) is a foundational theorem in distributed systems.
- Lesson 601 — The FLP Impossibility Result
- Flush to disk (SSTable)
- When the in-memory structure fills up, it's flushed to disk as an immutable sorted file called an SSTable (Sorted String Table).
- Lesson 415 — Write Path and LSM Trees
- Flush to HFiles
- When the MemStore fills up (typically 128MB-256MB), it's flushed to disk as an immutable HFile.
- Lesson 436 — HBase Write Path and WAL
- Follow-the-Sun Model
- Lesson 1297 — On-Call Fundamentals and Rotation Models
- follower
- nodes, which create copies of the data.
- Lesson 71 — Single-Leader Replication ModelLesson 619 — Server States: Leader, Follower, Candidate
- Follower Reach Rate
- tracks what percentage of a user's followers successfully received the post within your SLA (e.
- Lesson 1657 — Measuring Fanout Performance
- Follower replicas
- Other brokers maintain synchronized copies by continuously pulling updates from the leader
- Lesson 705 — Replication and Fault Tolerance
- Following-Only Feed
- Lesson 1634 — Feed Scope: What Content to Show
- Follows REST principles
- Leverages HTTP's built-in content negotiation mechanism
- Lesson 1902 — Content Negotiation with Media Types
- Follows-From
- A looser relationship indicating that one operation was triggered by another, but the parent doesn't wait for completion.
- Lesson 1232 — Span Relationships and Hierarchy
- Follows-from links
- show asynchronous fire-and-forget patterns
- Lesson 1232 — Span Relationships and Hierarchy
- For data keys
- Apply the hash function to the key itself.
- Lesson 1458 — Mapping Keys and Nodes to the Ring
- For nodes
- Apply the hash function to a node identifier (IP, hostname, UUID).
- Lesson 1458 — Mapping Keys and Nodes to the Ring
- For time-series data
- (logs, metrics, sensor readings, financial transactions), most queries focus on recent data.
- Lesson 249 — Time-Based Sharding
- Force breaking changes
- alienate users, break production systems, damage trust
- Lesson 1898 — Why API Versioning Matters
- Foreign key constraints
- Does this reference point to an existing record?
- Lesson 305 — Consistency Guarantees
- Foreign keys
- (relationships between tables) reference primary keys to maintain referential integrity
- Lesson 299 — Primary Keys and Entity Integrity
- Fork Process
- Redis calls the `fork()` system call, creating a child process that shares the parent's memory pages via copy-on-write
- Lesson 350 — Redis Persistence: RDB Snapshots
- Format adaptation
- Convert service responses to client-friendly formats
- Lesson 882 — Request and Response Transformation
- Format conversion
- Transform REST payloads to match service contracts
- Lesson 882 — Request and Response TransformationLesson 1601 — Video Transcoding Fundamentals
- Format Integrity
- Parse file headers to ensure the file isn't corrupted or malformed.
- Lesson 1592 — Upload Validation and Virus Scanning
- Formula
- `score = current_connections / weight`
- Lesson 88 — Weighted Least ConnectionsLesson 1499 — Bandwidth Requirements for Redirects
- Forward compatible
- Old consumers can read new messages (add fields or remove optional fields)
- Lesson 725 — Schema Registry and Evolution
- Forward recovery
- Continue execution with retries or alternative paths
- Lesson 585 — Alternative: Saga Pattern Introduction
- Forwards
- the request to the backend cluster running version 2 of the API
- Lesson 1907 — Gateway-Level Version Routing
- four golden signals
- that SRE teams rely on:
- Lesson 856 — Observability: Metrics CollectionLesson 1215 — Avoiding Vanity MetricsLesson 1313 — Monitoring and Observability for SRE
- Fraud detection
- Finding suspicious transaction chains
- Lesson 464 — Traversal Queries: Friends of FriendsLesson 739 — Stream Processing Use Cases
- Free and battle-tested
- widely used by companies from startups to tech giants
- Lesson 111 — NGINX as a Load Balancer
- Free tiers
- Cloudflare offers basic CDN free; others provide limited trial credits
- Lesson 191 — CDN Provider Feature Comparison
- Free users
- 30% of capacity, tight limits (100 requests/min)
- Lesson 974 — Rate Limiting with Priority QueuesLesson 990 — Tiered Rate Limits for Different User Classes
- Frequent multi-collection queries
- (orders with products and users)
- Lesson 405 — When Joins Are Required
- Fresh content sources
- News aggregators, forums, and social platforms constantly publish new content and links, helping your crawler stay current.
- Lesson 1828 — Seed URLs and Starting Point
- Freshness
- How current is the cached data?
- Lesson 155 — Cache Invalidation ProblemLesson 736 — What is Batch Processing?Lesson 1730 — What is a Search Engine?Lesson 1732 — Crawling and Document Collection
- Freshness priority
- Discovers important pages near the seed quickly—great for news sites or finding high-value content fast
- Lesson 1830 — Breadth-First vs Depth-First Crawling
- freshness requirements
- or natural expiration, TTL (Time-To-Live) ensures stale data doesn't linger.
- Lesson 153 — Choosing an Eviction PolicyLesson 1844 — Front Queue: Priority Management
- From Closed → Open
- Lesson 1056 — Circuit Breaker State Machine
- From document stores
- Documents store hierarchical JSON-like objects with nested structures.
- Lesson 410 — What is a Wide-Column Store?
- From Half-Open → Closed
- Lesson 1056 — Circuit Breaker State Machine
- From Half-Open → Open
- Lesson 1056 — Circuit Breaker State Machine
- From key-value stores
- While key-value stores map one key to one value, wide-column stores map a row key to many columns, letting you retrieve individual columns or column groups without fetching the entire row.
- Lesson 410 — What is a Wide-Column Store?
- From Open → Half-Open
- Lesson 1056 — Circuit Breaker State Machine
- From relational databases
- In SQL, a table has a fixed schema.
- Lesson 410 — What is a Wide-Column Store?
- front queue
- is where you make these decisions *before* URLs flow into the politeness-controlled back queues.
- Lesson 1844 — Front Queue: Priority ManagementLesson 1845 — Back Queue: Politeness Enforcement
- front queues
- managing priority and multiple **back queues** enforcing per-host politeness.
- Lesson 1846 — Queue Router and Host MappingLesson 1849 — URL Frontier Persistence and Recovery
- full backup
- creates a complete copy of all your data at a specific point in time.
- Lesson 1402 — Full BackupsLesson 1403 — Incremental BackupsLesson 1421 — Full Backup StrategyLesson 1424 — Backup Scheduling and Frequency
- Full compatible
- Both directions work (add/remove only optional fields)
- Lesson 725 — Schema Registry and Evolution
- Full historical detail
- when you need every raw event, not just summaries
- Lesson 762 — Query Performance Tradeoffs
- Full rebuild
- Rare nuclear option that regenerates everything
- Lesson 777 — Workflow Orchestration Patterns
- Full search
- happens when a user submits a complete query and expects comprehensive, highly ranked results.
- Lesson 1757 — Typeahead vs Full Search
- Full table scans
- are a graph database's Achilles heel.
- Lesson 478 — When Graphs Underperform: Aggregations and Scans
- Full-stack feature teams
- work best: a team owns the web client *and* its BFF, communicating directly with backend microservices.
- Lesson 906 — BFF Ownership and Team Structure
- Functional
- "Serve customers food," "Take payment," "Provide a menu"—these are the *actions* the restaurant performs
- Lesson 9 — Functional vs Non-Functional: Core Distinction
- Functional requirements
- describe *what* the system must do—the actual features and behaviors users interact with.
- Lesson 9 — Functional vs Non-Functional: Core Distinction
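As an illustration of the **fixed window counter** entry above, here is a minimal single-process sketch in Python. The class name `FixedWindowCounter`, its parameters, and the injectable `clock` are assumptions made for this example, not part of any lesson or library; a production limiter would typically keep these counters in a shared store rather than in a local dictionary.

```python
import time


class FixedWindowCounter:
    """Allow at most `limit` requests per client in each fixed window.

    Hypothetical illustration: state lives in a local dict, so this only
    works inside a single process.
    """

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behavior can be tested
        self.state = {}     # client_id -> (window_index, count)

    def allow(self, client_id):
        # Time is split into equal, non-overlapping windows.
        window_index = int(self.clock() // self.window_seconds)
        stored_index, count = self.state.get(client_id, (window_index, 0))
        if stored_index != window_index:
            count = 0  # a new window started, so the counter resets
        if count >= self.limit:
            self.state[client_id] = (window_index, count)
            return False
        self.state[client_id] = (window_index, count + 1)
        return True
```

Only one small tuple is kept per client, which is why this approach is memory-efficient. The reset at each boundary is also the source of the boundary problem: a client can spend its full limit at the end of one window and again at the start of the next.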
G
- G-Counter
- (grow-only counter): Each replica maintains its own count; totals are summed
- Lesson 1384 — Conflict-Free Replicated Data Types (CRDTs)
- G-Set
- (grow-only set): Items can only be added, never removed
- Lesson 1384 — Conflict-Free Replicated Data Types (CRDTs)
- Game Day
- is a scheduled, controlled event where your team intentionally breaks things in a safe environment to practice incident response.
- Lesson 1345 — Starting with Game Days
- Gaming leaderboards
- Temporary score inconsistencies won't ruin the experience
- Lesson 137 — Write-Behind: Risks and Use Cases
- Gateway → Client (HTTP)
- Lesson 874 — Protocol Translation
- Gateway → Service (gRPC)
- Lesson 874 — Protocol Translation
- gauge
- is a metric type that represents a **point-in-time measurement** that can both increase and decrease.
- Lesson 1175 — Gauge MetricsLesson 1179 — Aggregation and Roll-UpsLesson 1184 — Gauge MetricsLesson 1200 — Grafana for Metrics Visualization
- Gauges
- Current memory usage (fluctuates up and down)
- Lesson 1172 — What Are Metrics and Why They MatterLesson 1193 — Aggregation FunctionsLesson 1201 — StatsD and Metric Aggregation Daemons
- GDPR
- (Europe) and **CAN-SPAM** (US) require verifiable records showing when users opted out, what they opted out of, and that you stopped messaging them accordingly.
- Lesson 1728 — Opt-Out and Compliance Tracking
- Generate
- a correlation ID when the request enters your system (e.
- Lesson 1158 — Correlation IDs Across ServicesLesson 1512 — Random String Generation
- Generate client libraries
- in multiple languages automatically
- Lesson 1885 — API Documentation with OpenAPI/Swagger
- Generate once, cache aggressively
- Lesson 1539 — QR Code Generation
- Generating unique idempotency keys
- – Typically a UUID or similar globally unique identifier
- Lesson 1007 — Idempotency and Client Responsibilities
- Geo + Hash
- First shard by geographic region to keep data close to users, then hash-shard within each region for even distribution.
- Lesson 250 — Hybrid Sharding Strategies
- Geo-Distributed Replication
- Data is replicated across multiple nodes (typically 3+ copies) using a consensus protocol (Raft).
- Lesson 334 — CockroachDB and Distributed SQL
- GeoDNS
- is geography-aware DNS routing.
- Lesson 180 — DNS-Based Request RoutingLesson 1616 — Geographic Routing and DNS
- Geographic and Network Metrics
- show where requests originate and connection quality.
- Lesson 1628 — Usage Analytics and Metrics
- Geographic distribution
- Place servers closer to users in different regions
- Lesson 44 — What is Horizontal Scaling?Lesson 189 — DDoS Protection and Security at CDN EdgeLesson 195 — CDN for DDoS ProtectionLesson 198 — What is Database Replication?Lesson 352 — Redis Replication ArchitectureLesson 364 — Replication in Distributed Key-Value StoresLesson 639 — Consensus Cluster Sizing TradeoffsLesson 726 — Multi-Datacenter Replication (+5 more)
- Geographic latency
- Users far from the data center experience delays
- Lesson 1791 — Single Data Center vs Distributed SetupLesson 1862 — Why Distribute a Web Crawler
- Geographic location
- Derived from IP address (country, city).
- Lesson 1505 — Analytics and Tracking Requirements
- Geographic proximity
- Weight nearby servers higher for latency
- Lesson 86 — Weighted Round RobinLesson 180 — DNS-Based Request Routing
- Geographic redundancy
- means deploying your system components across multiple physical locations, often hundreds or thousands of miles apart, so that a disaster in one location doesn't destroy your entire service.
- Lesson 1334 — Geographic Redundancy and Multi-Region
- Geographic region
- (North America is typically cheaper than Asia-Pacific)
- Lesson 30 — CDN Bandwidth and Cost Estimation
- Geographic rollouts
- When expanding to a new region, warm the local PoPs with your most-accessed content.
- Lesson 184 — Cache Warming and Preloading
- Geolocation-aware crawling
- means routing requests through IP addresses physically near the target server.
- Lesson 1860 — IP Address Rotation and Geolocation
- Geolocation-based storage
- means distributing your original media files across multiple regional storage clusters, placing them nearest to where your primary audience lives.
- Lesson 1626 — Geolocation-Based Storage
- GET
- operation retrieves the value associated with a given key.
- Lesson 339 — Key-Value Store OperationsLesson 1000 — Idempotent vs Non-Idempotent OperationsLesson 1009 — HTTP Methods and Natural IdempotencyLesson 1832 — HTTP Request HandlingLesson 1875 — HTTP Methods: GET, POST, PUT, DELETE SemanticsLesson 1884 — Idempotency in RESTful APIs
- Get score ranges
- "All players between 1000-2000 points"
- Lesson 359 — Redis for Leaderboards and Counting
- GitOps workflows
- and infrastructure-as-code practices.
- Lesson 846 — Control Plane: API and User Interface
- Global alone
- Fair users suffer when bad actors consume all capacity.
- Lesson 991 — Hierarchical Rate Limiting
- Global Distributed Indexes (Term-Partitioned)
- Lesson 1455 — Secondary Indexes and Partitioning
- Global HTTP(S) Load Balancer
- Layer 7, distributes traffic across multiple regions automatically
- Lesson 114 — Cloud Load Balancers (GCP and Azure)
- Global Merge-Sort
- Combine results from all shards, re-rank by score, and return the global top-K
- Lesson 1780 — Distributed Query Coordination
- Global multi-region applications
- where users expect consistent data regardless of location
- Lesson 337 — When to Choose NewSQL
- Global query view
- across all Prometheus instances
- Lesson 1206 — Metrics Federation and Long-Term Storage
- Global Rate Limiting
- Use a centralized token bucket or distributed rate limiter to cap total fanout throughput across all workers.
- Lesson 1654 — Fanout Rate Limiting
- Global SSL Proxy/TCP Proxy
- Layer 4 with TLS termination
- Lesson 114 — Cloud Load Balancers (GCP and Azure)
- Global system limits
- act as a backstop when total traffic exceeds infrastructure capacity (e.
- Lesson 973 — Multi-Tier Rate Limiting
- Good (idempotent)
- `if transaction_id not in processed_set: balance += 100; mark transaction_id as processed`
- Lesson 679 — At-Least-Once Delivery
- Good labels
- Lesson 1214 — Tagging Strategy for Filtering
- Good shard key
- `user_id` — distributes users evenly and queries for one user hit one shard
- Lesson 232 — Shard Key Selection
- Google Cloud Trace
- , and commercial offerings like Datadog APM provide:
- Lesson 1251 — Choosing a Tracing System
- Google Kubernetes Engine (GKE)
- Lesson 1244 — Google Cloud Trace
- Google Pub/Sub with subscriptions
- , and **Kafka consumer groups**.
- Lesson 663 — Hybrid Patterns: Topic + Queue
- Google Spanner
- provides strict serializability across globally distributed data centers.
- Lesson 530 — Strong Consistency in PracticeLesson 576 — When 2PC is Used in Practice
- gossip protocol
- they periodically "chat" with random peers to share information about the entire cluster's state.
- Lesson 430 — Gossip Protocol and Failure DetectionLesson 983 — Gossip Protocols for Rate Limit SyncLesson 1805 — Gossip Protocols for Approximate Limits
- Governance challenges
- Hard to track what's sensitive, who owns it, or if it's compliant
- Lesson 764 — Data Governance and Quality
- Graceful degradation
- means the system continues functioning in a reduced capacity rather than failing completely.
- Lesson 105 — Graceful Degradation and Circuit BreakingLesson 120 — Caching Hierarchy OverviewLesson 266 — Shard Failure and Partial OutagesLesson 809 — Versioning and Backward CompatibilityLesson 859 — Rate Limiting at Service BoundariesLesson 1329 — Partial Availability and Graceful DegradationLesson 1330 — What is Fault Tolerance?Lesson 1336 — Graceful Degradation (+2 more)
- Graceful node changes
- When nodes join or leave, only adjacent replicas are affected
- Lesson 1466 — Replication with Consistent Hashing
- Gradual circuit breaker resets
- that slowly increase allowed traffic
- Lesson 1081 — Thundering Herd After Recovery
- Gradual migration
- old and new versions can briefly coexist
- Lesson 165 — Versioned Cache KeysLesson 1907 — Gateway-Level Version Routing
- Grandfather-Father-Son (GFS)
- approach:
- Lesson 1406 — Backup Retention PoliciesLesson 1431 — Backup Retention Policies
- Graph approach
- Your friend's contact card has their friends' phone numbers directly written on it.
- Lesson 476 — Graph Query Performance Characteristics
- graph database
- is a database system optimized for storing and querying data where **relationships between entities are just as important as the entities themselves**.
- Lesson 451 — What is a Graph Database?Lesson 452 — Graph Model: Nodes and Edges
- Graph/Time Series
- Line charts showing metrics over time (perfect for request rates, CPU usage)
- Lesson 1200 — Grafana for Metrics Visualization
- Graphite
- provides simpler but less flexible querying.
- Lesson 1208 — Choosing a Metrics System for Your Scale
- Graphite-web
- The web application for querying and rendering graphs
- Lesson 1202 — Graphite Time-Series Database
- GraphQL subscriptions
- establish a persistent, bidirectional connection between client and server—typically using **WebSockets**.
- Lesson 1915 — GraphQL Subscriptions for Real-Time Data
- GraphQL wins on
- Lesson 1911 — GraphQL vs REST: Tradeoffs
- GraphQL-style arguments
- Clients specify filters as structured objects:
- Lesson 1893 — Complex Filtering with Query Languages
- Gremlin
- (used by Apache TinkerPop-compatible databases) is more like giving step-by-step walking directions through your graph.
- Lesson 456 — Graph Query Languages: Cypher and Gremlin
- Grouping
- works similarly—you organize cache keys into logical collections (groups) so you can operate on the entire collection at once.
- Lesson 164 — Cache Tagging and GroupingLesson 1194 — Time-Series Queries and PromQL
- gRPC
- High-performance, binary, strongly-typed
- Lesson 874 — Protocol TranslationLesson 1113 — Cross-Protocol Deadline Handling
- GSSAPI/Kerberos
- Enterprise-grade, ticket-based authentication
- Lesson 727 — Kafka Security: Authentication and Encryption
- Guarantee
- You still write to N nodes, but they might not be the "right" N
- Lesson 366 — Sloppy Quorums and Hinted Handoff
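The **G-Counter** entry above (each replica maintains its own count; totals are summed) can be sketched in a few lines of Python. The class and method names here are hypothetical, and the merge rule shown (take the per-replica maximum) is the standard one for grow-only counters rather than something stated in the glossary.

```python
class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged by maximum."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> count observed for that replica

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        # A replica only ever writes to its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        # The total is the sum of every replica's slot.
        return sum(self.counts.values())

    def merge(self, other):
        # Taking the maximum per slot makes merging commutative,
        # associative, and idempotent, so replicas converge.
        for replica_id, count in other.counts.items():
            self.counts[replica_id] = max(self.counts.get(replica_id, 0), count)
```

Because merging the same state twice changes nothing, replicas can exchange state in any order and as often as they like without double counting.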
H
- H.265
- involve repetitive mathematical operations across millions of pixels—perfect for GPU parallelization.
- Lesson 1607 — GPU Acceleration for Encoding
- HA pairs
- Two HAProxy instances using keepalived or similar for redundancy
- Lesson 112 — HAProxy Overview
- Half-Open
- a cautious middle ground where it allows a limited number of test requests through.
- Lesson 1047 — The Three States: Half-OpenLesson 1050 — State Transition MechanicsLesson 1052 — Circuit Breaker Reset LogicLesson 1060 — Half-Open State TestingLesson 1081 — Thundering Herd After RecoveryLesson 1803 — Handling Redis Failures and Fallbacks
- Half-Open → Closed
- Send successful requests during half-open state, verify full recovery
- Lesson 1065 — Testing Circuit Breaker Behavior
- Half-Open → Open
- Send failures during half-open state, verify the breaker re-opens
- Lesson 1065 — Testing Circuit Breaker Behavior
- Half-open circuit
- Limited test requests (possibly with one retry) to check health
- Lesson 1030 — Combining Retries with Circuit Breakers
- Half-open state
- After the cooldown period, send a few test requests to see whether the backend has recovered
- Lesson 105 — Graceful Degradation and Circuit BreakingLesson 889 — Circuit Breaking and FallbacksLesson 1059 — Timeout Windows and Reset LogicLesson 1060 — Half-Open State Testing
- Half-open testing windows
- should align with your dependency's recovery patterns.
- Lesson 1066 — Tuning for Production Workloads
- Hamming distance
- (number of differing bits) between two Simhash values correlates with content similarity.
- Lesson 1855 — Near-Duplicate Detection with Simhash
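Counting differing bits is an XOR followed by a population count. The sketch below assumes fingerprints are plain Python integers; the `max_bits=3` threshold is an illustrative assumption, not a value taken from the lesson.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions where the two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: int, b: int, max_bits: int = 3) -> bool:
    """Treat two fingerprints as near-duplicates if few bits differ (threshold is an assumption)."""
    return hamming_distance(a, b) <= max_bits
```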
- Handle collisions gracefully
- If taken, suggest alternatives (`nike2`, `nike-official`) or reject the request.
- Lesson 1531 — Custom Aliases and Vanity URLs
- Handle conflicts gracefully
- Use application logic or CRDTs to resolve discrepancies
- Lesson 583 — Alternative: Best Effort with Eventual ConsistencyLesson 1514 — Custom Short URL Support
- Handle rate limits gracefully
- Back off when you receive 429 (Too Many Requests) or 503 (Service Unavailable) responses.
- Lesson 1831 — Robots.txt and Crawl Etiquette
- Handle schema changes
- Coordinate table creation, deletion, and column family modifications
- Lesson 447 — Master Server and Metadata Management
- Handles client-specific error formatting
- Lesson 905 — BFF Implementation Patterns
- HAProxy
- (High Availability Proxy) is a free, open-source software load balancer that specializes in distributing traffic across multiple servers with exceptional performance and reliability.
- Lesson 112 — HAProxy Overview
- Hard purge
- Immediately deletes cached content; next request must go to origin
- Lesson 185 — Purging and Cache Invalidation Strategies
- Hard to audit
- Policy logic scattered everywhere
- Lesson 941 — Policy Decision Points (PDP) and Enforcement Points (PEP)
- hardware appliances
- and **software solutions**.
- Lesson 79 — Hardware vs Software Load BalancersLesson 108 — Hardware vs Software Load Balancers
- Hash approach
- Assign books by random number → shelves are equally full, but finding all mystery novels requires checking every shelf
- Lesson 1454 — Partitioning Tradeoffs: Distribution vs Query Efficiency
- Hash by hostname
- Extract the hostname from each URL and route it to a dedicated queue
- Lesson 1841 — Single-Host Queue Pattern
- Hash Matching
- Compare content hashes against databases of known malicious files (like VirusTotal)
- Lesson 1581 — Abuse Prevention and Content ModerationLesson 1629 — Content Moderation at Scale
- Hash Partitioning
- Lesson 703 — Partitioning Strategies and Key SelectionLesson 1453 — Composite Partitioning
- hash ring
- (imagine a circle with values 0 to 2³²-1).
- Lesson 90 — Consistent Hashing for Load BalancingLesson 1457 — The Hash Ring Concept
- Hash the key
- Apply your hash function to get a position on the ring (e.
- Lesson 1459 — Clockwise Key Assignment RuleLesson 1806 — Rate Limiting with Consistent Hashing
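A minimal sketch of the ring and the clockwise assignment rule, assuming SHA-256 truncated to 32 bits as the hash function and one ring position per node (no virtual nodes). `HashRing` and `ring_position` are hypothetical names.

```python
import bisect
import hashlib

RING_SIZE = 2**32


def ring_position(key: str) -> int:
    """Map a key to a position in 0 .. 2**32 - 1 using the first 4 bytes of SHA-256."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    def __init__(self, nodes):
        # Each node sits at the ring position of its own name.
        self._points = sorted((ring_position(n), n) for n in nodes)
        self._positions = [p for p, _ in self._points]

    def node_for(self, key: str) -> str:
        # Walk clockwise: first node at or after the key's position, wrapping at the end.
        idx = bisect.bisect_left(self._positions, ring_position(key))
        return self._points[idx % len(self._points)][1]
```

Removing a node only remaps the keys that node owned; every other key keeps its assignment.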
- Hash the long URL
- Apply MD5 (128 bits) or SHA-256 (256 bits)
- Lesson 1508 — Hash-Based Generation Approach
- Hash the URL
- Apply a hash function to the normalized URL: `hash("http://example.
- Lesson 1854 — Distributed URL DeduplicationLesson 1867 — Distributed Deduplication with Bloom Filters
- Hash-based
- excels at point lookups by shard key
- Lesson 253 — Evaluating Sharding Strategy TradeoffsLesson 1157 — Log Sampling and FilteringLesson 1164 — Sampling for High-Volume Logs
- Hash-based partitioning
- works by:
- Lesson 361 — Partitioning in Distributed Key-Value StoresLesson 1454 — Partitioning Tradeoffs: Distribution vs Query Efficiency
- hash-based sharding
- instead of ranges to scatter sequential IDs uniformly
- Lesson 248 — Avoiding Hotspots in ShardingLesson 1769 — Horizontal Scaling of Search Infrastructure
- Hashes
- (or maps) store field-value pairs within a single key.
- Lesson 341 — Data Types and Value ComplexityLesson 1808 — Redis Data Structures for Rate Limiting
- HBase
- is an open-source, distributed, column-oriented NoSQL database modeled after Google's BigTable.
- Lesson 433 — What is HBase?Lesson 450 — BigTable's Influence on Modern SystemsLesson 493 — CP Systems: Prioritizing ConsistencyLesson 518 — PC/EC Systems: Consistency AlwaysLesson 521 — PACELC Tradeoffs in Real Systems
- HBase advantages
- Lesson 441 — HBase vs BigTable Design Differences
- HDFS
- (Hadoop Distributed File System) for storing massive files across machines
- Lesson 743 — Batch Processing Frameworks
- head
- to **tail**, with each replica forwarding the write to the next.
- Lesson 1362 — Chain ReplicationLesson 1373 — Chain Replication
- Head-based sampling
- means making the decision to keep or discard a trace at the moment it begins—at the "head" or root span of the request.
- Lesson 1253 — Head-Based Sampling
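One way to make the keep-or-discard decision at the root span is to hash the trace ID, so every service that sees the same ID reaches the same verdict. This is a sketch under that assumption; `keep_trace` is a hypothetical helper, not part of any tracing SDK.

```python
import hashlib


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Deterministically keep roughly `sample_rate` of traces, decided from the trace ID alone."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in 0 .. 2**64 - 1
    return bucket < int(sample_rate * 2**64)
```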
- Header manipulation
- Add authentication tokens, remove sensitive client headers, inject tracing IDs
- Lesson 882 — Request and Response Transformation
- Header validation
- Are required headers (like `Content-Type`) present and correct?
- Lesson 886 — Request Validation
- Header-based routing
- lets you route based on metadata: send requests from your mobile app to `api-v2`, while web traffic stays on `api-v1`.
- Lesson 848 — Traffic Management and Routing
- Health checks
- are periodic tests that load balancers perform to verify each backend server is operational and ready to handle traffic.
- Lesson 98 — What Are Health Checks?Lesson 810 — Deployment ComplexityLesson 1335 — Failover MechanismsLesson 1339 — Health Checks and Failure DetectionLesson 1440 — DNS and Traffic Management in DRLesson 1536 — Horizontal Scaling of Redirect ServersLesson 1674 — Connection Management at Scale
- Health Checks (Pull Model)
- Lesson 1339 — Health Checks and Failure Detection
- Health Checks and Monitoring
- Lesson 81 — Single Point of Failure: Load Balancer HA
- Health information
- medical records, diagnoses
- Lesson 1145 — Sensitive Data in Structured LogsLesson 1163 — Avoid Logging Sensitive Data
- Health tracking
- Knowledge of which servers are healthy vs unavailable
- Lesson 83 — Client-Side Load Balancing
- Health-Aware Routing
- The mesh only routes to instances that have passed health checks, automatically excluding failed or unhealthy ones
- Lesson 832 — Service Discovery in a Mesh
- Healthcare records
- Patient data and prescriptions demand accuracy
- Lesson 318 — When to Choose ACID or BASE
- Heaps
- Both insert and extract-min run in O(log n), so neither writes nor reads become the bottleneck.
- Lesson 1847 — Heap-Based Priority Queue Implementation
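A small sketch of a heap-backed priority queue using Python's standard `heapq` module. The `PriorityQueue` wrapper and the insertion counter used as a tie-breaker are illustrative choices, not the lesson's implementation.

```python
import heapq
import itertools


class PriorityQueue:
    """Min-priority queue: lower priority numbers come out first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps equal priorities in FIFO order

    def push(self, priority: int, item) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), item))  # O(log n)

    def pop_min(self):
        priority, _, item = heapq.heappop(self._heap)  # O(log n)
        return priority, item

    def __len__(self) -> int:
        return len(self._heap)
```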
- Heartbeat mechanism
- Client sends periodic "I'm alive" pings (every 30–60 seconds) over the existing WebSocket/SSE connection
- Lesson 1676 — Presence Detection and User Status
- Heartbeats (Push Model)
- Lesson 1339 — Health Checks and Failure Detection
- Heterogeneous hardware
- A server with 64GB RAM might get 200 vnodes, while a 32GB server gets 100 vnodes
- Lesson 363 — Virtual Nodes and Load DistributionLesson 372 — Consistent Hashing in Dynamo
- Heterogeneous node weights
- solve this by varying the number of virtual nodes assigned to each physical server based on its capacity.
- Lesson 1465 — Heterogeneous Node Weights
- Heterogeneous partitioning
- Mix hash-based and range-based assignments freely
- Lesson 1476 — Directory Partitioning Fundamentals
- Heuristic-based detection
- analyzes code patterns, keywords, and structure.
- Lesson 1575 — Syntax Highlighting and Language Detection
- Hidden dependencies
- Discover undocumented service calls you didn't know existed.
- Lesson 1229 — Service Dependency Graphs
- Hide internal complexity
- by presenting simplified, consistent APIs
- Lesson 882 — Request and Response Transformation
- Hierarchical
- Combine multiple levels: `tenant:acme:user:42:payment:abc-123`
- Lesson 1017 — Idempotency Key Scope and Namespacing
- High accuracy critical
- Use **Sliding Window Log** (perfect tracking) or **Sliding Window Counter** (near-perfect with less memory)
- Lesson 975 — Algorithm Selection Criteria
- High accuracy, high latency
- Centralized counter with strong consistency guarantees (locks, transactions).
- Lesson 985 — Trade-offs: Accuracy vs Latency
- High availability
- (99.
- Lesson 14 — Availability and Reliability RequirementsLesson 73 — Leaderless Replication ModelLesson 317 — ACID vs BASE TradeoffsLesson 352 — Redis Replication ArchitectureLesson 373 — Replication and Quorum in DynamoLesson 394 — Replica Sets for High AvailabilityLesson 877 — The API Gateway Bottleneck RiskLesson 979 — Centralized vs Decentralized Approaches (+2 more)
- High availability requirements
- strongly favor horizontal scaling.
- Lesson 51 — When to Choose Horizontal Scaling
- High availability, low reliability
- The ATM is always powered on (available), but occasionally dispenses the wrong amount of cash or debits your account incorrectly.
- Lesson 1322 — Availability vs Reliability: Key Differences
- High burst tolerance needed
- Use **Token Bucket** (allows bursting up to bucket size) or **Leaky Bucket** (smooths bursts over time)
- Lesson 975 — Algorithm Selection Criteria
- High business impact
- For critical user journeys (checkout, payment processing), the cost of downtime far exceeds tracing expenses.
- Lesson 1260 — Cost-Benefit Analysis
- High cache hit ratio
- The same file serves millions of users
- Lesson 173 — Content Types Suited for CDNs
- High cardinality
- Many distinct values to spread data widely (e.
- Lesson 232 — Shard Key SelectionLesson 247 — Choosing the Right Shard KeyLesson 1180 — Time Series Data ModelLesson 1472 — Range Partition Key Selection
- High cohesion
- means everything inside a service is closely related and works toward the same purpose.
- Lesson 818 — High Cohesion, Low Coupling in Service Design
- High consistency
- W=3, R=1 with N=3 replicas (every write must reach all replicas before returning, so a read from any single replica sees the latest value)
- Lesson 373 — Replication and Quorum in Dynamo
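The usual rule of thumb is that read and write quorums overlap when R + W > N. The check below is a sketch of that arithmetic, assuming N = 3 replicas for the W=3, R=1 configuration; the function name is hypothetical.

```python
def quorum_overlaps(n: int, w: int, r: int) -> bool:
    """True when every read quorum must intersect every write quorum (R + W > N)."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return r + w > n
```

With N = 3, the configuration W=3, R=1 overlaps, as does W=2, R=2, while W=1, R=1 does not.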
- High failure rates
- even when closed (threshold might be misconfigured)
- Lesson 1064 — Monitoring and Metrics
- High latency
- A user in Tokyo requesting a video from your Virginia data center faces network round-trip delays of 150-200ms
- Lesson 1609 — Why CDNs Are Essential for Media Hosting
- High priority queue
- Direct messages, mentions, friend requests
- Lesson 1700 — Priority Queues and Urgency Levels
- High read throughput
- The tail can handle many concurrent reads without coordinating with other replicas
- Lesson 1362 — Chain Replication
- High reliability
- requires careful error handling, data validation, testing, and monitoring.
- Lesson 14 — Availability and Reliability Requirements
- High reliability, low availability
- When the ATM works, it's always accurate, but it's frequently offline for maintenance.
- Lesson 1322 — Availability vs Reliability: Key Differences
- High resolution, short retention
- 1-second intervals kept for 24 hours—perfect for debugging live incidents
- Lesson 1270 — Monitoring Resolution and Retention Tradeoffs
- High Team Coordination Overhead
- Lesson 823 — Signs You're Over-Decomposing Services
- high throughput
- , and **simple data**.
- Lesson 345 — Use Cases for Key-Value StoresLesson 443 — BigTable Overview and MotivationLesson 577 — 2PC vs Single-Node TransactionsLesson 699 — Event Streaming Platform RequirementsLesson 1721 — Preference Storage StrategyLesson 1807 — In-Memory vs Persistent Storage for Rate Limiting
- High traffic variance
- Black Friday traffic shouldn't starve routine operations
- Lesson 1076 — Bulkhead Tradeoffs: Complexity and Resource Overhead
- High write throughput
- IoT devices generate massive volumes of time-series data.
- Lesson 404 — Mobile and IoT Backend StorageLesson 640 — Performance Characteristics of Consensus
- High-availability systems
- More frequent checks with shorter timeouts
- Lesson 100 — Health Check Intervals and Timeouts
- High-frequency routine operations
- Don't log every successful cache hit or health check ping—you'll drown in noise and hurt performance.
- Lesson 1129 — What to Log vs What Not to Log
- High-lag replicas
- might still serve less critical queries or get fewer requests until they catch up
- Lesson 218 — Lag-Aware Load Balancing
- High-risk scenarios
- where random generation is critical:
- Lesson 1515 — Short URL Predictability Tradeoffs
- High-throughput transactional workloads
- that have outgrown single-server SQL databases
- Lesson 337 — When to Choose NewSQL
- High-volume streaming
- Kafka (covered later), NATS
- Lesson 676 — Choosing Between Message Broker Technologies
- higher
- proposal number.
- Lesson 611 — Proposal Numbers and OrderingLesson 615 — Handling Conflicts and Preemption
- Higher availability
- If 3 nodes are down, you can still read (R=2) and write (W=2) successfully
- Lesson 560 — Eventual Consistency with Quorums
- Higher complexity
- Self-hosted ActiveMQ, multi-datacenter setups
- Lesson 676 — Choosing Between Message Broker Technologies
- Higher CPU/memory cost
- on the client (maintaining time windows and sorting)
- Lesson 1186 — Summary Metrics
- Higher last term wins
- If the logs end with different terms, the one with the higher term is more up-to-date
- Lesson 628 — Election Restriction: Up-to-Date Check
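The comparison a voter makes can be sketched as follows, using Raft's standard rule: compare the last terms first, and fall back to log length when the terms match. The function and parameter names are hypothetical.

```python
def candidate_is_up_to_date(cand_last_term: int, cand_last_index: int,
                            my_last_term: int, my_last_index: int) -> bool:
    """True if the candidate's log is at least as up-to-date as the voter's."""
    if cand_last_term != my_last_term:
        return cand_last_term > my_last_term  # higher last term wins
    return cand_last_index >= my_last_index   # same term: the longer log wins
```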
- Higher operational complexity
- More servers to monitor, upgrade, and debug
- Lesson 639 — Consensus Cluster Sizing Tradeoffs
- Higher RTO
- Failover takes time (DNS updates, scaling up standbys)
- Lesson 1436 — Active-Passive vs Active-Active DR
- Higher throughput
- The primary isn't bottlenecked by the slowest replica
- Lesson 1356 — Asynchronous Replication: Speed and Risk
- Hint storage
- The temporary nodes store a "hint" that this data belongs elsewhere
- Lesson 1372 — Sloppy Quorums and Hinted Handoff
- hinted handoff
- .
- Lesson 366 — Sloppy Quorums and Hinted HandoffLesson 370 — Distributed Key-Value Store Architectures in PracticeLesson 375 — Sloppy Quorum and Hinted HandoffLesson 561 — Sloppy Quorums and Hinted HandoffLesson 1372 — Sloppy Quorums and Hinted Handoff
- Histogram
- Combine buckets from multiple measurements
- Lesson 1179 — Aggregation and Roll-UpsLesson 1185 — Histogram Metrics
- Histograms
- Request latency distribution (buckets of response times)
- Lesson 1172 — What Are Metrics and Why They Matter
- Histograms and Summaries
- demand `quantile()` for latency analysis.
- Lesson 1193 — Aggregation Functions
- Historical Analytics
- Lesson 738 — Batch Processing Use Cases
- Historical reprocessing is expensive
- If replaying your entire event log through a stream processor would take days or weeks, Lambda's batch layer can handle full historical processing more efficiently.
- Lesson 755 — When to Choose Lambda vs Kappa
- HLS (HTTP Live Streaming)
- Breaks video into small segments (~10 seconds each), with a playlist file telling the player which quality versions exist
- Lesson 193 — CDN for Video StreamingLesson 1602 — Adaptive Bitrate Streaming (ABR)Lesson 1613 — HLS and DASH ProtocolsLesson 1625 — Adaptive Bitrate Streaming
- HMAC
- with a secret key only your application and CDN know.
- Lesson 1627 — Access Control and Signed URLs
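A signed URL can be sketched with the standard-library `hmac` module. The query parameter names (`expires`, `sig`), the message layout, and the helper names are assumptions for illustration; real CDNs each define their own signing format.

```python
import hashlib
import hmac
import time


def _signature(path: str, expires_at: int, secret: bytes) -> str:
    # Sign the path together with the expiry so neither can be changed independently.
    message = f"{path}|{expires_at}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(path: str, expires_at: int, secret: bytes) -> str:
    return f"{path}?expires={expires_at}&sig={_signature(path, expires_at, secret)}"


def verify_url(path: str, expires_at: int, sig: str, secret: bytes, now=None) -> bool:
    current = time.time() if now is None else now
    if current > expires_at:
        return False  # link has expired
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(_signature(path, expires_at, secret), sig)
```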
- Horizontal expansion
- means adding more worker instances rather than making one worker bigger (vertical scaling).
- Lesson 1708 — Scalability and Horizontal Expansion
- Horizontal Partitioning (Sharding)
- means splitting your table by **rows**.
- Lesson 231 — Vertical Partitioning vs Horizontal Partitioning
- Horizontal partitioning/sharding
- helps when:
- Lesson 231 — Vertical Partitioning vs Horizontal Partitioning
- Horizontal scalability
- by adding more machines
- Lesson 331 — What NewSQL IsLesson 336 — NewSQL TradeoffsLesson 661 — Competing Consumers PatternLesson 708 — Consumer Groups and Parallel ConsumptionLesson 761 — Data Lake Storage TechnologiesLesson 1806 — Rate Limiting with Consistent Hashing
- Horizontal scaling
- means increasing your system's capacity by adding more servers (machines) to distribute the workload across multiple instances.
- Lesson 44 — What is Horizontal Scaling?Lesson 45 — Comparing Cost StructuresLesson 66 — Why Partition Data?Lesson 76 — What Is a Load Balancer?Lesson 95 — Geographic/Proximity-Based RoutingLesson 332 — The NewSQL Value PropositionLesson 435 — HBase Regions and Region ServersLesson 701 — Topics and Partitions (+5 more)
- Horizontal Scaling Becomes Trivial
- You can spin up 5, 50, or 500 gateway instances instantly.
- Lesson 878 — Stateless Gateway Design
- Horizontal scaling complexity
- Need sticky sessions or distributed session store
- Lesson 916 — Session vs Token Tradeoffs
- Host extraction
- Parse the URL to identify its hostname (e.
- Lesson 1845 — Back Queue: Politeness Enforcement
- Host-based routing
- Different domains routed to different backends
- Lesson 113 — Cloud Load Balancers (AWS ELB/ALB)
- Host-Based Sharding
- Assign each host to a specific worker using consistent hashing.
- Lesson 1868 — Coordinating Politeness Across Workers
- Hot spots
- High request volumes on certain edges indicate load concentration.
- Lesson 1229 — Service Dependency Graphs
- hot standby
- is a fully-running, continuously synchronized backup system that can take over almost instantly when the primary fails.
- Lesson 1417 — Hot Standby vs Cold StandbyLesson 1443 — DR Cost Optimization
- Hot storage
- (0-7 days): Fast SSDs for active debugging
- Lesson 1135 — Log Retention and Volume ManagementLesson 1428 — Backup Storage TiersLesson 1557 — Hot vs Cold Storage TieringLesson 1572 — Storage Tier MigrationLesson 1589 — Storage Tiering Strategy
- Hot storage (0-7 days)
- Keep all sampled traces with full fidelity.
- Lesson 1246 — Trace Data Retention Policies
- Hot tier
- Recent logs (hours to days old) stored on fast SSDs with full indexing.
- Lesson 1156 — Indexing Strategies and RetentionLesson 1620 — Storage Tiering for Cost OptimizationLesson 1663 — Hot and Cold Timeline Data
- Hot tier (0-7 days)
- Fast SSD-backed search indexes (Elasticsearch, Splunk).
- Lesson 1165 — Log Retention Policies
- Hotspot cascades
- Overloaded partitions trigger the problems you learned earlier
- Lesson 1491 — Data Skew and Cardinality Issues
- Hotspot mitigation
- by splitting hot ranges into smaller partitions
- Lesson 1480 — Hybrid Partitioning Approaches
- hotspots
- (one shard handling most traffic), and **inefficient queries** (needing to check every shard).
- Lesson 232 — Shard Key SelectionLesson 234 — Data Distribution and HotspotsLesson 240 — Hash-Based ShardingLesson 241 — Range-Based ShardingLesson 256 — Hotspots and Uneven Data DistributionLesson 396 — Sharding in MongoDBLesson 397 — Shard Key SelectionLesson 1468 — Bounded Loads Extension (+1 more)
- Hotspots emerge
- Server A might handle 40% of your data while Server B handles only 5%
- Lesson 1462 — The Uneven Distribution Problem
- how
- to resolve it.
- Lesson 539 — Vector Clocks and CausalityLesson 1078 — Cascading Failure Propagation MechanicsLesson 1109 — Context Propagation MechanismsLesson 1161 — Context-Rich Logging
- How to measure effectively
- Lesson 40 — Measure Before Optimizing
- HTML parser
- (like BeautifulSoup, jsoup, or native DOM parsers) to identify all `<a>` tags with `href` attributes, plus other link sources like `<img>`, `<script>`, and `<link>` tags.
- Lesson 1829 — URL Discovery and Extraction
- HTTP
- Custom headers like `X-Timeout-Ms` or `Request-Timeout`
- Lesson 1113 — Cross-Protocol Deadline Handling
- HTTP → WebSocket
- Upgrading connections or proxying events from backend message streams
- Lesson 881 — Protocol Translation
- HTTP headers
- Mobile user-agents route to mobile-optimized servers
- Lesson 110 — Layer 7 (Application) Load BalancingLesson 854 — Request-Level Authorization
- HTTP methods
- `POST` requests might go to write-heavy servers
- Lesson 110 — Layer 7 (Application) Load Balancing
- HTTP-native
- Direct CDN integration and pre-signed URL support
- Lesson 1588 — Object Storage vs Block Storage
- HTTP/2
- for transport, delivering significantly better performance.
- Lesson 1917 — gRPC: Protocol Buffers and Binary RPC
- HTTP/2 or HTTP/3
- with multiplexing to reduce round-trip overhead
- Lesson 1618 — Optimizing for Mobile Networks
- HTTP/REST → gRPC
- Gateway receives JSON over HTTP, marshals it into Protocol Buffer format, and makes a gRPC call
- Lesson 881 — Protocol Translation
- Hub-and-Spoke
- Regional clusters replicate to a central aggregation cluster
- Lesson 726 — Multi-Datacenter Replication
- Human intervention time
- If automated retries fail, a human might investigate and manually retry within a business day
- Lesson 1012 — Idempotency Key Expiration Strategy
- Human-readable
- means clear messages, logical ordering, and context that makes sense at a glance:
- Lesson 1166 — Human-Readable vs Machine-Parseable
- Human-Readable Message
- A clear explanation for developers debugging the issue (e.
- Lesson 1883 — Error Response Structure and Consistency
- Hybrid
- Azure Service Bus, which offers both
- Lesson 676 — Choosing Between Message Broker TechnologiesLesson 1397 — Bounded Staleness ConsistencyLesson 1658 — Fanout Strategy Selection Criteria
- Hybrid Approach
- Combine short TTLs with selective deny lists—only revoke explicitly when immediate termination is critical (security incidents), accepting brief validity windows for routine logouts.
- Lesson 948 — Token Revocation at ScaleLesson 969 — Sliding Window CounterLesson 1316 — The SRE Org Model: Embedding vs. ConsultingLesson 1357 — Semi-Synchronous ReplicationLesson 1480 — Hybrid Partitioning ApproachesLesson 1575 — Syntax Highlighting and Language DetectionLesson 1577 — Paste Editing and Version HistoryLesson 1640 — Celebrity Problem in Push Models (+5 more)
- hybrid approaches
- cache tokens in Redis with database backup, providing speed for cache hits and durability for misses.
- Lesson 1040 — Idempotency Token Storage StrategiesLesson 1667 — Real-Time vs Precomputed RankingLesson 1752 — Index Compression Techniques
- Hybrid approaches work
- Many systems use **strong consistency for critical writes** (payment processing) while accepting **eventual consistency for reads** (product reviews).
- Lesson 553 — Choosing Consistency Levels
- Hybrid Feed (Most Common)
- Lesson 1634 — Feed Scope: What Content to Show
- Hybrid Model
- recognizes that not all users are equal: some have millions of followers (celebrities), others are highly active, and some log in rarely.
- Lesson 1639 — Hybrid (Pull-Push) Feed ModelLesson 1644 — Feed Personalization and Ranking RequirementsLesson 1645 — What is Fanout in Social Media Systems
- Hybrid models
- exist where platform teams provide BFF frameworks and templates, but feature teams customize their instances independently.
- Lesson 906 — BFF Ownership and Team Structure
- Hybrid needs
- Consider using both (polyglot persistence)
- Lesson 319 — Decision Framework: Data Model First
- Hybrid patterns
- let you use sessions where they work best (user-facing web apps with browsers) and tokens where *they* excel (microservices, mobile apps, API access).
- Lesson 919 — Hybrid Session-Token Patterns
- Hybrid sharding
- means applying different sharding techniques together—either in layers or combined into a composite key—to balance multiple goals simultaneously.
- Lesson 250 — Hybrid Sharding Strategies
- Hybrid: Push + Pull
- Lesson 1802 — Synchronization Strategies for Local Caches
- HyperLogLog
- is Redis's probabilistic data structure that estimates cardinality (unique counts) with ~0.
- Lesson 359 — Redis for Leaderboards and Counting
- Hystrix
- (Netflix's library, now in maintenance mode) and **Resilience4j** (the modern alternative).
- Lesson 1075 — Implementing Bulkheads in Practice: Hystrix and Resilience4j
I
- I/O (Input/Output)
- Disk read/write speeds and network bandwidth also plateau.
- Lesson 46 — Hardware Limits of Vertical Scaling
- I/O load
- Reading entire datasets can impact production system performance
- Lesson 1421 — Full Backup Strategy
- ID token
- alongside OAuth2's access token.
- Lesson 928 — OpenID Connect (OIDC) OverviewLesson 929 — ID Tokens and the UserInfo Endpoint
- Idempotency
- means an operation can be applied multiple times without changing the result beyond the first application.
- Lesson 686 — Idempotency and DeduplicationLesson 998 — What is Idempotency?Lesson 1568 — Scheduled Cleanup Job DesignLesson 1605 — Distributed Worker ArchitectureLesson 1656 — Fanout Failure HandlingLesson 1710 — Why Exactly-Once Is Hard in Notifications
- Idempotency is critical
- Since both requests might complete, your operation must be idempotent (remember idempotency keys from earlier lessons)
- Lesson 1031 — Hedged Requests and Speculative Execution
- idempotency key
- is a unique identifier that a client generates and sends with each request.
- Lesson 1003 — Idempotency KeysLesson 1010 — Idempotency Keys for POST RequestsLesson 1711 — Idempotency Keys for NotificationsLesson 1884 — Idempotency in RESTful APIs
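A minimal sketch of the server side, assuming an in-memory dictionary as the key store and a toy `PaymentService`. Both are hypothetical; a real service would persist keys and expire them after a time window.

```python
import uuid


class PaymentService:
    """Toy server that replays the stored response when it sees a repeated key."""

    def __init__(self):
        self._responses = {}  # idempotency key -> first response
        self.charges = []     # side effects actually performed

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]  # retry: no second charge
        self.charges.append(amount_cents)
        response = {"charge_id": len(self.charges), "amount_cents": amount_cents}
        self._responses[idempotency_key] = response
        return response


def new_idempotency_key() -> str:
    """Client side: generate the key once per logical request and reuse it on every retry."""
    return str(uuid.uuid4())
```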
- idempotency keys
- , **server-side state tracking**, and **time windows** (concepts you've already learned) to transform non-idempotent operations into idempotent ones.
- Lesson 1006 — Natural Idempotency vs Engineered IdempotencyLesson 1009 — HTTP Methods and Natural IdempotencyLesson 1033 — Idempotency Keys in Payment SystemsLesson 1711 — Idempotency Keys for Notifications
- Idempotency time windows
- limit how long the server remembers an idempotency key.
- Lesson 1005 — Idempotency Time Windows
- idempotency token
- (or key) is a unique identifier the client generates and includes with each request.
- Lesson 1027 — Idempotency Tokens in Retry LogicLesson 1036 — Request Token Generation and Management
- idempotent
- task produces the same result when run multiple times with the same inputs.
- Lesson 777 — Workflow Orchestration PatternsLesson 1000 — Idempotent vs Non-Idempotent OperationsLesson 1008 — What Makes an API IdempotentLesson 1875 — HTTP Methods: GET, POST, PUT, DELETE Semantics
- Idempotent operations
- (reprocessing produces the same result)
- Lesson 680 — Exactly-Once DeliveryLesson 998 — What is Idempotency?Lesson 1037 — Idempotency in Distributed WorkflowsLesson 1441 — Runbooks and Automation
- Identical behavior
- means any load balancer can route requests randomly
- Lesson 57 — Scaling Stateless Services Horizontally
- Identifies best channels
- per segment (some users never open email but always read SMS)
- Lesson 1729 — Analytics-Driven Optimization
- Identify
- a slice of functionality to migrate first (often starting small)
- Lesson 822 — The Strangler Fig Pattern for Migration
- Identify candidates
- Media older than 6-12 months with minimal access
- Lesson 1623 — Cold Storage and Archival
- Identify High-Impact Targets
- Lesson 1312 — Measuring and Reducing Toil
- Identify noisy logs
- Which log statements fire constantly but provide little value during incidents?
- Lesson 1171 — Log Review and Alert Fatigue
- Identify the bottleneck
- Review your QPS estimate, storage growth, or latency targets.
- Lesson 35 — Iterate Based on Constraints
- Identify the latest version
- using timestamps or version vectors
- Lesson 559 — Strong Consistency with Quorums
- Identify your critical path
- (covered in Lesson 1082): determine which features are absolutely essential.
- Lesson 1083 — Graceful Degradation Strategies
- Identify yourself
- Set a proper User-Agent header so site owners know who's crawling and can contact you if needed.
- Lesson 1831 — Robots.txt and Crawl Etiquette
- Identity
- Who requested access (user ID, service account, API key)
- Lesson 944 — Auditing and Compliance for Authorization
- Identity & Ownership
- Lesson 1642 — Post Metadata and Schema Design
- Identity federation
- means delegating authentication to external identity providers (IdPs) while maintaining your own authorization layer.
- Lesson 932 — Multi-Tenant OAuth2 and Identity Federation
- Idle timeout
- How long a connection can sit unused in the pool before being closed.
- Lesson 272 — Connection Timeouts and Limits
- Idle users
- Optional middle tier—defer updates briefly, then pull
- Lesson 1676 — Presence Detection and User Status
- If
- Partition, choose Availability or Consistency; **Else** (no partition), choose Latency or Consistency.
- Lesson 516 — The 'Else' Clause: Normal Operation TradeoffsLesson 1015 — Conditional Writes for Idempotency
- If a coordinator crashes
- after sending PREPARE but before COMMIT, it reads its log on restart.
- Lesson 574 — Recovery Protocols and Logs
- If a participant crashes
- after voting YES, it reads its log, sees it's waiting for a decision, and contacts the coordinator (or other participants) to learn the outcome.
- Lesson 574 — Recovery Protocols and Logs
- If any fail
- transition back to **open** (still broken, reset timer)
- Lesson 1060 — Half-Open State Testing
- If duplicate
- , skip processing; **if new**, index and store the fingerprint
- Lesson 1852 — Content Fingerprinting with Hashing
- If everyone said "yes"
- Coordinator tells all participants to commit permanently
- Lesson 568 — Two-Phase Commit (2PC) Overview
- If it exists
- Skip storage entirely, just create a new metadata record pointing to the existing file
- Lesson 1622 — Deduplication Strategies
- If no
- Store the file once and record its hash and location
- Lesson 1591 — Deduplication Using Content Hashing
- If not cached
- (cache miss), the edge fetches it once from your origin, caches it, then serves it
- Lesson 192 — CDN for Static Asset Delivery
- If the bucket overflows
- (queue is full), new requests are rejected
- Lesson 965 — Leaky Bucket Algorithm
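The overflow rule can be sketched with a bounded queue that drains at a fixed rate. The class name, parameters, and injectable clock are assumptions made for illustration.

```python
import time
from collections import deque


class LeakyBucket:
    """Bounded queue that leaks (processes) requests at a constant rate."""

    def __init__(self, capacity: int, leak_rate_per_s: float, clock=time.monotonic):
        self.capacity = capacity
        self.leak_rate_per_s = leak_rate_per_s
        self.clock = clock
        self.queue = deque()
        self.last_leak = clock()

    def _leak(self) -> None:
        elapsed = self.clock() - self.last_leak
        leaked = int(elapsed * self.leak_rate_per_s)  # whole requests drained so far
        if leaked > 0:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()
            self.last_leak += leaked / self.leak_rate_per_s

    def offer(self, request) -> bool:
        """Queue the request, or return False if the bucket is already full."""
        self._leak()
        if len(self.queue) >= self.capacity:
            return False  # overflow: reject
        self.queue.append(request)
        return True
```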
- If they fail
- The service is still unhealthy, so the breaker returns to **Open** and waits longer before trying again
- Lesson 1047 — The Three States: Half-Open
- If they succeed
- The service appears healthy, so the breaker transitions back to **Closed**
- Lesson 1047 — The Three States: Half-Open
- If yes
- Don't store the file again—just create a new metadata record pointing to the existing storage location
- Lesson 1591 — Deduplication Using Content Hashing
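The check-then-store flow for content-hash deduplication can be sketched as below, assuming SHA-256 as the content hash and in-memory dictionaries standing in for blob storage and the metadata table. `DedupStore` is a hypothetical name.

```python
import hashlib


class DedupStore:
    """Stores each distinct file body once; every upload still gets its own metadata record."""

    def __init__(self):
        self.blobs = {}     # content hash -> file bytes
        self.metadata = []  # (filename, content hash), one entry per upload

    def upload(self, filename: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = data            # not seen before: store the file once
        self.metadata.append((filename, digest))  # always record a pointer to the blob
        return digest
```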
- Images
- (JPEG, PNG, SVG, WebP)
- Lesson 173 — Content Types Suited for CDNsLesson 1608 — Post-Processing and Metadata Extraction
- Immediate consistency
- The paste URL is instantly valid
- Lesson 1559 — Write Path: Synchronous vs Asynchronous Storage
- Immediate requeue
- Put the message back at the front of the queue
- Lesson 684 — Negative Acknowledgments and Redelivery
- Immediate response required
- The client needs data right now (e.
- Lesson 654 — When to Use Async vs Sync
- immediately
- returns success to the application
- Lesson 204 — Asynchronous Replication ExplainedLesson 1414 — RPO Zero: Synchronous Replication
- Immediately after
- , application deletes `user:123:profile` from cache
- Lesson 157 — Active Invalidation on Write
- Immutability
- Logs cannot be altered or deleted by users
- Lesson 944 — Auditing and Compliance for Authorization
- Impact Assessment
- Duration of outage, users affected, revenue lost, SLO/error budget burn
- Lesson 1350 — What is a Postmortem?
- Impact Metrics
- Quantifiable damage—error rate, affected users, revenue loss, MTTR.
- Lesson 1352 — Postmortem Structure and Action Items
- Implement fallback logic
- If push delivery fails (user offline, device unreachable), automatically retry or escalate to SMS.
- Lesson 1689 — Multi-Channel Delivery
- Implement scrubbing
- Use logging middleware or filters that detect and redact sensitive patterns before writing logs.
- Lesson 1163 — Avoid Logging Sensitive Data
- Implementation
- Store all short codes in lowercase in your database, then lowercase user input during redirects: `SELECT long_url FROM urls WHERE short_code = LOWER($input)`
- Lesson 1518 — Case Sensitivity ConsiderationsLesson 1534 — Rate Limiting for URL Creation
- Implementation approach
- Lesson 1565 — Expiration Requirements and TTL Basics
- Important but not critical
- Product recommendations, related items, reviews
- Lesson 1082 — Critical Path Identification
- Important requests
- (browsing, search): moderate throttling during spikes
- Lesson 995 — Graceful Degradation Through Throttling
- Improve iteratively
- learning from incidents through blameless postmortems
- Lesson 1307 — What is Site Reliability Engineering (SRE)?
- Improved Throughput
- With dedicated resources for each operation type, you can serve more total requests per second.
- Lesson 220 — Read-Write Splitting Fundamentals
- Improves user experience
- by catching limits early
- Lesson 1789 — Client-Side vs Server-Side Rate Limiting
- In transit
- means data moving across networks—from your database server to backup storage, or when retrieving backups for restoration.
- Lesson 1409 — Backup Encryption and Security
- In-app
- Rich notification object with action buttons and images
- Lesson 1692 — Channel-Specific Formatting
- in-memory
- (RAM) or **on-disk** (persistent storage).
- Lesson 340 — In-Memory vs Persistent Key-Value Stores; Lesson 358 — Redis for Rate Limiting; Lesson 743 — Batch Processing Frameworks; Lesson 768 — Apache Spark Overview; Lesson 910 — Session Storage Options; Lesson 976 — Rate Limiting State Storage; Lesson 1242 — Zipkin Architecture and Design; Lesson 1807 — In-Memory vs Persistent Storage for Rate Limiting
- In-memory buffer
- – New documents first go into a small, fast in-memory index structure.
- Lesson 1754 — Real-Time Indexing and Updates
- in-memory caches
- for high-volume, low-value operations where brief inconsistency is acceptable.
- Lesson 1040 — Idempotency Token Storage Strategies; Lesson 1712 — Deduplication Windows and Storage
- In-Memory Caches (e.g., Redis)
- Lesson 1040 — Idempotency Token Storage Strategies
- In-memory stores
- like Redis keep all data in RAM.
- Lesson 340 — In-Memory vs Persistent Key-Value Stores
- In-Sync Replica (ISR)
- set is Kafka's list of replicas—including the leader—that are fully caught up with the latest messages.
- Lesson 707 — In-Sync Replicas (ISR)
- Inability to Scale Horizontally
- Lesson 77 — Why Load Balancers Are Necessary
- Inbound policies
- handle authentication, rate limiting, and request transformation before reaching backends
- Lesson 899 — Azure API Management Features
- Incident Command System
- is a structured framework borrowed from emergency response that assigns specific roles to coordinate your response effectively.
- Lesson 1300 — Incident Command System (ICS)
- Incident commander
- or executive (for severe outages)
- Lesson 1292 — Alert Routing and Escalation; Lesson 1301 — War Rooms and Communication Channels
- Incident Commander (IC)
- The single decision-maker who owns the incident.
- Lesson 1300 — Incident Command System (ICS)
- Include in calculations (stricter)
- Lesson 1328 — Scheduled Maintenance and Availability Accounting
- Include Units Explicitly
- Lesson 1209 — Metric Naming Conventions
- Inconsistency
- Authorization decisions vary across services
- Lesson 941 — Policy Decision Points (PDP) and Enforcement Points (PEP); Lesson 1377 — What Are Replication Conflicts?
- Inconsistency Risk
- Even with careful engineering, subtle differences in floating-point math, aggregation order, or time zone handling can make the two layers produce slightly different results.
- Lesson 751 — Lambda Architecture Tradeoffs
- Increase costs
- dramatically (you're storing and querying far more data)
- Lesson 1258 — Cardinality Explosion
- Increase MTBF
- (make failures rarer) — better hardware, redundancy, graceful degradation
- Lesson 1325 — Availability Formula: MTBF and MTTR Relationship
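The lesson title points at the MTBF/MTTR relationship. A minimal sketch, assuming the common steady-state formula availability = MTBF / (MTBF + MTTR); the function name and the sample figures are illustrative, not taken from the course:

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up, given mean time between
    failures (MTBF) and mean time to repair (MTTR), both in hours."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical numbers: doubling MTBF while MTTR stays at one hour.
baseline = availability(mtbf_hours=1000, mttr_hours=1)
rarer_failures = availability(mtbf_hours=2000, mttr_hours=1)
print(f"{baseline:.5f} -> {rarer_failures:.5f}")
```

Under this formula, making failures rarer and making repairs faster both raise the ratio, so either lever improves availability.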
- Increased complexity
- that makes the system harder to maintain
- Lesson 36 — YAGNI: You Aren't Gonna Need It
- Increased latency
- All requests wait for slow database responses
- Lesson 159 — Cache Stampede Problem; Lesson 336 — NewSQL Tradeoffs
- Increased operational complexity
- requiring more expertise
- Lesson 1413 — The Cost-Availability Tradeoff
- Increased propagation delay
- – Changes must traverse multiple hops; a 3-level tree means 3 sequential replication steps
- Lesson 1374 — Tree Replication Topology
- Incremental backfill
- Process data in small, controlled batches
- Lesson 777 — Workflow Orchestration Patterns
- incremental backup
- captures only the data that has changed since the *last backup* — whether that was a full or another incremental.
- Lesson 1403 — Incremental Backups; Lesson 1422 — Incremental Backup Strategy; Lesson 1424 — Backup Scheduling and Frequency
- Incremental suffix
- Append an increasing counter: try `hash(url)`, then `hash(url + "1")`, `hash(url + "2")`, etc.
- Lesson 1509 — Handling Hash Collisions
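The retry sequence `hash(url)`, `hash(url + "1")`, `hash(url + "2")` can be sketched as follows. This is an illustration only: the SHA-256 digest, the base-62 alphabet, the 7-character length, and the in-memory `taken` mapping (standing in for a database lookup) are all assumptions.

```python
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def short_code(text: str, length: int = 7) -> str:
    """Derive a fixed-length base-62 code from a SHA-256 digest of text."""
    digest = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")
    chars = []
    for _ in range(length):
        digest, index = divmod(digest, 62)
        chars.append(ALPHABET[index])
    return "".join(chars)


def assign_code(url: str, taken: dict, max_attempts: int = 100) -> str:
    """Try hash(url), then hash(url + "1"), hash(url + "2"), ... until a
    code is free or already belongs to this same URL."""
    for attempt in range(max_attempts):
        candidate = url if attempt == 0 else f"{url}{attempt}"
        code = short_code(candidate)
        owner = taken.get(code)
        if owner is None:
            taken[code] = url  # claim the free code
            return code
        if owner == url:
            return code  # same URL was shortened earlier
    raise RuntimeError("no free short code found within max_attempts")
```

In a real service the check-and-claim step would need to be atomic (for example a unique constraint on the code column), which this single-process sketch does not model.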
- Incremental updates
- Build delta indexes for new documents, merge periodically
- Lesson 1746 — Index Construction at Scale; Lesson 1772 — Real-Time Index Updates
- Incrementing counters
- `UPDATE account SET balance = balance + 100` — repeating adds $100 each time
- Lesson 1006 — Natural Idempotency vs Engineered Idempotency
- Independence
- Subscribers consume at their own pace without affecting others
- Lesson 656 — Pub-Sub Pattern Fundamentals
- independent deployability
- promise of microservices.
- Lesson 789 — The Distributed Monolith Anti-Pattern; Lesson 794 — Team Autonomy and Ownership; Lesson 821 — When to Transition from Monolith to Microservices; Lesson 822 — The Strangler Fig Pattern for Migration
- Independent Deployment
- You can deploy Service B's new version without touching Service A.
- Lesson 648 — Decoupling Through Messaging; Lesson 781 — What are Microservices?; Lesson 796 — Faster Development Cycles
- Independent Development
- Teams can build, test, and modify their services without coordinating schedules.
- Lesson 648 — Decoupling Through Messaging
- Independent evolution
- Teams can modify their services without organizational bottlenecks
- Lesson 788 — Organizational Alignment: Conway's Law; Lesson 904 — BFF vs Single Gateway Tradeoffs
- Independent scaling
- Configure more connections for reads if needed
- Lesson 221 — Application-Level Connection Management; Lesson 648 — Decoupling Through Messaging; Lesson 650 — Temporal Decoupling; Lesson 663 — Hybrid Patterns: Topic + Queue; Lesson 697 — Push vs Pull Consumption Models; Lesson 730 — Apache Pulsar Architecture; Lesson 821 — When to Transition from Monolith to Microservices
- Independent Testing
- Lesson 796 — Faster Development Cycles
- Independent timeline caches
- for faster local reads
- Lesson 1682 — Scaling to Billions of Daily Active Users
- independently
- based on their actual demand:
- Lesson 795 — Independent Scaling; Lesson 1568 — Scheduled Cleanup Job Design
- index
- creates a sorted lookup structure that points directly to the data, letting the database jump straight to relevant rows.
- Lesson 307 — Indexes and Query Optimization; Lesson 623 — Log Structure and Entries; Lesson 628 — Election Restriction: Up-to-Date Check
- Index alignment matters
- Combine filter columns and sort columns in composite indexes: `INDEX(genre, published_date, id)` supports filtering by genre, sorting by date, and stable ordering by ID.
- Lesson 1896 — Combining Pagination, Filtering, and Sorting
- Index efficiently
- Log management systems can index each field separately
- Lesson 1137 — What is Structured Logging
- Index Merge
- Use separate indexes on `user_id` and `status`, fetch matching rows from each, then intersect the results.
- Lesson 280 — Index Merge and Multi-Column Indexes
- Index Node Cache
- Even at the index level, individual shard results for common terms can be cached to avoid disk I/O.
- Lesson 1771 — Query Caching Strategies
- Index replication
- creates copies of each index shard on different physical servers.
- Lesson 1770 — Index Replication for Availability
- Index Size Metrics
- track the growth of your inverted index and posting lists.
- Lesson 1777 — Query Performance Monitoring
- Index structure
- What indexes already exist or can be added efficiently?
- Lesson 1895 — Default Sorting and Index Alignment
- Index-free adjacency
- eliminates this overhead by storing direct references (pointers) from each node to its connected neighbors, similar to how linked lists work in programming.
- Lesson 455 — Index-Free Adjacency; Lesson 458 — Use Cases: Fraud Detection and Knowledge Graphs; Lesson 460 — Neo4j Architecture Overview; Lesson 464 — Traversal Queries: Friends of Friends; Lesson 471 — Why Graphs: Relationship-Heavy Data Models; Lesson 472 — Social Networks and Friend-of-Friend Queries; Lesson 473 — Recommendation Engines with Graph Traversal; Lesson 476 — Graph Query Performance Characteristics (+1 more)
- Index-friendly
- Leverages existing indexes on the ordering column(s), making queries efficient even with millions of rows
- Lesson 1890 — Keyset Pagination
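A minimal keyset pagination sketch using Python's built-in `sqlite3`. The `posts` table, its columns, and the composite index are hypothetical; the point is that each page resumes from the last `(created_at, id)` pair seen rather than skipping rows with an offset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)")
# Composite index on the ordering columns so each page is an index range scan.
conn.execute("CREATE INDEX idx_posts_created_id ON posts (created_at, id)")
conn.executemany(
    "INSERT INTO posts (id, created_at) VALUES (?, ?)",
    [(i, f"2024-01-{(i % 5) + 1:02d}") for i in range(1, 11)],
)


def fetch_page(cursor=None, page_size=4):
    """Return (rows, next_cursor); cursor is the last (created_at, id) seen."""
    if cursor is None:
        rows = conn.execute(
            "SELECT created_at, id FROM posts ORDER BY created_at, id LIMIT ?",
            (page_size,),
        ).fetchall()
    else:
        last_created, last_id = cursor
        rows = conn.execute(
            "SELECT created_at, id FROM posts "
            "WHERE created_at > ? OR (created_at = ? AND id > ?) "
            "ORDER BY created_at, id LIMIT ?",
            (last_created, last_created, last_id, page_size),
        ).fetchall()
    # A short page means there is nothing left to fetch.
    next_cursor = rows[-1] if len(rows) == page_size else None
    return rows, next_cursor
```

Including `id` as a tie-breaker keeps the ordering stable when several rows share the same `created_at` value.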
- Indexed queries
- Ensure `expires_at` is indexed for fast lookups
- Lesson 1568 — Scheduled Cleanup Job Design
- Indexers
- continuously process crawled documents, perform tokenization and text analysis, build inverted indexes with posting lists, and push completed index segments to storage.
- Lesson 1742 — Search System Architecture Overview
- indexing
- to optimize query speed.
- Lesson 762 — Query Performance Tradeoffs; Lesson 1730 — What is a Search Engine?
- Indexing Speed vs Completeness
- Indexing every field makes queries fast but slows ingestion and bloats storage (indexes consume 30-50% of raw log size).
- Lesson 1159 — Log Aggregation Performance Considerations
- Individual metrics dashboards
- showing query latency, storage usage, and request rates
- Lesson 1492 — Operational Complexity of Partitioning
- Inference
- If "Alice mentors Bob" and "Bob mentors Carol," you can infer transitive knowledge relationships
- Lesson 475 — Knowledge Graphs and Semantic Networks
- Inflexibility
- Schema changes require ETL pipeline updates
- Lesson 759 — Schema-on-Write vs Schema-on-Read
- Influence analysis
- Measuring reach in social networks
- Lesson 464 — Traversal Queries: Friends of Friends
- InfluxDB
- offers strong write performance for IoT-scale ingestion.
- Lesson 1208 — Choosing a Metrics System for Your Scale
- INFO
- Normal operational events (user logged in, job completed)
- Lesson 1141 — Log Levels in Structured Logs
- Info (P4/P5)
- Informational alerts document noteworthy events for awareness and investigation, but require no immediate action.
- Lesson 1291 — Alert Severity Levels
- Information about the file
- (metadata) – owner, size, upload date, permissions – structured data perfect for relational databases
- Lesson 1590 — Metadata Database Design
- Infrastructure Burden
- Two separate systems mean double the monitoring, alerting, debugging tools, and operational expertise.
- Lesson 751 — Lambda Architecture Tradeoffs
- Infrastructure costs
- More disks, more cloud storage fees
- Lesson 409 — Data Size and Storage Considerations
- Infrastructure management
- More load balancers, more network policies, more databases, more certificates to rotate
- Lesson 803 — Operational Overhead
- Ingestion bottlenecks
- (metrics backend can't keep up)
- Lesson 1192 — Cardinality and Label Explosion; Lesson 1207 — Metrics Cardinality and Performance Impact
- Ingestion servers
- receive the raw stream from the broadcaster (typically using RTMP or WebRTC protocols).
- Lesson 1630 — Live Streaming Architecture
- Ingests streaming data
- from message queues or event streams (Kafka, Kinesis)
- Lesson 749 — Lambda Architecture: Speed Layer
- Ingress (incoming)
- Data arriving when users request shortened URLs
- Lesson 1499 — Bandwidth Requirements for Redirects
- Ingress-Only Mesh
- Apply mesh features only at the cluster boundary—your ingress gateway.
- Lesson 869 — Alternatives to Full Service Mesh
- Initial scoping
- – "We have ~5 million DAU" is clearer than 4,847,293
- Lesson 32 — Rounding and Approximation Techniques
- Initiate the transaction
- The coordinator receives the transaction request from a client or application.
- Lesson 569 — The Coordinator Role in 2PC
- Initiation
- Client requests an upload ID from the server
- Lesson 1586 — Multipart Upload for Large Files
- Inject artificial delays
- at various points: add sleep statements in test environments, use proxy tools like Toxiproxy to introduce network latency, or leverage service mesh features to inject faults.
- Lesson 1125 — Timeout Testing and Chaos Engineering
- Inject synthetic failures
- Temporarily degrade a non-production service to cross an SLO threshold
- Lesson 1295 — Testing Alerts and Dry Runs
- Injecting latency
- adds artificial delays to network calls or internal operations.
- Lesson 1347 — Common Chaos Experiments
- Inline
- simplifies application logic but the cache becomes a critical component in the data path (potential bottleneck or failure point).
- Lesson 142 — Look-Aside vs Inline Cache Topologies
- Input/output data
- from each step for idempotency checks
- Lesson 597 — Saga State Management and Persistence
- INSERT IF NOT EXISTS
- Prevents duplicate record creation.
- Lesson 1015 — Conditional Writes for Idempotency
- Install on lagging followers
- If a follower is too far behind, the leader sends the snapshot via `InstallSnapshot RPC` instead of individual log entries
- Lesson 632 — Log Compaction: Snapshotting
- Instance termination
- Kill a primary server and verify the passive takes over
- Lesson 1342 — Testing Redundancy with Fault Injection
- Instant Cache Hits
- Popular URLs are ready immediately after cache restarts
- Lesson 1529 — Preloading Hot URLs into Cache
- Instant distribution
- No coordination needed—every server generates independently
- Lesson 1516 — Counter-Based vs UUID Approaches
- Instant Purge (Hard Invalidation)
- Lesson 1617 — Cache Invalidation and Purging
- Instant revocation
- Delete the session record, user is logged out immediately
- Lesson 916 — Session vs Token Tradeoffs
- Instantaneous effect
- Each operation takes effect at a single, atomic point in time during its execution
- Lesson 523 — Linearizability Defined
- instantly
- from temporary issues
- Lesson 1021 — Immediate Retry vs Delayed Retry; Lesson 1051 — Fast-Fail Behavior; Lesson 1659 — Timeline Storage Requirements
- Int32/Int64
- Explicit integer types (JSON only has generic "number")
- Lesson 390 — BSON Format and Data Types
- Integration testing
- requires multiple services running simultaneously.
- Lesson 806 — Testing Complexity
- Integration with notification processing
- Lesson 1728 — Opt-Out and Compliance Tracking
- Intelligent routing
- based on time, severity, and team schedules
- Lesson 1305 — On-Call Tooling and Automation
- Interaction data (click tracking)
- Lesson 1779 — Search Analytics and Click Tracking
- Interchangeable servers
- Any server instance can handle any request because there's no "sticky" data tied to a specific server
- Lesson 55 — What Makes a Service Stateless
- Intermediate node failure risk
- – If a middle node fails, its entire subtree loses updates until recovery
- Lesson 1374 — Tree Replication Topology
- Internal Load Balancer
- Private load balancing within VPCs
- Lesson 114 — Cloud Load Balancers (GCP and Azure)
- Internal responder channel
- Technical details, debugging output, hypotheses
- Lesson 1301 — War Rooms and Communication Channels
- Intersection logic
- that combines multiple active filters using AND/OR operations
- Lesson 1775 — Faceted Search and Filters
- Invalid
- Gateway returns `401 Unauthorized` immediately—request never reaches backend
- Lesson 883 — Authentication at the Gateway
- Invalidate
- the cache entry, forcing next read to rebuild (simpler, slightly slower)
- Lesson 1664 — Timeline Caching Strategies; Lesson 1722 — Real-Time Preference Updates
- Invalidate the application cache
- (delete the key from Redis/Memcached)
- Lesson 163 — Multi-Level Cache Invalidation
- Invalidate the CDN cache
- (purge or mark the edge content stale)
- Lesson 163 — Multi-Level Cache Invalidation
- Invalidate the database cache
- (clear query result buffers)
- Lesson 163 — Multi-Level Cache Invalidation
- Invalidation
- When data changes, mark cached copies as invalid (delete them).
- Lesson 128 — Cache Coherence Across Layers
- Invalidation strategy
- When permissions change, proactively invalidate affected cache entries rather than waiting for TTL expiration.
- Lesson 951 — Caching Authorization Decisions
- Inventory management
- Stock levels that must stay accurate
- Lesson 322 — Transaction Requirements and Trade-offs
- inventory service
- reserves items (written in Java)
- Lesson 812 — Developer Cognitive Load; Lesson 817 — Identifying Service Boundaries by Data Ownership
- Inventory updates
- Decrementing stock twice causes incorrect counts
- Lesson 1001 — Side Effects and Idempotency
- Inverse Document Frequency (IDF)
- How rare is this term across all documents?
- Lesson 1740 — TF-IDF Scoring Fundamentals
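To make "how rare is this term" concrete, here is a sketch of one common smoothed IDF variant. The exact formula differs between search engines, so treat the smoothing constants and the function name as assumptions.

```python
import math


def inverse_document_frequency(term, documents):
    """Smoothed IDF: terms found in fewer documents score higher.

    documents is a list of token collections (sets here); the +1 terms
    avoid division by zero for terms that appear in no document.
    """
    containing = sum(1 for doc in documents if term in doc)
    return math.log((1 + len(documents)) / (1 + containing)) + 1.0


docs = [
    {"the", "cache", "is", "warm"},
    {"the", "index", "is", "sharded"},
    {"the", "raft", "leader"},
]
common = inverse_document_frequency("the", docs)   # appears in every document
rare = inverse_document_frequency("raft", docs)    # appears in one document
```

A term present in every document gets the minimum score, while a term present in a single document scores higher, which is the weighting TF-IDF relies on.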
- inverted index
- is essentially a giant lookup table that flips the traditional document-to-words relationship upside down.
- Lesson 1735 — Inverted Index Structure; Lesson 1740 — TF-IDF Scoring Fundamentals; Lesson 1743 — What Is an Inverted Index
- Inverted indexes
- It creates lookup tables mapping terms to document locations (like a book's index)
- Lesson 1150 — The ELK Stack: Elasticsearch
- IP address
- Routes based on the client's IP (similar to IP Hash algorithm we covered earlier)
- Lesson 94 — Session Affinity (Sticky Sessions); Lesson 1783 — Functional Requirements for Rate Limiter
- IP address rotation
- distributes requests across multiple source IPs, making your crawler appear as many independent clients and spreading the load.
- Lesson 1860 — IP Address Rotation and Geolocation
- IP addresses
- potentially thousands or more
- Lesson 1178 — Metric Cardinality and Labels; Lesson 1211 — Avoiding High-Cardinality Labels
- IP Hash
- can create imbalances if client IPs aren't evenly distributed.
- Lesson 96 — Algorithm Selection Tradeoffs
- IP Hash algorithm
- is a load balancing strategy that uses the client's IP address to determine which backend server handles their request.
- Lesson 89 — IP Hash Algorithm; Lesson 90 — Consistent Hashing for Load Balancing
- Isolate failures
- Don't let one component's failure cascade to others
- Lesson 1336 — Graceful Degradation
- Isolated data stores
- Each service owns its data (no shared databases)
- Lesson 791 — Independent Deployability
- Isolated failures
- Problems don't cascade to healthy shards
- Lesson 266 — Shard Failure and Partial Outages
- Isolation
- keeps concurrent transactions from interfering with each other.
- Lesson 309 — ACID Properties Overview; Lesson 312 — Isolation Levels and Concurrent Transactions; Lesson 470 — Transaction Model and ACID in Neo4j; Lesson 708 — Consumer Groups and Parallel Consumption; Lesson 1330 — What is Fault Tolerance?; Lesson 1489 — Cross-Partition Transactions; Lesson 1817 — Multi-Tenant Rate Limiting Architecture; Lesson 1822 — Scaling Rate Limiter Horizontally (+1 more)
- Isolation level implementations
- (what "serializable" actually means varies)
- Lesson 582 — Transaction Isolation Across Systems
- Iterate Based on Constraints
- when the requirements actually change.
- Lesson 36 — YAGNI: You Aren't Gonna Need It
J
- Jaeger
- and **Zipkin** offer full control, zero licensing costs, and community support.
- Lesson 1251 — Choosing a Tracing System
- Jaeger Agent
- acts as a local daemon, deployed either once per host or as a sidecar next to each application instance.
- Lesson 1241 — Jaeger Architecture and Components
- Jaeger Collector
- receives traces from multiple agents, validates them, runs processing pipelines (enrichment, indexing), and writes them to persistent storage.
- Lesson 1241 — Jaeger Architecture and Components
- Jaeger Query Service
- provides the API and UI for retrieving and visualizing traces.
- Lesson 1241 — Jaeger Architecture and Components
- Jaeger wins on scalability
- its separated components (agent, collector, query) provide better horizontal scaling and fault isolation for large-scale systems.
- Lesson 1242 — Zipkin Architecture and Design
- Java Message Service (JMS)
- specification—a Java API standard for messaging middleware.
- Lesson 671 — ActiveMQ and Traditional Enterprise Messaging
- JavaScript files
- Lesson 173 — Content Types Suited for CDNs
- Jitter
- is controlled randomization added to retry delays.
- Lesson 1024 — Adding Jitter to Prevent Thundering Herd; Lesson 1122 — Timeout Jitter to Prevent Thundering Herds; Lesson 1695 — Fallback and Retry Logic; Lesson 1715 — Retry Strategies for Failed Deliveries
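One widely used way to add this randomization is "full jitter" on top of exponential backoff. The sketch below assumes that variant; the parameter names and default values are illustrative.

```python
import random


def backoff_with_full_jitter(attempt, base_delay=0.1, max_delay=10.0, rng=random):
    """Return a retry delay in seconds for the given zero-based attempt.

    The exponential ceiling doubles per attempt and is capped at max_delay;
    the actual delay is drawn uniformly from [0, ceiling] so that clients
    failing at the same moment do not all retry at the same moment.
    """
    ceiling = min(max_delay, base_delay * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


# Example: delays a client might wait before retries 1 through 5.
delays = [backoff_with_full_jitter(attempt) for attempt in range(5)]
```

Passing a seeded `random.Random` instance as `rng` makes the delays reproducible in tests.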
- Jittered reconnection delays
- so clients don't all retry at once
- Lesson 1081 — Thundering Herd After Recovery
- Job distribution
- happens automatically: each worker polls the queue and claims the next available job.
- Lesson 1605 — Distributed Worker Architecture
- JobManager
- acts as the conductor, coordinating the entire data flow application.
- Lesson 770 — Apache Flink Architecture
- Join operations
- Combining preferences with user profiles or subscription tiers.
- Lesson 1721 — Preference Storage Strategy
- JSON:API filter specification
- More flexible JSON structures passed as query values or request bodies.
- Lesson 1893 — Complex Filtering with Query Languages
- JWT claims
- itself (for stateless verification)
- Lesson 934 — RBAC Implementation Patterns; Lesson 990 — Tiered Rate Limits for Different User Classes
K
- Kafka
- requires you to provision, configure, monitor, and scale broker clusters yourself.
- Lesson 729 — Kinesis vs Kafka Tradeoffs; Lesson 735 — Choosing a Streaming Platform; Lesson 1675 — Pub-Sub for Real-Time Distribution
- Kafka is self-managed infrastructure
- (even with a managed offering such as Amazon MSK, you typically still size and configure clusters), while **Kinesis is a fully-managed AWS service**.
- Lesson 729 — Kinesis vs Kafka Tradeoffs
- Kafka Streams
- is a lightweight library that runs within your application (no separate cluster needed).
- Lesson 744 — Stream Processing Frameworks
- Keep cold content centralized
- Store rarely-accessed media in fewer regions to minimize costs
- Lesson 1631 — Multi-Region Replication Strategy
- Keep it lightweight
- Push complex logic to services when possible
- Lesson 877 — The API Gateway Bottleneck Risk
- Keeps hot URLs cached
- Frequently accessed links naturally get touched recently, staying in cache
- Lesson 1525 — Cache Eviction Policy for URL Shortener
- Key challenges
- Lesson 258 — Resharding and Data Migration
- Key characteristics
- Lesson 672 — Redis as a Lightweight Message Broker
- Key construction
- Create a key combining the user ID and current time window (e.
- Lesson 1794 — Redis-Based Rate Limiting with INCR
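A sketch of the key-construction step for a fixed-window limiter. The `rate:{user}:{window}` key format is an assumption, and `InMemoryCounter` is a hypothetical stand-in for a Redis-like store whose `incr` mimics the increment-and-return behavior of `INCR`.

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed window length


def window_key(user_id, now=None, window_seconds=WINDOW_SECONDS):
    """Combine the user ID with the start of the current time window."""
    now = time.time() if now is None else now
    window_start = int(now // window_seconds) * window_seconds
    return f"rate:{user_id}:{window_start}"


class InMemoryCounter:
    """Hypothetical stand-in for the shared counter store."""

    def __init__(self):
        self._counts = defaultdict(int)

    def incr(self, key):
        self._counts[key] += 1
        return self._counts[key]


def allow_request(store, user_id, limit, now=None):
    """Count the request in its window and allow it while under the limit."""
    return store.incr(window_key(user_id, now)) <= limit
```

Against a real Redis instance the key would also be given a TTL slightly longer than the window so that old counters expire on their own.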
- Key defensive layers
- Lesson 189 — DDoS Protection and Security at CDN Edge
- Key difference
- HLS is Apple-native and simpler; DASH is codec-agnostic and more flexible.
- Lesson 1625 — Adaptive Bitrate Streaming
- Key elements
- Lesson 1353 — Learning from Postmortems at Scale
- Key features
- Lesson 776 — Change Data Capture Tools
- Key Prefixes
- Lesson 1821 — Tenant Isolation in Redis
- Keyframe Extraction Strategy
- Lesson 1603 — Thumbnail and Preview Generation
- Keys
- must be unique strings (like `"session:abc123"` or `"cart:user_456"`)
- Lesson 338 — What is a Key-Value Store?; Lesson 362 — Consistent Hashing for Key-Value Stores; Lesson 1459 — Clockwise Key Assignment Rule
- Keyword/Pattern Detection
- Flag common spam phrases, known malware signatures, or phishing patterns
- Lesson 1581 — Abuse Prevention and Content Moderation
- Kill switches are mandatory
- Lesson 1346 — Blast Radius and Safety Controls
- Kinesis
- abstracts infrastructure entirely.
- Lesson 729 — Kinesis vs Kafka Tradeoffs; Lesson 735 — Choosing a Streaming Platform
- Kinesis Data Streams
- that act like Kafka topics—ordered sequences of data records that multiple applications can read independently.
- Lesson 728 — AWS Kinesis Overview
- Knowledge gaps
- the expert who wrote the procedure is unavailable
- Lesson 1441 — Runbooks and Automation
- KRaft
- (Kafka Raft) is the newer approach that eliminates ZooKeeper by implementing consensus directly within Kafka itself, using a Raft-based protocol among controller brokers.
- Lesson 704 — Brokers and Cluster Architecture; Lesson 715 — ZooKeeper vs KRaft Mode
- KRaft mode
- Lesson 715 — ZooKeeper vs KRaft Mode
- Kubernetes-native
- chaos engineering platform built on custom resources and operators.
- Lesson 1348 — Chaos Engineering Tools
L
- L1 cache
- (~32-64 KB): Fastest, closest to CPU cores, accessed in ~1 nanosecond
- Lesson 127 — CPU and Disk Caching Layers
- L2 cache
- (~256 KB-1 MB): Slightly slower, often per-core or shared between pairs
- Lesson 127 — CPU and Disk Caching Layers
- L3 cache
- (~8-32 MB): Slowest CPU cache, shared across all cores, still several times faster than RAM
- Lesson 127 — CPU and Disk Caching Layers
- Label
- ground truth relevance (explicit ratings or implicit signals like clicks, engagement)
- Lesson 1756 — Machine Learning for Ranking (Learning to Rank)
- Labels
- Categories like `Person`, `Product`, or `City`
- Lesson 452 — Graph Model: Nodes and Edges; Lesson 1153 — Alternative: The Grafana Loki Approach; Lesson 1225 — Span Attributes and Tags
- Lag-aware load balancing
- means your load balancer (or routing layer) actively monitors each replica's current lag and makes intelligent decisions about where to send read queries.
- Lesson 218 — Lag-Aware Load Balancing
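A minimal sketch of that routing decision, assuming lag is already measured per replica in seconds. The replica names, the threshold, and the fallback to the primary are illustrative choices, not prescribed by the course.

```python
import random


def pick_replica(lag_by_replica, max_lag_seconds, primary="primary", rng=random):
    """Choose a read target among replicas whose lag is within the threshold.

    Falls back to the primary when every replica is too far behind, trading
    extra primary load for fresher reads.
    """
    fresh = [name for name, lag in lag_by_replica.items() if lag <= max_lag_seconds]
    if not fresh:
        return primary
    return rng.choice(fresh)


# Hypothetical lag measurements, in seconds.
current_lag = {"replica-1": 0.2, "replica-2": 5.0, "replica-3": 0.9}
target = pick_replica(current_lag, max_lag_seconds=1.0)
```

Choosing uniformly among the fresh replicas is the simplest policy; weighting by lag or by open connections would be a natural refinement.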
- Lake wins for
- Lesson 762 — Query Performance Tradeoffs
- Lakes
- Highly compressed files (Parquet with Snappy/Gzip) minimize storage bills.
- Lesson 763 — Cost and Storage Efficiency
- Language freedom
- Teams choose the best tool for their domain without networking constraints
- Lesson 833 — Polyglot Microservices Support
- Language lock-in
- Libraries require per-language implementations; service meshes work with any language
- Lesson 830 — Service Mesh vs Library-Based Solutions
- Language-Specific Tokenization
- Different languages break words differently.
- Lesson 1768 — Typeahead for Multi-Language Support
- Large scale (10,000 req/sec)
- 180ms × 10,000 = 1,800 seconds (30 minutes!
- Lesson 276 — Why Query Optimization Matters at Scale
- Large videos
- (~hundreds of MB or GB): Asynchronous processing is essential.
- Lesson 1598 — Synchronous vs Asynchronous Processing
- Large-scale analytics workloads
- If you need both operational random access AND analytical batch processing on the same data, HBase/BigTable bridge that gap.
- Lesson 442 — When to Use HBase or BigTable
- Large-Scale ETL Jobs
- Lesson 738 — Batch Processing Use Cases
- Larger payload
- JWT can be hundreds of bytes vs small session ID
- Lesson 916 — Session vs Token Tradeoffs
- Last fetch time
- Timestamp of the most recent successful request
- Lesson 1848 — Politeness Table and Per-Host State
- last-write-wins
- for disposable or easily-regenerated data where simplicity matters.
- Lesson 368 — Conflict Resolution Strategies; Lesson 370 — Distributed Key-Value Store Architectures in Practice; Lesson 1389 — Conflict Resolution in Practice
- Last-Write-Wins (LWW)
- is the simplest: each write carries a timestamp, and during conflict detection, the system keeps the write with the most recent timestamp and discards the others.
- Lesson 1380 — Last-Write-Wins (LWW) Strategy
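The rule can be sketched in a few lines. The `Write` record and the tie-break on `replica_id` for equal timestamps are assumptions added for illustration; the definition above only specifies "most recent timestamp wins".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: str
    timestamp: float   # wall-clock time attached by the writer
    replica_id: str    # hypothetical field, used only to break exact ties


def resolve_lww(conflicting_writes):
    """Keep the write with the latest timestamp and discard the others."""
    if not conflicting_writes:
        raise ValueError("need at least one write to resolve")
    return max(conflicting_writes, key=lambda w: (w.timestamp, w.replica_id))


winner = resolve_lww([
    Write("blue", timestamp=100.0, replica_id="a"),
    Write("green", timestamp=101.5, replica_id="b"),
])
```

Because the discarded writes are dropped silently, the outcome depends on how well the writers' clocks agree, which is why this strategy suits data that is cheap to regenerate.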
- Latency
- is the time it takes for a single request to complete — from when a user clicks "submit" to when they see a response.
- Lesson 12 — Performance Requirements: Latency and Throughput; Lesson 22 — Throughput vs Latency; Lesson 120 — Caching Hierarchy Overview; Lesson 509 — Latency: The Hidden Cost of CAP; Lesson 515 — PACELC Framework Explained; Lesson 516 — The 'Else' Clause: Normal Operation Tradeoffs; Lesson 517 — PA/EL Systems: Availability and Latency First; Lesson 518 — PC/EC Systems: Consistency Always (+18 more)
- Latency delays
- Add 5 seconds to 10% of requests to a specific service
- Lesson 858 — Fault Injection for Testing
- Latency is invisible
- CAP treats availability as binary—either you respond or you don't—but says nothing about *how fast* you respond.
- Lesson 492 — Limitations of CAP as a Framework
- Latency Overhead
- Distributed transactions require coordination across nodes.
- Lesson 336 — NewSQL Tradeoffs; Lesson 865 — Performance Overhead: Latency and Throughput; Lesson 916 — Session vs Token Tradeoffs
- Latency patterns
- Sample more slow requests to investigate performance degradation
- Lesson 1255 — Adaptive Sampling
- Latency reduction
- The SSL/TLS handshake requires multiple round-trips between client and server.
- Lesson 187 — SSL/TLS Termination at the Edge; Lesson 1539 — QR Code Generation; Lesson 1609 — Why CDNs Are Essential for Media Hosting; Lesson 1616 — Geographic Routing and DNS; Lesson 1626 — Geolocation-Based Storage
- Latency requirements
- How quickly must insights or results be available?
- Lesson 746 — Choosing Batch vs Stream
- Latency SLO
- measures how fast requests complete.
- Lesson 1278 — Multiple SLOs for Comprehensive Coverage
- Latency spike
- The celebrity's post takes much longer to complete than a normal user's
- Lesson 1640 — Celebrity Problem in Push Models
- Latency Targets
- Users expect feed content instantly.
- Lesson 1633 — Non-Functional Requirements: Scale and Performance
- Latency Tradeoffs
- Cross-region network calls can add 50-200ms depending on distance.
- Lesson 1435 — Multi-Region Architecture for DR
- Latency vs Consistency
- Centralized stores add network hops but guarantee single source of truth
- Lesson 947 — Distributed Session Management
- Launch events
- Before releasing a new video, product page, or software update, push it to all relevant edge servers so the flood of initial requests hits warm caches.
- Lesson 184 — Cache Warming and Preloading
- Layer 4
- when you need raw speed and don't need content-based routing.
- Lesson 80 — Layer 4 vs Layer 7 Load Balancing; Lesson 109 — Layer 4 (Transport) Load Balancing; Lesson 112 — HAProxy Overview; Lesson 113 — Cloud Load Balancers (AWS ELB/ALB)
- Layer 4 (Transport Layer)
- and **Layer 7 (Application Layer)** of the OSI model.
- Lesson 80 — Layer 4 vs Layer 7 Load Balancing
- Layer 7
- when you need sophisticated routing logic based on request content.
- Lesson 80 — Layer 4 vs Layer 7 Load Balancing; Lesson 112 — HAProxy Overview; Lesson 113 — Cloud Load Balancers (AWS ELB/ALB)
- Layer 7 capable
- can inspect HTTP headers, URLs, and cookies for intelligent routing
- Lesson 111 — NGINX as a Load Balancer
- lazy deletion
- approach prevents runtime overhead during redirects while keeping the database lean.
- Lesson 1532 — Expiration and Time-to-Live; Lesson 1567 — Lazy vs Eager Deletion Strategies; Lesson 1812 — Lazy Deletion and Background Cleanup
- Lazy Deletion (Delete-on-Access)
- The system checks expiration only when someone tries to read a paste.
- Lesson 1567 — Lazy vs Eager Deletion Strategies
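Delete-on-access can be sketched with an in-memory store. `PasteStore` and its method names are hypothetical; a real system would apply the same check in front of its database or cache.

```python
import time


class PasteStore:
    """Toy paste store that checks expiration only when a paste is read."""

    def __init__(self):
        self._pastes = {}  # paste_id -> (content, expires_at)

    def put(self, paste_id, content, ttl_seconds, now=None):
        now = time.time() if now is None else now
        self._pastes[paste_id] = (content, now + ttl_seconds)

    def get(self, paste_id, now=None):
        now = time.time() if now is None else now
        entry = self._pastes.get(paste_id)
        if entry is None:
            return None
        content, expires_at = entry
        if now >= expires_at:
            # Expired: remove it on this access and report it as missing.
            del self._pastes[paste_id]
            return None
        return content
```

Pastes that are never read again stay in storage under this scheme, which is why lazy deletion is usually paired with a background cleanup job.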
- Lazy evaluation
- Only format log messages if they'll actually be written (check log level first).
- Lesson 1133 — Logging Performance Impact
- Lazy expiration
- Let old keys expire naturally rather than actively cleaning them
- Lesson 977 — Algorithm Implementation Patterns
- Lazy loading
- cold branches from disk
- Lesson 1759 — Trie Space Optimization Techniques; Lesson 1861 — Robots.txt Caching and Parsing
- LCS
- Better read performance, use when reads dominate and disk space is limited
- Lesson 428 — Compaction Strategies
- leader
- accepts all write operations.
- Lesson 71 — Single-Leader Replication Model; Lesson 528 — Single-Leader Replication for Strong Consistency; Lesson 619 — Server States: Leader, Follower, Candidate; Lesson 634 — etcd: Distributed Key-Value Store with Raft; Lesson 635 — Consul: Service Discovery with Raft Consensus; Lesson 638 — Configuration Management with Consensus
- Leader Completeness Property
- is Raft's fundamental safety rule: *if a log entry is committed in a given term, that entry will be present in the logs of all leaders for all higher-numbered terms.*
- Lesson 627 — Safety: Leader Completeness Property; Lesson 628 — Election Restriction: Up-to-Date Check; Lesson 629 — Log Inconsistencies and Repair
- Leader election
- Which node should coordinate operations?
- Lesson 599 — What Is Distributed Consensus?; Lesson 616 — Multi-Paxos for Log Replication; Lesson 618 — Raft Overview: Understandability as a Design Goal; Lesson 627 — Safety: Leader Completeness Property; Lesson 707 — In-Sync Replicas (ISR); Lesson 1366 — Leader Election and Failover
- Leader election complexity
- Multi-Paxos requires efficient leader election, which isn't fully specified in basic Paxos
- Lesson 617 — Why Paxos Is Difficult in Practice
- Leader fails
- → Term effectively ends, new election starts a new term
- Lesson 620 — Terms: Logical Time in Raft
- Leader logs the change
- – The write is recorded (often as a replication log entry)
- Lesson 1365 — Single-Leader Replication Topology
- Leader propagates changes
- → Sends the update to all followers
- Lesson 71 — Single-Leader Replication Model
- Leader replica
- One broker holds the primary copy and handles all reads and writes for that partition
- Lesson 705 — Replication and Fault Tolerance
- Leader-based replication
- Like Raft, one server is the leader handling writes; followers replicate
- Lesson 633 — ZooKeeper: Coordination Service Built on Consensus
- Leaderless (Dynamo-style) replication
- Any replica can accept writes directly
- Lesson 1377 — What Are Replication Conflicts?
- leaderless replication
- model (also called peer-to-peer replication), there is no designated leader node.
- Lesson 73 — Leaderless Replication ModelLesson 1371 — Leaderless Replication (Dynamo-Style)
- Leading column matters
- index `(A, B)` helps queries filtering on `A` alone, but not `B` alone
- Lesson 278 — Index Strategy for Large Tables
- Leaf nodes
- Each key-value pair (or range of keys) is hashed
- Lesson 376 — Anti-Entropy with Merkle Trees
- Leaky bucket
- *prevents bursts* entirely.
- Lesson 966 — Token Bucket vs Leaky BucketLesson 975 — Algorithm Selection CriteriaLesson 1691 — Rate Limits per Channel
- Learn your domain
- Early in a project, service boundaries are unclear.
- Lesson 825 — Starting with a Modular Monolith
- Learners
- discover which value was chosen once a quorum of Acceptors agrees.
- Lesson 610 — The Three Roles in Paxos
- Learning to Rank (LTR)
- uses supervised machine learning to train models on historical click data, optimizing for the ranking order that maximizes user satisfaction rather than adhering to fixed mathematical formulas.
- Lesson 1756 — Machine Learning for Ranking (Learning to Rank)
- Learning-to-rank models
- Historical clicks become training labels—clicked results are positive signals, skipped ones may be negative
- Lesson 1779 — Search Analytics and Click Tracking
- Least Connections
- checks which desk has the fewest people waiting and sends the next customer there
- Lesson 87 — Least Connections AlgorithmLesson 88 — Weighted Least ConnectionsLesson 98 — What Are Health Checks?Lesson 226 — Load Distribution Across ReplicasLesson 880 — Request Routing and Load Balancing
- Least Connections Algorithm
- makes routing decisions based on *real-time server load*.
- Lesson 87 — Least Connections Algorithm
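A rough sketch of least-connections selection, assuming the balancer itself counts active connections per server. The class and method names are hypothetical, and ties are broken by registration order purely for simplicity.

```python
class LeastConnectionsBalancer:
    """Hypothetical balancer that routes to the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # min() returns the first server with the lowest count (insertion order on ties).
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a connection closes so the count tracks real-time load.
        if self.active[server] > 0:
            self.active[server] -= 1
```

A production balancer would also need thread safety and health checks; both are left out here.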
- Least Frequently Used (LFU)
- eviction policy removes cache items based on how *often* they're accessed, not how *recently*.
- Lesson 147 — Least Frequently Used (LFU)
- Least Recently Used (LRU)
- removes the item that hasn't been accessed (read or written) for the longest time.
- Lesson 146 — Least Recently Used (LRU)Lesson 1525 — Cache Eviction Policy for URL Shortener
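LRU eviction can be sketched with `collections.OrderedDict`, treating both reads and writes as accesses as the definition above does. The `LRUCache` name and its interface are assumptions for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest access first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # a read counts as an access
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)  # a write counts as an access
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry
```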
- Least Response Time
- algorithm routes traffic to the server that can respond fastest, combining both response speed and active load.
- Lesson 92 — Least Response Time AlgorithmLesson 96 — Algorithm Selection Tradeoffs
- Leave
- remaining functionality in the monolith temporarily
- Lesson 822 — The Strangler Fig Pattern for Migration
- Legitimate traffic continues flowing
- Because the CDN has cache hits and established routing patterns, real users still access your content from edge caches while attack traffic gets filtered.
- Lesson 189 — DDoS Protection and Security at CDN Edge
- Lends
- an available connection to a request that needs database access
- Lesson 267 — What is Connection Pooling
- Length penalty
- Standard UUIDs are 36 characters; even truncated versions need ~16-22 characters to maintain low collision probability
- Lesson 1516 — Counter-Based vs UUID Approaches
- Less duplication
- Write authentication, rate limiting, and logging logic once
- Lesson 904 — BFF vs Single Gateway Tradeoffs
- Level discipline
- Use DEBUG/TRACE sparingly; keep INFO logs minimal in hot paths.
- Lesson 1133 — Logging Performance Impact
- Leveled Compaction (LCS)
- Lesson 428 — Compaction Strategies
- LFU
- tracks *how many times* each item has been accessed.
- Lesson 147 — Least Frequently Used (LFU)Lesson 152 — Adaptive Replacement Cache (ARC)Lesson 154 — Implementation Tradeoffs
- LFU (Least Frequently Used)
- Removes rarely-accessed items—better for popularity-based caching
- Lesson 355 — Redis as a CacheLesson 1525 — Cache Eviction Policy for URL Shortener
- Library-Based Solutions
- Instead of sidecar proxies, embed retry logic, circuit breaking, and observability directly into application code using libraries (like Resilience4j, Hystrix, or Polly).
- Lesson 869 — Alternatives to Full Service Mesh
- Lifecycle policies
- Automate transitions after N days without access
- Lesson 1623 — Cold Storage and Archival
- Lightweight
- Don't overload the system with expensive checks
- Lesson 1339 — Health Checks and Failure Detection
- Lightweight and fast
- handles tens of thousands of concurrent connections efficiently
- Lesson 111 — NGINX as a Load Balancer
- Limit high-cardinality dimensions
- Never use unbounded labels like `user_id`, `session_id`, or `ip_address`.
- Lesson 1210 — Cardinality Management
- Limit non-critical features
- disable recommendation engines, advanced search filters, or analytics tracking while keeping core purchase flows operational.
- Lesson 963 — Graceful Degradation with Rate Limits
- Limit tag sets
- to meaningful dimensions for debugging (service, endpoint, status_code)
- Lesson 1258 — Cardinality Explosion
- Limitations
- Cache exists per application instance only: it doesn't scale across multiple servers, is memory-constrained, and loses data on restart
- Lesson 122 — Application-Level In-Memory CachingLesson 1373 — Chain ReplicationLesson 1791 — Single Data Center vs Distributed Setup
- Limited Engineering Resources
- Lesson 239 — When Not to Shard
- Limited impact
- If you have 8 shards and one fails, roughly 1/8 of users are affected
- Lesson 266 — Shard Failure and Partial Outages
- Limited intelligence
- Cannot route based on URLs, headers, or content
- Lesson 109 — Layer 4 (Transport) Load Balancing
- Limited network traffic
- Only one node receives data during removal
- Lesson 1461 — Removing Nodes Gracefully
- Limited Operations Team
- Service meshes require expertise to configure, troubleshoot, and operate.
- Lesson 835 — When You Don't Need a Service Mesh
- Limited scalability
- More replicas = longer write times
- Lesson 1355 — Synchronous Replication: Guarantees and Costs
- Limited scale
- One data center's capacity is your ceiling
- Lesson 1791 — Single Data Center vs Distributed Setup
- Limited test requests
- are permitted (often just one, or a small percentage)
- Lesson 1047 — The Three States: Half-Open
- Lineage tracking
- Clear documentation of where data came from and how it was transformed
- Lesson 764 — Data Governance and Quality
- Linear scaling
- becomes possible: double the instances, roughly double the capacity
- Lesson 57 — Scaling Stateless Services HorizontallyLesson 1854 — Distributed URL Deduplication
- linearizability
- .
- Lesson 484 — Consistency in CAP ContextLesson 523 — Linearizability DefinedLesson 524 — Sequential Consistency vs LinearizabilityLesson 525 — Strict SerializabilityLesson 528 — Single-Leader Replication for Strong ConsistencyLesson 541 — The Consistency SpectrumLesson 559 — Strong Consistency with QuorumsLesson 607 — Consensus vs Consistency Models
- Linearizability or strong consistency
- Every read reflects the most recent write across all nodes
- Lesson 493 — CP Systems: Prioritizing Consistency
- Linearizable reads
- from the leader: requires a heartbeat round trip to confirm leadership
- Lesson 640 — Performance Characteristics of Consensus
- Link Analysis
- Extract URLs and check them against blacklists
- Lesson 1581 — Abuse Prevention and Content Moderation
- Lists
- maintain ordered collections.
- Lesson 341 — Data Types and Value ComplexityLesson 672 — Redis as a Lightweight Message Broker
- Live notifications
- Publish user-specific events to channels like `user:123:notifications`
- Lesson 357 — Redis Pub/Sub for Real-Time Messaging
- liveness
- .
- Lesson 604 — Safety vs Liveness PropertiesLesson 609 — Paxos Safety and Liveness Guarantees
- load balancer
- ).
- Lesson 6 — Components of a System Design SolutionLesson 44 — What is Horizontal Scaling?Lesson 76 — What Is a Load Balancer?Lesson 1536 — Horizontal Scaling of Redirect ServersLesson 1552 — Initial Architecture Diagram
- load balancers
- , or **edge services**—the first components that handle incoming traffic from clients or external systems.
- Lesson 1239 — Root Span and Entry PointsLesson 1321 — Redundancy and Parallel AvailabilityLesson 1674 — Connection Management at Scale
- Load balancing
- Which server handles which request?
- Lesson 49 — Application Complexity Trade-offsLesson 832 — Service Discovery in a MeshLesson 839 — Data Plane: Proxy ResponsibilitiesLesson 840 — Data Plane: Envoy Proxy FundamentalsLesson 880 — Request Routing and Load BalancingLesson 1764 — Distributed Trie ArchitectureLesson 1864 — Stateless Worker Design
- Load balancing strategies
- Round-robin, least-connections, etc.
- Lesson 842 — Control Plane: Configuration Management
- Load Distribution
- Your primary database no longer handles both reads and writes—it focuses solely on writes.
- Lesson 220 — Read-Write Splitting FundamentalsLesson 1616 — Geographic Routing and DNSLesson 1708 — Scalability and Horizontal ExpansionLesson 1822 — Scaling Rate Limiter Horizontally
- Load Leveling
- The queue acts as a shock absorber, smoothing out bursts of incoming messages so consumers can process them at a steady, sustainable rate.
- Lesson 647 — Message Queue BasicsLesson 659 — Queue Use Cases: Work Distribution
- Load shedding
- at the recovering service (accept only partial load initially)
- Lesson 1081 — Thundering Herd After Recovery
- Load smoothing
- means using a message queue as a buffer between producers (incoming requests) and consumers (processing services).
- Lesson 649 — Load Smoothing and Backpressure
- local cache
- of recently received notification IDs (typically a hash set or sliding window buffer).
- Lesson 1714 — Client-Side DeduplicationLesson 1801 — Local Caching for Performance
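A small sketch of the hash set plus sliding window buffer idea, assuming notification IDs are hashable values. `RecentIdFilter` and `is_new` are hypothetical names.

```python
from collections import deque


class RecentIdFilter:
    """Hypothetical client-side filter that remembers the last max_size notification IDs."""

    def __init__(self, max_size):
        self._max_size = max_size
        self._order = deque()  # IDs in arrival order, oldest on the left
        self._seen = set()     # the same IDs, for O(1) membership checks

    def is_new(self, notification_id):
        if notification_id in self._seen:
            return False  # duplicate inside the window: drop it
        self._seen.add(notification_id)
        self._order.append(notification_id)
        if len(self._order) > self._max_size:
            # Forget the oldest ID so memory stays bounded.
            self._seen.discard(self._order.popleft())
        return True
```

A duplicate that arrives after its ID has left the window is treated as new, so the window size bounds both memory use and the deduplication guarantee.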
- Local caching
- Cache rate limit checks for microseconds to reduce storage round-trips
- Lesson 977 — Algorithm Implementation PatternsLesson 1784 — Non-Functional Requirements: Latency and AvailabilityLesson 1793 — Centralized vs Distributed Rate Limiting
- Local Communication
- Sidecar-to-service communication uses localhost, avoiding physical network latency
- Lesson 841 — Data Plane: Performance and Latency Overhead
- Local development
- Pretty-printed or line-formatted output for human consumption
- Lesson 1166 — Human-Readable vs Machine-Parseable
- Local disk
- Fast append-only log files or snapshots
- Lesson 1849 — URL Frontier Persistence and Recovery
- Local numbering
- Better delivery using local phone numbers vs international
- Lesson 1685 — SMS Notifications
- Local Transaction 1
- Reserve hotel room → **Compensation**: Cancel hotel
- Lesson 589 — Saga Fundamentals: Local Transactions and Compensations
- Local Transaction 2
- Book flight → **Compensation**: Cancel flight
- Lesson 589 — Saga Fundamentals: Local Transactions and Compensations
- Local Transaction 3
- Charge credit card → **Compensation**: Refund card
- Lesson 589 — Saga Fundamentals: Local Transactions and Compensations
- local transactions
- , where each local transaction updates data within a single service.
- Lesson 585 — Alternative: Saga Pattern IntroductionLesson 594 — Saga Isolation and AnomaliesLesson 688 — Transactional Semantics
- Localization
- Automatically serve the right language based on user preferences without worker-level logic.
- Lesson 1701 — Template Service for Content
- Localization variants
- (English, Spanish, French versions of the same template)
- Lesson 1701 — Template Service for Content
- Localized logging
- means writing logs directly to disk on the server where your application runs.
- Lesson 1169 — Centralized vs Localized Logging
- locally
- using shared public keys, eliminating the need to call the auth service for every request.
- Lesson 950 — Auth Service Single Point of FailureLesson 1804 — Multi-Region Rate Limiting Challenges
- Location
- Typing "pizza" near downtown might prioritize "Pizza Palace on 5th Street" over generic "pizza recipes," using GPS coordinates or IP geolocation.
- Lesson 1767 — Personalized Typeahead
- Lock contention
- when multiple threads need to update the same tracking structures
- Lesson 154 — Implementation TradeoffsLesson 509 — Latency: The Hidden Cost of CAP
- Lock resources
- if voting YES, the participant promises not to roll back unilaterally
- Lesson 570 — Phase 1: Prepare Phase
- Lock timeout policies
- (one system might abort while another waits)
- Lesson 582 — Transaction Isolation Across Systems
- locks
- to prevent conflicts.
- Lesson 470 — Transaction Model and ACID in Neo4jLesson 1649 — The Celebrity Problem in Fanout
- log
- an ordered sequence of entries.
- Lesson 623 — Log Structure and EntriesLesson 1158 — Correlation IDs Across Services
- Log audits
- Periodically scan stored logs for sensitive patterns
- Lesson 1163 — Avoid Logging Sensitive Data
- Log buffering
- acts like a holding area between your application and the aggregator.
- Lesson 1155 — Log Buffering and Backpressure
- Log in background
- Write click metadata to a **message queue** (Kafka, RabbitMQ) or fast write buffer
- Lesson 1530 — Analytics and Click Tracking
- Log length as tiebreaker
- Longer log = more up-to-date
- Lesson 627 — Safety: Leader Completeness Property
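The tiebreaker sits inside Raft's "up-to-date" vote check, sketched below. Comparing the term of the last entry first follows the Raft paper rather than this entry, and the function and parameter names are made up for illustration.

```python
def candidate_is_up_to_date(cand_last_term, cand_last_index,
                            voter_last_term, voter_last_index):
    """Return True if the candidate's log is at least as up-to-date as the voter's."""
    if cand_last_term != voter_last_term:
        # The log whose last entry has the higher term wins outright.
        return cand_last_term > voter_last_term
    # Same last term: the longer (or equally long) log is at least as up-to-date.
    return cand_last_index >= voter_last_index
```

A voter grants its vote only when this check passes, which keeps a candidate with a stale log from becoming leader.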
- Log Matching Property
- is a crucial safety guarantee in Raft that ensures consistency across replicated logs.
- Lesson 625 — Log Matching PropertyLesson 630 — Safety Argument: Committing Entries from Current TermLesson 634 — etcd: Distributed Key-Value Store with Raft
- Log positions
- Each consensus decision corresponds to a numbered slot in the replicated log (slot 1, slot 2, etc.).
- Lesson 616 — Multi-Paxos for Log Replication
- Log references, not content
- Lesson 1131 — Logging Sensitive Data: Security Concerns
- Log replication
- How the leader distributes entries to followers
- Lesson 618 — Raft Overview: Understandability as a Design Goal
- Log sanitization pipelines
- Process logs through a scrubbing layer before they reach storage systems.
- Lesson 1145 — Sensitive Data in Structured Logs
- Log Shippers/Agents
- Lightweight processes running on each service host that collect logs and forward them (e.
- Lesson 1148 — Centralized Logging Architecture
- Log the incoming timeout
- what budget did this service receive?
- Lesson 1106 — Timeout Propagation Observability
- Log the outgoing timeout
- what budget did we pass to downstream services?
- Lesson 1106 — Timeout Propagation Observability
- Log4j
- (Java): The veteran framework with hierarchical loggers, multiple appenders (file, console, syslog), and extensive configuration options.
- Lesson 1136 — Logging Libraries and Standards
- Logging & Monitoring
- The gateway captures request/response metadata, timing, and errors consistently across all services without each team implementing their own logging format.
- Lesson 876 — API Gateway as a Cross-Cutting Concern Hub
- Logging and analytics
- Losing a few log entries during a crash is acceptable
- Lesson 137 — Write-Behind: Risks and Use Cases
- Logical shards
- are data partitions defined by ranges or hash buckets of your shard key—they exist as a concept independent of physical hardware.
- Lesson 235 — Logical vs Physical Shards
- Logs
- Structured access logs capturing individual request details
- Lesson 845 — Control Plane: Telemetry CollectionLesson 1173 — Metrics vs Logs vs TracesLesson 1220 — The Problem Tracing SolvesLesson 1249 — Integrating Traces with Logs and MetricsLesson 1268 — Monitoring Data Sources: Metrics, Logs, Traces
- Long intervals
- (every 30-60 seconds) reduce overhead but mean users might hit a dead server for longer before the load balancer notices.
- Lesson 100 — Health Check Intervals and Timeouts
- Long retention periods
- Often 7 years for financial compliance
- Lesson 944 — Auditing and Compliance for Authorization
- Long timeouts
- (10s) are more forgiving but delay detection of truly failed servers.
- Lesson 100 — Health Check Intervals and Timeouts
- Long TTL possible
- Files can be cached for hours, days, or weeks
- Lesson 173 — Content Types Suited for CDNs
- Long-running processes
- Transactions span minutes to hours (e.
- Lesson 598 — Saga Frameworks and Real-World Adoption
- Long-running transactions or workflows
- that span multiple steps over minutes or hours—like filling out a multi-page form with real-time validation, or a collaborative editing session—benefit from keeping that context alive in memory rather than constantly retrieving it from external st...
- Lesson 62 — When Stateful Services Are Necessary
- Long-term quota tracking
- Decrement counters across larger windows (daily/monthly)
- Lesson 994 — Quota Management and Burst Allowances
- Long-term storage
- (years, not days)
- Lesson 1206 — Metrics Federation and Long-Term StorageLesson 1208 — Choosing a Metrics System for Your Scale
- Longer intervals
- (1–5s): Lower load, risk of burst violations
- Lesson 1802 — Synchronization Strategies for Local Caches
- Longer TTLs (15-60 minutes)
- Better performance but higher security risk if permissions change frequently.
- Lesson 942 — Caching Authorization Decisions
- Look-Aside
- gives you full control but requires more application code.
- Lesson 142 — Look-Aside vs Inline Cache Topologies
- Loose coordination
- Tolerate 10-20% over-limit error in exchange for <10ms local decisions.
- Lesson 1804 — Multi-Region Rate Limiting Challenges
- Loose coupling
- means components don't know (or care) about each other's internal details.
- Lesson 38 — Design for ChangeLesson 791 — Independent Deployability
- LOUDS
- encodes the trie structure as two bit vectors: one for tree shape (using level-order traversal), another for labels.
- Lesson 1759 — Trie Space Optimization Techniques
- Low coupling
- means services depend minimally on each other, communicating through well-defined interfaces rather than sharing internal details.
- Lesson 818 — High Cohesion, Low Coupling in Service Design
- Low Latency
- End-to-end latency should remain in milliseconds, even under load.
- Lesson 699 — Event Streaming Platform RequirementsLesson 744 — Stream Processing FrameworksLesson 979 — Centralized vs Decentralized ApproachesLesson 1772 — Real-Time Index UpdatesLesson 1791 — Single Data Center vs Distributed Setup
- Low Latency Priority
- Results appear in milliseconds or seconds, not hours.
- Lesson 737 — What is Stream Processing?
- Low network overhead
- Each health check consumes bandwidth and server resources
- Lesson 100 — Health Check Intervals and Timeouts
- Low priority queue
- Marketing emails, digests, recommendations
- Lesson 1700 — Priority Queues and Urgency Levels
- Low resolution, long retention
- 5-minute or hourly aggregates kept for 1+ years—captures long-term trends cheaply
- Lesson 1270 — Monitoring Resolution and Retention Tradeoffs
- Low-latency inference
- Models must score documents in milliseconds
- Lesson 1781 — Machine Learning for Ranking
- Low-resolution (1 hour)
- Keep 1-2 years for long-term capacity planning
- Lesson 1213 — Metric Retention Policies
- Lower accuracy, lower latency
- Local counters with periodic sync via gossip protocols.
- Lesson 985 — Trade-offs: Accuracy vs Latency
- Lower cost
- Often free (open-source) or pay-per-use
- Lesson 108 — Hardware vs Software Load BalancersLesson 1436 — Active-Passive vs Active-Active DR
- Lower latency
- Users connect to nearby datacenters
- Lesson 117 — Global Server Load Balancing (GSLB)Lesson 560 — Eventual Consistency with QuorumsLesson 682 — Producer AcknowledgmentsLesson 982 — Sticky Sessions and Rate LimitingLesson 1197 — Pull vs Push Metrics Collection Models
- Lower memory footprint
- No dedicated threads sitting idle
- Lesson 1070 — Semaphore-Based Bulkheads: Limiting Concurrent Requests
- Lower Operational Complexity
- Your operations team manages one deployment unit.
- Lesson 783 — Deployment Simplicity: Monolith Advantage
- Lower priority
- Sub-millisecond global consistency (nice-to-have)
- Lesson 18 — Prioritizing Requirements Under Constraints
- Lower storage cost
- on the monitoring backend (only sending a few percentile values)
- Lesson 1186 — Summary Metrics
- Lower write latency
- – Users see faster response times
- Lesson 1356 — Asynchronous Replication: Speed and Risk
- Lowercasing
- "Search" → "search" (so "Search" matches "search")
- Lesson 1733 — Document Processing Pipeline
- Lowest latency path
- Writes propagate directly between any two leaders without routing through intermediaries
- Lesson 1369 — Multi-Leader Topologies: All-to-All
- LRU
- (Least Recently Used) tracks *when* items were last used, while **LFU** tracks *how many times* each item has been accessed.
- Lesson 147 — Least Frequently Used (LFU)Lesson 152 — Adaptive Replacement Cache (ARC)Lesson 154 — Implementation Tradeoffs
- LRU (Least Recently Used)
- Removes items not accessed recently—great for hot data
- Lesson 355 — Redis as a Cache
- LRU-K
- tracks the last **K accesses** for each cache entry, not just one.
- Lesson 151 — LRU-K and Advanced LRU Variants
- LSH tables
- to quickly find candidates with low Hamming distance.
- Lesson 1855 — Near-Duplicate Detection with Simhash
- Lua Scripts
- Batch check-and-increment logic in a single Redis Lua script (atomic execution).
- Lesson 981 — Race Conditions in Distributed CountersLesson 1800 — Race Conditions and Concurrency ControlLesson 1811 — Batch Operations to Reduce Network Calls
M
- Machine Learning Model Training
- Lesson 738 — Batch Processing Use Cases
- Machine-parseable
- means consistent field names, predictable types, and structured formats (JSON, structured text).
- Lesson 1166 — Human-Readable vs Machine-Parseable
- Main index on disk
- – The larger, durable index already built from existing documents.
- Lesson 1754 — Real-Time Indexing and Updates
- Maintain backward compatibility
- when service contracts change
- Lesson 882 — Request and Response Transformation
- Maintain consistency
- one source of truth for ownership and permissions, even if storage tiers change
- Lesson 1590 — Metadata Database Design
- Maintain leadership
- Proves the leader is operational
- Lesson 624 — AppendEntries RPC: Replication Mechanism
- Maintains open connections
- that preserve state (WebSocket connections, database transactions)
- Lesson 56 — What Makes a Service Stateful
- Maintains performance
- – prevents individual partitions from becoming bottlenecks
- Lesson 1475 — Dynamic Range Splitting
- Maintenance burden
- Libraries need updates in every service; proxies update centrally through control plane
- Lesson 830 — Service Mesh vs Library-Based Solutions
- Maintenance window limits
- Often capped at X hours per month or quarter
- Lesson 1328 — Scheduled Maintenance and Availability Accounting
- Maintenance windows
- Restart a consumer service without blocking producers
- Lesson 650 — Temporal Decoupling
- Major compaction
- merges all SSTables for a tablet into one file, removing deleted entries and old versions
- Lesson 449 — Read Path and Compaction
- majority
- (quorum) of acceptors
- Lesson 613 — The Prepare PhaseLesson 614 — The Accept PhaseLesson 626 — Commitment and the Commit Index
- majority agreement
- if more than half the nodes agree, that's enough.
- Lesson 605 — Quorums and Majority AgreementLesson 636 — Consensus for Leader Election
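"More than half" works out as follows for a cluster of `n` nodes. The helper names are illustrative only.

```python
def majority(n):
    """Smallest number of nodes that is strictly more than half of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many nodes can fail while a majority can still be formed."""
    return n - majority(n)
```

Growing from 3 to 4 nodes raises the majority from 2 to 3 but still tolerates only one failure, which is why odd cluster sizes are the usual choice.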
- majority quorum
- continues operating.
- Lesson 501 — Distributed Locking Services (CP)Lesson 606 — The CAP Trade-off in Consensus
- Make additive changes first
- Add new columns as nullable or with defaults
- Lesson 265 — Schema Changes in Sharded Environments
- Making optional fields required
- Requests missing the now-mandatory field get rejected
- Lesson 1905 — Breaking vs Non-Breaking Changes
- Making required fields optional
- Relaxing requirements never breaks callers
- Lesson 1905 — Breaking vs Non-Breaking Changes
- Managed control plane
- You don't operate the control plane infrastructure yourself.
- Lesson 864 — AWS App Mesh and Cloud-Native Meshes
- Managed gateways
- offload this burden to the cloud provider.
- Lesson 900 — Open-Source vs Managed Gateway Tradeoffs
- Managed services
- provide polished, production-ready features out-of-the-box—often with better integrations for their cloud ecosystem (serverless functions, IAM, logging).
- Lesson 900 — Open-Source vs Managed Gateway Tradeoffs
- Managed/serverless
- (zero ops): SQS, SNS, Google Cloud Pub/Sub, Azure Service Bus
- Lesson 676 — Choosing Between Message Broker Technologies
- manifest file
- that tells the player: "Here are all available quality options; pick the best one based on current bandwidth."
- Lesson 1602 — Adaptive Bitrate Streaming (ABR)Lesson 1625 — Adaptive Bitrate Streaming
- Manifest generation
- Build HLS/DASH file listing all qualities with bandwidth requirements
- Lesson 1602 — Adaptive Bitrate Streaming (ABR)
- manual
- (a human promotes a replica) or **automatic** (software detects failure and promotes automatically).
- Lesson 207 — Replica Promotion and Failover BasicsLesson 1311 — Toil: The Enemy of Scale
- Manual commit
- Your application explicitly tells Kafka "I've processed up to offset 150."
- Lesson 710 — Offsets and Commit Strategies
- Manual failover
- requires human operators to detect the failure and promote a replica.
- Lesson 201 — Why Replicate: Availability and FailoverLesson 1366 — Leader Election and FailoverLesson 1437 — Failover and Failback Procedures
- Manual instrumentation
- means you explicitly create spans in your code to track custom business logic, internal functions, or application-specific workflows that frameworks can't automatically detect.
- Lesson 1224 — Automatic vs Manual Instrumentation
- Manual review storage
- Engineers can investigate patterns in failures
- Lesson 1705 — Retry and Dead Letter Queues
- Manual-ack
- Consumer explicitly sends acknowledgment after processing (safer—enables at-least-once delivery)
- Lesson 681 — Acknowledgment Mechanisms
- Manual-ack (post-processing)
- The consumer explicitly confirms after successfully processing the message
- Lesson 683 — Consumer Acknowledgment Timing
- Many applications tolerate
- brief inconsistency (social feeds, caches, recommendations)
- Lesson 532 — Why Eventual Consistency Exists
- map
- (like a dictionary or hash table), but with three coordinates instead of one:
- Lesson 444 — Data Model: Sparse, Distributed, Multi-Dimensional MapLesson 743 — Batch Processing Frameworks
- Map to Ring Position
- This hash value corresponds to a point on the ring
- Lesson 1854 — Distributed URL Deduplication
- MapReduce
- (the original framework) splits work into two phases:
- Lesson 743 — Batch Processing Frameworks
- Marketing analytics
- RPO of hours or days (historical trends less time-sensitive)
- Lesson 1411 — Defining Recovery Point Objective (RPO)
- Massive bandwidth capacity
- CDN networks handle petabytes of traffic daily across hundreds of PoPs.
- Lesson 189 — DDoS Protection and Security at CDN EdgeLesson 195 — CDN for DDoS Protection
- match
- patterns within your data—all using a JSON-like syntax that mirrors the document structure itself.
- Lesson 393 — MongoDB Query Language BasicsLesson 461 — Cypher Query Language Fundamentals
- Materialized aggregates
- When the same aggregations run repeatedly and slight delays are acceptable
- Lesson 284 — Aggregation Query Optimization
- Materialized views
- (covered in lesson 291) that handle updates automatically
- Lesson 294 — Aggregation TablesLesson 297 — Denormalization in PracticeLesson 760 — Data Warehouse Architecture
- Max lifetime
- The absolute maximum age of any connection before forced retirement and replacement.
- Lesson 272 — Connection Timeouts and Limits
- Max retries
- 3-5 attempts before giving up
- Lesson 1564 — Retrieval Error Handling and FallbacksLesson 1604 — Message Queue for Processing JobsLesson 1695 — Fallback and Retry Logic
- Max retry count
- How many times to attempt redelivery
- Lesson 684 — Negative Acknowledgments and Redelivery
- Maximum availability during failures
- → Leaderless with sloppy quorums
- Lesson 1376 — Topology Selection Tradeoffs
- Maximum compactness
- A counter of 1 billion encoded in Base62 is just 6 characters (`15ftgG`)
- Lesson 1516 — Counter-Based vs UUID Approaches
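A short Base62 encoder sketch. The digit order (`0-9`, then `A-Z`, then `a-z`) is an assumption chosen because it reproduces the `15ftgG` value quoted above; other orderings give a different string of the same length.

```python
import string

# Assumed alphabet order: digits, then uppercase, then lowercase.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62_encode(number):
    """Encode a non-negative integer as a Base62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    # Remainders come out least-significant first, so reverse them.
    return "".join(reversed(digits))
```

Because 62^5 is about 916 million and 62^6 is about 56.8 billion, a counter value of 1 billion needs exactly 6 characters.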
- Maximum consistency
- Set W=N, R=1 (becomes synchronous replication)
- Lesson 1361 — Quorum-Based Replication
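The W=N, R=1 setting is one instance of the usual quorum overlap condition W + R > N. The check below is a sketch with a hypothetical function name.

```python
def quorums_overlap(n, w, r):
    """True if every read quorum of size r shares a replica with every write quorum of size w."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return w + r > n
```

With N=3, both (W=3, R=1) and (W=2, R=2) satisfy the condition, while (W=1, R=1) does not, so a read in that last case may miss the latest write.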
- Maximum flexibility
- You can assign partitions arbitrarily based on access patterns, data size, or node capacity
- Lesson 1476 — Directory Partitioning Fundamentals
- Maximum Retry Attempts
- caps the number of tries:
- Lesson 1025 — Maximum Retry Attempts and Timeout Budgets
- MaxScale
- automatically inspect incoming SQL queries and route them intelligently:
- Lesson 222 — Proxy-Based Read-Write Splitting
- Maybe retry with caution
- 500 Internal Server Error (could be transient *or* a bug)
- Lesson 1026 — Retry on Which Errors
- Meaningful
- Verify real functionality, not just "process is running"
- Lesson 1339 — Health Checks and Failure Detection
- Measure actual recovery time
- from failure detection to full service restoration
- Lesson 1419 — Measuring and Testing RPO/RTO Compliance
- Measure actual restore times
- to validate your RTO assumptions
- Lesson 1430 — Backup Verification and Testing
- Measure everything
- using metrics, logs, and traces to understand system behavior
- Lesson 1307 — What is Site Reliability Engineering (SRE)?
- Medium priority
- Handle 10x growth (scalability)
- Lesson 18 — Prioritizing Requirements Under Constraints
- Medium priority queue
- Likes, comments, follower updates
- Lesson 1700 — Priority Queues and Urgency Levels
- Medium resolution, medium retention
- 1-minute intervals kept for 30 days—balances detail with historical analysis
- Lesson 1270 — Monitoring Resolution and Retention Tradeoffs
- Medium scale (1,000 req/sec)
- 180ms × 1,000 = 180 seconds wasted/second
- Lesson 276 — Why Query Optimization Matters at Scale
- Medium-resolution (1-5 minutes)
- Keep 30-90 days for recent trend analysis
- Lesson 1213 — Metric Retention Policies
- Memcached
- are the two most popular distributed cache stores:
- Lesson 123 — Distributed Cache Layer (Redis/Memcached)Lesson 1523 — Caching Layer ArchitectureLesson 1664 — Timeline Caching StrategiesLesson 1702 — User Preferences Lookup
- Memory
- 50-100 MB at idle, scaling up with connection count and configuration complexity
- Lesson 867 — Resource Consumption at ScaleLesson 1264 — USE Method: Utilization, Saturation, ErrorsLesson 1796 — Sliding Window Log in RedisLesson 1815 — Sharding Rate Limit Data Across Redis InstancesLesson 1827 — Crawler Architecture Overview
- Memory Access (~100 nanoseconds)
- Lesson 21 — Latency Numbers Every Engineer Should Know
- Memory allocations
- for intermediate string buffers
- Lesson 1143 — Performance Impact of Structured Logging
- Memory available
- Use **Sliding Window Log** (stores every request timestamp)
- Lesson 975 — Algorithm Selection Criteria
- Memory efficiency
- Only needs to track the current path, not all discovered URLs at a level
- Lesson 1830 — Breadth-First vs Depth-First Crawling
- Memory exhaustion
- in your metrics system (Prometheus keeps every active series in RAM)
- Lesson 1211 — Avoiding High-Cardinality LabelsLesson 1887 — Why Pagination Is Essential at Scale
- Memory footprint
- Each sidecar process requires its own memory allocation, usually 50–100MB per pod at minimum.
- Lesson 865 — Performance Overhead: Latency and Throughput
- Memory limit reached
- The cache has used its allocated RAM
- Lesson 145 — What Are Cache Eviction Policies?
- Memory limits
- Maximum RAM allocation (e.
- Lesson 1072 — CPU and Memory Bulkheads: Resource QuotasLesson 1862 — Why Distribute a Web Crawler
- Memory overhead
- to track access patterns (timestamps, counters, history lists)
- Lesson 154 — Implementation Tradeoffs
- Memory per server
- The difference between 8 GB and 16 GB affects instance type selection
- Lesson 32 — Rounding and Approximation Techniques
- Memory pressure
- Indexes and active series held in RAM grow massive
- Lesson 1207 — Metrics Cardinality and Performance ImpactLesson 1550 — Object Storage for Paste Content
- Memory quotas
- (per-tenant resource limits)
- Lesson 1067 — Bulkhead Pattern: Isolating Resources to Prevent Total Failure
- Memory Safety
- Rust's compile-time guarantees eliminate entire classes of bugs common in C/C++, resulting in fewer crashes and security vulnerabilities.
- Lesson 862 — Linkerd: Lightweight Service Mesh
- memory usage
- , **accuracy**, and **implementation complexity**.
- Lesson 970 — Fixed vs Sliding Window TradeoffsLesson 1175 — Gauge MetricsLesson 1184 — Gauge Metrics
- Memory write (MemTable)
- Writes go immediately to an in-memory structure (often a sorted tree).
- Lesson 415 — Write Path and LSM Trees
- Memory-mapped indexes
- Map the entire compressed trie into memory using mmap, enabling sub-millisecond lookups without deserialization overhead.
- Lesson 1776 — Typeahead Index Optimization
- MemStore
- After the WAL confirms the write, data is written to the MemStore, an in-memory buffer for each column family.
- Lesson 436 — HBase Write Path and WAL
- Memtable
- (in-memory) — fastest
- Lesson 416 — Read Path and Bloom FiltersLesson 426 — Write Path and Commit Log
- Memtable (In-Memory Buffer)
- Simultaneously, the write is also stored in an in-memory structure called a **memtable**.
- Lesson 426 — Write Path and Commit LogLesson 448 — Write Path: MemTable and Commit Logs
- Merge
- Combine and rank both sources in real-time before returning the feed
- Lesson 1655 — Celebrity Follower CachingLesson 1772 — Real-Time Index Updates
- Merge in recent posts
- from celebrities they follow (pull operation)
- Lesson 1639 — Hybrid (Pull-Push) Feed Model
- Merge results
- combine data from multiple sources, keeping the most recent version
- Lesson 429 — Read Path and Bloom FiltersLesson 449 — Read Path and Compaction
- merged
- intelligently.
- Lesson 1383 — Application-Level Conflict ResolutionLesson 1389 — Conflict Resolution in PracticeLesson 1754 — Real-Time Indexing and Updates
- Merges
- the results before returning them to the user
- Lesson 750 — Lambda Architecture: Serving LayerLesson 1769 — Horizontal Scaling of Search Infrastructure
- Merkle tree
- is a hash tree where:
- Lesson 369 — Anti-Entropy and Merkle TreesLesson 376 — Anti-Entropy with Merkle Trees
- Merkle trees
- come in.
- Lesson 369 — Anti-Entropy and Merkle TreesLesson 370 — Distributed Key-Value Store Architectures in Practice
- Message durability
- means writing messages to disk storage before acknowledging receipt, ensuring they persist beyond memory and process failures.
- Lesson 651 — Message Durability
- Message Expiration
- Set time-to-live on messages
- Lesson 671 — ActiveMQ and Traditional Enterprise Messaging
- Message Groups
- Ensure related messages go to the same consumer
- Lesson 671 — ActiveMQ and Traditional Enterprise Messaging
- message queue
- is a data structure that stores messages from producing services and holds them until consuming services retrieve and process them.
- Lesson 647 — Message Queue BasicsLesson 1530 — Analytics and Click TrackingLesson 1595 — Thumbnail and Preview Generation TriggerLesson 1651 — Asynchronous Fanout ProcessingLesson 1698 — Message Queue for Decoupling
- Message queues
- are *task-oriented*.
- Lesson 698 — Streaming vs Message QueuesLesson 1113 — Cross-Protocol Deadline HandlingLesson 1827 — Crawler Architecture Overview
- Message Selectors
- Filter messages using SQL-like syntax
- Lesson 671 — ActiveMQ and Traditional Enterprise Messaging
- metadata
- about your sharded cluster:
- Lesson 398 — Config Servers and mongos RoutersLesson 1549 — Database Schema DesignLesson 1559 — Write Path: Synchronous vs Asynchronous Storage
- Metadata checks
- Ensuring backup catalogs are complete and accurate
- Lesson 1408 — Backup Verification and Testing
- Metadata database
- stores small, indexed records (paste ID, creation time, expiration, user ID)
- Lesson 1552 — Initial Architecture DiagramLesson 1596 — Upload Rate Limiting and QuotasLesson 1608 — Post-Processing and Metadata ExtractionLesson 1622 — Deduplication StrategiesLesson 1870 — Content Storage and Deduplication
- Metadata drift
- Object storage count diverging from database metadata count (indicates orphaned files)
- Lesson 1574 — Monitoring Expiration and Storage Health
- Metadata is shared
- All nodes know about exchanges, bindings, users, and permissions
- Lesson 668 — RabbitMQ Clustering and High Availability
- Metadata linking
- Store references in your metadata database so retrieval services know where to find each size
- Lesson 1624 — Thumbnail and Preview Generation
- Metadata requirements
- (email headers, push categories, deep link schemes)
- Lesson 1692 — Channel-Specific Formatting
- Metadata tracking
- Update your database to flag archived status
- Lesson 1623 — Cold Storage and Archival
- Metadata-rich
- Store content-type, encoding, resolution tags with each file
- Lesson 1588 — Object Storage vs Block Storage
- Metric documentation
- solves this by creating a searchable catalog where each metric includes:
- Lesson 1216 — Metric Documentation and Discovery
- Metric retention policies
- define how long you keep data at different resolutions.
- Lesson 1213 — Metric Retention Policies
- Metrics
- Request counts, latency percentiles, error rates, connection pool stats—typically exported in formats like Prometheus
- Lesson 845 — Control Plane: Telemetry CollectionLesson 890 — Logging and Metrics CollectionLesson 1173 — Metrics vs Logs vs TracesLesson 1220 — The Problem Tracing SolvesLesson 1249 — Integrating Traces with Logs and MetricsLesson 1268 — Monitoring Data Sources: Metrics, Logs, Traces
- Metrics SDK
- specifically handles creating, collecting, and exporting metrics using a vendor-agnostic API.
- Lesson 1205 — OpenTelemetry Metrics SDK
- Microsecond latency
- Critical when you're checking limits on every request
- Lesson 1807 — In-Memory vs Persistent Storage for Rate Limiting
- Microservice Decoupling
- Lesson 660 — Pub-Sub Use Cases: Event Broadcasting
- Microservices
- enable independent scaling but increase operational overhead
- Lesson 39 — Trade-offs Over Best Practices
- Microservices authorization
- Service A needs to call Service B on behalf of a user with specific scopes/permissions.
- Lesson 920 — OAuth2 Fundamentals and Use Cases
- Microservices-friendly
- Pass token between services without shared state
- Lesson 916 — Session vs Token Tradeoffs
- Middle ground
- Redis with short TTLs and accept slight over-limit bursts.
- Lesson 985 — Trade-offs: Accuracy vs Latency
- Migration phases
- Lesson 1908 — Database Schema Evolution with API Versions
- milliseconds
- that's roughly 100,000 times faster.
- Lesson 349 — Redis In-Memory Storage ModelLesson 1668 — Machine Learning for Feed Ranking
- Minimal data movement
- Only affected key ranges move, not entire partitions
- Lesson 372 — Consistent Hashing in Dynamo
- Minimal protocol overhead
- lightweight TCP connections with simple text-based commands
- Lesson 673 — NATS and Lightweight Messaging
- Minimal reshuffling
- Adding/removing nodes only affects adjacent ranges
- Lesson 1854 — Distributed URL Deduplication
- Minimal Resource Footprint
- The Linkerd proxy typically uses 10-20MB of memory per instance (compared to Envoy's 50-100MB), making it dramatically cheaper to run at scale.
- Lesson 862 — Linkerd: Lightweight Service Mesh
- Minimal RTO
- Traffic instantly reroutes to healthy region
- Lesson 1436 — Active-Passive vs Active-Active DR
- MINOR
- New backward-compatible features—clients *can* benefit but aren't forced to change
- Lesson 1906 — Semantic Versioning for APIs
- Minutes (5-15)
- For transient errors and immediate retries
- Lesson 1712 — Deduplication Windows and Storage
- Mirrored queues
- (classic feature) or **quorum queues** (modern, Raft-based) replicate queue contents across multiple nodes:
- Lesson 668 — RabbitMQ Clustering and High Availability
- MirrorMaker 2
- (MM2) is Kafka's built-in tool for replicating topics between clusters.
- Lesson 726 — Multi-Datacenter Replication
- Miss rate
- is the opposite—requests that went to the backend.
- Lesson 166 — Monitoring Cache Performance
- Missing dependencies
- like encryption keys or configuration files
- Lesson 1430 — Backup Verification and Testing
- Mission-critical applications
- where data correctness cannot be compromised, even under scale
- Lesson 337 — When to Choose NewSQL
- Mitigation Procedures
- Lesson 1299 — Runbooks and Playbooks
- Mitigation strategies
- Lesson 1571 — Cache Invalidation on Update or Delete
- Mitigation techniques
- Lesson 1823 — Hot Tenant Problem
- Mobile app
- requests `/user/profile` → Gateway fetches full user data but returns only `{id, name, avatarUrl}`
- Lesson 875 — Client-Specific API Composition
- Mobile BFF
- Optimizes for limited bandwidth, touch interfaces, and offline capabilities
- Lesson 902 — Backend-for-Frontend (BFF) Pattern Overview
- Mobile/SPA clients
- Native apps and single-page applications that can't securely store credentials use OAuth2 flows to obtain tokens.
- Lesson 920 — OAuth2 Fundamentals and Use Cases
- Moderate complexity
- RabbitMQ clusters, Redis
- Lesson 676 — Choosing Between Message Broker Technologies
- Modern Codecs
- Lesson 1621 — Compression and Format Optimization
- Modern Protocol Support
- Layer 7 load balancing for HTTP/2, gRPC, and WebSocket connections, plus advanced features like automatic retries and traffic shadowing.
- Lesson 115 — Envoy Proxy Architecture
- Modern Protocols
- Native HTTP/2, gRPC, and WebSocket support
- Lesson 840 — Data Plane: Envoy Proxy Fundamentals
- modular monolith
- is a single deployable application that's internally organized into well-defined, loosely-coupled modules with clear boundaries.
- Lesson 790 — Modular Monoliths as Middle GroundLesson 825 — Starting with a Modular Monolith
- Modular reuse
- through references (`{{ ref('other_model') }}`)
- Lesson 774 — dbt for Analytics Engineering
- Monetizes
- your API by making higher limits a paid feature
- Lesson 990 — Tiered Rate Limits for Different User Classes
- MongoDB
- for product catalogs (flexible schema, read-heavy)
- Lesson 327 — Polyglot Persistence PatternLesson 521 — PACELC Tradeoffs in Real Systems
- Monitor actively
- Track when you're running degraded so you can fix root causes
- Lesson 1336 — Graceful Degradation
- Monitor cardinality itself
- Lesson 1207 — Metrics Cardinality and Performance Impact
- Monitor expiration
- Verify older backups in your retention period still work
- Lesson 1408 — Backup Verification and Testing
- Monitor health
- Detect when tablet servers fail and reassign their tablets to healthy servers
- Lesson 447 — Master Server and Metadata Management
- Monitor health signals
- queue depth, response times, error rates, circuit breaker states
- Lesson 1084 — Load Shedding Under Cascading Failure
- Monitor partition size
- Track row count, disk space, or request load per partition
- Lesson 1475 — Dynamic Range Splitting
- Monitoring
- Watch gateway health metrics obsessively
- Lesson 877 — The API Gateway Bottleneck RiskLesson 1262 — What is Monitoring and Why It Matters
- Monitoring and alerting
- Detect shard failures quickly to minimize user impact
- Lesson 266 — Shard Failure and Partial OutagesLesson 1656 — Fanout Failure Handling
- Monitoring and Alerting Isolation
- Lesson 1790 — Multi-Tenancy Considerations
- Monitoring and observability tools
- must now track hundreds of metrics across dozens of services instead of one application.
- Lesson 811 — Infrastructure and Tooling Costs
- Monitoring complexity
- Instead of watching one database's CPU, memory, disk I/O, and query performance, you must monitor all shards independently.
- Lesson 264 — Operational Complexity of Sharded Systems
- Monitoring happens silently
- The breaker tracks successes and failures in the background
- Lesson 1045 — The Three States: Closed
- Monitoring signal
- High DLQ volume indicates systemic problems
- Lesson 1705 — Retry and Dead Letter Queues
- Monitoring sprawl
- You need distributed tracing to follow requests across services, aggregated logging to debug issues, and service-specific metrics
- Lesson 803 — Operational Overhead
- Monolithic Applications
- A mesh is designed for inter-service communication.
- Lesson 835 — When You Don't Need a Service Mesh
- Monotonic clocks
- never go backward.
- Lesson 1114 — Clock Skew and Time SynchronizationLesson 1799 — Handling Clock Skew Across Nodes
- Monotonic Read Consistency
- guarantees that once a client reads a particular version of data, all subsequent reads will return that version or a newer one—never an older one.
- Lesson 1391 — Monotonic Read Consistency
- Monotonic read violations
- Reading from different replicas shows inconsistent timelines
- Lesson 1358 — Replication Lag in Async Systems
- Monotonic Reads
- is a consistency guarantee that ensures: **once you've read a piece of data at a certain state, you'll never see an older version of that data in future reads.
- Lesson 210 — Monotonic Reads GuaranteeLesson 215 — Sticky Sessions and Replica AffinityLesson 535 — Monotonic ReadsLesson 537 — Writes-Follow-Reads ConsistencyLesson 541 — The Consistency SpectrumLesson 546 — Session ConsistencyLesson 1360 — Monotonic Reads Across ReplicasLesson 1364 — Choosing a Replication Mode (+1 more)
- Monotonic Reads Consistency
- guarantees that once a client reads a particular version of data, all future reads by that same client will return that version or a newer one—never an older version.
- Lesson 543 — Monotonic Reads Consistency
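
One way a client might enforce this guarantee is to remember the highest version it has read and refuse any replica response that is older. The sketch below assumes each response carries an integer version; the class and method names are hypothetical and not taken from the lessons.

```python
class MonotonicReader:
    """Client-side guard: never accept data older than what was already read."""

    def __init__(self) -> None:
        # Highest version this client has observed so far (assumed to start at 0).
        self.last_seen_version = 0

    def accept(self, replica_version: int) -> bool:
        """Return True if a response at this version may be shown to the client."""
        if replica_version < self.last_seen_version:
            # The replica is lagging; the caller should retry elsewhere.
            return False
        self.last_seen_version = replica_version
        return True


reader = MonotonicReader()
print(reader.accept(5))  # True: first read
print(reader.accept(3))  # False: older than version 5
print(reader.accept(7))  # True: newer version
```

On a rejected response the caller would typically retry against another replica or fall back to the primary.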
- Monotonic write consistency
- ensures that if a client performs write W1 followed by write W2, any replica that applies W2 has already applied W1.
- Lesson 1392 — Monotonic Write Consistency
- Monotonic writes
- is a consistency guarantee stating that if a single client performs multiple write operations, those writes will be applied to all replicas *in the same order* they were issued.
- Lesson 536 — Monotonic WritesLesson 537 — Writes-Follow-Reads ConsistencyLesson 544 — Monotonic Writes Consistency
- Monotonic Writes Consistency
- ensures that a single client's write operations are applied to all replicas in the exact order they were issued, preventing out-of-order updates.
- Lesson 544 — Monotonic Writes Consistency
- Monthly cost
- Lesson 30 — CDN Bandwidth and Cost Estimation
- More network traffic
- Leader sends AppendEntries to all followers
- Lesson 639 — Consensus Cluster Sizing Tradeoffs
- MTBF
- = how long it stays on before burning out (e.
- Lesson 1325 — Availability Formula: MTBF and MTTR Relationship
- MTTR
- = how long it takes you to replace it (e.
- Lesson 1325 — Availability Formula: MTBF and MTTR Relationship
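
The two terms above combine in the standard steady-state formula, availability = MTBF / (MTBF + MTTR). A minimal sketch follows; the function name and the sample figures are illustrative assumptions, not values from the lesson.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the component is working."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical component: fails every 999 hours on average, takes 1 hour to repair.
print(f"{availability(999.0, 1.0):.3%}")  # 99.900%
```

Halving MTTR improves availability about as much as doubling MTBF, which is why fast recovery is often the cheaper lever.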
- Multi-backend support
- Export to multiple platforms simultaneously
- Lesson 1205 — OpenTelemetry Metrics SDK
- Multi-burn-rate
- means calculating *how fast* you're burning your error budget relative to the rate your SLO allows, and alerting at more than one such rate.
- Lesson 1289 — Multi-Window and Multi-Burn-Rate Alerting
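
A burn rate is commonly computed as the observed error rate divided by the error budget (1 minus the SLO target), so a value of 1.0 spends the budget exactly over the SLO period. The sketch below assumes that definition; the function names, the 30-day period, and the sample error rate are illustrative.

```python
def burn_rate(error_rate: float, slo_target: float) -> float:
    """How many times faster than 'sustainable' the error budget is being spent."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("SLO target must be below 100%")
    return error_rate / budget


def days_until_exhausted(rate: float, slo_period_days: float = 30.0) -> float:
    """Days until the whole budget is gone if the current burn rate persists."""
    return slo_period_days / rate


# Hypothetical: 99.9% SLO with 1.44% of requests currently failing.
rate = burn_rate(error_rate=0.0144, slo_target=0.999)
print(round(rate, 1))                        # 14.4
print(round(days_until_exhausted(rate), 2))  # 2.08
```

A multi-burn-rate alert would evaluate this value against several thresholds, each measured over its own window.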
- Multi-Column (Composite) Index
- A single index on `(user_id, status)` together.
- Lesson 280 — Index Merge and Multi-Column Indexes
- multi-datacenter replication
- , provides tunable consistency knobs, and uses vector clocks (or dotted version vectors in newer versions) for conflict resolution.
- Lesson 370 — Distributed Key-Value Store Architectures in PracticeLesson 726 — Multi-Datacenter Replication
- Multi-entity atomicity
- Operations affecting several related records (orders + inventory + payments)
- Lesson 322 — Transaction Requirements and Trade-offs
- Multi-hop reasoning
- "Find medications that treat diseases caused by viruses discovered after 2000"
- Lesson 475 — Knowledge Graphs and Semantic Networks
- Multi-leader replication
- breaks this constraint—multiple nodes can independently accept writes at the same time, then synchronize changes with each other.
- Lesson 1367 — Multi-Leader Replication FundamentalsLesson 1377 — What Are Replication Conflicts?
- Multi-Provider
- Route high-priority emails through one vendor, bulk through another
- Lesson 1690 — Channel Provider Abstraction
- Multi-Region Deployment
- Lesson 950 — Auth Service Single Point of FailureLesson 1331 — Redundancy FundamentalsLesson 1334 — Geographic Redundancy and Multi-Region
- Multi-Shard Transactions
- Lesson 238 — Query Limitations in Sharded Systems
- Multi-Subscriber Support
- Unlike point-to-point queues, multiple independent consumers must read the same stream without interfering.
- Lesson 699 — Event Streaming Platform Requirements
- Multi-tenancy
- where CustomerA and CustomerB share infrastructure but must never access each other's data
- Lesson 860 — Multi-Cluster and Multi-Tenancy
- Multi-Tenancy Built-In
- Pulsar natively supports hierarchical organization:
- Lesson 730 — Apache Pulsar Architecture
- Multi-tenant app
- Shard key = `(tenant_id, user_id)` → keeps tenant data together while balancing users within.
- Lesson 245 — Composite Shard Keys
- Multi-tier caching
- introduces multiple levels of cache between users and your origin storage, each with a specific purpose:
- Lesson 1611 — Multi-Tier Caching Architecture
- Multi-tier setups
- HAProxy instances at different layers (edge, internal services)
- Lesson 112 — HAProxy Overview
- Multi-version maintenance
- You must maintain parallel codebases or clever abstraction layers
- Lesson 1899 — URI Versioning (Path-Based)
- Multi-window
- means observing your error budget consumption across *several* time periods simultaneously (e.
- Lesson 1289 — Multi-Window and Multi-Burn-Rate Alerting
- Multiple client types
- (web, mobile, IoT) needing different data shapes or protocols from the same backend services
- Lesson 879 — When to Introduce an API Gateway
- Multiple clusters
- in different AWS regions for disaster recovery
- Lesson 860 — Multi-Cluster and Multi-Tenancy
- Multiple independent systems
- need to react to the same event
- Lesson 664 — Choosing Between Queue and Pub-Sub
- Multiple perspectives
- Different consumers can process the same events at different times for different purposes—one for real-time alerts, another for weekly reports.
- Lesson 695 — Stream Retention and Replay
- Multiple Script Support
- Store romanized versions (transliterations) alongside native scripts.
- Lesson 1768 — Typeahead for Multi-Language Support
- multiple servers
- , each node only sees its own requests.
- Lesson 978 — Why Distributed Rate Limiting Is HardLesson 1793 — Centralized vs Distributed Rate Limiting
- Multiple sizes
- Create several variants (small, medium, large) to support different UI contexts—grid views, detail pages, mobile screens
- Lesson 1624 — Thumbnail and Preview Generation
- Multiple Thumbnail Sizes
- Lesson 1603 — Thumbnail and Preview Generation
- must
- update the card catalog entry *and* place the book on the shelf before confirming the donation is complete.
- Lesson 134 — Write-Through Caching PatternLesson 495 — Consistency vs Availability in PracticeLesson 571 — Phase 2: Commit PhaseLesson 614 — The Accept PhaseLesson 1309 — Error Budgets: Balancing Reliability and VelocityLesson 1789 — Client-Side vs Server-Side Rate Limiting
- Must-have
- Users can upload and view photos (functional requirement)
- Lesson 18 — Prioritizing Requirements Under Constraints
- Mutations
- Write operations for creating or modifying data (e.
- Lesson 1912 — GraphQL Schema and Resolvers
- Mutual TLS (mTLS)
- extends standard TLS by requiring *both* client and server to present valid certificates and verify each other's identity.
- Lesson 851 — Mutual TLS (mTLS) AuthenticationLesson 953 — Service-to-Service Authentication
N
- N replicas
- , you typically define:
- Lesson 555 — What is a Quorum?Lesson 1371 — Leaderless Replication (Dynamo-Style)
- N+1 query problem
- multiplies network round-trips and database operations.
- Lesson 405 — When Joins Are Required
- N+1 redundancy
- means if you need **N** components to handle your workload, you provision **N+1** — one extra.
- Lesson 1333 — N+1 and N+2 Redundancy
- N+2 redundancy
- takes it further: you provision **N+2** components, surviving up to two simultaneous failures.
- Lesson 1333 — N+1 and N+2 Redundancy
- Namespace Reuse
- A 6-character base62 code gives you about 56 billion combinations.
- Lesson 1504 — Link Expiration and Retention Policies
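
The namespace size is simply the alphabet size raised to the code length, which can be checked directly. The sketch assumes the usual base62 alphabet of digits plus lowercase and uppercase ASCII letters; the function name is hypothetical.

```python
import string

# Assumed base62 alphabet: 0-9, a-z, A-Z.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def namespace_size(length: int, alphabet_size: int = len(ALPHABET)) -> int:
    """Number of distinct codes of exactly `length` characters."""
    return alphabet_size ** length


print(namespace_size(6))  # 56800235584 (about 56 billion)
print(namespace_size(7))  # 3521614606208 (about 3.5 trillion)
```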
- Namespaces and Hierarchy
- Lesson 1209 — Metric Naming Conventions
- Namespacing
- means combining the idempotency key with additional scope identifiers to create a unique composite key:
- Lesson 1017 — Idempotency Key Scope and Namespacing
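
As a rough illustration, the composite key can be modeled as a tuple of scope identifiers plus the client-supplied key. Which identifiers to include is a design choice; the tenant and endpoint fields below are assumptions made for the sketch, as are all names.

```python
from typing import Dict, Tuple

# (tenant_id, endpoint, client-supplied idempotency key): hypothetical scope fields.
ScopedKey = Tuple[str, str, str]


def scoped_key(tenant_id: str, endpoint: str, client_key: str) -> ScopedKey:
    """Combine the client's key with its scope so keys cannot collide across scopes."""
    return (tenant_id, endpoint, client_key)


stored_results: Dict[ScopedKey, str] = {}

# Two tenants happen to send the same client key; the scope keeps them apart.
stored_results[scoped_key("tenant-a", "POST /payments", "abc123")] = "payment-1"
stored_results[scoped_key("tenant-b", "POST /payments", "abc123")] = "payment-2"

print(len(stored_results))  # 2
```

Using a tuple rather than a delimiter-joined string avoids ambiguity when a component itself contains the delimiter.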
- Naming conventions
- (already established) ensure consistency
- Lesson 1216 — Metric Documentation and Discovery
- Native format storage
- Data stays in its original form (Parquet, JSON, logs, images)
- Lesson 758 — Data Lake Fundamentals
- native graph database
- , meaning graphs aren't simulated on top of tables or documents—they're the fundamental storage structure.
- Lesson 460 — Neo4j Architecture OverviewLesson 477 — Index-Free Adjacency and Physical Storage
- Native Graph Storage Engine
- stores nodes, relationships, and properties as separate, fixed-size records on disk.
- Lesson 460 — Neo4j Architecture Overview
- Native integration
- App Mesh integrates directly with AWS Cloud Map for service discovery, AWS Certificate Manager for mTLS certificates, and CloudWatch for metrics, with no separate components to wire together.
- Lesson 864 — AWS App Mesh and Cloud-Native Meshes
- NATS
- focuses on simplicity and performance for cloud-native applications, offering both request-reply and pub-sub with minimal overhead.
- Lesson 665 — Overview of Message Broker LandscapeLesson 673 — NATS and Lightweight Messaging
- Natural data locality
- Related keys (timestamps, alphabetical names) cluster together
- Lesson 1451 — Range-Based Partitioning
- Natural keys
- (`user_id`, `order_id`): Per-key ordering, risk of skew
- Lesson 703 — Partitioning Strategies and Key Selection
- Natural ordering
- Applications that need sorted iteration (leaderboards, time-series analysis, pagination) benefit enormously.
- Lesson 1471 — Range Partitioning Fundamentals
- Natural testing
- Both regions constantly validated under real load
- Lesson 1436 — Active-Passive vs Active-Active DR
- Natural TTL support
- Keys expire automatically, no manual cleanup needed
- Lesson 1807 — In-Memory vs Persistent Storage for Rate Limiting
- naturally idempotent
- they're safe to repeat by their very nature.
- Lesson 1006 — Natural Idempotency vs Engineered IdempotencyLesson 1009 — HTTP Methods and Natural Idempotency
- Need user-priority differentiation
- Use **Multi-Tier** or **Priority Queues**
- Lesson 975 — Algorithm Selection Criteria
- Nest resources
- when the sub-resource **cannot exist without** or is **tightly owned by** the parent:
- Lesson 1878 — Nested Resources and Sub-Resources
- Nested structures
- Documents naturally support hierarchies.
- Lesson 380 — Document Structure and Schema FlexibilityLesson 404 — Mobile and IoT Backend Storage
- Netflix Conductor
- is an orchestration engine that defines sagas as JSON workflows with tasks.
- Lesson 598 — Saga Frameworks and Real-World Adoption
- Network
- 60% bandwidth used, retransmit queue growing (saturation), packet drops (errors)
- Lesson 1264 — USE Method: Utilization, Saturation, Errors
- Network Bandwidth
- The mesh control plane continuously pushes configuration updates to sidecars.
- Lesson 834 — Service Mesh Performance OverheadLesson 1252 — Sampling Strategies OverviewLesson 1403 — Incremental BackupsLesson 1421 — Full Backup Strategy
- Network calls are unreliable
- What was once a guaranteed in-memory function call can now timeout, fail mid-request, or succeed but never return a response.
- Lesson 802 — Distributed System Complexity
- Network conditions
- Lesson 176 — Geographic Routing and Anycast
- Network congestion
- Transferring gigabytes of JSON over HTTP takes minutes, not milliseconds.
- Lesson 1887 — Why Pagination Is Essential at Scale
- Network delay
- Changes must travel over the network from primary to replicas.
- Lesson 208 — Replication Lag: What It Is and Why It Happens
- Network delays
- CDN purges can take seconds to propagate globally
- Lesson 163 — Multi-Level Cache Invalidation
- Network Dependency
- Unlike standalone SQL databases, NewSQL systems rely heavily on network reliability and bandwidth.
- Lesson 336 — NewSQL Tradeoffs
- Network failures
- Communication between shards can fail mid-transaction
- Lesson 261 — Distributed Transactions Across ShardsLesson 271 — Connection Validation and Stale ConnectionsLesson 566 — What is a Distributed Transaction?
- network latency
- .
- Lesson 641 — Consensus in Multi-Region DeploymentsLesson 979 — Centralized vs Decentralized Approaches
- Network locality
- Child nodes can be geographically grouped near their parent
- Lesson 1374 — Tree Replication Topology
- Network overhead
- Shipping spans to collectors saturates bandwidth
- Lesson 1228 — Trace Sampling FundamentalsLesson 1791 — Single Data Center vs Distributed Setup
- network partition
- occurs (communication breaks between nodes), you must choose: wait for consistency (sacrificing availability) or respond immediately with potentially stale data (sacrificing consistency).
- Lesson 481 — What CAP Theorem StatesLesson 482 — Why Partitions Are InevitableLesson 486 — Partition Tolerance ExplainedLesson 506 — CAP in Normal Operation vs PartitionLesson 526 — The Cost of Strong ConsistencyLesson 565 — Quorum Trade-offs and Failure ScenariosLesson 1340 — Split-Brain Problem
- Network partition recovery
- Even extended outages typically resolve within hours
- Lesson 1012 — Idempotency Key Expiration Strategy
- Network partition tolerance
- If your primary region loses internet connectivity, you can still restore from a remote backup location.
- Lesson 1429 — Geographic Backup Distribution
- Network partitions
- Nodes may be unable to communicate with each other
- Lesson 608 — The Problem Paxos SolvesLesson 988 — Testing Distributed Rate LimitersLesson 1342 — Testing Redundancy with Fault InjectionLesson 1377 — What Are Replication Conflicts?
- Network Serialization
- Data must be marshaled from your service into the proxy, then unmarshaled, then re-marshaled to send to the next hop.
- Lesson 841 — Data Plane: Performance and Latency Overhead
- Network timeouts
- Request takes too long, but the service is actually working
- Lesson 1020 — Why Retries Are Necessary in Distributed Systems
- Network/resource heavy
- Reading and writing all data stresses infrastructure
- Lesson 1402 — Full Backups
- never
- have unlimited resources.
- Lesson 18 — Prioritizing Requirements Under ConstraintsLesson 437 — HBase Read Path and Bloom FiltersLesson 609 — Paxos Safety and Liveness GuaranteesLesson 925 — Client Credentials FlowLesson 1573 — Handling Never-Expiring Pastes
- Never break anything
- accumulate technical debt forever, supporting legacy behaviors that slow innovation
- Lesson 1898 — Why API Versioning Matters
- Never log
- passwords, authentication tokens, full credit card numbers, encryption keys, or session tokens.
- Lesson 1160 — Security and Access Control for Logs
- new
- executes the operation and stores the key with the result
- Lesson 1003 — Idempotency KeysLesson 1004 — Server-Side State for IdempotencyLesson 1010 — Idempotency Keys for POST Requests
- New entry requested
- Application tries to cache additional data
- Lesson 145 — What Are Cache Eviction Policies?
- News feed
- You can tolerate stale reads for speed.
- Lesson 520 — Practical PACELC Analysis for Design Decisions
- News feed generation
- Eventual consistency (W=ONE, R=ONE) prioritizes speed over freshness
- Lesson 563 — Tunable Consistency in Practice
- Next allowed fetch time
- Computed as `last_fetch_time + crawl_delay`, prevents fetching too soon
- Lesson 1848 — Politeness Table and Per-Host State
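
The per-host check can be sketched as a small record that derives the next allowed fetch time from the two stored fields. The field names follow the formula above; the class, the `can_fetch` helper, and the use of plain float timestamps are assumptions.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """Politeness state tracked for one host (times in seconds)."""
    last_fetch_time: float
    crawl_delay: float

    @property
    def next_allowed_fetch_time(self) -> float:
        # Computed exactly as described: last_fetch_time + crawl_delay.
        return self.last_fetch_time + self.crawl_delay

    def can_fetch(self, now: float) -> bool:
        """True once the crawl delay has elapsed since the last fetch."""
        return now >= self.next_allowed_fetch_time


state = HostState(last_fetch_time=100.0, crawl_delay=5.0)
print(state.next_allowed_fetch_time)  # 105.0
print(state.can_fetch(now=103.0))     # False
print(state.can_fetch(now=105.0))     # True
```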
- no
- , either wait briefly for it to catch up, or route to the primary (or a more up-to-date replica)
- Lesson 216 — Timestamp-Based Consistency ChecksLesson 351 — Redis Persistence: AOF LogsLesson 957 — Rate Limiting vs Throttling
- No boundary problem
- A user can't exploit window edges to exceed limits (100 requests at 12:59:59, then 100 more at 13:00:00)
- Lesson 968 — Sliding Window Log
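
A minimal in-memory sketch of the technique, using the 100-request limit from the example above: every accepted request's timestamp is logged, and entries older than the window are discarded before each decision. The class name and the single-process `deque` storage are assumptions; a production limiter would more likely keep the log in a shared store.

```python
from collections import deque
from typing import Deque


class SlidingWindowLog:
    """Allow at most `limit` requests in any trailing `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.timestamps: Deque[float] = deque()  # accepted request times, oldest first

    def allow(self, now: float) -> bool:
        cutoff = now - self.window_seconds
        # Drop log entries that have fallen out of the trailing window.
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False


limiter = SlidingWindowLog(limit=100, window_seconds=60.0)
burst = [limiter.allow(now=59.0) for _ in range(100)]
print(all(burst))                # True: the first 100 requests pass
print(limiter.allow(now=60.0))   # False: crossing a clock boundary does not reset the log
print(limiter.allow(now=119.5))  # True: the earlier burst has aged out
```

The cost is memory: one stored timestamp per accepted request per client.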
- No bursts allowed
- Use **Fixed Window Counter** or **Concurrency Limiter** for strict limits
- Lesson 975 — Algorithm Selection Criteria
- No cascade operations
- Deleting a customer won't automatically delete their orders on another shard
- Lesson 262 — Referential Integrity Across Shards
- No causality awareness
- LWW can't distinguish "B happened after seeing A" from "A and B happened independently"
- Lesson 1381 — Limitations of Last-Write-Wins
- No clear pattern
- → FIFO or Random (simpler, often "good enough")
- Lesson 153 — Choosing an Eviction Policy
- No complex routing
- pure subject-based pub-sub (like `orders.
- Lesson 673 — NATS and Lightweight Messaging
- No consistency
- (local memory): fastest, but limits multiply by server count
- Lesson 976 — Rate Limiting State Storage
- No coordination
- needed between servers
- Lesson 1512 — Random String GenerationLesson 1854 — Distributed URL Deduplication
- No coordination needed
- Consumers don't need to know about each other
- Lesson 661 — Competing Consumers Pattern
- No coordination overhead
- means no performance penalty for adding instances
- Lesson 57 — Scaling Stateless Services Horizontally
- No coordination tax
- Everyone works in the same codebase with shared context
- Lesson 820 — When a Monolith is the Right Choice
- No distributed coordination
- Each node tracks its own users independently
- Lesson 982 — Sticky Sessions and Rate Limiting
- No distributed locking
- Each step commits immediately in its own database
- Lesson 585 — Alternative: Saga Pattern Introduction
- No distributed locks needed
- Since events are immutable and append-only, there's no cross-system coordination during writes
- Lesson 586 — Alternative: Event Sourcing for Consistency
- No duplicate cross-service work
- Each service processes the event exactly once (though retries within a service are possible)
- Lesson 663 — Hybrid Patterns: Topic + Queue
- No duplicates allowed
- Each identifier must be unique across all records
- Lesson 299 — Primary Keys and Entity Integrity
- No duplication
- of authentication code across dozens of services
- Lesson 883 — Authentication at the Gateway
- No enduring value
- the problem returns—fixing it once doesn't prevent future occurrences
- Lesson 1311 — Toil: The Enemy of Scale
- No expired entries
- TTL-based removal isn't enough to free space
- Lesson 145 — What Are Cache Eviction Policies?
- No fanout computation
- No need to write to millions of feed timelines
- Lesson 1647 — Fanout-on-Read (Pull Model)
- No health awareness
- DNS doesn't know if a server is down; clients may receive IPs of failed servers
- Lesson 116 — DNS-Based Load Balancing
- No hotspots
- Unlike range-based partitioning, celebrity URLs don't concentrate on one shard
- Lesson 1541 — Sharding and Database Scaling
- No library maintenance
- No need to update SDKs in five different languages when you change retry policy
- Lesson 833 — Polyglot Microservices Support
- No manual expiration needed
- The algorithm self-adjusts based on actual access patterns
- Lesson 1525 — Cache Eviction Policy for URL Shortener
- No message persistence
- by default—messages exist only while in flight
- Lesson 673 — NATS and Lightweight Messaging
- No metadata to synchronize
- Every server can independently compute `jump_hash(key, num_nodes)` and get the same answer.
- Lesson 1467 — Jump Hash: Stateless Alternative
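A minimal Python sketch of the `jump_hash(key, num_nodes)` call mentioned above, following the published jump consistent hash algorithm by Lamping and Veach. The 64-bit integer key is an assumption; string keys would first need to be hashed to an integer.

```python
def jump_hash(key: int, num_nodes: int) -> int:
    """Map a 64-bit integer key to a node index in [0, num_nodes)."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    b, j = -1, 0
    while j < num_nodes:
        b = j
        # 64-bit linear congruential step; the mask emulates uint64 overflow.
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        # Jump forward to the next candidate bucket.
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b
```

Because the result depends only on the key and the node count, any server can compute it without shared metadata. When the node count grows from `n` to `n + 1`, a key either stays on its node or moves to the new node `n`.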
- No nulls allowed
- Every record must have a valid identifier
- Lesson 299 — Primary Keys and Entity Integrity
- No perfect failure detection
- You can't tell if a node crashed or is just slow
- Lesson 608 — The Problem Paxos Solves
- No predefined schema
- You don't declare columns upfront.
- Lesson 380 — Document Structure and Schema Flexibility
- No query optimization
- Document stores lack the sophisticated query planners found in relational databases that can rewrite queries, choose optimal join strategies, or parallelize operations efficiently.
- Lesson 408 — Query Performance Limitations
- No random page access
- You can't jump directly to page 47; you must traverse sequentially
- Lesson 1890 — Keyset Pagination
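To illustrate why keyset pagination has to walk pages in order, here is a small sketch using an in-memory SQLite table. The `posts` table, its columns, and the `fetch_page` helper are hypothetical names chosen for the example.

```python
import sqlite3

def fetch_page(conn, after_id=0, page_size=3):
    """Return one page of rows with id greater than the cursor, plus the next cursor."""
    rows = conn.execute(
        "SELECT id, title FROM posts WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()
    # The cursor is the last id seen; None signals there is nothing more to read.
    next_cursor = rows[-1][0] if rows else None
    return rows, next_cursor

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO posts (id, title) VALUES (?, ?)",
    [(i, f"post {i}") for i in range(1, 8)],
)

page1, cursor = fetch_page(conn)
page2, cursor2 = fetch_page(conn, after_id=cursor)
```

Each page needs the cursor produced by the previous one, which is why there is no direct way to request "page 47" without reading the pages before it.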
- No shared memory
- means no conflicts between instances
- Lesson 57 — Scaling Stateless Services Horizontally
- No single coordinator
- Each shard is a separate database that can't directly enforce atomicity with other shards
- Lesson 261 — Distributed Transactions Across Shards
- No stale data
- You'll never see outdated information, even during failures
- Lesson 493 — CP Systems: Prioritizing Consistency
- No stale reads
- You never read old data after a write has been acknowledged
- Lesson 484 — Consistency in CAP Context
- No synchronization overhead
- Direct in-memory counter updates
- Lesson 1791 — Single Data Center vs Distributed Setup
- No upfront validation
- Any format, any source, dumped into storage
- Lesson 764 — Data Governance and Quality
- No write amplification
- When Bob posts, the system doesn't pre-generate feeds for his 1M followers
- Lesson 1637 — Pull (Read-Time) Feed Model
- Node added
- Existing nodes must reduce their quotas; the new node claims its share
- Lesson 984 — Quota Sharding Across Nodes
- Node C
- might lose network connection and not know what happened
- Lesson 567 — The ACID Problem in Distributed Systems
- Node removed
- Remaining nodes must increase their quotas to maintain the global limit
- Lesson 984 — Quota Sharding Across Nodes
- nodes
- onto a conceptual ring (imagine a clock face numbered 0 to 2^160).
- Lesson 362 — Consistent Hashing for Key-Value StoresLesson 451 — What is a Graph Database?Lesson 452 — Graph Model: Nodes and EdgesLesson 453 — Property Graphs vs RDF TriplesLesson 458 — Use Cases: Fraud Detection and Knowledge GraphsLesson 462 — Creating Nodes and RelationshipsLesson 938 — Relationship-Based Access Control (ReBAC)Lesson 1459 — Clockwise Key Assignment Rule
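A rough sketch of placing nodes on a hash ring and assigning keys clockwise. SHA-1 is used here only because its 160-bit output matches the 0 to 2^160 range in the entry; the `HashRing` class and node names are illustrative assumptions, and virtual nodes are left out for brevity.

```python
import bisect
import hashlib

def ring_position(name: str) -> int:
    """Hash a name to an integer position in [0, 2**160)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

class HashRing:
    def __init__(self, nodes):
        # Sort nodes by their position on the ring.
        self._points = sorted((ring_position(n), n) for n in nodes)
        self._positions = [p for p, _ in self._points]

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's position to the first node at or after it."""
        idx = bisect.bisect_left(self._positions, ring_position(key))
        # Wrap around to the first node when the key lies past the last one.
        return self._points[idx % len(self._points)][1]
```

Removing a node only reassigns the keys that node owned; keys owned by the remaining nodes keep their placement.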
- Non-breaking changes
- allow old clients to continue functioning without modification while new clients can adopt enhancements.
- Lesson 1905 — Breaking vs Non-Breaking Changes
- Non-critical data
- (user timelines, recommendations, view counts): Use AP strategies.
- Lesson 502 — Mixed Strategies: Hybrid Systems
- Non-critical requests
- (analytics, recommendations, notifications): aggressive throttling or temporary rejection
- Lesson 995 — Graceful Degradation Through ThrottlingLesson 1084 — Load Shedding Under Cascading Failure
- Non-functional
- "Serve food within 15 minutes," "Handle 100 customers simultaneously," "Stay open 24/7," "Keep food costs under budget"—these describe *how well* it performs
- Lesson 9 — Functional vs Non-Functional: Core Distinction
- Non-functional requirements
- describe *how well* the system performs its job—the quality attributes that matter but aren't features themselves.
- Lesson 9 — Functional vs Non-Functional: Core DistinctionLesson 13 — Scalability Requirements: Growth Expectations
- non-idempotent
- .
- Lesson 1000 — Idempotent vs Non-Idempotent OperationsLesson 1002 — The Double-Charge Problem
- Non-idempotent operations
- Lesson 998 — What is Idempotency?
- Non-retriable failures
- 400 Bad Request, 401 Unauthorized—these are *your* fault, not the downstream service's.
- Lesson 1057 — Failure Detection and Counting
- Normal operation
- Your distributed database can be both consistent *and* available
- Lesson 504 — Why 'Choose Two' is OversimplifiedLesson 517 — PA/EL Systems: Availability and Latency First
- Normalization
- reduces data redundancy but may slow down read-heavy queries
- Lesson 39 — Trade-offs Over Best PracticesLesson 1738 — Query Processing Flow
- Normalize status codes
- into standard states: `sent`, `delivered`, `read`, `failed`, `bounced`
- Lesson 1693 — Delivery Receipt Tracking
- Normalized
- Optimizes for writes and data integrity—one update affects one row
- Lesson 289 — Normalized vs Denormalized Schema Design
- NoSQL databases
- often require specialized knowledge.
- Lesson 326 — Operational Complexity ConsiderationsLesson 327 — Polyglot Persistence Pattern
- NoSQL requires
- Lesson 323 — Query Pattern Complexity Analysis
- NOT
- in this file.
- Lesson 437 — HBase Read Path and Bloom FiltersLesson 907 — BFF Anti-Patterns and PitfallsLesson 1750 — Boolean Queries and Operators
- Not CAP Available
- = A branch closes its doors (refuses requests) until the courier connection is restored
- Lesson 485 — Availability in CAP Context
- Not complete safety
- Still vulnerable if both primary and that one replica fail simultaneously
- Lesson 217 — Semi-Synchronous Replication Trade-offs
- Not idempotent
- `POST /users` creating a new user without idempotency keys — each call creates a duplicate user with a new ID.
- Lesson 1008 — What Makes an API IdempotentLesson 1875 — HTTP Methods: GET, POST, PUT, DELETE Semantics
- Not idempotent (naturally)
- `POST /wallet/charge` for $10 — each retry charges another $10.
- Lesson 1008 — What Makes an API Idempotent
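The difference an idempotency key makes can be sketched with an in-memory stand-in for the wallet endpoint. `WalletService`, its fields, and the key format are hypothetical; a real service would persist the stored results rather than keep them in a dictionary.

```python
class WalletService:
    def __init__(self):
        self.total_charged = 0
        self._results = {}  # idempotency key -> stored response

    def charge(self, amount, idempotency_key=None):
        # A repeated key returns the stored response instead of charging again.
        if idempotency_key is not None and idempotency_key in self._results:
            return self._results[idempotency_key]
        self.total_charged += amount
        result = {"charged": amount, "total": self.total_charged}
        if idempotency_key is not None:
            self._results[idempotency_key] = result
        return result

wallet = WalletService()
wallet.charge(10)                       # first attempt
wallet.charge(10)                       # blind retry: charges a second time
wallet.charge(10, idempotency_key="k1")
wallet.charge(10, idempotency_key="k1") # retry with the same key: no new charge
```

After these four calls the wallet has been charged three times, not four: the two keyless calls each charged, while the keyed retry reused the stored result.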
- Not retry immediately
- the operation took too long, so instant retry will likely fail again
- Lesson 1115 — Deadline Exceeded Error Handling
- Not strictly RESTful
- Purists argue the resource (`/users/123`) shouldn't change identity based on representation
- Lesson 1899 — URI Versioning (Path-Based)
- Notification delivery
- Does PagerDuty/Slack/email actually arrive?
- Lesson 1295 — Testing Alerts and Dry Runs
- Notification payload
- Include deep-link data so tapping opens the specific post
- Lesson 1681 — Mobile Push Notification Integration
- notification service
- email logic (written in Python)
- Lesson 812 — Developer Cognitive LoadLesson 1067 — Bulkhead Pattern: Isolating Resources to Prevent Total Failure
- Notification state tracking
- means storing each notification's lifecycle status in a database so you can monitor, debug, and audit your entire notification pipeline.
- Lesson 1706 — Notification State Tracking
- Notifications
- are informational messages that provide awareness but don't require urgent response.
- Lesson 1285 — Alert vs Notification
- NTP synchronization
- Keep all servers synchronized using Network Time Protocol, reducing skew to milliseconds.
- Lesson 949 — Clock Skew and Token ValidationLesson 1114 — Clock Skew and Time Synchronization
O
- O(1)
- (constant-time) traversal performance per relationship.
- Lesson 455 — Index-Free AdjacencyLesson 477 — Index-Free Adjacency and Physical Storage
- O(1) memory per user
- instead of O(N) for every request, while maintaining ~99% accuracy of the full sliding window log.
- Lesson 1797 — Sliding Window Counter with Redis
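The constant-memory idea can be sketched without Redis: keep only the current and previous window counts and weight the previous count by how much of it still overlaps the sliding window. This in-memory class is an illustrative stand-in, not the Redis implementation from the lesson; the class and attribute names are assumptions.

```python
class SlidingWindowCounter:
    """Approximate sliding-window limiter storing two counters per user."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.window_index = None
        self.curr_count = 0
        self.prev_count = 0

    def allow(self, now: float) -> bool:
        index = int(now // self.window)
        if index != self.window_index:
            # Roll over; the old count only matters if it is the adjacent window.
            adjacent = self.window_index is not None and index == self.window_index + 1
            self.prev_count = self.curr_count if adjacent else 0
            self.curr_count = 0
            self.window_index = index
        elapsed_fraction = (now % self.window) / self.window
        # Weight the previous window by the portion still inside the sliding window.
        estimated = self.prev_count * (1.0 - elapsed_fraction) + self.curr_count
        if estimated >= self.limit:
            return False
        self.curr_count += 1
        return True
```

One such object per user holds a fixed number of fields regardless of request volume, whereas a sliding window log stores one timestamp per request.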
- O(m)
- where m = prefix length, independent of total vocabulary size.
- Lesson 1758 — Trie Data Structure for Prefix Matching
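A minimal trie sketch showing why a prefix lookup costs O(m): the loop in `has_prefix` visits one node per character of the prefix, no matter how many words are stored. The class and method names are illustrative.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> TrieNode
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def has_prefix(self, prefix: str) -> bool:
        """Follow one edge per character: O(m) for a prefix of length m."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return False
        return True
```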
- O(total_records)
- to **O(page_size)** — constant and predictable.
- Lesson 1887 — Why Pagination Is Essential at Scale
- OAUTHBEARER
- Token-based authentication for modern systems
- Lesson 727 — Kafka Security: Authentication and Encryption
- Object detection
- identifies weapons, drugs, or prohibited items
- Lesson 1629 — Content Moderation at Scale
- Object storage
- (like S3) holds the actual paste content: your large blob data
- Lesson 1552 — Initial Architecture DiagramLesson 1588 — Object Storage vs Block StorageLesson 1593 — Distributed File System Considerations
- Object Storage (S3/Blob)
- stores the actual paste content using the paste ID as the key
- Lesson 1556 — Hybrid Storage: Metadata + Content References
- Object Storage (S3/GCS)
- Cheapest for long-term retention but slow query performance; good for compliance archives
- Lesson 1245 — Trace Storage Backends
- Object storage excels at
- Lesson 1556 — Hybrid Storage: Metadata + Content References
- ObjectId
- A 12-byte unique identifier MongoDB generates automatically for document `_id` fields
- Lesson 390 — BSON Format and Data Types
- Observability
- automatic metrics, logs, and traces
- Lesson 827 — What is a Service Mesh?Lesson 838 — Data Plane: Sidecar Proxy PatternLesson 1313 — Monitoring and Observability for SRE
- Observability Integration
- The mesh automatically emits metrics when timeouts fire, correlating them with service topology and request traces.
- Lesson 1126 — Timeout Configuration in Service Mesh
- Observability tools
- (monitoring, logging, alerting for their services)
- Lesson 794 — Team Autonomy and Ownership
- OData query syntax
- Microsoft's protocol uses URL-encoded expressions:
- Lesson 1893 — Complex Filtering with Query Languages
- Off-peak timing
- Maintenance during low-traffic hours minimizes user impact even if excluded from SLA
- Lesson 1328 — Scheduled Maintenance and Availability Accounting
- Offline users
- Skip real-time push, they'll pull on next app open
- Lesson 1676 — Presence Detection and User StatusLesson 1681 — Mobile Push Notification Integration
- On cache hit
- Return the target URL immediately (fast path!)
- Lesson 1524 — Cache-Aside Pattern for URL Lookups
- On cache miss
- Fetch from database, populate the cache with the result, then return to user
- Lesson 1524 — Cache-Aside Pattern for URL Lookups
- On subsequent requests
- The cached entry serves future requests instantly
- Lesson 1524 — Cache-Aside Pattern for URL Lookups
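The three cache-aside steps above (hit, miss, subsequent requests) can be sketched as follows. A plain dictionary stands in for both the cache and the database, and the `UrlLookup` class and `db_reads` counter are hypothetical names used only to make the behavior observable.

```python
class UrlLookup:
    def __init__(self, database: dict):
        self.database = database  # stand-in for the persistent store
        self.cache = {}           # stand-in for Redis or Memcached
        self.db_reads = 0

    def resolve(self, short_code: str):
        # Cache hit: return immediately without touching the database.
        if short_code in self.cache:
            return self.cache[short_code]
        # Cache miss: read from the database and populate the cache.
        self.db_reads += 1
        target = self.database.get(short_code)
        if target is not None:
            self.cache[short_code] = target
        return target
```

In this sketch unknown codes are not cached, so repeated lookups for a missing code reach the database every time; whether to cache negative results is a separate design decision.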
- On write
- Update database immediately, then invalidate or update the cache entry.
- Lesson 1722 — Real-Time Preference Updates
- on-demand
- when data is actually requested.
- Lesson 131 — Cache-Aside (Lazy Loading) PatternLesson 183 — Pull vs Push CDN ModelsLesson 1539 — QR Code Generation
- Onboarding
- Assign one role instead of configuring dozens of permissions
- Lesson 933 — Role-Based Access Control (RBAC) Fundamentals
- once
- .
- Lesson 753 — Kappa Architecture: Single Processing PathLesson 1883 — Error Response Structure and Consistency
- one
- database, that database becomes your bottleneck—it hits hardware limits just like vertical scaling does.
- Lesson 65 — What is Data Partitioning?Lesson 160 — Preventing Cache StampedeLesson 425 — Tunable Consistency LevelsLesson 599 — What Is Distributed Consensus?Lesson 1527 — Handling Cache Stampede on Popular URLs
- One data center
- SPOF.
- Lesson 1331 — Redundancy FundamentalsLesson 1791 — Single Data Center vs Distributed Setup
- One dominates
- (all counters ≥ the other): The higher one happened after.
- Lesson 562 — Version Vectors and Conflict Detection
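A short sketch of the dominance check for version vectors, representing each vector as a dictionary from node ID to counter. The `compare` function and its return labels are illustrative; missing entries are treated as zero.

```python
def compare(a: dict, b: dict) -> str:
    """Classify two version vectors as equal, ordered, or concurrent."""
    nodes = set(a) | set(b)
    a_ge = all(a.get(n, 0) >= b.get(n, 0) for n in nodes)
    b_ge = all(b.get(n, 0) >= a.get(n, 0) for n in nodes)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a_after_b"    # a dominates: every counter is >= b's
    if b_ge:
        return "b_after_a"
    return "concurrent"       # neither dominates: a conflict to resolve
```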
- One Primary database
- handles ALL write operations (INSERT, UPDATE, DELETE)
- Lesson 199 — Primary-Replica Architecture
- One-time reads
- Data accessed once but never again still gets treated as "recently used"
- Lesson 151 — LRU-K and Advanced LRU Variants
- Online resharding
- migrates data while the system continues serving traffic:
- Lesson 259 — Resharding Strategies: Stop-the-World vs Online
- Online users
- Send real-time WebSocket updates immediately
- Lesson 1676 — Presence Detection and User Status
- Only one succeeds
- Due to consensus, only one client can create a specific lock resource
- Lesson 637 — Distributed Locks via Consensus
- OPA (Open Policy Agent)
- Uses Rego language—more developer-friendly, JSON/YAML-compatible.
- Lesson 936 — ABAC Policy Engines
- Open
- for a configured timeout period, it doesn't immediately close and flood the downstream service with requests.
- Lesson 1047 — The Three States: Half-OpenLesson 1050 — State Transition MechanicsLesson 1052 — Circuit Breaker Reset LogicLesson 1060 — Half-Open State TestingLesson 1803 — Handling Redis Failures and Fallbacks
- Open → Half-Open
- Wait through the timeout window, confirm the breaker allows test requests
- Lesson 1065 — Testing Circuit Breaker Behavior
- Open circuit
- All requests fail immediately—**no retries attempted**
- Lesson 1030 — Combining Retries with Circuit Breakers
- Open Graph metadata
- is a protocol (popularized by Facebook) where websites embed meta tags in their HTML:
- Lesson 1538 — Link Preview and Metadata
- Open rates
- Email might cost 10× more than push but deliver 5× better engagement
- Lesson 1694 — Channel Costs and Economics
- Open state
- After threshold failures, stop sending requests entirely (fail fast)
- Lesson 105 — Graceful Degradation and Circuit BreakingLesson 889 — Circuit Breaking and FallbacksLesson 1046 — The Three States: OpenLesson 1051 — Fast-Fail BehaviorLesson 1064 — Monitoring and Metrics
- Open-source solutions
- offer deep customization through plugins, Lua scripting, or custom filters.
- Lesson 900 — Open-Source vs Managed Gateway Tradeoffs
- Opened
- The user interacted with the notification by opening or viewing it.
- Lesson 1724 — Notification Analytics Events
- Opens
- to stop the flow (rejects requests immediately)
- Lesson 1044 — The Electrical AnalogyLesson 1064 — Monitoring and Metrics
- OpenTelemetry
- with vendor backends minimizes self-hosting burden.
- Lesson 1208 — Choosing a Metrics System for Your ScaleLesson 1240 — OpenTelemetry Overview
- OpenTelemetry Logs
- An emerging standard that unifies logs with traces and metrics, providing consistent correlation IDs and context propagation across your observability stack.
- Lesson 1136 — Logging Libraries and Standards
- Operation status
- is it processing, complete, or failed?
- Lesson 1004 — Server-Side State for Idempotency
- Operation-based CRDTs (CmRDTs)
- Replicas send operations (add, remove) that are commutative—order doesn't matter.
- Lesson 538 — Conflict-Free Replicated Data Types (CRDTs)Lesson 1384 — Conflict-Free Replicated Data Types (CRDTs)
- Operational burden
- Routine tasks like checking replication lag, rotating credentials, tuning query performance, or investigating alerts multiply by the number of shards.
- Lesson 264 — Operational Complexity of Sharded SystemsLesson 803 — Operational Overhead
- Operational control
- Libraries require redeployment for policy changes; service meshes update configurations without redeploying apps
- Lesson 830 — Service Mesh vs Library-Based Solutions
- Operational knowledge
- Teams must understand Kubernetes, service meshes, API gateways, and other distributed system tools
- Lesson 803 — Operational Overhead
- Operational learnings
- After a few incidents, you realize your 99.
- Lesson 1284 — Iterating on SLIs and SLOs
- Operational needs
- How far back do you realistically investigate incidents?
- Lesson 1165 — Log Retention Policies
- Operational overhead
- handling support tickets
- Lesson 1002 — The Double-Charge ProblemLesson 1436 — Active-Passive vs Active-Active DR
- Operational Separation
- Rate limiting requires fast, shared state (who made how many requests?
- Lesson 1782 — Rate Limiter Service Overview
- Operational Transformation
- solves this by transforming each user's operation against concurrent operations from others, preserving everyone's *intent* even when the document state has changed.
- Lesson 1385 — Operational Transformation
- Operational Transformation (OT)
- Transforms operations based on concurrent changes.
- Lesson 1579 — Collaborative Editing and Real-Time Updates
- Operations
- are basic: set a value, get a value, delete a value
- Lesson 338 — What is a Key-Value Store?Lesson 672 — Redis as a Lightweight Message BrokerLesson 727 — Kafka Security: Authentication and Encryption
- operator
- is a template for a specific type of work.
- Lesson 767 — Airflow Operators and ExecutorsLesson 1892 — Filtering Query Parameters
- Oplog (Operations Log)
- Lesson 206 — Replication Logs and Mechanisms
- Opsgenie
- , and **VictorOps** centralize alerting, escalation, and communication—but the real power comes from *automation*.
- Lesson 1305 — On-Call Tooling and Automation
- Optimistic Locking
- Use Redis `WATCH` to detect concurrent modifications and retry.
- Lesson 981 — Race Conditions in Distributed Counters
- Optimistic UI update
- The client immediately shows your post in the feed (before server confirmation), then reconciles if the server response differs.
- Lesson 1678 — Read-After-Write Consistency
- Optimized Proxy Implementations
- Modern proxies like Envoy are written in C++ and highly optimized for throughput
- Lesson 841 — Data Plane: Performance and Latency Overhead
- Optimized Strategy
- Lesson 1749 — Query Processing and Term Intersection
- Option A (CP)
- Wait 5 seconds while the system ensures all 2 billion users see consistent data before confirming your post
- Lesson 497 — Social Media and Content Feeds (AP)
- Option B (AP)
- Your post succeeds instantly, and it gradually propagates to followers over the next few seconds
- Lesson 497 — Social Media and Content Feeds (AP)
- Optional
- Personalized recommendations, reviews, related products
- Lesson 1083 — Graceful Degradation Strategies
- Optional Features
- Expiration times, syntax highlighting, privacy controls
- Lesson 1542 — Pastebin System Overview
- Orchestration
- centralizes the workflow in one orchestrator service.
- Lesson 592 — Choreography vs Orchestration Tradeoffs
- Order Management
- Handles customer orders, validation, pricing
- Lesson 815 — Domain-Driven Design and Bounded Contexts
- Order processing acceptable later
- Eventual consistency meets your needs
- Lesson 654 — When to Use Async vs Sync
- Ordered
- New entries append to the end, improving B-tree index performance
- Lesson 1520 — Primary Key Selection: Auto-Increment vs UUID
- ordering
- and **durability**—if a replica crashes, it can resume from its last known position without missing changes.
- Lesson 206 — Replication Logs and MechanismsLesson 693 — The Commit Log Abstraction
- Ordering (often)
- Many queues preserve message order, delivering them in the sequence they were sent.
- Lesson 647 — Message Queue Basics
- Ordering and Prioritization
- Not all URLs are equal.
- Lesson 1838 — URL Frontier: Definition and Purpose
- Ordering guarantees
- Sequential nodes in ZooKeeper or versioned keys in etcd ensure fairness
- Lesson 637 — Distributed Locks via ConsensusLesson 699 — Event Streaming Platform RequirementsLesson 1671 — Real-Time Requirements for Social Feeds
- Orders of magnitude checks
- Round to the nearest power of 10 (100, 1,000, 10,000)
- Lesson 32 — Rounding and Approximation Techniques
- Orders Service
- owns order records—it's responsible for order lifecycle and status
- Lesson 817 — Identifying Service Boundaries by Data Ownership
- Orders Use SQL
- Lesson 330 — Real-World Decision Examples
- Organizational Features
- Allow users to tag pastes, create folders, or assign categories.
- Lesson 1578 — User Accounts and Paste Management
- Organizational Maturity
- Do you have mature DevOps practices?
- Lesson 826 — Decision Framework for Microservices Adoption
- Organizational trust
- (freedom to make decisions within guidelines)
- Lesson 794 — Team Autonomy and Ownership
- Origin cache
- (in-memory or SSD layer before object storage) handles the final backstop, reducing actual disk/object-store reads.
- Lesson 1611 — Multi-Tier Caching Architecture
- Origin caches
- protect your object storage from direct requests
- Lesson 1611 — Multi-Tier Caching Architecture
- Origin offloading
- The cryptographic operations (encryption, decryption, certificate validation) are CPU-intensive.
- Lesson 187 — SSL/TLS Termination at the EdgeLesson 1609 — Why CDNs Are Essential for Media Hosting
- Origin overload
- Millions of requests for popular media crush your origin infrastructure
- Lesson 1609 — Why CDNs Are Essential for Media Hosting
- Origin servers
- are your application's home base—they hold the original, authoritative version of your content (images, videos, JavaScript files, etc.).
- Lesson 170 — CDN Architecture: Edge Servers and OriginLesson 1630 — Live Streaming Architecture
- Origin Shield
- is an additional caching layer positioned between your edge PoPs and your origin server.
- Lesson 179 — Origin Shield: Protecting Origin ServersLesson 182 — Cache Hierarchies and Tiered CachingLesson 1569 — CDN Integration for Paste DeliveryLesson 1614 — Origin Shield PatternLesson 1630 — Live Streaming Architecture
- Origin Shield Pattern
- the shield validates tokens before hitting origin storage, preventing unauthorized access from reaching deeper layers.
- Lesson 1615 — Signed URLs and Token-Based Access
- Orphaned records
- An order might reference a deleted customer
- Lesson 262 — Referential Integrity Across ShardsLesson 300 — Foreign Keys and Referential Integrity
- Other Applications
- Lesson 739 — Stream Processing Use Cases
- Outages and downtime
- Your server crashes because it can't handle the traffic spike
- Lesson 2 — Why System Design Matters
- Outbound policies
- transform responses, add headers, or cache results
- Lesson 899 — Azure API Management Features
- Outbox pattern
- Instead of publishing directly, write the message to an "outbox" table in the same database transaction as your business data.
- Lesson 688 — Transactional Semantics
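A minimal sketch of the outbox write using SQLite, assuming hypothetical `orders` and `outbox` tables. The point is that the business row and the message row are committed, or rolled back, together; a separate relay process (not shown) would later read the outbox and publish to the broker.

```python
import json
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.execute(
        "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, topic TEXT, payload TEXT)"
    )
    return conn

def place_order(conn, order_id: int, total: float) -> None:
    # "with conn" commits both inserts on success and rolls both back on error.
    with conn:
        conn.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                     ("order_placed", json.dumps({"order_id": order_id, "total": total})))
        conn.execute("INSERT INTO orders (id, total) VALUES (?, ?)", (order_id, total))
```

In this sketch the outbox row is written first on purpose: if the order insert then fails (for example on a duplicate ID), the rollback removes the already-written message too.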
- Outlier detection
- (also called "ejection") monitors success rates and response times.
- Lesson 852 — Circuit Breaking at the Mesh Level
- Over-isolation
- Creating 20 separate thread pools for similar internal services fragments your available resources
- Lesson 1076 — Bulkhead Tradeoffs: Complexity and Resource Overhead
- Over-provisioning required
- Must buy capacity for peak traffic, even if idle 99% of the time
- Lesson 108 — Hardware vs Software Load Balancers
- Overage buffers
- (from the previous lesson): Allow small temporary violations locally, knowing global state will converge.
- Lesson 987 — Multi-Region Rate Limiting Challenges
- Overwhelm database write capacity
- , causing latency spikes or timeouts
- Lesson 1654 — Fanout Rate Limiting
- Overwhelm storage backends
- with millions of unique span combinations
- Lesson 1258 — Cardinality Explosion
- Overwrite conflicts
- Once the divergence point is found, the leader sends all missing entries from that point forward.
- Lesson 629 — Log Inconsistencies and Repair
- Ownership Index
- Lesson 1563 — Indexing for Ownership and Search
- Ownership Tracking
- Add a `user_id` foreign key to your paste metadata table.
- Lesson 1578 — User Accounts and Paste Management
P
- P0 (Critical)
- Complete service outage or major revenue loss.
- Lesson 1298 — Incident Severity Levels and Escalation
- P1 (High)
- Significant degradation affecting many users.
- Lesson 1298 — Incident Severity Levels and Escalation
- P2 (Medium)
- Partial feature failure or isolated user impact.
- Lesson 1298 — Incident Severity Levels and Escalation
- P4 (Trivial)
- Cosmetic issues, documentation fixes.
- Lesson 1298 — Incident Severity Levels and Escalation
- P95 latency
- (not average response time) → reveals tail latencies affecting users
- Lesson 1215 — Avoiding Vanity Metrics
- P95 or P99 latencies
- from production metrics (as covered in Adaptive Timeouts Based on Historical Latency).
- Lesson 1118 — Per-Operation Timeout Configuration
- P99
- 99% complete faster — but 1 in 100 naturally takes longer
- Lesson 1093 — The P99 Problem with TimeoutsLesson 1188 — Percentiles and Tail Latencies
- PA/EC
- Some DNS systems (available during partition, but consistent when stable)
- Lesson 515 — PACELC Framework ExplainedLesson 519 — PA/EC Systems: Mixed Strategies
- PA/EL
- Cassandra, DynamoDB (available during partition, low-latency during normal ops)
- Lesson 515 — PACELC Framework ExplainedLesson 517 — PA/EL Systems: Availability and Latency First
- PACELC framework
- , a **PC/EC system** makes consistency its top priority in *both* scenarios:
- Lesson 518 — PC/EC Systems: Consistency Always
- PageRank
- measures importance by assuming that connections from important nodes carry more weight.
- Lesson 468 — Graph Algorithms: PageRank and CentralityLesson 1755 — Relevance Tuning: Boosting and SignalsLesson 1844 — Front Queue: Priority Management
- PageRank or link popularity
- (more incoming links = higher priority)
- Lesson 1839 — FIFO vs Priority-Based Frontier
- PagerDuty
- , **Opsgenie**, and **VictorOps** centralize alerting, escalation, and communication—but the real power comes from *automation*.
- Lesson 1305 — On-Call Tooling and Automation
- Parallel Execution
- Workers independently fetch each batch, then write the post reference to each follower's feed cache or storage.
- Lesson 1652 — Fanout Worker Parallelization
- Parallel processing
- Score multiple documents simultaneously across CPU cores
- Lesson 1741 — Search Latency and Response Time
- Parallel Upload
- Each part uploads with its part number and upload ID
- Lesson 1586 — Multipart Upload for Large Files
- Parallelism
- Multiple consumers can read from different partitions simultaneously, dramatically increasing throughput.
- Lesson 701 — Topics and Partitions
- Parent nodes
- Pairs of child hashes are combined and hashed together
- Lesson 376 — Anti-Entropy with Merkle Trees
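How parent nodes are built from pairs of child hashes can be sketched as below. SHA-256 and the convention of duplicating the last hash on odd-sized levels are assumptions made for this example; real systems differ on both.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list) -> bytes:
    """Hash each leaf, then repeatedly hash adjacent pairs until one root remains."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumed convention: duplicate the last hash
        # Each parent is the hash of its two children concatenated.
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

Two replicas with identical data produce identical roots, so comparing a single hash is enough to tell whether any deeper comparison is needed.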
- Parent-child links
- show synchronous dependencies (blocking calls)
- Lesson 1232 — Span Relationships and Hierarchy
- Pareto Principle
- or **80-20 rule**: roughly 80% of your requests hit only 20% of your URLs.
- Lesson 1502 — Cache Memory Requirements
- Parse
- the text into a syntax tree
- Lesson 286 — Prepared Statements and Query CachingLesson 1151 — The ELK Stack: LogstashLesson 1732 — Crawling and Document Collection
- Parse provider-specific payloads
- to extract status information
- Lesson 1693 — Delivery Receipt Tracking
- Parse the HTML
- for Open Graph tags (og:title, og:description, og:image)
- Lesson 1538 — Link Preview and Metadata
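A small sketch of the parsing step using Python's standard `html.parser` module. The `OpenGraphParser` class and `extract_open_graph` helper are illustrative names; fetching the page and handling malformed markup are out of scope here.

```python
from html.parser import HTMLParser

class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> tags into a dict."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        content = attr.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first occurrence if a property is repeated.
            self.tags.setdefault(prop, content)

def extract_open_graph(html: str) -> dict:
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.tags
```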
- Partial availability
- recognizes that systems can be partially functional.
- Lesson 1329 — Partial Availability and Graceful Degradation
- Partial Failure Handling
- If one shard times out or fails, decide whether to return incomplete results or fail the query
- Lesson 1780 — Distributed Query Coordination
- Partial failures
- What if the CDN invalidation fails but application cache succeeds?
- Lesson 163 — Multi-Level Cache InvalidationLesson 566 — What is a Distributed Transaction?Lesson 1065 — Testing Circuit Breaker BehaviorLesson 1651 — Asynchronous Fanout Processing
- Partial failures become normal
- In a monolith, the whole system is either up or down.
- Lesson 802 — Distributed System Complexity
- Partial restores
- Recover individual files or database records
- Lesson 1408 — Backup Verification and Testing
- Partial Result Collection
- Each shard returns its local top-K results with scores
- Lesson 1780 — Distributed Query Coordination
- Partial Success Handling
- Lesson 1656 — Fanout Failure Handling
- Partial success tracking
- mark completed tasks so retries skip them
- Lesson 777 — Workflow Orchestration Patterns
- Participant logs
- Lesson 574 — Recovery Protocols and Logs
- partition
- (network split), these systems continue accepting reads and writes from all sides of the split, even though different replicas might temporarily disagree.
- Lesson 517 — PA/EL Systems: Availability and Latency FirstLesson 1806 — Rate Limiting with Consistent HashingLesson 1865 — Distributed URL Frontier Architecture
- Partition by request type
- Route celebrity writes through a specialized write path
- Lesson 1483 — Celebrity User Problem
- Partition Configuration
- More partitions enable higher parallelism and throughput, but increase coordination overhead and rebalancing time.
- Lesson 724 — Kafka Performance Tuning
- Partition Followers
- When a post is created, the fanout service queries the follow graph and splits the follower list into chunks (e.
- Lesson 1652 — Fanout Worker Parallelization
- partition key
- (sometimes called the row key).
- Lesson 420 — Cassandra Overview and Data ModelLesson 421 — Partitioning with Partition KeysLesson 422 — Clustering Columns and Row OrderingLesson 423 — Primary Key ComponentsLesson 728 — AWS Kinesis OverviewLesson 1472 — Range Partition Key Selection
- Partition Keys
- Shard counters by node ID, aggregate periodically—trades accuracy for reduced contention.
- Lesson 981 — Race Conditions in Distributed Counters
- Partition pruning
- organizing data by time ranges so queries only scan relevant chunks
- Lesson 760 — Data Warehouse ArchitectureLesson 1473 — Range Partitioning Benefits
- Partition quotas
- Divide the 100 requests/minute into 40 for US, 30 for EU, 30 for Asia.
- Lesson 1804 — Multi-Region Rate Limiting Challenges
- partition tolerance
- (from CAP theorem context), but you lose automatic consistency guarantees.
- Lesson 377 — Eventual Consistency and Application ReconciliationLesson 486 — Partition Tolerance ExplainedLesson 491 — CAP Theorem's Original PaperLesson 493 — CP Systems: Prioritizing ConsistencyLesson 500 — DNS Systems (AP)
- Partition-Availability / Else-Consistency
- systems that choose availability when partitions occur, but prioritize consistency over low latency when the network is healthy.
- Lesson 519 — PA/EC Systems: Mixed Strategies
- Partition-aware backfill
- Only reprocess affected date/time partitions
- Lesson 777 — Workflow Orchestration Patterns
- Partition-specific alerts
- (Partition 7's disk is 90% full, but Partition 3 is fine)
- Lesson 1492 — Operational Complexity of Partitioning
- Partitioned
- into logical chunks (user IDs 1-1000, 1001-2000)
- Lesson 1447 — Partitioning vs Sharding vs Replication
- partitioning
- and **sharding** describe the same fundamental concept—splitting data across multiple storage locations.
- Lesson 67 — Partitioning vs Sharding TerminologyLesson 68 — What is Data Replication?Lesson 70 — Partitioning and Replication TogetherLesson 652 — Message Ordering ConsiderationsLesson 1447 — Partitioning vs Sharding vs ReplicationLesson 1519 — Database Schema for URL ShortenerLesson 1746 — Index Construction at Scale
- partitions
- (shards) the keyspace.
- Lesson 360 — What Makes a Key-Value Store DistributedLesson 700 — Kafka Overview and Core ComponentsLesson 701 — Topics and Partitions
- partitions are rare
- .
- Lesson 514 — Beyond CAP: The Need for PACELCLesson 519 — PA/EC Systems: Mixed Strategies
- partitions are rare events
- , not the normal state.
- Lesson 489 — CAP During Normal OperationLesson 504 — Why 'Choose Two' is Oversimplified
- Partner API
- requests `/user/profile` → Gateway maps internal field names (`user_id` → `externalId`) and adds partner-specific metadata
- Lesson 875 — Client-Specific API Composition
- Partner API BFF
- Enforces third-party rate limits and compliance requirements
- Lesson 902 — Backend-for-Frontend (BFF) Pattern Overview
- Pass reduced timeout
- Forward the smaller budget in request headers/context
- Lesson 1119 — Timeout Budget Management Across Service Chains
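A rough sketch of forwarding a reduced budget to the next service. The `X-Timeout-Ms` header name and the fixed `reserve_ms` safety margin are assumptions made for illustration, not a standard.

```python
def outgoing_timeout(budget_ms: int, elapsed_ms: int, reserve_ms: int = 50) -> int:
    """Budget left for the downstream call after local work and a safety margin."""
    remaining = budget_ms - elapsed_ms - reserve_ms
    return max(remaining, 0)

def build_headers(budget_ms: int, elapsed_ms: int) -> dict:
    # Hypothetical header carrying the smaller budget to the next hop.
    return {"X-Timeout-Ms": str(outgoing_timeout(budget_ms, elapsed_ms))}
```

A remaining budget of zero is a signal to fail fast rather than make a downstream call that cannot finish in time.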
- Passive checks
- have zero overhead since they only monitor existing traffic.
- Lesson 99 — Active vs Passive Health Checks
- Passive health checks
- are more observational—the load balancer watches actual user requests as they happen.
- Lesson 99 — Active vs Passive Health Checks
- Password reset tokens
- Sequential IDs make brute-force attacks trivial
- Lesson 1515 — Short URL Predictability Tradeoffs
- PATCH
- Partially updates a resource.
- Lesson 1009 — HTTP Methods and Natural IdempotencyLesson 1906 — Semantic Versioning for APIs
- Path or method
- "Service A can POST to /orders but not DELETE"
- Lesson 854 — Request-Level Authorization
- Path-based models
- Store resource paths like `/org/proj/repo` and match prefixes
- Lesson 939 — Permission Inheritance and Hierarchies
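A minimal sketch of prefix matching on resource paths, assuming grants are stored as plain path strings. The function names are hypothetical. The check compares whole path segments so that a grant on `/org/proj` does not accidentally cover `/org/project2`.

```python
def is_within(granted_path: str, resource_path: str) -> bool:
    """True if resource_path equals granted_path or lies beneath it."""
    granted = granted_path.rstrip("/")
    resource = resource_path.rstrip("/")
    # Appending "/" forces the match to end on a segment boundary.
    return resource == granted or resource.startswith(granted + "/")


def is_allowed(grants: list, resource_path: str) -> bool:
    """A resource is allowed if any granted path is an ancestor of it (or the same path)."""
    return any(is_within(grant, resource_path) for grant in grants)
```

With a grant on `/org/proj`, the repository `/org/proj/repo` inherits access, while the parent `/org` does not.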
- Path-based routing
- Route `/api/*` to one set of servers, `/images/*` to another
- Lesson 113 — Cloud Load Balancers (AWS ELB/ALB)
- Pattern matching
- Use wildcards or regular expressions
- Lesson 658 — Topic Subscriptions and FilteringLesson 667 — RabbitMQ Exchange TypesLesson 1145 — Sensitive Data in Structured LogsLesson 1892 — Filtering Query Parameters
- Paxos
- and **Raft** are battle-tested algorithms that provide this agreement.
- Lesson 527 — Consensus and Strong ConsistencyLesson 636 — Consensus for Leader ElectionLesson 638 — Configuration Management with Consensus
- Pay-per-use
- (CloudFront, Fastly): charge per GB transferred and per request
- Lesson 191 — CDN Provider Feature Comparison
- Payment processing
- (CP): Block and wait for cross-region confirmation, even if it takes 500ms, to ensure no double charges
- Lesson 510 — Real Systems: Multi-Region Trade-offsLesson 1001 — Side Effects and Idempotency
- payment service
- charges the card, and the **shipping service** creates a label.
- Lesson 576 — When 2PC is Used in PracticeLesson 812 — Developer Cognitive LoadLesson 816 — Single Responsibility at Service LevelLesson 1067 — Bulkhead Pattern: Isolating Resources to Prevent Total Failure
- Payments
- team can deploy their fraud detection improvements on Tuesday morning while the **User Profile** team deploys avatar updates Thursday afternoon—completely independently.
- Lesson 791 — Independent Deployability
- PC
- During a partition, it sacrifices **availability** to maintain consistency
- Lesson 518 — PC/EC Systems: Consistency Always
- PC/EC
- Traditional RDBMS with sync replication (consistent always, higher latency)
- Lesson 515 — PACELC Framework Explained
- PC/EC system
- makes consistency its top priority in *both* scenarios:
- Lesson 518 — PC/EC Systems: Consistency Always
- PCollections
- Immutable datasets (bounded for batch, unbounded for streams)
- Lesson 772 — Apache Beam Programming Model
- PDP evaluates policies
- using RBAC, ABAC, or other models
- Lesson 941 — Policy Decision Points (PDP) and Enforcement Points (PEP)
- PDP returns decision
- allow or deny
- Lesson 941 — Policy Decision Points (PDP) and Enforcement Points (PEP)
- Peak traffic multiplier
- Add 2-3× buffer for traffic spikes
- Lesson 1499 — Bandwidth Requirements for Redirects
- Peak traffic multipliers
- help you account for these surges.
- Lesson 24 — Peak Traffic MultipliersLesson 26 — Bandwidth Estimation from Data Size
- PEP enforces
- proceeds or rejects the request
- Lesson 941 — Policy Decision Points (PDP) and Enforcement Points (PEP)
- PEP extracts context
- user identity, requested resource, action
- Lesson 941 — Policy Decision Points (PDP) and Enforcement Points (PEP)
- PEP queries PDP
- "Can user X perform action Y on resource Z?
- Lesson 941 — Policy Decision Points (PDP) and Enforcement Points (PEP)
- Per-account alone
- Doesn't prevent one rogue user within that account from monopolizing shared quota.
- Lesson 991 — Hierarchical Rate Limiting
- Per-dependency circuit breakers
- maintain **separate, independent circuit breakers for each downstream service**.
- Lesson 1063 — Per-Dependency Circuit Breakers
- Per-endpoint limits
- The expensive `/search` endpoint allows 100 requests/minute, while `/health` is unlimited
- Lesson 885 — Rate Limiting and Throttling
- Per-IP
- Rate limit based on the client's source IP address
- Lesson 989 — Per-User vs Per-IP Rate Limiting
- Per-key sequential consistency
- means that all operations on a *single key* appear to execute in some sequential order that all clients agree on, but operations on *different keys* can be reordered independently.
- Lesson 552 — Per-Key Sequential Consistency
- Per-Operation-Type
- Different operations (payment, refund, transfer) use separate namespaces, allowing key reuse across different actions.
- Lesson 1017 — Idempotency Key Scope and Namespacing
- Per-partition ordering
- lets you scale horizontally—use multiple partitions with multiple consumers, each processing their partition in order.
- Lesson 685 — Message Ordering Guarantees
- Per-region quotas
- Divide the global limit by the number of regions (for example, 1000 req/min across 3 regions → about 333 per region).
- Lesson 987 — Multi-Region Rate Limiting Challenges
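The arithmetic behind a static split can be sketched as follows. The region names are placeholders and the even split is an assumption; a real deployment might weight regions by traffic instead.

```python
def per_region_quota(global_limit: int, regions: list) -> dict:
    """Split a global limit evenly; integer division keeps the total at or below the limit."""
    if not regions:
        raise ValueError("at least one region is required")
    share = global_limit // len(regions)
    return {region: share for region in regions}


# Hypothetical region names, used only for illustration.
quotas = per_region_quota(1000, ["us-east", "eu-west", "ap-south"])
```

Rounding down leaves one request per minute unused in this example, which errs on the side of never exceeding the global limit.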
- Per-Route Granularity
- Define different timeouts for specific endpoints (e.
- Lesson 1126 — Timeout Configuration in Service Mesh
- Per-service limits
- Control how many notifications each calling service can send (e.
- Lesson 1697 — API Layer and Rate Limiting
- Per-shard metrics
- Identify if specific database shards are slower
- Lesson 1657 — Measuring Fanout Performance
- Per-Tenant
- In multi-tenant systems, scope by tenant ID to isolate entire organizations.
- Lesson 1017 — Idempotency Key Scope and Namespacing
- Per-User
- Rate limit based on authenticated user credentials (user ID, API key, OAuth token)
- Lesson 989 — Per-User vs Per-IP Rate LimitingLesson 991 — Hierarchical Rate LimitingLesson 1017 — Idempotency Key Scope and Namespacing
- Per-user alone
- One wealthy account with 1,000 users could overwhelm your system if each user maxes out their limit simultaneously.
- Lesson 991 — Hierarchical Rate Limiting
- Per-Worker Rate Limiting
- Each fanout worker limits its own write throughput (e.
- Lesson 1654 — Fanout Rate Limiting
- Percentile analysis
- If 99% of users make <100 req/min but your limit is 50, you're blocking normal behavior
- Lesson 997 — Testing and Monitoring Rate Limiters
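A small sketch of this kind of check, assuming you already have one requests-per-minute figure per user. It uses the nearest-rank percentile; both function names are made up for the example.

```python
import math


def percentile(values: list, p: int) -> float:
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def share_blocked(values: list, limit: int) -> float:
    """Fraction of users whose observed rate exceeds the configured limit."""
    return sum(1 for v in values if v > limit) / len(values)
```

If the 99th percentile sits well above the configured limit, or `share_blocked` is far from zero, the limit is probably cutting into ordinary usage.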
- Perfect accuracy
- Global limits are enforced precisely since all nodes see the same state
- Lesson 979 — Centralized vs Decentralized ApproachesLesson 1796 — Sliding Window Log in RedisLesson 1801 — Local Caching for Performance
- Perfect collision avoidance
- Sequential IDs never collide
- Lesson 1516 — Counter-Based vs UUID Approaches
- Perfect for horizontal scaling
- Any server can verify any token independently
- Lesson 916 — Session vs Token Tradeoffs
- performance
- requirements.
- Lesson 24 — Peak Traffic MultipliersLesson 112 — HAProxy OverviewLesson 118 — SSL/TLS Termination at Load BalancersLesson 155 — Cache Invalidation ProblemLesson 978 — Why Distributed Rate Limiting Is HardLesson 980 — Redis-Based Distributed Rate LimitingLesson 1164 — Sampling for High-Volume LogsLesson 1367 — Multi-Leader Replication Fundamentals (+1 more)
- Performance anomalies
- Slow queries, timeout warnings, resource exhaustion signals.
- Lesson 1129 — What to Log vs What Not to Log
- Performance benefits
- Not waiting for cross-region coordination means lower latency
- Lesson 532 — Why Eventual Consistency ExistsLesson 891 — SSL/TLS Termination
- Performance ceiling
- Limited by host OS and general-purpose CPU
- Lesson 108 — Hardware vs Software Load Balancers
- Performance cost
- Coordinating across shards requires multiple round-trips and locks, destroying the performance benefits you sharded for in the first place
- Lesson 261 — Distributed Transactions Across Shards
- performance degradation
- from coordination protocols
- Lesson 509 — Latency: The Hidden Cost of CAPLesson 1491 — Data Skew and Cardinality Issues
- Performance impact
- Migration consumes I/O, CPU, and network resources
- Lesson 258 — Resharding and Data Migration
- Performance Metrics
- Lesson 1825 — Monitoring and Analytics Per Tenant
- Performance optimization
- Route users to datacenters with available capacity
- Lesson 117 — Global Server Load Balancing (GSLB)Lesson 196 — Multi-CDN StrategiesLesson 795 — Independent ScalingLesson 1262 — What is Monitoring and Why It Matters
- Performance optimizations
- Naive implementations perform poorly; optimizations add layers of complexity
- Lesson 617 — Why Paxos Is Difficult in Practice
- Performance penalty
- Multi-partition transactions are 10-100x slower than single-partition operations
- Lesson 1489 — Cross-Partition Transactions
- performance requirements
- (which focus on *how fast* your system responds right now), scalability asks: *"What happens when we go from 100 users to 100,000 users?"*
- Lesson 13 — Scalability Requirements: Growth ExpectationsLesson 19 — Why Back-of-the-Envelope Estimation Matters
- Performance tuning
- Mobile BFF can implement aggressive caching; web BFF can prioritize real-time updates
- Lesson 904 — BFF vs Single Gateway Tradeoffs
- Performant
- (loads quickly, even for users following thousands of accounts)
- Lesson 1632 — Functional Requirements: Core Feed Features
- Periodic polling
- Every 30-60 seconds while app is active
- Lesson 1671 — Real-Time Requirements for Social Feeds
- Periodic Reports and Analytics
- Lesson 738 — Batch Processing Use Cases
- Periodically sync to Redis
- Flush accumulated counts every 100-500ms, or when local count reaches a threshold
- Lesson 1801 — Local Caching for Performance
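The flush rule (time interval or count threshold, whichever comes first) might look like the sketch below. The `flush` callback stands in for the real write to Redis, for example a batch of `INCRBY` commands; the class name and defaults are assumptions for illustration.

```python
import time


class BatchedCounter:
    """Accumulates counts locally and hands them to `flush` in batches."""

    def __init__(self, flush, interval_s=0.2, threshold=100, clock=time.monotonic):
        self._flush = flush            # callable receiving a dict of {key: count}
        self._interval_s = interval_s  # flush at least this often (while traffic arrives)
        self._threshold = threshold    # or as soon as one key reaches this count
        self._clock = clock
        self._pending = {}
        self._last_flush = clock()

    def incr(self, key, amount=1):
        self._pending[key] = self._pending.get(key, 0) + amount
        due = self._clock() - self._last_flush >= self._interval_s
        full = self._pending[key] >= self._threshold
        if due or full:
            self.flush()

    def flush(self):
        pending, self._pending = self._pending, {}
        self._last_flush = self._clock()
        if pending:
            self._flush(pending)
```

This version only checks the interval when a new increment arrives; a production variant would likely add a background timer so that idle periods still flush.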
- Permanent errors
- indicate something fundamentally wrong that won't fix itself:
- Lesson 1026 — Retry on Which Errors
- Permissions
- Specific actions on resources (e.
- Lesson 933 — Role-Based Access Control (RBAC) Fundamentals
- Persist results
- to return them for duplicate requests
- Lesson 1011 — Idempotency Key Storage and Lookup
- Persistence
- The coordinator writes its decision to durable storage **before** announcing it, ensuring recovery is possible if it crashes.
- Lesson 569 — The Coordinator Role in 2PCLesson 698 — Streaming vs Message Queues
- Persistent connections
- (database connection pools, chat servers)
- Lesson 87 — Least Connections AlgorithmLesson 893 — WebSocket and Long-Polling Support
- Persistent stores
- like RocksDB write data to disk first.
- Lesson 340 — In-Memory vs Persistent Key-Value Stores
- Physical shards
- (or nodes) are the actual database servers that host those logical shards.
- Lesson 235 — Logical vs Physical Shards
- Physical storage location
- (which server holds your data via partitioning)
- Lesson 413 — Row Keys and Clustering
- PII
- emails, phone numbers, addresses, SSNs
- Lesson 1145 — Sensitive Data in Structured LogsLesson 1163 — Avoid Logging Sensitive Data
- PLAIN
- Username/password sent in cleartext (use only over SSL/TLS)
- Lesson 727 — Kafka Security: Authentication and Encryption
- Planned maintenance
- Lesson 1279 — Error Budgets: The Core Concept
- Planning maintenance windows
- Schedule proactive work before predicted failure
- Lesson 1323 — Mean Time Between Failures (MTBF)
- Player logic
- Client measures network speed, requests appropriate quality segments
- Lesson 1602 — Adaptive Bitrate Streaming (ABR)
- Plural for accumulating metrics
- Counters that grow should use plural nouns: `http_requests_total`, not `http_request_total`.
- Lesson 1182 — Metric Naming Conventions
- plus
- a conflict resolution mechanism (like read-repair with version vectors) that always returns and propagates the latest value.
- Lesson 559 — Strong Consistency with QuorumsLesson 1140 — Contextual Fields
- PN-Counter
- (positive-negative counter): Separate increment and decrement counters
- Lesson 1384 — Conflict-Free Replicated Data Types (CRDTs)
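A compact sketch of a PN-Counter, assuming each replica has a unique ID. Increments and decrements are tracked per replica, and merging takes the per-replica maximum, so merges can be repeated or applied in any order without changing the result.

```python
class PNCounter:
    """Positive-negative counter: two grow-only maps, one for increments, one for decrements."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.increments = {}  # replica_id -> total increments seen from that replica
        self.decrements = {}  # replica_id -> total decrements seen from that replica

    def increment(self, n: int = 1) -> None:
        self.increments[self.replica_id] = self.increments.get(self.replica_id, 0) + n

    def decrement(self, n: int = 1) -> None:
        self.decrements[self.replica_id] = self.decrements.get(self.replica_id, 0) + n

    def value(self) -> int:
        return sum(self.increments.values()) - sum(self.decrements.values())

    def merge(self, other: "PNCounter") -> None:
        """Take the element-wise maximum of both maps (commutative and idempotent)."""
        for mine, theirs in ((self.increments, other.increments),
                             (self.decrements, other.decrements)):
            for replica, count in theirs.items():
                mine[replica] = max(mine.get(replica, 0), count)
```

Two replicas that update independently converge to the same value once each has merged the other's state.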
- Point of Presence (PoP)
- is a physical data center location where a CDN provider has installed servers and networking equipment.
- Lesson 171 — Points of Presence (PoPs) and Edge Locations
- Point-in-time consistency
- across partitions requires coordination
- Lesson 1492 — Operational Complexity of Partitioning
- Pointer compression
- (use 32-bit offsets instead of 64-bit pointers)
- Lesson 1759 — Trie Space Optimization Techniques
- Points of Presence (PoPs)
- .
- Lesson 125 — CDN as Edge Caching LayerLesson 178 — Edge Server Networks and Points of Presence (PoPs)Lesson 191 — CDN Provider Feature Comparison
- Policy Decision Point (PDP)
- is a dedicated service that evaluates authorization policies and returns allow/deny decisions.
- Lesson 941 — Policy Decision Points (PDP) and Enforcement Points (PEP)
- Policy enforcement
- Translates high-level policies into Envoy-compatible configurations
- Lesson 861 — Istio: Architecture and Components
- Policy Enforcement Point (PEP)
- is the component in your application or gateway that intercepts requests, asks the PDP for a decision, and enforces it.
- Lesson 941 — Policy Decision Points (PDP) and Enforcement Points (PEP)
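The PEP/PDP exchange can be sketched in a few lines. The in-memory policy set, the request shape, and the status codes below are assumptions for illustration; a real PDP would usually be a separate service evaluating RBAC or ABAC policies.

```python
# Hypothetical policy store: (subject, action, resource) triples that are allowed.
POLICIES = {("alice", "read", "report-7"), ("bob", "write", "report-7")}


def pdp_decide(subject: str, action: str, resource: str) -> bool:
    """Policy Decision Point: evaluate the policy and return allow (True) or deny (False)."""
    return (subject, action, resource) in POLICIES


def pep_handle(request: dict) -> tuple:
    """Policy Enforcement Point: extract context, ask the PDP, then enforce the answer."""
    subject, action, resource = request["user"], request["action"], request["resource"]
    if not pdp_decide(subject, action, resource):
        return 403, "forbidden"
    return 200, f"{action} on {resource} performed"
```

Keeping the decision in `pdp_decide` means the enforcement code never needs to know how policies are written.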
- Politeness
- Respect `robots.
- Lesson 1732 — Crawling and Document CollectionLesson 1830 — Breadth-First vs Depth-First Crawling
- Politeness Queues
- (Back-End): Per-host queues that ensure at most one request per host is in flight at a time.
- Lesson 1843 — Multi-Queue Frontier Architecture
- politeness table
- is an in-memory data structure that maintains crawling metadata for each host your crawler interacts with.
- Lesson 1848 — Politeness Table and Per-Host StateLesson 1849 — URL Frontier Persistence and Recovery
- Polyglot Microservices Environments
- Lesson 868 — When Service Mesh Adds Value
- Polyglot Persistence Pattern
- means using different types of databases within a single application, rather than forcing all your data into one database system.
- Lesson 327 — Polyglot Persistence Pattern
- Pool Exhaustion Events
- Lesson 273 — Connection Pool Monitoring
- Poor Key Distribution
- If your partition key is a timestamp and everyone writes at the current time, all writes hit one partition.
- Lesson 1482 — The Hot Partition Problem
- Poor scalability
- You can't reliably predict capacity when adding hardware
- Lesson 1462 — The Uneven Distribution Problem
- Poor user experience
- Users wait seconds for feeds to load
- Lesson 1637 — Pull (Read-Time) Feed ModelLesson 1647 — Fanout-on-Read (Pull Model)
- Popular content
- Rank pastes by view count, creation date, or trending velocity (views per hour).
- Lesson 1583 — Analytics and Usage Metrics
- Popular data naturally cached
- Frequently requested data stays in cache, while rarely-used data doesn't waste cache space
- Lesson 131 — Cache-Aside (Lazy Loading) Pattern
- Popular high-authority sites
- Start with well-known domains like major news outlets, Wikipedia, or social media platforms.
- Lesson 1828 — Seed URLs and Starting Point
- Popular URLs Stay Hot
- Viral links remain in cache, never touching the database after initial load
- Lesson 1523 — Caching Layer Architecture
- Populate cache
- Store the fetched data in cache for next time
- Lesson 131 — Cache-Aside (Lazy Loading) PatternLesson 1558 — Read Path: Cache-Aside Pattern
- Positional information
- Where exactly in the document the term appears (optional, for phrase queries)
- Lesson 1736 — Posting Lists and Document IDs
- Positions
- List of word positions within the document (for phrase queries)
- Lesson 1745 — Posting Lists and Document IDs
- POST
- Typically creates new resources or triggers actions.
- Lesson 1000 — Idempotent vs Non-Idempotent OperationsLesson 1009 — HTTP Methods and Natural IdempotencyLesson 1875 — HTTP Methods: GET, POST, PUT, DELETE SemanticsLesson 1884 — Idempotency in RESTful APIs
- Post-hoc governance
- Tools like data catalogs and metadata layers added later to organize the chaos
- Lesson 764 — Data Governance and Quality
- Postgres-Compatible SQL
- You can use standard SQL queries, transactions, and tools.
- Lesson 334 — CockroachDB and Distributed SQL
- posting list
- the ordered sequence of document IDs where that term appears.
- Lesson 1743 — What Is an Inverted IndexLesson 1745 — Posting Lists and Document IDs
- Posting list cache
- Cache frequently accessed index segments
- Lesson 1742 — Search System Architecture Overview
- Pre-aggregated metrics
- "total revenue by region" when you've already rolled up the data
- Lesson 762 — Query Performance Tradeoffs
- Pre-computation strategy
- The ratio justifies pre-generating feeds asynchronously rather than computing them on-demand.
- Lesson 1636 — Capacity Estimation: Feed Reads vs Writes
- Pre-creates
- a set of database connections when your application starts
- Lesson 267 — What is Connection Pooling
- Pre-Generated Key Pools
- means creating millions of keys in advance and storing them in a dedicated table.
- Lesson 1551 — Key Generation Strategy
- Pre-screening
- Before creating the short URL, query the threat API with the destination URL
- Lesson 1540 — Spam and Malicious Link Detection
- Precomputed ranking
- means maintaining a pre-sorted feed in storage.
- Lesson 1667 — Real-Time vs Precomputed Ranking
- Predictability risk
- Users can guess `abc123` → `abc124`, revealing information about your traffic volume
- Lesson 1516 — Counter-Based vs UUID Approaches
- Predictable
- Deterministic eviction order
- Lesson 148 — First In First Out (FIFO)Lesson 1510 — Auto-Incrementing ID ApproachLesson 1515 — Short URL Predictability Tradeoffs
- Predictable freshness
- without paying the latency cost of synchronous replication
- Lesson 1397 — Bounded Staleness Consistency
- Predictable latency
- Dedicated hardware means consistent behavior
- Lesson 108 — Hardware vs Software Load BalancersLesson 349 — Redis In-Memory Storage ModelLesson 1362 — Chain Replication
- Predictable lookup
- Any node can calculate where replicas should be by performing the same clockwise walk
- Lesson 1466 — Replication with Consistent Hashing
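The clockwise walk can be sketched as below. This is a simplified ring with one position per node (no virtual nodes), and the hash choice and function names are assumptions for the example.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a stable position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def build_ring(nodes: list) -> list:
    """Sorted (position, node) pairs; the result does not depend on input order."""
    return sorted((ring_position(node), node) for node in nodes)


def replicas_for(key: str, ring: list, replication_factor: int = 3) -> list:
    """Walk clockwise from the key's position, collecting distinct nodes."""
    positions = [pos for pos, _ in ring]
    start = bisect.bisect_right(positions, ring_position(key))
    chosen = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]  # modulo wraps past the end of the ring
        if node not in chosen:
            chosen.append(node)
        if len(chosen) == replication_factor:
            break
    return chosen  # fewer than replication_factor if the ring has too few nodes
```

Because every node builds the same sorted ring from the same membership list, each one computes the same replica set without asking a coordinator.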
- Predictable lookups
- Given a short code, you always know which shard to query
- Lesson 1541 — Sharding and Database Scaling
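A minimal sketch of a deterministic code-to-shard mapping, assuming simple modulo sharding. CRC32 is used here only because it is stable across processes; Python's built-in `hash()` for strings is randomized per process and would not give repeatable routing.

```python
import zlib


def shard_for(short_code: str, shard_count: int) -> int:
    """Return the shard index for a short code; the same code always maps to the same shard."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    return zlib.crc32(short_code.encode("utf-8")) % shard_count
```

Note that plain modulo sharding remaps most keys when `shard_count` changes, which is the trade-off that consistent hashing addresses.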
- Predictable performance
- Optimized storage layouts enable consistent query speeds
- Lesson 759 — Schema-on-Write vs Schema-on-ReadLesson 1473 — Range Partitioning BenefitsLesson 1529 — Preloading Hot URLs into Cache
- Predictable queries
- dashboards, monthly reports, KPIs you run repeatedly
- Lesson 762 — Query Performance Tradeoffs
- Predictable read latency
- the same whether 10 or 10 million followers are viewing
- Lesson 1638 — Push (Write-Time) Feed Model
- Predictable read performance
- Read latency is constant regardless of how many people a user follows.
- Lesson 1646 — Fanout-on-Write (Push Model)
- Predictable traffic patterns
- If analytics show certain content becomes popular at specific times (news sites at 6am, streaming shows at 8pm), preload those assets beforehand.
- Lesson 184 — Cache Warming and Preloading
- Predictable Update Patterns
- Lesson 290 — When to Denormalize
- Preemption handling
- If a higher-numbered prepare arrives, the current leader steps down, and the protocol reverts to full Paxos
- Lesson 616 — Multi-Paxos for Log Replication
- Prefect
- and **Dagster** represent the next generation.
- Lesson 773 — Prefect and Dagster for Modern Workflows
- Preferences
- Stored user preferences (language, past clicks, categories of interest) influence which suggestions surface first.
- Lesson 1767 — Personalized Typeahead
- Prefetching
- means resolving DNS records *before* you actually need them.
- Lesson 1858 — DNS Prefetching and Batch Resolution
- Prefetching thumbnails
- and initial video segments during idle time
- Lesson 1618 — Optimizing for Mobile Networks
- Premature rejection
- Tokens rejected as expired when they're still valid
- Lesson 949 — Clock Skew and Token Validation
- Premium users
- 50% of capacity, generous token bucket (1000 requests/min)
- Lesson 974 — Rate Limiting with Priority QueuesLesson 990 — Tiered Rate Limits for Different User Classes
- Prepare phase
- It sends a `PREPARE` message to all participant nodes, asking "Can you commit this transaction?"
- Lesson 569 — The Coordinator Role in 2PCLesson 575 — 2PC Performance CharacteristicsLesson 612 — The Two-Phase ProtocolLesson 613 — The Prepare Phase
- Prepared
- After Phase 1, the participant has voted "yes" and locked resources, but hasn't committed yet
- Lesson 572 — Participant State Transitions
- Presence detection
- lets you distinguish between "online and active" vs "offline" vs "connected but inactive," so you can optimize real-time delivery accordingly.
- Lesson 1676 — Presence Detection and User Status
- Presence service
- Tracks active user IDs in a fast key-value store (Redis) with TTL (time-to-live) expiration
- Lesson 1676 — Presence Detection and User StatusLesson 1681 — Mobile Push Notification Integration
- Present meaningful feedback
- to end users rather than cryptic error messages
- Lesson 1115 — Deadline Exceeded Error Handling
- Preserve capacity
- for critical operations or users that can still be served
- Lesson 1084 — Load Shedding Under Cascading Failure
- Prevent new elections
- Followers reset their election timeouts when receiving valid heartbeats
- Lesson 624 — AppendEntries RPC: Replication Mechanism
- Prevent resource exhaustion
- by freeing up threads/connections that would otherwise hang forever
- Lesson 1086 — What Timeouts Are and Why They Matter
- Preventing duplicate writes
- becomes possible when you can uniquely identify each entity
- Lesson 299 — Primary Keys and Entity Integrity
- Prevention
- Always close connections in `finally` blocks or use automatic resource management patterns to guarantee cleanup, even when errors occur.
- Lesson 275 — Common Pooling Anti-Patterns
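In Python, the same guarantee is usually expressed with a context manager. The sketch below uses a toy pool built on `queue.Queue`; the class and its methods are hypothetical stand-ins for whatever pool library you use.

```python
import queue
from contextlib import contextmanager


class SimplePool:
    """Toy connection pool used to illustrate guaranteed release."""

    def __init__(self, connections: list):
        self._idle = queue.Queue()
        for conn in connections:
            self._idle.put(conn)

    @contextmanager
    def connection(self, timeout_s: float = 1.0):
        conn = self._idle.get(timeout=timeout_s)  # raises queue.Empty if the pool is exhausted
        try:
            yield conn
        finally:
            # Runs on success and on error, so the connection always returns to the pool.
            self._idle.put(conn)

    def idle_count(self) -> int:
        return self._idle.qsize()
```

Callers write `with pool.connection() as conn:` and cannot forget the release, even when the body raises.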
- Prevents
- paying customers from being starved by free users
- Lesson 990 — Tiered Rate Limits for Different User Classes
- Prevents cascading failures
- If Service A keeps hammering failing Service B, both services may collapse under the load
- Lesson 1046 — The Three States: Open
- Prevents orphans
- You can't insert an order with `user_id=999` if user 999 doesn't exist
- Lesson 300 — Foreign Keys and Referential Integrity
- Prevents overload
- by ensuring no single server handles too many requests
- Lesson 76 — What Is a Load Balancer?
- Prevents repeat failures
- By fixing underlying system issues rather than "fixing" people, you address the real problems: missing guardrails, inadequate testing, confusing interfaces, or knowledge silos.
- Lesson 1351 — Blameless Postmortem Culture
- Prevents Thundering Herd
- Avoids simultaneous cache misses for the same popular URLs
- Lesson 1529 — Preloading Hot URLs into Cache
- Preview Generation
- Lesson 1603 — Thumbnail and Preview Generation
- Previous log information
- To ensure consistency with the follower's log
- Lesson 624 — AppendEntries RPC: Replication Mechanism
- Previous window
- (10:00:00–10:00:59): 80 requests
- Lesson 969 — Sliding Window CounterLesson 1797 — Sliding Window Counter with Redis
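The weighted estimate used by a sliding window counter can be sketched as follows. The previous-window count of 80 comes from the entry above; the current-window count of 30, the 15 seconds elapsed, and the limit of 100 are assumed values for illustration.

```python
def sliding_window_estimate(prev_count: int, curr_count: int,
                            elapsed_in_current_s: float, window_s: float = 60) -> float:
    """Estimate requests in the trailing window by weighting the previous window's count."""
    # Fraction of the trailing window that still overlaps the previous fixed window.
    overlap = (window_s - elapsed_in_current_s) / window_s
    return prev_count * overlap + curr_count


def is_allowed(estimate: float, limit: int) -> bool:
    """Admit the request only while the estimate is below the limit."""
    return estimate < limit


# 15 s into the current window: 75% of the previous window still counts.
estimate = sliding_window_estimate(prev_count=80, curr_count=30, elapsed_in_current_s=15)
```

The estimate assumes requests in the previous window were spread evenly, which is what makes this approach cheaper but less exact than a sliding window log.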
- Pricing
- Pay per GB processed and per configured rule per hour, with no charge for the load balancer itself when idle.
- Lesson 114 — Cloud Load Balancers (GCP and Azure)Lesson 728 — AWS Kinesis Overview
- Primary
- Live production database on local SSDs
- Lesson 1407 — The 3-2-1 Backup RuleLesson 1444 — Communication Plans During Disasters
- Primary access patterns
- How do users typically need the data ordered?
- Lesson 1895 — Default Sorting and Index Alignment
- Primary connection pool
- A set of pre-established database connections to the primary server, used exclusively for INSERT, UPDATE, DELETE operations
- Lesson 221 — Application-Level Connection Management
- primary database
- Lesson 198 — What is Database Replication?Lesson 201 — Why Replicate: Availability and FailoverLesson 202 — Why Replicate: Geographic DistributionLesson 209 — Read-After-Write Consistency ProblemLesson 220 — Read-Write Splitting FundamentalsLesson 1522 — Read-Heavy Workload and Database ScalingLesson 1561 — Read Replicas for Retrieval Scaling
- Primary delivery pipeline
- processes the notification and sends it
- Lesson 1725 — Analytics Pipeline Architecture
- primary key
- is a column (or set of columns) that uniquely identifies each row in a table.
- Lesson 299 — Primary Keys and Entity IntegrityLesson 423 — Primary Key Components
- Primary processes the write
- → Stores the data and records the change
- Lesson 199 — Primary-Replica Architecture
- Primary sends changes
- → Replicates updates to all Replicas asynchronously or synchronously
- Lesson 199 — Primary-Replica Architecture
- Primary-Replica architecture
- (also called Master-Slave), you have:
- Lesson 199 — Primary-Replica Architecture
- Primary/Secondary Rotation
- Lesson 1297 — On-Call Fundamentals and Rotation Models
- Prioritize Availability (AP)
- Allow transactions to proceed with uncertainty.
- Lesson 580 — CAP Theorem Impact on Distributed Transactions
- Prioritize Consistency (CP)
- Block and wait indefinitely until the partition heals.
- Lesson 580 — CAP Theorem Impact on Distributed Transactions
- Prioritize critical requests
- using request classification.
- Lesson 963 — Graceful Degradation with Rate Limits
- Prioritized
- Critical fixes first, nice-to-haves later
- Lesson 1352 — Postmortem Structure and Action Items
- Priority
- Crawl important or frequently-changing pages first
- Lesson 1732 — Crawling and Document Collection
- Priority handling
- Premium users or smaller images can jump the queue
- Lesson 1595 — Thumbnail and Preview Generation Trigger
- Priority level
- Critical alerts (security warnings) might override user preferences and use multiple channels.
- Lesson 1703 — Channel Routing Logic
- priority queue
- where each URL has a "next crawl time" calculated from these factors.
- Lesson 1835 — Crawl Freshness RequirementsLesson 1844 — Front Queue: Priority Management
- priority queues
- meet rate limiting.
- Lesson 974 — Rate Limiting with Priority QueuesLesson 975 — Algorithm Selection Criteria
- Priority queuing
- Process regular tenants' requests before hot tenants during contention
- Lesson 1823 — Hot Tenant Problem
- Priority sampling
- High-value transactions (premium users, checkout flows, critical B2B partners) carry a priority flag.
- Lesson 1256 — Priority and Debug Sampling
- Priority Signals
- Lesson 1835 — Crawl Freshness Requirements
- Priority tiers
- Premium clients get higher limits than free-tier users
- Lesson 885 — Rate Limiting and ThrottlingLesson 995 — Graceful Degradation Through Throttling
- Priority-based scheduling
- High-value or frequently-changing pages get recrawled more often.
- Lesson 1873 — Handling Recrawls and Freshness
- Priority-Based Throttling
- Apply stricter limits to celebrity fanouts while allowing normal users faster processing.
- Lesson 1654 — Fanout Rate Limiting
- Privacy compliance
- Sensitive data doesn't linger indefinitely
- Lesson 1565 — Expiration Requirements and TTL Basics
- privacy settings
- from lesson 1576.
- Lesson 1583 — Analytics and Usage MetricsLesson 1653 — Selective Fanout Optimization
- Pro
- Eliminates table lookups, faster reads
- Lesson 279 — Covering IndexesLesson 1518 — Case Sensitivity Considerations
- Problem
- If thousands of edge servers all miss the cache simultaneously, they stampede your origin server with identical requests.
- Lesson 182 — Cache Hierarchies and Tiered Caching
- Process later
- Consumer services pull from the queue and persist to an analytics database
- Lesson 1530 — Analytics and Click Tracking
- Processes incrementally
- using stream processing frameworks (Storm, Flink, Spark Streaming)
- Lesson 749 — Lambda Architecture: Speed Layer
- Processing
- Generate thumbnails, transcode videos to multiple formats/bitrates
- Lesson 1584 — Image/Video Hosting: Problem Definition and Scale
- Processing Latency
- tracks end-to-end time from API submission to channel delivery.
- Lesson 1707 — Processing Pipeline Monitoring
- Processing load
- Analyzing every trace overwhelms backend systems
- Lesson 1228 — Trace Sampling Fundamentals
- Processing Timing
- Lesson 1603 — Thumbnail and Preview Generation
- Producer
- Creates and sends messages (events, tasks, data)
- Lesson 646 — The Producer-Consumer ModelLesson 694 — Producers and ConsumersLesson 702 — Producers and Message PublishingLesson 1604 — Message Queue for Processing Jobs
- Producer acknowledgments
- (or "acks") are the broker's way of saying "Yes, I received your message."
- Lesson 682 — Producer AcknowledgmentsLesson 707 — In-Sync Replicas (ISR)
- Producer registers schema
- Before sending data, the producer submits the schema (e.
- Lesson 725 — Schema Registry and Evolution
- producers
- create and send messages independently of **consumers** who receive and process them.
- Lesson 646 — The Producer-Consumer ModelLesson 671 — ActiveMQ and Traditional Enterprise MessagingLesson 694 — Producers and Consumers
- Product catalog API
- Cache for 5 minutes—inventory doesn't change every second
- Lesson 194 — CDN for API Acceleration
- Product catalogs
- item descriptions rarely change instantly; eventual sync is fine
- Lesson 318 — When to Choose ACID or BASE
- Product Catalogs Use NoSQL
- Lesson 330 — Real-World Decision Examples
- Product teams
- traditionally prioritize velocity and customer features.
- Lesson 1282 — Error Budget as a Shared Currency
- Production aggregation
- Pure structured JSON sent to your centralized logging system
- Lesson 1166 — Human-Readable vs Machine-Parseable
- Progress tracking
- log each step for debugging
- Lesson 1441 — Runbooks and AutomationLesson 1586 — Multipart Upload for Large Files
- Progressive health checks
- that signal "ready but limited capacity"
- Lesson 1081 — Thundering Herd After Recovery
- Progressive rollout
- if canaries look healthy, expand to 25%, then 50%, then 100%
- Lesson 1314 — Release Engineering and Safe Deployment
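The stage progression can be sketched as a tiny state function. The 25/50/100 stages come from the entry above; treating an unhealthy canary as a rollback to 0% is an assumption of this example, as are the names.

```python
ROLLOUT_STAGES = (25, 50, 100)  # percentages of traffic, in order


def next_stage(current_percent: int, canary_healthy: bool) -> int:
    """Return the next traffic percentage, or 0 to signal a rollback."""
    if not canary_healthy:
        return 0  # assumed policy: any unhealthy signal rolls the release back
    for stage in ROLLOUT_STAGES:
        if stage > current_percent:
            return stage
    return 100  # already fully rolled out
```

A deployment controller would call this after each observation period, holding at the current stage until health metrics have had time to settle.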
- Progressive Technology Adoption
- means you can introduce new tools, languages, or platforms incrementally by creating new microservices or refactoring individual services—while the rest of your system continues running unchanged.
- Lesson 799 — Progressive Technology Adoption
- Projections compute state
- Different services read the event log and build their own view of current state
- Lesson 586 — Alternative: Event Sourcing for Consistency
- Prometheus
- excels at hundreds of thousands of time series but may need federation beyond single-datacenter deployments.
- Lesson 1208 — Choosing a Metrics System for Your Scale
- Prometheus's metadata API
- , internal wikis, or dedicated platforms like **Datadog's metric summaries** to maintain these catalogs.
- Lesson 1216 — Metric Documentation and Discovery
- promise
- "I won't accept any proposal lower than yours.
- Lesson 612 — The Two-Phase ProtocolLesson 613 — The Prepare Phase
- Promotion
- The selected replica is reconfigured to accept writes
- Lesson 207 — Replica Promotion and Failover Basics
- PromQL
- in Prometheus offers powerful ad-hoc queries, while **Graphite** provides simpler but less flexible querying.
- Lesson 1208 — Choosing a Metrics System for Your Scale
- Propagate
- it through HTTP headers, message queues, or RPC metadata
- Lesson 1158 — Correlation IDs Across Services
- Propagates automatically
- Each hop receives and can read this deadline
- Lesson 1104 — gRPC Timeout Propagation
- Propagation strategies
- determine how updates flow through cache layers:
- Lesson 128 — Cache Coherence Across Layers
- Properties
- Key-value pairs attached to both nodes and edges
- Lesson 451 — What is a Graph Database?Lesson 452 — Graph Model: Nodes and Edges
- Proportional to Node Count
- Maintain a fixed number of partitions per node (using virtual nodes in consistent hashing).
- Lesson 1485 — Rebalancing Partitions
- Pros
- Lesson 108 — Hardware vs Software Load BalancersLesson 237 — Sharding Architecture PatternsLesson 259 — Resharding Strategies: Stop-the-World vs OnlineLesson 417 — Compaction StrategiesLesson 916 — Session vs Token TradeoffsLesson 927 — Token Introspection and ValidationLesson 943 — Authorization in MicroservicesLesson 983 — Gossip Protocols for Rate Limit Sync (+17 more)
- Protection at Scale
- When you're handling millions of requests per second (like search engines, APIs, or notification systems), every service needs protection.
- Lesson 1782 — Rate Limiter Service Overview
- Protection from noisy neighbors
- in multi-tenant scenarios
- Lesson 859 — Rate Limiting at Service Boundaries
- Protects
- resources from free-tier abuse
- Lesson 990 — Tiered Rate Limits for Different User ClassesLesson 1044 — The Electrical Analogy
- Protects on delete
- By default, you can't delete a user if they have orders (you must handle the orders first)
- Lesson 300 — Foreign Keys and Referential Integrity
- Protocol Buffers
- (protobuf) for binary serialization and **HTTP/2** for transport, delivering significantly better performance.
- Lesson 1917 — gRPC: Protocol Buffers and Binary RPC
- Protocol Enhancements
- CDNs use optimized protocols between edge and origin (like HTTP/2, HTTP/3, or custom protocols) that reduce overhead and handle packet loss better than standard internet connections.
- Lesson 186 — Dynamic Content Acceleration
- Protocol overhead
- Account for TCP/IP headers (typically 5-10% extra)
- Lesson 1499 — Bandwidth Requirements for Redirects
- Protocol-agnostic
- Works with any TCP/UDP traffic (HTTP, HTTPS, FTP, databases, gaming)
- Lesson 109 — Layer 4 (Transport) Load Balancing
- Prove the concept first
- Focus on product-market fit, not infrastructure complexity
- Lesson 820 — When a Monolith is the Right Choice
- Provide predictable latency bounds
- for your service's response times
- Lesson 1086 — What Timeouts Are and Why They Matter
- Provisions certificates
- – When a new sidecar proxy starts, the control plane generates a unique X.
- Lesson 844 — Control Plane: Certificate Management
- Proxy-Based Read-Write Splitting
- moves that responsibility to a middleware layer—a database proxy that sits between your application and your databases.
- Lesson 222 — Proxy-Based Read-Write Splitting
- ProxySQL
- (for MySQL) or **MaxScale** automatically inspect incoming SQL queries and route them intelligently:
- Lesson 222 — Proxy-Based Read-Write Splitting
- Prune unnecessary logs
- Remove debug-level logs left in production code, redundant information, or logs that duplicate what metrics already capture.
- Lesson 1171 — Log Review and Alert Fatigue
- Pruning
- Ignore low-scoring documents early using approximate scoring
- Lesson 1741 — Search Latency and Response Time
- Psychological safety
- means people can admit errors without fear, knowing the team will analyze systems, not scapegoat individuals.
- Lesson 1317 — Blameless Culture and Learning from Failure
- PTransforms
- Operations that transform data (like `Map`, `Filter`, `GroupByKey`)
- Lesson 772 — Apache Beam Programming Model
- Pub-sub
- email service, analytics, CRM, and welcome workflow all need to know independently.
- Lesson 664 — Choosing Between Queue and Pub-Sub
- Pub/Sub
- implements the **pub-sub pattern** for event broadcasting.
- Lesson 672 — Redis as a Lightweight Message BrokerLesson 735 — Choosing a Streaming Platform
- Public
- pastes are indexed, searchable, and accessible to anyone with the URL.
- Lesson 1576 — Access Control and Privacy Settings
- Public marketing campaigns
- URLs are meant to be shared anyway
- Lesson 1515 — Short URL Predictability Tradeoffs
- Public URLs
- Fast path with full caching (no auth checks)
- Lesson 1533 — Access Control and Private URLs
- Publish-Subscribe
- (Pub-Sub) messaging solves this by introducing a central message broker.
- Lesson 1675 — Pub-Sub for Real-Time Distribution
- publisher
- sends a message to a channel using `PUBLISH channel message`.
- Lesson 357 — Redis Pub/Sub for Real-Time MessagingLesson 656 — Pub-Sub Pattern FundamentalsLesson 662 — Fan-Out with Pub-Sub
- Publishes
- outcome events that others can react to
- Lesson 590 — Choreography-Based SagasLesson 1675 — Pub-Sub for Real-Time Distribution
- pull
- (lazy loading) and **push** (pre-population).
- Lesson 183 — Pull vs Push CDN ModelsLesson 697 — Push vs Pull Consumption ModelsLesson 1197 — Pull vs Push Metrics Collection ModelsLesson 1267 — Push vs Pull Monitoring ModelsLesson 1647 — Fanout-on-Read (Pull Model)
- Pull (fanout-on-read)
- makes sense for **celebrities with >100K followers**.
- Lesson 1658 — Fanout Strategy Selection Criteria
- Pull (Lazy Caching)
- The CDN fetches content from your origin server only when a user requests it.
- Lesson 1610 — Push vs Pull CDN Models for Media
- Pull distributed state
- On cache miss or window expiration, fetch the authoritative count from Redis
- Lesson 1801 — Local Caching for Performance
- Pull for celebrities
- When a celebrity with 50 million followers posts, don't fan out to everyone; that's too expensive.
- Lesson 1639 — Hybrid (Pull-Push) Feed Model
- Pull for inactive users
- Don't maintain pre-built feeds for users who haven't logged in for months; generate their feed on demand if they return.
- Lesson 1639 — Hybrid (Pull-Push) Feed Model
- pull model
- , edge servers cache content **on-demand**.
- Lesson 183 — Pull vs Push CDN ModelsLesson 1181 — Push vs Pull Collection ModelsLesson 1267 — Push vs Pull Monitoring ModelsLesson 1647 — Fanout-on-Read (Pull Model)
- Pull Model (Read-Time Fanout)
- No immediate distribution.
- Lesson 1645 — What is Fanout in Social Media Systems
- Pull subscriptions
- Your application actively requests messages from Pub/Sub.
- Lesson 674 — Google Cloud Pub/Sub Architecture
- Pull-on-open
- Only when the user actively opens the app (most common)
- Lesson 1671 — Real-Time Requirements for Social Feeds
- Purge APIs
- Lesson 185 — Purging and Cache Invalidation StrategiesLesson 1571 — Cache Invalidation on Update or Delete
- Purge Strategy
- When a paste expires or is deleted, purge it from the CDN through the CDN's API so that stale content is not served.
- Lesson 1569 — CDN Integration for Paste Delivery
- push
- (pre-population).
- Lesson 183 — Pull vs Push CDN ModelsLesson 697 — Push vs Pull Consumption ModelsLesson 1197 — Pull vs Push Metrics Collection ModelsLesson 1201 — StatsD and Metric Aggregation DaemonsLesson 1267 — Push vs Pull Monitoring ModelsLesson 1639 — Hybrid (Pull-Push) Feed ModelLesson 1692 — Channel-Specific FormattingLesson 1693 — Delivery Receipt Tracking
- Push (fanout-on-write)
- works best when users have **fewer than ~1,000-5,000 followers**.
- Lesson 1658 — Fanout Strategy Selection Criteria
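The follower thresholds quoted in the push and pull entries can be turned into a small selection function. This is only a sketch: the constant names, the `"hybrid"` label for the range between the two thresholds, and the choice of 5,000 as the push cutoff are assumptions for illustration, not values the lesson fixes.

```python
# Thresholds taken from the glossary entries; the exact cutoffs are tunable.
PUSH_MAX_FOLLOWERS = 5_000      # push works best below roughly 1,000-5,000 followers
PULL_MIN_FOLLOWERS = 100_000    # pull makes sense above ~100K followers


def fanout_strategy(follower_count: int) -> str:
    """Pick a fanout strategy for an author based on follower count."""
    if follower_count < PUSH_MAX_FOLLOWERS:
        return "push"    # fanout-on-write: cheap to copy into a few feeds
    if follower_count > PULL_MIN_FOLLOWERS:
        return "pull"    # fanout-on-read: too many feeds to write eagerly
    return "hybrid"      # assumed label for the gray zone in between


print(fanout_strategy(300))         # push
print(fanout_strategy(50_000))      # hybrid
print(fanout_strategy(2_000_000))   # pull
```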
- Push (Pre-Population)
- *You* upload content directly to CDN edge locations before anyone requests it.
- Lesson 1610 — Push vs Pull CDN Models for Media
- Push a lightweight notification
- to active/online users only (via WebSocket or SSE)
- Lesson 1679 — Hybrid Pull-Push Model
- Push Down Filters Early
- Lesson 281 — Query Rewriting for Performance
- Push for active users
- When a regular user posts, fan out their content immediately to all followers' pre-built feeds (fast reads, manageable fan-out).
- Lesson 1639 — Hybrid (Pull-Push) Feed Model
- push model
- , you **proactively upload** content to edge servers before users request it.
- Lesson 183 — Pull vs Push CDN ModelsLesson 1181 — Push vs Pull Collection ModelsLesson 1202 — Graphite Time-Series DatabaseLesson 1204 — Cloud-Native Metrics: CloudWatch and StackdriverLesson 1267 — Push vs Pull Monitoring ModelsLesson 1640 — Celebrity Problem in Push ModelsLesson 1646 — Fanout-on-Write (Push Model)
- Push Model (Write-Time Fanout)
- When Alice publishes, the system immediately writes her post to all followers' feed storage.
- Lesson 1645 — What is Fanout in Social Media Systems
- push notifications
- to bring them back.
- Lesson 1681 — Mobile Push Notification IntegrationLesson 1694 — Channel Costs and Economics
- Push subscriptions
- Pub/Sub delivers messages via HTTP POST to your webhook endpoint.
- Lesson 674 — Google Cloud Pub/Sub Architecture
Q
- QPS
- (queries per second) and understand how much traffic a single server can handle, you need to figure out: *How many servers do I actually need?*
- Lesson 28 — Server Count EstimationLesson 120 — Caching Hierarchy Overview
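The server-count question reduces to a division with a safety margin. The sketch below assumes you already have a peak QPS estimate and a measured per-server capacity; the function name, the example numbers, and the 30% headroom default are illustrative assumptions.

```python
import math


def servers_needed(peak_qps: float, qps_per_server: float, headroom: float = 0.3) -> int:
    """Servers required to serve peak_qps while leaving `headroom` spare capacity on each."""
    usable_qps = qps_per_server * (1 - headroom)  # capacity we are willing to actually use
    return math.ceil(peak_qps / usable_qps)       # round up: a fraction of a server is a server


# 50,000 QPS at peak, 2,000 QPS per server, 30% headroom -> 1,400 usable QPS per server
print(servers_needed(peak_qps=50_000, qps_per_server=2_000))  # 36
```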
- Quality checks
- ETL processes validate, clean, and transform data upfront
- Lesson 764 — Data Governance and Quality
- Quality issues
- Garbage in, garbage out—discovered only when queried
- Lesson 764 — Data Governance and Quality
- Quality uncertainty
- Bad data only discovered when accessed
- Lesson 759 — Schema-on-Write vs Schema-on-Read
- Quality-Based Compression
- Lesson 1621 — Compression and Format Optimization
- Quarantine mechanism
- Temporarily disable suspicious short URLs pending manual review
- Lesson 1540 — Spam and Malicious Link Detection
- Quarantine system
- for suspicious content pending manual review
- Lesson 1581 — Abuse Prevention and Content Moderation
- queries per second (QPS)
- to size your infrastructure.
- Lesson 23 — QPS and Daily Active Users EstimationLesson 1731 — Search Requirements and Scale Estimation
- Query
- Can Bob read Document X?
- Lesson 938 — Relationship-Based Access Control (ReBAC)Lesson 1158 — Correlation IDs Across ServicesLesson 1572 — Storage Tier MigrationLesson 1756 — Machine Learning for Ranking (Learning to Rank)
- Query Alignment
- Select keys that match your access patterns.
- Lesson 1472 — Range Partition Key Selection
- Query Broadcasting
- Send the identical search request to all shards in parallel
- Lesson 1780 — Distributed Query Coordination
- Query complexity analysis
- assigns a cost to each field before execution.
- Lesson 1916 — Rate Limiting and Complexity Analysis in GraphQL
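One way to picture per-field cost assignment is a recursive sum over the requested fields before anything executes. The sketch below is not tied to any GraphQL library: the nested-dict query shape, the `FIELD_COSTS` table, and the cost values are all hypothetical.

```python
# Hypothetical per-field costs; unknown fields fall back to DEFAULT_COST.
FIELD_COSTS = {"user": 1, "posts": 5, "comments": 10}
DEFAULT_COST = 1


def query_cost(selection: dict) -> int:
    """Sum the cost of every requested field, including nested selections."""
    total = 0
    for field, children in selection.items():
        total += FIELD_COSTS.get(field, DEFAULT_COST) + query_cost(children)
    return total


# Rough equivalent of: { user { posts { title comments } } }
query = {"user": {"posts": {"title": {}, "comments": {}}}}
print(query_cost(query))  # 1 + 5 + 1 + 10 = 17
```

A server could reject the request up front when the computed cost exceeds a per-client budget.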
- Query Complexity Hurts Performance
- Lesson 290 — When to Denormalize
- Query efficiency
- Use clustering columns that match your read patterns, so related data sits together in sorted order
- Lesson 423 — Primary Key ComponentsLesson 1480 — Hybrid Partitioning Approaches
- Query efficiently
- "Show all videos uploaded by user X" without scanning massive files
- Lesson 1590 — Metadata Database Design
- Query Engine
- Executes PromQL queries against stored data
- Lesson 1198 — Prometheus Architecture and Data Model
- Query examples
- How to use it in PromQL or other query languages
- Lesson 1216 — Metric Documentation and Discovery
- Query expansion
- rewrites or augments queries with synonyms.
- Lesson 1774 — Spell Correction and Query Expansion
- Query Flow
- User submits search → API queries search index → Index returns ranked paste IDs → Fetch paste metadata from database/cache → Return results
- Lesson 1582 — Search and Discovery
- Query Layer
- Lesson 1477 — Directory Service Architecture
- Query parameters
- Requests with `?
- Lesson 110 — Layer 7 (Application) Load BalancingLesson 1892 — Filtering Query Parameters
- Query Patterns
- Lesson 765 — Choosing Lake vs Warehouse
- Query Patterns Are Path-Based
- Lesson 459 — Graph vs Relational Trade-offs
- Query performance
- Even indexed queries slow down as tables grow massive
- Lesson 229 — What is Sharding?Lesson 409 — Data Size and Storage ConsiderationsLesson 413 — Row Keys and ClusteringLesson 1159 — Log Aggregation Performance ConsiderationsLesson 1192 — Cardinality and Label ExplosionLesson 1252 — Sampling Strategies OverviewLesson 1642 — Post Metadata and Schema Design
- Query precisely
- "Find all login failures from this IP in the last hour" becomes a simple filter on `event="login_failed"` and `ip_address` fields
- Lesson 1137 — What is Structured Logging
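With structured logs, the query in this entry becomes a plain field comparison. The sketch below assumes JSON-lines logs with `event`, `ip_address`, and an ISO 8601 `timestamp` field; the `timestamp` field name and the helper name are assumptions, not something the lesson specifies.

```python
import json
from datetime import datetime, timedelta, timezone


def failed_logins(log_lines, ip_address, now, window=timedelta(hours=1)):
    """Return parsed entries for failed logins from ip_address within `window` before `now`."""
    matches = []
    for line in log_lines:
        entry = json.loads(line)
        if entry.get("event") != "login_failed":
            continue
        if entry.get("ip_address") != ip_address:
            continue
        ts = datetime.fromisoformat(entry["timestamp"])
        if now - window <= ts <= now:
            matches.append(entry)
    return matches


logs = [
    '{"event": "login_failed", "ip_address": "203.0.113.7", "timestamp": "2024-05-01T11:30:00+00:00"}',
    '{"event": "login_ok", "ip_address": "203.0.113.7", "timestamp": "2024-05-01T11:40:00+00:00"}',
    '{"event": "login_failed", "ip_address": "203.0.113.7", "timestamp": "2024-05-01T09:00:00+00:00"}',
]
now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
print(len(failed_logins(logs, "203.0.113.7", now)))  # 1
```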
- Query result cache
- (lesson 124): Caches entire query results
- Lesson 126 — Database Internal Caching (Buffer Pool)Lesson 1742 — Search System Architecture Overview
- Query routing layer
- A lightweight service maps prefix ranges to server addresses.
- Lesson 1764 — Distributed Trie Architecture
- Query scope
- Queries run against a specific collection, not the entire database
- Lesson 383 — Collections and Databases
- Query servers
- (or "search servers") receive user queries, parse them using boolean operators, fetch relevant posting lists from indexes, compute TF-IDF or other scoring functions, rank results, and return the top matches—all within 100-300ms.
- Lesson 1742 — Search System Architecture Overview
- Query slowdowns
- Aggregations must scan thousands or millions of series
- Lesson 1207 — Metrics Cardinality and Performance Impact
- Query that partition's filter
- (via RPC or cache lookup)
- Lesson 1867 — Distributed Deduplication with Bloom Filters
- Query timeout
- Maximum time a query can run before being automatically killed
- Lesson 285 — Query Timeout and Statement Limits
- Query timeouts
- and **statement limits** are safety mechanisms that prevent individual queries from consuming excessive resources.
- Lesson 285 — Query Timeout and Statement Limits
- Query understanding
- Frequent misspellings guide spell correction; low-CTR queries highlight gaps in your index
- Lesson 1779 — Search Analytics and Click Tracking
- Query-aligned
- Matches your most common query patterns to minimize cross-shard lookups
- Lesson 232 — Shard Key Selection
- Query-level data
- Lesson 1779 — Search Analytics and Click Tracking
- querying
- (query service)—Jaeger handles millions of spans per second.
- Lesson 1241 — Jaeger Architecture and ComponentsLesson 1730 — What is a Search Engine?
- queue
- and respond immediately while processing happens later.
- Lesson 6 — Components of a System Design SolutionLesson 664 — Choosing Between Queue and Pub-SubLesson 1604 — Message Queue for Processing Jobs
- Queue assignment
- Route the URL to the dedicated queue for that host
- Lesson 1845 — Back Queue: Politeness Enforcement
- Queue buildup
- Failed requests leave behind tasks that still run to completion
- Lesson 1096 — Why Timeouts Must Propagate
- Queue depth
- (number of items waiting right now)
- Lesson 1175 — Gauge MetricsLesson 1184 — Gauge MetricsLesson 1657 — Measuring Fanout PerformanceLesson 1707 — Processing Pipeline MonitoringLesson 1872 — Dynamic Scaling Based on Queue Depth
- Queue depths
- Backlog growing in message queues or thread pools
- Lesson 993 — Adaptive Rate LimitingLesson 1871 — Monitoring Crawler Fleet Performance
- Queue lower-priority operations
- for later processing rather than rejecting them outright.
- Lesson 963 — Graceful Degradation with Rate Limits
- Queue mirroring
- (superseded by **quorum queues** in modern RabbitMQ) replicates queue data across nodes, ensuring messages aren't lost when a node crashes.
- Lesson 668 — RabbitMQ Clustering and High Availability
- Queue Partitioning
- Split your notification queue into multiple partitions (e.
- Lesson 1708 — Scalability and Horizontal Expansion
- Queues
- provide point-to-point communication with competing consumers.
- Lesson 675 — Azure Service Bus Features
- quorum
- (majority agreement) before declaring a leader dead and triggering failover.
- Lesson 353 — Redis Sentinel for High AvailabilityLesson 425 — Tunable Consistency LevelsLesson 555 — What is a Quorum?Lesson 556 — Read and Write QuorumsLesson 605 — Quorums and Majority AgreementLesson 610 — The Three Roles in PaxosLesson 1340 — Split-Brain ProblemLesson 1361 — Quorum-Based Replication
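Majority agreement is simple arithmetic. A minimal sketch, with function names chosen here for illustration:

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """True if enough nodes agree to act, for example to declare a leader dead."""
    return reachable_nodes >= majority(cluster_size)


print(majority(3), majority(4), majority(5))  # 2 3 3
print(has_quorum(2, 5))                       # False: 2 of 5 is not a majority
```

Note that a 4-node cluster needs 3 votes, the same as a 5-node cluster, which is one reason odd cluster sizes are common.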
- Quorum Availability
- Whether a majority of nodes can communicate.
- Lesson 643 — Monitoring and Operating Consensus Clusters
- Quorum waits
- Systems must contact and wait for multiple nodes before returning results
- Lesson 509 — Latency: The Hidden Cost of CAP
- Quorum write
- "Confirm once a majority of replicas acknowledge"
- Lesson 1398 — Consistency Level Per-Operation
- Quota sharding
- means splitting the total allowed quota across your nodes.
- Lesson 984 — Quota Sharding Across Nodes
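Splitting a global quota evenly can be sketched as below. The function name is hypothetical, and an even split is an assumption: it only works well when traffic is spread roughly evenly across nodes.

```python
def shard_quota(total_limit: int, node_count: int) -> list[int]:
    """Split total_limit across nodes so the shares sum exactly to the total."""
    base, extra = divmod(total_limit, node_count)
    # The first `extra` nodes absorb the remainder, one request each.
    return [base + 1 if i < extra else base for i in range(node_count)]


shares = shard_quota(total_limit=1000, node_count=3)
print(shares)       # [334, 333, 333]
print(sum(shares))  # 1000
```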
- Quotas
- Total allowances over longer periods (e.
- Lesson 859 — Rate Limiting at Service BoundariesLesson 1596 — Upload Rate Limiting and Quotas
R
- RabbitMQ
- is a versatile broker built on the AMQP protocol.
- Lesson 665 — Overview of Message Broker Landscape
- race condition
- ).
- Lesson 637 — Distributed Locks via ConsensusLesson 1514 — Custom Short URL SupportLesson 1800 — Race Conditions and Concurrency Control
- Race conditions
- Multiple services detecting expiration simultaneously can trigger parallel refresh attempts.
- Lesson 946 — Token Refresh in Distributed SystemsLesson 977 — Algorithm Implementation Patterns
- Raft
- are battle-tested algorithms that provide this agreement.
- Lesson 527 — Consensus and Strong ConsistencyLesson 618 — Raft Overview: Understandability as a Design GoalLesson 636 — Consensus for Leader ElectionLesson 638 — Configuration Management with Consensus
- Raft consensus
- (which you learned earlier) to ensure all servers agree on which services are healthy and available.
- Lesson 635 — Consul: Service Discovery with Raft Consensus
- Random
- has nearly zero overhead (just pick any item) but evicts blindly.
- Lesson 154 — Implementation Tradeoffs
- Random Generation
- creates identifiers on-the-fly using random characters from a character set.
- Lesson 1551 — Key Generation Strategy
- Random sampling
- Keep 1% of all INFO-level logs, 100% of ERROR
- Lesson 1157 — Log Sampling and FilteringLesson 1164 — Sampling for High-Volume LogsLesson 1217 — Sampling for Expensive Metrics
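The 1% / 100% split can be expressed as a per-level keep probability. In this sketch the `SAMPLE_RATES` table and the choice to keep unlisted levels are assumptions; the random generator is passed in so the behavior is reproducible in tests.

```python
import random

# Keep probability per level; levels not listed are kept (an assumption).
SAMPLE_RATES = {"ERROR": 1.0, "INFO": 0.01}


def should_keep(level: str, rng: random.Random) -> bool:
    """Decide whether to retain a log line of the given level."""
    rate = SAMPLE_RATES.get(level, 1.0)
    return rng.random() < rate  # random() is in [0.0, 1.0), so a rate of 1.0 always keeps


rng = random.Random(42)
kept_info = sum(should_keep("INFO", rng) for _ in range(100_000))
print(kept_info)  # roughly 1,000 of 100,000 INFO lines
```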
- Random Selection
- are dead simple—minimal computation, easy to understand and debug.
- Lesson 96 — Algorithm Selection Tradeoffs
- Random, individual file retrieval
- (not batch processing)
- Lesson 1593 — Distributed File System Considerations
- Range
- Assigns contiguous partition ranges per topic
- Lesson 716 — Consumer Groups and Partition Assignment
- Range approach
- Group by genre → finding mysteries is trivial, but the "Romance" section might overflow while "Gardening" sits empty
- Lesson 1454 — Partitioning Tradeoffs: Distribution vs Query Efficiency
- Range partitioning
- enables efficient range queries but can create hot spots
- Lesson 1453 — Composite PartitioningLesson 1480 — Hybrid Partitioning ApproachesLesson 1481 — Range vs Directory Tradeoffs
- Range Partitioning Vulnerability
- As you learned with range partitioning hotspots (lesson 1474), sequential keys naturally create this problem—newest data in one partition, all recent queries hammering that same partition.
- Lesson 1482 — The Hot Partition Problem
- range queries
- incredibly efficient.
- Lesson 422 — Clustering Columns and Row OrderingLesson 1659 — Timeline Storage Requirements
- Range-based partitioning
- does the opposite.
- Lesson 1454 — Partitioning Tradeoffs: Distribution vs Query Efficiency
- Range-based sharding
- Partition documents by ranges (e.
- Lesson 1769 — Horizontal Scaling of Search Infrastructure
- Ranked feeds
- reorder posts using engagement signals:
- Lesson 1644 — Feed Personalization and Ranking Requirements
- Ranking signals
- are measurable properties or metadata about posts and users that indicate how relevant or interesting a post might be.
- Lesson 1666 — Ranking Signals and Features
- Rarely changing reference data
- → Application in-memory cache
- Lesson 130 — Choosing the Right Caching Layer
- Rate
- How many requests per second is your service handling?
- Lesson 1190 — The RED MethodLesson 1265 — RED Method: Rate, Errors, Duration
- rate calculations
- to see throughput trends
- Lesson 1183 — Counter MetricsLesson 1194 — Time-Series Queries and PromQLLesson 1726 — Aggregation and Reporting
- Rate drops
- → upstream service stopped calling you, or clients are timing out
- Lesson 1265 — RED Method: Rate, Errors, Duration
- Rate limit complex queries
- Track cost (filters + sorts) per user
- Lesson 1897 — Performance Considerations and Limits
- Rate limit spikes
- Momentary quota breach that resets quickly
- Lesson 1020 — Why Retries Are Necessary in Distributed Systems
- Rate Limiting
- Edge servers enforce request limits per IP or region, throttling attackers while allowing normal users through.
- Lesson 195 — CDN for DDoS ProtectionLesson 257 — Celebrity Problem in Social GraphsLesson 343 — Time-to-Live and ExpirationLesson 876 — API Gateway as a Cross-Cutting Concern HubLesson 885 — Rate Limiting and ThrottlingLesson 957 — Rate Limiting vs ThrottlingLesson 1157 — Log Sampling and FilteringLesson 1167 — Avoid Log Explosion (+7 more)
- Rate Limiting and Politeness
- You can't hammer a single website with thousands of requests per second.
- Lesson 1838 — URL Frontier: Definition and Purpose
- Rate Limiting Per Host
- Lesson 1840 — Politeness Requirements for Web Crawling
- Rate limits
- Maximum requests per time window (e.
- Lesson 859 — Rate Limiting at Service BoundariesLesson 1699 — Notification Processing Workers
- Rate-Based Adjustment
- If processing rate falls despite a full queue, scale up; if workers are idle, scale down
- Lesson 1872 — Dynamic Scaling Based on Queue Depth
- Rate-limited operations
- API calls to third-party services
- Lesson 659 — Queue Use Cases: Work Distribution
- Ratios and percentages
- – Cache hit rates of 70% vs 95% drastically change design
- Lesson 32 — Rounding and Approximation Techniques
- Raw/high-resolution (1-15 seconds)
- Keep 1-7 days for debugging active incidents
- Lesson 1213 — Metric Retention Policies
- RDF (Resource Description Framework)
- structures data as **triples**:
- Lesson 453 — Property Graphs vs RDF Triples
- re-encryption
- (terminate at load balancer, then re-encrypt to backends).
- Lesson 118 — SSL/TLS Termination at Load BalancersLesson 891 — SSL/TLS Termination
- Re-hash with modified input
- Use the collision result itself as input: `hash(hash(url))`.
- Lesson 1509 — Handling Hash Collisions
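The re-hash step might look like the sketch below. It assumes SHA-256 truncated to 7 hex characters as the short-code function and an in-memory dict standing in for the database; both are illustrative choices, as are the function names and the retry cap.

```python
import hashlib


def short_code(value: str, length: int = 7) -> str:
    """Derive a short code by truncating a SHA-256 hex digest (assumed scheme)."""
    return hashlib.sha256(value.encode()).hexdigest()[:length]


def assign_code(url: str, taken: dict[str, str], max_attempts: int = 5) -> str:
    """Return a code for url, re-hashing the colliding code until a free one is found."""
    candidate = short_code(url)
    for _ in range(max_attempts):
        owner = taken.get(candidate)
        if owner is None or owner == url:
            taken[candidate] = url
            return candidate
        candidate = short_code(candidate)  # collision: hash(hash(url))
    raise RuntimeError("could not find a free short code")


taken = {short_code("https://example.com/a"): "https://other.example/x"}  # simulate a collision
code = assign_code("https://example.com/a", taken)
print(code == short_code(short_code("https://example.com/a")))  # True
```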
- Read
- Check Redis first
- Lesson 355 — Redis as a CacheLesson 387 — CRUD Operations on DocumentsLesson 416 — Read Path and Bloom FiltersLesson 1542 — Pastebin System Overview
- Read `robots.txt` directives
- Many sites specify a `Crawl-delay: N` value (in seconds) telling crawlers to wait N seconds between requests
- Lesson 1842 — Politeness Budget and Crawl Delay
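Python's standard library can read the `Crawl-delay` directive directly. A minimal sketch, assuming the `robots.txt` body has already been fetched; the helper name, the user agent string, and the 1-second fallback are assumptions.

```python
from urllib.robotparser import RobotFileParser


def crawl_delay_for(robots_txt: str, user_agent: str, default: float = 1.0) -> float:
    """Seconds to wait between requests, falling back to `default` if unspecified."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)  # None when no Crawl-delay applies
    return float(delay) if delay is not None else default


robots = "User-agent: *\nCrawl-delay: 10\nDisallow: /private/\n"
print(crawl_delay_for(robots, "examplebot"))  # 10.0
print(crawl_delay_for("", "examplebot"))      # 1.0 (fallback)
```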
- Read availability
- Multiple nodes can serve the same key simultaneously, spreading read traffic across replicas.
- Lesson 364 — Replication in Distributed Key-Value Stores
- Read bandwidth
- 10-100x write bandwidth due to viral content
- Lesson 1584 — Image/Video Hosting: Problem Definition and Scale
- Read Committed
- You only see data that's been committed.
- Lesson 312 — Isolation Levels and Concurrent Transactions
- Read consistency
- All reads come from the leader (in basic configurations), guaranteeing you see the latest committed data
- Lesson 706 — Leaders and Followers
- Read from the leader
- For data a user modified, always route their reads to the primary replica (which has the latest write).
- Lesson 542 — Read-Your-Writes Consistency
- Read matching SSTables
- scan files that likely contain your data
- Lesson 429 — Read Path and Bloom Filters
- read operations
- (SELECT) to **replica databases**.
- Lesson 220 — Read-Write Splitting FundamentalsLesson 1118 — Per-Operation Timeout ConfigurationLesson 1548 — Read vs Write Path ArchitectureLesson 1636 — Capacity Estimation: Feed Reads vs Writes
- read path
- follows this sequence:
- Lesson 449 — Read Path and CompactionLesson 1548 — Read vs Write Path ArchitectureLesson 1562 — Content Compression and Encoding
- Read Path Characteristics
- Lesson 1548 — Read vs Write Path Architecture
- Read performance scales
- Multiple copies mean more servers can handle read requests simultaneously
- Lesson 68 — What is Data Replication?
- Read replicas
- create copies of your primary database that handle read-only queries.
- Lesson 200 — Why Replicate: Read ScalingLesson 202 — Why Replicate: Geographic DistributionLesson 1483 — Celebrity User ProblemLesson 1522 — Read-Heavy Workload and Database ScalingLesson 1535 — Multi-Region DeploymentLesson 1561 — Read Replicas for Retrieval ScalingLesson 1568 — Scheduled Cleanup Job Design
- read requests
- while rejecting writes
- Lesson 508 — Availability is Also a SpectrumLesson 1373 — Chain Replication
- Read scalability
- Add more followers to distribute read load
- Lesson 71 — Single-Leader Replication ModelLesson 198 — What is Database Replication?Lesson 209 — Read-After-Write Consistency ProblemLesson 352 — Redis Replication Architecture
- Read scaling
- – Add replicas to handle more read traffic without impacting write performance
- Lesson 1365 — Single-Leader Replication Topology
- read time
- , pull recent posts from celebrities the user follows and merge them with their pre-computed feed.
- Lesson 1640 — Celebrity Problem in Push ModelsLesson 1647 — Fanout-on-Read (Pull Model)
- Read timeout
- controls how long your client will wait to *receive* data from the server after the connection succeeds.
- Lesson 1089 — Read Timeout and Write Timeout
- Read Uncommitted
- Allows reading data that other transactions haven't committed yet ("dirty reads").
- Lesson 312 — Isolation Levels and Concurrent Transactions
- read-after-write consistency
- for individual users without forcing *all* reads to the primary, which would defeat the purpose of read replicas.
- Lesson 214 — Session-Based Routing for Read-After-WriteLesson 224 — Read-After-Write ConsistencyLesson 1671 — Real-Time Requirements for Social FeedsLesson 1678 — Read-After-Write Consistency
- Read-from-leader for own writes
- After a write, that client reads from the leader (not replicas) for a period of time or until replication catches up.
- Lesson 1390 — Read-Your-Writes Consistency
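The time-based variant of this rule can be sketched as a small router that pins a client to the leader for a fixed period after each write. The class name, the 5-second pin, and the injectable clock are assumptions; a production version would more likely compare replication positions than wall-clock time.

```python
import time


class ReadRouter:
    """Route a client's reads to the leader for a short period after it writes."""

    def __init__(self, pin_seconds: float = 5.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds
        self.clock = clock
        self._last_write: dict[str, float] = {}

    def record_write(self, client_id: str) -> None:
        self._last_write[client_id] = self.clock()

    def target_for_read(self, client_id: str) -> str:
        last = self._last_write.get(client_id)
        if last is not None and self.clock() - last < self.pin_seconds:
            return "leader"   # replicas may not have this client's write yet
        return "replica"      # no recent write: any replica is acceptable


router = ReadRouter()
router.record_write("alice")
print(router.target_for_read("alice"))  # leader
print(router.target_for_read("bob"))    # replica
```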
- read-heavy
- .
- Lesson 1545 — Traffic and Storage EstimationLesson 1558 — Read Path: Cache-Aside PatternLesson 1569 — CDN Integration for Paste Delivery
- Read-Heavy Access Patterns
- Lesson 290 — When to Denormalize
- Read-heavy celebrities
- Lesson 1483 — Celebrity User Problem
- Read-heavy workload
- Use W=N, R=1.
- Lesson 556 — Read and Write QuorumsLesson 1361 — Quorum-Based ReplicationLesson 1522 — Read-Heavy Workload and Database Scaling
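The W=N, R=1 setting works because the read and write sets must overlap, which holds whenever R + W > N. A short check, with the function name chosen here for illustration:

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if every read set of size r intersects every write set of size w."""
    return r + w > n


# Read-heavy: every write goes to all 3 replicas, so one replica is enough for a read.
print(quorums_overlap(n=3, r=1, w=3))  # True
# R=1 with W=1 gives fast reads and writes, but a read may miss the latest write.
print(quorums_overlap(n=3, r=1, w=1))  # False
```

The cost of W=N is that a single unavailable replica blocks writes, so this setting trades write availability for cheap reads.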
- Read-heavy workloads
- Product catalogs, user profiles, dashboard metrics
- Lesson 124 — Database Query Result CachingLesson 147 — Least Frequently Used (LFU)Lesson 1524 — Cache-Aside Pattern for URL Lookups
- Read-Through + Write-Behind
- Lesson 139 — Combining Cache Patterns
- Read-Through + Write-Through
- Lesson 139 — Combining Cache Patterns
- Read-through caching
- simplifies this by making the cache layer itself handle missing data.
- Lesson 133 — Read-Through Caching Pattern
- Read-to-write ratio: 50,000:1
- (or more conservatively, 100:1 to 1000:1)
- Lesson 1636 — Capacity Estimation: Feed Reads vs Writes
- Read-your-writes
- (you always see your own updates)
- Lesson 541 — The Consistency SpectrumLesson 546 — Session ConsistencyLesson 1359 — Read-Your-Writes Consistency with ReplicasLesson 1392 — Monotonic Write Consistency
- Read-Your-Writes Consistency
- solves this by guaranteeing that once a client writes data, any subsequent reads *by that same client* will reflect that write (or a newer version).
- Lesson 542 — Read-Your-Writes ConsistencyLesson 543 — Monotonic Reads ConsistencyLesson 1390 — Read-Your-Writes Consistency
- Read-your-writes violation
- You can't see your own changes
- Lesson 1358 — Replication Lag in Async Systems
- Read:write ratio
- 100:1 (users view 200 photos/day)
- Lesson 33 — Putting It All Together: Worked Example
- Readable and Descriptive
- Lesson 1209 — Metric Naming Conventions
- Reading data
- `GET /user/456` — fetching repeatedly has no side effects
- Lesson 1006 — Natural Idempotency vs Engineered Idempotency
- Reads
- (SELECT queries) → Distributed across replicas
- Lesson 200 — Why Replicate: Read ScalingLesson 223 — Detecting Read vs Write QueriesLesson 349 — Redis In-Memory Storage ModelLesson 1362 — Chain ReplicationLesson 1636 — Capacity Estimation: Feed Reads vs Writes
- Reads (URL redirection)
- Lesson 1496 — Read-Heavy vs Write-Heavy Characteristics
- Reads and casual writes
- → **AP approach**
- Lesson 513 — Hybrid Approaches: Different Guarantees Per Operation
- Reads can fan out
- – Clients can read from leader or any replica, distributing query load
- Lesson 1365 — Single-Leader Replication Topology
- Reads dominate writes
- (mappings change rarely), so cache aggressively at clients.
- Lesson 1477 — Directory Service Architecture
- Reads may also block
- to prevent stale balance information
- Lesson 511 — Banking Systems: Consistency Over Availability
- Ready
- (initial state) — The participant is idle, ready to receive a prepare request
- Lesson 572 — Participant State Transitions
- Real example
- Cassandra is PA/EL (available during partitions, prioritizes low latency during normal operation).
- Lesson 514 — Beyond CAP: The Need for PACELC
- Real Scale
- Popular Pastebin services handle millions of pastes and hundreds of millions of reads, making partitioning, caching, and CDN strategies essential.
- Lesson 1542 — Pastebin System Overview
- Real-time accuracy critical
- → Skip caching or use very short invalidation windows
- Lesson 130 — Choosing the Right Caching Layer
- Real-time constraint
- If operation A completes before operation B begins (in wall-clock time), A must appear before B in that sequence
- Lesson 523 — Linearizability Defined
- Real-time dashboards
- Backend publishes metrics updates; dashboards subscribe for live data
- Lesson 357 — Redis Pub/Sub for Real-Time MessagingLesson 739 — Stream Processing Use Cases
- Real-time feel
- Active users get instant awareness through push notifications.
- Lesson 1679 — Hybrid Pull-Push Model
- Real-time metrics
- displaying request rates, latency, and error rates
- Lesson 846 — Control Plane: API and User Interface
- Real-time ranking
- means running your ranking model every time a user requests their feed.
- Lesson 1667 — Real-Time vs Precomputed Ranking
- Real-time verdict
- The API returns a risk score or classification (safe, suspicious, malicious)
- Lesson 1540 — Spam and Malicious Link Detection
- Real-time views
- – Recent data processed by the speed layer
- Lesson 750 — Lambda Architecture: Serving Layer
- Real-world analogy
- It's like having library branches that each manage their own inventory.
- Lesson 262 — Referential Integrity Across Shards
- Real-world applications
- Lesson 458 — Use Cases: Fraud Detection and Knowledge Graphs
- rebalance
- data when hotspots emerge (like the Celebrity Problem).
- Lesson 258 — Resharding and Data MigrationLesson 709 — Partition Assignment and RebalancingLesson 716 — Consumer Groups and Partition AssignmentLesson 1486 — Fixed vs Dynamic Partitioning
- Rebalancing detection
- to spot when partitions become skewed
- Lesson 1492 — Operational Complexity of Partitioning
- Recency
- (still important, just not the *only* factor)
- Lesson 1644 — Feed Personalization and Ranking RequirementsLesson 1665 — Feed Ranking FundamentalsLesson 1666 — Ranking Signals and FeaturesLesson 1755 — Relevance Tuning: Boosting and Signals
- Recommendation Engines
- Lesson 739 — Stream Processing Use Cases
- Recommendation systems
- traverse "user liked → movie → similar_to → movie" paths
- Lesson 458 — Use Cases: Fraud Detection and Knowledge Graphs
- Reconcile
- them using domain-specific logic (merge, pick latest timestamp, keep both, etc.)
- Lesson 377 — Eventual Consistency and Application Reconciliation
- Reconcile asynchronously
- Background jobs detect and fix mismatches
- Lesson 583 — Alternative: Best Effort with Eventual Consistency
- Reconciliation Overhead
- The serving layer must merge results from both layers.
- Lesson 751 — Lambda Architecture Tradeoffs
- Record metadata
- The snapshot includes the last included index and term—the log position it represents
- Lesson 632 — Log Compaction: Snapshotting
- Record timeout exhaustion events
- when a request times out, log how much time was actually used
- Lesson 1106 — Timeout Propagation Observability
- Recovery
- If a consumer crashes or has a bug, replay from before the problem occurred and reprocess correctly.
- Lesson 695 — Stream Retention and ReplayLesson 1330 — What is Fault Tolerance?
- Recovery challenges
- requires careful design to handle failures
- Lesson 136 — Write-Behind (Write-Back) Caching Pattern
- Recovery mechanism
- Messages can be replayed after fixing issues
- Lesson 1705 — Retry and Dead Letter Queues
- Recovery Point Objective (RPO)
- is non-zero under asynchronous replication: you can lose the last seconds or minutes of writes
- Lesson 1356 — Asynchronous Replication: Speed and Risk · Lesson 1411 — Defining Recovery Point Objective (RPO)
- Recovery protocols
- Bringing a failed node back online requires careful state reconciliation
- Lesson 617 — Why Paxos Is Difficult in Practice
- Recovery time expectations
- How long does your dependency typically need to recover?
- Lesson 1059 — Timeout Windows and Reset Logic
- Recovery Time Objective (RTO)
- is the maximum acceptable amount of time your system can be down after a failure before business impact becomes unacceptable.
- Lesson 1412 — Defining Recovery Time Objective (RTO)
- Recursive queries
- Database queries that traverse parent relationships
- Lesson 939 — Permission Inheritance and Hierarchies
- Recycling short codes
- back into the pool after a grace period (prevents accidental reuse)
- Lesson 1532 — Expiration and Time-to-Live
- RED method
- (Rate, Errors, Duration) or **Four Golden Signals** helps focus on metrics that matter because each directly maps to user impact and system health.
- Lesson 1215 — Avoiding Vanity Metrics
- Redact sensitive fields
- Lesson 1131 — Logging Sensitive Data: Security Concerns
- Redeliver to different consumer
- Let another instance try
- Lesson 684 — Negative Acknowledgments and Redelivery
- Redelivery
- → Consumer nacks or doesn't acknowledge (**negative acknowledgments**)
- Lesson 687 — Dead Letter Queues
- Redelivery headers
- Metadata tracking attempt count and timestamps
- Lesson 684 — Negative Acknowledgments and Redelivery
- Redirect first
- Look up the destination URL in cache or database and return the HTTP 301/302 immediately
- Lesson 1530 — Analytics and Click Tracking
- Redirection
- Your application is pointed to the new primary
- Lesson 207 — Replica Promotion and Failover Basics
- Redis
- and **Memcached** are the two most popular distributed cache stores:
- Lesson 123 — Distributed Cache Layer (Redis/Memcached) · Lesson 327 — Polyglot Persistence Pattern · Lesson 976 — Rate Limiting State Storage · Lesson 1523 — Caching Layer Architecture · Lesson 1664 — Timeline Caching Strategies · Lesson 1702 — User Preferences Lookup
- Redis (with Streams/Pub-Sub)
- offers lightweight messaging built into your caching layer.
- Lesson 665 — Overview of Message Broker Landscape
- Redis Clusters
- Lesson 1821 — Tenant Isolation in Redis
- Redis pipelining
- , you can batch operations like:
- Lesson 1811 — Batch Operations to Reduce Network Calls
- Redis Pub-Sub
- works well for ephemeral, low-latency messaging where losing occasional messages is acceptable.
- Lesson 1675 — Pub-Sub for Real-Time Distribution
- Redis Sentinel
- is a separate process (or cluster of processes) that acts as a watchdog for your Redis deployment.
- Lesson 353 — Redis Sentinel for High Availability
- Redis/Memcached
- Fast, in-memory key-value stores perfect for session data
- Lesson 59 — Externalizing State with Shared Storage
- Reduce configuration surface area
- Every config flag is a potential production incident
- Lesson 1315 — Simplicity as a Core Value
- Reduce fear
- of production incidents by making failure practice routine
- Lesson 1345 — Starting with Game Days
- Reduce human error
- Code executes consistently; humans make mistakes when tired or rushed
- Lesson 1308 — The SRE Philosophy: Treating Operations as Software
- Reduce Phase
- Lesson 1746 — Index Construction at Scale
- Reduced attack window
- Stolen access tokens are useless after minutes
- Lesson 915 — Token Expiration and Refresh Tokens
- Reduced availability
- If any replica is down, writes may fail or block
- Lesson 1355 — Synchronous Replication: Guarantees and Costs
- Reduced coordination
- No cross-team approval needed for BFF changes
- Lesson 906 — BFF Ownership and Team Structure
- Reduced correlated failures
- Regional power outages, natural disasters, or ISP issues affect one location but not geographically distant ones.
- Lesson 1429 — Geographic Backup Distribution
- Reduced database load
- batching multiple writes together
- Lesson 136 — Write-Behind (Write-Back) Caching Pattern · Lesson 1529 — Preloading Hot URLs into Cache
- Reduced Downtime
- When you know something's wrong immediately, you can fix it faster.
- Lesson 1262 — What is Monitoring and Why It Matters
- Reduced Latency
- Read queries return faster because replicas aren't competing with write operations for CPU, memory, and disk I/O.
- Lesson 220 — Read-Write Splitting Fundamentals · Lesson 887 — API Composition and Aggregation
- Reduced load
- Distributed cache shields the database from repeated queries
- Lesson 143 — Multi-Tier Caching Pattern · Lesson 168 — What is a CDN and Why Use It · Lesson 843 — Control Plane: Service Discovery Integration
- Reduced origin load
- Most requests never reach your servers
- Lesson 125 — CDN as Edge Caching Layer · Lesson 179 — Origin Shield: Protecting Origin Servers
- Reduced replication lag risk
- At least one replica is guaranteed up-to-date
- Lesson 217 — Semi-Synchronous Replication Trade-offs
- Reduced resilience
- – Can't easily redirect traffic away from struggling nodes
- Lesson 982 — Sticky Sessions and Rate Limiting
- Reduced risk
- A bad deployment affects only one service, not everything
- Lesson 786 — Independent Deployability of Microservices
- Reduced throughput
- Limited by slowest node in the coordination group
- Lesson 526 — The Cost of Strong Consistency
- Reduces leader load
- – The root node handles far fewer replication connections
- Lesson 1374 — Tree Replication Topology
- Reduces wasted network calls
- when you know you'll be rejected
- Lesson 1789 — Client-Side vs Server-Side Rate Limiting
- Reducing notification fatigue
- Prevents spam from transient states
- Lesson 1713 — Provider-Side Deduplication
- Redundancy
- Multiple copies of critical components (recall active-active and active-passive patterns)
- Lesson 1330 — What is Fault Tolerance? · Lesson 1335 — Failover Mechanisms
- Redundancy and Failover
- Lesson 196 — Multi-CDN Strategies
- Redundancy for failures
- Add extra servers so if one fails, the others can absorb its load.
- Lesson 28 — Server Count Estimation
- Redundant information
- If your framework already logs HTTP requests, don't duplicate it.
- Lesson 1129 — What to Log vs What Not to Log
- Redundant instances
- across multiple availability zones
- Lesson 1784 — Non-Functional Requirements: Latency and Availability
- Reference
- documents from other documents (similar to foreign keys, though not enforced)
- Lesson 382 — Document IDs and Primary Keys
- Reference counting
- is critical: only delete the physical file when *all* metadata references are gone.
- Lesson 1622 — Deduplication Strategies
- referencing
- .
- Lesson 292 — Embedding vs Referencing in Documents · Lesson 386 — Embedded Documents vs References
- Referential integrity
- is the guarantee that relationships between tables remain valid.
- Lesson 300 — Foreign Keys and Referential Integrity
- Referrer data
- Log the HTTP `Referer` header to see where traffic originates—social media, search engines, or direct links.
- Lesson 1583 — Analytics and Usage Metrics
- Refine alert thresholds
- If an alert fires 20 times per day but only matters once per week, adjust the threshold or add context filters.
- Lesson 1171 — Log Review and Alert Fatigue
- Refresh periodically
- as new data arrives and old data ages out
- Lesson 1117 — Adaptive Timeouts Based on Historical Latency
- Refresh token
- = your ID at the front desk (valid for your entire stay)
- Lesson 915 — Token Expiration and Refresh Tokens · Lesson 926 — Access Tokens vs Refresh Tokens · Lesson 946 — Token Refresh in Distributed Systems
- region
- (geo), then by **hash(customer_id)** within each region.
- Lesson 250 — Hybrid Sharding Strategies · Lesson 435 — HBase Regions and Region Servers
- Regional Data Partitioning
- Each region owns specific user data exclusively
- Lesson 1435 — Multi-Region Architecture for DR
- Regional failover
- (if one region goes down, others remain)
- Lesson 53 — Geographic Distribution Benefits
- Regional fanout workers
- processing feed updates
- Lesson 1682 — Scaling to Billions of Daily Active Users
- Regional Network Load Balancer
- Layer 4 for regional deployments
- Lesson 114 — Cloud Load Balancers (GCP and Azure)
- regions
- and spreading them across multiple servers.
- Lesson 435 — HBase Regions and Region Servers · Lesson 1334 — Geographic Redundancy and Multi-Region
- RegionServer
- is a worker node that hosts multiple regions.
- Lesson 435 — HBase Regions and Region Servers
- Registers
- Last-write-wins registers using timestamps
- Lesson 538 — Conflict-Free Replicated Data Types (CRDTs)
- Registry validates compatibility
- The registry checks if the new schema is compatible with existing versions using rules (backward, forward, full compatibility)
- Lesson 725 — Schema Registry and Evolution
- Regular posts
- 30 seconds to 2 minutes is typically acceptable
- Lesson 1671 — Real-Time Requirements for Social Feeds
- Regulatory compliance
- Keep European users' data in European datacenters
- Lesson 117 — Global Server Load Balancing (GSLB) · Lesson 317 — ACID vs BASE Tradeoffs · Lesson 1452 — List-Based Partitioning · Lesson 1626 — Geolocation-Based Storage · Lesson 1685 — SMS Notifications
- reject requests
- rather than violate consistency guarantees.
- Lesson 526 — The Cost of Strong Consistency · Lesson 532 — Why Eventual Consistency Exists · Lesson 1399 — Consistency Pattern Tradeoffs in Practice
- Reject the request immediately
- without calling downstream services.
- Lesson 1102 — Handling Zero or Negative Timeouts
- Rejected alternatives
- What did you consider but not choose, and why?
- Lesson 42 — Document Your Decisions
- Related data stays together
- When you query a user's timeline or a tenant's orders, all that data lives on one shard—no cross-shard joins needed.
- Lesson 244 — Entity-Based Sharding
- Related logs
- Deep links to log queries for the time window and affected hosts
- Lesson 1293 — Alert Context and Enrichment
- Relational approach
- Open the entire phone book, find your friend, then open it again to find *their* friends listed somewhere else.
- Lesson 476 — Graph Query Performance Characteristics
- relational database
- , data is split across normalized tables with rows connected through foreign keys.
- Lesson 381 — Documents vs Rows in Relational Databases · Lesson 476 — Graph Query Performance Characteristics · Lesson 934 — RBAC Implementation Patterns
- Relational database (PostgreSQL)
- Lesson 1819 — Per-Tenant Configuration Storage
- Relational databases excel at
- Lesson 1556 — Hybrid Storage: Metadata + Content References
- relationship
- .
- Lesson 298 — The Relational Model Foundation · Lesson 462 — Creating Nodes and Relationships
- Relationship Complexity Is High
- Lesson 459 — Graph vs Relational Trade-offs
- Relationship Depth Matters
- Lesson 459 — Graph vs Relational Trade-offs
- Relationship features
- how often you interact with this author
- Lesson 1668 — Machine Learning for Feed Ranking
- relationships
- (connections between them).
- Lesson 462 — Creating Nodes and Relationships · Lesson 1232 — Span Relationships and Hierarchy
- Relationships and Joins
- Lesson 346 — When Not to Use Key-Value Stores
- Relative duration
- "Keep this for 5 minutes from now"
- Lesson 156 — Time-Based Expiration (TTL) · Lesson 1112 — HTTP Header-Based Propagation
- Relative timeout (alternative)
- Lesson 1112 — HTTP Header-Based Propagation
- Relatively static data
- Data that doesn't change every second
- Lesson 124 — Database Query Result Caching
- Release locks and connections
- to prevent resource exhaustion
- Lesson 1115 — Deadline Exceeded Error Handling
- Release resources immediately
- (connection pool slots, memory buffers, file handles)
- Lesson 1094 — Timeout Cancellation and Cleanup
- Relevance metrics
- Click-through rate (CTR), mean reciprocal rank (MRR), and normalized discounted cumulative gain (NDCG) tell you if top results match user intent
- Lesson 1779 — Search Analytics and Click Tracking
- reliability
- .
- Lesson 14 — Availability and Reliability Requirements · Lesson 98 — What Are Health Checks? · Lesson 646 — The Producer-Consumer Model · Lesson 678 — At-Most-Once Delivery · Lesson 804 — Network Latency and Reliability · Lesson 827 — What is a Service Mesh? · Lesson 1322 — Availability vs Reliability: Key Differences · Lesson 1698 — Message Queue for Decoupling
- Reliability per service
- If Service A is down, its queue holds messages until it recovers—Service B is unaffected
- Lesson 663 — Hybrid Patterns: Topic + Queue
- Reliable
- Don't depend on another potentially failing service
- Lesson 1061 — Fallback Strategies · Lesson 1402 — Full Backups
- Reliable ID reservation
- No race conditions on duplicate paste keys
- Lesson 1559 — Write Path: Synchronous vs Asynchronous Storage
- Remove old columns later
- in a separate migration phase
- Lesson 265 — Schema Changes in Sharded Environments
- Removing fields
- from responses: Clients expecting `user.
- Lesson 1905 — Breaking vs Non-Breaking Changes
- Removing or renaming endpoints
- Old URLs return 404 errors
- Lesson 1905 — Breaking vs Non-Breaking Changes
- Renaming fields
- `firstName` becoming `first_name` severs existing references
- Lesson 1905 — Breaking vs Non-Breaking Changes
- Rendering layer
- UI components that display notification badges and lists
- Lesson 1687 — In-App Notifications
- Repair outdated replicas
- (read-repair) by writing the latest value back to nodes with stale data
- Lesson 559 — Strong Consistency with Quorums
- Repeat
- Each iteration solves one major constraint, revealing the next challenge.
- Lesson 35 — Iterate Based on Constraints · Lesson 822 — The Strangler Fig Pattern for Migration · Lesson 1732 — Crawling and Document Collection
- Repeatable Read
- Guarantees that re-reading the same rows returns identical data within your transaction, even if others commit changes.
- Lesson 312 — Isolation Levels and Concurrent Transactions
- Replace Subqueries with Joins
- Lesson 281 — Query Rewriting for Performance
- replay
- events by resetting their cursor backward
- Lesson 694 — Producers and Consumers · Lesson 698 — Streaming vs Message Queues · Lesson 754 — Event Log Replay in Kappa
- Replay capability
- Consumers can restart from any position in the stream
- Lesson 697 — Push vs Pull Consumption Models
- Replayability
- Consumers can rewind to any offset and re-process events
- Lesson 693 — The Commit Log Abstraction
- Replayable Event Log
- An immutable, persistent log (like Kafka) that stores all events and allows replay from any offset
- Lesson 752 — Kappa Architecture Overview
- Replica connection pool(s)
- Separate connection pools to one or more read replicas, used for SELECT queries
- Lesson 221 — Application-Level Connection Management
- replica databases
- Lesson 198 — What is Database Replication? · Lesson 220 — Read-Write Splitting Fundamentals
- Replica processing time
- Replicas must apply the changes they receive.
- Lesson 208 — Replication Lag: What It Is and Why It Happens
- Replica promotion
- is the process of elevating one of your read replicas to become the new primary database when the original primary fails.
- Lesson 207 — Replica Promotion and Failover Basics
- Replica reads
- Route celebrity profile reads to dedicated read replicas with extra capacity.
- Lesson 257 — Celebrity Problem in Social Graphs
- Replicas receive updates
- – Followers pull or receive the changelog
- Lesson 1365 — Single-Leader Replication Topology
- Replicas serve reads
- → Answer queries from their local copy of data
- Lesson 199 — Primary-Replica Architecture
- Replicate hot content
- Copy popular files to regions with high demand
- Lesson 1631 — Multi-Region Replication Strategy
- Replicate the Auth Service
- Lesson 950 — Auth Service Single Point of Failure
- replicated log
- .
- Lesson 634 — etcd: Distributed Key-Value Store with Raft · Lesson 638 — Configuration Management with Consensus
- Replicated Session Storage
- Lesson 947 — Distributed Session Management
- Replication
- makes *copies* of the same data across nodes.
- Lesson 68 — What is Data Replication? · Lesson 70 — Partitioning and Replication Together · Lesson 229 — What is Sharding? · Lesson 299 — Primary Keys and Entity Integrity · Lesson 360 — What Makes a Key-Value Store Distributed · Lesson 364 — Replication in Distributed Key-Value Stores · Lesson 424 — Replication Strategy and Factor · Lesson 713 — Kafka's Write Path and Durability (+4 more)
- Replication Factor
- How many copies for backup/availability?
- Lesson 25 — Storage Estimation Basics · Lesson 364 — Replication in Distributed Key-Value Stores · Lesson 705 — Replication and Fault Tolerance · Lesson 1770 — Index Replication for Availability
- replication lag
- ), so reads from followers might be slightly stale.
- Lesson 71 — Single-Leader Replication Model · Lesson 204 — Asynchronous Replication Explained · Lesson 208 — Replication Lag: What It Is and Why It Happens · Lesson 209 — Read-After-Write Consistency Problem · Lesson 210 — Monotonic Reads Guarantee · Lesson 352 — Redis Replication Architecture · Lesson 1356 — Asynchronous Replication: Speed and Risk · Lesson 1358 — Replication Lag in Async Systems (+2 more)
- Replication mode
- Asynchronous replication inherently creates lag because the primary doesn't wait for replicas to confirm receipt before completing the write.
- Lesson 208 — Replication Lag: What It Is and Why It Happens
- Replication per shard
- Each shard has its own replicas (covered earlier), so a single server failure doesn't kill the shard
- Lesson 266 — Shard Failure and Partial Outages
- replication strategy
- determines.
- Lesson 424 — Replication Strategy and Factor · Lesson 1435 — Multi-Region Architecture for DR
- representations
- (JSON, XML) of their current state
- Lesson 1874 — What REST Means: Resource-Oriented Architecture · Lesson 1902 — Content Negotiation with Media Types
- Reprocessing = replaying streams
- from earlier offsets
- Lesson 753 — Kappa Architecture: Single Processing Path
- Reproducing production environments locally
- becomes nearly impossible.
- Lesson 806 — Testing Complexity
- Reputation and Customer Trust
- Lesson 1420 — Business Impact Analysis for RPO/RTO
- Request and Response Transformation
- is the gateway's ability to rewrite, reshape, and adapt messages bidirectionally.
- Lesson 882 — Request and Response Transformation
- Request arrives
- at your service or API gateway (the PEP)
- Lesson 941 — Policy Decision Points (PDP) and Enforcement Points (PEP)
- Request attributes
- "Only GET requests allowed from external-facing services"
- Lesson 854 — Request-Level Authorization
- request body
- `{"idempotency_key": ".
- Lesson 1010 — Idempotency Keys for POST Requests · Lesson 1546 — API Design for Paste Operations
- Request metadata
- enough details to verify it's the same request (method, endpoint, user ID)
- Lesson 1004 — Server-Side State for Idempotency
- Request routing
- to the correct backend service
- Lesson 870 — What is an API Gateway? · Lesson 880 — Request Routing and Load Balancing
- Request template
- by ID and locale: `template_id="order_shipped"`, `locale="es"`
- Lesson 1701 — Template Service for Content
- Request timeout
- (sometimes called "read timeout" or "socket timeout") limits the total time for the entire request-response cycle *after* the connection is established.
- Lesson 1088 — Connection Timeout vs Request Timeout
- Request timeouts
- ensure calls don't hang indefinitely.
- Lesson 852 — Circuit Breaking at the Mesh Level
- Request Transformation
- happens before forwarding to backend services:
- Lesson 882 — Request and Response Transformation
- Request Validation
- Basic checks (malformed JSON, missing headers, invalid content types) happen at the gateway, keeping invalid traffic from consuming service resources.
- Lesson 876 — API Gateway as a Cross-Cutting Concern Hub
- request-driven services
- web servers, APIs, microservices, and RPC handlers.
- Lesson 1190 — The RED Method · Lesson 1265 — RED Method: Rate, Errors, Duration
- Request-Reply
- Temporary queues for synchronous-style responses
- Lesson 671 — ActiveMQ and Traditional Enterprise Messaging
- Request-response
- You ask for a glass of water and wait until someone brings it
- Lesson 690 — What is Event Streaming?
- Request/response bodies by default
- Logging full payloads on every request kills performance and bloats storage.
- Lesson 1129 — What to Log vs What Not to Log
- Requests are processed
- (leak out) at a **constant rate**—say, 100 requests per second
- Lesson 965 — Leaky Bucket Algorithm
- Requests flow normally
- – Every call goes directly to the downstream service
- Lesson 1045 — The Three States: Closed
- Requests vs Limits
- Guaranteed minimums (requests) and hard caps (limits)
- Lesson 1072 — CPU and Memory Bulkheads: Resource Quotas
- RequestVote RPC
- (Remote Procedure Call) to all other servers.
- Lesson 621 — Leader Election: RequestVote RPC
- RequestVote RPCs
- , and the new leader has all committed entries.
- Lesson 634 — etcd: Distributed Key-Value Store with Raft
- Required external services
- Can we reach the payment API, authentication service, etc.?
- Lesson 101 — Health Check Endpoints
- Required Features
- Lesson 119 — Choosing Load Balancer Technology
- Required fields
- Are all mandatory parameters present?
- Lesson 886 — Request Validation · Lesson 1786 — API Contract: Request and Response Format
- Requires stable, unique ordering
- The keyset column must be indexed and provide a deterministic sort order
- Lesson 1890 — Keyset Pagination
- Reserve buffer time
- for network latency and your own processing (typically 10-20%)
- Lesson 1098 — Per-Hop Timeout Budgets
- Reserve keywords
- Block system routes like `admin`, `api`, `stats`, or future features.
- Lesson 1531 — Custom Aliases and Vanity URLs
- Reserve local processing time
- Subtract your expected work duration
- Lesson 1119 — Timeout Budget Management Across Service Chains
- Reservoir sampling
- Maintain a fixed-size sample from an unbounded stream
- Lesson 1217 — Sampling for Expensive Metrics
- resilience
- .
- Lesson 47 — Single Point of Failure in Vertical Scaling · Lesson 781 — What are Microservices? · Lesson 793 — Fault Isolation and Resilience · Lesson 1725 — Analytics Pipeline Architecture
- Resilience Improves Dramatically
- If one gateway instance crashes, requests simply flow to healthy instances.
- Lesson 878 — Stateless Gateway Design
- Resilience patterns
- like retries, timeouts, and circuit breaking
- Lesson 838 — Data Plane: Sidecar Proxy Pattern
- Resilience4j
- (the modern alternative).
- Lesson 1075 — Implementing Bulkheads in Practice: Hystrix and Resilience4j
- Resist over-engineering
- Build for today's scale, not imaginary future scale
- Lesson 1315 — Simplicity as a Core Value
- Resist that urge
- The best system designs always start with the simplest possible solution that satisfies the functional and non-functional requirements you've identified.
- Lesson 34 — Start Simple: The Minimum Viable Design
- Resolution
- → Fix code/data, then manually replay or discard messages
- Lesson 687 — Dead Letter Queues · Lesson 1270 — Monitoring Resolution and Retention Tradeoffs
- Resolution approaches
- Lesson 1509 — Handling Hash Collisions
- Resolution ladder
- Create multiple sizes from the same source:
- Lesson 1601 — Video Transcoding Fundamentals
- Resolvers
- are the implementation functions that fulfill that contract by actually fetching or manipulating data.
- Lesson 1912 — GraphQL Schema and Resolvers · Lesson 1913 — The N+1 Query Problem in GraphQL
- Resource
- What they tried to access (file ID, patient record, financial data)
- Lesson 944 — Auditing and Compliance for Authorization · Lesson 1874 — What REST Means: Resource-Oriented Architecture
- Resource allocation
- Budget far more servers and memory for read replicas and cache layers than for write handling.
- Lesson 1636 — Capacity Estimation: Feed Reads vs Writes
- Resource attributes
- Classification, owner, sensitivity level
- Lesson 935 — Attribute-Based Access Control (ABAC) Introduction
- Resource blocking
- The API server is tied up during the entire fanout process
- Lesson 1651 — Asynchronous Fanout Processing
- Resource constraints
- If a replica has slower CPU, less memory, or slower disks than the primary, it can't keep pace.
- Lesson 208 — Replication Lag: What It Is and Why It Happens · Lesson 1073 — Bulkhead Sizing: Balancing Isolation and Utilization · Lesson 1255 — Adaptive Sampling · Lesson 1835 — Crawl Freshness Requirements
- Resource efficiency
- State storage remains bounded and predictable
- Lesson 1005 — Idempotency Time Windows · Lesson 1679 — Hybrid Pull-Push Model
- resource exhaustion
- (rate limits, overload)
- Lesson 1021 — Immediate Retry vs Delayed Retry · Lesson 1086 — What Timeouts Are and Why They Matter · Lesson 1096 — Why Timeouts Must Propagate · Lesson 1342 — Testing Redundancy with Fault Injection · Lesson 1640 — Celebrity Problem in Push Models
- Resource isolation
- Heavy read traffic won't starve write operations of connections
- Lesson 221 — Application-Level Connection Management · Lesson 1595 — Thumbnail and Preview Generation Trigger · Lesson 1790 — Multi-Tenancy Considerations
- Resource quotas
- act as bulkheads at the infrastructure level.
- Lesson 1072 — CPU and Memory Bulkheads: Resource Quotas
- Resource Tuning
- Properly sizing proxy CPU/memory limits prevents resource contention
- Lesson 841 — Data Plane: Performance and Latency Overhead
- Resource types
- Topics, consumer groups, clusters
- Lesson 727 — Kafka Security: Authentication and Encryption
- Resource waste
- Some servers sit idle while others are overwhelmed
- Lesson 1462 — The Uneven Distribution Problem
- Resources
- Database connections, queue workers, and write capacity get overwhelmed
- Lesson 1649 — The Celebrity Problem in Fanout · Lesson 1901 — Header-Based Versioning
- Respect crawl-delay
- Don't hammer servers with rapid-fire requests.
- Lesson 1831 — Robots.txt and Crawl Etiquette
- Respect quiet periods
- Aggregate multiple low-priority notifications before sending, preventing notification fatigue across all channels.
- Lesson 1689 — Multi-Channel Delivery
- Respect robots.txt
- Some sites block automated scraping
- Lesson 1538 — Link Preview and Metadata · Lesson 1826 — What is a Web Crawler
- Respecting `robots.txt`
- Lesson 1840 — Politeness Requirements for Web Crawling
- Response
- Returns the generated paste key (or confirms custom key) and full URL.
- Lesson 1546 — API Design for Paste Operations
- Response fields
- Lesson 1786 — API Contract: Request and Response Format
- Response time
- How quickly each server answers requests (milliseconds per request)
- Lesson 92 — Least Response Time Algorithm
- Response Transformation
- happens before returning to clients:
- Lesson 882 — Request and Response Transformation
- Responsibility
- Owns the protected data and grants permission to access it.
- Lesson 921 — OAuth2 Roles: Resource Owner, Client, Server
- REST wins on
- Lesson 1911 — GraphQL vs REST: Tradeoffs
- Restoration requirement
- All backups in order (Day 1 → Day 2 → Day 3 → Day 4)
- Lesson 1422 — Incremental Backup Strategy
- Result
- 200 requests in 2 seconds!
- Lesson 961 — Time Windows for Rate Limits · Lesson 998 — What is Idempotency? · Lesson 1085 — Preventing Cascades with Circuit Breakers and Bulkheads · Lesson 1711 — Idempotency Keys for Notifications
- Result consistency
- across retries (may return same response or updated metadata)
- Lesson 1008 — What Makes an API Idempotent
- Resume from failure point
- Execute only remaining steps
- Lesson 1016 — Idempotency for Multi-Step Operations
- Retention and Compliance
- requires storing logs immutably for regulatory periods (often 90 days to 7 years), with tamper-proof guarantees.
- Lesson 954 — Distributed Auth Audit Logging
- retention policies
- that automatically clean up old data:
- Lesson 711 — Message Retention and Log Segments · Lesson 1517 — Short URL Expiration and Reuse
- Retriable failures
- Timeouts, 503 Service Unavailable, network errors—suggest temporary downstream problems.
- Lesson 1057 — Failure Detection and Counting
- Retries
- If a request fails, the proxy can automatically retry it (with configurable backoff strategies) without the calling service needing retry logic.
- Lesson 839 — Data Plane: Proxy Responsibilities · Lesson 1234 — Span Events and Logs
- Retries with Exponential Backoff
- Lesson 1656 — Fanout Failure Handling
- Retrieval
- Serve media quickly via direct links or embedded players
- Lesson 1584 — Image/Video Hosting: Problem Definition and Scale
- Retrieval tiers
- Choose speed vs cost (instant, 3-5 hours, 12+ hours)
- Lesson 1623 — Cold Storage and Archival
- Retrieve pending URLs
- The coordinator looks up which URLs were assigned to that worker (often stored in a "worker_id → URL_list" mapping in Redis or a database)
- Lesson 1866 — Worker Health Monitoring and Failover
- Retrieve template
- "Hola {name}, tu pedido {orderId} ha sido enviado"
- Lesson 1701 — Template Service for Content
- retry
- until you succeed?
- Lesson 596 — Forward Recovery vs Backward Recovery · Lesson 1026 — Retry on Which Errors · Lesson 1027 — Idempotency Tokens in Retry Logic
- Retry after timeout
- See key `notif_abc123` already exists with status "sent"—skip sending, return success
- Lesson 1711 — Idempotency Keys for Notifications
- Retry amplification
- Clients retry after timeout, but the original request is still processing, doubling the load
- Lesson 1096 — Why Timeouts Must Propagate
- Retry and timeout policies
- How proxies should handle failures
- Lesson 842 — Control Plane: Configuration Management
- Retry complexity
- Operations frequently fail and need multiple attempts
- Lesson 654 — When to Use Async vs Sync
- Retry logic
- and timeouts
- Lesson 840 — Data Plane: Envoy Proxy Fundamentals · Lesson 1595 — Thumbnail and Preview Generation Trigger · Lesson 1780 — Distributed Query Coordination · Lesson 1875 — HTTP Methods: GET, POST, PUT, DELETE Semantics
- retry storm
- synchronized retries that prevent the system from recovering.
- Lesson 1022 — Fixed Delay Retry Strategy · Lesson 1029 — Retry Budgets and Rate Limiting · Lesson 1030 — Combining Retries with Circuit Breakers
- Retry with timestamp
- Append the current timestamp milliseconds: `hash(url + timestamp)`.
- Lesson 1509 — Handling Hash Collisions
- Return
- Application closes/releases connection → Pool marks as idle → Connection rejoins the pool for reuse
- Lesson 270 — Connection Lifecycle in a Pool · Lesson 355 — Redis as a Cache
- Return only documents
- where positions satisfy this adjacency constraint
- Lesson 1751 — Phrase Queries and Positional Indexes
- Return partial results
- (show 10 products instead of 100)
- Lesson 1083 — Graceful Degradation Strategies
- Return rendered content
- "Hola Carlos, tu pedido 54321 ha sido enviado"
- Lesson 1701 — Template Service for Content
- Return the final result
- Whether fresh or from the previous attempt
- Lesson 1016 — Idempotency for Multi-Step Operations
- Return to frontier
- Those URLs are pushed back into the distributed URL frontier as if never assigned
- Lesson 1866 — Worker Health Monitoring and Failover
- Returns
- the connection to the pool after the query completes (not closing it)
- Lesson 267 — What is Connection Pooling · Lesson 1907 — Gateway-Level Version Routing
- Revenue Impact
- Lesson 1420 — Business Impact Analysis for RPO/RTO
- Revenue per request
- Direct financial impact of each transaction
- Lesson 1196 — Business vs Technical Metrics
- Revocation becomes possible
- Store refresh tokens in a database; you can invalidate them immediately
- Lesson 915 — Token Expiration and Refresh Tokens
- Revocation Lists (Deny Lists)
- Maintain a shared, fast-lookup store (Redis, Memcached) containing revoked token IDs (the `jti` claim).
- Lesson 948 — Token Revocation at Scale
- Revocation Speed
- Stateless tokens can't be instantly invalidated without introducing state
- Lesson 947 — Distributed Session Management
- Revokes certificates
- Invalidates compromised or outdated certificates instantly
- Lesson 844 — Control Plane: Certificate Management
- Riak
- implemented Dynamo's design nearly verbatim, including vector clocks for conflict resolution and active anti-entropy.
- Lesson 378 — Dynamo's Influence on Modern SystemsLesson 521 — PACELC Tradeoffs in Real Systems
- Rich data structures
- Can efficiently support lists, sets, sorted sets in memory
- Lesson 349 — Redis In-Memory Storage Model
- Rich Observability
- Built-in support for metrics, logging, and distributed tracing out of the box
- Lesson 840 — Data Plane: Envoy Proxy Fundamentals
- Risk
- If your consumer crashes mid-processing, the message is already gone from the queue, so **message loss** occurs.
- Lesson 683 — Consumer Acknowledgment TimingLesson 1102 — Handling Zero or Negative Timeouts
- Risk acceptance
- Very late retries (beyond the window) might re-execute
- Lesson 1005 — Idempotency Time Windows
- Risk of stale data
- Since your application code updates the database but not the cache immediately, the cache can serve outdated information until expiration (TTL) or manual invalidation.
- Lesson 132 — Cache-Aside: Pros and Cons
- Risks
- Lesson 136 — Write-Behind (Write-Back) Caching PatternLesson 1363 — Statement-Based vs Row-Based Replication
- Role-Based Access Control (RBAC)
- groups permissions into **roles**, then assigns users to those roles.
- Lesson 933 — Role-Based Access Control (RBAC) FundamentalsLesson 1160 — Security and Access Control for Logs
- rollback
- the database uses those logs to undo every change made during the transaction, restoring the data to its pre-transaction state.
- Lesson 304 — Transaction Atomicity in PracticeLesson 1303 — Incident Mitigation vs Fix
- Rollback complexity
- If migration fails halfway, how do you recover?
- Lesson 258 — Resharding and Data Migration
- Rollback Plans
- Lesson 1299 — Runbooks and Playbooks
- Rollback simplicity
- Revert just the problematic service, not the entire system
- Lesson 786 — Independent Deployability of Microservices
- Rolling windows
- Last 30 days, last 7 days, last hour
- Lesson 1274 — SLI Measurement Windows and AggregationLesson 1277 — SLO Time Windows: Rolling vs Calendar
- root cause
- and implement a permanent solution.
- Lesson 1303 — Incident Mitigation vs FixLesson 1304 — Blameless PostmortemsLesson 1352 — Postmortem Structure and Action Items
- Root cause analysis
- The combination of error logs, context, and timing lets you pinpoint why failures occurred.
- Lesson 1127 — What is Logging and Why It MattersLesson 1350 — What is a Postmortem?
- root hash
- represents the entire dataset
- Lesson 369 — Anti-Entropy and Merkle TreesLesson 376 — Anti-Entropy with Merkle Trees
- root span
- (no parent) represents the entry point of your request
- Lesson 1232 — Span Relationships and HierarchyLesson 1239 — Root Span and Entry Points
- Rotate refresh tokens
- Issue new refresh tokens on each use and invalidate old ones
- Lesson 931 — OAuth2 Security Best Practices
- Rotates certificates
- Automatically refreshes certificates before expiration (often every few hours or days) without service downtime
- Lesson 844 — Control Plane: Certificate Management
- Rotating certificates
- automatically before expiration (often every 24 hours)
- Lesson 851 — Mutual TLS (mTLS) Authentication
- Rotation fairness
- Distribute difficult shifts equitably; avoid perpetually assigning nights to junior engineers
- Lesson 1297 — On-Call Fundamentals and Rotation Models
- Round Robin
- would send every 5th customer to desk #5, regardless of whether that desk is still helping someone from 30 minutes ago
- Lesson 87 — Least Connections AlgorithmLesson 96 — Algorithm Selection TradeoffsLesson 98 — What Are Health Checks?
- Round-Robin
- Lesson 226 — Load Distribution Across ReplicasLesson 716 — Consumer Groups and Partition AssignmentLesson 880 — Request Routing and Load Balancing
- Round-Robin Partitioning
- Lesson 703 — Partitioning Strategies and Key Selection
- Route
- requests for that feature to the new service via a proxy/gateway
- Lesson 822 — The Strangler Fig Pattern for Migration
- Route Optimization
- Instead of your request traveling the public internet's unpredictable path to the origin server, the CDN routes it through its private, optimized backbone network.
- Lesson 186 — Dynamic Content Acceleration
- Route the request
- Direct `INCR`, Lua scripts, or sliding window operations to that specific node
- Lesson 1806 — Rate Limiting with Consistent Hashing
- Routers
- automatically forward it to the "nearest" server based on BGP routing metrics
- Lesson 176 — Geographic Routing and Anycast
- Routing
- The proxy determines where traffic should go based on rules from the control plane.
- Lesson 839 — Data Plane: Proxy ResponsibilitiesLesson 1295 — Testing Alerts and Dry Runs
- Routing logic
- Clients or proxy layers route requests to the correct node
- Lesson 360 — What Makes a Key-Value Store Distributed
- routing rules
- Lesson 666 — RabbitMQ Architecture FundamentalsLesson 842 — Control Plane: Configuration Management
- Row key
- Unique identifier for each row
- Lesson 410 — What is a Wide-Column Store?Lesson 413 — Row Keys and Clustering
- Row keys
- identify entities (like user IDs)
- Lesson 444 — Data Model: Sparse, Distributed, Multi-Dimensional Map
- rows
- Lesson 231 — Vertical Partitioning vs Horizontal PartitioningLesson 298 — The Relational Model FoundationLesson 411 — Column Families and Super Columns
- RPO
- (Recovery Point Objective) and **RTO** (Recovery Time Objective): when you tighten these targets, you're essentially promising faster recovery with less data loss.
- Lesson 1413 — The Cost-Availability Tradeoff
- RPO and RTO targets
- , backup procedures, failover mechanisms, and infrastructure redundancy.
- Lesson 1433 — Disaster Recovery vs Business Continuity
- RPO approaches zero
- You cannot afford to lose even minutes of data
- Lesson 1427 — Continuous Data Protection
- RPO requirement
- to replication mode, then verify distance works with latency constraints.
- Lesson 1439 — Data Replication for DR
- RPO Zero
- means that your Recovery Point Objective is literally zero — you cannot afford to lose *any* data, not even one millisecond's worth of writes.
- Lesson 1414 — RPO Zero: Synchronous ReplicationLesson 1415 — Near-Zero RPO with Asynchronous Replication
- RTO
- (Recovery Time Objective): when you tighten RPO and RTO targets, you're essentially promising faster recovery with less data loss.
- Lesson 1413 — The Cost-Availability TradeoffLesson 1417 — Hot Standby vs Cold Standby
- RTO = hours
- Could rely on manual restoration from backups
- Lesson 1412 — Defining Recovery Time Objective (RTO)
- RTO = minutes
- May use active-passive with quick automated failover
- Lesson 1412 — Defining Recovery Time Objective (RTO)
- RTO = seconds
- Requires active-active multi-region setups with automatic failover
- Lesson 1412 — Defining Recovery Time Objective (RTO)
- Rule of thumb
- More connections doesn't mean better performance.
- Lesson 275 — Common Pooling Anti-Patterns
- Runaway storage
- Storage growing faster than historical rate
- Lesson 1574 — Monitoring Expiration and Storage Health
- runbook
- a step-by-step guide attached to each alert.
- Lesson 1287 — Actionability: Every Alert Needs a RunbookLesson 1441 — Runbooks and Automation
- Runbook clarity
- Can someone follow your runbook without confusion?
- Lesson 1295 — Testing Alerts and Dry Runs
- Runbooks
- with new context from this incident
- Lesson 1296 — Post-Incident Alert ReviewLesson 1297 — On-Call Fundamentals and Rotation ModelsLesson 1299 — Runbooks and Playbooks
S
- Sacrifice availability
- (reject requests until partition heals) to maintain consistency (CP choice)
- Lesson 506 — CAP in Normal Operation vs Partition
- Sacrifice consistency
- (allow divergent data) to remain available (AP choice)
- Lesson 506 — CAP in Normal Operation vs Partition
- Safer failover
- You have a known-good replica for promotion
- Lesson 217 — Semi-Synchronous Replication Trade-offs
- safety
- and **liveness**.
- Lesson 604 — Safety vs Liveness PropertiesLesson 609 — Paxos Safety and Liveness GuaranteesLesson 615 — Handling Conflicts and PreemptionLesson 618 — Raft Overview: Understandability as a Design GoalLesson 651 — Message Durability
- Safety guarantees
- so two leaders never exist simultaneously
- Lesson 636 — Consensus for Leader Election
- Safety window
- Protects against common retry scenarios (network blips, client crashes, reasonable user behavior)
- Lesson 1005 — Idempotency Time Windows
- Saga
- splits a distributed transaction into a sequence of **local transactions**, where each local transaction updates data within a single service.
- Lesson 585 — Alternative: Saga Pattern IntroductionLesson 588 — The Saga Pattern: Motivation and DefinitionLesson 589 — Saga Fundamentals: Local Transactions and Compensations
- Same-thread execution
- Request runs on the caller's thread, reducing context switches
- Lesson 1070 — Semaphore-Based Bulkheads: Limiting Concurrent Requests
- Sample aggressively
- Log 1 in every 100 successful requests, but capture all errors—you'll still get actionable insights without drowning in data.
- Lesson 1170 — Performance Impact of Logging
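
One way to express the "1 in 100 successes, every error" rule is a small predicate called before each log write. This is a sketch under assumptions: `should_log`, `sample_rate`, and the injectable `rng` parameter are hypothetical names, not part of any logging library.

```python
import random


def should_log(is_error: bool, sample_rate: float = 0.01, rng=random.random) -> bool:
    """Always keep errors; keep roughly `sample_rate` of successful requests."""
    if is_error:
        return True
    # rng() returns a float in [0.0, 1.0); about 1% fall below 0.01
    return rng() < sample_rate
```

Passing `rng` as a parameter keeps the decision deterministic in tests while defaulting to random sampling in normal use.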
- Sample more aggressively
- for high-cardinality scenarios (as covered in your sampling lessons)
- Lesson 1258 — Cardinality Explosion
- Sample size
- How many requests to allow through Half-Open (small, like 5–10)
- Lesson 1052 — Circuit Breaker Reset Logic
- Sample when necessary
- For debugging scenarios, sample high-cardinality data (e.
- Lesson 1210 — Cardinality Management
- Sampling
- Log only 1% of successful requests, but always log errors.
- Lesson 1133 — Logging Performance ImpactLesson 1135 — Log Retention and Volume ManagementLesson 1157 — Log Sampling and FilteringLesson 1164 — Sampling for High-Volume LogsLesson 1167 — Avoid Log ExplosionLesson 1252 — Sampling Strategies OverviewLesson 1505 — Analytics and Tracking Requirements
- Sampling decisions
- determine whether a trace will be collected or discarded.
- Lesson 1238 — Span Sampling Decisions
- Sanitize before logging
- by running all log data through filters that detect and mask patterns like credit card numbers or tokens.
- Lesson 1131 — Logging Sensitive Data: Security Concerns
- Saturation
- Resource utilization like connection pool usage
- Lesson 856 — Observability: Metrics CollectionLesson 1189 — The USE MethodLesson 1263 — Four Golden Signals: Latency, Traffic, Errors, SaturationLesson 1264 — USE Method: Utilization, Saturation, Errors
- Saturation percentage
- (not raw resource usage) → indicates when to scale
- Lesson 1215 — Avoiding Vanity Metrics
- scalability
- and **performance** requirements.
- Lesson 24 — Peak Traffic MultipliersLesson 70 — Partitioning and Replication TogetherLesson 168 — What is a CDN and Why Use ItLesson 646 — The Producer-Consumer ModelLesson 699 — Event Streaming Platform RequirementsLesson 744 — Stream Processing FrameworksLesson 766 — Apache Airflow FundamentalsLesson 781 — What are Microservices? (+7 more)
- Scalability bottleneck
- The directory can become a performance choke point
- Lesson 242 — Directory-Based Sharding
- scalability limits
- Lesson 322 — Transaction Requirements and Trade-offsLesson 526 — The Cost of Strong Consistency
- Scale
- Handle thousands of ephemeral containers
- Lesson 1169 — Centralized vs Localized LoggingLesson 1251 — Choosing a Tracing SystemLesson 1584 — Image/Video Hosting: Problem Definition and ScaleLesson 1730 — What is a Search Engine?Lesson 1732 — Crawling and Document Collection
- Scale horizontally
- when you hit hardware limits or need redundancy
- Lesson 52 — Hybrid Scaling StrategiesLesson 443 — BigTable Overview and Motivation
- Scale independently
- relational metadata database can be optimized differently than object storage
- Lesson 1590 — Metadata Database Design
- Scale operations effort sublinearly
- One automation handles thousands of instances
- Lesson 1308 — The SRE Philosophy: Treating Operations as Software
- Scale Requirements
- Lesson 119 — Choosing Load Balancer TechnologyLesson 826 — Decision Framework for Microservices AdoptionLesson 901 — Choosing the Right API Gateway Technology
- Scale-Down Trigger
- If queue depth drops below a minimum (e.
- Lesson 1872 — Dynamic Scaling Based on Queue Depth
- Scale-Up Trigger
- If queue depth exceeds a threshold (e.
- Lesson 1872 — Dynamic Scaling Based on Queue Depth
- Scales fan-out
- Can reach thousands of nodes without overwhelming the leader
- Lesson 1374 — Tree Replication Topology
- Scaling
- Kinesis: manual shard splitting/merging. Kafka: add brokers, reassign partitions.
- Lesson 728 — AWS Kinesis Overview
- Scaling challenges
- Vertical scaling (more RAM) hits limits faster than horizontal disk scaling
- Lesson 349 — Redis In-Memory Storage Model
- Scaling characteristics
- reveal how performance changes with growth.
- Lesson 677 — Message Broker Performance Characteristics
- Scaling complications
- Adding/removing nodes disrupts existing session assignments
- Lesson 982 — Sticky Sessions and Rate Limiting
- Scaling costs
- Write-heavy workloads are harder and more expensive to scale than read-heavy ones
- Lesson 296 — Write Amplification Costs
- Scaling Limitations
- You can't scale individual features independently.
- Lesson 785 — When Monoliths Become Problematic
- Scan operations
- Sequential reads are fast because data is ordered
- Lesson 1451 — Range-Based Partitioning
- Scenario
- Proposer A starts with proposal number 10, while Proposer B simultaneously starts with proposal number 15.
- Lesson 615 — Handling Conflicts and Preemption
- Scenario A (Real-World)
- You have months, a team of specialists, detailed requirements from the homeowner, and the ability to research materials, consult experts, and iterate on your design multiple times.
- Lesson 5 — Real-World vs Interview System Design
- Scenario B (Interview)
- You have 45 minutes, minimal information about what the homeowner wants, no internet access, and you must sketch the entire house from foundation to roof while explaining your reasoning out loud.
- Lesson 5 — Real-World vs Interview System Design
- Schedule dry runs
- Monthly or quarterly, simulate an incident and walk through the response
- Lesson 1295 — Testing Alerts and Dry Runs
- Schedule regular restore drills
- Monthly or quarterly full restoration tests
- Lesson 1408 — Backup Verification and TestingLesson 1430 — Backup Verification and Testing
- Scheduled maintenance
- is planned downtime where you intentionally take systems offline for upgrades, patches, or infrastructure changes.
- Lesson 1328 — Scheduled Maintenance and Availability Accounting
- Scheduled Messages
- let you enqueue a message now but delay its availability until a future timestamp.
- Lesson 675 — Azure Service Bus Features
- Scheduled warming
- runs during low-traffic periods (like 3 AM) to refresh or preload data without impacting peak users.
- Lesson 140 — Cache Warming Strategies
- schema
- ensures every book entry has those exact fields
- Lesson 298 — The Relational Model FoundationLesson 1912 — GraphQL Schema and Resolvers
- Schema enforcement
- Only data matching the defined schema gets in
- Lesson 764 — Data Governance and Quality
- Schema evolution
- When requirements change, just start writing documents with new fields.
- Lesson 380 — Document Structure and Schema Flexibility
- Schema flexibility
- Mobile apps evolve rapidly.
- Lesson 404 — Mobile and IoT Backend StorageLesson 1138 — JSON as Log FormatLesson 1721 — Preference Storage Strategy
- Schema ID embedding
- Messages include a small schema ID instead of the full schema, saving bandwidth
- Lesson 725 — Schema Registry and Evolution
- schema-on-read
- you decide how to interpret the data only when you actually read and analyze it.
- Lesson 758 — Data Lake FundamentalsLesson 759 — Schema-on-Write vs Schema-on-ReadLesson 1154 — Alternative: Splunk Architecture
- Schema-on-Read advantages
- Lesson 759 — Schema-on-Write vs Schema-on-Read
- Schema-on-Read disadvantages
- Lesson 759 — Schema-on-Write vs Schema-on-Read
- schema-on-write
- you define the structure *before* loading data.
- Lesson 757 — Data Warehouse FundamentalsLesson 759 — Schema-on-Write vs Schema-on-Read
- Schema-on-Write advantages
- Lesson 759 — Schema-on-Write vs Schema-on-Read
- Schema-on-Write disadvantages
- Lesson 759 — Schema-on-Write vs Schema-on-Read
- Scope
- Local cache is instance-specific; distributed cache is shared across all instances
- Lesson 143 — Multi-Tier Caching PatternLesson 859 — Rate Limiting at Service BoundariesLesson 1634 — Feed Scope: What Content to Show
- Scoped appropriately
- (per user, per account, etc.
- Lesson 1036 — Request Token Generation and Management
- Scopes
- are string identifiers that represent specific permissions.
- Lesson 930 — OAuth2 Scopes and Consent
- Scoping boundaries
- Lesson 1346 — Blast Radius and Safety Controls
- scoring function
- and potentially ML models
- Lesson 1644 — Feed Personalization and Ranking RequirementsLesson 1761 — Scoring and Ranking Suggestions
- SCRAM
- Salted Challenge Response (more secure than PLAIN)
- Lesson 727 — Kafka Security: Authentication and Encryption
- Scrape Interval Matters
- Most monitoring systems (like Prometheus) scrape metrics periodically.
- Lesson 1187 — Rate Calculations from Counters
- Scraper
- Pulls metrics from configured endpoints at fixed intervals (e.
- Lesson 1198 — Prometheus Architecture and Data Model
- Scribe
- Documents everything in real-time—timeline of events, decisions made, actions taken.
- Lesson 1300 — Incident Command System (ICS)
- Scrubbing and redaction
- should happen *before* data reaches your logging pipeline.
- Lesson 1160 — Security and Access Control for Logs
- Search engine
- If personalized ranking fails, fall back to generic relevance scoring
- Lesson 1336 — Graceful DegradationLesson 1730 — What is a Search Engine?
- Search engines
- use knowledge graphs to understand entities and context
- Lesson 458 — Use Cases: Fraud Detection and Knowledge GraphsLesson 1826 — What is a Web Crawler
- Search HFiles on disk
- If the data isn't in memory, HBase must check potentially many immutable HFiles created by previous flushes
- Lesson 437 — HBase Read Path and Bloom Filters
- Searchability
- Fast queries for forensic investigations
- Lesson 944 — Auditing and Compliance for AuthorizationLesson 1169 — Centralized vs Localized Logging
- Second Normal Form (2NF)
- Remove partial dependencies—non-key attributes depend on the entire primary key
- Lesson 302 — Normalization Fundamentals
- Second read
- Hits Replica A (only caught up through transaction #148)
- Lesson 1360 — Monotonic Reads Across Replicas
- Second retry
- Wait 200ms
- Lesson 1564 — Retrieval Error Handling and FallbacksLesson 1695 — Fallback and Retry Logic
- Secure Token Storage
- Tokens stay server-side, never exposed to the browser or user device.
- Lesson 922 — Authorization Code Flow
- Security
- mutual TLS encryption between services
- Lesson 827 — What is a Service Mesh?Lesson 1789 — Client-Side vs Server-Side Rate LimitingLesson 1894 — Sorting Query Parameters
- Security policies
- Which services can talk to which, mTLS settings, authentication rules
- Lesson 842 — Control Plane: Configuration Management
- Security requirements
- demanding a single enforcement point before internal systems
- Lesson 879 — When to Introduce an API Gateway
- Security requirements matter
- High-security operations (financial transactions, admin actions) may skip caching entirely.
- Lesson 951 — Caching Authorization Decisions
- Seed URLs
- Start with a known list (popular sites, sitemaps, user submissions)
- Lesson 1732 — Crawling and Document CollectionLesson 1828 — Seed URLs and Starting Point
- Segmentation
- Each quality version is split into small chunks (~2-10 seconds)
- Lesson 1602 — Adaptive Bitrate Streaming (ABR)
- segments
- smaller, manageable files on disk.
- Lesson 711 — Message Retention and Log SegmentsLesson 1612 — Adaptive Bitrate Streaming (ABR)
- Segments users
- by behavior patterns (morning openers, evening readers, weekday vs weekend responders)
- Lesson 1729 — Analytics-Driven Optimization
- Selection
- A healthy replica is chosen (usually the one with the most up-to-date data)
- Lesson 207 — Replica Promotion and Failover Basics
- Selection logic
- An algorithm to pick one server from the list (round-robin, random, least-connections, etc.
- Lesson 83 — Client-Side Load Balancing
- Selective Application
- Apply heavyweight idempotency only where consequences are severe (payments, orders).
- Lesson 1042 — Idempotency vs Performance Tradeoffs
- Selective Broadcasting
- The WebSocket gateway subscribes to relevant channels and pushes updates only to connected users who should see that content (based on their follow graph).
- Lesson 1672 — WebSocket Architecture for Live Updates
- Selective fan-out
- Only push to highly engaged followers; others get pull-based delivery.
- Lesson 1640 — Celebrity Problem in Push Models
- Selective push
- means making smart choices about *who* gets *what* updates *when*, based on user activity, relationship strength, and content importance.
- Lesson 1677 — Selective Push Strategies
- Selective Service Meshing
- Apply the mesh pattern only to critical services that truly benefit—perhaps those handling payments, authentication, or high-value transactions.
- Lesson 869 — Alternatives to Full Service Mesh
- Self-contained requests
- Each request includes everything needed to process it—authentication tokens, user IDs, necessary parameters
- Lesson 55 — What Makes a Service Stateless
- Self-Documenting
- Field names are explicit.
- Lesson 1138 — JSON as Log FormatLesson 1910 — GraphQL Fundamentals and Query Language
- Self-healing
- No manual intervention needed after recovery
- Lesson 375 — Sloppy Quorum and Hinted HandoffLesson 1475 — Dynamic Range Splitting
- Self-healing automation
- system detects and recovers without human intervention
- Lesson 1441 — Runbooks and Automation
- Self-hosted
- RabbitMQ, NATS, Redis when you need control or hybrid deployments
- Lesson 676 — Choosing Between Message Broker TechnologiesLesson 900 — Open-Source vs Managed Gateway Tradeoffs
- Self-hosted gateways
- require you to manage infrastructure, monitoring, scaling, patching, and high availability.
- Lesson 900 — Open-Source vs Managed Gateway Tradeoffs
- Self-managed complexity
- You handle updates, monitoring, and troubleshooting (unless using managed services)
- Lesson 108 — Hardware vs Software Load Balancers
- Semantic Lock Pattern
- places an application-level flag or status field on data that a saga is currently processing.
- Lesson 595 — Semantic Lock Pattern
- Semaphore Bulkheads
- Limit concurrent calls without separate threads
- Lesson 1075 — Implementing Bulkheads in Practice: Hystrix and Resilience4j
- Semaphore-based bulkheads
- offer a lighter alternative.
- Lesson 1070 — Semaphore-Based Bulkheads: Limiting Concurrent Requests
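
A semaphore-based bulkhead can be sketched with the standard library's `threading.BoundedSemaphore`. The `Bulkhead` class and `BulkheadFull` exception are illustrative names chosen for this example; they are not the Hystrix or Resilience4j API.

```python
import threading


class BulkheadFull(Exception):
    """Raised when the concurrency limit is already reached."""


class Bulkhead:
    """Caps concurrent calls; the work runs on the caller's own thread."""

    def __init__(self, max_concurrent: int):
        self._permits = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, **kwargs):
        # Non-blocking acquire: reject immediately instead of queueing
        if not self._permits.acquire(blocking=False):
            raise BulkheadFull("too many concurrent calls")
        try:
            return fn(*args, **kwargs)
        finally:
            self._permits.release()
```

Rejecting instead of waiting is one design choice among several; a variant could pass a timeout to `acquire` to allow a short wait before failing.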
- Semi-synchronous replication
- sits right in the middle—it requires *at least one* replica to acknowledge the write before confirming success to the client.
- Lesson 205 — Semi-Synchronous Replication
- SendGrid
- , **Amazon SES**, or **Mailgun** that handle the heavy lifting:
- Lesson 1686 — Email Notifications
- Sending notifications
- Each retry sends another email
- Lesson 1006 — Natural Idempotency vs Engineered Idempotency
- Sensitive data
- Never log passwords, credit card numbers, PII, or API keys.
- Lesson 1129 — What to Log vs What Not to Log
- Sent
- The notification left your system and was handed off to the channel provider (APNs, FCM, Twilio, SendGrid, etc.
- Lesson 1724 — Notification Analytics Events
- Sentinel
- (Java): Alibaba's comprehensive resilience framework
- Lesson 1062 — Circuit Breaker Libraries and Frameworks
- Separate Databases
- Lesson 1821 — Tenant Isolation in Redis
- Separate deployment pipelines
- Each service has its own build, test, and release process
- Lesson 791 — Independent Deployability
- Separate Routers, Shared Services
- Lesson 1904 — Maintaining Multiple API Versions
- Separate service pools
- Direct high-volume tenants to dedicated rate limiter nodes
- Lesson 1823 — Hot Tenant Problem
- Separate storage tier
- Move celebrity profiles to a specialized datastore optimized for read-heavy workloads.
- Lesson 257 — Celebrity Problem in Social Graphs
- Sequence Numbers (Position-Based)
- Lesson 212 — Measuring Replication Lag
- sequential consistency
- and **linearizability** are strong consistency models, but they differ in one critical way: **real-time guarantees**.
- Lesson 524 — Sequential Consistency vs LinearizabilityLesson 541 — The Consistency Spectrum
- Sequential processing per host
- Each hostname queue is processed by one worker at a time
- Lesson 1841 — Single-Host Queue Pattern
- Sequential scans
- When someone runs a large report that reads thousands of records once, those records flood the cache and evict frequently-used data
- Lesson 151 — LRU-K and Advanced LRU Variants
- Serializable
- Strongest isolation—transactions execute as if they ran one after another, with no concurrency at all.
- Lesson 312 — Isolation Levels and Concurrent Transactions
- Serve cached data
- instead of fresh queries when read limits are hit.
- Lesson 963 — Graceful Degradation with Rate Limits
- server
- (often through a **load balancer**).
- Lesson 6 — Components of a System Design SolutionLesson 111 — NGINX as a Load BalancerLesson 673 — NATS and Lightweight Messaging
- Server A
- Weight = 3, Current connections = 6 → Score = 6/3 = **2**
- Lesson 88 — Weighted Least Connections
- Server B
- Weight = 1, Current connections = 1 → Score = 1/1 = **1**
- Lesson 88 — Weighted Least Connections
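
The Server A and Server B scores above follow from dividing current connections by weight and routing to the lowest score. A minimal sketch, where the `Server` dataclass and the `score` and `pick` functions are names assumed for illustration:

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    weight: int
    connections: int


def score(server: Server) -> float:
    """Weighted least-connections score: lower means more spare capacity."""
    return server.connections / server.weight


def pick(servers: list) -> Server:
    """Choose the server with the lowest connections-to-weight ratio."""
    return min(servers, key=score)
```

With Server A (weight 3, 6 connections) and Server B (weight 1, 1 connection), the scores are 2 and 1, so the next request goes to Server B.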
- Server counts
- 847 servers → 850 or even 1,000 (you'll add buffer capacity anyway)
- Lesson 32 — Rounding and Approximation Techniques
- Server degradation
- → Switch to Least Response Time to route around slow servers
- Lesson 97 — Dynamic Algorithm Selection
- Server looks up session
- by ID to verify authentication and retrieve user context
- Lesson 909 — Session-Based Authentication Fundamentals
- Server pushes
- Resolver sends update through WebSocket to all subscribed clients
- Lesson 1915 — GraphQL Subscriptions for Real-Time Data
- Server registers
- Maps subscription to event stream (e.
- Lesson 1915 — GraphQL Subscriptions for Real-Time Data
- Server Response
- Lesson 1882 — Content Negotiation and Accept Headers
- Server restarts
- store in a database or distributed cache, not just memory
- Lesson 1004 — Server-Side State for Idempotency
- Server sends session ID
- to the client as an HTTP cookie
- Lesson 909 — Session-Based Authentication Fundamentals
- Server state required
- Every instance needs access to shared session storage
- Lesson 916 — Session vs Token Tradeoffs
- Server stickiness
- Users must return to the same server that has their session
- Lesson 356 — Redis as a Session Store
- Server stores session
- in memory, Redis, or a database with a unique session ID
- Lesson 909 — Session-Based Authentication Fundamentals
- Server validates
- credentials against the database
- Lesson 909 — Session-Based Authentication Fundamentals
- Server verification
- Server hashes the received verifier and compares it to the stored challenge
- Lesson 923 — PKCE: Proof Key for Code Exchange
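
The server-side check can be sketched as follows, assuming the `S256` challenge method (base64url-encoded SHA-256 of the verifier, without padding). The function names `make_challenge` and `verify` are hypothetical and chosen only for this example.

```python
import base64
import hashlib
import hmac


def make_challenge(verifier: str) -> str:
    """Derive the code challenge: base64url(SHA-256(verifier)) with padding stripped."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify(received_verifier: str, stored_challenge: str) -> bool:
    """Hash the received verifier and compare it to the stored challenge."""
    # compare_digest avoids leaking match position through timing
    return hmac.compare_digest(make_challenge(received_verifier), stored_challenge)
```

The client would call something like `make_challenge` before the authorization request, and the server would call `verify` during the token exchange.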
- server-side
- (every keystroke triggers a backend query), or use a **hybrid approach**.
- Lesson 1762 — Client-Side vs Server-Side TypeaheadLesson 1789 — Client-Side vs Server-Side Rate Limiting
- Server-side enforcement
- is critical when:
- Lesson 1123 — Client-Side vs Server-Side Timeout Enforcement
- Server-side rendering
- processes syntax highlighting during paste creation, storing pre-rendered HTML with CSS classes.
- Lesson 1575 — Syntax Highlighting and Language Detection
- Server-side timeout
- The maximum time a server allows itself to process a request before abandoning work
- Lesson 1090 — Client-Side vs Server-Side Timeouts
- Server-side timeouts
- protect backend resources from runaway operations and ensure fair resource allocation among all callers.
- Lesson 1123 — Client-Side vs Server-Side Timeout Enforcement
- Serves slightly stale data
- from a cache or replica
- Lesson 315 — Basically Available: Prioritizing Uptime
- Service → Gateway (gRPC)
- → Gateway translates → **Gateway → Client (HTTP)**
- Lesson 874 — Protocol Translation
- Service A
- makes a normal HTTP/gRPC call to Service B
- Lesson 828 — The Sidecar Proxy PatternLesson 851 — Mutual TLS (mTLS) AuthenticationLesson 1077 — What is a Cascading FailureLesson 1097 — The Timeout Chain Problem
- Service accounts
- give each service a unique identity—think of them as machine users with their own credentials.
- Lesson 953 — Service-to-Service Authentication
- Service B
- Lesson 851 — Mutual TLS (mTLS) AuthenticationLesson 1077 — What is a Cascading FailureLesson 1097 — The Timeout Chain Problem
- Service B's sidecar proxy
- Lesson 828 — The Sidecar Proxy Pattern
- Service discovery
- A way to learn which backend servers are available (often from a registry like we'll cover later, but for now imagine a configuration file or API that lists server addresses)
- Lesson 83 — Client-Side Load BalancingLesson 861 — Istio: Architecture and ComponentsLesson 1197 — Pull vs Push Metrics Collection Models
- Service discovery data
- Which instances of a service are currently available and healthy
- Lesson 842 — Control Plane: Configuration Management
- Service entry points
- When a request arrives (HTTP endpoint, message consumer, RPC handler), create a new span or continue an existing trace using the incoming trace ID and span ID.
- Lesson 1223 — Instrumentation Basics
- Service exit points
- When making outgoing calls (HTTP clients, message producers, database calls), create a child span and inject the trace context into the outbound request headers.
- Lesson 1223 — Instrumentation Basics
- service identity
- (like "payment-service" or "user-api").
- Lesson 844 — Control Plane: Certificate ManagementLesson 854 — Request-Level Authorization
- Service instances
- (separate deployments)
- Lesson 1067 — Bulkhead Pattern: Isolating Resources to Prevent Total Failure
- Service Level Indicator (SLI)
- is a carefully chosen quantifiable metric that measures a specific aspect of your service's quality from the user's perspective.
- Lesson 1272 — What Are Service Level Indicators (SLIs)
- service mesh
- solves this by extracting all that communication logic into a separate infrastructure layer.
- Lesson 827 — What is a Service Mesh?Lesson 1126 — Timeout Configuration in Service Mesh
- Service Mesh Foundation
- Envoy runs as a **sidecar proxy** alongside each microservice instance.
- Lesson 115 — Envoy Proxy Architecture
- Service mesh technologies
- add another layer of infrastructure to handle service-to-service communication, security, and observability.
- Lesson 811 — Infrastructure and Tooling Costs
- Service quality
- Spot tenants experiencing high rejection rates who might need help optimizing their integration
- Lesson 1825 — Monitoring and Analytics Per Tenant
- Service Registry
- The log replicated via Raft, containing all service and health data
- Lesson 635 — Consul: Service Discovery with Raft Consensus
- Service Registry Integration
- The mesh's control plane connects to a service registry (like Consul, etcd, or Kubernetes' built-in registry) that maintains the current list of all healthy service instances
- Lesson 832 — Service Discovery in a Mesh
- Service tiers
- Premium customers on high-performance nodes, free tier on standard infrastructure
- Lesson 1452 — List-Based Partitioning
- Service topology graphs
- showing how services communicate
- Lesson 846 — Control Plane: API and User Interface
- Service-to-Service (Internal)
- Between different backend services within your system
- Lesson 78 — Load Balancer Placement in Architecture
- Services need message replay
- New instances can catch up on historical events
- Lesson 734 — NATS Streaming
- Session
- Read-your-writes within a client session
- Lesson 554 — Consistency Model Examples in Real Systems
- session affinity
- solves this by configuring your load balancer to remember which server initially handled a user's request.
- Lesson 60 — Sticky Sessions and Load Balancer AffinityLesson 89 — IP Hash AlgorithmLesson 543 — Monotonic Reads Consistency
- Session affinity (sticky sessions)
- Route all requests from a user to the same replica, ensuring it has seen all their writes.
- Lesson 1390 — Read-Your-Writes Consistency
- Session consistency
- (strong consistency within a user session, weaker globally)
- Lesson 541 — The Consistency Spectrum
- Session data
- (user login state, shopping carts)
- Lesson 141 — Cache-as-SoR (System of Record) PatternLesson 1530 — Analytics and Click Tracking
- Session identifiers
- Full session tokens (use truncated versions if needed)
- Lesson 1163 — Avoid Logging Sensitive Data
- Session management
- User A's login on Server 1 must work on Server 2
- Lesson 49 — Application Complexity Trade-offsLesson 343 — Time-to-Live and Expiration
- Session replication delays
- With session-based auth, copying session data across continents introduces lag.
- Lesson 952 — Cross-Region Authentication
- Session stickiness
- Pin a user's session to one replica that has their writes.
- Lesson 542 — Read-Your-Writes ConsistencyLesson 1360 — Monotonic Reads Across Replicas
- Session-aware routing
- Tag requests with a session or user token.
- Lesson 1678 — Read-After-Write Consistency
- Sessions
- group related messages together, ensuring they're processed in order by the same consumer.
- Lesson 675 — Azure Service Bus FeaturesLesson 916 — Session vs Token Tradeoffs
- Set and enforce SLOs
- defining acceptable reliability targets and error budgets (concepts you've already learned)
- Lesson 1307 — What is Site Reliability Engineering (SRE)?
- Set baseline metrics
- before making changes so you know if improvements worked
- Lesson 40 — Measure Before Optimizing
- Set expiration
- Use `EXPIRE` to automatically clean up the key after the window passes
- Lesson 1794 — Redis-Based Rate Limiting with INCR
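A minimal Python sketch of the INCR-plus-EXPIRE pattern this entry describes. To stay runnable without a server it uses a small in-memory stand-in (`FakeRedis`, a hypothetical class written for this example) rather than a real Redis client; the key format `rate:<client>:<window>` and the function name `allow_request` are likewise assumptions.

```python
import time


class FakeRedis:
    """Tiny in-memory stand-in for the two Redis commands used below."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._data = {}     # key -> integer counter
        self._expires = {}  # key -> absolute expiry time

    def _purge(self, key):
        # Drop the key if its expiry time has passed.
        if key in self._expires and self._clock() >= self._expires[key]:
            self._data.pop(key, None)
            self._expires.pop(key, None)

    def incr(self, key):
        self._purge(key)
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]

    def expire(self, key, seconds):
        if key in self._data:
            self._expires[key] = self._clock() + seconds


def allow_request(store, client_id, limit, window_seconds, now):
    """Fixed-window check: one counter per client per window."""
    window_id = int(now // window_seconds)
    key = f"rate:{client_id}:{window_id}"
    count = store.incr(key)
    if count == 1:
        # First hit in this window: schedule cleanup of the counter.
        store.expire(key, window_seconds)
    return count <= limit
```

Against a real Redis server the two commands are separate round trips, so production code usually wraps them in a transaction or a Lua script to keep the counter and its expiry consistent.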
- Set Size Limits
- Lesson 1560 — Handling Large Pastes Efficiently
- Set timeout
- = `P(chosen percentile) × multiplier` (e.
- Lesson 1117 — Adaptive Timeouts Based on Historical Latency
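The formula above can be sketched in Python as follows. The nearest-rank percentile method, the default multiplier, and the floor and ceiling clamps are assumptions added for illustration; the source only gives the percentile-times-multiplier rule.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def adaptive_timeout(latencies_ms, pct=99, multiplier=1.5,
                     floor_ms=50, ceiling_ms=5000):
    """Timeout = chosen percentile x multiplier, clamped to sane bounds."""
    raw = percentile(latencies_ms, pct) * multiplier
    return min(max(raw, floor_ms), ceiling_ms)
```

The clamps are a defensive choice: they keep a quiet period from producing a timeout that is too tight, and a latency spike from producing one that is effectively unbounded.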
- Set Toil Budgets
- Lesson 1312 — Measuring and Reducing Toil
- Sets
- store unique, unordered items.
- Lesson 341 — Data Types and Value ComplexityLesson 538 — Conflict-Free Replicated Data Types (CRDTs)
- Setting a value
- `SET user_status = "active"` — repeating this produces the same result
- Lesson 1006 — Natural Idempotency vs Engineered Idempotency
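A short Python illustration of the difference, using a plain dictionary as a hypothetical user record. The function names are invented for this example.

```python
def set_status(user, status):
    """Idempotent: the final state depends only on the argument."""
    user["status"] = status


def add_login(user):
    """Not idempotent: every call changes the state again."""
    user["logins"] += 1


user = {"status": "pending", "logins": 0}
for _ in range(3):  # simulate a request that is retried three times
    set_status(user, "active")
    add_login(user)

print(user)  # {'status': 'active', 'logins': 3}
```

Retrying the assignment leaves the record unchanged after the first call, while the counter drifts with every retry, which is why increments typically need engineered idempotency such as a request key.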
- Severity level
- (critical → on-call engineer immediately, warning → team Slack channel)
- Lesson 1292 — Alert Routing and Escalation
- severity levels
- if engineers ignored critical pages
- Lesson 1296 — Post-Incident Alert ReviewLesson 1298 — Incident Severity Levels and Escalation
- Severity tuning
- Not everything warrants paging someone at 3 AM.
- Lesson 1171 — Log Review and Alert Fatigue
- SHA-256 hash
- a unique cryptographic fingerprint of the file's contents.
- Lesson 1591 — Deduplication Using Content Hashing
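One way to picture content-hash deduplication, as a Python sketch. `ContentStore` is a hypothetical in-memory class; a real system would keep the blobs in object storage and the digests in a metadata index.

```python
import hashlib


class ContentStore:
    """Stores each distinct byte string once, keyed by its SHA-256 digest."""

    def __init__(self):
        self._blobs = {}  # hex digest -> file contents

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # setdefault keeps the existing copy if this digest was seen before.
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def unique_blobs(self) -> int:
        return len(self._blobs)
```

Uploading the same bytes twice returns the same digest and consumes storage once; any change to the contents yields a different digest and a new entry.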
- Shadow traffic
- Route reads to old shard, writes to both, gradually shift reads
- Lesson 258 — Resharding and Data Migration
- Shallow checks
- are fast, lightweight, and won't overload your infrastructure.
- Lesson 102 — Shallow vs Deep Health Checks
- Shard 0
- might have copies on Server A (primary), Server B (replica 1), and Server C (replica 2)
- Lesson 1770 — Index Replication for Availability
- Shard 1
- (users A-F): Primary + 2 replicas
- Lesson 70 — Partitioning and Replication TogetherLesson 1770 — Index Replication for Availability
- shard key
- (like `user_id`), then use it to determine which shard stores each row.
- Lesson 229 — What is Sharding?Lesson 232 — Shard Key SelectionLesson 234 — Data Distribution and HotspotsLesson 240 — Hash-Based ShardingLesson 243 — Geo-Based ShardingLesson 249 — Time-Based ShardingLesson 251 — Shard Key ImmutabilityLesson 263 — Shard Key Immutability Problem (+2 more)
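A minimal Python sketch of hash-based routing on a shard key, assuming `user_id` is the key and the shard count is fixed. The function name `shard_for` is hypothetical. It uses SHA-256 rather than Python's built-in `hash()`, which is randomized per process for strings and so would route inconsistently across servers.

```python
import hashlib


def shard_for(user_id, num_shards):
    """Map a shard key to a shard index in [0, num_shards)."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    # Take the first 8 bytes as an unsigned integer, then reduce modulo.
    return int.from_bytes(digest[:8], "big") % num_shards
```

Because the result depends on `num_shards`, changing the shard count remaps most keys under this scheme.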
- shard map
- is essentially a directory or lookup table that tracks which shard key values live on which physical shard.
- Lesson 236 — Shard Mapping and RoutingLesson 1541 — Sharding and Database Scaling
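As a sketch of such a lookup table, the Python class below maps key ranges to shard names. `ShardMap`, the split points, and the shard names are all hypothetical; real directories are often stored in a coordination service or a dedicated metadata database.

```python
import bisect


class ShardMap:
    """Range directory: split points divide the key space among shards."""

    def __init__(self, split_points, shards):
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        self._split_points = sorted(split_points)
        self._shards = list(shards)

    def shard_for(self, key):
        # Count how many split points are <= key; that is the shard index.
        idx = bisect.bisect_right(self._split_points, key.lower())
        return self._shards[idx]


# Keys before "g" go to shard-1, "g" up to "n" to shard-2, and so on.
shard_map = ShardMap(["g", "n", "t"],
                     ["shard-1", "shard-2", "shard-3", "shard-4"])
```

Unlike pure hash routing, moving a range to another shard only requires updating the directory, at the cost of an extra lookup on every request.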
- Shard routing
- often uses the primary key to determine which shard stores a record
- Lesson 299 — Primary Keys and Entity Integrity
- Sharded
- with each partition on different servers
- Lesson 1447 — Partitioning vs Sharding vs ReplicationLesson 1742 — Search System Architecture Overview
- sharding
- means dividing a large dataset into smaller, manageable pieces and storing each piece on a different machine.
- Lesson 65 — What is Data Partitioning?Lesson 67 — Partitioning vs Sharding TerminologyLesson 229 — What is Sharding?Lesson 354 — Redis Cluster ShardingLesson 1150 — The ELK Stack: ElasticsearchLesson 1446 — What is Data Partitioning?Lesson 1447 — Partitioning vs Sharding vs ReplicationLesson 1541 — Sharding and Database Scaling (+1 more)
- shards
- , with each shard living on a separate server.
- Lesson 229 — What is Sharding?Lesson 396 — Sharding in MongoDBLesson 728 — AWS Kinesis Overview
- Shared Code and Libraries
- Lesson 784 — Development Velocity in Early Stages
- Shared Database
- All components typically read from and write to the same database schema.
- Lesson 779 — What is a Monolithic Architecture?
- Shared databases
- Multiple services directly querying the same database tables violates data ownership boundaries.
- Lesson 824 — Avoiding Distributed Monoliths
- Shared identifiers
- 50 accounts all using the same device fingerprint or billing address
- Lesson 474 — Fraud Detection Through Pattern Matching
- Shared Memory Space
- Lesson 780 — Characteristics of Monolithic Systems
- Shield caches
- act as a middle layer between edge and origin
- Lesson 1611 — Multi-Tier Caching Architecture
- shipping service
- creates a label.
- Lesson 576 — When 2PC is Used in PracticeLesson 658 — Topic Subscriptions and Filtering
- Shopping cart
- Availability matters (don't block purchases), but you want low latency too → PA/EL with conflict resolution
- Lesson 520 — Practical PACELC Analysis for Design DecisionsLesson 553 — Choosing Consistency Levels
- Shopping cart services
- that never want to reject an "add to cart" operation
- Lesson 494 — AP Systems: Prioritizing Availability
- Shopping cart updates
- might use eventual consistency with conflict resolution, accepting temporary divergence
- Lesson 488 — CAP as a Spectrum, Not Binary
- Shopping carts
- stored in server memory
- Lesson 56 — What Makes a Service StatefulLesson 338 — What is a Key-Value Store?Lesson 540 — Use Cases for Eventual Consistency
- Short circuit
- A dangerous direct path allowing excessive current flow
- Lesson 1044 — The Electrical Analogy
- Short intervals
- (checking every 1-2 seconds) detect failures quickly but generate lots of traffic.
- Lesson 100 — Health Check Intervals and Timeouts
- Short timeouts
- (500ms) catch stuck servers quickly but may flag slow-but-healthy servers as down.
- Lesson 100 — Health Check Intervals and Timeouts
- Short TTLs (1-5 minutes)
- Minimize staleness risk but still capture significant performance gains for burst access patterns.
- Lesson 942 — Caching Authorization Decisions
- Short-lived access tokens
- Limit blast radius if tokens leak
- Lesson 931 — OAuth2 Security Best Practices
- Short-lived tokens
- Use 5-15 minute expiration times, making clock skew less impactful relative to token lifetime.
- Lesson 949 — Clock Skew and Token Validation
- Short-Lived Tokens with Refresh
- Issue access tokens with very short lifespans (5-15 minutes).
- Lesson 948 — Token Revocation at Scale
- Short-term burst limits
- Apply token bucket or sliding window for seconds/minutes
- Lesson 994 — Quota Management and Burst Allowances
- Shorter intervals
- (50–100ms): Better accuracy, higher Redis load
- Lesson 1802 — Synchronization Strategies for Local Caches
- Shuffle Phase
- Lesson 1746 — Index Construction at Scale
- Side effects happen once
- (database writes, external API calls, charges)
- Lesson 1008 — What Makes an API Idempotent
- Sidecar Interception
- When Service A makes a request to "Service B," its sidecar proxy intercepts it
- Lesson 832 — Service Discovery in a Mesh
- sidecar proxies
- to manage network communication, there are two ways those proxies can intercept traffic between services:
- Lesson 831 — Transparent vs Explicit ProxyingLesson 833 — Polyglot Microservices SupportLesson 850 — Service Discovery Integration
- sidecar proxy
- alongside each microservice instance.
- Lesson 115 — Envoy Proxy ArchitectureLesson 828 — The Sidecar Proxy PatternLesson 830 — Service Mesh vs Library-Based SolutionsLesson 849 — Load Balancing Strategies in Service MeshLesson 1101 — Timeout Propagation in Service Meshes
- Signal cancellation
- to the underlying operation (close connections, interrupt threads, send cancel RPCs)
- Lesson 1094 — Timeout Cancellation and Cleanup
- Signal preservation
- You still get representative traffic patterns and catch all critical issues
- Lesson 1164 — Sampling for High-Volume Logs
- Signed URLs
- solve this by embedding cryptographic proof that the request is authorized and valid only for a limited time.
- Lesson 1615 — Signed URLs and Token-Based Access
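A hedged Python sketch of the idea, using an HMAC over the path and expiry time. The query parameter names `expires` and `sig`, the message layout, and the demo secret are assumptions for illustration; CDN providers each define their own signing format.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical key; load from a secret store in practice


def _signature(path, expires_at, secret):
    message = f"{path}:{expires_at}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(path, expires_at, secret=SECRET):
    """Return the path with an expiry timestamp and signature attached."""
    query = urlencode({"expires": expires_at,
                       "sig": _signature(path, expires_at, secret)})
    return f"{path}?{query}"


def verify_url(url, now, secret=SECRET):
    """Accept only an untampered URL whose expiry time is still ahead."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        supplied = params["sig"][0]
    except (KeyError, ValueError):
        return False
    expected = _signature(parsed.path, expires_at, secret)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(supplied, expected) and now < expires_at
```

Because the expiry is part of the signed message, a client cannot extend the link's lifetime by editing the `expires` parameter.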
- Silent failures
- Users never know their data was discarded
- Lesson 1381 — Limitations of Last-Write-Wins
- Silent timeout bugs
- Does your cleanup logic actually fire when timeouts trigger?
- Lesson 1125 — Timeout Testing and Chaos Engineering
- SIMD operations
- Use vectorized CPU instructions to compare multiple characters simultaneously during trie traversal.
- Lesson 1776 — Typeahead Index Optimization
- Simhash
- from lesson 1855: for near-duplicates (pages differing by ads/timestamps), use locality-sensitive hashing to cluster similar content and store only canonical versions.
- Lesson 1870 — Content Storage and Deduplication
- Similar to URL Shortener
- You've just studied URL shorteners (lessons 1494-1541).
- Lesson 1542 — Pastebin System Overview
- Simple
- Less complex than Layer 7, easier to configure
- Lesson 109 — Layer 4 (Transport) Load BalancingLesson 148 — First In First Out (FIFO)
- Simple Architectures
- When your services communicate in straightforward patterns without complex routing, retries, or circuit breaking needs, standard HTTP libraries and basic load balancers suffice.
- Lesson 835 — When You Don't Need a Service MeshLesson 1260 — Cost-Benefit Analysis
- Simple consistency
- One source of truth for counters
- Lesson 1791 — Single Data Center vs Distributed Setup
- Simple domains
- with tightly coupled business logic gain nothing from service boundaries.
- Lesson 814 — When Complexity Outweighs Benefits
- Simple hash-based
- is easiest to implement and understand
- Lesson 253 — Evaluating Sharding Strategy Tradeoffs
- Simple Linear Merge
- Lesson 1749 — Query Processing and Term Intersection
- Simple queries
- Lookups by short code are fast, cacheable, and index-friendly
- Lesson 1522 — Read-Heavy Workload and Database Scaling
- Simple reads
- "Get user X's preferences" — one key, one document, millisecond response.
- Lesson 1721 — Preference Storage Strategy
- Simple reasoning
- The linear structure makes it easier to reason about ordering and consistency
- Lesson 1362 — Chain Replication
- Simple request-response
- The operation is fast and unlikely to fail
- Lesson 654 — When to Use Async vs Sync
- Simple to implement
- (just timestamp → window ID mapping)
- Lesson 967 — Fixed Window CounterLesson 1525 — Cache Eviction Policy for URL Shortener
- Simple to reason about
- One source of truth for writes prevents conflicts
- Lesson 71 — Single-Leader Replication Model
- Simpler backend services
- that can focus on business logic, not identity verification
- Lesson 883 — Authentication at the Gateway
- Simpler broker design
- The platform doesn't need complex flow control
- Lesson 697 — Push vs Pull Consumption Models
- Simpler Clients
- Mobile apps and web frontends don't need complex orchestration logic
- Lesson 887 — API Composition and Aggregation
- Simpler Debugging
- Lesson 784 — Development Velocity in Early Stages
- Simpler deployment
- One build, one deploy process—leveraging the deployment simplicity advantage
- Lesson 820 — When a Monolith is the Right Choice
- Simpler implementation
- Standard in-memory algorithms work perfectly
- Lesson 982 — Sticky Sessions and Rate LimitingLesson 1830 — Breadth-First vs Depth-First Crawling
- Simpler operations
- One service to deploy, monitor, and scale
- Lesson 904 — BFF vs Single Gateway TradeoffsLesson 1436 — Active-Passive vs Active-Active DR
- Simpler origin infrastructure
- Your origin can potentially communicate with the CDN over plain HTTP or use simpler TLS configurations, reducing complexity.
- Lesson 187 — SSL/TLS Termination at the Edge
- Simpler recovery than incrementals
- You need just two backups to restore—the last full backup plus the last differential
- Lesson 1404 — Differential Backups
- Simpler Refactoring
- When each service has a narrow responsibility, you can rewrite, restructure, or upgrade it without touching other services—as long as the API contract remains stable.
- Lesson 797 — Improved Code Maintainability
- Simpler service code
- Services just expose a `/metrics` endpoint
- Lesson 1197 — Pull vs Push Metrics Collection Models
- Simpler services
- Participants don't need to know about the overall flow
- Lesson 591 — Orchestration-Based Sagas
- Simpler tuning
- One number (max permits) instead of pool sizes, queue depths, and rejection policies
- Lesson 1070 — Semaphore-Based Bulkheads: Limiting Concurrent Requests
- Simplicity
- One fewer system to manage (no separate database)
- Lesson 141 — Cache-as-SoR (System of Record) PatternLesson 736 — What is Batch Processing?Lesson 755 — When to Choose Lambda vs KappaLesson 1365 — Single-Leader Replication TopologyLesson 1508 — Hash-Based Generation Approach
- Simplicity matters
- Small teams prefer less operational overhead than Kafka
- Lesson 734 — NATS Streaming
- Simplified Backend
- Backend servers can be simpler, lighter applications without SSL libraries and certificate handling code.
- Lesson 118 — SSL/TLS Termination at Load Balancers
- Simplified CI/CD Pipeline
- Your continuous integration and deployment pipeline has one clear job: build the monolith, run tests, and deploy it.
- Lesson 783 — Deployment Simplicity: Monolith Advantage
- Simplified operations
- Automatic upgrades, built-in monitoring, and IAM-based security reduce operational burden compared to self-hosted meshes.
- Lesson 864 — AWS App Mesh and Cloud-Native Meshes
- Simplified retention policies
- Delete 2023's data by dropping its partition
- Lesson 1473 — Range Partitioning Benefits
- Simplified service code
- Backend services don't need SSL libraries or certificate configuration.
- Lesson 891 — SSL/TLS Termination
- Simulate realistic failure scenarios
- (region outage, database corruption, full datacenter loss)
- Lesson 1419 — Measuring and Testing RPO/RTO Compliance
- Simulate realistic traffic patterns
- , not just brute-force requests.
- Lesson 997 — Testing and Monitoring Rate Limiters
- Simulating network partitions
- creates the split-brain scenarios you learned about.
- Lesson 1347 — Common Chaos Experiments
- Simultaneously
- , a background process fetches fresh data from the source
- Lesson 162 — Stale-While-Revalidate Pattern
- Single Codebase
- All functionality exists in one repository.
- Lesson 779 — What is a Monolithic Architecture?
- Single Deployment
- When you make any change—even a minor bug fix in one feature—you must rebuild and redeploy the entire application.
- Lesson 779 — What is a Monolithic Architecture?
- Single Deployment Pipeline
- Lesson 780 — Characteristics of Monolithic Systems
- Single endpoint
- instead of multiple versioned REST paths
- Lesson 1910 — GraphQL Fundamentals and Query Language
- Single Entry Point Pattern
- solves this by placing an API gateway as the sole entry point for all client requests.
- Lesson 871 — The Single Entry Point Pattern
- single point of failure
- .
- Lesson 68 — What is Data Replication?Lesson 76 — What Is a Load Balancer?Lesson 77 — Why Load Balancers Are NecessaryLesson 242 — Directory-Based ShardingLesson 569 — The Coordinator Role in 2PCLesson 979 — Centralized vs Decentralized ApproachesLesson 1365 — Single-Leader Replication TopologyLesson 1479 — Directory Service as Single Point of Failure (+2 more)
- Single Responsibility Principle
- you've learned and respects **Bounded Contexts** from Domain-Driven Design.
- Lesson 817 — Identifying Service Boundaries by Data Ownership
- Single Shared Cache
- Use a distributed cache like Redis shared by all servers.
- Lesson 167 — Cache Coherence in Distributed Systems
- single source of truth
- for the transaction outcome.
- Lesson 569 — The Coordinator Role in 2PCLesson 843 — Control Plane: Service Discovery IntegrationLesson 1301 — War Rooms and Communication ChannelsLesson 1804 — Multi-Region Rate Limiting Challenges
- Single-digit milliseconds
- → Distributed cache like Redis
- Lesson 130 — Choosing the Right Caching Layer
- Single-field indexes
- accelerate queries on one field: indexing `email` speeds up lookups by email address.
- Lesson 385 — Indexing in Document Stores
- Single-Host Queue Pattern
- maintains a separate queue for each hostname in your URL frontier.
- Lesson 1841 — Single-Host Queue Pattern
- Single-leader architecture
- (in primary-replica setups) where writes go to one source of truth
- Lesson 308 — Strong Consistency by Default
- single-leader replication
- , only one node accepts writes while others replicate data from it.
- Lesson 72 — Multi-Leader Replication ModelLesson 1365 — Single-Leader Replication TopologyLesson 1367 — Multi-Leader Replication Fundamentals
- Single-system image
- The distributed database behaves as if it's one atomic system
- Lesson 484 — Consistency in CAP Context
- Singular for instantaneous states
- Gauges showing current values use singular: `memory_usage_bytes`, `active_connection_count`.
- Lesson 1182 — Metric Naming Conventions
- Sink Connectors
- Read data *from* Kafka topics and push it *into* external systems (databases, data warehouses, search indexes)
- Lesson 721 — Kafka Connect Framework
- Site completion
- Better for focused crawls that want to exhaust one domain completely before moving on
- Lesson 1830 — Breadth-First vs Depth-First Crawling
- Sitemap endpoints
- Many sites publish XML sitemaps (`example.
- Lesson 1828 — Seed URLs and Starting Point
- Size Limits
- Enforce maximum file sizes to prevent storage abuse and resource exhaustion.
- Lesson 1592 — Upload Validation and Virus ScanningLesson 1599 — Upload Validation and Virus Scanning
- Size-aware eviction
- if URL metadata varies significantly
- Lesson 1525 — Cache Eviction Policy for URL Shortener
- Size-based retention
- Keep the most recent messages up to a certain total size (e.
- Lesson 711 — Message Retention and Log Segments
- Size-Tiered Compaction (STCS)
- Lesson 428 — Compaction Strategies
- skewed access patterns
- a small set of items accessed far more often than others—LFU (Least Frequently Used) shines.
- Lesson 153 — Choosing an Eviction PolicyLesson 256 — Hotspots and Uneven Data Distribution
- Skewed data distribution
- happens when certain shard key values are far more common than others.
- Lesson 234 — Data Distribution and HotspotsLesson 256 — Hotspots and Uneven Data Distribution
- Skip pointers
- Jump over irrelevant sections of posting lists
- Lesson 1741 — Search Latency and Response Time
- Skyrocketing costs
- You throw expensive hardware at problems that better design would solve cheaply
- Lesson 2 — Why System Design Matters
- SLAs (Service Level Agreements)
- are *external* contracts with customers that include financial or legal consequences when breached.
- Lesson 1283 — SLOs vs SLAs: The Critical Difference
- sliding window
- looks back from the current moment.
- Lesson 961 — Time Windows for Rate LimitsLesson 990 — Tiered Rate Limits for Different User ClassesLesson 1053 — Sliding Window vs Fixed WindowLesson 1057 — Failure Detection and CountingLesson 1697 — API Layer and Rate Limiting
- Sliding window counter
- hybrids use two counters, doubling fixed window memory but staying manageable.
- Lesson 970 — Fixed vs Sliding Window TradeoffsLesson 975 — Algorithm Selection CriteriaLesson 1813 — Memory Footprint per User and Limits
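The two-counter hybrid can be sketched in Python as below. The class name and the exact weighting formula are assumptions: the previous window's count is scaled by the fraction of it that still overlaps the sliding window, which is one common formulation.

```python
class SlidingWindowCounter:
    """Approximates a sliding window using two fixed-window counters."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.current_id = None
        self.current_count = 0
        self.previous_count = 0

    def allow(self, now):
        window_id = int(now // self.window)
        if self.current_id is None:
            self.current_id = window_id
        elif window_id != self.current_id:
            # Roll over; the old count only matters if it is the adjacent window.
            adjacent = window_id == self.current_id + 1
            self.previous_count = self.current_count if adjacent else 0
            self.current_count = 0
            self.current_id = window_id

        elapsed = (now % self.window) / self.window
        estimated = self.previous_count * (1 - elapsed) + self.current_count
        if estimated + 1 > self.limit:
            return False
        self.current_count += 1
        return True
```

Memory stays at two integers per client regardless of traffic, which is the trade-off this entry describes relative to storing every timestamp.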
- Sliding window counters
- add weighted calculations but remain simpler than full logs.
- Lesson 970 — Fixed vs Sliding Window Tradeoffs
- Sliding window log
- approaches store every request timestamp, consuming significantly more memory (N timestamps for N requests).
- Lesson 970 — Fixed vs Sliding Window TradeoffsLesson 975 — Algorithm Selection CriteriaLesson 1796 — Sliding Window Log in RedisLesson 1809 — Memory Optimization with Sliding Window LogLesson 1813 — Memory Footprint per User and Limits
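A minimal in-memory Python sketch of the log approach, assuming timestamps arrive in non-decreasing order. The class name is hypothetical; the point is that the structure holds one timestamp per accepted request inside the window.

```python
from collections import deque


class SlidingWindowLog:
    """Keeps the timestamp of every accepted request in the window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.timestamps = deque()

    def allow(self, now):
        cutoff = now - self.window
        # Prune entries that have fallen out of the window.
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.limit:
            return False
        self.timestamps.append(now)
        return True
```

The check is exact, but memory grows with the limit: a client allowed 1,000 requests per window can hold up to 1,000 timestamps.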
- Sliding window logs
- require maintaining sorted timestamps, pruning old entries, and careful data structure management.
- Lesson 970 — Fixed vs Sliding Window TradeoffsLesson 1808 — Redis Data Structures for Rate Limiting
- Sliding windows
- (log or counter) eliminate this edge case entirely.
- Lesson 970 — Fixed vs Sliding Window TradeoffsLesson 1053 — Sliding Window vs Fixed Window
- Slight replication lag
- between regions (typically seconds)
- Lesson 202 — Why Replicate: Geographic Distribution
- SLIs
- measure reliability from the user's perspective
- Lesson 1313 — Monitoring and Observability for SRE
- SLO setting
- MTBF informs realistic availability targets (remember: availability and reliability are different but related)
- Lesson 1323 — Mean Time Between Failures (MTBF)
- sloppy quorum
- relaxes this requirement.
- Lesson 366 — Sloppy Quorums and Hinted HandoffLesson 375 — Sloppy Quorum and Hinted HandoffLesson 561 — Sloppy Quorums and Hinted HandoffLesson 1372 — Sloppy Quorums and Hinted Handoff
- sloppy quorums
- and **hinted handoff**.
- Lesson 370 — Distributed Key-Value Store Architectures in PracticeLesson 565 — Quorum Trade-offs and Failure Scenarios
- SLOs (Service Level Objectives)
- are your *internal* targets—the reliability goals your team commits to achieving.
- Lesson 1283 — SLOs vs SLAs: The Critical Difference
- Slow for popular users
- If you follow 5,000 people, querying their posts takes time
- Lesson 1637 — Pull (Read-Time) Feed Model
- Slow queries
- as the system scans millions of series
- Lesson 1211 — Avoiding High-Cardinality LabelsLesson 1258 — Cardinality Explosion
- Slow Query Logs
- are built into most databases (MySQL, PostgreSQL, etc.).
- Lesson 287 — Monitoring Slow QueriesLesson 1777 — Query Performance Monitoring
- Slow reads
- If you follow 1,000 people, you're querying 1,000 users' posts
- Lesson 1647 — Fanout-on-Read (Pull Model)
- Slow response times
- The user waits while thousands (or millions) of feeds update
- Lesson 1651 — Asynchronous Fanout Processing
- Slower writes
- Every committed log entry must reach more machines
- Lesson 639 — Consensus Cluster Sizing TradeoffsLesson 759 — Schema-on-Write vs Schema-on-Read
- Sluggish performance
- Pages take 30 seconds to load instead of milliseconds
- Lesson 2 — Why System Design Matters
- Small Deployments
- If you have 3-5 microservices, manually configuring communication is straightforward.
- Lesson 835 — When You Don't Need a Service Mesh
- Small images
- (~few MB): Synchronous processing works well.
- Lesson 1598 — Synchronous vs Asynchronous Processing
- Small or Medium-Sized Datasets
- Lesson 239 — When Not to Shard
- Small scale (100 req/sec)
- 180 ms × 100 = 18 seconds of wasted query time per second
- Lesson 276 — Why Query Optimization Matters at Scale
- Small team
- If you have 3 engineers, maintaining 3+ BFFs drains resources better spent elsewhere.
- Lesson 908 — When to Use BFF Pattern
- Smaller Codebases
- Lesson 796 — Faster Development Cycles
- Smart Routing
- Circuit breaking and locality-aware load balancing reduce unnecessary hops
- Lesson 841 — Data Plane: Performance and Latency Overhead
- Smart TV BFF
- Handles remote control navigation patterns and low-resolution constraints
- Lesson 902 — Backend-for-Frontend (BFF) Pattern Overview
- Smooth migration path
- When a module becomes a bottleneck or needs independent scaling, you can extract it as a microservice because the boundaries already exist.
- Lesson 825 — Starting with a Modular Monolith
- Smoother scaling
- When you add a new server, its virtual nodes "steal" small chunks from many existing servers, not just neighbors
- Lesson 363 — Virtual Nodes and Load Distribution
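The effect of virtual nodes can be sketched with a small consistent-hash ring in Python. `HashRing`, the `server#i` naming of virtual nodes, and the default of 100 virtual nodes per server are assumptions for illustration.

```python
import bisect
import hashlib


def _hash(value):
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent-hash ring where each server owns many virtual nodes."""

    def __init__(self, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> server name

    def add_server(self, server):
        for i in range(self.vnodes):
            point = _hash(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = server

    def server_for(self, key):
        if not self._points:
            raise LookupError("ring has no servers")
        # First ring position clockwise from the key, wrapping at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

When a server is added, the only keys that change owner are those now falling just before one of its virtual nodes, and they all move to the new server. Spreading those virtual nodes around the ring is what draws the load from many existing servers instead of one neighbor.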
- SMS
- Plain text, 160 chars max, tracking link shortened
- Lesson 1692 — Channel-Specific FormattingLesson 1693 — Delivery Receipt TrackingLesson 1694 — Channel Costs and Economics
- SMTP
- (Simple Mail Transfer Protocol) is the foundational protocol for sending email.
- Lesson 1686 — Email Notifications
- snapshot
- a compact representation of the current application state at a specific point.
- Lesson 632 — Log Compaction: SnapshottingLesson 1401 — Backup vs Replication vs SnapshotsLesson 1426 — Snapshot-Based Backups
- Snapshots
- are **instantaneous, space-efficient copies** of data at a specific moment.
- Lesson 1401 — Backup vs Replication vs Snapshots
- Snowflake Schema
- normalizes dimensions further—breaking a customer dimension into separate customer, city, and country tables.
- Lesson 760 — Data Warehouse Architecture
- Social media feed
- → Eventual consistency (stale likes won't break anything)
- Lesson 553 — Choosing Consistency LevelsLesson 1336 — Graceful Degradation
- Social media feeds
- can show slightly stale data
- Lesson 317 — ACID vs BASE TradeoffsLesson 318 — When to Choose ACID or BASELesson 540 — Use Cases for Eventual Consistency
- Social media metrics
- If a "like" counter is temporarily off, it's not catastrophic
- Lesson 137 — Write-Behind: Risks and Use Cases
- Social recommendations
- "Friends who liked this post" combines friendship and interaction edges
- Lesson 457 — Use Cases: Social Networks and RecommendationsLesson 464 — Traversal Queries: Friends of Friends
- Soft Invalidation
- Lesson 1617 — Cache Invalidation and Purging
- Soft purge
- Marks content as stale but keeps it available as a backup while fetching fresh content
- Lesson 185 — Purging and Cache Invalidation Strategies
- Soft state
- Data state may change over time, even without new input
- Lesson 314 — BASE Properties OverviewLesson 316 — Soft State and Eventual Consistency
- Soft vs Hard Purge
- Lesson 185 — Purging and Cache Invalidation Strategies
- software solutions
- .
- Lesson 79 — Hardware vs Software Load BalancersLesson 108 — Hardware vs Software Load Balancers
- Solution
- Insert intermediate cache tiers between edge and origin.
- Lesson 182 — Cache Hierarchies and Tiered Caching
- Solution 1 - JOIN
- Fetch everything in one query using table joins:
- Lesson 282 — Avoiding N+1 Query Problems
- Solution approaches
- Use headless browsers (driven by tools like Puppeteer or Selenium) that execute JavaScript, or detect and call the underlying APIs directly.
- Lesson 1834 — HTML Parsing Challenges
- Somewhere in between
- (user profiles, notifications): **Semi-synchronous** offers a middle ground.
- Lesson 1364 — Choosing a Replication Mode
- Sort order
- (rows are stored sorted by key on disk)
- Lesson 413 — Row Keys and ClusteringLesson 422 — Clustering Columns and Row OrderingLesson 1451 — Range-Based Partitioning
- Sorted arrays
- Insertion is O(n) — too slow when millions of URLs arrive.
- Lesson 1847 — Heap-Based Priority Queue Implementation
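For contrast with the sorted-array cost noted above, here is a minimal heap-based frontier in Python using the standard `heapq` module, where a push is O(log n). The class name and the convention that a lower number means higher priority are assumptions.

```python
import heapq
import itertools


class UrlPriorityQueue:
    """Binary-heap priority queue; lower number means crawl sooner."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def push(self, url, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), url))

    def pop(self):
        _priority, _order, url = heapq.heappop(self._heap)
        return url

    def __len__(self):
        return len(self._heap)
```

The counter ensures that URLs with equal priority come out in insertion order and that the heap never has to compare two URL strings.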
- Sorted by document ID
- for faster intersection operations during multi-term queries
- Lesson 1745 — Posting Lists and Document IDs
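Keeping posting lists sorted is what makes a linear two-pointer merge possible. A minimal Python sketch, assuming both lists are sorted ascending and contain no duplicates; the function name is hypothetical.

```python
def intersect_postings(a, b):
    """Return document IDs present in both sorted posting lists."""
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] cannot appear later in b, so skip it
        else:
            j += 1
    return result


print(intersect_postings([1, 3, 5, 8, 13], [2, 3, 8, 9, 13, 21]))  # [3, 8, 13]
```

Each list is scanned once, so the cost is proportional to the combined length of the two lists rather than their product.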
- sorted sets
- solve this elegantly.
- Lesson 359 — Redis for Leaderboards and CountingLesson 1808 — Redis Data Structures for Rate Limiting
- Source Connectors
- Pull data *from* external systems (databases, files, APIs) and write it *into* Kafka topics
- Lesson 721 — Kafka Connect Framework
- source of truth
- it survives crashes and enables deterministic recovery.
- Lesson 574 — Recovery Protocols and LogsLesson 754 — Event Log Replay in KappaLesson 1722 — Real-Time Preference Updates
- Space-efficient
- Only changed blocks consume additional storage
- Lesson 1426 — Snapshot-Based BackupsLesson 1758 — Trie Data Structure for Prefix Matching
- span
- is one runner's segment—when they received the baton, when they passed it, and how long they ran.
- Lesson 855 — Observability: Distributed TracingLesson 1221 — Traces, Spans, and Parent-Child Relationships
- Span attributes
- (also called **tags** or **labels**) let you attach key-value metadata to spans, transforming them from skeletal timing information into rich, searchable debugging narratives.
- Lesson 1225 — Span Attributes and TagsLesson 1229 — Service Dependency Graphs
- span events
- are timestamped snapshots of interesting moments *within* that span.
- Lesson 1226 — Span Events and LogsLesson 1234 — Span Events and Logs
- span ID
- uniquely identifies each step or service call within that journey.
- Lesson 1146 — Correlation IDs Across ServicesLesson 1222 — Trace ID and Span IDLesson 1230 — Trace Context Fundamentals
- span IDs
- (identifies individual operations).
- Lesson 1219 — What is Distributed Tracing?Lesson 1229 — Service Dependency GraphsLesson 1249 — Integrating Traces with Logs and Metrics
- Span Sampling Decisions
- (lesson 1238)—the mechanism that determines whether to record a trace.
- Lesson 1252 — Sampling Strategies Overview
- spans
- (timed segments) for each service-to-service call
- Lesson 855 — Observability: Distributed TracingLesson 1219 — What is Distributed Tracing?Lesson 1229 — Service Dependency Graphs
- Spark Streaming
- extends the popular Apache Spark framework with micro-batching capabilities.
- Lesson 744 — Stream Processing FrameworksLesson 769 — Spark Streaming and Structured StreamingLesson 771 — Flink vs Spark for Streaming
- Sparse posting lists
- (rare terms): Delta + variable-byte encoding
- Lesson 1752 — Index Compression Techniques
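A Python sketch of delta plus variable-byte encoding for a sorted posting list. The byte layout (7 data bits per byte, low-order group first, high bit set when more bytes follow) is one common convention and an assumption here; the lesson may use a different layout. Document IDs are assumed to be non-negative and strictly increasing.

```python
def encode_varint(n):
    """Encode a non-negative integer using 7 data bits per byte."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_postings(doc_ids):
    """Store the gap to the previous ID instead of each full ID."""
    out = bytearray()
    previous = 0
    for doc_id in doc_ids:
        out += encode_varint(doc_id - previous)
        previous = doc_id
    return bytes(out)


def decode_postings(data):
    doc_ids, current, shift, previous = [], 0, 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            previous += current  # undo the delta
            doc_ids.append(previous)
            current, shift = 0, 0
    return doc_ids
```

For the IDs `[1000, 1003, 1010, 500000]` this produces 7 bytes, compared with 16 bytes for four fixed 32-bit integers, because the small gaps fit in a single byte each.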
- Special character handling
- decide how to treat punctuation, numbers
- Lesson 1733 — Document Processing Pipeline
- Specific
- "Add timeout to database query X" not "improve monitoring"
- Lesson 1352 — Postmortem Structure and Action Items
- spectrum
- , adjusting their behavior based on the operation type, business priority, and acceptable risk.
- Lesson 488 — CAP as a Spectrum, Not BinaryLesson 507 — Consistency is a Spectrum in Practice
- Speed
- No disk I/O on reads or writes
- Lesson 141 — Cache-as-SoR (System of Record) PatternLesson 143 — Multi-Tier Caching PatternLesson 168 — What is a CDN and Why Use ItLesson 349 — Redis In-Memory Storage ModelLesson 356 — Redis as a Session StoreLesson 448 — Write Path: MemTable and Commit LogsLesson 529 — When to Choose Strong ConsistencyLesson 1730 — What is a Search Engine? (+1 more)
- Speed counts
- NATS's low latency fits real-time service-to-service communication
- Lesson 734 — NATS Streaming
- speed layer
- handles only the most recent data—streaming it incrementally to provide near-instant views.
- Lesson 749 — Lambda Architecture: Speed LayerLesson 750 — Lambda Architecture: Serving Layer
- Speed mismatch
- Producers can work faster or slower than consumers without blocking
- Lesson 646 — The Producer-Consumer Model
- Speed vs. Consistency
- Your writes are fast because you're not waiting on replicas, but there's a window where replicas are "behind" the primary.
- Lesson 204 — Asynchronous Replication Explained
- Spike detected
- → Switch to Least Connections to prevent overload
- Lesson 97 — Dynamic Algorithm Selection
- SPL (Search Processing Language)
- to query logs, create dashboards, and build alerts.
- Lesson 1154 — Alternative: Splunk Architecture
- Split logically
- Store celebrity data separately from regular users entirely
- Lesson 1483 — Celebrity User Problem
- split-brain
- two nodes both thinking they're the leader, causing data corruption.
- Lesson 603 — Consensus Use CasesLesson 636 — Consensus for Leader Election
- Splunk
- is an enterprise-grade, proprietary platform built specifically for machine-generated data at massive scale.
- Lesson 1154 — Alternative: Splunk Architecture
- SQL Database
- holds paste metadata: `paste_id`, `created_at`, `expiration_time`, `user_id`, `language`, `views`, and crucially, a **reference pointer** to where the actual content lives
- Lesson 1556 — Hybrid Storage: Metadata + Content References
- SQL databases
- come with decades of battle-tested tooling, extensive documentation, and a vast pool of experienced database administrators (DBAs).
- Lesson 326 — Operational Complexity ConsiderationsLesson 327 — Polyglot Persistence Pattern
- SQL excels when
- Lesson 323 — Query Pattern Complexity Analysis
- SQL for management
- (admin dashboards, user settings pages with complex filters) and **cache preferences in Redis/Memcached** for the hot path.
- Lesson 1721 — Preference Storage Strategy
- SSE advantages
- Lesson 1673 — Server-Sent Events (SSE) Alternative
- SSE limitations
- Lesson 1673 — Server-Sent Events (SSE) Alternative
- SSL/TLS pass-through
- (load balancer forwards encrypted traffic without decrypting) or **re-encryption** (terminate at load balancer, then re-encrypt to backends).
- Lesson 118 — SSL/TLS Termination at Load Balancers
- SSTable
- (Sorted String Table) is an immutable, on-disk file that stores sorted key-value pairs.
- Lesson 427 — SSTables and Immutable StorageLesson 446 — SSTable and GFS Dependencies
- SSTables
- (Sorted String Tables)—immutable, sorted files containing key-value pairs.
- Lesson 439 — Google BigTable ArchitectureLesson 446 — SSTable and GFS Dependencies
- Stability
- New inserts don't shift your pages; the keyset anchors your position in the dataset
- Lesson 1890 — Keyset Pagination
- Stable, well-understood systems
- If your service's behavior is predictable and issues are rare, aggressive sampling or trace-on-demand strategies work better than always-on tracing.
- Lesson 1260 — Cost-Benefit Analysis
- Stack trace
- The call chain showing exactly which line of code threw the error and how execution reached it.
- Lesson 1142 — Logging Exceptions and Stack Traces
- stages
- , each represented by an operator starting with `$`.
- Lesson 399 — Aggregation PipelineLesson 768 — Apache Spark Overview
- Stagger warming
- Avoid overwhelming your database by spreading loads over time
- Lesson 161 — Cache Warming Strategies
- Staging environments
- identical to production, where you verify changes under realistic conditions
- Lesson 1314 — Release Engineering and Safe Deployment
- Stakeholder channel
- Sanitized updates for leadership, customer support, sales
- Lesson 1301 — War Rooms and Communication Channels
- Stale data is unacceptable
- medical records, legal documents
- Lesson 518 — PC/EC Systems: Consistency Always
- Stale reads
- from any follower: instant, but might be outdated
- Lesson 640 — Performance Characteristics of Consensus
- Stale-While-Revalidate
- pattern solves a common caching dilemma: what happens when cached data expires?
- Lesson 162 — Stale-While-Revalidate Pattern
- Standardized APIs and SDKs
- for creating spans, propagating context, and recording telemetry
- Lesson 1240 — OpenTelemetry Overview
- Star Schema
- is the simplest: one central fact table surrounded by dimension tables, like planets around a sun.
- Lesson 760 — Data Warehouse Architecture
- Start at that position
- Locate where `87` sits on the 0–359 ring
- Lesson 1459 — Clockwise Key Assignment Rule
- Start vertical
- when your system is new and traffic is manageable
- Lesson 52 — Hybrid Scaling Strategies
- Start with application requirements
- Lesson 553 — Choosing Consistency Levels
- Start with business requirements
- What's the maximum acceptable latency for your users?
- Lesson 1091 — Default Timeout Pitfalls
- Start with consistency requirements
- Lesson 1364 — Choosing a Replication Mode
- Startup Speed
- Proxies initialize in milliseconds, not seconds, enabling faster pod scaling and deployments.
- Lesson 862 — Linkerd: Lightweight Service Mesh
- Startup warming
- loads critical data when your application or cache service boots up—before accepting traffic.
- Lesson 140 — Cache Warming Strategies
- Stat
- Big numbers with sparklines (total errors in last hour)
- Lesson 1200 — Grafana for Metrics Visualization
- State changes
- track transitions between healthy and unhealthy.
- Lesson 107 — Monitoring Health Check MetricsLesson 1234 — Span Events and Logs
- State Isolation
- Lesson 1790 — Multi-Tenancy Considerations
- State machine replication
- What's the next operation all replicas should execute?
- Lesson 599 — What Is Distributed Consensus?
- State management
- Maintaining computation state across events
- Lesson 744 — Stream Processing FrameworksLesson 756 — Hybrid and Modern Alternatives
- State Synchronization
- The backup must have up-to-date state—often achieved through replication or shared storage—so users don't lose data mid-failover.
- Lesson 1335 — Failover Mechanisms
- State transitions
- Order status changes, workflow progressions, job completions.
- Lesson 1129 — What to Log vs What Not to Log
- State-based CRDTs (CvRDTs)
- Replicas send their entire state to each other and merge using a commutative, associative, idempotent merge function.
- Lesson 538 — Conflict-Free Replicated Data Types (CRDTs)Lesson 1384 — Conflict-Free Replicated Data Types (CRDTs)
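
One of the simplest state-based CRDTs is a grow-only counter, sketched below as an illustration of the merge function described above. The class name `GCounter` and its methods are assumptions for this example. Each replica increments only its own slot, and merging takes the per-replica maximum, which is commutative, associative, and idempotent.

```python
from typing import Dict


class GCounter:
    """Grow-only counter: a minimal state-based CRDT sketch."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}  # replica id -> increments seen from it

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        """Fold another replica's full state into this one (element-wise max)."""
        for replica_id, count in other.counts.items():
            self.counts[replica_id] = max(self.counts.get(replica_id, 0), count)
```

Because the merge is idempotent, receiving the same state twice does no harm, so replicas can gossip their state without tracking which messages were already delivered.
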
- stateful
- each server must either store sessions locally (breaking horizontal scaling) or share session storage (adding complexity and latency).
- Lesson 61 — Stateless Authentication with TokensLesson 1338 — Stateless vs Stateful Redundancy
- Stateful algorithms
- (Least Connections, Least Response Time) require tracking server metrics, adding complexity and requiring coordination between load balancer instances.
- Lesson 96 — Algorithm Selection Tradeoffs
- Stateful interactions
- The bug might only appear with specific data flowing through the system
- Lesson 807 — Debugging and Troubleshooting
- stateless
- , you can deploy multiple Kong nodes behind a load balancer.
- Lesson 894 — Kong Gateway ArchitectureLesson 1338 — Stateless vs Stateful RedundancyLesson 1512 — Random String GenerationLesson 1605 — Distributed Worker Architecture
- Stateless algorithms
- (Round Robin, Random, IP Hash) make decisions using only current request data—easier to scale and make highly available.
- Lesson 96 — Algorithm Selection Tradeoffs
- Stateless server
- no queues to manage, no acknowledgments to track
- Lesson 673 — NATS and Lightweight Messaging
- Stateless servers
- Any application server can handle any request since session data lives externally.
- Lesson 356 — Redis as a Session Store
- Stateless services
- are ideal candidates for horizontal scaling.
- Lesson 51 — When to Choose Horizontal ScalingLesson 61 — Stateless Authentication with TokensLesson 76 — What Is a Load Balancer?
- Stateless Sessions (Token-Based)
- Lesson 947 — Distributed Session Management
- Stateless Token Design
- Lesson 950 — Auth Service Single Point of Failure
- Stateless verification
- No database lookup needed, just cryptographic check
- Lesson 916 — Session vs Token Tradeoffs
- Statement level
- Individual queries can override defaults
- Lesson 285 — Query Timeout and Statement Limits
- Static analysis
- Tools that flag potential credential logging
- Lesson 1163 — Avoid Logging Sensitive Data
- Static assets
- are files that remain the same for all users and don't change on every request.
- Lesson 173 — Content Types Suited for CDNs
- static content
- images, videos, CSS, JavaScript, fonts—anything that doesn't change per user.
- Lesson 125 — CDN as Edge Caching LayerLesson 130 — Choosing the Right Caching Layer
- Static sizing pain
- Pre-allocating 50 threads per bulkhead when average usage is 5 wastes memory
- Lesson 1076 — Bulkhead Tradeoffs: Complexity and Resource Overhead
- StatsD with Graphite
- handles moderate loads well, while **Prometheus** excels at hundreds of thousands of time series but may need federation beyond single-datacenter deployments.
- Lesson 1208 — Choosing a Metrics System for Your Scale
- Status
- pending, sent, failed
- Lesson 1711 — Idempotency Keys for NotificationsLesson 1726 — Aggregation and Reporting
- Status = completed
- Return the cached result from the first request
- Lesson 1013 — Handling In-Progress Requests
- Status = in-progress
- Return `HTTP 409 Conflict` or `HTTP 202 Accepted` with a message like "Request is being processed"
- Lesson 1013 — Handling In-Progress Requests
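
The two status cases above can be sketched as a single lookup on the idempotency key. This is a minimal single-process illustration: `IdempotencyStore`, its in-memory dictionary, and the choice to delete the record when the operation fails are all assumptions for the example. A production system would need an atomic check-and-set in a shared store.

```python
from typing import Any, Callable, Dict, Tuple


class IdempotencyStore:
    """Hypothetical in-memory store mapping idempotency keys to request records."""

    def __init__(self) -> None:
        self.records: Dict[str, Dict[str, Any]] = {}

    def handle(self, key: str, operation: Callable[[], Any]) -> Tuple[int, Any]:
        record = self.records.get(key)
        if record is not None:
            if record["status"] == "completed":
                # Status = completed: replay the cached result, do not re-run.
                return 200, record["result"]
            # Status = in-progress: tell the caller the first attempt is running.
            return 409, "Request is being processed"

        # First time this key is seen: mark it in-progress, then do the work.
        self.records[key] = {"status": "in-progress", "result": None}
        try:
            result = operation()
        except Exception:
            # Assumption: drop the record so the client may retry after a failure.
            del self.records[key]
            raise
        self.records[key] = {"status": "completed", "result": result}
        return 200, result
```

Returning 409 here is one of the two options the entry lists; 202 would work the same way with a different status code.
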
- Status updates
- When a user goes idle (no heartbeat), their presence key expires automatically
- Lesson 1676 — Presence Detection and User Status
- Steady, predictable traffic
- **Leaky Bucket** or **Fixed Window**
- Lesson 975 — Algorithm Selection Criteria
- Stemming
- "running" → "run" (match variations)
- Lesson 1733 — Document Processing PipelineLesson 1738 — Query Processing Flow
- Stemming/Lemmatization
- Reducing words to root forms using language rules.
- Lesson 1778 — Multi-Language Search Support
- Step 2: Externalize State
- Lesson 63 — Converting Stateful to Stateless Architectures
- Step-by-step
- Lesson 968 — Sliding Window Log
- Sticky
- Minimizes partition movement during rebalancing
- Lesson 716 — Consumer Groups and Partition Assignment
- Sticky routing
- Route users consistently to one region when possible, falling back to others only during failures.
- Lesson 987 — Multi-Region Rate Limiting Challenges
- Sticky sessions
- (also called **session affinity**) solve this by configuring your load balancer to remember which server initially handled a user's request.
- Lesson 60 — Sticky Sessions and Load Balancer AffinityLesson 89 — IP Hash AlgorithmLesson 94 — Session Affinity (Sticky Sessions)Lesson 96 — Algorithm Selection TradeoffsLesson 215 — Sticky Sessions and Replica AffinityLesson 225 — Sticky Sessions and Primary ReadsLesson 535 — Monotonic ReadsLesson 543 — Monotonic Reads Consistency (+2 more)
- Sticky Sessions (Session Affinity)
- Lesson 947 — Distributed Session Management
- Stock tickers
- showing prices within seconds (not milliseconds) of market changes
- Lesson 549 — Bounded Staleness
- Stop at first node
- Assign the key to the first node you encounter
- Lesson 1459 — Clockwise Key Assignment Rule
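
The clockwise rule can be sketched with a sorted list of node positions and a binary search. The function name `assign_node`, the example node labels, and the convention that a node sitting exactly at the key's position owns the key are assumptions for this illustration. The 0–359 ring matches the example used in the entries.

```python
import bisect
from typing import Dict


def assign_node(key_position: int, ring: Dict[int, str], ring_size: int = 360) -> str:
    """Return the first node at or clockwise from key_position on the ring."""
    if not ring:
        raise ValueError("the ring has no nodes")
    positions = sorted(ring)             # node positions in clockwise order
    start = key_position % ring_size     # where the key sits on the ring
    index = bisect.bisect_left(positions, start)
    if index == len(positions):
        index = 0                        # walked past the last node: wrap around
    return ring[positions[index]]
```

With nodes at 30, 120, and 250, a key at position `87` lands on the node at 120, and a key at 300 wraps around to the node at 30.
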
- Stop Words
- Common words to filter differ per language ("the," "a" in English; "le," "la" in French).
- Lesson 1778 — Multi-Language Search Support
- Stop-the-world migration
- Shut down, move data, restart (rarely acceptable)
- Lesson 258 — Resharding and Data Migration
- Storage
- Lesson 33 — Putting It All Together: Worked ExampleLesson 1242 — Zipkin Architecture and DesignLesson 1421 — Full Backup StrategyLesson 1504 — Link Expiration and Retention PoliciesLesson 1584 — Image/Video Hosting: Problem Definition and ScaleLesson 1624 — Thumbnail and Preview GenerationLesson 1827 — Crawler Architecture Overview
- Storage and Indexing
- A centralized database optimized for log search and retention (e.
- Lesson 1148 — Centralized Logging Architecture
- Storage capacity
- grows linearly with machines
- Lesson 65 — What is Data Partitioning?Lesson 229 — What is Sharding?
- Storage corruption
- over time without errors being reported
- Lesson 1430 — Backup Verification and Testing
- Storage Cost
- Replication multiplies storage requirements linearly with replicas
- Lesson 947 — Distributed Session Management
- Storage Cost Tiers
- Offer permanent storage as a premium feature.
- Lesson 1573 — Handling Never-Expiring Pastes
- Storage costs
- Longer retention means more keys in your database or cache
- Lesson 1012 — Idempotency Key Expiration StrategyLesson 1192 — Cardinality and Label ExplosionLesson 1228 — Trace Sampling FundamentalsLesson 1252 — Sampling Strategies OverviewLesson 1621 — Compression and Format OptimizationLesson 1631 — Multi-Region Replication Strategy
- Storage explosion
- Disk usage multiplies as the number of label combinations grows
- Lesson 1207 — Metrics Cardinality and Performance Impact
- Storage growth
- – 2.
- Lesson 32 — Rounding and Approximation TechniquesLesson 1584 — Image/Video Hosting: Problem Definition and Scale
- Storage growth rate
- Monitor total object storage size and metadata database size over time.
- Lesson 1574 — Monitoring Expiration and Storage Health
- Storage Layer
- Lesson 1477 — Directory Service ArchitectureLesson 1585 — Upload Flow Architecture OverviewLesson 1687 — In-App Notifications
- Storage Organization
- Lesson 1603 — Thumbnail and Preview Generation
- storage overhead
- .
- Lesson 409 — Data Size and Storage ConsiderationsLesson 1516 — Counter-Based vs UUID ApproachesLesson 1638 — Push (Write-Time) Feed Model
- Storage Patterns
- monitor upload frequency, file size distributions, format preferences, and deduplication savings.
- Lesson 1628 — Usage Analytics and Metrics
- Storage requirements
- You're not duplicating unchanged files repeatedly
- Lesson 1403 — Incremental Backups
- Storage savings
- Automatically reclaim space from temporary content
- Lesson 1565 — Expiration Requirements and TTL Basics
- Storage scalability
- Distribute terabytes or petabytes across hundreds of nodes
- Lesson 1446 — What is Data Partitioning?
- Storage tiering
- reduces costs:
- Lesson 1135 — Log Retention and Volume ManagementLesson 1589 — Storage Tiering StrategyLesson 1620 — Storage Tiering for Cost Optimization
- Storage utilization
- Set thresholds on storage tier capacity (hot vs cold from lesson 1557).
- Lesson 1574 — Monitoring Expiration and Storage Health
- Storage vs Retention
- Keeping logs forever for compliance or deep analysis requires storage that grows without bound.
- Lesson 1159 — Log Aggregation Performance Considerations
- Store
- Save the raw document in a distributed storage system
- Lesson 1732 — Crawling and Document Collection
- Store in object storage
- (S3, Azure Blob) and keep only metadata in the database
- Lesson 1553 — Object Storage vs Database for Paste Content
- Store in the database
- alongside metadata (title, expiration, owner)
- Lesson 1553 — Object Storage vs Database for Paste Content
- Store receipt records
- in your database, linked to the original notification ID
- Lesson 1693 — Delivery Receipt Tracking
- Store the hash
- (32 bytes for SHA-256) in a seen-content set
- Lesson 1852 — Content Fingerprinting with Hashing
- Store the key
- (receipt number) with the operation outcome
- Lesson 1011 — Idempotency Key Storage and Lookup
- Stores only recent results
- old data is discarded once the batch layer catches up
- Lesson 749 — Lambda Architecture: Speed Layer
- Stores session data
- in the server's memory (user preferences, authentication tokens, shopping cart contents)
- Lesson 56 — What Makes a Service Stateful
- Strangler Fig Pattern
- is named after a tropical plant that grows around a host tree, eventually replacing it.
- Lesson 822 — The Strangler Fig Pattern for Migration
- Stream processing
- is like instant messaging—every message arrives immediately, but requires constant connectivity and resources.
- Lesson 746 — Choosing Batch vs StreamLesson 1726 — Aggregation and Reporting
- Stream Processing Layer
- All data flows through a stream processing framework that handles both real-time and historical data
- Lesson 752 — Kappa Architecture Overview
- Stream processing minimizes latency
- by handling records immediately as they arrive.
- Lesson 740 — Latency vs Throughput Tradeoffs
- Stream to Object Storage
- Lesson 1560 — Handling Large Pastes Efficiently
- Streaming applications
- (video calls, live audio, gaming) need the server to maintain buffers, connection quality metrics, and synchronization state for each active session.
- Lesson 62 — When Stateful Services Are Necessary
- Streamlined accepts
- The leader sends accept requests for new log entries without repeating prepare, as long as it remains unchallenged
- Lesson 616 — Multi-Paxos for Log Replication
- Strengths
- Lesson 1373 — Chain Replication
- Strict consistency needed
- (financial transactions, inventory): Lean toward **synchronous** or **quorum-based replication**.
- Lesson 1364 — Choosing a Replication Mode
- Strict consistency rules
- Database constraints that must never be violated
- Lesson 322 — Transaction Requirements and Trade-offs
- Strict serializability
- is the marriage of both: transactions execute in a serial order *that respects real-time*, meaning if transaction T1 commits before T2 begins in wall-clock time, T1 must appear before T2 in the serial order.
- Lesson 525 — Strict Serializability
- String formatting
- for timestamps, numbers, and escaped characters
- Lesson 1143 — Performance Impact of Structured Logging
- Strings
- are the most basic type—text or binary data stored as-is.
- Lesson 341 — Data Types and Value ComplexityLesson 1808 — Redis Data Structures for Rate Limiting
- strong consistency
- every read gets the most recent write, no exceptions.
- Lesson 15 — Consistency Requirements and TradeoffsLesson 39 — Trade-offs Over Best PracticesLesson 157 — Active Invalidation on WriteLesson 167 — Cache Coherence in Distributed SystemsLesson 308 — Strong Consistency by DefaultLesson 331 — What NewSQL IsLesson 365 — Tunable Consistency with Quorum Reads and WritesLesson 443 — BigTable Overview and Motivation (+13 more)
- Strong consistency is non-negotiable
- → Single-leader or chain replication
- Lesson 1376 — Topology Selection Tradeoffs
- Strong consistency needed
- → Database internal caching (buffer pool)
- Lesson 130 — Choosing the Right Caching Layer
- Strong consistency needed immediately
- You must know the outcome before proceeding
- Lesson 654 — When to Use Async vs Sync
- Strong guarantees
- (bank balances, inventory counts, distributed locks) → **CP**
- Lesson 503 — Choosing Between CP and AP
- Strong read
- "Give me data only after checking all replicas"
- Lesson 1398 — Consistency Level Per-Operation
- Strong Transactional Guarantees
- Lesson 320 — When SQL Is the Right Choice
- Strongly consistent reads
- Guarantees read-your-writes, but higher latency and less availability during partitions
- Lesson 554 — Consistency Model Examples in Real Systems
- structure
- of your entire graph to answer questions like "Who's most influential?"
- Lesson 468 — Graph Algorithms: PageRank and CentralityLesson 1034 — Database Patterns for Idempotency
- Structured logging
- Pre-structured formats (JSON) avoid repeated serialization overhead.
- Lesson 1133 — Logging Performance ImpactLesson 1136 — Logging Libraries and StandardsLesson 1137 — What is Structured Logging
- Structured Streaming
- (the newer API) can use either micro-batching or **continuous processing** for true low-latency streaming.
- Lesson 769 — Spark Streaming and Structured Streaming
- Sub-millisecond needs
- → In-memory application cache (fastest)
- Lesson 130 — Choosing the Right Caching Layer
- subscribers
- listening to that channel via `SUBSCRIBE channel`.
- Lesson 357 — Redis Pub/Sub for Real-Time MessagingLesson 656 — Pub-Sub Pattern FundamentalsLesson 674 — Google Cloud Pub/Sub Architecture
- Substitute variables
- Pass `{name: "Carlos", orderId: "54321"}`
- Lesson 1701 — Template Service for Content
- Success rates
- show what percentage of health checks are passing.
- Lesson 107 — Monitoring Health Check Metrics
- Success threshold
- How many consecutive passes restore the server (e.
- Lesson 103 — Marking Servers UnhealthyLesson 106 — Health Check False Positives and FlappingLesson 1052 — Circuit Breaker Reset Logic
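
A success threshold is usually paired with a failure threshold so that a server neither drops out nor returns on a single check. The sketch below is an illustration only: the class name `HealthTracker` and the default thresholds are assumptions, not values taken from the lessons.

```python
class HealthTracker:
    """Tracks consecutive health-check results for one server."""

    def __init__(self, failure_threshold: int = 3, success_threshold: int = 2) -> None:
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.healthy = True
        self._consecutive_failures = 0
        self._consecutive_successes = 0

    def record(self, passed: bool) -> bool:
        """Record one check result and return the server's current health."""
        if passed:
            self._consecutive_failures = 0
            self._consecutive_successes += 1
            # Restore only after enough passes in a row.
            if not self.healthy and self._consecutive_successes >= self.success_threshold:
                self.healthy = True
        else:
            self._consecutive_successes = 0
            self._consecutive_failures += 1
            # Mark unhealthy only after enough failures in a row.
            if self.healthy and self._consecutive_failures >= self.failure_threshold:
                self.healthy = False
        return self.healthy
```

Resetting the opposite counter on every result is what makes the thresholds "consecutive": a single failed check during recovery restarts the count, which dampens flapping.
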
- Support contract-first design
- where you define the API before implementation
- Lesson 1885 — API Documentation with OpenAPI/Swagger
- Support multiple client types
- (mobile, web, partners) with different needs
- Lesson 882 — Request and Response Transformation
- Supports ranking
- Store metadata (frequency, popularity) at terminal nodes to rank suggestions
- Lesson 1758 — Trie Data Structure for Prefix Matching
- Surrogate Keys (Cache Tags)
- Lesson 1617 — Cache Invalidation and Purging
- Survivability
- If a node fails, the system automatically promotes replicas and continues operating without data loss—true high availability without manual intervention.
- Lesson 334 — CockroachDB and Distributed SQL
- Sustained load
- Ensure long-running traffic doesn't cause memory leaks or state drift
- Lesson 997 — Testing and Monitoring Rate Limiters
- sweet spot
- between what users need, what you can afford, and what your system can reliably deliver.
- Lesson 1276 — Setting Realistic SLOsLesson 1652 — Fanout Worker Parallelization
- Switch to simpler alternatives
- (serve cached data instead of fresh queries)
- Lesson 1083 — Graceful Degradation Strategies
- Switchover Logic
- Once failure is detected, the mechanism updates routing (DNS, load balancer configuration, virtual IP reassignment) to point traffic to the backup.
- Lesson 1335 — Failover Mechanisms
- Symptoms vs Causes
- alerting fires on what users feel, not what machines think
- Lesson 1313 — Monitoring and Observability for SRE
- Synchronization delay
- Propagating counter updates takes milliseconds to seconds
- Lesson 1791 — Single Data Center vs Distributed Setup
- Synchronized deployments
- If you must deploy services A, B, and C simultaneously or risk breaking things, they're a distributed monolith.
- Lesson 824 — Avoiding Distributed Monoliths
- Synchronous
- The chef stops cooking, walks to the office, writes in the logbook, and returns.
- Lesson 1134 — Synchronous vs Asynchronous LoggingLesson 1354 — Synchronous vs Asynchronous ReplicationLesson 1364 — Choosing a Replication ModeLesson 1365 — Single-Leader Replication Topology
- Synchronous I/O operations
- that block your application threads while waiting for logs to be written.
- Lesson 1170 — Performance Impact of Logging
- Synchronous logging
- blocks your application thread until the log message is written to disk or sent over the network.
- Lesson 1134 — Synchronous vs Asynchronous LoggingLesson 1143 — Performance Impact of Structured Logging
- Synchronous processing
- means the upload request waits until all operations complete—validation, virus scanning, thumbnail generation, and storage.
- Lesson 1598 — Synchronous vs Asynchronous Processing
- synchronous replication
- waits for *all* replicas to confirm writes (slow but safe), while **asynchronous replication** doesn't wait for any confirmations (fast but risky).
- Lesson 205 — Semi-Synchronous ReplicationLesson 217 — Semi-Synchronous Replication Trade-offsLesson 509 — Latency: The Hidden Cost of CAPLesson 1354 — Synchronous vs Asynchronous ReplicationLesson 1414 — RPO Zero: Synchronous Replication
- synchronously
- before returning success to the client.
- Lesson 134 — Write-Through Caching PatternLesson 528 — Single-Leader Replication for Strong ConsistencyLesson 1559 — Write Path: Synchronous vs Asynchronous StorageLesson 1651 — Asynchronous Fanout Processing
- System resources
- CPU utilization, memory pressure, connection pool saturation
- Lesson 993 — Adaptive Rate Limiting
- System understanding
- Logs reveal actual runtime behavior versus what you *think* the code does.
- Lesson 1127 — What is Logging and Why It Matters
T
- Table bloat
- Large TEXT/BLOB columns make tables enormous, slowing down backups and maintenance
- Lesson 1550 — Object Storage for Paste Content
- tablets
- , which are horizontal partitions containing contiguous ranges of rows (sorted by row key).
- Lesson 440 — BigTable Tablets and Tablet ServersLesson 444 — Data Model: Sparse, Distributed, Multi-Dimensional MapLesson 445 — Tablet Architecture and Distribution
- Tag/Category Index
- Lesson 1563 — Indexing for Ownership and Search
- tags
- (indexed metadata like `host=server1`) and **fields** (measured values like `cpu=45.
- Lesson 1203 — InfluxDB and Time-Series DatabasesLesson 1225 — Span Attributes and TagsLesson 1233 — Span Tags and Attributes
- tail
- , with each replica forwarding the write to the next.
- Lesson 1362 — Chain ReplicationLesson 1373 — Chain Replication
- Tail acknowledges
- → Only when tail applies the update does it respond to the client
- Lesson 1373 — Chain Replication
- tail latency
- those rare but painful slow requests that hurt user experience.
- Lesson 1031 — Hedged Requests and Speculative ExecutionLesson 1188 — Percentiles and Tail Latencies
- Take a snapshot
- Periodically, a server serializes its current state machine state (e.
- Lesson 632 — Log Compaction: Snapshotting
- Tamper-evident storage
- Cryptographic proof logs weren't modified
- Lesson 944 — Auditing and Compliance for Authorization
- Target: < 1ms overhead
- Lesson 1784 — Non-Functional Requirements: Latency and Availability
- Target: 99.99%+ uptime
- Lesson 1784 — Non-Functional Requirements: Latency and Availability
- Task Distribution
- A producer adds work items to the queue—image resize jobs, email sends, report generation, anything time-consuming.
- Lesson 659 — Queue Use Cases: Work Distribution
- TaskManagers
- are the worker nodes that execute the actual stream processing logic.
- Lesson 770 — Apache Flink Architecture
- tasks
- across the cluster.
- Lesson 721 — Kafka Connect FrameworkLesson 766 — Apache Airflow FundamentalsLesson 768 — Apache Spark Overview
- TCP Optimization
- The edge server is geographically close to you, so the initial TCP connection completes quickly.
- Lesson 186 — Dynamic Content Acceleration
- Team autonomy
- Each team controls its own deployment schedule
- Lesson 786 — Independent Deployability of MicroservicesLesson 794 — Team Autonomy and OwnershipLesson 904 — BFF vs Single Gateway Tradeoffs
- Team Capabilities
- Can you support multiple independent teams?
- Lesson 826 — Decision Framework for Microservices Adoption
- Team Expertise
- Lesson 119 — Choosing Load Balancer TechnologyLesson 901 — Choosing the Right API Gateway Technology
- Team Skills
- Lesson 765 — Choosing Lake vs Warehouse
- Team structure supports it
- You have separate frontend teams with backend capabilities who can own their BFFs end-to-end.
- Lesson 908 — When to Use BFF Pattern
- Technology Flexibility
- Different services can use different programming languages, databases, or frameworks based on what fits best for that specific capability.
- Lesson 781 — What are Microservices?
- technology heterogeneity
- the freedom for each service to use different programming languages, frameworks, and databases optimized for its particular use case.
- Lesson 787 — Technology Heterogeneity in MicroservicesLesson 792 — Technology HeterogeneityLesson 799 — Progressive Technology Adoption
- Technology Lock-In
- The entire system typically uses one language and framework.
- Lesson 785 — When Monoliths Become Problematic
- Template definitions
- with variable placeholders like `{username}` or `{amount}`
- Lesson 1701 — Template Service for Content
- Template Service
- acts as the central repository for all notification content across channels (push, email, SMS).
- Lesson 1701 — Template Service for Content
- Temporal
- is a workflow orchestration platform that treats long-running sagas as durable workflows.
- Lesson 598 — Saga Frameworks and Real-World Adoption
- Temporal complexity
- Events happen in distributed time, not sequential order
- Lesson 807 — Debugging and Troubleshooting
- Temporal Data
- Lesson 1642 — Post Metadata and Schema DesignLesson 1895 — Default Sorting and Index Alignment
- Temporal decoupling
- Services don't need to be available simultaneously
- Lesson 654 — When to Use Async vs Sync
- Temporal hotspots
- occur when recent data gets disproportionate access.
- Lesson 234 — Data Distribution and Hotspots
- Temporal Index
- Lesson 1563 — Indexing for Ownership and Search
- temporal locality
- the pattern where recently accessed data is likely to be accessed again soon.
- Lesson 146 — Least Recently Used (LRU)Lesson 153 — Choosing an Eviction Policy
- Temporal Ordering
- Every event in a stream has a strict position in time.
- Lesson 692 — Streams vs Traditional Databases
- Temporal patterns
- With time-based sharding, the "current" shard (today's data) receives all writes while historical shards sit cold.
- Lesson 256 — Hotspots and Uneven Data DistributionLesson 997 — Testing and Monitoring Rate LimitersLesson 1482 — The Hot Partition Problem
- Temporary job queues
- Lesson 141 — Cache-as-SoR (System of Record) Pattern
- Temporary Storage
- Messages persist in the queue until successfully consumed, surviving consumer crashes or temporary unavailability.
- Lesson 647 — Message Queue Basics
- Temporary tokens
- Authentication tokens that self-destruct after a time window
- Lesson 343 — Time-to-Live and Expiration
- Temporary unavailability
- Service restarting or experiencing a brief GC pause
- Lesson 1020 — Why Retries Are Necessary in Distributed Systems
- Tenant A
- might use Google Workspace, **Tenant B** relies on Azure AD, and **Tenant C** manages users internally.
- Lesson 932 — Multi-Tenant OAuth2 and Identity FederationLesson 1819 — Per-Tenant Configuration Storage
- Tenant affinity
- All requests for a tenant hit the same node and Redis shard
- Lesson 1822 — Scaling Rate Limiter Horizontally
- Tenant B
- relies on Azure AD, and **Tenant C** manages users internally.
- Lesson 932 — Multi-Tenant OAuth2 and Identity FederationLesson 1819 — Per-Tenant Configuration Storage
- Tenant ID
- (multi-tenant SaaS applications)
- Lesson 244 — Entity-Based ShardingLesson 1237 — Baggage and Cross-Cutting ConcernsLesson 1819 — Per-Tenant Configuration Storage
- Tenant isolation
- is enforced through namespaces and authorization rules
- Lesson 860 — Multi-Cluster and Multi-Tenancy
- Tenant-level limits
- protect multi-tenant systems where one organization shouldn't affect others (e.
- Lesson 973 — Multi-Tier Rate Limiting
- Tenant-specific IdP mapping
- Store configurations per tenant—which IdP they use, OAuth2 client credentials, redirect URIs, and custom claim mappings.
- Lesson 932 — Multi-Tenant OAuth2 and Identity Federation
- term
- of its last log entry
- Lesson 628 — Election Restriction: Up-to-Date CheckLesson 1743 — What Is an Inverted Index
- Term comparison first
- Higher term number = more up-to-date
- Lesson 627 — Safety: Leader Completeness Property
- Term frequency
- how often the term appears in that document
- Lesson 1735 — Inverted Index StructureLesson 1745 — Posting Lists and Document IDs
- Term frequency (TF)
- How many times the term appears in that document
- Lesson 1736 — Posting Lists and Document IDsLesson 1740 — TF-IDF Scoring Fundamentals
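
To make term frequency concrete, the sketch below counts terms per document and stores each count next to the document ID in a posting list. The function names and the lowercase, whitespace-only tokenizer are simplifying assumptions. A real pipeline would apply the tokenization and stemming steps first.

```python
from collections import Counter
from typing import Dict, List, Tuple


def term_frequencies(document: str) -> Dict[str, int]:
    """Count how many times each term appears in one document."""
    tokens = document.lower().split()  # assumption: naive whitespace tokenizer
    return dict(Counter(tokens))


def build_postings(documents: Dict[int, str]) -> Dict[str, List[Tuple[int, int]]]:
    """Map each term to a list of (doc_id, term_frequency), sorted by doc_id."""
    postings: Dict[str, List[Tuple[int, int]]] = {}
    for doc_id in sorted(documents):
        for term, tf in term_frequencies(documents[doc_id]).items():
            postings.setdefault(term, []).append((doc_id, tf))
    return postings
```

Keeping the frequency inside the posting lets a scorer such as TF-IDF read it during the index lookup instead of re-scanning the document.
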
- term number
- (the election period)
- Lesson 621 — Leader Election: RequestVote RPCLesson 623 — Log Structure and Entries
- Term-based sharding
- partitions by terms themselves—each shard handles a specific range of vocabulary (e.
- Lesson 1753 — Distributed Index Sharding
- termination
- (the decision completes eventually)—all while nodes crash and networks partition.
- Lesson 599 — What Is Distributed Consensus?Lesson 608 — The Problem Paxos Solves
- Test at different times
- including off-hours when fewer experts are available
- Lesson 1438 — DR Testing Strategies
- Test different failure scenarios
- (single file, full database, entire region)
- Lesson 1430 — Backup Verification and Testing
- Test realistic scenarios
- Simulate actual disaster conditions, not just happy paths
- Lesson 1408 — Backup Verification and Testing
- Test under failure
- Simulate downstream delays and measure resource consumption at various timeout values.
- Lesson 1091 — Default Timeout Pitfalls
- Test-on-borrow
- Execute a lightweight query (like `SELECT 1`) before giving the connection to the application.
- Lesson 271 — Connection Validation and Stale Connections
- Test-on-return
- Validate when returning a connection to the pool, marking bad ones for removal.
- Lesson 271 — Connection Validation and Stale Connections
- Test-while-idle
- Periodically check idle connections in the background.
- Lesson 271 — Connection Validation and Stale Connections
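
The test-on-borrow idea can be sketched with a tiny pool that runs `SELECT 1` before handing out a connection. This is an illustration only: `ValidatingPool` and its methods are hypothetical names, and SQLite in-memory connections stand in for a real database.

```python
import sqlite3
from collections import deque


class ValidatingPool:
    """Toy connection pool that validates connections when they are borrowed."""

    def __init__(self, factory, size=2):
        self._factory = factory
        self._idle = deque(factory() for _ in range(size))

    @staticmethod
    def _is_alive(conn) -> bool:
        # Lightweight validation query; a closed connection raises sqlite3.Error.
        try:
            conn.execute("SELECT 1").fetchone()
            return True
        except sqlite3.Error:
            return False

    def borrow(self):
        # Discard stale connections until a healthy one is found.
        while self._idle:
            conn = self._idle.popleft()
            if self._is_alive(conn):
                return conn
        # No healthy idle connection left: open a fresh one.
        return self._factory()

    def give_back(self, conn):
        self._idle.append(conn)


pool = ValidatingPool(lambda: sqlite3.connect(":memory:"))
conn = pool.borrow()
print(conn.execute("SELECT 1").fetchone())
pool.give_back(conn)
```

Test-on-return and test-while-idle would reuse the same `_is_alive` check, called from `give_back` or from a background timer respectively.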
- Testing
- goes further—it proves you can actually *restore* from backups:
- Lesson 1408 — Backup Verification and TestingLesson 1430 — Backup Verification and TestingLesson 1690 — Channel Provider Abstraction
- Testing alerts
- means deliberately triggering conditions that should generate alerts, verifying they reach the right people, and confirming responders know how to act.
- Lesson 1295 — Testing Alerts and Dry Runs
- Testing and debugging
- Replay production events in a test environment to reproduce issues safely.
- Lesson 695 — Stream Retention and Replay
- Thanos
- extends Prometheus by uploading data blocks to cheap object storage (S3, GCS).
- Lesson 1206 — Metrics Federation and Long-Term Storage
- Then consider latency tolerance
- Lesson 1364 — Choosing a Replication Mode
- They multiply load
- When one service fails, retries from upstream services create even more pressure on remaining healthy instances
- Lesson 1077 — What is a Cascading Failure
- They're hard to trace
- The root cause may be buried under layers of dependent failures
- Lesson 1077 — What is a Cascading Failure
- Think like this
- An e-commerce shopping cart can tolerate eventual consistency (items may briefly appear/disappear across sessions).
- Lesson 1399 — Consistency Pattern Tradeoffs in Practice
- Think of it like
- Organizing your closet into sections—shoes, shirts, pants—all still in one physical closet.
- Lesson 67 — Partitioning vs Sharding TerminologyLesson 596 — Forward Recovery vs Backward RecoveryLesson 1363 — Statement-Based vs Row-Based ReplicationLesson 1417 — Hot Standby vs Cold StandbyLesson 1917 — gRPC: Protocol Buffers and Binary RPC
- Third Normal Form (3NF)
- Remove transitive dependencies—non-key attributes depend only on the primary key, not on other non-key attributes
- Lesson 302 — Normalization Fundamentals
- Third retry
- Wait 400ms
- Lesson 1564 — Retrieval Error Handling and FallbacksLesson 1695 — Fallback and Retry Logic
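
A 400ms wait on the third retry is consistent with exponential backoff that starts at 100ms and doubles each time. The sketch below assumes that schedule; the 100ms base and the doubling factor are assumptions, since the entry states only the third value.

```python
def backoff_delays_ms(base_ms: int = 100, retries: int = 3) -> list:
    """Return the wait before each retry, doubling on every attempt."""
    return [base_ms * 2 ** attempt for attempt in range(retries)]


print(backoff_delays_ms())
```

In practice a random jitter is usually added to each delay so that many clients do not retry at the same instant.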
- Third-party integrations
- Your app needs to post to users' Twitter accounts without storing their Twitter passwords.
- Lesson 920 — OAuth2 Fundamentals and Use Cases
- Thread Pool Bulkheads
- Fully isolated execution (via CompletableFuture)
- Lesson 1075 — Implementing Bulkheads in Practice: Hystrix and Resilience4j
- Thread pool size
- (threads in use at this instant)
- Lesson 1175 — Gauge MetricsLesson 1184 — Gauge Metrics
- Thread pools
- (separate pools per service)
- Lesson 1067 — Bulkhead Pattern: Isolating Resources to Prevent Total Failure
- Threshold checking
- – After each failure, it checks: "Have we hit our failure threshold yet?
- Lesson 1045 — The Three States: Closed
- Threshold-based
- Replicate after N requests from a specific region
- Lesson 1631 — Multi-Region Replication Strategy
- Threshold-based switching
- Automatically transition users to pull-based delivery once they exceed a follower threshold (e.
- Lesson 1640 — Celebrity Problem in Push Models
- throttling
- may slow down or delay requests that exceed thresholds rather than rejecting them outright.
- Lesson 885 — Rate Limiting and ThrottlingLesson 957 — Rate Limiting vs Throttling
- Throughput
- is how many requests the system can handle in a given time period — usually expressed as requests per second (RPS) or queries per second (QPS).
- Lesson 12 — Performance Requirements: Latency and ThroughputLesson 22 — Throughput vs LatencyLesson 120 — Caching Hierarchy OverviewLesson 677 — Message Broker Performance CharacteristicsLesson 736 — What is Batch Processing?Lesson 740 — Latency vs Throughput TradeoffsLesson 1273 — Choosing Good SLIsLesson 1707 — Processing Pipeline Monitoring (+1 more)
- Throughput increase
- Multiple servers handle queries in parallel
- Lesson 1446 — What is Data Partitioning?
- Throughput Metrics
- measure crawling velocity: pages crawled per second/minute, bytes downloaded, and URLs processed per worker.
- Lesson 1871 — Monitoring Crawler Fleet Performance
- Throughput Needs
- With millions of daily active users (DAU), you must handle peak traffic.
- Lesson 1633 — Non-Functional Requirements: Scale and Performance
- Tier 1 (Critical)
- RPO < 5 minutes, RTO < 15 minutes → synchronous replication, multi-region active-active
- Lesson 1420 — Business Impact Analysis for RPO/RTO
- Tier 1 (Hot)
- Active users get WebSocket connections with sub-second updates
- Lesson 1683 — Cost Optimization for Real-Time Features
- Tier 2 (Distributed/Remote)
- Slower but shared across services.
- Lesson 143 — Multi-Tier Caching Pattern
- Tier 2 (Important)
- RPO < 1 hour, RTO < 4 hours → asynchronous replication, hot standby
- Lesson 1420 — Business Impact Analysis for RPO/RTO
- Tier 2 (Warm)
- Recently active users receive Server-Sent Events (SSE) with delayed updates
- Lesson 1683 — Cost Optimization for Real-Time Features
- Tier 3 (Cold)
- Inactive users get no real-time connection; they pull on next app open
- Lesson 1683 — Cost Optimization for Real-Time Features
- Tier 3 (Standard)
- RPO < 24 hours, RTO < 24 hours → daily backups, cold standby
- Lesson 1420 — Business Impact Analysis for RPO/RTO
- Tiered rate limiting
- means assigning different rate limit policies based on user class—typically determined by subscription level, authentication status, or organizational size.
- Lesson 990 — Tiered Rate Limits for Different User Classes
- Tight Coupling
- Lesson 780 — Characteristics of Monolithic SystemsLesson 785 — When Monoliths Become ProblematicLesson 907 — BFF Anti-Patterns and Pitfalls
- Tight Coupling Despite Separation
- Lesson 823 — Signs You're Over-Decomposing Services
- Tight coupling manifests as
- Lesson 789 — The Distributed Monolith Anti-Pattern
- Tight memory budget
- Use **Fixed Window Counter** (single counter) or **Leaky Bucket** (timestamp + counter)
- Lesson 975 — Algorithm Selection Criteria
- time
- (e.
- Lesson 549 — Bounded StalenessLesson 741 — Windowing in Stream ProcessingLesson 1358 — Replication Lag in Async SystemsLesson 1421 — Full Backup StrategyLesson 1649 — The Celebrity Problem in Fanout
- Time of day
- (route to regional on-call during business hours)
- Lesson 1292 — Alert Routing and Escalation
- Time To Live (TTL)
- tells clients how long to cache a DNS response.
- Lesson 116 — DNS-Based Load Balancing
- Time travel
- Replay events to see what state was at any point in the past
- Lesson 691 — Events as First-Class Citizens
- time window
- you allow *N* requests per *time period*.
- Lesson 961 — Time Windows for Rate LimitsLesson 1819 — Per-Tenant Configuration StorageLesson 1824 — Tiered Rate Limiting
- Time zone differences
- A batch job in one region processes messages created hours earlier in another region
- Lesson 650 — Temporal Decoupling
- Time-based
- simplifies adding new shards for new time periods
- Lesson 253 — Evaluating Sharding Strategy TradeoffsLesson 549 — Bounded StalenessLesson 695 — Stream Retention and Replay
- Time-Based Expiration
- Lesson 1617 — Cache Invalidation and Purging
- Time-Based Expiration (TTL)
- is the concept that stale-while-revalidate builds on; instead of blocking on expiration, that pattern serves stale data gracefully.
- Lesson 162 — Stale-While-Revalidate Pattern
- Time-based expiry (TTL)
- Accept eventual consistency—caches expire after a set time, guaranteeing freshness within that window.
- Lesson 128 — Cache Coherence Across Layers
- Time-based retention
- Keep messages for a specified duration (e.
- Lesson 711 — Message Retention and Log Segments
- Time-consuming
- – backing up terabytes fully each night may exceed your maintenance window
- Lesson 1402 — Full Backups
- Time-of-Day Patterns
- If timeouts spike every morning at 9 AM, you have a capacity problem during peak load, not a random failure.
- Lesson 1124 — Timeout Metrics and Anomaly Detection
- Time-ordered results
- Newest posts first without expensive sorting
- Lesson 1661 — Timeline Schema Design
- Time-series data at scale
- When tracking billions of sensor readings, user events, or financial ticks with random access patterns, these systems excel.
- Lesson 442 — When to Use HBase or BigTable
- Time-series database
- (InfluxDB, TimescaleDB) for efficient time-based queries
- Lesson 1530 — Analytics and Click Tracking
- Time-series logs
- Shard key = `(device_id, timestamp)` → groups device logs together, spreads load across devices.
- Lesson 245 — Composite Shard Keys
- Time-to-first-byte (TTFB)
- benchmarks for your target regions
- Lesson 191 — CDN Provider Feature Comparison
- Time-to-Live (TTL)
- is a mechanism in key-value stores that automatically expires and deletes keys after a defined duration.
- Lesson 343 — Time-to-Live and ExpirationLesson 1523 — Caching Layer ArchitectureLesson 1810 — Counter Expiration and TTL Management
- Time-Window Compaction (TWCS)
- Lesson 428 — Compaction Strategies
- Timed drills
- Measure how long restoration actually takes (crucial for meeting RTO targets)
- Lesson 1408 — Backup Verification and Testing
- Timeline
- Minute-by-minute sequence of events (when alerts fired, what actions were taken, what commands were run)
- Lesson 1304 — Blameless PostmortemsLesson 1350 — What is a Postmortem?Lesson 1352 — Postmortem Structure and Action Items
- timeout
- (how long to wait for a response)—require careful balancing.
- Lesson 100 — Health Check Intervals and TimeoutsLesson 1086 — What Timeouts Are and Why They MatterLesson 1108 — What is Deadline Propagation
- Timeout budget management
- means intelligently dividing the *remaining time* among all downstream dependencies so every hop has a realistic chance to succeed.
- Lesson 1119 — Timeout Budget Management Across Service Chains
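
One simple way to divide the remaining time is sketched below: compute what is left until the deadline, hold back a small reserve for local work, and split the rest evenly across the hops still to come. The even split, the `reserve` parameter, and the function names are assumptions made for illustration; real systems often weight hops by their expected latency.

```python
import time
from typing import Optional


def remaining_budget(deadline: float, now: Optional[float] = None) -> float:
    """Seconds left until the absolute deadline (never negative)."""
    now = time.monotonic() if now is None else now
    return max(0.0, deadline - now)


def per_hop_timeout(deadline: float, hops_left: int,
                    now: Optional[float] = None, reserve: float = 0.05) -> float:
    """Evenly split the remaining budget, minus a local reserve, across hops."""
    budget = remaining_budget(deadline, now) - reserve
    if hops_left <= 0 or budget <= 0:
        return 0.0  # nothing left to spend: fail fast instead of calling downstream
    return budget / hops_left


# One second left, three downstream calls, 100ms kept for local processing.
print(per_hop_timeout(deadline=10.0, hops_left=3, now=9.0, reserve=0.1))
```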
- Timeout Distribution by Endpoint
- Different operations have different normal timeout rates.
- Lesson 1124 — Timeout Metrics and Anomaly Detection
- Timeout Duration Patterns
- Are timeouts happening exactly at your configured limit?
- Lesson 1124 — Timeout Metrics and Anomaly Detection
- Timeout limits
- External sites may be slow; cap fetch time at 5-10 seconds
- Lesson 1538 — Link Preview and Metadata
- Timeout propagation failures
- Do downstream timeouts respect the remaining deadline budget?
- Lesson 1125 — Timeout Testing and Chaos Engineering
- Timeout Rate
- The percentage of requests that time out.
- Lesson 1124 — Timeout Metrics and Anomaly Detection
- Timeout simulation
- Inject artificial delays that exceed your timeout threshold
- Lesson 1065 — Testing Circuit Breaker Behavior
- Timeouts
- Proxies enforce maximum wait times for responses, preventing services from hanging indefinitely when downstream dependencies slow down.
- Lesson 839 — Data Plane: Proxy Responsibilities
- Timers/Histograms
- calculates percentiles, mean, max, min
- Lesson 1201 — StatsD and Metric Aggregation Daemons
- timestamp
- (or sequence number).
- Lesson 216 — Timestamp-Based Consistency ChecksLesson 390 — BSON Format and Data TypesLesson 944 — Auditing and Compliance for AuthorizationLesson 1226 — Span Events and LogsLesson 1505 — Analytics and Tracking RequirementsLesson 1511 — Distributed ID GenerationLesson 1711 — Idempotency Keys for Notifications
- Timestamp-based keys
- Log entries with `timestamp` as the partition key
- Lesson 1474 — Hotspot Problems in Range Partitioning
- Timestamp-based routing
- Tag writes with timestamps.
- Lesson 1359 — Read-Your-Writes Consistency with Replicas
- Timestamps
- track versions of the same cell
- Lesson 444 — Data Model: Sparse, Distributed, Multi-Dimensional MapLesson 1211 — Avoiding High-Cardinality Labels
- Timestamps (Time-Based)
- Lesson 212 — Measuring Replication Lag
- Timing
- Some layers may invalidate faster than others, creating temporary inconsistencies
- Lesson 163 — Multi-Level Cache InvalidationLesson 1624 — Thumbnail and Preview Generation
- Timing matters
- Trigger the hedge too early and you waste resources; too late and you've already suffered the delay
- Lesson 1031 — Hedged Requests and Speculative Execution
- TLS termination
- for encrypted communication
- Lesson 840 — Data Plane: Envoy Proxy FundamentalsLesson 841 — Data Plane: Performance and Latency Overhead
- To improve availability
- , you have two levers:
- Lesson 1325 — Availability Formula: MTBF and MTTR Relationship
- Together
- Bulkheads limit blast radius through isolation, while circuit breakers detect failures quickly and stop wasting resources.
- Lesson 1074 — Bulkheads vs Circuit Breakers: Complementary PatternsLesson 1354 — Synchronous vs Asynchronous Replication
- Toil
- is the repetitive, manual work that doesn't add long-term value.
- Lesson 1288 — Alert Fatigue and ToilLesson 1308 — The SRE Philosophy: Treating Operations as SoftwareLesson 1311 — Toil: The Enemy of ScaleLesson 1312 — Measuring and Reducing ToilLesson 1350 — What is a Postmortem?
- Token bucket
- *allows bursts* up to the bucket capacity.
- Lesson 966 — Token Bucket vs Leaky BucketLesson 975 — Algorithm Selection CriteriaLesson 990 — Tiered Rate Limits for Different User ClassesLesson 1691 — Rate Limits per ChannelLesson 1697 — API Layer and Rate LimitingLesson 1798 — Token Bucket Implementation in RedisLesson 1813 — Memory Footprint per User and Limits
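
The burst behavior is easiest to see in code. The following is a minimal in-memory sketch, not the Redis implementation the course covers later; the class name, parameters, and the choice to pass `now` explicitly (to keep the example deterministic) are all assumptions.

```python
class TokenBucket:
    """Token bucket: holds up to `capacity` tokens, refilled at a steady rate."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity  # start full, so an initial burst is allowed
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill based on elapsed time, capped at the bucket capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=3, refill_per_sec=1.0)
print([bucket.allow(0.0) for _ in range(4)])  # burst of 3 passes, 4th is rejected
print(bucket.allow(1.0))                      # one token refilled after 1 second
```

A leaky bucket, by contrast, would drain requests at a fixed rate and smooth out the initial burst rather than admit it.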
- Token claim normalization
- External IdPs return different claim structures.
- Lesson 932 — Multi-Tenant OAuth2 and Identity Federation
- Token exchange
- Client sends the original `code_verifier` (not the hash)
- Lesson 923 — PKCE: Proof Key for Code Exchange
- Token introspection
- means the resource server asks the authorization server "Is this token still valid?
- Lesson 927 — Token Introspection and Validation
- Token Issuance
- Server validates credentials and generates a signed token (often a JWT—JSON Web Token)
- Lesson 912 — Token-Based Authentication Fundamentals
- Token propagation delays
- In service-to-service calls, ensure refreshed tokens cascade properly through the call chain, typically by updating headers in flight.
- Lesson 946 — Token Refresh in Distributed Systems
- Token Request
- Your backend server exchanges this code for actual access tokens by making a **server-to-server request**, including your app's secret credentials (client ID + client secret).
- Lesson 922 — Authorization Code Flow
- Token revocation propagation
- When you revoke a token (user logs out or password changes), that revocation must reach all regions.
- Lesson 952 — Cross-Region Authentication
- Token Scoping
- Not all operations need system-wide uniqueness.
- Lesson 1042 — Idempotency vs Performance Tradeoffs
- Token validation across regions
- If you use JWT tokens, every region needs access to the same signing keys.
- Lesson 952 — Cross-Region Authentication
- Tokenization
- transforms unstructured text into discrete, normalized units (tokens) that can be indexed and matched against user queries.
- Lesson 1734 — Tokenization and Text AnalysisLesson 1738 — Query Processing FlowLesson 1744 — Building an Inverted Index: TokenizationLesson 1778 — Multi-Language Search Support
- Tokens
- embed authentication state *in* the credential itself.
- Lesson 916 — Session vs Token TradeoffsLesson 920 — OAuth2 Fundamentals and Use CasesLesson 1733 — Document Processing Pipeline
- Tolerate small skew
- Add safety margins to deadlines (e.
- Lesson 1114 — Clock Skew and Time Synchronization
- Tombstones
- are deletion markers that accumulate when you frequently delete data.
- Lesson 432 — Data Modeling Best Practices
- Too large
- Risk exceeding global limits significantly
- Lesson 986 — Local Rate Limiting with Overage BuffersLesson 1652 — Fanout Worker Parallelization
- Too many labels
- Millions of unique series crash your metrics system.
- Lesson 1214 — Tagging Strategy for Filtering
- Too small
- Still see false rejections during sync delays
- Lesson 986 — Local Rate Limiting with Overage BuffersLesson 1652 — Fanout Worker Parallelization
- Tooling compatibility
- Some API frameworks and generators expect specific conventions
- Lesson 1877 — Singular vs Plural Resource Names
- Top-K pre-computation
- Instead of scoring suggestions at query time, pre-compute and store the top-K results (e.
- Lesson 1776 — Typeahead Index Optimization
- topic
- is like the newspaper title (e.
- Lesson 656 — Pub-Sub Pattern FundamentalsLesson 658 — Topic Subscriptions and FilteringLesson 663 — Hybrid Patterns: Topic + QueueLesson 666 — RabbitMQ Architecture FundamentalsLesson 700 — Kafka Overview and Core ComponentsLesson 701 — Topics and Partitions
- Topic subscriptions with filtering
- let each subscriber define rules about which messages they actually want to receive.
- Lesson 658 — Topic Subscriptions and Filtering
- Topic-based routing
- Messages are categorized by topic, not destination
- Lesson 656 — Pub-Sub Pattern Fundamentals
- topics
- .
- Lesson 656 — Pub-Sub Pattern FundamentalsLesson 674 — Google Cloud Pub/Sub ArchitectureLesson 675 — Azure Service Bus Features
- Total daily queries
- = DAU × actions per user per day
- Lesson 23 — QPS and Daily Active Users Estimation
- Total database queries executed
- Lesson 1174 — Counter Metrics
- Total order
- All operations appear to happen in a single, global sequence
- Lesson 523 — Linearizability DefinedLesson 633 — ZooKeeper: Coordination Service Built on Consensus
- Total ordering
- All operations appear to happen in a strict global order
- Lesson 484 — Consistency in CAP Context
- Total per URL
- ~607 bytes, round to **1 KB** for indexes and overhead
- Lesson 1498 — Storage Capacity Estimation
- Total requests
- attempted in the current window
- Lesson 1029 — Retry Budgets and Rate LimitingLesson 1329 — Partial Availability and Graceful Degradation
- Total URLs
- 100M/day × 365 days × 10 years = **365 billion URLs**
- Lesson 1498 — Storage Capacity Estimation
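
The estimate can be checked with a few lines of arithmetic. The 1 KB per URL figure comes from the "Total per URL" entry; treating 1 KB as 1,000 bytes and reporting decimal terabytes are assumptions made here to keep the numbers round.

```python
urls_per_day = 100_000_000
days_per_year = 365
years = 10

total_urls = urls_per_day * days_per_year * years

bytes_per_url = 1_000  # assumed: 1 KB per URL, decimal units
total_bytes = total_urls * bytes_per_url
total_tb = total_bytes / 1e12

print(f"{total_urls:,} URLs, about {total_tb:.0f} TB")
```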
- Total write time
- 10ms × 10M = ~28 hours of sequential work
- Lesson 1640 — Celebrity Problem in Push Models
- Total: ~$2.67M/month
- Lesson 30 — CDN Bandwidth and Cost Estimation
- trace
- is the entire race from start to finish.
- Lesson 855 — Observability: Distributed TracingLesson 1219 — What is Distributed Tracing?Lesson 1221 — Traces, Spans, and Parent-Child Relationships
- Trace context
- is the set of identifiers that gets passed along with each request, enabling this linkage.
- Lesson 1230 — Trace Context FundamentalsLesson 1237 — Baggage and Cross-Cutting ConcernsLesson 1238 — Span Sampling Decisions
- trace ID
- and attaches it to the request.
- Lesson 855 — Observability: Distributed TracingLesson 1146 — Correlation IDs Across ServicesLesson 1168 — Correlation with Metrics and TracesLesson 1222 — Trace ID and Span IDLesson 1229 — Service Dependency GraphsLesson 1230 — Trace Context Fundamentals
- trace IDs
- (identifies the whole request) and **span IDs** (identifies individual operations).
- Lesson 1219 — What is Distributed Tracing?Lesson 1249 — Integrating Traces with Logs and Metrics
- Trace sampling
- means intentionally recording only a subset of traces—say 1% or 0.
- Lesson 1228 — Trace Sampling Fundamentals
- Traces
- Distributed tracing spans showing how a single request flows across multiple services
- Lesson 845 — Control Plane: Telemetry CollectionLesson 1173 — Metrics vs Logs vs TracesLesson 1268 — Monitoring Data Sources: Metrics, Logs, Traces
- Traces → Logs
- When instrumenting your application, inject the current trace ID and span ID into your logging context.
- Lesson 1249 — Integrating Traces with Logs and Metrics
- Traces → Metrics
- Add trace ID or service-level identifiers as metric labels (carefully, to avoid cardinality explosion).
- Lesson 1249 — Integrating Traces with Logs and Metrics
- Track access patterns
- Monitor where requests originate and which files are frequently accessed
- Lesson 1631 — Multi-Region Replication Strategy
- Track client writes
- The client remembers the timestamp/version of its last write and only reads from replicas caught up to at least that point.
- Lesson 542 — Read-Your-Writes Consistency
- Track metadata
- Keep records of where each tablet lives and which tablet servers are active
- Lesson 447 — Master Server and Metadata Management
- Track recent latencies
- in a sliding time window (e.
- Lesson 1117 — Adaptive Timeouts Based on Historical Latency
- Track remaining time
- Calculate deadline minus current time
- Lesson 1119 — Timeout Budget Management Across Service Chains
- Track write timestamps
- The client remembers when they last wrote.
- Lesson 1390 — Read-Your-Writes Consistency
- Tracked
- Added to your project management system with deadlines
- Lesson 1352 — Postmortem Structure and Action Items
- Tracks elapsed time
- as the request travels hop-to-hop
- Lesson 1101 — Timeout Propagation in Service Meshes
- Trade-off
- You exchange write complexity and slight staleness for dramatic read speed improvements.
- Lesson 284 — Aggregation Query OptimizationLesson 375 — Sloppy Quorum and Hinted HandoffLesson 589 — Saga Fundamentals: Local Transactions and CompensationsLesson 1250 — Trace Collector and Agent PatternsLesson 1370 — Multi-Leader Topologies: Star and CircularLesson 1388 — Conflict Avoidance Through DesignLesson 1541 — Sharding and Database ScalingLesson 1760 — Top-K Suggestions Problem (+2 more)
- Trade-off: precision vs efficiency
- You lose exact values but gain efficient storage and queryability across millions of requests.
- Lesson 1185 — Histogram Metrics
- Trade-offs
- More complexity, potential for stale data across tiers, and cache coherence challenges.
- Lesson 143 — Multi-Tier Caching Pattern
- Trade-offs Over Best Practices
- from earlier—there's no universal "best")
- Lesson 42 — Document Your Decisions
- Tradeoff
- Hints are stored for a limited time (typically a few hours).
- Lesson 431 — Hinted Handoff and Read RepairLesson 769 — Spark Streaming and Structured StreamingLesson 1245 — Trace Storage BackendsLesson 1439 — Data Replication for DRLesson 1709 — At-Most-Once, At-Least-Once, Exactly-Once Semantics
- Tradeoffs
- Lesson 368 — Conflict Resolution StrategiesLesson 1149 — Log Shipping vs Agent-Based CollectionLesson 1197 — Pull vs Push Metrics Collection ModelsLesson 1327 — Active-Active vs Active-Passive AvailabilityLesson 1594 — Storage Sharding by User or Time
- Traditional Databases
- Lesson 1040 — Idempotency Token Storage Strategies
- Traditional hashing
- (modulo-based): nearly all keys move when a node is added, because `hash(key) mod N` changes for most keys; consistent hashing moves only roughly `K/N` keys, where `K` is total keys
- Lesson 1460 — Adding Nodes with Minimal Disruption
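
To see how disruptive modulo-based placement is when a node is added, the sketch below counts how many keys land on a different node after growing from 4 to 5 nodes. Sequential integers stand in for hash values, which is a simplifying assumption; `moved_fraction` is a hypothetical helper.

```python
def moved_fraction(num_keys: int, old_nodes: int, new_nodes: int) -> float:
    """Fraction of keys whose node changes under `key mod N` placement."""
    moved = sum(
        1 for key in range(num_keys)
        if key % old_nodes != key % new_nodes
    )
    return moved / num_keys


# Going from 4 to 5 nodes relocates the large majority of keys.
print(moved_fraction(100_000, old_nodes=4, new_nodes=5))
```

With consistent hashing, the expected share of relocated keys when adding one node to N is about 1/(N+1), which is why it is preferred for clusters that change size.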
- Traditional RDBMS
- (PostgreSQL, MySQL with strong settings):
- Lesson 518 — PC/EC Systems: Consistency Always
- Traditional relational databases
- with synchronous replication (like PostgreSQL with synchronous standby)
- Lesson 493 — CP Systems: Prioritizing Consistency
- Traffic
- Request volume and rate over time
- Lesson 856 — Observability: Metrics CollectionLesson 1263 — Four Golden Signals: Latency, Traffic, Errors, SaturationLesson 1332 — Active-Active vs Active-Passive Redundancy
- Traffic (QPS)
- Lesson 33 — Putting It All Together: Worked Example
- Traffic Filtering
- Edge servers analyze incoming requests and block suspicious patterns (unusual request rates, malformed packets, known bad actors) before forwarding legitimate traffic to your origin.
- Lesson 195 — CDN for DDoS Protection
- Traffic management
- to gradually shift load to new versions (blue-green, canary deployments)
- Lesson 810 — Deployment ComplexityLesson 827 — What is a Service Mesh?
- Traffic Manager
- DNS-based global routing (similar to Route 53)
- Lesson 114 — Cloud Load Balancers (GCP and Azure)
- Traffic policies
- (retries, circuit breaking) apply uniformly
- Lesson 860 — Multi-Cluster and Multi-Tenancy
- Traffic rerouting
- Redirect requests away from failing components to healthy ones
- Lesson 1303 — Incident Mitigation vs Fix
- Traffic routing
- based on rules from the control plane
- Lesson 838 — Data Plane: Sidecar Proxy PatternLesson 840 — Data Plane: Envoy Proxy FundamentalsLesson 1334 — Geographic Redundancy and Multi-Region
- Traffic Shifts
- New user requests resolve to the backup location
- Lesson 1440 — DNS and Traffic Management in DR
- Traffic volume
- (bulk discounts at higher tiers)
- Lesson 30 — CDN Bandwidth and Cost EstimationLesson 1255 — Adaptive Sampling
- Train the model
- Use algorithms like LambdaMART, RankNet, or gradient-boosted trees to learn which feature combinations predict clicks
- Lesson 1781 — Machine Learning for Ranking
- Training
- Teach developers what constitutes sensitive data
- Lesson 1163 — Avoid Logging Sensitive DataLesson 1317 — Blameless Culture and Learning from Failure
- Transaction challenges
- What if the move fails halfway through?
- Lesson 263 — Shard Key Immutability Problem
- Transaction commit
- Should we commit or abort this distributed transaction?
- Lesson 599 — What Is Distributed Consensus?
- Transaction complexity
- Keeping denormalized data consistent requires larger transactions or eventual consistency patterns
- Lesson 296 — Write Amplification Costs
- Transaction isolation levels
- coordinating concurrent operations
- Lesson 308 — Strong Consistency by Default
- Transactional coupling
- between message acknowledgment and side effects
- Lesson 680 — Exactly-Once Delivery
- Transactional guarantees
- Depending on your broker and database, you might leverage distributed transactions (expensive) or design around eventual consistency with compensating actions (as you learned with Sagas).
- Lesson 688 — Transactional Semantics
- Transactional Outbox Pattern
- solves this by treating notification events as data.
- Lesson 1716 — Transactional Outbox Pattern
- Transactional semantics
- means coordinating database operations and message operations so they succeed or fail together, maintaining consistency.
- Lesson 688 — Transactional Semantics
- Transactions with mixed operations
- Even a `SELECT` inside a transaction that will later `UPDATE` should stay on primary for consistency
- Lesson 223 — Detecting Read vs Write Queries
- Transform
- field values (normalize timestamps, convert data types)
- Lesson 1151 — The ELK Stack: Logstash
- Transformation Layer
- Lesson 1908 — Database Schema Evolution with API Versions
- Transformations
- Apply functions to each event (e.
- Lesson 722 — Kafka Streams APILesson 768 — Apache Spark Overview
- Transient errors
- are temporary hiccups that might succeed if you try again:
- Lesson 1026 — Retry on Which ErrorsLesson 1048 — Failure Thresholds and Detection
- transient failures
- favor forward recovery, **permanent failures** require backward recovery.
- Lesson 596 — Forward Recovery vs Backward RecoveryLesson 1020 — Why Retries Are Necessary in Distributed Systems
- Transparency
- Users know what data they're sharing
- Lesson 930 — OAuth2 Scopes and ConsentLesson 1328 — Scheduled Maintenance and Availability Accounting
- Transparent proxying
- automatically captures all outbound network traffic from your application without any code changes.
- Lesson 831 — Transparent vs Explicit Proxying
- transparently
- you configure policies once in the control plane, and every sidecar enforces them automatically.
- Lesson 852 — Circuit Breaking at the Mesh LevelLesson 855 — Observability: Distributed Tracing
- Transport Layer
- of the OSI networking model—where TCP and UDP protocols operate.
- Lesson 109 — Layer 4 (Transport) Load BalancingLesson 1148 — Centralized Logging Architecture
- Tree Replication Topology
- , data replication follows a hierarchical structure similar to an organizational chart or family tree.
- Lesson 1374 — Tree Replication Topology
- Triage Steps
- Lesson 1299 — Runbooks and Playbooks
- trie
- (pronounced "try," from re**trie**val) is a tree-like data structure where each node represents a single character of a string.
- Lesson 1758 — Trie Data Structure for Prefix MatchingLesson 1767 — Personalized Typeahead
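
A minimal sketch of a trie with prefix lookup follows. It stores one character per node as described above; the class and method names are illustrative, and returning results in sorted order is a choice made here for predictable output, not a property of tries.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def starts_with(self, prefix: str) -> list:
        # Walk down to the node that represents the prefix.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every word in the subtree below that node.
        results, stack = [], [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)


trie = Trie()
for word in ["car", "cart", "cat", "dog"]:
    trie.insert(word)
print(trie.starts_with("ca"))
```

Reaching the prefix node costs time proportional to the prefix length, regardless of how many words are stored, which is what makes the structure attractive for typeahead.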
- Trigger actions
- based on status (retry on failure, update analytics, alert on high bounce rates)
- Lesson 1693 — Delivery Receipt Tracking
- Triggers
- or application logic that update summaries when new data arrives
- Lesson 294 — Aggregation Tables
- Troubleshooting tools
- to trace request paths and identify failures
- Lesson 846 — Control Plane: API and User Interface
- TrueTime API
- backed by atomic clocks and GPS receivers in every Google data center.
- Lesson 333 — Google Spanner Architecture
- Truly distributed
- No single point of contention for ID generation
- Lesson 1520 — Primary Key Selection: Auto-Increment vs UUID
- TSDB
- Stores time-series data efficiently on disk in compressed chunks
- Lesson 1198 — Prometheus Architecture and Data Model
- TTL
- matching your URL expiration policy
- Lesson 1539 — QR Code GenerationLesson 1664 — Timeline Caching Strategies
- TTL (Time To Live)
- controls how long DNS resolvers cache your records.
- Lesson 1440 — DNS and Traffic Management in DRLesson 1856 — DNS Resolution Fundamentals for Crawlers
- TTL (Time-To-Live)
- , sets an automatic expiration timer on cached data.
- Lesson 156 — Time-Based Expiration (TTL)Lesson 888 — Caching at the GatewayLesson 1525 — Cache Eviction Policy for URL Shortener
- TTL balancing
- Shorter TTLs (5-60 seconds) reduce staleness risk but increase PDP load.
- Lesson 951 — Caching Authorization Decisions
- TTL Management
- Lesson 185 — Purging and Cache Invalidation Strategies
- TTL Support
- Automatic cleanup of expired time windows
- Lesson 980 — Redis-Based Distributed Rate Limiting
- tunable consistency
- (ONE, QUORUM, ALL), and configurable replication strategies.
- Lesson 370 — Distributed Key-Value Store Architectures in PracticeLesson 563 — Tunable Consistency in Practice
- tune
- your consistency guarantees using three parameters:
- Lesson 365 — Tunable Consistency with Quorum Reads and WritesLesson 507 — Consistency is a Spectrum in Practice
- Two-phase commit (2PC)
- distributes the transaction across multiple nodes.
- Lesson 577 — 2PC vs Single-Node TransactionsLesson 1489 — Cross-Partition Transactions
- Type checking
- Each column accepts only its declared type (integers, strings, dates, etc.
- Lesson 301 — Schema Enforcement and Type SafetyLesson 773 — Prefect and Dagster for Modern Workflows
- Typical range
- 5-20% based on your sync frequency and acceptable variance
- Lesson 986 — Local Rate Limiting with Overage Buffers
U
- Ubiquitous Language
- Each bounded context uses consistent terminology understood by both developers and domain experts
- Lesson 815 — Domain-Driven Design and Bounded Contexts
- unavailable
- Lesson 508 — Availability is Also a SpectrumLesson 1318 — Defining Availability and Uptime
- Unbounded convergence
- means the system *will* converge, but provides no upper limit on how long it takes.
- Lesson 533 — Convergence Guarantees
- Unbounded Data
- Streams have no defined "end"—they keep flowing.
- Lesson 737 — What is Stream Processing?
- Unclear boundaries
- You don't yet know where service boundaries *should* be—premature splitting leads to the distributed monolith anti-pattern
- Lesson 820 — When a Monolith is the Right Choice
- Underutilization
- Idle resources in one bulkhead can't help an overwhelmed bulkhead next door
- Lesson 1076 — Bulkhead Tradeoffs: Complexity and Resource Overhead
- uneven data distribution
- .
- Lesson 241 — Range-Based ShardingLesson 1471 — Range Partitioning Fundamentals
- Uneven distribution
- If traffic is imbalanced, some nodes may reject requests while others sit idle
- Lesson 979 — Centralized vs Decentralized Approaches; Lesson 1451 — Range-Based Partitioning
- Uneven load distribution
- Popular users all route to the same node, creating hotspots
- Lesson 982 — Sticky Sessions and Rate Limiting
- Uneven Resource Utilization
- Lesson 77 — Why Load Balancers Are Necessary
- Unified APIs
- Write your processing logic once, then run it on historical data (batch mode) or real-time streams (streaming mode)
- Lesson 756 — Hybrid and Modern Alternatives
- Unified Runtime
- The application runs as one process (or a few identical copies for scaling).
- Lesson 779 — What is a Monolithic Architecture?
- Unified technology stack
- Team members can move freely across the codebase without context-switching
- Lesson 820 — When a Monolith is the Right Choice
- Uniform behavior
- All services retry the same way, report metrics identically
- Lesson 833 — Polyglot Microservices Support
- Uniform Technology Stack
- Lesson 780 — Characteristics of Monolithic Systems
- Unique
- across the entire system
- Lesson 1494 — Functional Requirements for a URL Shortener; Lesson 1521 — Indexing Strategy for Fast Lookups
- Unique constraints
- Will this create a duplicate where none is allowed?
- Lesson 305 — Consistency Guarantees
- Unique Feature
- GCP's load balancers use Google's global network backbone, routing traffic along optimal paths *before* reaching your servers.
- Lesson 114 — Cloud Load Balancers (GCP and Azure)
- Units
- milliseconds, bytes, requests/second, percentage
- Lesson 1216 — Metric Documentation and Discovery
- Universal Parser Support
- Every programming language has mature, fast JSON libraries.
- Lesson 1138 — JSON as Log Format
- Unlimited scalability
- No capacity planning needed; add petabytes without provisioning volumes
- Lesson 1588 — Object Storage vs Block Storage
- Unlisted
- pastes are accessible to anyone *with the URL*, but aren't indexed or listed in user profiles.
- Lesson 1576 — Access Control and Privacy Settings
- Unpredictability
- Random identifiers prevent enumeration attacks
- Lesson 1516 — Counter-Based vs UUID Approaches
- Unpredictable
- (avoid sequential IDs)
- Lesson 1036 — Request Token Generation and Management; Lesson 1462 — The Uneven Distribution Problem
- Unpredictable dependencies
- Third-party APIs with unstable behavior
- Lesson 1076 — Bulkhead Tradeoffs: Complexity and Resource Overhead
- Unpredictable variance
- Adding or removing a node shifts load unpredictably
- Lesson 1462 — The Uneven Distribution Problem
- Unreliable
- (occasionally returns stale data due to replication lag or returns corrupt results from partial disk failures)
- Lesson 1322 — Availability vs Reliability: Key Differences
- Unsorted queues
- Finding the minimum is O(n) — unacceptable per fetch.
- Lesson 1847 — Heap-Based Priority Queue Implementation
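A binary heap avoids the O(n) scan by keeping the minimum at the root, so each push and pop costs O(log n). The following is a minimal sketch using Python's standard `heapq` module; the `PriorityQueue` class and its method names are illustrative, not taken from the lesson.

```python
import heapq


class PriorityQueue:
    """Min-priority queue: the lowest priority value is fetched first."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker: equal priorities pop in insertion order

    def push(self, priority, item):
        heapq.heappush(self._heap, (priority, self._counter, item))
        self._counter += 1

    def pop(self):
        _priority, _order, item = heapq.heappop(self._heap)
        return item

    def __len__(self):
        return len(self._heap)
```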
- Update
- or **delete** that exact document
- Lesson 382 — Document IDs and Primary Keys; Lesson 387 — CRUD Operations on Documents; Lesson 1572 — Storage Tier Migration; Lesson 1664 — Timeline Caching Strategies; Lesson 1722 — Real-Time Preference Updates
- Update Layer
- Lesson 1477 — Directory Service Architecture
- Update metadata
- Inform the routing layer about the new boundary
- Lesson 1475 — Dynamic Range Splitting; Lesson 1557 — Hot vs Cold Storage Tiering
- Updates real-time views
- in fast-access stores (Redis, Cassandra, in-memory databases)
- Lesson 749 — Lambda Architecture: Speed Layer
- Upload
- Accept images (JPEG, PNG) and videos (MP4, MOV) from users
- Lesson 1584 — Image/Video Hosting: Problem Definition and Scale
- Upload bandwidth
- User posts a 2 MB photo → your servers receive it
- Lesson 26 — Bandwidth Estimation from Data Size
- Uptime
- is the actual duration your system was available during a measurement period.
- Lesson 1318 — Defining Availability and Uptime
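Availability follows directly from uptime: it is uptime divided by the total measurement period, and an availability target implies a downtime budget. The helper names below are hypothetical, and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def availability(uptime_seconds: float, total_seconds: float) -> float:
    """Fraction of the measurement period during which the system was available."""
    return uptime_seconds / total_seconds


def allowed_downtime_hours_per_year(target: float) -> float:
    """Downtime budget implied by an availability target such as 0.999."""
    return (1.0 - target) * HOURS_PER_YEAR
```

Under these assumptions, three nines (0.999) allows roughly 8.76 hours of downtime per year, and four nines (0.9999) allows roughly 52.6 minutes.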
- Urgency Level
- Lesson 1688 — Channel Selection Strategy
- URL Frontier
- (queue): Prioritized list of URLs to visit next
- Lesson 1732 — Crawling and Document Collection; Lesson 1838 — URL Frontier: Definition and Purpose; Lesson 1840 — Politeness Requirements for Web Crawling
- URL paths
- `/api/users` goes to the user service, `/api/orders` goes to the order service
- Lesson 110 — Layer 7 (Application) Load Balancing
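Path-based routing at Layer 7 can be sketched as a longest-prefix match over a routing table. The table below reuses the two paths from the entry above; the service names and the `route` function are hypothetical.

```python
ROUTES = {
    "/api/users": "user-service",    # hypothetical backend names
    "/api/orders": "order-service",
}


def route(path, routes=ROUTES):
    """Return the backend whose prefix matches the path, preferring the longest prefix."""
    matches = [
        prefix for prefix in routes
        if path == prefix or path.startswith(prefix + "/")
    ]
    if not matches:
        return None  # no rule matched; a real balancer would use a default pool or return 404
    return routes[max(matches, key=len)]
```

Matching on `prefix + "/"` rather than the bare prefix keeps `/api/usersX` from being routed to the user service by accident.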
- URL versioning
- `/api/v1/orders` vs `/api/v2/orders` — explicit and visible, but creates route proliferation.
- Lesson 809 — Versioning and Backward Compatibility
- Usability through refresh tokens
- Lesson 926 — Access Tokens vs Refresh Tokens
- Usage
- Application executes queries while connection stays in active state
- Lesson 270 — Connection Lifecycle in a Pool
- Usage Metrics
- Lesson 1825 — Monitoring and Analytics Per Tenant
- Use a Queue when
- Lesson 664 — Choosing Between Queue and Pub-Sub
- Use an idempotency key
- to guarantee the core operation (like order creation) happens exactly once
- Lesson 1038 — Side Effect Management
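One way to picture the idempotency key: the server stores the result of the first request under the key and returns that stored result for any retry. This is a minimal in-memory sketch with hypothetical names; a production version would need a durable store and an atomic check-and-set.

```python
class OrderService:
    """Creates each order at most once per idempotency key."""

    def __init__(self):
        self._results = {}  # idempotency key -> result of the first attempt
        self._next_id = 1

    def create_order(self, idempotency_key, payload):
        if idempotency_key in self._results:
            # Retry of a request we already handled: return the original result.
            return self._results[idempotency_key]
        order = {"order_id": self._next_id, **payload}
        self._next_id += 1
        self._results[idempotency_key] = order
        return order
```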
- Use async logging frameworks
- that write to memory buffers and flush to disk on background threads.
- Lesson 1170 — Performance Impact of Logging
- Use asynchronous logging when
- Lesson 1134 — Synchronous vs Asynchronous Logging
- Use base units
- Prefer seconds over milliseconds, bytes over kilobytes.
- Lesson 1182 — Metric Naming Conventions
- Use case
- Last 7-30 days of backups for quick restores
- Lesson 1405 — Backup Storage Tiers; Lesson 1439 — Data Replication for DR
- Use cases
- Lesson 89 — IP Hash Algorithm; Lesson 213 — Acceptable Lag Windows by Use Case; Lesson 451 — What is a Graph Database?; Lesson 989 — Per-User vs Per-IP Rate Limiting; Lesson 1417 — Hot Standby vs Cold Standby
- Use consistent structure
- Most teams follow a hierarchical pattern like `<namespace>_<subsystem>_<metric>_<unit>`.
- Lesson 1182 — Metric Naming Conventions
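The naming pattern and the base-unit advice can be combined in a small helper. This is a sketch under assumptions: the validation regex (lowercase, underscore-separated) and both function names are illustrative, not a rule quoted from the lesson.

```python
import re

# Assumed convention: lowercase alphanumeric segments joined by single underscores.
_VALID_NAME = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def metric_name(namespace, subsystem, metric, unit):
    """Build a name following <namespace>_<subsystem>_<metric>_<unit>."""
    name = "_".join([namespace, subsystem, metric, unit])
    if not _VALID_NAME.match(name):
        raise ValueError(f"invalid metric name: {name!r}")
    return name


def ms_to_seconds(milliseconds):
    """Convert to the base unit (seconds) before recording a duration."""
    return milliseconds / 1000.0
```

For example, a request duration measured as 250 ms would be recorded as 0.25 under a name such as `http_server_request_duration_seconds`.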
- Use profiling tools
- to see which parts of your code consume the most resources
- Lesson 40 — Measure Before Optimizing
- Use Pub-Sub when
- Lesson 664 — Choosing Between Queue and Pub-Sub
- Use query timeouts
- Cancel queries exceeding a threshold (e.
- Lesson 1897 — Performance Considerations and Limits
- Use relabeling and dropping
- Lesson 1207 — Metrics Cardinality and Performance Impact
- Use relative timeouts
- Propagate "seconds remaining" instead of absolute timestamps when possible
- Lesson 1114 — Clock Skew and Time Synchronization
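Propagating "seconds remaining" can be sketched with a deadline object that measures time on the local monotonic clock and hands the next hop only the remaining budget. The `Deadline` class and `header_value` method are hypothetical names; the injectable `clock` parameter exists only to make the example testable.

```python
import time


class Deadline:
    """Tracks a time budget locally and exposes what is left for downstream calls."""

    def __init__(self, budget_seconds, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + budget_seconds

    def remaining(self):
        """Seconds left in the budget, never negative."""
        return max(0.0, self._expires_at - self._clock())

    def expired(self):
        return self.remaining() <= 0.0

    def header_value(self):
        """Remaining budget formatted for a (hypothetical) timeout header."""
        return f"{self.remaining():.3f}"
```

Because each service only ever compares readings from its own clock, disagreement between machines' wall clocks does not distort the budget; the time the request spends on the network is the part this scheme cannot see.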
- Use span events
- for contextual data instead of creating new tag dimensions
- Lesson 1258 — Cardinality Explosion
- Use stale data
- temporarily (yesterday's inventory counts)
- Lesson 1083 — Graceful Degradation Strategies
- Use synchronous logging when
- Lesson 1134 — Synchronous vs Asynchronous Logging
- Use testing modes
- Many alerting systems support "test alerts" that don't page anyone
- Lesson 1295 — Testing Alerts and Dry Runs
- Use version/logical clocks
- Pass a version token with writes; only read from replicas that have processed at least that version.
- Lesson 1390 — Read-Your-Writes Consistency
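The version-token idea can be sketched as follows: each write returns a version, the client sends it back with later reads, and only replicas that have applied at least that version may answer. The `Replica` class and `read_your_writes` function are illustrative stand-ins, not a real database API.

```python
class Replica:
    """Toy replica that records the highest version it has applied."""

    def __init__(self):
        self.applied_version = 0
        self.data = {}

    def apply(self, version, key, value):
        self.data[key] = value
        self.applied_version = max(self.applied_version, version)


def read_your_writes(replicas, key, min_version):
    """Read from the first replica that has caught up to the client's version token."""
    for replica in replicas:
        if replica.applied_version >= min_version:
            return replica.data.get(key)
    raise LookupError("no replica has caught up; fall back to the primary")
```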
- Use when
- You have a single data center and simple infrastructure.
- Lesson 424 — Replication Strategy and Factor; Lesson 1425 — Hot vs Cold vs Warm Backups
- User Actions
- Lesson 10 — Identifying Functional Requirements
- User activity timestamps
- Lesson 1653 — Selective Fanout Optimization
- User affinity
- (how often you interact with a specific user)
- Lesson 1644 — Feed Personalization and Ranking Requirements
- User attributes
- (department, region, subscription tier)
- Lesson 884 — Authorization and Policy Enforcement; Lesson 935 — Attribute-Based Access Control (ABAC) Introduction
- User authentication
- Mixed strategy (PA/EC): stay available during partitions but maintain consistency during normal ops for security
- Lesson 520 — Practical PACELC Analysis for Design Decisions
- User Authorization
- Your app redirects the user to the authorization server (e.
- Lesson 922 — Authorization Code Flow
- User context
- `user.
- Lesson 1225 — Span Attributes and Tags; Lesson 1237 — Baggage and Cross-Cutting Concerns; Lesson 1530 — Analytics and Click Tracking; Lesson 1688 — Channel Selection Strategy
- User engagement
- Click-through rates, dwell time (how long users stay), and bounce rates reveal what actually satisfies users.
- Lesson 1755 — Relevance Tuning: Boosting and Signals
- User Expectations
- Start by understanding what actually impacts your users.
- Lesson 1276 — Setting Realistic SLOs; Lesson 1565 — Expiration Requirements and TTL Basics
- User experience suffers
- requests time out or return errors
- Lesson 105 — Graceful Degradation and Circuit Breaking
- User features
- your past likes, follows, typical engagement time
- Lesson 1668 — Machine Learning for Feed Ranking
- User feedback
- Direct customer complaints or support tickets reveal what *actually* frustrates users.
- Lesson 1284 — Iterating on SLIs and SLOs
- User Feeds Use NoSQL
- Lesson 330 — Real-World Decision Examples
- User History
- If you've searched for "coffee beans" ten times this month, that phrase gets a ranking boost when you type "coff", even if globally it's less popular than "coffee shop."
- Lesson 1767 — Personalized Typeahead
- User ID
- (social networks, gaming platforms)
- Lesson 244 — Entity-Based Sharding; Lesson 1161 — Context-Rich Logging; Lesson 1783 — Functional Requirements for Rate Limiter
- User IDs
- millions of users = millions of time series
- Lesson 1178 — Metric Cardinality and Labels; Lesson 1211 — Avoiding High-Cardinality Labels
- User installs your app
- → App requests notification permission
- Lesson 1684 — Push Notifications: Mobile and Web
- User intent
- Merge non-overlapping changes; prioritize certain fields
- Lesson 1383 — Application-Level Conflict Resolution
- User logs in
- with credentials (username/password)
- Lesson 909 — Session-Based Authentication Fundamentals
- User Preferences
- Lesson 1688 — Channel Selection Strategy; Lesson 1694 — Channel Costs and Economics; Lesson 1699 — Notification Processing Workers; Lesson 1703 — Channel Routing Logic
- User Profile
- team deploys avatar updates Thursday afternoon—completely independently.
- Lesson 791 — Independent Deployability
- User profile API
- Don't cache (or cache per-user with vary headers)
- Lesson 194 — CDN for API Acceleration
- User profile reads
- (AP): Show slightly stale data from the local region rather than wait 200ms for a cross-ocean consistency check
- Lesson 510 — Real Systems: Multi-Region Trade-offs
- User profile updates
- Strong consistency (W=ALL, R=ONE) ensures no stale reads after writes
- Lesson 563 — Tunable Consistency in Practice
- User profiles
- (retrieve by user ID)
- Lesson 338 — What is a Key-Value Store?; Lesson 479 — Hybrid Architectures: Combining Graph and Relational; Lesson 712 — Log Compaction
- User-agent rules
- Which paths are `Disallow`ed or `Allow`ed for your crawler
- Lesson 1861 — Robots.txt Caching and Parsing
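Python's standard library ships a robots.txt parser, which is enough to illustrate how `Allow` and `Disallow` rules are evaluated for a given user agent. The robots.txt content, the crawler name `my-crawler`, and the `allowed` helper are made up for this sketch; a real crawler would fetch the file from the target host and cache the parsed result.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="my-crawler"):
    """True when the parsed rules permit this agent to fetch the URL."""
    return parser.can_fetch(agent, url)
```

Note that this parser applies the first matching rule in file order, so the `Disallow` line is listed before the broader `Allow` line.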
- User-class segmentation
- Premium tiers may need higher thresholds than free users
- Lesson 997 — Testing and Monitoring Rate Limiters
- User-level limits
- prevent individual users from monopolizing resources (e.
- Lesson 973 — Multi-Tier Rate Limiting
- User-specific data
- (session info, personalized content) → Application-level cache or distributed cache
- Lesson 130 — Choosing the Right Caching Layer
- User-specified language
- is simpler: let users tag their paste explicitly.
- Lesson 1575 — Syntax Highlighting and Language Detection
- User-to-Role Assignment
- Mapping which users have which roles
- Lesson 933 — Role-Based Access Control (RBAC) Fundamentals
- UserInfo endpoint
- an API that returns extended profile information when presented with a valid access token.
- Lesson 929 — ID Tokens and the UserInfo Endpoint
- Users don't notice
- The difference between 99.
- Lesson 1310 — Embracing Risk: The 100% Availability Trap
- Users expect responsiveness
- a loading spinner that never completes is worse than slightly stale data
- Lesson 532 — Why Eventual Consistency Exists
- Utilization
- The percentage of time the resource is busy (e.
- Lesson 1189 — The USE Method; Lesson 1264 — USE Method: Utilization, Saturation, Errors
- UUIDs
- offer global uniqueness and easy distribution but create much longer identifiers.
- Lesson 1516 — Counter-Based vs UUID Approaches; Lesson 1520 — Primary Key Selection: Auto-Increment vs UUID
V
- Vacuum operations
- in PostgreSQL or compaction in NoSQL to reclaim disk space
- Lesson 1532 — Expiration and Time-to-Live
- Valid
- The gateway may add user context (user ID, roles) to the request headers before forwarding the request to the appropriate service
- Lesson 883 — Authentication at the Gateway
- Validate assumptions
- about how your system behaves during failures
- Lesson 1343 — What is Chaos Engineering?; Lesson 1345 — Starting with Game Days
- Validate format
- Ensure it's alphanumeric, appropriate length, no special characters (or define allowed patterns)
- Lesson 1514 — Custom Short URL Support
- Validate input
- Enforce length limits, character restrictions (alphanumeric only?
- Lesson 1531 — Custom Aliases and Vanity URLs
- Validate parameter combinations
- Reject requests where sort fields aren't in allowed filter contexts, preventing expensive full-table scans.
- Lesson 1896 — Combining Pagination, Filtering, and Sorting
- Validate redirect URIs strictly
- Don't allow wildcards; exact-match registered URIs only
- Lesson 931 — OAuth2 Security Best Practices
- Validate sort fields
- Only allow sorting on indexed columns; reject arbitrary field names
- Lesson 1897 — Performance Considerations and Limits
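Sort-field validation is usually an allowlist check performed before any query is built. In this sketch the allowed column names and the leading `-` convention for descending order are assumptions chosen for illustration.

```python
# Hypothetical set of indexed columns that clients may sort on.
ALLOWED_SORT_FIELDS = {"created_at", "price", "name"}


def parse_sort(param):
    """Parse a sort parameter such as 'price' or '-price' into (field, direction)."""
    descending = param.startswith("-")
    field = param[1:] if descending else param
    if field not in ALLOWED_SORT_FIELDS:
        raise ValueError(f"unsupported sort field: {field!r}")
    return field, "DESC" if descending else "ASC"
```

Because the field name is checked against a fixed set, an arbitrary client-supplied string never reaches the query layer.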
- Validation State
- Lesson 270 — Connection Lifecycle in a Pool
- validity
- (the agreed value was actually proposed by someone), and **termination** (the decision completes eventually)—all while nodes crash and networks partition.
- Lesson 599 — What Is Distributed Consensus?; Lesson 608 — The Problem Paxos Solves
- Value frequency
- describes how often each key value appears.
- Lesson 1491 — Data Skew and Cardinality Issues
- Values
- can be anything: strings, JSON objects, binary data, or serialized structures
- Lesson 338 — What is a Key-Value Store?
- Variable latency patterns
- When percentile metrics (p99, p95) spike but you can't pinpoint the cause, traces reveal the specific code paths or service interactions responsible.
- Lesson 1260 — Cost-Benefit Analysis
- Variable request duration
- (some queries take 10ms, others 10 seconds)
- Lesson 87 — Least Connections Algorithm
- Variable-length paths
- let you specify a range of relationship traversals in a single pattern using special syntax like `[*1.
- Lesson 465 — Variable-Length Paths
- vector clock
- is a data structure that tracks the version history per replica.
- Lesson 367 — Vector Clocks and Conflict Detection; Lesson 374 — Vector Clocks for Conflict Detection; Lesson 548 — Causal Consistency Implementation; Lesson 1396 — Implementing Consistency with Vector Clocks
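Conflict detection with vector clocks comes down to comparing two clocks entry by entry: if neither clock is less than or equal to the other in every position, the writes were concurrent. The sketch below represents a clock as a dict from replica name to counter; the function names are illustrative.

```python
def compare(a, b):
    """Classify clock a relative to b: 'equal', 'before', 'after', or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither dominates: a conflict that needs resolution


def merge(a, b):
    """Entry-wise maximum, used after a conflict has been reconciled."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}
```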
- vector clocks
- for conflict detection, and **Merkle trees** for efficient anti-entropy.
- Lesson 370 — Distributed Key-Value Store Architectures in Practice; Lesson 377 — Eventual Consistency and Application Reconciliation; Lesson 539 — Vector Clocks and Causality; Lesson 547 — Causal Consistency Fundamentals; Lesson 1389 — Conflict Resolution in Practice
- Velocity anomalies
- Single entity involved in unusually many transactions in short time
- Lesson 474 — Fraud Detection Through Pattern Matching
- Vendor Flexibility
- Swap providers with configuration changes, not code rewrites
- Lesson 1690 — Channel Provider Abstraction
- Vendor support
- Professional support contracts and guarantees
- Lesson 108 — Hardware vs Software Load Balancers
- Verification
- Server validates the token's signature and extracts user information directly from the token payload
- Lesson 912 — Token-Based Authentication Fundamentals; Lesson 1408 — Backup Verification and Testing; Lesson 1430 — Backup Verification and Testing; Lesson 1437 — Failover and Failback Procedures
- Verify data completeness
- by checking how much data was lost
- Lesson 1419 — Measuring and Testing RPO/RTO Compliance
- Version compatibility matrices
- documenting which versions work together
- Lesson 810 — Deployment Complexity
- Version control
- via Git
- Lesson 774 — dbt for Analytics Engineering; Lesson 1216 — Metric Documentation and Discovery
- Version control strategies
- (MVCC implementations differ)
- Lesson 582 — Transaction Isolation Across Systems
- Version mismatches
- between backup creation and restore environments
- Lesson 1430 — Backup Verification and Testing
- version vector
- (closely related to a vector clock, and often used interchangeably with it) is like a scoreboard where each replica tracks how many writes it has seen from every replica in the system.
- Lesson 562 — Version Vectors and Conflict Detection; Lesson 1382 — Version Vectors and Causality
- Version vectors
- Track what version the client has seen; reject reads from lagging replicas
- Lesson 535 — Monotonic Reads; Lesson 559 — Strong Consistency with Quorums; Lesson 1382 — Version Vectors and Causality
- Version Vectors/Timestamps
- Your application tracks which version of data it's working with.
- Lesson 219 — Application-Level Consistency Patterns
- Versioning
- is the practice of explicitly marking API changes, and **backward compatibility** means new versions still support old clients.
- Lesson 809 — Versioning and Backward Compatibility; Lesson 1015 — Conditional Writes for Idempotency
- Vertical Partitioning
- means splitting your table by **columns**.
- Lesson 231 — Vertical Partitioning vs Horizontal Partitioning
- vertical scaling
- (upgrading a single machine) and **horizontal scaling** (adding more machines), you're also choosing between two very different cost models.
- Lesson 45 — Comparing Cost Structures; Lesson 229 — What is Sharding?
- VictorOps
- centralize alerting, escalation, and communication—but the real power comes from *automation*.
- Lesson 1305 — On-Call Tooling and Automation
- Video transcoding
- produces multiple bitrates (covered previously)
- Lesson 1602 — Adaptive Bitrate Streaming (ABR)
- Videos
- (MP4, WebM, streaming segments)
- Lesson 173 — Content Types Suited for CDNs; Lesson 1608 — Post-Processing and Metadata Extraction
- View and Engagement Metrics
- track video plays, view duration, completion rates, and pause/skip patterns.
- Lesson 1628 — Usage Analytics and Metrics
- Virtual Network integration
- for private connectivity to backend services
- Lesson 899 — Azure API Management Features
- virtual nodes
- Lesson 372 — Consistent Hashing in Dynamo; Lesson 1463 — Virtual Nodes Solution; Lesson 1469 — Real-World Implementations; Lesson 1865 — Distributed URL Frontier Architecture
- Virtual nodes (vnodes)
- split each physical server into multiple "virtual" positions on the hash ring.
- Lesson 363 — Virtual Nodes and Load Distribution
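A hash ring with virtual nodes can be sketched by hashing each server under several labels and placing every label on the ring. The `HashRing` class, the `server#i` labelling scheme, the default of 100 vnodes, and the use of SHA-256 are all assumptions made for this example.

```python
import bisect
import hashlib


def _hash(key):
    """Stable hash of a string onto a large integer ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers, vnodes=100):
        # Each physical server appears at `vnodes` positions on the ring.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _server in self._ring]

    def lookup(self, key):
        """Walk clockwise from the key's hash to the next virtual node."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

More virtual nodes per server generally smooth out the load each server receives, at the cost of a larger ring to store and search.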
- visibility timeout
- (or "lock duration").
- Lesson 657 — Message Ownership in Queues; Lesson 669 — Amazon SQS Architecture; Lesson 687 — Dead Letter Queues; Lesson 1604 — Message Queue for Processing Jobs
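The visibility timeout can be modelled as a per-message "hidden until" timestamp: receiving a message hides it for the timeout, and it reappears unless the consumer deletes it first. This is a toy in-memory model with hypothetical names, not the API of any particular queue service; the `clock` parameter is injectable so the behaviour can be tested.

```python
import time


class VisibilityQueue:
    def __init__(self, visibility_timeout, clock=time.monotonic):
        self._timeout = visibility_timeout
        self._clock = clock
        self._bodies = {}        # message id -> body, in insertion order
        self._hidden_until = {}  # message id -> time it becomes visible again

    def send(self, msg_id, body):
        self._bodies[msg_id] = body
        self._hidden_until[msg_id] = float("-inf")  # visible immediately

    def receive(self):
        """Return the first visible message and hide it for the timeout."""
        now = self._clock()
        for msg_id, body in self._bodies.items():
            if self._hidden_until[msg_id] <= now:
                self._hidden_until[msg_id] = now + self._timeout
                return msg_id, body
        return None

    def delete(self, msg_id):
        """Acknowledge successful processing so the message never reappears."""
        self._bodies.pop(msg_id, None)
        self._hidden_until.pop(msg_id, None)
```

If a consumer crashes before calling `delete`, the message becomes visible again after the timeout and another consumer can pick it up.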
- Visual snapshots
- Links to pre-filtered dashboard views or embedded graphs
- Lesson 1293 — Alert Context and Enrichment
- Volatility risk
- Without persistence, server restart = data loss
- Lesson 349 — Redis In-Memory Storage Model
- Volume
- Each service generates its own logs, overwhelming traditional search tools
- Lesson 807 — Debugging and Troubleshooting; Lesson 1257 — Storage and Retention Costs
- Volume of writes
- Heavy write traffic creates a backlog.
- Lesson 208 — Replication Lag: What It Is and Why It Happens
W
- wait
- or **buffer** that operation.
- Lesson 548 — Causal Consistency Implementation; Lesson 573 — The Blocking Problem in 2PC; Lesson 957 — Rate Limiting vs Throttling
- Wait Times
- Lesson 273 — Connection Pool Monitoring
- waits
- for the replica(s) to write the data and send back an acknowledgment
- Lesson 203 — Synchronous Replication Explained; Lesson 528 — Single-Leader Replication for Strong Consistency; Lesson 644 — Synchronous vs Asynchronous Communication
- Wall-clock time
- (system time) can jump backward due to NTP corrections, leap seconds, or admin changes.
- Lesson 1114 — Clock Skew and Time Synchronization
- warehouse
- excels.
- Lesson 765 — Choosing Lake vs Warehouse; Lesson 1550 — Object Storage for Paste Content
- Warehouse wins for
- Lesson 762 — Query Performance Tradeoffs
- Warehouses
- Compression is balanced with indexing and columnar organization for fast scans.
- Lesson 763 — Cost and Storage Efficiency
- warm standby
- might be provisioned but not actively running, requiring startup and sync time (minutes to hours RTO).
- Lesson 1417 — Hot Standby vs Cold Standby; Lesson 1443 — DR Cost Optimization
- Warm storage
- (8-90 days): Slower, cheaper disks for occasional queries
- Lesson 1135 — Log Retention and Volume Management; Lesson 1428 — Backup Storage Tiers; Lesson 1572 — Storage Tier Migration; Lesson 1589 — Storage Tiering Strategy
- Warm storage (7-30 days)
- Retain only high-value traces: errors, slow requests (P99 latency), or specific user-flagged transactions.
- Lesson 1246 — Trace Data Retention Policies
- Warm tier
- Older logs (days to weeks) moved to slower, cheaper storage.
- Lesson 1156 — Indexing Strategies and Retention; Lesson 1620 — Storage Tiering for Cost Optimization
- Warm tier (8-90 days)
- Compressed storage with slower query performance.
- Lesson 1165 — Log Retention Policies
- WARN
- Potentially problematic situations that don't stop execution
- Lesson 1141 — Log Levels in Structured Logs
- Warning (P2/P3)
- These signal degraded performance or impending problems that need attention within hours, not minutes.
- Lesson 1291 — Alert Severity Levels
- Wasted capacity
- Standby resources provide no value during normal operation
- Lesson 1436 — Active-Passive vs Active-Active DR
- Wasted effort
- computing feeds for inactive users who may never read them
- Lesson 1638 — Push (Write-Time) Feed Model
- Wasted resources
- Redundant work computing the same result hundreds of times
- Lesson 159 — Cache Stampede Problem
- Watch for changes
- Get notified when data updates
- Lesson 633 — ZooKeeper: Coordination Service Built on Consensus
- Web BFF
- Aggregates data for rich dashboards and complex UI requirements
- Lesson 902 — Backend-for-Frontend (BFF) Pattern Overview
- Web dashboard
- requests `/user/profile` → Gateway aggregates user data + recent activity from two services into one enriched response
- Lesson 875 — Client-Specific API Composition
- WebSocket → HTTP
- Long-lived WebSocket connections from clients are translated into individual HTTP requests, one per message
- Lesson 881 — Protocol Translation
- WebSocket Gateway
- A specialized service layer that maintains open connections with active users.
- Lesson 1672 — WebSocket Architecture for Live Updates
- WebSocket/long-polling
- Push updates instantly (resource-intensive)
- Lesson 1671 — Real-Time Requirements for Social Feeds
- WebSockets
- are a prime example.
- Lesson 62 — When Stateful Services Are Necessary; Lesson 1687 — In-App Notifications; Lesson 1915 — GraphQL Subscriptions for Real-Time Data
- Weekly vs Daily Rotations
- Lesson 1297 — On-Call Fundamentals and Rotation Models
- Weighted
- variants handle heterogeneous servers well.
- Lesson 96 — Algorithm Selection Tradeoffs; Lesson 880 — Request Routing and Load Balancing
- Weighted Distribution
- Lesson 226 — Load Distribution Across Replicas
- Weighted Round Robin
- solves this by assigning each server a weight that reflects its capacity.
- Lesson 86 — Weighted Round Robin; Lesson 88 — Weighted Least Connections; Lesson 96 — Algorithm Selection Tradeoffs
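One common way to implement weighted round robin is the "smooth" variant, which interleaves picks instead of sending a burst to the heaviest server. The sketch below shows that variant as an illustration; it is not necessarily the algorithm the lesson describes, and the class name and server names are hypothetical.

```python
class WeightedRoundRobin:
    """Smooth weighted round robin over a mapping of server -> weight."""

    def __init__(self, weights):
        self._weights = dict(weights)
        self._current = {server: 0 for server in self._weights}

    def next(self):
        total = sum(self._weights.values())
        # Every server earns credit proportional to its weight...
        for server, weight in self._weights.items():
            self._current[server] += weight
        # ...and the server with the most credit is chosen and pays the total back.
        chosen = max(self._current, key=self._current.get)
        self._current[chosen] -= total
        return chosen
```

With weights of 3 and 1, every cycle of four picks sends three requests to the heavier server and one to the lighter one.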
- Well-Defined Schemas
- Lesson 320 — When SQL Is the Right Choice
- Well-understood patterns
- Decades of proven operational experience
- Lesson 71 — Single-Leader Replication Model
- what
- operation they were performing, and **how** this log relates to other events in your system.
- Lesson 1161 — Context-Rich Logging; Lesson 1494 — Functional Requirements for a URL Shortener; Lesson 1883 — Error Response Structure and Consistency
- What is wrong
- (The symptom, not just a metric name)
- Lesson 1287 — Actionability: Every Alert Needs a Runbook
- What Went Well
- Effective responses and mitigations during the incident
- Lesson 1350 — What is a Postmortem?
- What you gain
- Lesson 217 — Semi-Synchronous Replication Trade-offs; Lesson 1153 — Alternative: The Grafana Loki Approach
- What you sacrifice
- Lesson 217 — Semi-Synchronous Replication Trade-offs; Lesson 1153 — Alternative: The Grafana Loki Approach
- What's the acceptable compromise
- Can users tolerate 100ms extra latency if it means 99.
- Lesson 18 — Prioritizing Requirements Under Constraints
- What's the business priority
- A startup might prioritize fast launch over perfect reliability—you can improve later.
- Lesson 18 — Prioritizing Requirements Under Constraints
- When
- a partition happens (and it will), how should my system behave *during* those minutes or hours?
- Lesson 505 — The Partition Question: When, Not If; Lesson 539 — Vector Clocks and Causality; Lesson 1064 — Monitoring and Metrics; Lesson 1252 — Sampling Strategies Overview; Lesson 1567 — Lazy vs Eager Deletion Strategies; Lesson 1918 — gRPC vs REST vs GraphQL: When to Use Each
- When assigning a key
- , walk clockwise around the hash ring as usual
- Lesson 1468 — Bounded Loads Extension
- When combined with authentication
- The short URL just redirects; actual access requires login
- Lesson 1515 — Short URL Predictability Tradeoffs
- When precision matters more
- Lesson 32 — Rounding and Approximation Techniques
- When to avoid it
- Lesson 137 — Write-Behind: Risks and Use Cases
- When to escalate
- If the incident isn't resolved within the SLO response time for that severity, or if the on-call engineer needs expertise they don't have.
- Lesson 1298 — Incident Severity Levels and Escalation
- When to round aggressively
- Lesson 32 — Rounding and Approximation Techniques
- When to use
- Lesson 596 — Forward Recovery vs Backward Recovery; Lesson 667 — RabbitMQ Exchange Types; Lesson 683 — Consumer Acknowledgment Timing; Lesson 959 — Hard vs Soft Limits; Lesson 1102 — Handling Zero or Negative Timeouts; Lesson 1438 — DR Testing Strategies
- When unavoidable
- Lesson 1487 — Cross-Partition Queries
- Where to cache
- At the Policy Enforcement Point (PEP) in each service, at the API Gateway, or in a shared cache like Redis.
- Lesson 951 — Caching Authorization Decisions
- Which operations happened
- Lesson 527 — Consensus and Strong Consistency
- Whisper
- The fixed-size database format that stores time-series data on disk
- Lesson 1202 — Graphite Time-Series Database
- Whitebox Monitoring
- instruments your system internally, exposing detailed metrics, logs, and traces.
- Lesson 1266 — Blackbox vs Whitebox Monitoring
- Who
- Usually the end user (you).
- Lesson 921 — OAuth2 Roles: Resource Owner, Client, Server; Lesson 1161 — Context-Rich Logging
- Who to page
- Start with the primary on-call for the affected service.
- Lesson 1298 — Incident Severity Levels and Escalation
- Why
- High-end hardware requires:
- Lesson 45 — Comparing Cost Structures; Lesson 919 — Hybrid Session-Token Patterns; Lesson 1064 — Monitoring and Metrics; Lesson 1102 — Handling Zero or Negative Timeouts; Lesson 1883 — Error Response Structure and Consistency
- Why accept this
- Lesson 560 — Eventual Consistency with Quorums
- Why does it matter
- (Impact on users or business)
- Lesson 1287 — Actionability: Every Alert Needs a Runbook
- Why eventual consistency exists
- is the direct answer to this constraint.
- Lesson 532 — Why Eventual Consistency Exists
- Why graphs shine here
- Fraudsters often create rings of fake accounts that share subtle connections—same phone number, overlapping IP addresses, or circular fund transfers.
- Lesson 458 — Use Cases: Fraud Detection and Knowledge Graphs
- Why it matters
- Crash-stop failures require simpler algorithms (like Raft or Paxos).
- Lesson 602 — Crash-Stop vs Byzantine Failures
- Why it works
- Indexes are sorted and compressed, making scanning and grouping much faster.
- Lesson 284 — Aggregation Query Optimization
- Why this works
- Lesson 215 — Sticky Sessions and Replica Affinity; Lesson 1541 — Sharding and Database Scaling
- Why TTL matters
- Lesson 1565 — Expiration Requirements and TTL Basics
- Wide rows
- occur when a single partition key accumulates too many clustering columns—imagine millions of columns in one row.
- Lesson 432 — Data Modeling Best Practices
- Wide-column stores
- (like Cassandra, HBase) organize data by row keys with sparse, flexible columns.
- Lesson 419 — Wide-Column vs Document Stores
- Wildcard and Tag-Based Purging
- Lesson 185 — Purging and Cache Invalidation Strategies
- Windowing
- Since streams are infinite, you often analyze chunks of time (e.
- Lesson 737 — What is Stream Processing?; Lesson 741 — Windowing in Stream Processing; Lesson 744 — Stream Processing Frameworks
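The simplest windowing scheme is a tumbling window: fixed-size, non-overlapping time buckets. The sketch below counts events per window, assuming each event is a `(timestamp_seconds, value)` pair with an integer timestamp; the function name is hypothetical.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per fixed, non-overlapping window, keyed by window start time."""
    counts = Counter()
    for timestamp, _value in events:
        # Integer division snaps each timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)
```

A streaming framework would emit each window's count once the window closes rather than after reading the whole input, but the bucketing logic is the same.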
- Winston
- (Node.
- Lesson 1136 — Logging Libraries and Standards; Lesson 1147 — Structured Logging Libraries
- With correlation IDs
- , you can filter all logs by that ID and see the complete story: the authentication attempt, the slow database query that caused it, and how it cascaded to the order service.
- Lesson 1132 — Correlation IDs and Request Tracing
- With Origin Shield
- All 500 stores call one regional warehouse (shield), which calls the factory once → 1 request hits the factory, warehouse serves the 500 stores
- Lesson 1614 — Origin Shield Pattern
- Within a region
- Lesson 1375 — Hybrid Topologies
- Without compaction
- Lesson 720 — Log Compaction
- Without Origin Shield
- 500 retail stores (edge nodes) each call the factory (origin) directly for the same product → 500 requests hit the factory
- Lesson 1614 — Origin Shield Pattern
- Without shielding
- , every edge location missing content would hit your origin directly.
- Lesson 1611 — Multi-Tier Caching Architecture
- Work Distribution
- In distributed crawlers, the frontier distributes URLs across multiple worker machines efficiently.
- Lesson 1838 — URL Frontier: Definition and Purpose
- Worker Health
- includes CPU/memory utilization, active connections, and heartbeat signals.
- Lesson 1871 — Monitoring Crawler Fleet Performance
- Worker Parallelization
- Deploy multiple identical worker instances that all pull from the same queue.
- Lesson 1708 — Scalability and Horizontal Expansion
- Worker Pools
- Multiple identical worker processes run in parallel.
- Lesson 659 — Queue Use Cases: Work Distribution
- Worker utilization
- Are your fanout workers idle or maxed out?
- Lesson 1657 — Measuring Fanout Performance
- Worker-Level DNS Cache
- Each crawler worker maintains its own DNS cache (building on the caching strategies from lesson 1857).
- Lesson 1869 — Scaling DNS Resolution
- Workers
- Multiple independent machines that receive URL assignments, perform HTTP requests, parse content, extract links, and report results back to the coordinator.
- Lesson 1863 — Coordinator-Worker Pattern for Crawling
- Working set size
- Fewer documents fit in RAM, reducing cache effectiveness
- Lesson 409 — Data Size and Storage Considerations
- Works behind firewalls
- or NAT without inbound connectivity
- Lesson 1197 — Pull vs Push Metrics Collection Models
- Works with retries
- Combines well with the retry budgets and circuit breakers you've already learned about
- Lesson 1031 — Hedged Requests and Speculative Execution
- Write amplification
- A single post triggers millions of database writes
- Lesson 1640 — Celebrity Problem in Push Models; Lesson 1649 — The Celebrity Problem in Fanout
- Write back
- the resolved version, which becomes the new authoritative state
- Lesson 377 — Eventual Consistency and Application Reconciliation
- Write bottleneck
- The single leader can become overwhelmed
- Lesson 1365 — Single-Leader Replication Topology
- Write concern
- specifies how many replicas must acknowledge a write before MongoDB reports success:
- Lesson 395 — Read and Write Concerns
- Write efficiency
- Only new data is processed, not the entire corpus
- Lesson 1772 — Real-Time Index Updates
- Write events, not state
- When something happens (order placed, payment received), you write an immutable event describing what happened
- Lesson 586 — Alternative: Event Sourcing for Consistency
- Write latency
- Slower than async (must wait for network + replica write)
- Lesson 217 — Semi-Synchronous Replication Trade-offs; Lesson 296 — Write Amplification Costs
- Write latency increases
- because you must wait for both the cache write *and* the slower database write before responding.
- Lesson 134 — Write-Through Caching Pattern
- write operations
- (INSERT, UPDATE, DELETE) go to the **primary database**, while **read operations** (SELECT) go to **replica databases**.
- Lesson 220 — Read-Write Splitting Fundamentals; Lesson 1118 — Per-Operation Timeout Configuration; Lesson 1548 — Read vs Write Path Architecture; Lesson 1636 — Capacity Estimation: Feed Reads vs Writes
- write path
- handles user submissions, validates content, generates unique IDs, and persists data.
- Lesson 1548 — Read vs Write Path Architecture; Lesson 1562 — Content Compression and Encoding
- Write Path Characteristics
- Lesson 1548 — Read vs Write Path Architecture
- Write phase
- Data is written to N *available* nodes (not necessarily the "right" ones)
- Lesson 1372 — Sloppy Quorums and Hinted Handoff
- Write queries
- (`INSERT`, `UPDATE`, `DELETE`) → Primary database
- Lesson 222 — Proxy-Based Read-Write Splitting
- Write replication
- New URL creation in one region propagates to others asynchronously
- Lesson 1535 — Multi-Region Deployment
- write throughput
- decreases.
- Lesson 135 — Write-Through: Latency and Consistency Tradeoffs; Lesson 229 — What is Sharding?; Lesson 1488 — Secondary Indexes in Partitioned Systems
- Write time
- User A posts → store post in database with `user_id` and `timestamp`
- Lesson 1647 — Fanout-on-Read (Pull Model)
- Write timeout
- controls how long your client will wait to *send* data to the server before giving up.
- Lesson 1089 — Read Timeout and Write Timeout
- Write to page cache
- Kafka appends the message to an in-memory buffer (the OS page cache) immediately
- Lesson 713 — Kafka's Write Path and Durability
- write-ahead log
- (called HLog in HBase) works alongside **memstores** that flush to immutable files (called HFiles, similar to SSTables).
- Lesson 433 — What is HBase?; Lesson 574 — Recovery Protocols and Logs
- Write-Ahead Log (WAL)
- Lesson 206 — Replication Logs and Mechanisms; Lesson 415 — Write Path and LSM Trees; Lesson 436 — HBase Write Path and WAL; Lesson 1849 — URL Frontier Persistence and Recovery
- Write-Ahead Logging (WAL)
- Before modifying any data in memory or on disk, the database first writes the change to a sequential log file.
- Lesson 313 — Durability: Surviving System Failures; Lesson 470 — Transaction Model and ACID in Neo4j
- Write-Back
- Another name for the **Write-Behind** pattern, in which writes are immediately accepted by the cache and acknowledged to the client *before* being persisted to the database.
- Lesson 136 — Write-Behind (Write-Back) Caching Pattern; Lesson 1528 — Write-Through vs Write-Back for URL Creation
- Write-Behind
- (also called **Write-Back**) is a pattern in which writes are immediately accepted by the cache and acknowledged to the client *before* being persisted to the database.
- Lesson 136 — Write-Behind (Write-Back) Caching Pattern
- Write-heavy scenarios
- Lesson 1483 — Celebrity User Problem
- Write-heavy workload
- Use W=1, R=N.
- Lesson 556 — Read and Write Quorums; Lesson 1361 — Quorum-Based Replication
- Write-through
- Update the cache immediately when updating the database.
- Lesson 128 — Cache Coherence Across Layers; Lesson 1528 — Write-Through vs Write-Back for URL Creation
- Write-Through Caching
- updates *both* the cache and the underlying database on every write operation before returning success to the client.
- Lesson 135 — Write-Through: Latency and Consistency Tradeoffs
- Writes
- (INSERT, UPDATE, DELETE) → Primary database only
- Lesson 200 — Why Replicate: Read Scaling; Lesson 223 — Detecting Read vs Write Queries; Lesson 349 — Redis In-Memory Storage Model; Lesson 1362 — Chain Replication; Lesson 1636 — Capacity Estimation: Feed Reads vs Writes
- Writes (URL creation)
- Lesson 1496 — Read-Heavy vs Write-Heavy Characteristics
- Writes are blocked
- to prevent conflicting updates
- Lesson 511 — Banking Systems: Consistency Over Availability
- Writes-follow-reads
- consistency (also called "session causality") ensures that if a client reads some data and then performs a write, that write is guaranteed to happen *after* the values the client observed during the read.
- Lesson 537 — Writes-Follow-Reads Consistency; Lesson 545 — Writes-Follow-Reads Consistency
- Writes-follow-reads consistency
- (also called *session causality*) guarantees that if a client reads some data and then performs a write, that write will be applied to a system state that includes the data the client just read—or a later state.
- Lesson 1393 — Writes-Follow-Reads Consistency
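
The quorum-related entries above (Write concern; Write-heavy workload with W=1, R=N) rely on the standard overlap condition W + R > N: when it holds, every read quorum shares at least one replica with every write quorum. The sketch below is only an illustration of that arithmetic; the function names `quorums_overlap` and `describe` are hypothetical and do not come from the course.

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """Return True when any R replicas must include at least one of any W replicas."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must each be between 1 and n")
    # Pigeonhole argument: two subsets of an n-element set whose sizes
    # sum to more than n cannot be disjoint.
    return w + r > n


def describe(n: int, w: int, r: int) -> str:
    """Summarize a quorum configuration in one line."""
    kind = "overlapping" if quorums_overlap(n, w, r) else "non-overlapping"
    return f"N={n}, W={w}, R={r}: {kind} quorums"


print(describe(3, 1, 3))  # write-heavy setting from the glossary: W=1, R=N
print(describe(3, 2, 2))  # balanced majority quorums
print(describe(3, 1, 1))  # fast, but a read may miss the latest write
```

With W=1 a write returns after a single acknowledgment, so the cost of guaranteeing overlap moves entirely to reads, which must contact all N replicas.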
X
- X happened before Y
- if every counter in X's vector is ≤ the corresponding counter in Y's vector (and at least one is strictly less)
- Lesson 1382 — Version Vectors and Causality
- XA transaction managers
- (for example, in Java EE) implement 2PC across the participating services.
- Lesson 576 — When 2PC is Used in Practice
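
The "X happened before Y" rule for version vectors in this section can be sketched as a component-wise comparison. This is a minimal illustration, assuming each vector is a mapping from a node name to a counter and that a missing node counts as 0; the `compare` function and its return labels are hypothetical.

```python
from typing import Mapping


def compare(x: Mapping[str, int], y: Mapping[str, int]) -> str:
    """Classify the causal relationship between two version vectors."""
    nodes = set(x) | set(y)
    # A node absent from a vector is treated as counter 0 (assumption).
    x_le_y = all(x.get(n, 0) <= y.get(n, 0) for n in nodes)
    y_le_x = all(y.get(n, 0) <= x.get(n, 0) for n in nodes)
    if x_le_y and y_le_x:
        return "equal"
    if x_le_y:
        return "before"      # X happened before Y
    if y_le_x:
        return "after"       # Y happened before X
    return "concurrent"      # neither dominates: a conflict to reconcile


print(compare({"a": 1, "b": 2}, {"a": 2, "b": 2}))
print(compare({"a": 2, "b": 1}, {"a": 1, "b": 2}))
```

The "concurrent" case is the one that matters in practice: neither write saw the other, so the system has to keep both versions or apply a merge rule.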
Y
- You avoid false positives
- High CPU might be fine if users are happy
- Lesson 1313 — Monitoring and Observability for SRE
- You lose parallelism
- With one partition, only one consumer can process messages at a time while maintaining order.
- Lesson 685 — Message Ordering Guarantees
- You upload
- your static assets (images, CSS, JS) to your origin server or directly to the CDN
- Lesson 192 — CDN for Static Asset Delivery
- You value operational simplicity
- Fewer moving parts mean less infrastructure to maintain, monitor, and debug.
- Lesson 755 — When to Choose Lambda vs Kappa
- Your comment isn't visible
- You panic and click "Post" again, creating duplicates
- Lesson 209 — Read-After-Write Consistency Problem
- Your feed
- A blend of pre-computed posts (from regular users you follow) plus real-time queries (for celebrities you follow).
- Lesson 1648 — Hybrid Fanout Strategy
- Your queries are simple
- If real-time and batch processing use similar logic (counting events, simple aggregations), Kappa's replay capability handles both without complexity.
- Lesson 755 — When to Choose Lambda vs Kappa
- Your service
- (the "client") has been pre-registered with the authorization server and given two pieces of secret information: a `client_id` and a `client_secret`
- Lesson 925 — Client Credentials Flow
- Your service's processing time
- (parsing responses, business logic)
- Lesson 1098 — Per-Hop Timeout Budgets
Z
- ZAB (ZooKeeper Atomic Broadcast)
- ZooKeeper's consensus protocol, which works similarly to Raft.
- Lesson 633 — ZooKeeper: Coordination Service Built on Consensus
- Zero application changes
- Legacy apps can gain read-write splitting without code modifications—just point them to the proxy.
- Lesson 222 — Proxy-Based Read-Write Splitting
- Zero coordination
- Any server can generate UUIDs independently
- Lesson 1520 — Primary Key Selection: Auto-Increment vs UUID
- Zero infrastructure
- No servers to provision or manage
- Lesson 895 — AWS API Gateway and Serverless Integration
- Zero-Trust Security Requirements
- Lesson 868 — When Service Mesh Adds Value
- Zipkin
- is an open-source option that offers full control, zero licensing costs, and community support.
- Lesson 1251 — Choosing a Tracing System
- Zipkin wins on simplicity
- single binary deployment works for many use cases.
- Lesson 1242 — Zipkin Architecture and Design
- Zone failures
- Simulate an entire availability zone going dark
- Lesson 1342 — Testing Redundancy with Fault Injection
- ZooKeeper
- is another PC/EC system, essential for distributed locking where correctness trumps speed.
- Lesson 521 — PACELC Tradeoffs in Real Systems; Lesson 636 — Consensus for Leader Election; Lesson 637 — Distributed Locks via Consensus; Lesson 638 — Configuration Management with Consensus; Lesson 704 — Brokers and Cluster Architecture; Lesson 715 — ZooKeeper vs KRaft Mode
- ZooKeeper mode
- Lesson 715 — ZooKeeper vs KRaft Mode
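
The "Zero coordination" entry in this section notes that any server can generate UUIDs independently. A minimal sketch of that idea follows, using Python's standard `uuid` module; the helper name `new_record_id` is hypothetical.

```python
import uuid


def new_record_id() -> str:
    """Generate a primary key locally, with no shared counter or database round trip."""
    # uuid4 is built from random bits, so separate servers can call this
    # independently; collisions are possible in principle but extremely unlikely.
    return str(uuid.uuid4())


ids = [new_record_id() for _ in range(3)]
for record_id in ids:
    print(record_id)
```

The trade-off against auto-increment keys is that random UUIDs are larger and carry no insertion order.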