Data Science Glossary

Key terms from the Data Science course, linked to the lesson that introduces each one.

5,145 terms.

#

`element_blank()`: Removes elements entirely; Lesson 1365 — Customizing Non-Text Elements Lesson 1366 — Theme() Function Deep Dive
`element_line()`: Controls line elements (grid lines, axis lines); Lesson 1365 — Customizing Non-Text Elements Lesson 1366 — Theme() Function Deep Dive
`element_rect()`: Controls rectangular elements (backgrounds, borders); Lesson 1365 — Customizing Non-Text Elements Lesson 1366 — Theme() Function Deep Dive
±3 standard deviations: Captures about 99.7% of values in a normal distribution; Lesson 1378 — Setting Z-Score Thresholds Lesson 1397 — Shewhart Control Chart Basics
1:1 to 3:1: Breaking even or marginally profitable, but likely not covering operational overhead, support costs, or providing adequate return on capital.; Lesson 1667 — LTV:CAC Ratio and Profitability Lesson 1756 — LTV:CAC Ratio as a Health Metric
80/20 rule: Lesson 191 — Pareto Principle and the 80/20 Rule Lesson 2116 — Diminishing Returns and the 80/20 Rule
80% power: An 80% chance of detecting a true effect if it exists.; Lesson 446 — Power and Sample Size for ANOVA Lesson 1495 — Power Analysis Fundamentals
95% confidence: The interval procedure captures the true parameter in about 95% of repeated samples; Lesson 267 — Interpreting Confidence Levels Lesson 278 — Confidence Interval Formula for One Proportion
α (alpha): Controls the shape on the left side; Lesson 184 — Beta Distribution: Bounded Between 0 and 1 Lesson 761 — Double Exponential Smoothing (Holt's Method)
α close to 0: (e.; Lesson 758 — Simple Exponential Smoothing (SES) Lesson 759 — Choosing the Smoothing Parameter α
α close to 1: (e.; Lesson 758 — Simple Exponential Smoothing (SES) Lesson 759 — Choosing the Smoothing Parameter α
β (beta): Controls the shape on the right side; Lesson 184 — Beta Distribution: Bounded Between 0 and 1 Lesson 331 — Understanding Type II Error (False Negative) Lesson 761 — Double Exponential Smoothing (Holt's Method)
λ (lambda): Called the **rate parameter**.; Lesson 165 — Exponential Distribution: PDF and CDF Lesson 593 — Box-Cox Transformation
μ (mu): The mean, which determines where the center of the bell curve sits; Lesson 169 — The Normal Distribution: Definition and Properties Lesson 170 — Parameters: Mean (μ) and Standard Deviation (σ) Lesson 172 — Probability Density Function for Normal Distribution
σ (sigma): The standard deviation, which controls how spread out or "wide" the bell curve is; Lesson 169 — The Normal Distribution: Definition and Properties Lesson 170 — Parameters: Mean (μ) and Standard Deviation (σ) Lesson 172 — Probability Density Function for Normal Distribution
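
Example: the μ, σ, and ±3 standard deviations entries above can be tied together with a short calculation. This is a minimal sketch that assumes a normal distribution; the helper name `normal_coverage` is illustrative and does not come from the course material.

```python
import math


def normal_coverage(k: float) -> float:
    """Probability that a normally distributed value lies within k standard deviations of its mean."""
    # For X ~ Normal(mu, sigma), P(|X - mu| <= k * sigma) = erf(k / sqrt(2)).
    # The result does not depend on the particular mu or sigma.
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within ±{k} standard deviations: {normal_coverage(k):.4f}")
```

Under the normality assumption, the three printed values round to 0.6827, 0.9545, and 0.9973, which is where the "68-95-99.7" shorthand comes from. For skewed or heavy-tailed data these fractions can differ noticeably.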

A

above: or **below** the hypothesized median.; Lesson 391 — The Sign Test for Medians Lesson 567 — Common Q-Q Plot Patterns: Heavy Tails and Light Tails Lesson 568 — Skewness in Q-Q Plots: Left and Right Deviations
Above 5:1: Excellent margins, but potentially signals underinvestment in growth.; Lesson 1667 — LTV:CAC Ratio and Profitability
Above average: `WHERE value > (SELECT AVG(value) FROM table)`; Lesson 964 — Subqueries with Aggregate Functions
Absolute counts: 500 users from Jan cohort were active in Week 3; Lesson 1647 — Building a Cohort Table
absolute difference: |p₁ - p₂|.; Lesson 413 — Effect Size and Practical Significance Lesson 1019 — Comparing Values to Window Aggregates
Academic scores: Assignments with varying point values; Lesson 43 — Weighted Mean and Its Applications
Acceleration: If the variance of your statistic changes across different data values (making the distribution asymmetric), BCa accounts for this skewness; Lesson 304 — BCa Bootstrap Intervals: Bias Correction
Accept limitations: Acknowledge when clean measurement isn't feasible; Lesson 1527 — Ignoring Network Effects
Accept or reject: if balance is good, proceed; if not, generate a new randomization; Lesson 1492 — Rerandomization and Practical Implementation
Accept trade-offs: consciously; Lesson 2121 — Timeboxing and Deadlines
Acceptable error types: Is a false positive worse than a false negative?; Lesson 2117 — Defining 'Good Enough' with Stakeholders
Access controls: Limit who can use sensitive features (building on GDPR principles you learned); Lesson 1925 — Mitigation Strategies and Responsible Disclosure
Access Date: When you retrieved the data.; Lesson 2063 — Essential Metadata to Capture
Access method: How you retrieved it (SQL query, API call, manual download, automated script); Lesson 1161 — Documenting Data Sources
Accessibility: Design for colorblind viewers, provide alt text, and use clear labels.; Lesson 1247 — The Ethics of Visualization Design Lesson 2086 — Stage 2: Data Acquisition and Assessment
Accountability: Lesson 1905 — Core Principles of GDPR
Accounting for censored observations: by including them in the "at-risk" count up until they're censored, then removing them; Lesson 809 — Introduction to the Kaplan-Meier Estimator
Accuracy: What percentage of predictions were correct?; Lesson 14 — Model Evaluation and Validation Lesson 243 — Choosing the Right Sampling Method Lesson 1863 — Data Quality Dimensions Lesson 1869 — Data Quality Metrics and SLAs Lesson 1878 — What is Bias in Data? Lesson 1905 — Core Principles of GDPR Lesson 1973 — Report Review and Quality Checklist Lesson 2086 — Stage 2: Data Acquisition and Assessment
Accuracy SLO: "Data quality checks will pass with <0.; Lesson 1860 — SLA and SLO Definitions
Accuracy varies: Ambiguous addresses ("Main Street") return less precise results; Lesson 1315 — Geocoding and Reverse Geocoding
ACF: (Autocorrelation Function) at lag k, you measure the total correlation between a time series and its k-period-ago self.; Lesson 728 — PACF vs ACF: Key Differences Lesson 733 — Using ACF and PACF Together Lesson 798 — SARIMA Model Selection
ACF of residuals: , you want to see:; Lesson 786 — ACF and PACF of Residuals
ACF plot: Should show rapid decay to zero, not slow tailing off; Lesson 741 — Testing Stationarity After Transformation Lesson 779 — The Box-Jenkins Methodology
ACF plots: decay very slowly (indicating non-stationarity); Lesson 734 — Why Differencing and Detrending Matter
ACF/PACF clues: When your first-differenced ACF still shows slow decay; Lesson 736 — Higher-Order Differencing
ACF/PACF of residuals: Should show no significant spikes (all patterns captured); Lesson 799 — Fitting and Diagnosing SARIMA Models
ACID: Lesson 1110 — What Are Database Transactions?
Acknowledge the constraint: with stakeholders—don't promise ML magic without the ingredients.; Lesson 2124 — Insufficient or Low-Quality Data
Acknowledge the limitation: State that the intercept has no practical interpretation in your context; Lesson 526 — When the Intercept Has No Meaning
Acknowledge uncertainty: Use phrases like "on average" or "typically" rather than absolute claims; Lesson 530 — Communicating Results to Non-Technical Audiences
Acknowledging limitations: Honest interpretation includes caveats—data gaps, model assumptions, confidence intervals.; Lesson 2090 — Stage 6: Interpretation and Insight Generation
Acquisition channels: are the various pathways through which potential users discover and arrive at your product, website, or service.; Lesson 1711 — What Are Acquisition Channels?
Actionable: What decisions will this answer inform?; Lesson 1166 — Defining the Business Question Lesson 1605 — Characteristics of Good North Star Metrics
Actionable across teams: – Different departments can influence it through their work; Lesson 1604 — What is a North Star Metric?
actions: Triggers that produce results.; Lesson 1774 — What is Apache Spark and Why Use It? Lesson 1780 — Transformations vs Actions in Spark
Activation benchmarks: Lesson 1697 — Time-to-Value and Activation Metrics
Active Customers: Engaged users who regularly use your product or make repeat purchases.; Lesson 1704 — Customer Lifecycle Stages
Actual value (Yᵢ): What you measured; Lesson 538 — What Are Fitted Values?
Actual(t): is the most recent observed value; Lesson 758 — Simple Exponential Smoothing (SES)
Acyclic: Means no variable can cause itself through any path (no loops); Lesson 1468 — Introduction to Directed Acyclic Graphs (DAGs) Lesson 1833 — Introduction to Apache Airflow
Adapt immediately: Change a threshold or add a condition in minutes; Lesson 2128 — Data Distribution Shifts Frequently
Add a legend: Always include a size legend so viewers can interpret bubble magnitudes; Lesson 1229 — Bubble Charts for Three Variables
Add candidate predictors: one at a time or in meaningful blocks (e.; Lesson 703 — Sequential Model Building Strategy
Add context: Compare the effect size to something meaningful; Lesson 530 — Communicating Results to Non-Technical Audiences
Add Context and Clarity: Lesson 1217 — The Transition from Explore to Explain
Add redundant encoding: Don't rely on color alone.; Lesson 1248 — Color Blindness and Color Palette Design
Add to Cart: Lesson 1679 — Defining Funnel Steps and Events
Adding a constant: If you add a constant *c* to a random variable *X*, the expected value increases by exactly *c*:; Lesson 149 — Properties of Expectation and Variance
Adding intervals to dates: Lesson 1040 — Date Arithmetic and INTERVAL Operations
Additional detail: on methodology (statistical tests used, data cleaning steps); Lesson 1949 — Anticipating Questions: Building in Appendices
additive: Lesson 466 — Visualizing Interactions Lesson 710 — Additive vs Multiplicative Models Lesson 744 — Classical Decomposition Methods Lesson 765 — Introduction to Holt-Winters Method Lesson 767 — Holt-Winters Additive Model
Additive changes: are safest: adding new columns doesn't break existing queries that don't reference them.; Lesson 1876 — Schema Evolution and Backwards Compatibility
Additive forecasting formula: Lesson 771 — Forecasting with Holt-Winters
Additive model: `Observed = Trend + Seasonality + Irregular`; Lesson 710 — Additive vs Multiplicative Models Lesson 742 — Components of Seasonal Decomposition Lesson 748 — Seasonally Adjusted Data Lesson 749 — Using Decomposition for Forecasting Lesson 770 — Initializing Holt-Winters Components
Additive models: assume components are added together:; Lesson 743 — Additive vs Multiplicative Models
Additive seasonality: means seasonal fluctuations stay roughly constant in size regardless of the data's level.; Lesson 766 — Additive vs Multiplicative Seasonality
Adds a penalty: Includes a penalty term for each additional change-point to avoid over-segmenting (similar to regularization in regression); Lesson 1416 — PELT Algorithm: Pruned Exact Linear Time
Adequate range: Data should span a reasonable range of values; Lesson 480 — Scatterplots and Visual Assessment
Adequate sample size: Expected frequency ≥ 5 in every category; Lesson 419 — Assumptions and Minimum Expected Frequencies
ADF test: Low p-value (< 0.; Lesson 718 — Interpreting Stationarity Test Results
Adjust next sprint: Based on feedback, decide whether to refine features, try new models, or pivot; Lesson 2113 — Timeboxing and Sprint Planning for Data Projects
Adjust the scale: to emphasize meaningful differences: sometimes a gradient from 0-100% works, other times 20-80% highlights actionable variations; Lesson 1649 — Visualizing Cohort Data with Heatmaps
Adjusted fences: use a skewness coefficient to shift boundaries asymmetrically; Lesson 1388 — Limitations and Alternatives to IQR Detection
Adjusted p-values: corrected for multiple testing (Tukey, Bonferroni, etc.); Lesson 462 — Interpreting and Reporting Post-Hoc Results
Adjusted R-squared: Quick model comparisons, reporting to non-technical audiences, when interpretability matters most.; Lesson 616 — Adjusted R-Squared vs Other Criteria Lesson 626 — Nested vs Non-Nested Models Lesson 632 — Parsimony and Occam's Razor
Adjustment: means including confounders directly in a regression model as additional predictors.; Lesson 1431 — Controlling for Confounders: Adjustment
Administrative records: Lists that don't capture informal workers or undocumented individuals; Lesson 249 — Coverage Error and Undercoverage
Administrative selection: gatekeepers assign treatment based on need or eligibility; Lesson 1444 — Selection Bias and Treatment Assignment
Adoption rate: = (Users who've used feature at least once) / (Total active users); Lesson 1696 — Feature Adoption and Usage Frequency
Adstock: (also called "advertising stock") is a transformation that captures two key phenomena:; Lesson 1739 — Adstock and Carryover Effects
Advanced composition: Mathematical techniques can provide tighter bounds, so the total might be less than simple addition; Lesson 1900 — Privacy Budget and Composition
Advantage: Easiest to implement, unbiased in expectation.; Lesson 1437 — Randomization Mechanisms
Advantages: Lesson 1620 — Single vs Shared Ownership Models
Adverse Impact Ratio: extends the 80% rule with confidence intervals; Lesson 1890 — Measuring Disparate Impact
Advocacy in analyst's clothing: Using your technical authority to push personal or organizational agendas; Lesson 1926 — The Honest Broker Role
aesthetic mappings: define *how* your data becomes *visible*.; Lesson 1341 — Data and Aesthetic Mappings Lesson 1348 — The Base Layer: ggplot() and Data Mapping
Aesthetics (aes): How variables map to visual properties like x-position, y-position, color, size, or shape; Lesson 1339 — What is the Grammar of Graphics? Lesson 1340 — The Seven Layers of Grammar
Affected by extremes: One very high or very low value (an outlier) can pull the mean in that direction; Lesson 39 — The Mean (Arithmetic Average)
After 3NF: Lesson 1066 — Third Normal Form (3NF)
After denormalizing: Lesson 1077 — Measuring Performance Impact of Denormalization
After testing: If your first-differenced series still fails stationarity tests (ADF, KPSS); Lesson 736 — Higher-Order Differencing
Age: Lesson 495 — Confounding Variables Lesson 1888 — Protected Classes and Sensitive Attributes
Age groups: A person in the "18-25" bracket cannot also be in the "26-35" bracket; Lesson 81 — Mutually Exclusive Events
Age in customer data: Even if someone's age of 150 falls within 3 standard deviations (passing the Z-score test), you know it's invalid—humans don't live that long.; Lesson 75 — Domain-Specific Outlier Rules
Aggregate: your data; Lesson 973 — Nested Subqueries in FROM Lesson 994 — CTEs for Simplifying Complex Joins Lesson 1827 — Transformation Patterns: Map, Filter, Aggregate
Aggregate Functions: – Count or measure each bin; Lesson 912 — Fundamental Difference: Filter Timing
Aggregate functions calculate: Totals, averages, counts are computed for each group; Lesson 915 — Combining WHERE and HAVING
Aggregated summaries: Storing `total_order_value` on a customer record instead of calculating it from order lines each time.; Lesson 1074 — Duplicating Data Across Tables
Aggregation: Switch to hexbin maps or heatmaps for dense datasets; Lesson 1310 — Point Maps and Scatter Plots on Maps
Aggregation problems: Lesson 1245 — Misleading Aggregations and Binning
Aggregations: Sum, count, mean calculations that don't need all data simultaneously; Lesson 1800 — Chunked Reading with read_csv
Agreed upon: Stakeholders buy in *before* analysis begins; Lesson 2094 — Defining Success Metrics Upfront
Agreement is confidence: When both tests agree (ADF rejects + KPSS doesn't reject), you can confidently call the series stationary.; Lesson 718 — Interpreting Stationarity Test Results
Agricultural data: might follow growing seasons that vary by region; Lesson 746 — Choosing Seasonal Period
AIC: Akaike Information Criterion; like **BIC**, it can compare non-nested models, where the Partial F-Test cannot be used; Lesson 626 — Nested vs Non-Nested Models Lesson 660 — Choosing the Polynomial Degree Lesson 700 — AIC and BIC for Model Selection Lesson 781 — Information Criteria: AIC and BIC Lesson 785 — Information Criteria: AIC and BIC
AIC (Akaike Information Criterion): and **BIC (Bayesian Information Criterion)** are scores that penalize models for using too many parameters while rewarding good fit to the data.; Lesson 781 — Information Criteria: AIC and BIC
AIC and BIC: explicitly trade off fit quality against model size; Lesson 632 — Parsimony and Occam's Razor Lesson 791 — Comparing Nested and Non-Nested Models
AIC/BIC: Formal model selection procedures, comparing non-nested models, automated selection algorithms.; Lesson 616 — Adjusted R-Squared vs Other Criteria
Airbnb: Nights booked (value = accommodations secured); Lesson 1604 — What is a North Star Metric? Lesson 1606 — Examples of North Star Metrics by Industry
Airflow: offers multiple ways to declare dependencies:; Lesson 1843 — Declaring Dependencies in Orchestration Tools
Alation: A data catalog tool that, like **Apache Atlas**, maintains a centralized inventory of your data assets.; Lesson 1164 — Tools for Lineage Tracking
Alert and Continue: Lesson 1866 — Handling Failed Quality Checks
Alertness: (coffee directly increases alertness); Lesson 1469 — Building a Simple Causal DAG
Algorithm initialization: Neural networks, k-means clustering, random forests all start with random states; Lesson 2055 — Why Randomness Matters in Data Science
Algorithmic amplification of harm: occurs when automated systems take existing problems—bias, misinformation, manipulation, or discrimination—and multiply their impact exponentially.; Lesson 1923 — Algorithmic Amplification of Harm
Align with business reality: Reflect how your sales team actually closes deals; Lesson 1731 — Custom Rule-Based Attribution
Aligned: with long-term business value; Lesson 1478 — Defining Success Metrics Lesson 2094 — Defining Success Metrics Upfront
Aligns Teams: Lesson 1605 — Characteristics of Good North Star Metrics
all: the uncertainty to one tail instead of splitting it between two tails.; Lesson 275 — One-Sided Confidence Bounds Lesson 729 — Calculating Partial Autocorrelations Lesson 866 — The AND Operator Lesson 928 — LEFT JOIN vs INNER JOIN: When to Use Each Lesson 963 — ANY and ALL Operators Lesson 1407 — The ESD Component Lesson 1513 — Always-Valid Inference and Confidence Sequences Lesson 1753 — Customer Acquisition Cost (CAC): Components and Calculation (+1 more)
All assumptions met: → Proceed with standard parametric t-test; Lesson 383 — Diagnostic Workflow: When to Proceed or Switch Tests
Allocate budget wisely: Identify which touchpoints assist vs.; Lesson 1719 — The Customer Journey and Touchpoints
Allow multiple users: to access data simultaneously; Lesson 842 — What is a Database?
Allowed Values: Valid ranges for numeric data or enumerated categories; Lesson 2064 — Creating Data Dictionaries
Alpha: controls how much weight recent observations get when updating the **baseline level** of your series.; Lesson 769 — Smoothing Parameters: Alpha, Beta, Gamma
Alpha (α): – your significance level (usually 0.; Lesson 344 — Power Analysis in Study Design
alphabetical order: to select the reference category.; Lesson 646 — Reference Categories in Statistical Software Lesson 1178 — Bar Charts for Categorical Data
Alt text: (alternative text) is a brief written description of a visualization that screen readers can announce.; Lesson 1250 — Text Alternatives and Screen Reader Compatibility
Altair charts: use `st.; Lesson 1333 — Displaying Charts and Tables in Streamlit
Alternative: The interaction coefficient differs from zero (it matters); Lesson 654 — Testing Interaction Significance
Alternative (Hₐ): The variables are associated; Lesson 433 — Conducting Fisher's Exact Test
Alternative (H₁): At least one group has different variance; Lesson 450 — Homogeneity of Variance (Homoscedasticity) Lesson 683 — Hypothesis Tests for Individual Coefficients Lesson 787 — Ljung-Box Test for Residual Autocorrelation
Alternative analyses: you considered but didn't choose (and why); Lesson 1949 — Anticipating Questions: Building in Appendices
Alternative hypothesis (H₁): The data does *not* come from a normal distribution; Lesson 205 — Shapiro-Wilk Test Lesson 311 — One-Sided vs Two-Sided Alternatives Lesson 354 — Setting Up Hypotheses for One-Sample t-Test Lesson 378 — Testing Normality: Statistical Tests Lesson 401 — Setting Up Hypotheses for Proportions Lesson 406 — Two-Sample Proportion Test Setup Lesson 500 — Hypothesis Testing Framework for Correlation Lesson 501 — T-Test for Pearson Correlation Significance (+5 more)
Always include confidence intervals: , not just point estimates.; Lesson 1928 — Communicating Uncertainty Honestly
Always increasing: As x increases, F(x) never decreases; Lesson 157 — Cumulative Distribution Functions (CDFs) for Continuous Variables
Always positive: Log-normal variables are strictly greater than zero; Lesson 178 — Log-Normal Distribution: Definition and Properties
Always qualify columns: in multi-table queries, even when names don't conflict—it makes your intent crystal clear; Lesson 922 — Selecting Columns from Joined Tables
Always specify join conditions: that relate the tables using foreign key relationships; Lesson 955 — Avoiding Cartesian Products
Always state units: when reporting slopes ("$150 per square foot," not just "150"); Lesson 525 — Units and Scale in Interpretation
Always try this first: Use pandas' built-in operations that work on entire columns at once.; Lesson 1806 — Parallel Processing with apply() Alternatives
Always unique: Unlike `RANK()` or `DENSE_RANK()`, ties receive different numbers based on arbitrary order; Lesson 1007 — ROW_NUMBER(): Assigning Unique Row Numbers
Always use parentheses: when mixing `AND` and `OR`, even if precedence would give the correct result.; Lesson 870 — Operator Precedence and Parentheses
Always-valid inference: provides p-values and confidence intervals that remain statistically valid *no matter when you stop* — whether you check once, continuously, or at random times you didn't plan ahead.; Lesson 1513 — Always-Valid Inference and Confidence Sequences
Amazon: Number of purchases per month — reflects both customer satisfaction and business sustainability.; Lesson 1606 — Examples of North Star Metrics by Industry
Ambiguity kills analysis: If you're studying "time to employee turnover," does the clock start at date of hire, end of training, or first promotion?; Lesson 803 — Defining the Event and Time Origin
Amplify historical inequities: baked into training data; Lesson 1888 — Protected Classes and Sensitive Attributes
Analogy: If your investment grows 10% one year and shrinks 10% the next, the arithmetic mean says 0% change—but you actually lost money!; Lesson 44 — Geometric and Harmonic Means Lesson 50 — Population vs Sample Variance Lesson 57 — Quantiles: Quartiles, Deciles, and Beyond Lesson 70 — Visual Methods: Box Plots and Scatter Plots Lesson 74 — Multivariate Outlier Detection Lesson 106 — Common Misconceptions About Independence Lesson 126 — From Bernoulli to Binomial: Multiple Trials Lesson 133 — Expectation and Variance of the Geometric Distribution (+57 more)
Analysis becomes consistent: You always know where to find variables; Lesson 1142 — What is Tidy Data?
Analysis cells: Alternate between explaining your approach (markdown) and executing it (code); Lesson 1982 — Literate Programming with Notebooks
Analysis plan: Statistical test you'll use, significance level (usually α = 0.; Lesson 1485 — Documentation and Pre-Registration
Analytical: "Which customer segments have the highest lifetime value, and what acquisition channels bring us those segments?; Lesson 2093 — Translating Business Questions into Analytical Questions
Analytical goal: Are you comparing values, showing distribution, revealing relationships, tracking change over time, or displaying composition?; Lesson 1230 — Choosing the Right Chart Type
Analytics: You need to understand trends and make informed decisions; Lesson 4 — Data Science vs Data Analytics vs Business Intelligence
Analyze and Test: Lesson 25 — The Scientific Method in Data Science
Anchor Member: The starting point—your initial row(s) with no dependencies.; Lesson 996 — Recursive CTEs: Introduction
Anderson-Darling test: is another statistical test that checks whether your data follows a normal distribution, but with a special feature: it gives **more weight to the tails** (the extreme values at both ends) than the K-S test does.; Lesson 207 — Anderson-Darling Test Lesson 449 — Normality of Residuals
Animation: Show changes over time or across a third variable sequentially; Lesson 1329 — Effective Use and Pitfalls of 3D Visualizations
Annotations: draw attention to specific data points or regions.; Lesson 1271 — Adding Legends, Annotations, and Text Lesson 1355 — Layer Order and Plot Composition
Anomalies: Flag anything unusual.; Lesson 1180 — Documenting Univariate Findings Lesson 2087 — Stage 3: Exploratory Data Analysis
Anonymize rather than delete: where possible for retained data; Lesson 1909 — Right to Erasure and Data Retention Policies
Anonymous participation options: when power dynamics exist; Lesson 1918 — Special Populations and Vulnerable Groups
Another example: If β = -0.; Lesson 681 — Interpreting Logistic Regression Coefficients
ANOVA framework: (Analysis of Variance), which decomposes total variation into parts explained by the model versus leftover residuals.; Lesson 618 — Global F-Test for Overall Model Significance
Anscombe's quartet: the famous cautionary tale where four datasets have identical summary statistics but wildly different relationships that only visualization reveals.; Lesson 1222 — Scatter Plots for Relationships
Answers to likely questions: based on past presentations or stakeholder concerns; Lesson 1949 — Anticipating Questions: Building in Appendices
Anticipation: occurs when units change behavior *before* treatment actually occurs.; Lesson 1458 — Common DiD Pitfalls
ANY: Returns `TRUE` if the comparison is true for *at least one* value returned by the subquery; Lesson 963 — ANY and ALL Operators Lesson 1506 — Benjamini-Hochberg Procedure
Any difference: (two-sided/non-directional); Lesson 345 — Directionality in Hypothesis Testing
Any matrix data: Where row/column relationships matter; Lesson 1224 — Heatmaps and Correlation Matrices
Any shape: The original population can be uniform, exponential, Poisson, or anything else.; Lesson 218 — What the Central Limit Theorem States
Apache Airflow: Like **Prefect** and **Dagster**, logs every execution step.; Lesson 1164 — Tools for Lineage Tracking
Apache Atlas: Maintains a centralized inventory of your data assets.; Lesson 1164 — Tools for Lineage Tracking
Apache Spark: emerged as a faster alternative, keeping data in memory when possible and supporting iterative algorithms (essential for machine learning).; Lesson 1764 — The Big Data Technology Landscape
Aperiodicity: The chain doesn't get stuck in cycles; Lesson 1589 — Markov Chains: The Foundation of MCMC
API (Application Programming Interface): is like a restaurant menu for data.; Lesson 21 — APIs and Web Scraping
API compatibility: with pandas.; Lesson 1792 — Familiar Pandas Operations in Dask
API limits: Most services cap free requests per day; Lesson 1315 — Geocoding and Reverse Geocoding
APIs: Requesting data through structured interfaces; Lesson 11 — Data Collection and Acquisition
Appendices: Technical details, additional charts, validation metrics; Lesson 1966 — Report Structure and Executive Summary
Appendix or Technical Supplement: Lesson 1947 — Handling Methodology and Technical Details
Application Logic Burden: Unlike foreign key constraints that enforce referential integrity automatically, you must manually keep denormalized data consistent through careful application code or database triggers.; Lesson 1075 — Handling Data Consistency in Denormalized Schemas
Apply a color scale: where higher retention rates get warmer colors (red, orange) and lower rates get cooler colors (blue, green); Lesson 1649 — Visualizing Cohort Data with Heatmaps
Apply conditional logic: "If the first touch was organic search AND a demo was booked, give search 40%"; Lesson 1731 — Custom Rule-Based Attribution
Apply domain knowledge: could this happen in reality?; Lesson 1209 — Outlier Detection and Investigation
Apply information criteria: Calculate AIC and BIC to balance fit and complexity; Lesson 633 — Practical Model Selection Strategy
Apply insights: Set warranty periods just beyond the steep part of the failure curve; flag high-risk product lines; Lesson 837 — Product Warranty and Failure Analysis
Apply intervention: Only the treatment group sees the new feature, pricing, or campaign; Lesson 1641 — Isolating Effects with Control Groups
Apply removal effect: Remove one channel completely, recalculate conversion probability; Lesson 1733 — Markov Chain Attribution Models
Apply the correction factor: The `n/((n-1)(n-2))` part adjusts for sample size, making the estimate more accurate for smaller datasets.; Lesson 65 — Calculating Skewness
Appropriate dimensions: for target medium; Lesson 1369 — Publication-Ready Plot Styling
AR (AutoRegressive) - p: Lesson 773 — Introduction to ARIMA: Components and Notation
AR (autoregressive) processes: and determining their order.; Lesson 731 — PACF for AR Process Identification
AR process: PACF cuts off sharply; ACF decays gradually; Lesson 731 — PACF for AR Process Identification
AR(1): Only the first lag is significant; all others fall within the confidence bounds; Lesson 731 — PACF for AR Process Identification Lesson 774 — Autoregressive (AR) Models
AR(2): First two lags are significant; lag 3 onward drops off; Lesson 731 — PACF for AR Process Identification Lesson 776 — Identifying AR Order (p) Using PACF
AR(p): First *p* lags are significant, then cutoff; Lesson 731 — PACF for AR Process Identification Lesson 732 — PACF Patterns for Common Models Lesson 774 — Autoregressive (AR) Models
Architectural discussion: Sharing skeleton code to validate design decisions; Lesson 2029 — Draft Pull Requests and WIP Workflows
Area: Crime counts in neighborhoods of different sizes; Lesson 692 — Offset Terms for Exposure Lesson 1232 — Perceptual Accuracy Hierarchy Lesson 1240 — Area and Volume Distortions
Area or volume: (acceptable since ratios are meaningful: "twice as much"); Lesson 1238 — Matching Encoding to Data Type Lesson 1240 — Area and Volume Distortions
ARIMA(1,1,2): , it means:; Lesson 773 — Introduction to ARIMA: Components and Notation
ARMA: models combine both components, so their PACF shows **gradual decay** (influenced by the MA part) rather than a clean cutoff.; Lesson 732 — PACF Patterns for Common Models
ARPU: (Average Revenue Per User) = Monthly Recurring Revenue / Number of Customers; Lesson 1666 — LTV for Subscription Businesses
ARR: is MRR × 12, representing the annualized value of subscriptions.; Lesson 1628 — SaaS Metrics: MRR, ARR, and Logo Churn
Artists: Everything visible on the plot—lines, text, patches, images—are "Artist" objects.; Lesson 1255 — The Anatomy of a Matplotlib Figure
as: Bob.; Lesson 853 — Column Aliases with AS Lesson 990 — Basic CTE Syntax and Structure
Ask: What happens if I reject H₀ when it's actually true?; Lesson 334 — Setting Alpha: Choosing Your Significance Level
Ask "Why" repeatedly: Use the "Five Whys" technique.; Lesson 2102 — Understanding Stakeholder Goals and Constraints
Ask a Question: Lesson 25 — The Scientific Method in Data Science
Ask clarifying questions: When told to "make it more accurate," probe what accuracy means in their context—speed?; Lesson 2105 — Translating Between Technical and Business Language
Ask domain experts: what size effect would matter; Lesson 609 — Practical vs Statistical Significance
Ask questions, don't demand: "Have you considered handling NaN values here?; Lesson 2024 — Code Review Best Practices
Ask specific questions: Lesson 1964 — Testing Visualizations with Audiences
Assess completeness: Are there known gaps, missing periods, or quality issues?; Lesson 2098 — Identifying Data Availability Gaps Early
Assess variance equality: compare standard deviations or use Levene's test (not yet covered formally, but intuitive: do the spreads look similar?); Lesson 290 — Assumptions and Diagnostics for Difference Intervals
Assigns new customers: to the right segment as soon as they arrive; Lesson 1710 — Operationalizing Segments: Scoring and Deployment
Assumes Normal Distribution: Z-scores are easiest to interpret when the data follow a normal distribution.; Lesson 201 — Z-Score Applications and Limitations
Assuming NULLs = zeros: They don't!; Lesson 884 — AVG: Computing Averages
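The point of the entry above can be checked directly: SQL's `AVG` skips NULLs rather than counting them as zero. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `scores` table; the table and column names are assumptions made for illustration.

```python
import sqlite3

# In-memory database with a hypothetical table containing one NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (value REAL)")
conn.executemany(
    "INSERT INTO scores (value) VALUES (?)",
    [(10.0,), (20.0,), (None,)],
)

# AVG ignores the NULL row: (10 + 20) / 2
avg_ignoring_nulls = conn.execute(
    "SELECT AVG(value) FROM scores"
).fetchone()[0]

# Treating NULL as zero must be requested explicitly: (10 + 20 + 0) / 3
avg_nulls_as_zero = conn.execute(
    "SELECT AVG(COALESCE(value, 0)) FROM scores"
).fetchone()[0]
conn.close()

print(avg_ignoring_nulls)  # 15.0
print(avg_nulls_as_zero)   # 10.0
```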
Assumption testing: Early scoping involves assumptions about what matters.; Lesson 2109 — Why Data Science is Inherently Iterative
Assumption Validation: means checking whether your model's prerequisites are met.; Lesson 2089 — Stage 5: Model Development and Validation
Assumptions: "Assumed all temperature readings are in Fahrenheit based on metadata; values outside -50°F to 150°F flagged as suspicious"; Lesson 1162 — Documenting Transformations Lesson 2100 — Documenting Assumptions and Open Questions
Assumptions are severely violated: extreme outliers dominate, variance explodes as X increases, or observations aren't independent; Lesson 555 — When Regression Is and Isn't Appropriate
Assumptions made: Did you assume missing data was random?; Lesson 1917 — Transparency in Analysis and Models
Assumptions matter more: Violations of homogeneity of variance become more problematic; Lesson 468 — Balanced vs Unbalanced Designs
Asymmetric: Unlike the normal distribution, it's not symmetric around its mean; Lesson 178 — Log-Normal Distribution: Definition and Properties
Asymptotic: (the tails approach but never touch zero—technically possible values extend infinitely in both directions); Lesson 169 — The Normal Distribution: Definition and Properties
Asymptotic p-values: rely on large-sample approximations (like the Central Limit Theorem).; Lesson 322 — Exact vs Asymptotic P-Values
Async: Anticipate questions proactively.; Lesson 1957 — Adapting Delivery Format: Live vs Async
Async formats: Must be self-explanatory.; Lesson 1957 — Adapting Delivery Format: Live vs Async
At least 4 accept: Use complement!; Lesson 130 — Calculating Binomial Probabilities
at least one: of the conditions.; Lesson 867 — The OR Operator Lesson 1501 — The Multiple Testing Problem
at most: or **greater than** a certain value using cumulative distribution functions.; Lesson 143 — Cumulative Poisson Probabilities Lesson 165 — Exponential Distribution: PDF and CDF Lesson 275 — One-Sided Confidence Bounds
At most 2 accept: Sum P(X=0) + P(X=1) + P(X=2); Lesson 130 — Calculating Binomial Probabilities
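The "At least 4 accept" and "At most 2 accept" entries can be sketched with the binomial PMF. The values `n = 5` and `p = 0.5` are hypothetical, chosen only to keep the arithmetic easy to verify; the glossary does not state the lesson's actual numbers.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical parameters, for illustration only.
n, p = 5, 0.5

# At most 2: sum P(X=0) + P(X=1) + P(X=2).
at_most_2 = sum(binom_pmf(k, n, p) for k in range(3))

# At least 4: use the complement, 1 - P(X <= 3).
at_least_4 = 1 - sum(binom_pmf(k, n, p) for k in range(4))

print(at_most_2)   # 0.5
print(at_least_4)  # 0.1875
```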
At-Risk Customers: Previously active users showing warning signs—declining usage, skipped payments, reduced session frequency, or negative support interactions.; Lesson 1704 — Customer Lifecycle Stages
Atomicity: All operations in a transaction succeed or all fail—no partial completion; Lesson 1110 — What Are Database Transactions?
ATT: the average effect of treatment *for those who actually received treatment*.; Lesson 1451 — Estimating Treatment Effects from Matched Samples
Attempted invalid insert: Lesson 1056 — Foreign Key Constraints in Practice
Attempted problematic delete: Lesson 1056 — Foreign Key Constraints in Practice
Attribute credit: The difference in conversion probability represents that channel's contribution; Lesson 1733 — Markov Chain Attribution Models
Attribution: You connect marketing spend to actual outcomes—which campaign drove that cohort with 60% Day-30 retention?; Lesson 1711 — What Are Acquisition Channels? Lesson 1736 — MMM vs Attribution: Key Differences Lesson 1744 — Incrementality vs Attribution
Attribution decay: models how influence weakens over time.; Lesson 1639 — Time Windows and Attribution Decay
Audience Engagement: Lesson 1292 — Introduction to Styling: Why Aesthetics Matter
Audience-specific reports: Executive summary vs technical deep-dive; Lesson 1984 — Parameterized Reports
Audit backups: erasure applies there too (eventually); Lesson 1909 — Right to Erasure and Data Retention Policies
Audit regularly: to catch unauthorized secondary uses; Lesson 1915 — Secondary Use and Scope Creep
Audit trail: See who changed what, when, and why through commit messages; Lesson 1990 — What is Version Control and Why Git?
Audit trails: Comply with regulations by tracking what data was used when; Lesson 1871 — Why Version Control for Data? Lesson 1925 — Mitigation Strategies and Responsible Disclosure
Auditability: Each run is logged and traceable; Lesson 1986 — Automated Report Generation Lesson 2123 — Simple Rules Beat Complex Models
Auditing: When stakeholders question your findings, you need to demonstrate data provenance.; Lesson 2062 — Why Data Source Documentation Matters
Augmented Dickey-Fuller (ADF) test: on your transformed series.; Lesson 741 — Testing Stationarity After Transformation
Augmented Dickey-Fuller test: gives you a rigorous, statistical answer.; Lesson 716 — Augmented Dickey-Fuller Test
Authentication Failures: occur when your credentials are wrong or insufficient.; Lesson 1093 — Troubleshooting Connection Issues
Author: Name and email of who made the commit; Lesson 1999 — Viewing Commit History
Auto-correct: known issues with logging (caution required); Lesson 1826 — Data Validation and Schema Enforcement
Autocommit mode: Each SQL statement is automatically committed (saved) immediately after it runs.; Lesson 1111 — Autocommit Mode vs Explicit Transactions
Autocorrelation: (also called serial correlation) is the most common violation.; Lesson 548 — Independence of Observations Lesson 562 — Index Plots and Time-Ordered Residuals Lesson 719 — What is Autocorrelation? Lesson 720 — The Autocorrelation Function (ACF)
Autocorrelation Function (ACF): takes this idea further by systematically calculating these relationships at multiple different lags.; Lesson 720 — The Autocorrelation Function (ACF)
Automate Setup: Lesson 2046 — Best Practices for Environment Management in Teams
Automate the process: write scripts that loop through randomizations and check balance; Lesson 1492 — Rerandomization and Practical Implementation
Automated collection: Setting up systems to continuously gather data; Lesson 11 — Data Collection and Acquisition
Automated pipelines: from raw data to final output; Lesson 1981 — What Makes a Report Reproducible?
Automated validation frameworks: solve this by letting you define expectations once and apply them consistently across datasets, pipelines, and time.; Lesson 1158 — Automated Validation Frameworks
Automatic Deduplication: Duplicate rows are removed automatically; Lesson 999 — UNION: Combining Distinct Results
Automatic derivatives: Calculating gradients for optimization becomes straightforward; Lesson 670 — Why Exponential Family Matters for GLMs
Automating documentation: means writing scripts that inspect your data and generate complete documentation automatically.; Lesson 2067 — Automating Documentation with Code
AutoRegressive Integrated Moving Average: .; Lesson 773 — Introduction to ARIMA: Components and Notation
Availability: Actual operating time ÷ planned production time (accounting for breakdowns, changeovers); Lesson 1636 — Manufacturing Metrics: OEE, Yield, and Cycle Time
Availability SLO: "The pipeline will successfully complete 99.; Lesson 1860 — SLA and SLO Definitions
Average balance method: Use `(Start + End) / 2` to account for growth; Lesson 1671 — Churn Rate Calculation Methods
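A minimal sketch of the average balance method follows. The entry only specifies the `(Start + End) / 2` denominator; dividing the number of churned customers by it is the assumed reading here, and the function name and figures are hypothetical.

```python
def churn_rate_average_balance(
    start_customers: int, end_customers: int, churned: int
) -> float:
    """Churn rate using the average of start and end customer counts.

    Using the average in the denominator accounts for growth (or
    shrinkage) of the customer base during the period.
    """
    average_customers = (start_customers + end_customers) / 2
    if average_customers <= 0:
        raise ValueError("average customer count must be positive")
    return churned / average_customers


# Hypothetical period: 1,000 customers at the start, 1,200 at the end, 55 churned.
rate = churn_rate_average_balance(1000, 1200, 55)
print(f"{rate:.1%}")  # 5.0%
```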
Average Order Value (AOV): Revenue divided by number of orders; Lesson 1516 — Business Metrics: Definition and Examples Lesson 1625 — Cross-Functional Metric Dependencies
Average performers: 25th to 75th percentile; Lesson 61 — Using Percentiles for Comparison and Benchmarking
Average Purchase Value: is the mean revenue per transaction.; Lesson 1663 — Simple LTV: Average Revenue Per Customer
Average those cubes: Sum them all up—this is your "third moment.; Lesson 65 — Calculating Skewness
Average Treatment Effect (ATE): , which answers: "On average, how much did the treatment change the outcome compared to no treatment?"; Lesson 1440 — Treatment Effect Estimation
AVG: , **MIN**, and **MAX**—together with **GROUP BY** to create rich summaries of grouped data.; Lesson 892 — GROUP BY with Different Aggregate Functions Lesson 894 — NULL Values in GROUP BY
AVG(salary): calculates average for each department; Lesson 903 — Combining WHERE and HAVING
Avoid: computing intermediate results you never use; Lesson 1780 — Transformations vs Actions in Spark Lesson 2073 — Naming Conventions for Files and Functions
Avoid "security through obscurity": Don't assume hiding risks makes them disappear; Lesson 1925 — Mitigation Strategies and Responsible Disclosure
Avoid conditioning on colliders: which would create spurious associations; Lesson 1475 — Using DAGs to Guide Analysis
Avoid conditioning on mediators: on the causal path — which would block part of the effect you want to measure; Lesson 1475 — Using DAGs to Guide Analysis
Avoid extrapolation: Don't use your model to predict Y values for X values far from your observed range; Lesson 526 — When the Intercept Has No Meaning
Avoid manipulation: You've learned about truncated axes, area distortions, and cherry-picked ranges—these aren't just technical errors, they're ethical violations when done knowingly.; Lesson 1247 — The Ethics of Visualization Design
Avoid problematic pairs: Red-green, blue-purple, and light green-yellow combinations are particularly troublesome.; Lesson 1248 — Color Blindness and Color Palette Design
Avoid redundant evaluations: Don't call the same function multiple times within different WHEN clauses.; Lesson 1037 — CASE Best Practices and Performance
Avoid unnecessary CTEs: If a simple subquery suffices and is clearer, use it; Lesson 997 — CTE Best Practices and Performance
Avoid vague names like: `script.; Lesson 2073 — Naming Conventions for Files and Functions
Avoiding double-counting: When your data has intentional duplicates but you need unique-value statistics; Lesson 887 — Aggregates with DISTINCT
Axes: (not "axis"!; Lesson 1255 — The Anatomy of a Matplotlib Figure
Axis: objects—the x-axis and y-axis with their tick marks, labels, and scales.; Lesson 1255 — The Anatomy of a Matplotlib Figure
Axis Limits: Control what range of data appears using `set_xlim()` and `set_ylim()`.; Lesson 1270 — Customizing Axes: Labels, Limits, and Scales
Azimuth: The horizontal rotation angle around your plot.; Lesson 1326 — Viewing Angles and Projection Types

B

b(θ): Lesson 665 — Canonical Form of Exponential Family Distributions Lesson 667 — Mean and Variance in the Exponential Family
Backfilling corrupts data: Re-processing historical data could add duplicate aggregations; Lesson 1847 — What is Idempotency?
Background geoms: Large shapes, reference regions, or filled areas; Lesson 1355 — Layer Order and Plot Composition
Bad: `"Pipeline failed processing file"`; Lesson 1857 — Logging Best Practices
Bad (chronological): "We collected transaction data from 2020-2024, cleaned 847 outliers, ran correlation analysis, built three models, and found churn is predicted by login frequency."; Lesson 1942 — The Pyramid Principle: Starting with the Conclusion
Bad (curved pattern): Suggests non-linear relationship; linear regression isn't appropriate; Lesson 557 — The Residuals vs Fitted Values Plot
Bad (funnel shape): Indicates heteroscedasticity; variance increases or decreases with fitted values; Lesson 557 — The Residuals vs Fitted Values Plot
Bad (outliers): Points far from the rest may be influential observations; Lesson 557 — The Residuals vs Fitted Values Plot
Bad example: "Chart about sales"; Lesson 1250 — Text Alternatives and Screen Reader Compatibility
Bad hypothesis: "The new button will improve engagement."; Lesson 1479 — Formulating Hypotheses
Balance: means mixing high-volume/low-margin channels with low-volume/high-ROI ones; Lesson 1716 — Channel Mix and Portfolio Thinking
Balance depth and breadth: .; Lesson 2143 — Continuous Learning and Skill Development
Balance Index Overhead: Every index speeds reads but slows writes.; Lesson 1086 — Index Maintenance and Monitoring
Balance inference: remember that rerandomization changes your p-values slightly (though often negligibly in practice); Lesson 1492 — Rerandomization and Practical Implementation
Balance point: The mean is the value where positive and negative distances from it cancel out perfectly; Lesson 39 — The Mean (Arithmetic Average)
Balance Tables: Create side-by-side summaries showing mean (or proportion) of each covariate in treatment vs.; Lesson 1491 — Covariate Balance and Diagnostics
Balancing groups: Good matches ensure treatment and control groups look similar *before* treatment; Lesson 1445 — The Matching Framework
bar charts: to see frequency distributions.; Lesson 1208 — Distribution Checks for All Variables Lesson 1219 — Bar Charts and Column Charts Lesson 1343 — Statistical Transformations Lesson 1959 — Choosing Familiar Chart Types
Bars: (`geom_bar` or `geom_col`) showing magnitudes as vertical rectangles; Lesson 1342 — Geometric Objects (geoms)
Bartlett's Test: is more **powerful** when your data is truly normal, but it's very sensitive to non-normality—it might reject equal variances simply because your data isn't perfectly bell-shaped, not because variances actually differ.; Lesson 380 — Testing Equal Variances: Levene's and Bartlett's Tests
base layer: created by the `ggplot()` function.; Lesson 1348 — The Base Layer: ggplot() and Data Mapping Lesson 1355 — Layer Order and Plot Composition
Baseball batting averages: (hits per at-bat); Lesson 184 — Beta Distribution: Bounded Between 0 and 1
baseline: to compare our data against.; Lesson 307 — Defining the Null Hypothesis (H₀) Lesson 636 — The Reference Category Lesson 642 — What is a Reference Category?
Baseline variance: Higher variability requires more data; Lesson 1692 — Statistical Significance and Iteration
Basemaps: solve this by providing pre-rendered background images that give your audience familiar reference points—like roads, rivers, city names, and borders.; Lesson 1314 — Basemaps and Map Tiles
Basic execution example: Lesson 2080 — Usage Examples and Running Your Code
Basic Pattern: Lesson 984 — NOT EXISTS for Finding Missing Relationships
Basic syntax: Lesson 859 — IN Operator for Multiple Values Lesson 860 — BETWEEN Operator for Ranges
Batch: (hours-to-days) permits scheduled ETL/ELT runs during off-peak hours.; Lesson 1825 — Designing Pipeline Architecture
Batch is ideal when: Lesson 1824 — Batch vs Streaming Pipelines
Batch pipelines: work like a postal service—collect mail throughout the day, then deliver it all at scheduled times (hourly, daily, nightly).; Lesson 1824 — Batch vs Streaming Pipelines
Bayes' theorem: , combining:; Lesson 1417 — Bayesian Change-Point Detection
Bayesian: "There's a 95% probability the true conversion rate is between 12% and 18%."; Lesson 1564 — Comparing Bayesian and Frequentist Proportion Inference
Bayesian A/B testing: treats the conversion rate as a random variable with a probability distribution.; Lesson 1580 — Bayesian vs Frequentist A/B Testing
Bayesian inference: is the extension of this idea into a full statistical methodology.; Lesson 116 — From Bayes' Theorem to Bayesian Inference
Bayesian Information Criterion (BIC): is a model selection tool that helps you choose between competing regression models.; Lesson 630 — Bayesian Information Criterion (BIC)
Bayesian interpretation: treats probability as a **degree of belief** or **quantification of uncertainty**.; Lesson 1540 — Comparing Bayesian and Frequentist Interpretations
Be honest about uncertainty: "High confidence" or "preliminary estimate" builds trust without undermining your conclusion.; Lesson 1944 — Executive Summary Best Practices
Be selective: Test only coefficients you care about based on theory, not all of them exploratorily.; Lesson 624 — Multiple Testing Considerations
Be specific: Select only columns you need instead of `SELECT *`; Lesson 880 — Performance Considerations and Best Practices Lesson 1679 — Defining Funnel Steps and Events
Be specific and actionable: Instead of "this is confusing," try "Consider renaming `df2` to `customer_features` to clarify what this dataframe contains."; Lesson 2024 — Code Review Best Practices
Be specific and consistent: Lesson 2073 — Naming Conventions for Files and Functions
Bed utilization rate: = (occupied bed-days / available bed-days) measures capacity efficiency.; Lesson 1633 — Healthcare Metrics: Patient Outcomes and Operational Efficiency
Before 3NF (redundant): Lesson 1066 — Third Normal Form (3NF)
Before denormalizing: Lesson 1077 — Measuring Performance Impact of Denormalization
Before-After Measurements: Lesson 369 — When to Use a Paired t-Test
Behavior: feature usage, purchase frequency, engagement level; Lesson 1701 — What is Customer Segmentation?
Behavioral signals: First-time vs.; Lesson 1689 — Multivariate Testing and Personalization
Behavioral traits: New vs.; Lesson 1682 — Segmenting Funnels by User Attributes
below: the hypothesized median.; Lesson 391 — The Sign Test for Medians Lesson 567 — Common Q-Q Plot Patterns: Heavy Tails and Light Tails Lesson 568 — Skewness in Q-Q Plots: Left and Right Deviations
Below 1:1: You're losing money on every customer.; Lesson 1667 — LTV:CAC Ratio and Profitability Lesson 1756 — LTV:CAC Ratio as a Health Metric
Below maximum: `WHERE value < (SELECT MAX(value) FROM table)`; Lesson 964 — Subqueries with Aggregate Functions
Benchmarking salaries: across companies while maintaining confidentiality; Lesson 1903 — Secure Multi-Party Computation
Benchmarks: Compare against industry standards, baseline models, or competitors.; Lesson 1939 — Context and Comparison: Making Numbers Meaningful Lesson 1962 — Contextualizing Numbers
Benefits: Lesson 342 — Alpha Level Trade-offs
Benjamini-Hochberg: (exploratory, control FDR); Lesson 1508 — Pre-Registration and Correction Strategy
Benjamini-Hochberg (BH) procedure: takes a different approach.; Lesson 1506 — Benjamini-Hochberg Procedure
Benjamini-Hochberg (FDR): When you're exploring metrics and can tolerate some false positives; Lesson 1507 — Multiple Testing in A/B Test Variations
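The Benjamini-Hochberg entries above describe a step-up procedure for controlling the false discovery rate. The sketch below is one plain-Python way to write it, assuming a target FDR `q`; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values: list[float], q: float = 0.05) -> list[bool]:
    """Return a reject/keep flag per hypothesis, in the original order.

    Step-up rule: sort the p-values, find the largest rank k such that
    p_(k) <= (k / m) * q, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    largest_passing_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            largest_passing_rank = rank

    rejected = [False] * m
    for idx in order[:largest_passing_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from four tests.
print(benjamini_hochberg([0.005, 0.03, 0.035, 0.5]))
# [True, True, True, False]
```

Note that 0.03 misses its own threshold (2/4 × 0.05 = 0.025) but is still rejected, because the larger p-value 0.035 passes at rank 3. That is what distinguishes a step-up procedure from testing each p-value against its threshold independently.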
Bernoulli trial: is a single experiment or observation that can result in exactly two outcomes: we call one outcome "success" and the other "failure.; Lesson 123 — Bernoulli Trial Definition and Properties Lesson 126 — From Bernoulli to Binomial: Multiple Trials
Bernoulli/Binomial: → Logit link (log(p/(1-p)) = Xβ); Lesson 676 — Canonical vs Non-Canonical Links
Bessel's correction: .; Lesson 50 — Population vs Sample Variance
Best for: Lesson 1226 — Stacked and Grouped Bar Charts
Best practice: Sort only when necessary for your analysis or presentation.; Lesson 880 — Performance Considerations and Best Practices
Best practices: Group only by dimensions you truly need.; Lesson 911 — Performance Considerations with Multiple Groups Lesson 1995 — Committing Changes with git commit
Best use cases: Lesson 1727 — Linear Attribution Model
Beta: controls how quickly the **trend component** (upward or downward direction) updates.; Lesson 769 — Smoothing Parameters: Alpha, Beta, Gamma
Beta distribution: is the natural choice because it:; Lesson 1581 — Setting Priors for A/B Tests
beta posterior: no complex integrals required.; Lesson 1557 — The Beta-Binomial Model Lesson 1579 — Practical Computation of Credible Intervals
Beta-Binomial: conjugacy:; Lesson 1554 — Updating Conjugate Priors with Data
Beta-Binomial conjugate pair: , your posterior is a Beta distribution: `Beta(α + successes, β + failures)`.; Lesson 1562 — Credible Intervals for Proportions
Beta-Binomial model: (proportion problems), if your posterior is `Beta(α, β)`:; Lesson 1561 — Posterior Mean and Mode
Beta(2, 8) prior: (you think a conversion rate is probably low).; Lesson 1560 — Computing the Posterior Distribution
Beta(α, β) prior: representing your initial belief; Lesson 1560 — Computing the Posterior Distribution
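The Beta-Binomial entries above reduce to a simple parameter update: a `Beta(α, β)` prior combined with binomial data gives a `Beta(α + successes, β + failures)` posterior. A minimal sketch, using the `Beta(2, 8)` prior from the entry and hypothetical data (3 successes, 7 failures); the function names are illustrative:

```python
def beta_binomial_update(
    alpha: float, beta: float, successes: int, failures: int
) -> tuple[float, float]:
    """Posterior parameters: Beta(alpha + successes, beta + failures)."""
    return alpha + successes, beta + failures


def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


prior = (2, 8)  # belief that the conversion rate is probably low
posterior = beta_binomial_update(*prior, successes=3, failures=7)

print(posterior)              # (5, 15)
print(beta_mean(*prior))      # 0.2
print(beta_mean(*posterior))  # 0.25
```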
Better analysis: When controlling for smoking status, the correlation disappeared or even reversed, showing coffee might be protective.; Lesson 1426 — Real-World Examples: Correlation vs Causation
Better objective: "Deliver a seamless first-time user experience by Q2"; Lesson 1609 — Setting Effective Objectives
between: events (union → use addition)?; Lesson 91 — Combining Rules in Multi-Step Problems Lesson 441 — Sum of Squares: Total, Between, and Within
Between Groups: (or "Treatment"): Variation explained by differences among your group means; Lesson 444 — The ANOVA Table
Between-group variance (numerator): Measures how spread out the group means are from the overall mean; Lesson 440 — The F-Statistic and Its Distribution
Betweenness centrality: How often a node lies on shortest paths between others (the "bridges"); Lesson 1320 — Network Metrics and Visual Analysis
Beware these traps: Lesson 1694 — Daily Active Users (DAU) and Monthly Active Users (MAU)
BI: You need regular reports on key business metrics; Lesson 4 — Data Science vs Data Analytics vs Business Intelligence
Bias: It ignores everything that happened after the first click—potentially undervaluing nurture campaigns and retargeting that actually closed the deal.; Lesson 1723 — Comparing Single-Touch Models
Bias and noise: Sensor errors, bot traffic, or sampling issues; Lesson 1762 — Extended Dimensions: Veracity and Value
Bias Correction: If your bootstrap distribution is systematically shifted from the sample statistic, BCa corrects for this; Lesson 304 — BCa Bootstrap Intervals: Bias Correction
biased: .; Lesson 552 — Zero Conditional Mean of Errors Lesson 553 — Exogeneity: X Must Be Independent of Errors Lesson 554 — Consequences of Violating Assumptions
Biased assignment: Certain user types might be systematically excluded or included; Lesson 1524 — Sample Ratio Mismatch (SRM)
BIC: but you cannot use the Partial F-Test; Lesson 626 — Nested vs Non-Nested Models Lesson 660 — Choosing the Polynomial Degree Lesson 700 — AIC and BIC for Model Selection Lesson 781 — Information Criteria: AIC and BIC Lesson 785 — Information Criteria: AIC and BIC
BIC (Bayesian Information Criterion): are scores that penalize models for using too many parameters while rewarding good fit to the data.; Lesson 781 — Information Criteria: AIC and BIC
Big Compute: problems occur when the calculations themselves are expensive, even with modest data sizes.; Lesson 1765 — Big Data vs Big Compute
Big Data: problems arise when you have so much data that it won't fit in memory or takes too long to read/write.; Lesson 1765 — Big Data vs Big Compute
Biggest impact: Which step affects the most users?; Lesson 1685 — Actionable Insights from Funnel Analysis
BigQuery: Serverless model; Google manages all infrastructure automatically; Lesson 1813 — Modern Cloud Data Warehouses: Snowflake, BigQuery, Redshift
Bimodal: Two distinct peaks, suggesting two subgroups (e.; Lesson 1175 — Histograms for Distribution Shape
Bimodal or multimodal patterns: (multiple peaks); Lesson 1286 — Violin Plots and Distribution Shape
Bin data: `stat_bin()` aggregates continuous data into intervals; Lesson 1352 — Statistical Transformations with stat_* Layers
Binary assets: that must be versioned with code; Lesson 2033 — Git Large File Storage (LFS) for Data Assets
Binary or semi-structured: Git can't show meaningful diffs, so every change duplicates the entire file; Lesson 2070 — Separating Data from Code
Binary outcomes: Success or failure; Lesson 131 — Real-World Applications of Binomial Distributions Lesson 435 — McNemar's Test: Paired Categorical Data Lesson 678 — Choosing the Right Link Function
Binning matters: Too few bins and you miss important details; too many bins and you see noise instead of pattern.; Lesson 1220 — Histograms for Continuous Distributions
Binning problems: Lesson 1245 — Misleading Aggregations and Binning
Binomial: tracks "k successes in n trials with probability p each"; Lesson 142 — Poisson as Limit of Binomial Lesson 154 — Real-World Use Cases: Customer Behavior and Events Lesson 664 — What is the Exponential Family of Distributions?
Binomial data: k successes in n trials; Lesson 1560 — Computing the Posterior Distribution
binomial distribution: enters the picture.; Lesson 126 — From Bernoulli to Binomial: Multiple Trials Lesson 127 — Binomial Distribution PMF Lesson 153 — Real-World Use Cases: Quality Control and Defects Lesson 154 — Real-World Use Cases: Customer Behavior and Events Lesson 669 — The Dispersion Parameter φ
Biological Gradient: Is there a dose-response relationship?; Lesson 498 — Bradford Hill Criteria for Causation
Blended CAC: weighted average across all channels; Lesson 1716 — Channel Mix and Portfolio Thinking Lesson 1754 — Blended CAC vs Paid CAC
Block randomization: divides the assignment process into small "blocks" of fixed size, ensuring balance within each block.; Lesson 1488 — Block Randomization
Blood pressure: readings across populations; Lesson 179 — When Variables Are Log-Normally Distributed
BLUE: the Best Linear Unbiased Estimator.; Lesson 521 — Properties of Least Squares Estimators
Bonferroni: Divide your α by the number of tests (conservative, appropriate for critical decisions); Lesson 1507 — Multiple Testing in A/B Test Variations Lesson 1508 — Pre-Registration and Correction Strategy
Bonferroni Correction: is a conservative, straightforward method to control this risk.; Lesson 458 — Bonferroni Correction Lesson 512 — Testing Significance in Correlation Matrices Lesson 624 — Multiple Testing Considerations Lesson 824 — Multiple Group Comparisons
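The Bonferroni entries above amount to testing each p-value against α divided by the number of tests. A minimal sketch with hypothetical p-values; the function name is an assumption, and rejecting at `p <= alpha / m` is one common convention:

```python
def bonferroni_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Hypothetical p-values from four tests; the threshold is 0.05 / 4 = 0.0125.
print(bonferroni_reject([0.005, 0.03, 0.035, 0.5]))
# [True, False, False, False]
```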
Books: table has a primary key `book_id`; Lesson 1051 — Introduction to Foreign Keys
BOOLEAN: True/False values; Lesson 846 — Tables, Schemas, and Data Types
Boolean logic: every condition produces either TRUE or FALSE.; Lesson 865 — Introduction to Logical Operators in SQL
bootstrap distribution: shows you the range and variability of what your estimate might be, forming the foundation for confidence intervals.; Lesson 299 — How Bootstrap Resampling Works Lesson 300 — Bootstrap Distribution of a Statistic
Bootstrap methods: work by resampling *with replacement* from your observed data thousands of times.; Lesson 291 — Non-Parametric Alternatives for Difference Intervals
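To illustrate resampling with replacement for a difference interval, here is a sketch of a percentile bootstrap for the difference in means of two groups. The function name, the sample data, and the choice of the simple percentile method are assumptions made for illustration; other bootstrap intervals (such as BCa) differ in how the bounds are read off.

```python
import random
import statistics


def bootstrap_diff_ci(group_a, group_b, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap interval for mean(group_a) - mean(group_b)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(n_resamples):
        # Resample each group independently, with replacement, at its original size.
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.fmean(resample_a) - statistics.fmean(resample_b))

    diffs.sort()
    tail = (1 - confidence) / 2
    lower = diffs[int(tail * n_resamples)]
    upper = diffs[int((1 - tail) * n_resamples) - 1]
    return lower, upper


# Hypothetical measurements for two groups.
group_a = [12, 15, 14, 10, 13, 16, 11, 14]
group_b = [9, 8, 11, 7, 10, 9, 12, 8]
lower, upper = bootstrap_diff_ci(group_a, group_b)
print(f"95% bootstrap interval for the difference in means: [{lower:.2f}, {upper:.2f}]")
```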
Bootstrap shines: when:; Lesson 305 — When to Use Bootstrap vs Traditional Methods
Bootstrapping: Resampling methods generate different samples; Lesson 2055 — Why Randomness Matters in Data Science
Boston's coefficient: doesn't appear (it's built into the intercept); Lesson 643 — Interpreting Coefficients Relative to Reference
both: domains.; Lesson 7 — The Data Science Skill Stack Lesson 87 — Multiplication Rule for Independent Events Lesson 276 — Sampling Distribution of a Proportion Lesson 570 — Q-Q Plots vs Formal Normality Tests: When Visual Checks Matter Lesson 688 — Effect Size and Practical Significance Lesson 939 — FULL OUTER JOIN with Multiple Conditions Lesson 1001 — INTERSECT: Finding Common Rows
Both ACF and PACF: Gradual decay (no clear cutoff); Lesson 733 — Using ACF and PACF Together
Both are Aces: P(A ∩ B) = (4/52) × (3/51) ≈ 0.; Lesson 104 — Dependent Events and Joint Probability
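The product in the entry above can be checked with exact fractions. This sketch assumes a standard 52-card deck and two draws without replacement; the variable names are illustrative.

```python
from fractions import Fraction

# First draw: 4 aces among 52 cards.
p_first_ace = Fraction(4, 52)

# Second draw depends on the first: 3 aces remain among 51 cards.
p_second_ace_given_first = Fraction(3, 51)

# Multiplication rule for dependent events: P(A and B) = P(A) * P(B | A).
p_both_aces = p_first_ace * p_second_ace_given_first

print(p_both_aces)  # 1/221
```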
Both say non-stationary: Apply differencing or detrending.; Lesson 718 — Interpreting Stationarity Test Results
Both say stationary: Proceed with modeling—no transformation needed.; Lesson 718 — Interpreting Stationarity Test Results
Both transformed: With `log(Y) = β₀ + β₁log(X)`, β₁ becomes an *elasticity*—the percent change in Y per 1% change in X.; Lesson 594 — Interpreting Models After Transformation
Bottom layers: Technical details, methodology, data sources—available as appendices if questioned.; Lesson 1952 — The Pyramid Principle: Leading with Conclusions
Bottom/reference: Common ancestor (the base); Lesson 2019 — Using Diff Tools for Conflict Resolution
box plot: (or box-and-whisker plot) turns the five-number summary into a visual.; Lesson 59 — The Five-Number Summary and Box Plots Lesson 70 — Visual Methods: Box Plots and Scatter Plots Lesson 1176 — Box Plots for Spread and Outliers Lesson 1268 — Box Plots and Violin Plots
Box plots: show the distribution through **five key numbers**: the minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum.; Lesson 1223 — Box Plots and Violin Plots Lesson 1268 — Box Plots and Violin Plots Lesson 1343 — Statistical Transformations
Box-Cox transformation: solves this by testing a *family* of power transformations, controlled by a single parameter called **lambda (λ)**.; Lesson 214 — Box-Cox Transformation Lesson 593 — Box-Cox Transformation
boxplot: draws a box from the first quartile (Q1) to the third quartile (Q3), with a line at the median.; Lesson 55 — Visualizing Spread Lesson 1285 — Categorical Plots: stripplot, swarmplot, boxplot
Boy Scout Rule: Leave code slightly cleaner than you found it whenever you touch it; Lesson 2137 — Refactoring Strategies and Debt Paydown
Branch A: changes line 42 of `analysis.; Lesson 2010 — Merge Conflicts: What They Are
Branch B: changes that *same* line 42 a different way; Lesson 2010 — Merge Conflicts: What They Are
Branches solve this problem: You can:; Lesson 2005 — What are Branches and Why Use Them?
Branching logic: After analyzing a dataset, you might trigger different validation pipelines depending on data quality scores or record counts.; Lesson 1844 — Dynamic Dependencies
Brand awareness campaigns: where every impression matters similarly; Lesson 1727 — Linear Attribution Model
Brand awareness efforts: Which channels are best at introducing new prospects?; Lesson 1720 — First-Touch Attribution Model
Breadth: means splitting into many parallel branches at fewer levels.; Lesson 1623 — Depth vs Breadth in Metric Trees
break-even ROAS: (the minimum ROAS needed to cover all costs) is critical.; Lesson 1751 — Return on Ad Spend (ROAS): Definition and Calculation Lesson 1752 — Target ROAS and Break-Even Analysis
Breaking changes: (renaming, deleting columns, changing data types) require careful handling:; Lesson 1876 — Schema Evolution and Backwards Compatibility
Breaking it down: Lesson 2017 — Understanding Merge Conflicts
Breaking point: Above 10-20 GB (or ~50% of available RAM), Pandas becomes unreliable or crashes; Lesson 1783 — Data Size Thresholds: When Pandas Isn't Enough
Brief Mention with Signpost: Lesson 1947 — Handling Methodology and Technical Details
Bright spots: Anomalously high retention cohorts teach you what worked; Lesson 1649 — Visualizing Cohort Data with Heatmaps
Broken LTV:CAC ratio: If churn is too high, you may never recover acquisition costs; Lesson 1670 — What is Churn and Why It Matters
Bubble charts: extend this by encoding a third numeric variable through the **size of each point (bubble)**.; Lesson 1229 — Bubble Charts for Three Variables
Bucket users: into tiers (e.; Lesson 1699 — Engagement Scoring Systems
Budget: What resources are available?; Lesson 2102 — Understanding Stakeholder Goals and Constraints
Budget optimization: Shift resources to channels with real impact; Lesson 1718 — Introduction to Marketing Attribution Lesson 1742 — Budget Optimization Using MMM
Bug fixes: Create a `hotfix` branch to quickly patch issues; Lesson 2005 — What are Branches and Why Use Them?
Build a bootstrap distribution: of the test statistic under H₀; Lesson 396 — Bootstrap Hypothesis Testing
Build backup slides: with technical details for deep dives; Lesson 1956 — Anticipating and Addressing Audience Questions
Build comprehensive models: Capture the full story your data tells; Lesson 1190 — Introduction to Multivariate Analysis
Build intuition: before diving into formulas; Lesson 259 — Simulating Sampling Distributions
Build the transition graph: Map all observed customer journeys as state transitions (e.; Lesson 1733 — Markov Chain Attribution Models
Build trust incrementally: Regular check-ins demonstrate progress and keep stakeholders engaged.; Lesson 2111 — Fast Feedback Loops with Stakeholders
Building confidence intervals: using standard formulas; Lesson 202 — Why Test for Normality? Lesson 265 — Using Standard Error in Practice
Built-in transformations: (ggplot2, some Seaborn):; Lesson 1373 — Statistical Transformations: Built-in vs Manual
Burden of proof: The prosecution must prove guilt beyond reasonable doubt; Lesson 312 — Hypothesis Testing as a Legal Analogy
Burn-in: refers to discarding the first portion of your MCMC samples—typically the first 10-50% of iterations.; Lesson 1592 — Burn-in, Thinning, and Convergence Diagnostics
Business → Technical: When a stakeholder says "We need to reduce customer churn," you translate this into: "Build a classification model predicting 30-day cancellation probability, optimized for recall since false negatives cost more than false positives, using histori...; Lesson 2105 — Translating Between Technical and Business Language
Business decisions can't wait: Supply chain adjustments based on current demand patterns; Lesson 1788 — Streaming Data and Real-Time Requirements
Business documentation: Process flows, compliance rules, product specs; Lesson 1201 — Domain Knowledge as a Hypothesis Source
Business impact: Would a 0.; Lesson 1480 — Minimum Detectable Effect (MDE) Lesson 1858 — Alerting Strategies Lesson 2141 — Building a Portfolio and Personal Brand
Business implication: If you're a pool safety company, don't target ice cream shops for partnerships based on this correlation—focus on seasonal weather patterns instead.; Lesson 1426 — Real-World Examples: Correlation vs Causation
Business intelligence: "What's our cheapest product?; Lesson 885 — MIN and MAX: Finding Extremes
Business Intelligence (BI) professional: creates a dashboard showing last quarter's sales by region; Lesson 4 — Data Science vs Data Analytics vs Business Intelligence
Business KPIs: Sales, transactions, or user activity with weekly or monthly seasonality; Lesson 1411 — Applications and Limitations
Business logic violations: Withdrawing more money than available; Lesson 1109 — Input Validation and Defense in Depth
Business metrics: A handful of products generate most revenue; Lesson 191 — Pareto Principle and the 80/20 Rule Lesson 1522 — Balancing Speed and Accuracy in Metric Selection
Business needs evolve: What mattered last quarter might not matter now; Lesson 15 — Deployment, Monitoring, and Iteration
Business processes: How does data flow through the organization?; Lesson 1168 — Understanding Domain Context
Business question: "Why are customers leaving?; Lesson 2085 — Stage 1: Problem Definition and Scoping
Business relevance: It forces you to ask: "What size of change would actually move the needle for our business?; Lesson 1494 — Effect Size: The Minimum Detectable Effect
Business requirements: Must results be explainable to non-technical stakeholders?; Lesson 1169 — Clarifying Assumptions and Constraints
Business rule checks: Lesson 1211 — Domain Validation and Sanity Checks
Business rules: (minimum/maximum prices, valid categories); Lesson 75 — Domain-Specific Outlier Rules
Business strategy shifts: If your company pivots from growth-at-all-costs to sustainable profitability, your North Star metric and its supporting branches must change.; Lesson 1626 — Maintaining and Evolving Metric Trees
Business Understanding: Knowing how organizations actually work helps you focus on problems that matter, not just technically interesting puzzles.; Lesson 7 — The Data Science Skill Stack
Business Value Side: Lesson 2118 — Cost-Benefit Analysis for Continued Work
Business-defined thresholds: E.; Lesson 1669 — LTV Segmentation and Targeting
Business-Friendly Labels: Instead of "Cluster 3," assign meaningful names like "High-Value Loyalists," "At-Risk Champions," or "New Bargain Hunters.; Lesson 1709 — Segment Profiling and Interpretation
Busy executives: get the answer immediately; Lesson 1942 — The Pyramid Principle: Starting with the Conclusion
Buttons: trigger specific actions when clicked:; Lesson 1332 — Streamlit Widgets: Inputs and Controls
By callable function: Lesson 1801 — Column Selection and Usecols
By cohort/channel: Some channels may have better LTV:CAC but slower payback, affecting budget allocation; Lesson 1757 — Payback Period: Definition and Importance
By Hand: Lesson 542 — Computing Fitted Values and Residuals
by how much: A confidence interval for the difference gives you a range of plausible values for the true difference between two population proportions.; Lesson 412 — Confidence Interval for Difference Lesson 1955 — Framing Insights in Business Language

C

C(n-1, r-1): counts arrangements of those *r-1* successes; Lesson 135 — The Negative Binomial Distribution: Waiting for r Successes
C(n, k): The number of ways to choose k successes from n trials (called "n choose k" or the binomial coefficient); Lesson 127 — Binomial Distribution PMF
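As a minimal sketch of how C(n, k) is used, the snippet below computes the coefficient with Python's standard-library `math.comb` and plugs it into the binomial PMF. The helper name `binomial_pmf` and the example numbers (5 trials, 2 successes, p = 0.5) are illustrative assumptions, not taken from the lesson.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for n independent trials with success probability p (hypothetical helper)."""
    # math.comb(n, k) is the binomial coefficient "n choose k".
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Illustrative values: 2 successes in 5 trials with a fair coin.
ways = math.comb(5, 2)            # number of ways to place 2 successes among 5 trials
prob = binomial_pmf(2, 5, 0.5)    # probability of exactly 2 successes
print(ways, prob)
```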
Caching: means storing the results of expensive computations so you can reuse them instead of recalculating.; Lesson 1337 — Dashboard Performance and Caching Lesson 1782 — Spark Performance Basics: Partitions and Caching
Calculate: standardized residuals for each cell after your significant Chi-Squared test; Lesson 428 — Post-Hoc Analysis and Residuals Lesson 994 — CTEs for Simplifying Complex Joins
Calculate a p-value: If you observe an extreme imbalance (e.; Lesson 391 — The Sign Test for Medians
Calculate cumulative contribution: to the total metric; Lesson 1698 — Power User Curves and Engagement Distribution
Calculate difference: Treatment effect = (Treatment metric) - (Control metric); Lesson 1641 — Isolating Effects with Control Groups
Calculate differences: Subtract the hypothesized median from each observation; Lesson 391 — The Sign Test for Medians
Calculate error metrics: Compare predictions to actual values; Lesson 790 — Out-of-Sample Forecast Evaluation
Calculate forecast errors: on your data; Lesson 772 — Holt-Winters Parameter Optimization
Calculate incrementality: Lift = (Test performance - Expected baseline) ÷ Expected baseline; Lesson 1746 — Geo-Lift Experiments Lesson 1747 — Ghost Ads and PSA Tests
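The lift formula above can be sketched as a small function. The name `incremental_lift` and the sample figures are hypothetical; the arithmetic follows the definition in the entry.

```python
def incremental_lift(test_performance: float, expected_baseline: float) -> float:
    """Lift = (test performance - expected baseline) / expected baseline."""
    if expected_baseline == 0:
        # The ratio is undefined when there is no baseline to compare against.
        raise ValueError("expected_baseline must be non-zero")
    return (test_performance - expected_baseline) / expected_baseline


# Illustrative numbers: 1,150 conversions in the test region vs. 1,000 expected.
lift = incremental_lift(1150, 1000)
print(f"{lift:.1%}")
```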
Calculate intercept β₀: Use β₀ = ȳ - β₁x̄; Lesson 522 — Implementing Least Squares from Scratch
Calculate LTV per cohort: by summing or projecting total revenue per customer in that group; Lesson 1664 — Cohort-Based LTV Calculation
Calculate P(A and B): the probability both events occur together; Lesson 102 — Testing for Independence
Calculate P(A): the probability of event A occurring; Lesson 102 — Testing for Independence
Calculate P(B): the probability of event B occurring; Lesson 102 — Testing for Independence
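The three probabilities above are typically combined into one check: A and B are independent when P(A and B) equals P(A) × P(B). The sketch below assumes a single roll of a fair six-sided die and two made-up events; exact fractions avoid floating-point comparison issues.

```python
from fractions import Fraction

# Illustrative sample space: one roll of a fair six-sided die.
outcomes = range(1, 7)
A = {x for x in outcomes if x % 2 == 0}  # event A: the roll is even
B = {x for x in outcomes if x <= 2}      # event B: the roll is 1 or 2


def prob(event: set) -> Fraction:
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))


p_a = prob(A)
p_b = prob(B)
p_ab = prob(A & B)  # both events occur together

# Independence holds when the joint probability equals the product.
independent = p_ab == p_a * p_b
print(p_a, p_b, p_ab, independent)
```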
Calculate paired differences: (just like the paired t-test or Sign Test); Lesson 392 — Wilcoxon Signed-Rank Test
Calculate probabilities: Convert to the standard normal distribution (from your previous lesson) to find exact probabilities; Lesson 195 — Z-Score Definition and Interpretation
Calculate slope β₁: Use the formula involving sums of products and squared deviations from the mean; Lesson 522 — Implementing Least Squares from Scratch
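A minimal from-scratch sketch of the slope and intercept steps, assuming simple linear regression with one predictor. The function name `fit_line` and the toy data are illustrative; the intercept line uses β₀ = ȳ - β₁x̄ as stated in the glossary.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (beta0, beta1) for the least-squares line y = beta0 + beta1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of products of deviations and sum of squared deviations from the mean.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = s_xy / s_xx             # slope
    beta0 = y_bar - beta1 * x_bar   # intercept
    return beta0, beta1


# Toy data lying exactly on y = 1 + 2x, so the fit should recover those values.
beta0, beta1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(beta0, beta1)
```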
Calculate statistics: like mean, median, min, max, std, or count for each group; Lesson 1185 — Grouped Summary Statistics
Calculate the business impact: in dollars, time, or customers; Lesson 1956 — Anticipating and Addressing Audience Questions
Calculate the expected value: E(X) = Σ [outcome × probability]; Lesson 152 — Decision Making Under Uncertainty
Calculate the F-Statistic: Lesson 447 — Conducting One-Way ANOVA in Practice
Calculate the mean: of that sample; Lesson 222 — Visualizing the CLT with Simulations
Calculate the p-value: as the proportion of permuted statistics as extreme or more extreme than your observed value; Lesson 395 — Permutation Tests for Means and Beyond
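One way to sketch the p-value step, assuming a two-sided permutation test on the difference in group means. The function name, the number of permutations, and the fixed seed are all illustrative choices rather than the lesson's own implementation.

```python
import random


def permutation_p_value(a, b, n_perm=5000, seed=0):
    """Proportion of permuted statistics at least as extreme as the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        perm_a, perm_b = pooled[: len(a)], pooled[len(a):]
        stat = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if stat >= observed:
            extreme += 1
    return extreme / n_perm


# Made-up samples with clearly different means.
p_separated = permutation_p_value([10, 11, 12, 13, 14], [1, 2, 3, 4, 5])
print(p_separated)
```

A small p-value here says that few random relabelings produce a difference as large as the observed one.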
Calculate the rolling average: at each point using that window; Lesson 739 — Moving Average Detrending
Calculate the tail probabilities: For 95%, that's (1 - 0.; Lesson 1575 — Computing Equal-Tailed Credible Intervals
Calculate the test statistic: using only b and c; Lesson 436 — Conducting McNemar's Test
Calculate the treatment effect: within each stratum; Lesson 1430 — Controlling for Confounders: Stratification
Calculate the U statistic: based on these rank sums; Lesson 393 — Mann-Whitney U Test (Wilcoxon Rank-Sum)
Calculate transition probabilities: For each state, determine the likelihood of moving to the next state; Lesson 1733 — Markov Chain Attribution Models
Calculate your statistic: (median, correlation, ratio, etc.; Lesson 306 — Bootstrap for Non-Standard Problems
Calculate your test statistic: (you learned this in lesson 316); Lesson 319 — Calculating P-Values from Test Statistics
Calculate your Z-score: from raw data (you've already learned this!; Lesson 198 — Using Z-Tables for Probability
Calculated fields: Storing computed values (like `order_total`) instead of recalculating from line items every time.; Lesson 1071 — When to Denormalize: Performance Trade-offs
Calculates the average rank: for each group; Lesson 471 — Kruskal-Wallis H Test: The Non-Parametric One-Way ANOVA
Calculating date differences: Lesson 1040 — Date Arithmetic and INTERVAL Operations
Calculating differences: Compare current vs previous values (sales growth, price changes); Lesson 1023 — Introduction to Window Functions: LAG and LEAD
Calculating the posterior distribution: means applying Bayes' theorem to compute exactly how probable each parameter value is, given both your starting assumptions and the observed data.; Lesson 1545 — Calculating the Posterior Distribution
Calculations: Computing percentages or ratios using aggregates; Lesson 967 — Subqueries in the SELECT Clause
Calibration (Predictive Parity): Lesson 1887 — Defining Fairness in Data Science
Caliper Matching: adds a safety rule: only match if the propensity scores are within a maximum distance (the "caliper").; Lesson 1448 — Propensity Score Matching Methods
Call Centers: A help desk receives 30 calls per day on average.; Lesson 144 — Poisson Applications: Arrivals and Events
Call-in polls: Only passionate viewers with free time participate; Lesson 246 — Volunteer and Self-Selection Bias
Campaign A: 70% chance of $50,000 profit, 30% chance of $0; Lesson 152 — Decision Making Under Uncertainty
Campaign B: 40% chance of $100,000 profit, 60% chance of a $10,000 loss; Lesson 152 — Decision Making Under Uncertainty
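The Campaign A and Campaign B entries can be compared with the expected-value formula E(X) = Σ [outcome × probability]. The sketch below uses the figures from those two entries; the function and variable names are illustrative.

```python
def expected_value(outcomes: list[tuple[float, float]]) -> float:
    """E(X) = sum of outcome * probability over all (outcome, probability) pairs."""
    return sum(value * prob for value, prob in outcomes)


# (profit in dollars, probability), taken from the glossary entries.
campaign_a = [(50_000, 0.70), (0, 0.30)]
campaign_b = [(100_000, 0.40), (-10_000, 0.60)]  # the loss enters as a negative outcome

ev_a = expected_value(campaign_a)
ev_b = expected_value(campaign_b)
print(round(ev_a), round(ev_b))
```

Under these numbers the expected values come out to roughly $35,000 for A and $34,000 for B, so the two campaigns are close on average even though B carries a chance of a loss.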
cannot: compare unstandardized coefficients across predictors with different units:; Lesson 605 — Units and Scaling of Coefficients Lesson 899 — HAVING vs WHERE: Key Differences Lesson 1011 — Filtering on Window Function Results Lesson 1574 — Credible Intervals vs Confidence Intervals Lesson 1906 — Legal Bases for Processing Personal Data Lesson 1932 — Building Trust Through Transparency
canonical link: for binomial (binary) outcomes, meaning it naturally pairs with the exponential family representation of the binomial distribution.; Lesson 673 — The Logit Link Lesson 676 — Canonical vs Non-Canonical Links Lesson 678 — Choosing the Right Link Function Lesson 690 — The Poisson Distribution as a GLM
Capture non-linear monotonic patterns: (a curved upward trend still gets positive correlation); Lesson 486 — Spearman's Rank Correlation Coefficient
Cardinality: = number of unique values.; Lesson 1080 — When to Create an Index Lesson 1083 — Index Selectivity and Cardinality Lesson 1867 — Data Profiling and Monitoring
Career advancement: Publishing sensational findings (even if overstated) could boost your reputation.; Lesson 1930 — Managing Conflicts of Interest
Carryover effect: Advertising impact persists and decays over time, like a drug slowly leaving your bloodstream; Lesson 1739 — Adstock and Carryover Effects
Cartesian product: first—every row from `orders` paired with every row from `customers`—then filters it.; Lesson 925 — INNER JOIN vs WHERE: Join Order Matters Lesson 942 — Understanding CROSS JOIN Syntax and Mechanics Lesson 943 — CROSS JOIN Results: Size and Structure Lesson 955 — Avoiding Cartesian Products
CartoDB: Clean, minimal styles for data-first presentations; Lesson 1314 — Basemaps and Map Tiles
CASCADE: automatically propagates the change to child records:; Lesson 1054 — Cascading Actions: DELETE and UPDATE Lesson 1057 — ON DELETE and ON UPDATE Actions
Case Studies: simulate real problems: "How would you measure the success of a new feature?; Lesson 2142 — Interviewing: Technical and Behavioral Prep
Cash-constrained companies: prioritize rapid payback above all, even if it means higher CAC or lower ROAS, because they literally can't afford to wait.; Lesson 1759 — Optimizing ROAS, CAC, and Payback Together
categorical: and **numerical variables**, making your **data cleaning and preparation** work much easier than dealing with messy external sources.; Lesson 20 — Primary Data Sources: Databases and Data Warehouses Lesson 634 — Categorical Variables in Regression
categorical × categorical: interaction captures whether the combined effect of two categories differs from their individual additive effects—like whether a specific treatment works differently depending on disease severity level.; Lesson 687 — Categorical Predictors and Interactions in Logistic Models Lesson 1182 — Choosing Analysis Methods by Variable Types
Categorical comparisons: Use `geom_bar` or `geom_col`; Lesson 1342 — Geometric Objects (geoms)
categorical data: or when testing **variance**.; Lesson 315 — Common Test Statistics: Z, t, Chi-Square, and F Lesson 426 — Assumptions and Sample Size Requirements Lesson 430 — Common Applications and Pitfalls
Categorical plots: Compare groups (box plots, violin plots, bar plots); Lesson 1281 — Introduction to Seaborn's Statistical Plots
Categorical-to-Categorical: Build contingency tables and apply association measures like Cramér's V or chi-square tests.; Lesson 1210 — Relationship Exploration: Correlation and Association
Category or product line: if your queries consistently filter these; Lesson 1812 — Partitioning and Clustering Strategies
Causal Chain Mapping: Lesson 1602 — Identifying Leading Indicators for Your Metrics
Causal clarity: Can you tie drop-off to specific friction (forms too long, unclear CTAs)?; Lesson 1685 — Actionable Insights from Funnel Analysis
Causal question: Does traffic *cause* revenue, or do successful companies simply attract both?; Lesson 1426 — Real-World Examples: Correlation vs Causation
Causal reasoning: Ask "what causes this feature to have predictive power?; Lesson 1883 — Protected Classes and Proxy Variables
Causation: means one variable *directly causes* changes in another.; Lesson 1420 — Defining Correlation and Causation
Cause must precede effect: If A causes B, then A must happen before B.; Lesson 1425 — Identifying Potential Causal Relationships
Caused by: both your treatment and outcome; Lesson 1432 — Colliders and Bad Controls
CC-BY: Requires attribution when data is used; Lesson 2082 — Choosing a License for Data Science Projects
CC-BY-SA: Requires derivatives to use the same license (like GPL for data); Lesson 2082 — Choosing a License for Data Science Projects
CC0 (Public Domain): Maximum openness, no restrictions; Lesson 2082 — Choosing a License for Data Science Projects
CDF: For x in [a, b], F(x) = (x - a)/(b - a) — a straight line from 0 to 1; Lesson 161 — The Continuous Uniform Distribution
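A short sketch of this CDF as a function. Returning 0 below a and 1 above b is the standard extension outside the interval; the name `uniform_cdf` is an illustrative choice.

```python
def uniform_cdf(x: float, a: float, b: float) -> float:
    """CDF of the continuous uniform distribution on [a, b]."""
    if b <= a:
        raise ValueError("b must be greater than a")
    if x < a:
        return 0.0  # no probability mass below a
    if x > b:
        return 1.0  # all probability mass lies at or below b
    return (x - a) / (b - a)  # straight line from 0 to 1 across [a, b]


# Illustrative check: a quarter of the way through [0, 10].
print(uniform_cdf(2.5, 0, 10))
```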
Cell proportion: 50/100 = 0.; Lesson 98 — Conditional Probability with Tables
Cell proportions: divide each cell by the grand total, giving you joint probabilities like P(A and B).; Lesson 98 — Conditional Probability with Tables
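As a rough illustration, the snippet below divides each cell of a small contingency table by the grand total. The counts are hypothetical, chosen so that the grand total is 100.

```python
def cell_proportions(table: list[list[int]]) -> list[list[float]]:
    """Divide every cell by the grand total to get joint probabilities."""
    grand_total = sum(sum(row) for row in table)
    return [[cell / grand_total for cell in row] for row in table]


# Hypothetical 2x2 table of counts: rows are A / not A, columns are B / not B.
counts = [[50, 20],
          [10, 20]]

proportions = cell_proportions(counts)
p_a_and_b = proportions[0][0]  # joint probability P(A and B)
print(proportions)
```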
Cell sizes: in organisms; Lesson 179 — When Variables Are Log-Normally Distributed
Cells: Metrics like user count, retention rate, cumulative revenue, or conversion rate; Lesson 1647 — Building a Cohort Table
Censored observations: subjects still "at risk" but whose outcome is unknown (they left the study, were lost to follow-up, or the study ended); Lesson 812 — Handling Event Times and Censoring Lesson 839 — Time-to-Conversion in Marketing Funnels
Censored observations contribute: to the "at risk" count up until their censoring time, then they're removed from the calculation.; Lesson 812 — Handling Event Times and Censoring
censoring: .; Lesson 802 — What is Survival Analysis? Lesson 835 — Customer Churn Prediction with Survival Analysis
Census data: Does your sample reflect regional population proportions?; Lesson 421 — Applications: Uniform, Genetic Ratios, and Distributions
Center Line (CL): The process mean or target value; Lesson 1396 — Introduction to Control Charts Lesson 1397 — Shewhart Control Chart Basics Lesson 1398 — Control Charts for Means (X-bar Charts)
Center pane: The result you're building; Lesson 2019 — Using Diff Tools for Conflict Resolution
Centered: at the true population mean (μ); Lesson 252 — Sampling Distribution of the Sample Mean
Centered around zero: positive and negative deviations should balance out; Lesson 709 — Irregular Component: Random Noise
Centered moving averages: use data points from *both* before and after the target time.; Lesson 753 — Centered vs Trailing Moving Averages
Centering: solves this by transforming each predictor to have a mean of zero.; Lesson 656 — Centering Variables in Interactions Lesson 661 — Centering Predictors for Polynomials
Centers: Which group has the highest median?; Lesson 1186 — Box Plots and Violin Plots by Group
Central Limit Theorem: (which you'll learn later) shows that averages tend to be normally distributed; Lesson 169 — The Normal Distribution: Definition and Properties Lesson 223 — Standard Error and the CLT
Central Limit Theorem (CLT): is one of the most important results in statistics.; Lesson 218 — What the Central Limit Theorem States
Central tendency: is the statistical concept of finding a single representative value that describes the "center" or "typical" value of a dataset.; Lesson 38 — What is Central Tendency? Lesson 1172 — What is Univariate Analysis? Lesson 1220 — Histograms for Continuous Distributions
Centrality measures: identify important nodes:; Lesson 1320 — Network Metrics and Visual Analysis
Centralized storage with structure: ensures documentation lives where everyone can find it.; Lesson 2068 — Data Provenance Best Practices
ceteris paribus: (Latin for "other things being equal").; Lesson 604 — Marginal Effects and Ceteris Paribus Lesson 637 — Interpreting Dummy Variable Coefficients
Change detection: Compare current vs.; Lesson 1024 — LAG Function: Accessing Previous Row Values
Change parameters: (e.; Lesson 1302 — Interactive Controls: Dropdown Menus and Buttons
Change-point detection: identifies moments in time when the statistical properties of your data fundamentally shift.; Lesson 1412 — What is Change-Point Detection?
Changes to be committed: (staged): Files you've added to the staging area with `git add` but haven't committed yet.; Lesson 1997 — Viewing Repository State with git status
Changing seasonality: (e.; Lesson 745 — STL Decomposition (Seasonal-Trend Loess)
Changing spread: The fluctuations get wider or narrower over time (violates constant variance); Lesson 715 — Visual Tests for Stationarity
Channel concentration: percentage of volume from top channel (lower is safer); Lesson 1716 — Channel Mix and Portfolio Thinking
Channels: (repositories) to search; Lesson 2052 — Conda Environments and Dependencies
Chart type: "Bar chart showing.; Lesson 1250 — Text Alternatives and Screen Reader Compatibility
Chartjunk: refers to anything in a visualization that doesn't represent data or support comprehension:; Lesson 1246 — Visual Clutter and Chartjunk
Check access permissions: Can you actually query these databases or files?; Lesson 2098 — Identifying Data Availability Gaps Early
Check associations with outcome: Does it also correlate with your dependent variable?; Lesson 1429 — Identifying Confounders in Practice
Check associations with treatment: Does the potential confounder correlate with your independent variable?; Lesson 1429 — Identifying Confounders in Practice
Check assumptions: using visual tools (histograms, Q-Q plots) and tests (Shapiro-Wilk, Levene's); Lesson 398 — Choosing Between Parametric and Non-Parametric Tests Lesson 447 — Conducting One-Way ANOVA in Practice Lesson 542 — Computing Fitted Values and Residuals Lesson 633 — Practical Model Selection Strategy
Check assumptions first: Lesson 368 — Common Pitfalls and Best Practices
Check connection parameters: Validate host, port, database name, and connection string format; Lesson 1093 — Troubleshooting Connection Issues
Check context: does it appear in a cluster of suspicious records?; Lesson 1209 — Outlier Detection and Investigation
Check contrast ratios: between text/elements and backgrounds; Lesson 1254 — Testing Visualizations for Accessibility
Check covariate balance: against your threshold; Lesson 1492 — Rerandomization and Practical Implementation
Check for cycles: (remove them—DAGs are acyclic!; Lesson 1469 — Building a Simple Causal DAG
Check it: Plot log-odds against each continuous predictor; look for straight-line patterns, not curves.; Lesson 686 — Assumptions and Diagnostics in Logistic Regression
Check normality: Q-Q plots or Shapiro-Wilk test per group; Lesson 290 — Assumptions and Diagnostics for Difference Intervals
Check response patterns: Low response rates (under 50%) often signal nonresponse bias.; Lesson 250 — Strategies for Bias Detection and Mitigation
Check result counts: if you expect hundreds of rows but get millions, investigate immediately; Lesson 955 — Avoiding Cartesian Products
Check retention schedules: what *must* you keep by law?; Lesson 1909 — Right to Erasure and Data Retention Policies
Check source documentation: or file metadata when available; Lesson 1135 — Detecting and Fixing Encoding Issues
Check statistical significance: Use t-tests and F-tests to identify meaningful predictors; Lesson 633 — Practical Model Selection Strategy
Check the lineage: Use your pipeline's metadata to identify which upstream tables, files, or APIs fed into the problematic dataset; Lesson 1870 — Root Cause Analysis for Quality Issues
Check the p-value: (statistical significance); Lesson 609 — Practical vs Statistical Significance
Checkout Started: Lesson 1679 — Defining Funnel Steps and Events
Checks each row: in the main query to see if its column value matches *any* value from the subquery results; Lesson 961 — IN Operator with Subqueries
Cherry-picking time ranges: means deliberately selecting start and end dates that support a preferred narrative while hiding inconvenient context.; Lesson 1241 — Cherry-Picking Time Ranges
Chi-square tests: compare outcome distributions across groups; Lesson 1890 — Measuring Disparate Impact
chi-squared distribution: is another special case.; Lesson 182 — Special Cases: Exponential and Chi-Squared Lesson 254 — Sampling Distribution of the Sample Variance Lesson 628 — Likelihood Ratio Tests Lesson 684 — Likelihood Ratio Tests for Model Comparison Lesson 699 — The Likelihood Ratio Test
Chi-Squared test: uses an approximation based on a mathematical distribution, while **Fisher's Exact Test** calculates the exact probability by considering all possible table arrangements.; Lesson 434 — Fisher's Exact vs Chi-Squared: When to Use Each
Chi-Squared Test of Independence: helps you answer questions like: "Is there a relationship between gender and product preference?; Lesson 422 — Introduction to Chi-Squared Test of Independence
Children and minors: They lack legal capacity and cognitive maturity to understand data implications; Lesson 1918 — Special Populations and Vulnerable Groups
Choose a meaningful baseline: Lesson 645 — Changing the Reference Category
Choose a window size: (e.; Lesson 739 — Moving Average Detrending
Choose Dagster when: You're managing complex data transformations, need strong guarantees about data quality, or want asset-centric workflows.; Lesson 1839 — Alternative Orchestration Tools
Choose exponential smoothing when: Lesson 764 — Exponential Smoothing vs Moving Averages
Choose intensity metrics: Frequency (daily visits), depth (features used), or duration (session length); Lesson 1693 — Defining User Engagement
Choose Kendall's Tau when: Lesson 490 — Kendall's Tau vs Spearman's Rho
Choose Luigi when: You have simpler pipelines, want minimal infrastructure, or need quick prototyping without heavy tooling.; Lesson 1839 — Alternative Orchestration Tools
Choose moving averages when: Lesson 764 — Exponential Smoothing vs Moving Averages
Choose natural keys when: Lesson 1050 — Choosing Effective Primary Keys
Choose Prefect when: You want rapid development, need dynamic pipelines, or prefer writing pure Python without Airflow's constraints.; Lesson 1839 — Alternative Orchestration Tools
Choose Spearman's Rho when: Lesson 490 — Kendall's Tau vs Spearman's Rho
Choose surrogate keys when: Lesson 1050 — Choosing Effective Primary Keys
Choose your confidence level: (e.; Lesson 1562 — Credible Intervals for Proportions
Choosing measures: Remember comparing mean, median, and mode?; Lesson 63 — Understanding Distribution Shape
Choosing references wisely: pick a meaningful baseline for comparison; Lesson 643 — Interpreting Coefficients Relative to Reference
Choosing Weak Leading Indicators: Lesson 1603 — Common Pitfalls in Indicator Selection
Choosing α before analysis: means deciding your threshold for rejecting the null hypothesis—typically 0.; Lesson 329 — Choosing α Before Analysis
Churn: is when customers stop doing business with you.; Lesson 1670 — What is Churn and Why It Matters
Churn analysis: measures the percentage who *stop using* your product in a given period (Week 1: 10% churned, Week 2: 8% churned).; Lesson 1660 — Retention Curves vs Churn Analysis Lesson 1678 — What is Funnel Analysis?
Churn prediction: becomes more accurate when built separately for high-value versus low-value segments; Lesson 1701 — What is Customer Segmentation?
Churn Rate: Percentage of customers who leave; Lesson 1516 — Business Metrics: Definition and Examples Lesson 1613 — Raw Counts vs. Rates and Ratios
Churn reason: (from attribution analysis): If they left due to missing features, notify them when those ship; Lesson 1676 — Win-Back and Retention Strategies
Churned Customers: Those who've stopped paying, canceled subscriptions, or haven't engaged in your defined inactivity window.; Lesson 1704 — Customer Lifecycle Stages
Circular: Emphasizing connections over clustering; Lesson 1318 — Network Layout Algorithms
City populations: A few megacities dwarf most towns; Lesson 190 — The Pareto Distribution: Heavy Tails and Power Laws
City sizes: A few megacities contain most urban population; Lesson 191 — Pareto Principle and the 80/20 Rule
Claim: "Mobile users convert at lower rates than desktop users"; Lesson 1946 — Supporting Your Claims with Evidence
clarity: (easy to understand), and **narrative** (answers "so what?; Lesson 1215 — Characteristics of Explanatory Visualizations Lesson 1973 — Report Review and Quality Checklist
Class attributes: represent **columns**; Lesson 1117 — What is an ORM and Why Use It?
Classic retention: User was active in that *exact* period; Lesson 1648 — Cohort Retention Rates Lesson 1654 — Classic vs Unbounded Retention
Clean experimentation: Test new packages without risking your system-wide installation; Lesson 2039 — Virtual Environments: Concept and Benefits
Clean working directory: When nothing needs attention; Lesson 1998 — Checking Repository Status
Clean, documented code: with clear README files; Lesson 2091 — Stage 7: Communication and Handoff
Cleaner pipelines: Data arrives pre-formatted; Lesson 1802 — Filtering During Read with dtype and Converters
Clear: "Achieve 95% recall on fraud cases while maintaining false positive rate below 2%"; Lesson 2094 — Defining Success Metrics Upfront
Clear dependencies: on packages, data sources, and environments; Lesson 1981 — What Makes a Report Reproducible?
Clear factorial design: (every combination of variables); Lesson 1482 — Control and Treatment Design
Clear labels: and titles; Lesson 1369 — Publication-Ready Plot Styling
Clear metrics: (not "satisfaction," but "NPS score"); Lesson 2093 — Translating Business Questions into Analytical Questions
Clear outputs before committing: Use "Restart & Clear Output" before staging your notebook.; Lesson 2030 — Version Control for Notebooks: Challenges and Solutions
Clear problem statement: What question did you answer?; Lesson 2141 — Building a Portfolio and Personal Brand
Click-through rates: in digital marketing (proportion of clicks); Lesson 184 — Beta Distribution: Bounded Between 0 and 1
Climate data: Policy impacts or environmental shifts; Lesson 1412 — What is Change-Point Detection?
Closeness centrality: How quickly a node can reach all others (the "efficient communicators"); Lesson 1320 — Network Metrics and Visual Analysis
Cloud data warehouses: (Snowflake, BigQuery, Redshift) providing scalable compute; Lesson 1821 — Hybrid Approaches and Modern Data Stacks
Cluster: Cluster sampling or **multistage sampling** concentrates your effort geographically.; Lesson 243 — Choosing the Right Sampling Method Lesson 1481 — Unit of Randomization
Cluster randomization: Randomize by groups (e.; Lesson 1527 — Ignoring Network Effects
Cluster sampling: is a technique where you divide your population into groups (called **clusters**), randomly select some of those clusters, and then survey all or some members within the chosen clusters.; Lesson 237 — Cluster Sampling Lesson 243 — Choosing the Right Sampling Method
Clustered Data: Students within the same classroom, patients from the same hospital, or measurements from the same family are often more similar to each other than to observations from different clusters.; Lesson 381 — Independence Assumption and Its Violations Lesson 548 — Independence of Observations
Clustering: Different groups have noticeably different spreads; Lesson 559 — Detecting Heteroscedasticity (Non-Constant Variance) Lesson 1812 — Partitioning and Clustering Strategies
Clustering coefficient: measures how tightly a node's neighbors are connected to each other—like whether your friends also know each other.; Lesson 1320 — Network Metrics and Visual Analysis
clusters: The groups a population is divided into for cluster sampling; you randomly select some of those clusters, and then survey all or some members within the chosen clusters.; Lesson 237 — Cluster Sampling Lesson 584 — Correlation Matrices for Predictors Lesson 1179 — Identifying Missing Values Patterns Lesson 1189 — Detecting Nonlinear Relationships Lesson 1222 — Scatter Plots for Relationships
Clusters of high correlations: reveal groups of variables that measure similar underlying concepts.; Lesson 511 — Reading and Interpreting Correlation Matrices
Clusters or trends: Independence assumption might be violated; Lesson 556 — What Are Residuals and Why Plot Them?
Coarsen: Temporarily bin continuous variables into meaningful categories (e.; Lesson 1449 — Coarsened Exact Matching (CEM)
Coarsened Exact Matching: solves this through a clever three-step process:; Lesson 1449 — Coarsened Exact Matching (CEM)
Code: (scripts, notebooks, functions); Lesson 2082 — Choosing a License for Data Science Projects
Code and reproducibility: Lesson 1971 — Appendices and Technical Details
Code chunks: Sections of R code enclosed in special delimiters that execute and display results; Lesson 1983 — R Markdown for Dynamic Reports
Code clarity: `src/` contains reusable functions and scripts.; Lesson 2032 — Organizing Repository Structure for Data Science
Code contribution process: Should contributors fork your repo?; Lesson 2083 — Contributing Guidelines and Contact Information
Code debt: Copy-pasting notebook cells instead of writing reusable functions; Lesson 2131 — What is Technical Debt in Data Science?
Code Licenses: (your scripts and algorithms):; Lesson 2082 — Choosing a License for Data Science Projects
Code management: means tracking changes to your scripts and notebooks, usually with tools like version control systems.; Lesson 29 — Code and Environment Management
Code references: Scripts or notebook cells that performed each transformation; Lesson 2065 — Tracking Data Lineage
Code review: Share branches for review before merging into `main`; Lesson 2005 — What are Branches and Why Use Them?
Code review happens: Team members examine your changes, spot bugs, suggest improvements, and ensure standards are met; Lesson 2022 — Understanding Pull Requests
Code standards: What style guide do you follow (PEP 8)?; Lesson 2083 — Contributing Guidelines and Contact Information
Code versions: Git commit hashes, script versions, or package versions; Lesson 1988 — Embedding Data Lineage and Metadata
Coefficient of Variation: is your tool when comparing datasets with different units or scales (e.; Lesson 54 — When to Use Each Measure
Coefficient of Variation (CV): solves this by expressing variability as a *percentage of the mean*.; Lesson 53 — Coefficient of Variation
Coefficient p-values: Statistical significance of specific dummies shifts because you're testing different comparisons; Lesson 647 — Impact on Model Results and Reporting
Coefficient values: Each dummy variable coefficient represents the difference from the reference, so new reference = new differences; Lesson 647 — Impact on Model Results and Reporting
Coffee: → **Alertness** (coffee directly increases alertness); Lesson 1469 — Building a Simple Causal DAG
Cohen's d: for t-tests (difference between means in standard deviation units); Lesson 384 — What is Effect Size?
Coherence: Does the causal interpretation align with existing theory and evidence?; Lesson 498 — Bradford Hill Criteria for Causation Lesson 1563 — Sequential Updating with New Data
Cohort analysis: is a technique that divides users or customers into groups, called cohorts, based on a shared characteristic or experience within a defined time window.; Lesson 1644 — What is Cohort Analysis? Lesson 1661 — What is Customer Lifetime Value (LTV)? Lesson 1678 — What is Funnel Analysis? Lesson 1701 — What is Customer Segmentation? Lesson 1715 — Comparing Channel Performance
Cohort comparison: Use log-rank tests to compare retention across pricing tiers or customer segments; Lesson 838 — Subscription and Membership Duration Modeling
Cohort-based payback analysis: breaks down payback periods by customer segment (acquisition channel, geography, plan type, etc.; Lesson 1758 — Cohort-Based Payback Analysis
Coin flip: Sample space: {Heads, Tails}; Lesson 82 — Collectively Exhaustive Events
Collaborate: with domain experts or specialists; Lesson 34 — Recognizing Boundaries of Competence
Collaboration: Multiple team members can work with consistent datasets; Lesson 1871 — Why Version Control for Data? Lesson 1990 — What is Version Control and Why Git? Lesson 2005 — What are Branches and Why Use Them? Lesson 2047 — What is Dependency Management? Lesson 2062 — Why Data Source Documentation Matters Lesson 2074 — Notebooks vs Scripts: When to Use Each Lesson 2142 — Interviewing: Technical and Behavioral Prep
Collaboration actions: (weight: 0.; Lesson 1699 — Engagement Scoring Systems
Collaboration-friendly: Everyone knows where to find things.; Lesson 2032 — Organizing Repository Structure for Data Science
Collaborative fraud detection: across banks without sharing customer data; Lesson 1903 — Secure Multi-Party Computation
Collect more data: to increase sample size; Lesson 426 — Assumptions and Sample Size Requirements
Collectively exhaustive: Lesson 82 — Collectively Exhaustive Events Lesson 83 — Partitions of the Sample Space Lesson 89 — The Complement Rule
Collectively exhaustive events: are a group of events whose union contains *every possible outcome* in the sample space; nothing is left out.; Lesson 82 — Collectively Exhaustive Events
College attended: may proxy for race, class, and family wealth; Lesson 1883 — Protected Classes and Proxy Variables
Collibra: along with **Alation** and **Apache Atlas**, maintains centralized inventories of your data assets.; Lesson 1164 — Tools for Lineage Tracking
collider: is a variable that sits at the convergence of two causal arrows.; Lesson 1432 — Colliders and Bad Controls Lesson 1468 — Introduction to Directed Acyclic Graphs (DAGs) Lesson 1471 — Mediators and Colliders Lesson 1473 — Conditioning on Colliders: Selection Bias Lesson 1476 — Common DAG Patterns and Pitfalls
Colliders: Where paths meet (X → C ← Y).; Lesson 1471 — Mediators and Colliders
Collinearity: makes models unstable and coefficients hard to interpret; Lesson 1197 — Identifying Variable Importance and Redundancy
Color: determines what color your line appears.; Lesson 1258 — Customizing Lines: Colors, Styles, and Markers Lesson 1297 — Font Properties and Text Styling Lesson 1341 — Data and Aesthetic Mappings Lesson 1364 — Customizing Text Elements
Color (hue): Different colors stand out immediately (red among blues); Lesson 1235 — Pre-Attentive Attributes Lesson 1310 — Point Maps and Scatter Plots on Maps
Color (intensity): Sequential scales for continuous variables (temperature, risk level); Lesson 1310 — Point Maps and Scatter Plots on Maps
Color blindness simulators: (like Coblis or Chrome DevTools) show your chart through the lens of deuteranopia, protanopia, or other color vision deficiencies; Lesson 1254 — Testing Visualizations for Accessibility
Color choices: Ensure colorblind-friendly palettes and grayscale compatibility.; Lesson 1369 — Publication-Ready Plot Styling
Color encoding: Use color to represent the third dimension on a 2D plot (like heatmaps); Lesson 1329 — Effective Use and Pitfalls of 3D Visualizations Lesson 1362 — When to Use Facets vs. Other Approaches
Color mapping: adds another dimension, using different hues or intensity to show groupings or continuous scales.; Lesson 1265 — Scatter Plots: Relationships Between Variables
Color saturation/density: (e.; Lesson 1232 — Perceptual Accuracy Hierarchy
Color Scales Matter Immensely: Lesson 1309 — Choropleth Maps: Basics and Best Practices
Color-coding: in scatter plots can reveal when different groups show different trends; Lesson 1195 — Interaction Effects Between Variables
ColorBrewer palettes: offer scientifically designed color schemes for categorical, sequential, or diverging data:; Lesson 1368 — Color Scales and Palettes
Colors: can be specified multiple ways in Matplotlib:; Lesson 1272 — Colors, Markers, and Line Styles
column: The value you want to retrieve from a future row; Lesson 1025 — LEAD Function: Accessing Next Row Values Lesson 1358 — facet_grid() for Two Variables
Column charts: arrange categories along the horizontal axis with vertical bars extending upward.; Lesson 1219 — Bar Charts and Column Charts
Column Count Must Match: Both queries must return the same number of columns; Lesson 999 — UNION: Combining Distinct Results
Column Names: The result uses column names from the first SELECT; Lesson 999 — UNION: Combining Distinct Results Lesson 1151 — Schema Validation
Column order: (optional, if order matters for your workflow); Lesson 1151 — Schema Validation
Column presence: All required columns exist; Lesson 1151 — Schema Validation
Column proportion: 50/70 = 0.; Lesson 98 — Conditional Probability with Tables
Column proportions: divide each cell by its column total.; Lesson 98 — Conditional Probability with Tables
Column Total: | 110 | 90 | 200 |; Lesson 423 — Contingency Tables and Expected Frequencies
Column Types: map Python objects to SQL data types.; Lesson 1121 — Column Types, Constraints, and Relationships
column_name: Which column's value to retrieve; Lesson 1023 — Introduction to Window Functions: LAG and LEAD Lesson 1024 — LAG Function: Accessing Previous Row Values
columns: Lesson 1117 — What is an ORM and Why Use It? Lesson 1647 — Building a Cohort Table
Columns (Fields/Attributes): Each column represents a specific property or feature.; Lesson 843 — Relational Database Concepts
Combination rule: Require both high probability (e.; Lesson 1585 — Early Stopping in Bayesian Tests
Combine: Aggregate results back together; Lesson 1768 — Data Parallelism Fundamentals
Combine adjacent categories: to increase expected counts; Lesson 419 — Assumptions and Minimum Expected Frequencies
Combine all observations: from both groups; Lesson 393 — Mann-Whitney U Test (Wilcoxon Rank-Sum)
Combine all rows: from both queries; Lesson 998 — Introduction to Set Operations
Combine categories: if logically defensible; Lesson 426 — Assumptions and Sample Size Requirements
Combine wisely: `WHERE .; Lesson 880 — Performance Considerations and Best Practices
Combined: `E(aX + b) = a · E(X) + b`; Lesson 149 — Properties of Expectation and Variance
Comfortable zone: Datasets under 1-2 GB work smoothly in Pandas on typical machines; Lesson 1783 — Data Size Thresholds: When Pandas Isn't Enough
Command: The script or code to run; Lesson 1874 — DVC Pipelines and Stages
Command-line tools: that accept parameters and integrate with schedulers; Lesson 2074 — Notebooks vs Scripts: When to Use Each
commit: it to save your work.; Lesson 1112 — Starting and Committing Transactions Lesson 1995 — Committing Changes with git commit
Commit hash: A unique 40-character identifier (like `a3f2b8c.; Lesson 1999 — Viewing Commit History
Commit the merge: with `git commit` (Git will provide a default merge commit message); Lesson 2011 — Resolving Merge Conflicts
Commit thoughtfully: Make atomic commits after completing logical units of work, not after every cell execution.; Lesson 2030 — Version Control for Notebooks: Challenges and Solutions
Common data sources: and their quirks in that sector; Lesson 2145 — Transitioning Between Industries and Domains
Common Pattern: Lesson 977 — Correlated Subqueries in WHERE Clauses Lesson 978 — Correlated Subqueries in SELECT Clauses
Common patterns: Lesson 1017 — Moving Averages with Window Frames Lesson 1033 — CASE with Aggregation Functions
Common Table Expression (CTE): is a named temporary result set that you define at the beginning of a query using the `WITH` clause.; Lesson 989 — What are Common Table Expressions (CTEs)?
Common Time-to-Value metrics: Lesson 1697 — Time-to-Value and Activation Metrics
Common use cases: Lesson 1838 — XComs and Passing Data Between Tasks
Common violations: Lesson 553 — Exogeneity: X Must Be Independent of Errors
Communicate timeline risks early: If you discover the analysis will take longer than expected, flag it immediately.; Lesson 2099 — Aligning with Business Timelines and Decision Points
Communicating results: with stakeholders who benefit from narrative + code + visuals in one document; Lesson 2074 — Notebooks vs Scripts: When to Use Each
Communication: You must explain complex findings to people who don't speak "data."; Lesson 7 — The Data Science Skill Stack
Communication bridge: Owner translates technical nuances for business stakeholders; Lesson 1619 — What is Metric Ownership?
Communication Protocols: Lesson 1643 — Building Attribution Frameworks
Community channels: Link to Slack, Discord, or discussion forums; Lesson 2083 — Contributing Guidelines and Contact Information
Community detection: algorithms group nodes into clusters based on connection patterns, revealing natural subdivisions in the network.; Lesson 1320 — Network Metrics and Visual Analysis
Company Level: Your North Star Metric becomes the top-level objective.; Lesson 1608 — Connecting North Star Metrics to OKRs
Compare: Does P(A and B) equal P(A) × P(B)?; Lesson 102 — Testing for Independence Lesson 395 — Permutation Tests for Means and Beyond Lesson 1185 — Grouped Summary Statistics Lesson 1353 — Position Adjustments: Dodge, Stack, and Jitter Lesson 1590 — The Metropolis-Hastings Algorithm
Compare across cohorts: to identify trends, improvements, or degradation; Lesson 1664 — Cohort-Based LTV Calculation
Compare across tables: Filter rows in one table based on criteria from another; Lesson 959 — Introduction to Subqueries in WHERE
Compare apples to apples: Compare January sales to July sales fairly; Lesson 748 — Seasonally Adjusted Data
Compare apples to oranges: Compare test scores from different exams with different scales; Lesson 195 — Z-Score Definition and Interpretation
Compare cohorts instantly: Did the January cohort retain better than February's?; Lesson 1656 — Visualizing Retention Curves
Compare costs: Which error would cause more harm?; Lesson 334 — Setting Alpha: Choosing Your Significance Level
Compare effects: across strata—if the relationship disappears or reverses, the confounder was key; Lesson 1430 — Controlling for Confounders: Stratification
Compare expected values: across alternatives; Lesson 152 — Decision Making Under Uncertainty
Compare nested models: Use partial F-tests when adding/removing specific variables; Lesson 633 — Practical Model Selection Strategy
Compare posteriors: to see which hypothesis is most supported by the evidence.; Lesson 113 — Multiple Hypotheses and Total Probability Lesson 1572 — Sensitivity Analysis and Prior Robustness
Compare stratified analyses: Calculate effects within each confounder level—are they consistent or wildly different?; Lesson 1429 — Identifying Confounders in Practice
Compare the smaller sum: to critical values or compute a p-value; Lesson 392 — Wilcoxon Signed-Rank Test
Compare visually and numerically: Do summary statistics (mean, variance, extreme values) of simulated data match your observed data?; Lesson 1596 — Posterior Predictive Checks and Model Comparison
Compare your observed statistic: to this distribution to get a p-value; Lesson 396 — Bootstrap Hypothesis Testing
Comparing datasets: Detect records in a source system missing from a target; Lesson 1002 — EXCEPT: Finding Differences
Comparing groups: Use SE to gauge if observed differences are substantial; Lesson 265 — Using Standard Error in Practice
Comparing means across categories: (like average sales by quarter); Lesson 1288 — Point Plots for Trend Visualization
Comparing metrics: Find records where one value exceeds another; Lesson 947 — Self-Joins for Comparisons Within a Table
Comparing models: requires matching units (you can't directly compare slopes from different scales); Lesson 525 — Units and Scale in Interpretation
Comparing multiple curves: (different cohorts or product versions) reveals which changes improved stickiness; Lesson 1653 — What are Retention Curves?
Comparing Values Within Rows: Lesson 948 — Self-Joins with Inequality Conditions
Comparisons: between a row and an aggregate (e.; Lesson 1005 — Introduction to Window Functions
compatible data types: .; Lesson 998 — Introduction to Set Operations Lesson 999 — UNION: Combining Distinct Results Lesson 1001 — INTERSECT: Finding Common Rows Lesson 1003 — Set Operation Requirements and Rules
Competence: Lesson 1913 — Elements of Valid Consent
Complement (Aᶜ or A'): "**not** A"; Lesson 80 — Set Operations: Union, Intersection, and Complement
Complement Rule: gives you a shortcut:; Lesson 89 — The Complement Rule
Complementary events: save work when one tail is shorter.; Lesson 130 — Calculating Binomial Probabilities
Complementary probabilities: Using P(A') = 1 - P(A) for efficiency; Lesson 130 — Calculating Binomial Probabilities
Complete rows: If entire rows are missing, perhaps certain groups weren't measured; Lesson 1179 — Identifying Missing Values Patterns
completeness: one of the data quality dimensions, alongside **consistency**, **timeliness**, **validity**, and **uniqueness**.; Lesson 1863 — Data Quality Dimensions Lesson 1865 — Data Quality Checks in Pipelines Lesson 1867 — Data Profiling and Monitoring Lesson 1869 — Data Quality Metrics and SLAs Lesson 1973 — Report Review and Quality Checklist Lesson 2086 — Stage 2: Data Acquisition and Assessment
Completeness checks: are your detective work for finding exactly where data is missing, how much is missing, and whether the missingness follows patterns.; Lesson 1153 — Completeness Checks
Completion Rate: Percentage of content finished by viewers.; Lesson 1635 — Media and Content Metrics: Watch Time and Content Performance
Complex aggregations: Multiple groupBy operations with window functions over large groups; Lesson 1784 — Computation Complexity: Beyond Data Size
Complex constraints: Stan's type system handles parameter boundaries and transformations elegantly; Lesson 1595 — Stan: High-Performance Bayesian Inference
Complex layouts: Use `constrained_layout=True`; Lesson 1277 — Adjusting Subplot Spacing and Layout
Complex models: Multi-parameter models where conjugacy breaks down anyway; Lesson 1556 — Choosing Between Conjugate and Non-Conjugate Priors
Complex queries: When you need multiple derived tables or nested subqueries; Lesson 974 — When to Use FROM Subqueries vs CTEs
Complex trends: that aren't straight lines; Lesson 745 — STL Decomposition (Seasonal-Trend Loess)
Complexity costs: Adding that tenth feature interaction makes your model unmaintainable; Lesson 2116 — Diminishing Returns and the 80/20 Rule
Complexity penalty: A term that increases with the number of parameters (k); Lesson 629 — Akaike Information Criterion (AIC)
Compliance: Meet regulations like GDPR while still enabling data-driven work; Lesson 1901 — Synthetic Data Generation
Compliance and Legal Teams: care about:; Lesson 1951 — Understanding Stakeholder Priorities and Constraints
Composite keys: Multiple columns together, like `(order_id, product_id)`; Lesson 1048 — What Are Primary Keys?
Compositional changes: occur when the *makeup* of your treatment or control groups changes over time.; Lesson 1458 — Common DiD Pitfalls
Compounding growth drag: Even with strong acquisition, high churn prevents the compounding effects of a growing base; Lesson 1670 — What is Churn and Why It Matters
Comprehension: Lesson 1913 — Elements of Valid Consent
Compression: Parquet (best) > Feather > CSV (gzip) > JSON > Excel; Lesson 1133 — Performance Considerations Across Formats Lesson 1811 — Columnar Storage and Query Optimization
Computational complexity: the number and cost of operations you perform; it can make processing even modest-sized datasets painfully slow on a single machine.; Lesson 1784 — Computation Complexity: Beyond Data Size
Computational complexity increases: Different methods (Type I, II, III sums of squares) can give different results; Lesson 468 — Balanced vs Unbalanced Designs
Computational efficiency: Process data in manageable chunks; Lesson 1538 — Updating Beliefs with Sequential Data
Computational resources: Can you process millions of rows or just thousands?; Lesson 1169 — Clarifying Assumptions and Constraints
Computational simplicity: No need for sampling algorithms or numerical integration; Lesson 1555 — Advantages and Limitations of Conjugate Priors
Computationally efficient: You only need the last forecast and the new observation; Lesson 757 — Introduction to Exponential Smoothing
Compute baseline conversion probability: The chance a random user converts given the current channel mix; Lesson 1733 — Markov Chain Attribution Models
Compute means: Find x̄ (mean of x values) and ȳ (mean of y values); Lesson 522 — Implementing Least Squares from Scratch
Compute on encrypted values: using special arithmetic that preserves secrecy; Lesson 1903 — Secure Multi-Party Computation
Compute summaries: `stat_summary()` calculates means, medians, or custom functions; Lesson 1352 — Statistical Transformations with stat_* Layers
Computer Science: builds software systems and algorithms.; Lesson 1 — Defining Data Science
Computer Science & Programming: Lesson 1 — Defining Data Science
CONCAT: glues strings together; Lesson 1044 — String Manipulation: CONCAT, LENGTH, and SUBSTRING
Concentration of values: (wider sections = more data points); Lesson 1286 — Violin Plots and Distribution Shape
Conclusion: The die doesn't appear to follow a uniform distribution; it's likely biased; Lesson 420 — Interpreting Chi-Squared Test Results Lesson 733 — Using ACF and PACF Together
Conclusion cells: Summarize findings and recommendations; Lesson 1982 — Literate Programming with Notebooks
Conditional dependencies: Some tools support dynamic dependency creation; Lesson 1843 — Declaring Dependencies in Orchestration Tools
Conditional distributions: (e.; Lesson 1187 — Contingency Tables and Cross-Tabulations Lesson 1197 — Identifying Variable Importance and Redundancy
Conditional probability: captures exactly this: the probability of event A happening when we *already know* event B has occurred.; Lesson 92 — Definition and Notation of Conditional Probability Lesson 96 — Conditional Probability in Tree Diagrams
Conditional values: Different logic per row based on related data; Lesson 967 — Subqueries in the SELECT Clause
Confidence: Higher confidence (e.; Lesson 295 — Trade-offs: Precision, Confidence, and Cost Lesson 1158 — Automated Validation Frameworks
Confidence bands: Usually shown as blue shaded regions or dashed lines (typically at ±2/√n).; Lesson 722 — ACF Plots and Interpretation
Confidence interval: for the effect size (shows uncertainty); Lesson 389 — Reporting Effect Sizes in Practice Lesson 412 — Confidence Interval for Difference Lesson 607 — Confidence Intervals for Coefficients Lesson 621 — Interpreting t-Statistics and Confidence Intervals
Confidence intervals: may be too narrow or too wide; Lesson 202 — Why Test for Normality? Lesson 227 — Practical Applications of the CLT Lesson 300 — Bootstrap Distribution of a Statistic Lesson 462 — Interpreting and Reporting Post-Hoc Results Lesson 625 — Practical Workflow: Testing and Interpreting Predictors Lesson 730 — Interpreting PACF Plots Lesson 800 — Generating Forecasts with SARIMA Lesson 815 — Survival Curve Plots and Interpretation (+7 more)
Confidence level: Higher confidence (e.; Lesson 271 — Margin of Error Lesson 289 — Sample Size Requirements for Difference Intervals Lesson 292 — Sample Size for Estimating a Mean Lesson 294 — Margin of Error and Its Components
Confirm Long-Term Trends: Lesson 1598 — Characteristics of Lagging Indicators
Confirmation bias: Analyzing data only until it supports a desired conclusion; Lesson 1926 — The Honest Broker Role
Conflict (insight): What surprising or important pattern did you discover?; Lesson 1933 — The Power of Narrative in Data Communication
confounded: .; Lesson 1526 — Selection Bias in Opt-In Tests Lesson 1531 — Interference from Concurrent Tests
confounder: appears as a node with arrows pointing to both treatment and outcome; Lesson 1468 — Introduction to Directed Acyclic Graphs (DAGs) Lesson 1470 — Confounders in DAGs Lesson 1476 — Common DAG Patterns and Pitfalls
confounding variable: (or confounder) is a third variable that influences both your variables of interest, creating a spurious (fake) correlation between them.; Lesson 509 — Confounding Variables and Control Lesson 1194 — Simpson's Paradox and Confounding Lesson 1423 — The Third Variable Problem Lesson 1426 — Real-World Examples: Correlation vs Causation Lesson 1427 — What is a Confounding Variable?
Confounding variables: A hidden third factor causes both (like temperature above); Lesson 493 — The Fundamental Difference: Association vs Cause-and-Effect Lesson 495 — Confounding Variables Lesson 510 — Correlation Matrices: Construction and Display Lesson 1201 — Domain Knowledge as a Hypothesis Source Lesson 1487 — Simple Random Assignment
Confusing logic: Code must constantly check `type` to interpret what's valid; Lesson 1148 — Handling Multiple Types in One Table
Confusion: New team members (or your future self) waste time trying to understand if old experiments are still relevant; Lesson 2135 — Dead Experimental Code and Feature Sprawl
Confusion matrices: See model prediction patterns; Lesson 1224 — Heatmaps and Correlation Matrices
conjugate prior: is a prior distribution that, when combined with a specific likelihood function, produces a posterior distribution from the same probability family as the prior.; Lesson 1550 — What Are Conjugate Priors? Lesson 1551 — Beta-Binomial Conjugacy
Connecting to objectives: Every insight should tie back to the problem you scoped at the start.; Lesson 2090 — Stage 6: Interpretation and Insight Generation
Connection pooling: is like a parking lot for database connections.; Lesson 1092 — Connection Pooling Basics
Connection to normal: If *ln(X)* ~ Normal(μ, σ²), then *X* ~ Log-Normal; Lesson 178 — Log-Normal Distribution: Definition and Properties
Cons: Stale data between refreshes, storage overhead, refresh time on large datasets; Lesson 1076 — Materialized Views and Summary Tables
Consecutive rankings: for categorization (like price tiers: budget, mid-range, premium); Lesson 1009 — DENSE_RANK(): Ranking Without Gaps
Consent: Lesson 1906 — Legal Bases for Processing Personal Data
Conservative Estimates: Lesson 297 — Handling Unknown Population Parameters
Consider `nbdime` or `jupytext`: Tools like `nbdime` provide notebook-aware diffs.; Lesson 2030 — Version Control for Notebooks: Challenges and Solutions
Consider accessibility: Approximately 8% of men and 0.; Lesson 1961 — Color as Communication Tool
Consider adversarial users: Who benefits from gaming your system?; Lesson 1924 — Red Team Thinking for Data Scientists
Consider d=2 cautiously: If d=1 didn't work, try second-order differencing (differencing the already-differenced series).; Lesson 778 — Determining Differencing Order (d)
Consider JOINs instead: Correlated subqueries can often be rewritten as LEFT JOINs with GROUP BY, executing more efficiently; Lesson 969 — Performance Considerations for SELECT Subqueries
Consider JOINs instead when: You have many conditions (10+) or conditions change frequently.; Lesson 1037 — CASE Best Practices and Performance
Consider ramp-up periods: Exclude the first few days from analysis; Lesson 1525 — Novelty and Primacy Effects
Consider robustness: With n > 30-40, t-tests handle mild violations well (Central Limit Theorem); Lesson 398 — Choosing Between Parametric and Non-Parametric Tests
Consider simpler alternatives: regression with strong domain priors, expert-designed scoring systems, or rule-based logic.; Lesson 2124 — Insufficient or Low-Quality Data
Consider the confidence interval: width; Lesson 609 — Practical vs Statistical Significance
Consider Transformation: Lesson 579 — What to Do with Influential Points
Consider UUID/GUID: for distributed systems where different databases generate records independently; Lesson 1050 — Choosing Effective Primary Keys
Consider variance: to assess risk; Lesson 152 — Decision Making Under Uncertainty
Consider WHERE filters when: You only need to include/exclude rows, not transform values.; Lesson 1037 — CASE Best Practices and Performance
Consistency: Has the relationship been found repeatedly, across different studies, populations, and settings?; Lesson 498 — Bradford Hill Criteria for Causation Lesson 1110 — What Are Database Transactions? Lesson 1158 — Automated Validation Frameworks Lesson 1822 — What is a Data Pipeline? Lesson 1863 — Data Quality Dimensions Lesson 1865 — Data Quality Checks in Pipelines Lesson 1986 — Automated Report Generation Lesson 2059 — Seeds in Train-Test Splits
Consistency risks: if updates fail partially; Lesson 1071 — When to Denormalize: Performance Trade-offs
Consistency with benchmarks: Does your entire interval fall in the "large effect" range, or does it span from "small" to "large"?; Lesson 387 — Confidence Intervals for Effect Sizes
Consistent analysis syntax: Functions like `groupby()`, `pivot_table()`, and aggregation operations work the same way across different datasets.; Lesson 1149 — Benefits of Tidy Data for Downstream Work
Consistent spread: The scatter shouldn't fan out or compress at one end; Lesson 480 — Scatterplots and Visual Assessment
constant: across all trials; Lesson 126 — From Bernoulli to Binomial: Multiple Trials Lesson 648 — What are Interaction Terms?
Constant autocorrelation structure: the relationship between observations at different time lags remains stable; Lesson 712 — What is Stationarity?
Constant mean: the average value doesn't drift up or down; Lesson 712 — What is Stationarity?
constant over time: (proportional hazards); Lesson 823 — Log-Rank Test vs Other Tests Lesson 825 — What is the Cox Proportional Hazards Model?
Constant p: Same probability throughout; Lesson 131 — Real-World Applications of Binomial Distributions
Constant variance: the spread or volatility stays the same; Lesson 712 — What is Stationarity?
Constant variance (homoscedasticity): Do residuals spread evenly across all predicted values, or do they fan out or compress?; Lesson 544 — The Role of Residuals in Diagnostics
Constant-width seasonal swings: → additive; Lesson 710 — Additive vs Multiplicative Models
Constraints: Time limits, budget, available data, ethical considerations; Lesson 10 — Problem Definition and Scoping Lesson 1121 — Column Types, Constraints, and Relationships Lesson 1151 — Schema Validation
Consultation: involving your Data Protection Officer and potentially data subjects; Lesson 1910 — Data Protection Impact Assessments (DPIAs)
Consume massive memory: your database must store or stream millions of rows; Lesson 943 — CROSS JOIN Results: Size and Structure
Consume memory: holding all unique combinations; Lesson 911 — Performance Considerations with Multiple Groups
Consumer Mobile Apps: Lesson 1657 — Day-1, Day-7, Day-30 Benchmarks
Contact information: Who to reach with questions; Lesson 1989 — Best Practices for Sharing Reproducible Reports Lesson 2063 — Essential Metadata to Capture Lesson 2091 — Stage 7: Communication and Handoff
Contact/Contribution: Who maintains this and how to get involved; Lesson 2077 — The Purpose and Anatomy of a Good README
Container tools: that package code *and* environment together; Lesson 29 — Code and Environment Management
Content Acquisition Cost (CAC): Total spend (licensing or production) divided by content hours.; Lesson 1635 — Media and Content Metrics: Watch Time and Content Performance
Content created: (weight: 0.; Lesson 1699 — Engagement Scoring Systems
Content Library Depth: Number of titles and hours of available content.; Lesson 1635 — Media and Content Metrics: Watch Time and Content Performance
Content marketing: (blog posts, videos, podcasts); Lesson 1711 — What Are Acquisition Channels?
Content Platform: Discover Content → Click → Watch/Read → Like/Share; Lesson 1678 — What is Funnel Analysis?
Content platforms: Account creation or premium upgrade; Lesson 1686 — Defining Conversions and Conversion Rate
Context: "Is this sample mean different from a known population mean?"; Lesson 315 — Common Test Statistics: Z, t, Chi-Square, and F Lesson 342 — Alpha Level Trade-offs Lesson 1247 — The Ethics of Visualization Design
Context expertise: Owner knows when the metric is actionable vs.; Lesson 1619 — What is Metric Ownership?
Context matters: Remember why you're testing.; Lesson 210 — Combining Visual and Statistical Methods Lesson 1659 — Comparing Retention Across Cohorts
Context-aware metrics: Lesson 1691 — Mobile vs Desktop Conversion Analysis
Contextual understanding: Some intersections carry unique historical disadvantages that single-attribute analysis misses entirely; Lesson 1893 — Intersectionality in Fairness
contingency table: (rows = one variable, columns = another); Lesson 422 — Introduction to Chi-Squared Test of Independence Lesson 423 — Contingency Tables and Expected Frequencies Lesson 1187 — Contingency Tables and Cross-Tabulations
Continue collecting data: Lesson 1511 — Sequential Probability Ratio Test (SPRT)
Continue the rebase: Run `git rebase --continue` to move to the next commit; Lesson 2018 — Resolving Conflicts During Rebase
Continue with Warnings: Lesson 1866 — Handling Failed Quality Checks
Continuity: The eye naturally follows smooth, continuous paths.; Lesson 1236 — Gestalt Principles in Visualization
Continuity correction: For small counts (b + c < 25), use the corrected formula: χ² = (|b - c| - 1)² / (b + c); Lesson 436 — Conducting McNemar's Test
Continuous (water): You might have 250ml, or 250.; Lesson 18 — Numerical Variables: Discrete and Continuous
Continuous data: mapping numeric ranges to positions or gradients; Lesson 1344 — Scales and Coordinate Systems
Continuous monitoring required: IoT sensors tracking equipment failures need instant alerts; Lesson 1788 — Streaming Data and Real-Time Requirements
Continuous numerical data: represents *measurements* that can take any value within a range, including decimals.; Lesson 18 — Numerical Variables: Discrete and Continuous
continuous positive values: that tend to be skewed rather than symmetric.; Lesson 183 — Applications of the Gamma Distribution Lesson 678 — Choosing the Right Link Function
Continuous predictors: (like age, blood pressure, or income) take numerical values along a scale, while **categorical predictors** (like treatment group, gender, or risk category) represent distinct groups.; Lesson 829 — Continuous and Categorical Predictors
Continuous relationships: Use `geom_point` or `geom_line`; Lesson 1342 — Geometric Objects (geoms)
Continuous unbounded data: The **identity link** (standard linear regression) is appropriate.; Lesson 678 — Choosing the Right Link Function
Contour plots: Display 3D surfaces as 2D contour lines, like topographic maps; Lesson 1329 — Effective Use and Pitfalls of 3D Visualizations
Contract: Lesson 1906 — Legal Bases for Processing Personal Data
Contracting funnel: The opposite—wide on the left, narrow on the right.; Lesson 559 — Detecting Heteroscedasticity (Non-Constant Variance)
Contradicts defined success metrics: (e.; Lesson 2107 — Saying No and Pushing Back Constructively
Contrast checkers: verify that your text and visual elements meet minimum visibility standards; Lesson 1254 — Testing Visualizations for Accessibility
control: (version A—the current state or baseline).; Lesson 1477 — Core Principles of A/B Testing Lesson 1482 — Control and Treatment Design
Control backfills: Re-running a task may require re-running its entire downstream chain; Lesson 1841 — Upstream and Downstream Dependencies
Control charts: for process stability without strong seasonality; Lesson 1411 — Applications and Limitations
Control for confounders: Isolate true relationships from spurious ones; Lesson 1190 — Introduction to Multivariate Analysis
Control for confounding variables: you learned about in partial correlation; Lesson 595 — From Simple to Multiple Linear Regression
control group: or **standard treatment** as reference.; Lesson 644 — Choosing a Reference Category Lesson 1435 — What is a Randomized Controlled Trial? Lesson 1641 — Isolating Effects with Control Groups Lesson 1677 — Measuring Churn Reduction Impact Lesson 1688 — A/B Testing for Conversion Optimization
Control group, after intervention: Lesson 1452 — The Difference-in-Differences Setup
Control group, before intervention: (baseline); Lesson 1452 — The Difference-in-Differences Setup
Control Limits: are the "voice of the process."; Lesson 1400 — Control Limits vs Specification Limits
Control or Baseline Group: Lesson 644 — Choosing a Reference Category
controlled experiment: solves this problem through **randomization**.; Lesson 499 — Why Controlled Experiments Are Needed Lesson 1477 — Core Principles of A/B Testing
Controlling deletes: Cascading actions (DELETE and UPDATE) you learned about help maintain integrity when parent records change; Lesson 1055 — What is Referential Integrity?
Controls: Account for confounding variables (income, membership duration); Lesson 1204 — From Hypothesis to Analysis Plan
Controls for attention effects: Users still see *something* in the ad slot; Lesson 1747 — Ghost Ads and PSA Tests
Convenience: or **quota sampling** may be pragmatic (but acknowledge the bias risk).; Lesson 243 — Choosing the Right Sampling Method
Convenience sampling gone wrong: Surveying only people easy to reach (like students in your class) when you want to understand all adults.; Lesson 244 — Selection Bias and Its Causes
Convenience wins: Metrics like clicks, page views, or session duration give fast feedback.; Lesson 1530 — Mismatched Metrics and Goals
Convention: Most SQL developers write keywords in UPPERCASE to distinguish them from table/column names, but lowercase works equally well.; Lesson 847 — Basic SQL Syntax Rules
conversion: is any desired action a user completes that moves them closer to delivering value to your business.; Lesson 1686 — Defining Conversions and Conversion Rate Lesson 1690 — Landing Page and CTA Optimization
Conversion probability curves: What percentage converts by day 7, 30, or 90?; Lesson 839 — Time-to-Conversion in Marketing Funnels
Conversion Rate: Percentage of visitors who complete a desired action; Lesson 1516 — Business Metrics: Definition and Examples Lesson 1613 — Raw Counts vs. Rates and Ratios Lesson 1625 — Cross-Functional Metric Dependencies Lesson 1680 — Measuring Drop-off and Conversion Rates Lesson 1714 — Channel-Level Metrics
conversion rates: (Did they stay?; Lesson 1676 — Win-Back and Retention Strategies Lesson 1723 — Comparing Single-Touch Models
Convert between zones: Transform a timestamp from one timezone to another (e.; Lesson 1042 — Working with Timestamps and Time Zones
Cook's Distance: .; Lesson 578 — Visualizing Leverage and Influence Lesson 587 — Identifying Outliers in Regression Context Lesson 589 — Deciding Whether to Remove Outliers
Cookie banners: are designed to extract "consent" through friction and confusion.; Lesson 1914 — Consent in Digital Contexts
Cookiecutter: is a command-line tool that creates projects from templates.; Lesson 2076 — Code Organization Templates and Cookiecutter
Cookiecutter Data Science: , which implements best practices you've already learned: separating raw from processed data, organizing notebooks, keeping configuration separate, and more.; Lesson 2076 — Code Organization Templates and Cookiecutter
Coordinated disclosure: If external disclosure is needed, work with security/ethics experts to time and frame it appropriately; Lesson 1925 — Mitigation Strategies and Responsible Disclosure
Coordinates (coord): The space where data is plotted (Cartesian, polar, map projections); Lesson 1340 — The Seven Layers of Grammar
Coordinating dependencies: some tasks must wait for others (e.; Lesson 1769 — Task Parallelism and Work Distribution
Copy Elements: Lesson 1690 — Landing Page and CTA Optimization
Copyleft licenses: (like GPL, AGPL) require that derivative works also be open source under the same license.; Lesson 2081 — Understanding Open Source Licenses
Core: is the foundation level, and the **ORM** (Object-Relational Mapper) is built on top of it.; Lesson 1118 — SQLAlchemy Core vs ORM
Core feature usage: (weight: 0.; Lesson 1699 — Engagement Scoring Systems
Correct: "We are 95% confident the population proportion lies between 0.; Lesson 281 — Interpreting Proportion Confidence Intervals
Correct interpretation: "Being sick causes people to go to the hospital."; Lesson 496 — Reverse Causality
Correct period: Algorithm properly separates normal seasonal peaks from true anomalies; Lesson 1409 — Setting Detection Parameters
Correctly explaining results: "NYC homes cost $15k more than Boston homes" (not "NYC homes cost $15k"); Lesson 643 — Interpreting Coefficients Relative to Reference
Correctness first: Does the logic actually work?; Lesson 2024 — Code Review Best Practices
Correlated approach: Lesson 980 — Converting Correlated to Non-Correlated Subqueries
Correlated SELECT subqueries: run repeatedly, once for each row of the outer query; Lesson 969 — Performance Considerations for SELECT Subqueries
Correlated subqueries: reference columns from the outer query.; Lesson 968 — Correlated vs Non-Correlated Subqueries in SELECT
correlated subquery: references the outer query and must run for *every row* being evaluated.; Lesson 966 — Performance Considerations for WHERE Subqueries Lesson 975 — What is a Correlated Subquery?
Correlated vs. Uncorrelated Subqueries: Lesson 966 — Performance Considerations for WHERE Subqueries
Correlation: means two variables move together in a statistically observable pattern.; Lesson 1420 — Defining Correlation and Causation
Correlation Analysis: Lesson 1602 — Identifying Leading Indicators for Your Metrics Lesson 1883 — Protected Classes and Proxy Variables
correlation coefficient: ?; Lesson 306 — Bootstrap for Non-Standard Problems Lesson 476 — What is Pearson Correlation? Lesson 719 — What is Autocorrelation? Lesson 721 — Computing ACF Values
Correlation coefficient (r): for relationships between variables; Lesson 384 — What is Effect Size?
Correlation condition: (`inner.; Lesson 976 — Basic Correlated Subquery Syntax
Correlation IDs: to trace a record through multiple systems; Lesson 1857 — Logging Best Practices
Correlation matrices and heatmaps: (Lesson 1192) reveal pairs of highly correlated variables—strong candidates for redundancy; Lesson 1197 — Identifying Variable Importance and Redundancy
correlation matrix: solves this by computing correlations between *every pair* of variables and organizing them into a grid.; Lesson 510 — Correlation Matrices: Construction and Display Lesson 513 — Applications: Feature Selection and Multicollinearity Lesson 1192 — Correlation Matrices and Heatmaps
Correlation matrix examination: Drop one variable from pairs with correlation > 0.; Lesson 585 — Remedies: Variable Selection
Correlations: between two variables; Lesson 306 — Bootstrap for Non-Standard Problems Lesson 1191 — Scatter Plot Matrices and Pairplots
cost: , and **feasibility**.; Lesson 243 — Choosing the Right Sampling Method Lesson 295 — Trade-offs: Precision, Confidence, and Cost Lesson 1677 — Measuring Churn Reduction Impact
Cost efficiency: How much you spend to operate; Lesson 1516 — Business Metrics: Definition and Examples
Cost forecasting: Knowing the hazard function helps finance teams predict warranty claim volumes and budget accordingly.; Lesson 837 — Product Warranty and Failure Analysis
Cost less to maintain: No retraining pipelines, drift monitoring, or GPU compute; Lesson 2128 — Data Distribution Shifts Frequently
Cost numbers: Estimates of work required.; Lesson 1084 — Reading and Interpreting Query Execution Plans
Cost of false positives: vs false negatives; Lesson 324 — Common Significance Levels: 0.05, 0.01, and 0.10
Cost Per Acquisition (CPA): by targeting cheaper traffic sources, they might inadvertently decrease **Average Order Value (AOV)** that the sales team tracks.; Lesson 1625 — Cross-Functional Metric Dependencies Lesson 1714 — Channel-Level Metrics Lesson 1715 — Comparing Channel Performance
Cost per patient encounter: aggregates all expenses divided by patient visits or admissions—a critical profitability metric.; Lesson 1633 — Healthcare Metrics: Patient Outcomes and Operational Efficiency
Cost Side: Lesson 2118 — Cost-Benefit Analysis for Continued Work
Cost-benefit analysis: Is the effect large enough to justify intervention costs?; Lesson 386 — Effect Size Interpretation Guidelines Lesson 2126 — Cost and Complexity Exceed Benefit
Cost-effective: Reduces travel and administrative costs by concentrating data collection in selected clusters; Lesson 238 — Multistage Sampling
Costs: Lesson 342 — Alpha Level Trade-offs
COUNT: , **SUM**, **AVG**, **MIN**, and **MAX**—together with **GROUP BY** to create rich summaries of grouped data.; Lesson 892 — GROUP BY with Different Aggregate Functions
Count data: (number of events, purchases, visits); Lesson 213 — Square Root and Cube Root Transformations Lesson 678 — Choosing the Right Link Function Lesson 689 — When to Use Poisson Regression Lesson 690 — The Poisson Distribution as a GLM Lesson 1552 — Gamma-Poisson Conjugacy
Count the significant spikes: before the cut-off; Lesson 777 — Identifying MA Order (q) Using ACF
Count the signs: Ignore zeros; count how many differences are positive (+) and how many are negative (−); Lesson 391 — The Sign Test for Medians
COUNT(column_name): counts only the rows where that specific column has a **non-null value**.; Lesson 882 — COUNT: Counting Rows and Non-Null Values Lesson 894 — NULL Values in GROUP BY
COUNT(right_table_column): ignores NULLs → correct for "how many matches"; Lesson 933 — Aggregating with LEFT JOINs
Counter-example: Drawing two cards from a deck *without replacement* creates dependence.; Lesson 101 — Defining Statistical Independence
Counter-metrics: and **guardrails** are defensive metrics designed to catch these problems before they damage your business.; Lesson 1624 — Counter-Metrics and Guardrails Lesson 1635 — Media and Content Metrics: Watch Time and Content Performance
Counting distinct performance levels: rather than absolute positions; Lesson 1009 — DENSE_RANK(): Ranking Without Gaps
Counting unique entities: How many different customers placed orders?; Lesson 873 — Understanding DISTINCT: Removing Duplicate Rows Lesson 887 — Aggregates with DISTINCT
Course-correct quickly: Discover that your chosen metric doesn't align with business goals *before* building a complete pipeline.; Lesson 2111 — Fast Feedback Loops with Stakeholders
Courses: (CourseID, CourseName); Lesson 1065 — Second Normal Form (2NF)
Cov(X, ε) = 0: Lesson 553 — Exogeneity: X Must Be Independent of Errors
covariance: as the "raw" measure of how two variables move together.; Lesson 478 — The Formula for Pearson's r Lesson 519 — Computing β₁: The Slope Estimate
Covariate balance: means the distribution of baseline characteristics—age, prior purchase behavior, device type, etc.; Lesson 1491 — Covariate Balance and Diagnostics
Covariates: (the variables you believe affect survival); Lesson 828 — Fitting the Cox Model Lesson 835 — Customer Churn Prediction with Survival Analysis Lesson 840 — Loan Default Timing and Credit Risk
Coverage error: occurs when your **sampling frame** (the actual list or method you use to select your sample) doesn't include everyone in the target population.; Lesson 249 — Coverage Error and Undercoverage
Cox & Snell R²: Based on likelihood ratios but capped below 1; Lesson 702 — Pseudo R-Squared Measures
Cox models: to identify which covariates (customer age, past purchases, email open time) predict faster responses; Lesson 841 — Campaign Response Time Analysis
Cox Proportional Hazards Model: (or Cox regression) is a semi-parametric method that lets you predict how covariates (like age, treatment, or risk factors) affect survival time **without** assuming what the baseline hazard distribution looks like.; Lesson 825 — What is the Cox Proportional Hazards Model? Lesson 835 — Customer Churn Prediction with Survival Analysis Lesson 836 — Employee Turnover and Retention Analysis Lesson 839 — Time-to-Conversion in Marketing Funnels Lesson 840 — Loan Default Timing and Credit Risk
Cramér's V: is the standard effect size measure for chi-squared tests of independence.; Lesson 429 — Effect Size: Cramér's V and Phi
Crash queries: some databases have row limits or timeouts; Lesson 943 — CROSS JOIN Results: Size and Structure
Create a narrative flow: Guide readers from question → exploration → findings → conclusions in a linear, readable format; Lesson 1982 — Literate Programming with Notebooks
Create disparate impact: even without intent (your model systematically disadvantages a protected group); Lesson 1888 — Protected Classes and Sensitive Attributes
Create dynamic task groups: that adapt to data; Lesson 1836 — Task Dependencies and Flow Control
Create predictable rhythms: Schedule recurring meetings at the project's start.; Lesson 2104 — Communication Cadence and Updates
Create strata: by grouping units with identical covariate values; Lesson 1489 — Stratified Randomization Fundamentals
Create unexpected results: accidental CROSS JOINs are a common SQL mistake; Lesson 943 — CROSS JOIN Results: Size and Structure
Creates a smoother series: that's easier to analyze; Lesson 755 — Moving Averages for Trend Estimation
Creating: and modifying table structures; Lesson 844 — What is SQL?
Creating new features: from existing ones: combining columns, extracting date components (day of week, month), or calculating ratios that encode domain knowledge.; Lesson 2088 — Stage 4: Feature Engineering and Preparation
Creative costs: design, copywriting, video production; Lesson 1753 — Customer Acquisition Cost (CAC): Components and Calculation
Credentials: `.; Lesson 1996 — The .gitignore File
Credentials and secrets: Lesson 2031 — Using .gitignore for Data Science Projects
credible interval: is the Bayesian alternative to a confidence interval.; Lesson 1562 — Credible Intervals for Proportions Lesson 1573 — What is a Credible Interval?
credible intervals: (e.; Lesson 1417 — Bayesian Change-Point Detection Lesson 1539 — Interpreting Posterior Probabilities Lesson 1574 — Credible Intervals vs Confidence Intervals
Credit history location: → historical discrimination effects; Lesson 1889 — Proxy Variables and Redlining
Credit scoring: built on historical lending bias against minorities; Lesson 1881 — Historical and Societal Bias
Criminal justice data: reflecting decades of discriminatory policing practices; Lesson 1881 — Historical and Societal Bias
critical: .; Lesson 376 — The Assumption of Normality in t-Tests Lesson 970 — Subqueries in the FROM Clause (Derived Tables) Lesson 1857 — Logging Best Practices Lesson 1910 — Data Protection Impact Assessments (DPIAs)
Critical caveat: Due to Jensen's inequality, `exp(E[log(Y)])` ≠ `E[Y]`.; Lesson 594 — Interpreting Models After Transformation
Critical insight: Controlling for a collider *opens* a spurious path between X and Y, creating bias where none existed.; Lesson 1471 — Mediators and Colliders
Critical pipeline failures: Page on-call engineer via PagerDuty; Lesson 1851 — Error Logging and Notifications
Critical point: `AVG()` automatically *ignores* NULL values.; Lesson 884 — AVG: Computing Averages
Critical requirement: MVT demands significantly more traffic than A/B testing because you're splitting visitors across many more variants.; Lesson 1689 — Multivariate Testing and Personalization
Critical rule: Each step must map to at least one clear event.; Lesson 1679 — Defining Funnel Steps and Events
critical value: is the multiplier that determines how wide your confidence interval is.; Lesson 268 — Critical Values and the t-Distribution Lesson 271 — Margin of Error Lesson 326 — Critical Values Lesson 607 — Confidence Intervals for Coefficients
Critical Value Approach: Lesson 327 — Decision Rules: Reject or Fail to Reject
critical values: .; Lesson 325 — The Rejection Region Lesson 345 — Directionality in Hypothesis Testing Lesson 355 — Finding Critical Values and P-Values
Critical/Page: Data corruption, complete pipeline failure, SLA breach—requires immediate human intervention; Lesson 1858 — Alerting Strategies
Critically: You *cannot* say "there's a 95% probability the true proportion lies in this interval" under the frequentist interpretation—the parameter either is or isn't in that specific interval.; Lesson 1564 — Comparing Bayesian and Frequentist Proportion Inference
Cross-filtering: Filtering data in one view updates all views; Lesson 1304 — Subplots and Linked Interactions
Cross-tabulations: Visualize frequency patterns across two categorical variables; Lesson 1224 — Heatmaps and Correlation Matrices
Cross-validation: When prediction is paramount, sufficient data exists, and computational resources allow it.; Lesson 616 — Adjusted R-Squared vs Other Criteria Lesson 632 — Parsimony and Occam's Razor Lesson 1463 — RDD Bandwidth Selection and Local Estimation Lesson 2055 — Why Randomness Matters in Data Science
Crossing or converging lines: → Interaction present; one factor's effect changes depending on the other; Lesson 466 — Visualizing Interactions
CRS: defines exactly how coordinates relate to positions on Earth.; Lesson 1308 — Geographic Data Types and Coordinate Systems
CSV: is human-readable and universal but slow to parse and memory-intensive.; Lesson 1133 — Performance Considerations Across Formats
CSV files: Lesson 1779 — Reading and Writing Data in Spark
CTA click: → **Conversion**; Lesson 1690 — Landing Page and CTA Optimization
CTA Performance: Lesson 1690 — Landing Page and CTA Optimization
CTE name: (like `cte_name`) that you choose; Lesson 990 — Basic CTE Syntax and Structure
CTEs: are named and defined upfront (we'll cover these in detail soon):; Lesson 974 — When to Use FROM Subqueries vs CTEs Lesson 991 — CTEs vs Subqueries: When to Use Each
Cube Root Transformation: (`x^(1/3)`) is useful for:; Lesson 213 — Square Root and Cube Root Transformations
Cube the z-scores: For each value, subtract the mean, divide by standard deviation, then cube it.; Lesson 65 — Calculating Skewness
Cubing: keeps the sign, so values below the mean contribute negatively and values above contribute positively.; Lesson 65 — Calculating Skewness
Cultural buy-in: Stakeholders may question "peeking" at results, requiring education; Lesson 1515 — Trade-offs: Sample Size, Speed, and Complexity
Cumulative Distribution Function (CDF): tells you the probability of getting *that value or anything smaller*.; Lesson 120 — Cumulative Distribution Functions (CDF) for Discrete Variables Lesson 157 — Cumulative Distribution Functions (CDFs) for Continuous Variables Lesson 162 — Uniform Distribution: PDF and CDF
Cumulative metrics: $12,500 total revenue from Jan cohort by Week 3; Lesson 1647 — Building a Cohort Table
Cumulative probabilities: P(X ≤ k) — at most k successes, or P(X ≥ k) — at least k successes; Lesson 130 — Calculating Binomial Probabilities
Cumulative probability: requires summing multiple exact probabilities.; Lesson 130 — Calculating Binomial Probabilities
Curiosity: Great data scientists ask "why?"; Lesson 7 — The Data Science Skill Stack
CURRENT ROW: Start or end at the current row being processed; Lesson 1020 — UNBOUNDED and CURRENT ROW Keywords
Curved patterns: Your relationship isn't actually linear (violates linearity assumption); Lesson 556 — What Are Residuals and Why Plot Them? Lesson 1189 — Detecting Nonlinear Relationships
Custom functions: Tailor spending to your business needs; Lesson 1512 — Group Sequential Testing
Custom metrics: unique to your business problem; Lesson 306 — Bootstrap for Non-Standard Problems
Custom or minimal theme: (e.; Lesson 1369 — Publication-Ready Plot Styling
Custom order: Based on domain meaning (e.; Lesson 1178 — Bar Charts for Categorical Data
Customer Arrivals: A coffee shop averages 15 customers per hour.; Lesson 144 — Poisson Applications: Arrivals and Events
Customer behavior: Whether users stay, leave, or convert; Lesson 1516 — Business Metrics: Definition and Examples
Customer demographics: Do purchases align with market segment sizes?; Lesson 421 — Applications: Uniform, Genetic Ratios, and Distributions
Customer Lifespan: measures how long customers stay active (in the same time unit).; Lesson 1663 — Simple LTV: Average Revenue Per Customer
Customer Lifetime: = 1 / Monthly Churn Rate; Lesson 1666 — LTV for Subscription Businesses
Customer Lifetime Value (CLV): Predicted total revenue from a customer; Lesson 1516 — Business Metrics: Definition and Examples
Customer Lifetime Value (LTV): is the total revenue a customer generates over their entire relationship with a business—from their first purchase to their last interaction before churning.; Lesson 1661 — What is Customer Lifetime Value (LTV)?
Customer preferences: Your marketing theory predicts 40% will choose red, 35% blue, 25% green.; Lesson 414 — Introduction to Chi-Squared Goodness of Fit Test
Customer Retention Rate: Percentage of customers who remain active; Lesson 1516 — Business Metrics: Definition and Examples
Customer segments: A customer classified as "new" cannot simultaneously be "returning"; Lesson 81 — Mutually Exclusive Events
Customer service calls arriving: If no one has called in the last 5 minutes, that doesn't make a call in the next minute more or less likely; Lesson 167 — Memoryless Property of Exponential
Customer success failures: (poor onboarding, lack of support); Lesson 1675 — Churn Attribution and Root Cause Analysis
Customers: .; Lesson 1051 — Introduction to Foreign Keys
Customers table: customer ID, name, email; Lesson 918 — What is an INNER JOIN?
CUSUM: tracks the *cumulative sum* of deviations from a target value.; Lesson 1403 — CUSUM and EWMA Charts Lesson 1415 — CUSUM: Cumulative Sum Control Chart
cut off sharply: to near-zero beyond that lag.; Lesson 731 — PACF for AR Process Identification Lesson 776 — Identifying AR Order (p) Using PACF
Cut technical debt: Features with low adoption *and* low frequency are candidates for removal; Lesson 1696 — Feature Adoption and Usage Frequency
Cycle through each parameter: , sampling from its conditional distribution:; Lesson 1591 — Gibbs Sampling for Multivariate Posteriors
Cycle Time: tracks how long it takes to complete one unit from start to finish, while **throughput** measures actual units produced per time period.; Lesson 1636 — Manufacturing Metrics: OEE, Yield, and Cycle Time
Cycles: Does website traffic peak on weekends?; Lesson 19 — Temporal Data and Time Series Lesson 1846 — Testing and Validating Dependency Graphs
Cyclical: Variable period (could be 3 years, then 5 years, then 4 years); Lesson 708 — Cyclical Patterns: Non-Fixed Fluctuations
Cyclical patterns: repeating waves suggesting seasonal or periodic effects; Lesson 562 — Index Plots and Time-Ordered Residuals Lesson 708 — Cyclical Patterns: Non-Fixed Fluctuations
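
Worked example for three formulas defined in this section (Continuity correction, Customer Lifetime, Cube the z-scores). This is a minimal Python sketch, not the course's own code: the function names are hypothetical, and the skewness helper assumes the population standard deviation and a plain average of the cubed z-scores, which the glossary entry does not specify.

```python
from statistics import mean, pstdev


def mcnemar_corrected(b: int, c: int) -> float:
    """Continuity-corrected McNemar statistic: (|b - c| - 1)^2 / (b + c).

    b and c are the two discordant-pair counts from the 2x2 table.
    """
    if b + c == 0:
        raise ValueError("b + c must be positive")
    return (abs(b - c) - 1) ** 2 / (b + c)


def customer_lifetime(monthly_churn_rate: float) -> float:
    """Expected customer lifetime in months: 1 / monthly churn rate."""
    if not 0 < monthly_churn_rate <= 1:
        raise ValueError("monthly_churn_rate must be in (0, 1]")
    return 1 / monthly_churn_rate


def skewness(values: list[float]) -> float:
    """Average of cubed z-scores (assumption: population standard deviation)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("skewness is undefined when all values are equal")
    # Cubing keeps the sign, so values below the mean contribute negatively.
    return mean([((x - mu) / sigma) ** 3 for x in values])


if __name__ == "__main__":
    print(mcnemar_corrected(10, 4))    # discordant counts b=10, c=4
    print(customer_lifetime(0.05))     # 5% monthly churn
    print(skewness([1, 1, 1, 10]))     # long right tail
```

With b = 10 and c = 4 the corrected statistic is 25/14, about 1.79; a 5% monthly churn rate corresponds to a 20-month customer lifetime; and the right-tailed sample yields a positive skewness.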

D

DAGs (Directed Acyclic Graphs): are the heart of Airflow.; Lesson 1833 — Introduction to Apache Airflow
Dagster: log every execution step.; Lesson 1164 — Tools for Lineage Tracking
Daily Active Users (DAU): counts how many unique users performed a meaningful action in your product on a given day.; Lesson 1694 — Daily Active Users (DAU) and Monthly Active Users (MAU)
Daily data: with weekly seasonality → period = 7; Lesson 746 — Choosing Seasonal Period
Damage vulnerable populations: through automated decisions; Lesson 1888 — Protected Classes and Sensitive Attributes
Damaged trust: in data science as a field; Lesson 34 — Recognizing Boundaries of Competence
Damped trend methods: add a *damping parameter* (usually denoted φ, pronounced "phi") that gradually flattens the trend over time.; Lesson 762 — Damped Trend Methods
Dampens irregular fluctuations: (the noise component); Lesson 755 — Moving Averages for Trend Estimation
Dark patterns: are interface designs that deliberately trick users into giving up data or privacy rights.; Lesson 1914 — Consent in Digital Contexts
Dash: (by Plotly) offers **more control and flexibility**.; Lesson 1330 — Introduction to Interactive Dashboards
dashboard: is an interactive, real-time (or near-real-time) monitoring tool that updates automatically as underlying data changes.; Lesson 1974 — Defining Dashboards and Reports Lesson 1997 — Viewing Repository State with git status
Dashboards excel at monitoring: , letting stakeholders track KPIs, spot anomalies, and maintain situational awareness.; Lesson 1980 — Hybrid Approaches and When to Use Both
data: , never as executable SQL commands.; Lesson 1105 — Parameter Placeholders: Question Marks Lesson 1339 — What is the Grammar of Graphics? Lesson 1340 — The Seven Layers of Grammar Lesson 1348 — The Base Layer: ggplot() and Data Mapping Lesson 1552 — Gamma-Poisson Conjugacy Lesson 1557 — The Beta-Binomial Model Lesson 2082 — Choosing a License for Data Science Projects
Data access: Are there privacy restrictions or missing historical data?; Lesson 1169 — Clarifying Assumptions and Constraints
Data Analyst: investigates *why* the Northeast region underperformed and identifies the key factors; Lesson 4 — Data Science vs Data Analytics vs Business Intelligence
Data Analysts: focus on *understanding the past and present*.; Lesson 2138 — Data Analyst vs Data Scientist vs ML Engineer
Data auditing: Find customers who placed orders in 2023 but not in 2024; Lesson 1002 — EXCEPT: Finding Differences
Data augmentation: Add more representative examples from underrepresented groups to balance your dataset.; Lesson 1894 — Auditing and Remediation Strategies
Data availability: You might have a task that checks which data sources updated today, then spawns processing tasks only for those sources.; Lesson 1844 — Dynamic Dependencies
Data Cleaning & Preparation: Lesson 9 — The Data Science Lifecycle Overview
Data Cleaning and Preparation: to extract meaningful insights—for example, analyzing customer reviews (text) to understand sentiment, or processing images to detect patterns.; Lesson 16 — Structured vs Unstructured Data Lesson 20 — Primary Data Sources: Databases and Data Warehouses Lesson 38 — What is Central Tendency?
Data cleaning matters: "St.; Lesson 1315 — Geocoding and Reverse Geocoding
Data Collection: Lesson 9 — The Data Science Lifecycle Overview Lesson 1169 — Clarifying Assumptions and Constraints Lesson 1878 — What is Bias in Data?
Data Collection and Acquisition: (which you learned earlier), you'll encounter both types.; Lesson 16 — Structured vs Unstructured Data Lesson 20 — Primary Data Sources: Databases and Data Warehouses
Data Completeness: The percentage of expected records that successfully arrived.; Lesson 1856 — Key Metrics to Monitor
Data consistency risk: Aggregates can become stale or incorrect if updates fail; Lesson 1073 — Storing Computed Values and Aggregates
Data corruption during export: Special characters lost when saving to formats that don't support Unicode.; Lesson 1139 — Dealing with Special Characters and Unicode
Data debt: Undocumented preprocessing steps that become "tribal knowledge"; Lesson 2131 — What is Technical Debt in Data Science?
Data decays quickly: Real-time bidding for ads loses value after milliseconds; Lesson 1788 — Streaming Data and Real-Time Requirements
Data documentation: Lesson 1971 — Appendices and Technical Details
Data drift: New patterns emerge that your model hasn't seen; Lesson 15 — Deployment, Monitoring, and Iteration
Data exploration: Understanding the range of values in a column; Lesson 873 — Understanding DISTINCT: Removing Duplicate Rows
Data files: Lesson 2031 — Using .gitignore for Data Science Projects
Data Freshness: The time lag between when data is generated and when it's available for use.; Lesson 1856 — Key Metrics to Monitor
Data Freshness SLO: "Dashboard data will be no more than 4 hours old during business hours"; Lesson 1860 — SLA and SLO Definitions
Data Infrastructure: Lesson 1643 — Building Attribution Frameworks
data integrity: you can't have an order pointing to a non-existent customer.; Lesson 921 — Primary and Foreign Key Relationships Lesson 1810 — Snowflake Schema and Normalization Trade-offs
Data isolation: Keep `data/` and `outputs/` in `.; Lesson 2032 — Organizing Repository Structure for Data Science
Data Licenses: (your datasets):; Lesson 2082 — Choosing a License for Data Science Projects
Data lineage: is the documented history of data from its original source through every transformation, merge, filter, and calculation until it reaches its final form in a report, model, or dashboard.; Lesson 1159 — What is Data Lineage? Lesson 1875 — Data Lineage and Provenance Lesson 1908 — Data Subject Access Requests (DSARs)
Data locality: means running your computation where the data already lives.; Lesson 1772 — Data Locality and Network Bottlenecks
Data minimization: Collect only what you actually need (goodbye "vacuum up everything" strategies); Lesson 1904 — What is GDPR and Why It Matters Lesson 1905 — Core Principles of GDPR
Data parallelism: is like having five chefs all chopping vegetables using the same technique.; Lesson 1769 — Task Parallelism and Work Distribution
Data pipeline issues: Some users' data might not be logged correctly; Lesson 1524 — Sample Ratio Mismatch (SRM)
Data pipeline maintenance: Sources change schemas, APIs deprecate, databases get restructured; Lesson 1979 — Maintenance and Sustainability Considerations
Data points overlap: Points in front hide those behind, potentially concealing important patterns; Lesson 1329 — Effective Use and Pitfalls of 3D Visualizations
Data Poisoning: Adversaries might deliberately feed corrupted data into your pipeline to manipulate model outputs (e.; Lesson 1920 — Anticipating Misuse of Data Products
Data Protection Impact Assessment: "; Lesson 1931 — When to Push Back on Requests
Data provenance: is the documented history of your data: where it came from, who collected it, when it was gathered, and every transformation it underwent.; Lesson 23 — Data Provenance and Metadata Lesson 26 — Reproducibility vs. Replicability Lesson 1875 — Data Lineage and Provenance
Data quality: "What's the earliest date in this dataset?"; Lesson 885 — MIN and MAX: Finding Extremes
Data quality checks: Comparing `COUNT(*)` vs `COUNT(DISTINCT column)` reveals how many duplicates exist; Lesson 887 — Aggregates with DISTINCT
Data quality drift: Your model expects feature X between 0-100, but upstream changes cause values of 0-1000.; Lesson 2136 — Monitoring Gaps and Silent Failures
Data Quality Issues: Record missing value patterns, unexpected outliers, or encoding problems.; Lesson 1180 — Documenting Univariate Findings Lesson 1201 — Domain Knowledge as a Hypothesis Source Lesson 1840 — What is Dependency Management in Pipelines? Lesson 1851 — Error Logging and Notifications
Data quality validation: enforces business rules:; Lesson 1826 — Data Validation and Schema Enforcement
data reconciliation: finding discrepancies between two datasets, like customers in your CRM but not in your billing system, or vice versa.; Lesson 938 — Symmetric Difference Pattern Lesson 941 — Use Cases: Data Reconciliation
Data requirements: Ensure sufficient sample size in each age group; Lesson 1204 — From Hypothesis to Analysis Plan
Data rows: Each subsequent line represents one record; Lesson 1125 — CSV Files: Structure and Common Issues
Data Science Lifecycle: , lessons 9-15) are so important.; Lesson 26 — Reproducibility vs. Replicability
Data science problem: "Build a binary classification model to predict 30-day churn probability, achieving minimum 80% recall to catch potential churners, using historical customer behavior data from the past 2 years."; Lesson 2085 — Stage 1: Problem Definition and Scoping
Data science specifics: Check for proper handling of missing data, appropriate train-test splits, reproducibility (random seeds), and whether assumptions of statistical methods are met.; Lesson 2024 — Code Review Best Practices
Data Scientist: builds a predictive model to forecast next quarter's sales and recommends which products to promote; Lesson 4 — Data Science vs Data Analytics vs Business Intelligence
Data Scientists: focus on *predicting and prescribing*.; Lesson 2138 — Data Analyst vs Data Scientist vs ML Engineer
Data shows: Customer purchases spike on Tuesdays; Lesson 1201 — Domain Knowledge as a Hypothesis Source
Data sources: Database names, API endpoints, file paths, table versions; Lesson 1988 — Embedding Data Lineage and Metadata Lesson 2077 — The Purpose and Anatomy of a Good README
Data splitting: `train_test_split` shuffles differently each run; Lesson 2055 — Why Randomness Matters in Data Science
Data surprises: Real-world data is messy.; Lesson 2109 — Why Data Science is Inherently Iterative
Data to visualize: Your x and y values; Lesson 1257 — Creating Your First Plot
Data type: Ordinal or continuous (but non-normal); Lesson 474 — Friedman Test: Non-Parametric Repeated Measures ANOVA Lesson 846 — Tables, Schemas, and Data Types Lesson 1163 — Metadata and Data Dictionaries Lesson 1230 — Choosing the Right Chart Type Lesson 2064 — Creating Data Dictionaries
Data type matching: `int64`, `float64`, `object`, `datetime64` match expectations; Lesson 1151 — Schema Validation
Data types: Is `age` an integer, not a string?; Lesson 1151 — Schema Validation
Data updates frequently: and users want current information; Lesson 1330 — Introduction to Interactive Dashboards
Data Version Control (DVC): extends Git to handle large data files.; Lesson 2066 — Version Control for Data Files
Data versioning issues: The dataset changes, but nobody tracks which version was used (Data Provenance); Lesson 30 — The Reproducibility Crisis and Solutions
Data Volume: The number of records processed per run.; Lesson 1856 — Key Metrics to Monitor
data warehouse: is like a massive library that collects copies of information from many different filing cabinets across an entire organization.; Lesson 20 — Primary Data Sources: Databases and Data Warehouses Lesson 1807 — Data Warehouse vs Database: Architecture and Purpose
Data warehouses: Lesson 1807 — Data Warehouse vs Database: Architecture and Purpose
Data-driven/algorithmic: Uses statistical models to weight contributions based on observed patterns and incrementality testing; Lesson 1637 — What is Metric Attribution?
data-ink ratio: the proportion of ink (or pixels) in your chart that actually represents data versus non-data elements.; Lesson 1237 — Chart Junk and Data-Ink Ratio Lesson 1246 — Visual Clutter and Chartjunk
database: can be thought of as a digital filing cabinet where organizations store their day-to-day information in an organized way.; Lesson 20 — Primary Data Sources: Databases and Data Warehouses Lesson 842 — What is a Database?
Database Management System (DBMS): is software that sits between you and your database files, handling all the complex operations of storing, organizing, retrieving, and managing data.; Lesson 845 — Database Management Systems (DBMS)
Database portability: Switch from SQLite to PostgreSQL with minimal code changes; Lesson 1117 — What is an ORM and Why Use It?
Database snapshots: or compressed archives; Lesson 2033 — Git Large File Storage (LFS) for Data Assets
Databases: (JDBC):; Lesson 1779 — Reading and Writing Data in Spark Lesson 1877 — Versioning Strategies for Different Data Types
Datadog: automate this process, offering dashboards that show pipeline status at a glance and trigger alerts when thresholds are breached.; Lesson 1861 — Monitoring Tools and Dashboards
DataFrames: organize your data into named columns—like a spreadsheet or SQL table—but distributed across a cluster.; Lesson 1778 — DataFrames and Spark SQL Basics
Dataset: A collection of typed objects (numbers, strings, custom objects).; Lesson 1777 — RDDs: Resilient Distributed Datasets Fundamentals
Datasets: over ~10 MB that change occasionally; Lesson 2033 — Git Large File Storage (LFS) for Data Assets
DATE: Along with **DATETIME** and **TIMESTAMP**, stores date and time values; Lesson 846 — Tables, Schemas, and Data Types Lesson 1999 — Viewing Commit History
Date columns: Find the earliest and most recent dates; Lesson 885 — MIN and MAX: Finding Extremes
Date fields: (`order_date`, `created_at`) — most queries filter by time ranges; Lesson 1812 — Partitioning and Clustering Strategies
Date sequences: Start dates should precede end dates.; Lesson 1155 — Consistency Checks Across Fields
Date truncation: cuts off the precision beyond a certain level, effectively "rounding down" a timestamp to the beginning of that time period.; Lesson 1043 — Date Truncation and Rounding
Dates: capture days without precise times: "March 15, 2024"; Lesson 19 — Temporal Data and Time Series Lesson 857 — Comparison Operators: Greater and Less Than
DATETIME: Along with **TIMESTAMP**, stores date and time values; Lesson 846 — Tables, Schemas, and Data Types
Dating app data: Attractiveness and personality both lead to getting matches.; Lesson 1473 — Conditioning on Colliders: Selection Bias
DAU: counts unique users engaging with your platform each day, while **MAU** tracks monthly uniques.; Lesson 1631 — Social Media Metrics: DAU/MAU and Content Engagement
Day 3: 1 event occurs, 5 customers at risk; Lesson 811 — Computing the Kaplan-Meier Product
Day 5: 1 event occurs, 4 still at risk; Lesson 811 — Computing the Kaplan-Meier Product
Day 7: 1 event occurs, 3 at risk; Lesson 811 — Computing the Kaplan-Meier Product
Day-1 (D1): Did users find immediate value?; Lesson 1657 — Day-1, Day-7, Day-30 Benchmarks
Day-30 (D30): Is this a keeper?; Lesson 1657 — Day-1, Day-7, Day-30 Benchmarks
Day-7 (D7): Are users forming a habit?; Lesson 1657 — Day-1, Day-7, Day-30 Benchmarks
Days or weeks saved: in fast-moving business environments; Lesson 1515 — Trade-offs: Sample Size, Speed, and Complexity
DB2: Lesson 940 — Database Support and Alternatives
dbt: (transforms + documentation), and **OpenLineage** (open standard) embed lineage capture directly into your code.; Lesson 1164 — Tools for Lineage Tracking Lesson 1821 — Hybrid Approaches and Modern Data Stacks
Dead Letter Queue: is a separate storage location where permanently failed tasks or messages are routed after exhausting all retry attempts.; Lesson 1852 — Dead Letter Queues
Dead Letter Queues: Verify that permanently failed records actually land in your dead letter queue for later investigation.; Lesson 1854 — Testing Error Handling
DEBUG: Detailed diagnostics (variable values, loop iterations); Lesson 1857 — Logging Best Practices
Debugging: Identify when data quality issues were introduced; Lesson 1871 — Why Version Control for Data? Lesson 2047 — What is Dependency Management?
Decaf: | 30 | 50 | 80 |; Lesson 423 — Contingency Tables and Expected Frequencies
Decide: If the proposal is higher, always go there.; Lesson 1590 — The Metropolis-Hastings Algorithm
Decide your credibility level: (e.; Lesson 1575 — Computing Equal-Tailed Credible Intervals
Deciles: (10 groups): Divide data into tenths.; Lesson 57 — Quantiles: Quartiles, Deciles, and Beyond
DECIMAL: Numbers with decimals (e.; Lesson 846 — Tables, Schemas, and Data Types
Decision: We either reject H₀ (evidence is convincing) or fail to reject H₀ (insufficient evidence); Lesson 312 — Hypothesis Testing as a Legal Analogy
Decision hesitation: (need to consult others, compare options); Lesson 1681 — Time-Based Funnel Analysis
Decision points: "If we can explain 60% of the variance, we can make the call"; Lesson 2117 — Defining 'Good Enough' with Stakeholders
Decision rule: If p-value < α (usually 0.; Lesson 378 — Testing Normality: Statistical Tests Lesson 716 — Augmented Dickey-Fuller Test
Decision-focused: Compute expected gains or losses for business decisions; Lesson 1570 — Comparing Two Means: Bayesian Approach
Decision-making: Planning inventory based on predicted demand ranges; Lesson 1571 — Posterior Predictive Distribution for New Data Lesson 1580 — Bayesian vs Frequentist A/B Testing
Decisions Made: Document any filtering or transformation decisions.; Lesson 1180 — Documenting Univariate Findings
Declining Session Frequency: A user who visited daily but now appears weekly is sending a signal.; Lesson 1700 — Leading Indicators of Disengagement
Decomposing Seasonality: concept you've learned, the technique involves:; Lesson 1408 — Handling Multiple Seasonal Periods
Decorative backgrounds: Solid fills, gradients, or textures add no value; Lesson 1237 — Chart Junk and Data-Ink Ratio Lesson 1246 — Visual Clutter and Chartjunk Lesson 1963 — Removing Chartjunk
Decreased Depth of Activity: Fewer actions per session, less content consumed, or shallow navigation compared to their historical baseline.; Lesson 1700 — Leading Indicators of Disengagement
Decreasing adjusted R-squared: when you add a variable means that variable adds more noise than signal—it's not worth including; Lesson 614 — Interpreting Adjusted R-Squared Values
Dedicate regular time blocks: for learning—even 30 minutes daily beats sporadic weekend marathons.; Lesson 2143 — Continuous Learning and Skill Development
Default Alphabetical: Lesson 644 — Choosing a Reference Category
Default position: The defendant is innocent (H₀); Lesson 312 — Hypothesis Testing as a Legal Analogy
default_value: What to return when there's no previous/next row (defaults to NULL); Lesson 1023 — Introduction to Window Functions: LAG and LEAD Lesson 1024 — LAG Function: Accessing Previous Row Values
Defect Rate: quantifies quality problems, often measured in defects per million opportunities (DPMO) in Six Sigma environments.; Lesson 1636 — Manufacturing Metrics: OEE, Yield, and Cycle Time
Defense in depth: means building multiple independent security layers so that if one fails, others still protect you.; Lesson 1109 — Input Validation and Defense in Depth
Defensive Deletes: If your pipeline needs to "delete and reload," make deletions specific.; Lesson 1848 — Designing Idempotent Operations
Define a balance criterion: (e.; Lesson 1492 — Rerandomization and Practical Implementation
Define active users: Who counts as "engaged"?; Lesson 1693 — Defining User Engagement
Define an error metric: (like Mean Squared Error or Mean Absolute Error); Lesson 772 — Holt-Winters Parameter Optimization
Define constraints: minimum spend per channel (contractual obligations), maximum spend (capacity limits), total budget; Lesson 1742 — Budget Optimization Using MMM
Define flexible step completion: – count a step as completed the first time it occurs, regardless of order; Lesson 1683 — Multi-Path and Non-Linear Funnels
Define targets: for LTV:CAC ratio (typically 3:1 minimum); Lesson 1759 — Optimizing ROAS, CAC, and Payback Together
Define terms once: When you must use jargon, explain it immediately; Lesson 1967 — Writing Clear and Concise Analysis Sections
Define the event: Product failure or malfunction; Lesson 837 — Product Warranty and Failure Analysis
Define your population: clearly (e.; Lesson 234 — Simple Random Sampling
Defined: using intuitive methods (e.; Lesson 1868 — Great Expectations Framework
Defined scope: (timeframe, population, geography); Lesson 2093 — Translating Business Questions into Analytical Questions
Defining success criteria: What does "good enough" look like?; Lesson 2085 — Stage 1: Problem Definition and Scoping
Definition drift: Changing what "active" means breaks trend comparisons; Lesson 1694 — Daily Active Users (DAU) and Monthly Active Users (MAU)
Deflating r: Conversely, an outlier that doesn't follow the general pattern (like an extremely tall person who weighs very little due to illness) can weaken an otherwise strong correlation by pulling the line away from the main cluster.; Lesson 481 — Outliers and Their Impact on r
Degrading trends: Newer cohorts drop off faster.; Lesson 1650 — Comparing Cohorts Over Time
Degree centrality: How many connections a node has (the "popular" nodes); Lesson 1320 — Network Metrics and Visual Analysis
Degrees of freedom: represent the amount of independent information you have.; Lesson 352 — The t-Distribution and Degrees of Freedom Lesson 355 — Finding Critical Values and P-Values Lesson 362 — Welch's t-Test for Unequal Variances Lesson 501 — T-Test for Pearson Correlation Significance
degrees of freedom (df): typically *n - 1* for a mean (where *n* is the sample size).; Lesson 268 — Critical Values and the t-Distribution Lesson 270 — Degrees of Freedom in t-Intervals
Delayed feedback: Subscription renewals happen after 12 months; Lesson 1517 — Surrogate Metrics: When Direct Measurement is Impractical
Delayed response: You discover issues after substantial damage occurs; Lesson 1617 — The Danger of Lagging-Only Metrics Lesson 1739 — Adstock and Carryover Effects
DELETE: With cascading rules, the database must find and handle all child records; Lesson 1060 — Trade-offs: Performance vs Integrity
DELETE protection: You cannot delete a parent record if children still reference it (unless you specify cascading behavior); Lesson 1052 — Foreign Key Constraints
Delete ruthlessly: If code isn't in production and hasn't been touched in months, remove it.; Lesson 2135 — Dead Experimental Code and Feature Sprawl
Deleting: records; Lesson 844 — What is SQL? Lesson 1124 — Insert, Update, Delete, and Bulk Operations
Deletion Anomalies: Lesson 1062 — Data Anomalies: Insert, Update, Delete
Delimiter: The character separating values (usually a comma, but sometimes tabs, semicolons, or pipes); Lesson 1125 — CSV Files: Structure and Common Issues
Deliver faster: You can answer the question in hours instead of weeks; Lesson 2110 — The Minimum Viable Analysis (MVA)
Deliver incrementally: Instead of one massive final report, provide preliminary findings early.; Lesson 2099 — Aligning with Business Timelines and Decision Points
Demand mechanism: Can you explain *how* improving this metric causes the outcome?; Lesson 1615 — Correlation Without Causation
Demographic parity: Do groups receive positive outcomes at similar rates?; Lesson 1884 — Detecting Bias in Your Data
Demographic Parity (Statistical Parity): Lesson 1887 — Defining Fairness in Data Science
Demographic statistics: Regions with different population counts; Lesson 43 — Weighted Mean and Its Applications
Demographics: Location, device type, referral source; Lesson 1689 — Multivariate Testing and Personalization Lesson 1701 — What is Customer Segmentation?
denominator: (SE) scales that difference by how much variability you'd expect due to random sampling.; Lesson 353 — Calculating the t-Statistic Lesson 478 — The Formula for Pearson's r
DENSE_RANK(): produces 1, 2, 2, 3 (no gap); Lesson 1009 — DENSE_RANK(): Ranking Without Gaps
density: at each point—how likely that region is.; Lesson 172 — Probability Density Function for Normal Distribution Lesson 1267 — Histograms and Distribution Plots
Department A (competitive): Lesson 1428 — The Simpson's Paradox Example
Department B (less competitive): Lesson 1428 — The Simpson's Paradox Example
Department choice: is the confounder.; Lesson 1428 — The Simpson's Paradox Example
Dependencies: Input files, datasets, or code it needs; Lesson 1874 — DVC Pipelines and Stages Lesson 1988 — Embedding Data Lineage and Metadata Lesson 2100 — Documenting Assumptions and Open Questions
Dependency hell: occurs when your project requires specific package versions, but those requirements conflict with each other or with what's installed in different environments.; Lesson 2048 — The Dependency Hell Problem
Dependency isolation: is critical in data science because:; Lesson 2039 — Virtual Environments: Concept and Benefits
dependent: (not independent), we use:; Lesson 88 — General Multiplication Rule Lesson 104 — Dependent Events and Joint Probability Lesson 427 — Interpreting Chi-Squared Test Results Lesson 704 — What Makes Time Series Data Different?
Dependent (paired) samples: have a natural one-to-one correspondence between observations.; Lesson 360 — Independent vs. Dependent Samples
Dependent samples: require a **paired t-test**, which analyzes the *differences* within each pair, effectively reducing the problem to a one-sample test on those differences; Lesson 360 — Independent vs. Dependent Samples
Dependent variable: your time series values; Lesson 738 — Linear Detrending
depends on: how much water is present.; Lesson 465 — Interaction Effects Lesson 648 — What are Interaction Terms?
Deployment: Lesson 9 — The Data Science Lifecycle Overview Lesson 15 — Deployment, Monitoring, and Iteration
Deployment constraints: (pure Python libraries install easier in restricted environments); Lesson 1087 — Database Drivers and Connection Libraries
Deployment instructions: specifying environment requirements; Lesson 2091 — Stage 7: Communication and Handoff
Deployment lag: By the time you retrain and deploy, the distribution may have shifted again; Lesson 2128 — Data Distribution Shifts Frequently
Depth: means creating many hierarchical levels.; Lesson 1623 — Depth vs Breadth in Metric Trees
Descendants: of both pathways (e.; Lesson 1432 — Colliders and Bad Controls
description: .; Lesson 817 — Comparing Multiple Survival Curves Lesson 1163 — Metadata and Data Dictionaries Lesson 1910 — Data Protection Impact Assessments (DPIAs) Lesson 2064 — Creating Data Dictionaries
Descriptive Axis Labels: Lesson 1960 — Annotation and Labeling Best Practices
Descriptive problems: answer "What happened?"; Lesson 2096 — Distinguishing Descriptive, Diagnostic, and Prescriptive Problems
Descriptive Statistics: Calculate summary metrics for each segment—average purchase frequency, mean customer lifetime value, median recency, typical basket size.; Lesson 1709 — Segment Profiling and Interpretation
Design an Experiment: Lesson 25 — The Scientific Method in Data Science
Desired power: (typically 0.; Lesson 388 — Effect Size in Sample Size Planning
Desired significance level (α): Typically 0.; Lesson 505 — Sample Size and Power for Correlation Tests
Destination loaders: Write processed data to warehouses, lakes, or operational systems; Lesson 1822 — What is a Data Pipeline?
Detect interactions: How one variable's effect depends on another; Lesson 1190 — Introduction to Multivariate Analysis
Detecting multiple anomalies simultaneously: without the masking effect where one outlier hides another; Lesson 1405 — What is Seasonal Hybrid ESD?
Detection delay: measures the time lag between when the change actually occurs and when your algorithm flags it.; Lesson 1418 — Evaluating Change-Point Detection Methods
Detection lag: grows exponentially—the longer it takes to notice, the more data and decisions are affected; Lesson 2136 — Monitoring Gaps and Silent Failures
Determine proportions: Calculate what percentage of the total population each stratum represents; Lesson 236 — Stratified Sampling
Deterministic with ORDER BY: The same query produces the same numbering; Lesson 1007 — ROW_NUMBER(): Assigning Unique Row Numbers
Detrend the Series: Subtract the trend from the original data: `Y - T`.; Lesson 744 — Classical Decomposition Methods
detrending: when:; Lesson 734 — Why Differencing and Detrending Matter Lesson 740 — Choosing Between Differencing and Detrending
Deuteranopia/Deuteranomaly: (green-weak): difficulty distinguishing red from green; Lesson 1248 — Color Blindness and Color Palette Design
Development branch (`develop`): The integration point for completed experiments that passed initial validation.; Lesson 2035 — Branching Strategies for Experiments
Deviance: is the generalization of RSS for all GLMs.; Lesson 697 — Deviance: A Measure of Model Fit
Deviation from mean: `value - AVG(value) OVER (PARTITION BY category)`; Lesson 1019 — Comparing Values to Window Aggregates
Device: Each phone/computer gets assigned; Lesson 1481 — Unit of Randomization
Device type: Mobile, tablet, desktop; Lesson 1682 — Segmenting Funnels by User Attributes
DFBETAS: answers exactly that question.; Lesson 577 — DFBETAS: Influence on Individual Coefficients
DFFITS: (pronounced "dee-fits") zooms in on a more specific question: *"How much does the predicted value for observation i change when we remove observation i from the dataset?"*; Lesson 576 — DFFITS: Influence on Fitted Values Lesson 589 — Deciding Whether to Remove Outliers
Diagnostic outcomes: A test result cannot be both "positive" and "negative" at the same time; Lesson 81 — Mutually Exclusive Events
Diagnostic problems: answer "Why did it happen?"; Lesson 2096 — Distinguishing Descriptive, Diagnostic, and Prescriptive Problems
Diagonal patterns: Gradual retention decline is normal; sudden cliff-drops warrant investigation; Lesson 1649 — Visualizing Cohort Data with Heatmaps
DiD estimate: Treatment effect = Treatment change - Control change; Lesson 1452 — The Difference-in-Differences Setup
DiD estimator: itself.; Lesson 1454 — Calculating the DiD Estimator
Die fairness: You roll a die 600 times.; Lesson 414 — Introduction to Chi-Squared Goodness of Fit Test
difference: for each pair: subtract one measurement from the other.; Lesson 371 — Calculating Paired Differences Lesson 636 — The Reference Category Lesson 698 — Null and Residual Deviance
Difference from average: `sale_amount - AVG(sale_amount) OVER (PARTITION BY region)`; Lesson 1019 — Comparing Values to Window Aggregates
Differences: between two sample means (comparing groups); Lesson 225 — CLT for Sums and Other Statistics
Differences are subtle: small variations must be detectable; Lesson 1233 — Position as the Most Effective Channel
Differences in distribution shape: between groups, not just center/spread; Lesson 1286 — Violin Plots and Distribution Shape
differencing: or **detrending** when:; Lesson 734 — Why Differencing and Detrending Matter Lesson 740 — Choosing Between Differencing and Detrending
differential privacy: (which adds noise to protect individuals), MPC provides exact computation with zero information leakage about inputs—assuming parties don't collude.; Lesson 1903 — Secure Multi-Party Computation Lesson 1911 — GDPR Compliance for Data Scientists
Difficult to change: Update the same value in multiple places; Lesson 2072 — Configuration Files vs Hard-Coded Values
Difficulty interpreting effects: You can't confidently say "holding all else constant, X increases Y by.; Lesson 580 — What is Multicollinearity?
dimension tables: (containing descriptive attributes like customer names, product details, or dates).; Lesson 956 — Star Schema Joins Lesson 1808 — Star Schema and Fact Tables Lesson 1809 — Dimension Tables and Slowly Changing Dimensions
Dimensionality reduction: solves this by mathematically projecting your high-dimensional data into a lower-dimensional space while preserving as much important structure as possible.; Lesson 1196 — Dimensionality Reduction for Visualization
Dimensions matter: Journals often require specific figure sizes (e.; Lesson 1369 — Publication-Ready Plot Styling
diminishing returns: captured mathematically with **saturation curves**.; Lesson 1740 — Saturation Curves and Diminishing Returns Lesson 2116 — Diminishing Returns and the 80/20 Rule
direct: relationship between two variables, you must "control for" or "hold constant" potential confounders.; Lesson 509 — Confounding Variables and Control Lesson 1712 — Common Channel Categories
Direct attention: Add a single annotation or callout box pointing to what matters: "Sales dropped 30% here."; Lesson 1958 — Simplifying Visual Complexity
Direct download: Obtaining existing datasets or files; Lesson 11 — Data Collection and Acquisition
Direct identifier removal: means stripping obvious PII like names, social security numbers, email addresses, and phone numbers.; Lesson 1895 — Data Anonymization Basics
Direct interpretation: Roughly tells you the "typical distance" values fall from the mean; Lesson 49 — Standard Deviation: Interpretable Spread
Direct probability statements: "There's a 92% chance treatment A has a higher mean than control"; Lesson 1570 — Comparing Two Means: Bayesian Approach
Direct traffic: (users typing your URL directly); Lesson 1711 — What Are Acquisition Channels?
Directed: Tasks flow in specific directions (task A → task B); Lesson 1833 — Introduction to Apache Airflow
Directed Acyclic Graph: is a mathematical structure where nodes (tasks) are connected by directed edges (dependencies) with one ironclad rule: **no cycles allowed**.; Lesson 1842 — Directed Acyclic Graphs (DAGs)
Directed Acyclic Graph (DAG): is a visual diagram where:; Lesson 1468 — Introduction to Directed Acyclic Graphs (DAGs)
Directed edges: (arrows) represent causal relationships pointing from cause to effect; Lesson 1468 — Introduction to Directed Acyclic Graphs (DAGs)
Directed graphs: show asymmetrical relationships.; Lesson 1316 — Introduction to Network Graphs and Graph Theory Basics
Direction: Do points trend upward (positive) or downward (negative)?; Lesson 480 — Scatterplots and Visual Assessment Lesson 2122 — When Uncertainty Is Acceptable
Direction/Angle: (e.; Lesson 1232 — Perceptual Accuracy Hierarchy
Directional (one-tailed): "The new landing page will *increase* sign-ups by 5%"; Lesson 1479 — Formulating Hypotheses
Directional alignment: When the surrogate goes up, the business metric should too (not down!); Lesson 1518 — The Relationship Between Surrogate and Business Metrics
Dirty reads: Reading uncommitted data that gets rolled back; Lesson 1116 — Transaction Isolation and Concurrency
Disability status: Lesson 1888 — Protected Classes and Sensitive Attributes
Disagreement requires judgment: If tests conflict, lean on your visual evidence and domain knowledge.; Lesson 718 — Interpreting Stationarity Test Results
Disclose conflicts: openly to stakeholders; Lesson 35 — Conflicts of Interest and Independence
Disclose failed model iterations: and why they didn't work; Lesson 1929 — Avoiding Cherry-Picking Results
Disclosure of Purpose: Lesson 1913 — Elements of Valid Consent
Discounted LTV = $248.68: (vs.; Lesson 1665 — Discounted LTV and Present Value
Discovery-driven iteration: Your analysis reveals unexpected patterns, missing data, or invalid assumptions.; Lesson 2092 — Iteration and Feedback Loops in Practice
Discrete (apples): You have exactly 5 apples or 6 apples.; Lesson 18 — Numerical Variables: Discrete and Continuous
Discrete data: assigning categories to distinct colors or positions; Lesson 1344 — Scales and Coordinate Systems
Discrete numerical data: consists of whole numbers that represent *counts* of distinct items.; Lesson 18 — Numerical Variables: Discrete and Continuous
Discrete or categorical variables: (e.; Lesson 1446 — Exact Matching
Discriminatory Application: Your fair hiring model might be selectively applied only to certain demographics while others bypass it entirely.; Lesson 1920 — Anticipating Misuse of Data Products
Discussion is centralized: Questions, explanations, and decisions live alongside the code; Lesson 2022 — Understanding Pull Requests
dispersion parameter: controls the spread or variability.; Lesson 665 — Canonical Form of Exponential Family Distributions Lesson 669 — The Dispersion Parameter φ
Display issues: Characters appearing as boxes or garbled text, usually caused by an encoding mismatch.; Lesson 1139 — Dealing with Special Characters and Unicode
Display outputs inline: Charts, tables, and statistical results appear right below the code that generated them; Lesson 1982 — Literate Programming with Notebooks
Distance from max: `MAX(value) OVER (PARTITION BY category) - value`; Lesson 1019 — Comparing Values to Window Aggregates
Distinguish stakeholder types: The person requesting may not be the end user.; Lesson 2102 — Understanding Stakeholder Goals and Constraints
Distracts: your audience's attention from the actual data; Lesson 1963 — Removing Chartjunk
distribute: work across many machines.; Lesson 1764 — The Big Data Technology Landscape Lesson 1768 — Data Parallelism Fundamentals
Distributed: Your data is partitioned across multiple machines.; Lesson 1777 — RDDs: Resilient Distributed Datasets Fundamentals
Distributed Scheduler: Lesson 1795 — Distributed Schedulers and Client Setup
Distributing heterogeneous jobs: to available workers; Lesson 1769 — Task Parallelism and Work Distribution
distribution: centered around 0.; Lesson 253 — Sampling Distribution of the Sample Proportion Lesson 1172 — What is Univariate Analysis? Lesson 1284 — Pair Plots for Multivariate Exploration
Distribution Characteristics: Note shape (skewed, bimodal), spread, and central tendency.; Lesson 1180 — Documenting Univariate Findings
Distribution Checks: Compare your data's distribution against historical baselines.; Lesson 1157 — Statistical Anomaly Detection in QA
Distribution comparisons: ensuring all histograms use the same bin ranges; Lesson 1276 — Sharing Axes Between Subplots
Distribution plots: Show how data is spread (histograms, KDE plots); Lesson 1281 — Introduction to Seaborn's Statistical Plots
Distribution shape: describes the overall form or silhouette of your data when visualized—whether values cluster symmetrically in the middle, bunch up on one side, or spread out evenly.; Lesson 63 — Understanding Distribution Shape Lesson 1267 — Histograms and Distribution Plots
Distributions: Use `geom_histogram` or `geom_boxplot`; Lesson 1342 — Geometric Objects (geoms) Lesson 1867 — Data Profiling and Monitoring Lesson 2087 — Stage 3: Exploratory Data Analysis
Diversification: Relying on a single channel is risky; tracking reveals over-dependence; Lesson 1711 — What Are Acquisition Channels? Lesson 1716 — Channel Mix and Portfolio Thinking
Divide: by the total number of observations; Lesson 45 — Central Tendency for Grouped Data Lesson 237 — Cluster Sampling Lesson 744 — Classical Decomposition Methods
Do missingness patterns correlate: with other variables?; Lesson 1207 — Missing Data Assessment and Strategy
Docker container: runs a lightweight, isolated instance of a complete operating system environment.; Lesson 2045 — Docker for Complete Environment Reproducibility
Document active experiments: Maintain a simple tracking file listing which experiments are ongoing, which succeeded, and which are archived.; Lesson 2135 — Dead Experimental Code and Feature Sprawl
Document and mitigate: Create threat models; build monitoring, rate limits, access controls, or kill switches; Lesson 1924 — Red Team Thinking for Data Scientists
Document assumptions: that came from domain experts or literature; Lesson 1972 — Citations and References in Data Science Reports Lesson 2124 — Insufficient or Low-Quality Data
Document changes: so teams understand why the structure evolved.; Lesson 1626 — Maintaining and Evolving Metric Trees
Document data sources: In your README, specify where data lives and how to access it; Lesson 2070 — Separating Data from Code
Document everything: Record every decision, assumption, and step; Lesson 30 — The Reproducibility Crisis and Solutions Lesson 250 — Strategies for Bias Detection and Mitigation Lesson 1679 — Defining Funnel Steps and Events Lesson 2046 — Best Practices for Environment Management in Teams
Document original purpose: explicitly in your consent forms and data governance policies; Lesson 1915 — Secondary Use and Scope Creep
Document sensitivity: Report when conclusions are stable or when they depend on prior choice; Lesson 1572 — Sensitivity Analysis and Prior Robustness
Document trade-offs: Where do optimizations create tension?; Lesson 1625 — Cross-Functional Metric Dependencies
Document your assumptions: (why did you draw each arrow?; Lesson 1469 — Building a Simple Causal DAG
Document your decision: keep or remove?; Lesson 1209 — Outlier Detection and Investigation Lesson 1909 — Right to Erasure and Data Retention Policies
Document your methods: before seeing results (prevents post-hoc justification); Lesson 35 — Conflicts of Interest and Independence
Documentation: Your validation rules become living documentation; Lesson 1158 — Automated Validation Frameworks Lesson 1925 — Mitigation Strategies and Responsible Disclosure Lesson 2082 — Choosing a License for Data Science Projects
Documentation debt: Skipping README updates or data dictionaries; Lesson 2131 — What is Technical Debt in Data Science?
Documentation Licenses: (README, tutorials, papers):; Lesson 2082 — Choosing a License for Data Science Projects
Documentation Standards: , and **Ethical Principles**.; Lesson 33 — Transparency and Explainability
Documented: automatically in human-readable format; Lesson 1868 — Great Expectations Framework Lesson 1912 — What is Informed Consent in Data Science?
Dodge: them (stand side-by-side); Lesson 1353 — Position Adjustments: Dodge, Stack, and Jitter
Domain context: is the background knowledge about the field you're analyzing: its terminology, business processes, constraints, typical patterns, and unwritten rules.; Lesson 1168 — Understanding Domain Context
Domain Expertise: Lesson 1 — Defining Data Science Lesson 386 — Effect Size Interpretation Guidelines Lesson 1429 — Identifying Confounders in Practice Lesson 1534 — The Prior Distribution Lesson 1883 — Protected Classes and Proxy Variables
Domain knowledge: (understanding the industry, business context, or field you're working in) is crucial.; Lesson 8 — Misconceptions About Data Science Lesson 75 — Domain-Specific Outlier Rules Lesson 193 — Choosing Between Distributions in Practice Lesson 537 — When R-Squared is Not Enough Lesson 585 — Remedies: Variable Selection Lesson 1201 — Domain Knowledge as a Hypothesis Source
Domain knowledge suggests: "Our email campaign goes out Monday evenings—let's check if opens predict Tuesday purchases"; Lesson 1201 — Domain Knowledge as a Hypothesis Source
Domain rules: Values outside physically possible ranges (negative age, 500% growth rate); Lesson 1209 — Outlier Detection and Investigation
Domain validity: Your model might fit your training data beautifully (high R-squared) but make nonsensical predictions outside the observed range.; Lesson 537 — When R-Squared is Not Enough
Domain vocabulary: and key metrics (e.; Lesson 2145 — Transitioning Between Industries and Domains
Domain-specific rules: when you have expert knowledge about what constitutes "normal"; Lesson 1411 — Applications and Limitations
Don't: This is still vulnerable to SQL injection if the list contains user input.; Lesson 1108 — Handling IN Clauses Safely
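
One common way to avoid the injection risk this entry warns about is to generate one placeholder per list item and pass the values separately. The sketch below assumes Python's built-in `sqlite3` driver and a hypothetical `items` table; other drivers use different placeholder styles (for example `%s`), so treat it as an illustration and not the lesson's exact code.

```python
import sqlite3


def fetch_by_ids(conn, ids):
    """Return rows from the hypothetical `items` table whose id is in `ids`."""
    ids = list(ids)
    if not ids:
        return []  # many SQL dialects reject an empty "IN ()"
    # Only the "?" markers are interpolated into the SQL text, never the values.
    placeholders = ", ".join("?" for _ in ids)
    query = f"SELECT id, name FROM items WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(query, ids).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

rows = fetch_by_ids(conn, [1, 3])
# A hostile string is bound as a plain value, so it matches nothing.
hostile = fetch_by_ids(conn, ["1) OR 1=1 --"])
```

Because the values travel as bound parameters, the hostile string is compared against `id` as data and is never parsed as SQL.
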
Don't do this: Here's why:; Lesson 639 — The Dummy Variable Trap
Don't skip the diagnostics: Check histograms, Q-Q plots, and variance equality tests *before* running your test.; Lesson 368 — Common Pitfalls and Best Practices
Don't use SUM for: Counting rows (use `COUNT`), averaging (use `AVG`), or non-numeric data (it only works with numbers).; Lesson 883 — SUM: Calculating Totals
Double funnel: Variance is small in the middle but large at both extremes; Lesson 559 — Detecting Heteroscedasticity (Non-Constant Variance)
Double-counting in partitions: When using the law of total probability, make sure your conditioning events are mutually exclusive and collectively exhaustive—no overlap, no gaps.; Lesson 100 — Common Conditional Probability Mistakes
Doubling your sample size: doesn't cut the standard error in half; it reduces it by a factor of √2 ≈ 1.41.; Lesson 223 — Standard Error and the CLT
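
A quick numeric check of this scaling, assuming the usual standard error of the mean, σ/√n. The values of `sigma` and `n` are made up for illustration.

```python
import math


def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


se_100 = standard_error(10.0, 100)
se_200 = standard_error(10.0, 200)  # doubled sample size
se_400 = standard_error(10.0, 400)  # quadrupled sample size

ratio_double = se_100 / se_200     # equals sqrt(2)
ratio_quadruple = se_100 / se_400  # equals 2
```

Halving the standard error therefore takes four times the sample, not twice.
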
Download buttons: for saving charts as static images; Lesson 1300 — Creating Basic Interactive Charts with Plotly Express
Downside: Wastes compute and time on unchanged data.; Lesson 1828 — Incremental vs Full Load Strategies
Downstream: `train_model` and `generate_report` (direct and transitive); Lesson 1841 — Upstream and Downstream Dependencies
Downstream dependencies: are the tasks that rely on *your* task's output.; Lesson 1841 — Upstream and Downstream Dependencies
Downward (negative) trend: Values generally decrease (e.; Lesson 706 — Trend: Long-Term Direction
Draft pull requests: are a special PR state that signals "this is work-in-progress: feedback welcome, but don't merge yet."; Lesson 2029 — Draft Pull Requests and WIP Workflows
Draw a random sample: of size n from that population; Lesson 222 — Visualizing the CLT with Simulations
Draw arrows: from causes to effects; Lesson 1469 — Building a Simple Causal DAG
Draw Conclusions: Lesson 25 — The Scientific Method in Data Science
Drop: If <5% missing and MCAR; Lesson 1207 — Missing Data Assessment and Strategy Lesson 2015 — Interactive Rebase for History Cleanup
Drop non-significant predictors: if they don't contribute beyond noise.; Lesson 703 — Sequential Model Building Strategy
Drug A: | 8 | 2 |; Lesson 433 — Conducting Fisher's Exact Test
Drug B: | 1 | 9 |; Lesson 433 — Conducting Fisher's Exact Test
Dry runs: Execute the DAG structure logic (declaring dependencies, checking conditions) without running the actual data processing.; Lesson 1846 — Testing and Validating Dependency Graphs
Dtype and converter fallbacks: Lesson 1141 — Recovering from Corrupted or Partially Broken Data
Dual use: refers to technology, methods, or data that can be applied for both beneficial and harmful purposes.; Lesson 1919 — Defining Dual Use in Data Science Lesson 1920 — Anticipating Misuse of Data Products Lesson 1931 — When to Push Back on Requests
Dummy variable encoding: creates separate binary (0/1) columns for each category.; Lesson 635 — Dummy Variable Encoding Basics
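
A minimal pure-Python sketch of the idea, with a hypothetical helper `dummy_encode` and made-up category values. The `drop_first` option is an addition here: it omits one reference category, which is the usual way to avoid the dummy variable trap when the columns feed a regression with an intercept.

```python
def dummy_encode(values, drop_first=False):
    """Turn a list of category labels into rows of 0/1 indicator columns."""
    categories = sorted(set(values))
    if drop_first:
        categories = categories[1:]  # first category becomes the baseline
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


colors = ["red", "blue", "green", "blue"]
full = dummy_encode(colors)
reduced = dummy_encode(colors, drop_first=True)
```

In `reduced`, a row of all zeros means the baseline category (`blue` in this example).
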
Dunn's test: follows Kruskal-Wallis to identify which specific group pairs are significantly different.; Lesson 473 — Post-Hoc Tests After Kruskal-Wallis: Dunn's Test
Dunnett's test: is specialized for situations where you have one control or reference group and several experimental treatments.; Lesson 460 — Dunnett's Test for Control Comparisons
Duplicate rows: from the join can inflate your aggregates; Lesson 933 — Aggregating with LEFT JOINs
Durability: Once committed, changes persist even if the system crashes; Lesson 1110 — What Are Database Transactions?
Duration: How long the test will run; Lesson 1485 — Documentation and Pre-Registration
During deep dives: (testing relationships, checking distributions): exploration; Lesson 1216 — Choosing the Right Purpose
During off-peak hours: (to avoid impacting production systems); Lesson 1831 — What is Job Scheduling?
During Training: Models need an objective function: a mathematical definition of "better."; Lesson 2130 — No Clear Success Metric or Feedback Loop
During Validation: Even if you train something, how do you know it works?; Lesson 2130 — No Clear Success Metric or Feedback Loop
Dynamic dependencies: let your pipeline decide its own structure while running.; Lesson 1844 — Dynamic Dependencies
Dynamic filtering: Filter based on calculated values (like averages, maximums) rather than hardcoded numbers; Lesson 959 — Introduction to Subqueries in WHERE

E

E-commerce: Homepage → Product Page → Add to Cart → Checkout → Purchase; Lesson 1678 — What is Funnel Analysis? Lesson 1686 — Defining Conversions and Conversion Rate Lesson 1694 — Daily Active Users (DAU) and Monthly Active Users (MAU)
E(X) = 1/p: Lesson 151 — Expected Value and Variance for Common Distributions
E(X) = np: Lesson 151 — Expected Value and Variance for Common Distributions
E(X) = p: Lesson 151 — Expected Value and Variance for Common Distributions
E(X) = λ: (expected value); Lesson 141 — Mean and Variance of Poisson Distribution Lesson 151 — Expected Value and Variance for Common Distributions
E(Y) = b'(θ): the first derivative of the cumulant function gives you the expected value; Lesson 667 — Mean and Variance in the Exponential Family
E[X] = r/p: Lesson 136 — Expectation and Variance of the Negative Binomial
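
The expected-value formulas above can be checked numerically by summing k·P(X = k) over the support. The entries do not name their distributions, so the sketch assumes the standard pairings: `p` for Bernoulli, `np` for binomial, `1/p` for the geometric (counting trials up to and including the first success), `λ` for Poisson, and `r/p` for the negative binomial (counting trials up to the r-th success). Parameter values are arbitrary, and the infinite supports are truncated where the remaining tail is negligible.

```python
from math import comb, exp, factorial


def mean_from_pmf(pmf, support):
    """Expected value as the probability-weighted sum over the support."""
    return sum(k * pmf(k) for k in support)


n, p, lam, r = 10, 0.3, 4.0, 3

bernoulli = mean_from_pmf(lambda k: p if k == 1 else 1 - p, [0, 1])
binomial = mean_from_pmf(
    lambda k: comb(n, k) * p**k * (1 - p) ** (n - k), range(n + 1)
)
geometric = mean_from_pmf(lambda k: (1 - p) ** (k - 1) * p, range(1, 2000))
poisson = mean_from_pmf(lambda k: exp(-lam) * lam**k / factorial(k), range(100))
neg_binomial = mean_from_pmf(
    lambda k: comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r), range(r, 2000)
)
```
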
Early detection: Catch bad data before it corrupts downstream analysis; Lesson 1158 — Automated Validation Frameworks
Early feedback on approach: "Before I process these 50 datasets, does this transformation logic look right?"; Lesson 2029 — Draft Pull Requests and WIP Workflows
Early in your workflow: (profiling, hypothesis generation, outlier detection): exploration; Lesson 1216 — Choosing the Right Purpose
Early quality detection: Comparing survival curves (using the log-rank test) between production batches, suppliers, or manufacturing plants reveals if one group has significantly higher failure rates.; Lesson 837 — Product Warranty and Failure Analysis
Early research: Studies showed coffee drinkers had higher rates of heart disease.; Lesson 1426 — Real-World Examples: Correlation vs Causation
Early-life failures: (infant mortality).; Lesson 189 — Fitting Weibull Models to Lifetime Data
Early-stage customer discovery: What sparks initial interest?; Lesson 1720 — First-Touch Attribution Model
Easier collaboration: When your team knows data will always arrive in tidy format, everyone can use the same templates, functions, and workflows without custom adaptations.; Lesson 1149 — Benefits of Tidy Data for Downstream Work
Easier dimension updates: Changing a category name happens in one place; Lesson 1810 — Snowflake Schema and Normalization Trade-offs
Easier Maintenance: Smaller, focused tables are simpler to understand, query, and modify than massive, repetitive tables.; Lesson 1061 — Introduction to Normalization
Easiest wins: Where do top-performing segments reveal best practices?; Lesson 1685 — Actionable Insights from Funnel Analysis
Easy to interpret: it's in the same units as your data.; Lesson 801 — Forecast Evaluation Metrics
Easy to Measure: Lesson 1598 — Characteristics of Lagging Indicators
Economic business cycles: are the classic example.; Lesson 708 — Cyclical Patterns: Non-Fixed Fluctuations
Economic growth: might correlate with **increased coffee consumption**, not because coffee drives the economy, but because both rise with population and urbanization.; Lesson 1422 — Spurious Correlations
Edge cases: Handling tests that never stop, or stopping rules that trigger unexpectedly; Lesson 1515 — Trade-offs: Sample Size, Speed, and Complexity Lesson 1949 — Anticipating Questions: Building in Appendices Lesson 2024 — Code Review Best Practices Lesson 2129 — Edge Cases Dominate the Problem
Edge Color/Style: Differentiate relationship types with color or dashed/solid lines (friend vs.; Lesson 1319 — Styling Network Visualizations
Edge Weight: Make thicker lines represent stronger relationships (more messages, higher correlation).; Lesson 1319 — Styling Network Visualizations
Edit the section: to reflect the desired final state; Lesson 2011 — Resolving Merge Conflicts
Education: Does the average test score in your classroom differ from the district standard of 75?; Lesson 351 — When to Use a One-Sample t-Test
Education and income: Does higher education lead to higher income, or do wealthier families afford better education?; Lesson 1424 — Reverse Causality
Educational records: reflecting unequal access to opportunities; Lesson 1881 — Historical and Societal Bias
Effect: Compresses large values more than small ones, pulling in the right tail of skewed distributions.; Lesson 592 — Common Transformations: Log, Square Root, Reciprocal
Effect size: How different the true parameter is from the null hypothesis value; Lesson 335 — Calculating Type II Error Probability (Beta) Lesson 341 — Effect Size and Power Lesson 343 — Calculating Power for Common Tests Lesson 344 — Power Analysis in Study Design Lesson 384 — What is Effect Size? Lesson 405 — Sample Size and Power for Proportion Tests Lesson 413 — Effect Size and Practical Significance Lesson 446 — Power and Sample Size for ANOVA (+3 more)
Effect Size (δ): The minimum detectable difference you care about—determined by your MDE (Minimum Detectable Effect).; Lesson 1496 — The Four Parameters of Sample Size Calculation
Effective sample size (ESS): Estimates how many independent samples you truly have after accounting for autocorrelation; Lesson 1592 — Burn-in, Thinning, and Convergence Diagnostics
Ego-network analysis: Model and measure the spillover explicitly; Lesson 1527 — Ignoring Network Effects
elbow method: on within-cluster variance or **silhouette scores** to quantify segment quality at each cut point.; Lesson 1706 — Hierarchical Clustering for Segmentation Lesson 1708 — Choosing the Number of Segments
Electricity demand: might show daily (24-hour) *and* weekly (168-hour) patterns; Lesson 746 — Choosing Seasonal Period
Elevation: How high above (or below) the horizontal plane your camera sits, measured in degrees.; Lesson 1326 — Viewing Angles and Projection Types
Eliminates bottlenecks: Shared resources (like shared memory or a central database) become traffic jams as you scale.; Lesson 1771 — Shared-Nothing Architecture
Eliminates sign problems: Squaring makes all errors positive, so they can't cancel each other out.; Lesson 517 — The Least Squares Criterion
ELT flips this order: Extract, Load, *then* Transform.; Lesson 1816 — What is ELT? Extract, Load, Transform Explained
Email: Campaigns sent to your owned email list; Lesson 1712 — Common Channel Categories
Email campaigns: (newsletters, promotional sends); Lesson 1711 — What Are Acquisition Channels?
Email list size: (without open or click rates); Lesson 1612 — What Are Vanity Metrics?
embarrassingly parallel: (easy to split across machines), while others require extensive data shuffling or coordination.; Lesson 1786 — Data Processing Patterns Best Suited for Spark Lesson 1790 — What is Dask and When to Use It
Emotional connection: means helping your audience *feel* the human stakes behind the numbers.; Lesson 1941 — Emotional Connection Without Manipulation
Emphasizes larger errors: A point that's 4 units away contributes 16 to the sum, while one that's 2 units away contributes only 4.; Lesson 517 — The Least Squares Criterion
Emphasizes smaller values: – the square root compresses large values and stretches small ones, making patterns clearer; Lesson 560 — Scale-Location Plot (Spread-Location Plot)
Emphasizing trends over time: or ordered categories; Lesson 1288 — Point Plots for Trend Visualization
Empirical Rule: is your quick mental map for normal distributions.; Lesson 171 — The 68-95-99.7 Rule (Empirical Rule)
Employee-manager relationships: An `employees` table where each employee has a `manager_id` pointing to another employee in the same table; Lesson 945 — Introduction to Self-Joins
Empty result sets: `AVG()` on zero rows returns NULL, not zero; Lesson 884 — AVG: Computing Averages
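
A small demonstration of this behavior using Python's built-in `sqlite3` module and a hypothetical empty `orders` table. `COALESCE` is shown as one possible way to substitute a default; whether zero is a sensible default depends on the analysis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (amount REAL)")  # no rows inserted

# AVG over zero rows yields SQL NULL, which Python receives as None.
avg_empty = conn.execute("SELECT AVG(amount) FROM orders").fetchone()[0]

# COALESCE replaces the NULL with an explicit fallback value.
avg_with_default = conn.execute(
    "SELECT COALESCE(AVG(amount), 0) FROM orders"
).fetchone()[0]
```
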
Enable comparison: You can compare typical values across groups ("Team A averages 15 sales per week vs.; Lesson 38 — What is Central Tendency?
Enable step-by-step execution: Others can run each cell independently to verify your work or experiment with modifications; Lesson 1982 — Literate Programming with Notebooks
Enables decision-making: You can calculate probabilities, credible intervals, and expected values directly from it; Lesson 1537 — The Posterior Distribution
Enables parallelism: by identifying independent tasks; Lesson 1790 — What is Dask and When to Use It
Enclosure: Elements surrounded by a boundary are perceived as a group.; Lesson 1236 — Gestalt Principles in Visualization
End-to-end integration tests: Run your pipeline on sample data in a test environment to verify the execution order produces expected results.; Lesson 1846 — Testing and Validating Dependency Graphs
Ends at 1: F(∞) = 1 (all probability is accumulated); Lesson 157 — Cumulative Distribution Functions (CDFs) for Continuous Variables
Enforces Status Checks: Automated tests, linters, or CI/CD pipelines must pass before merging.; Lesson 2027 — Protecting Branches and Required Reviews
Engagement Rate: = (Likes + Comments + Shares) / Impressions × 100; Lesson 1631 — Social Media Metrics: DAU/MAU and Content Engagement
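
The formula translates directly into a small function. The function name and the sample counts below are illustrative only.

```python
def engagement_rate(likes, comments, shares, impressions):
    """Engagement rate as a percentage of impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return (likes + comments + shares) / impressions * 100


# 200 interactions on 4,000 impressions
rate = engagement_rate(likes=120, comments=30, shares=50, impressions=4000)
```
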
Engagement scoring: might show that 20% of users generate 80% of value (power users); Lesson 1701 — What is Customer Segmentation?
Engaging storytelling: Making trends memorable and intuitive; Lesson 1306 — Animation and Time-Based Transitions
Engineering Team Objective: Optimize performance; Lesson 1608 — Connecting North Star Metrics to OKRs
Engineers and Technical Implementers: need:; Lesson 1951 — Understanding Stakeholder Priorities and Constraints
Enhancements (additional layers): Smoothing lines, confidence bands, annotations; Lesson 1347 — Understanding Layers in ggplot2
Enrollments: (StudentID, CourseID, StudentName, CourseName, Grade); Lesson 1065 — Second Normal Form (2NF)
Ensure data integrity: (no duplicate or corrupted records); Lesson 842 — What is a Database?
Ensure immutability: Changing a primary key causes cascading headaches; Lesson 1050 — Choosing Effective Primary Keys
Enter: or **Space** (to activate), and **arrow keys** (for fine control).; Lesson 1253 — Interactive Accessibility: Keyboard Navigation
Entry: Transaction begins automatically; Lesson 1114 — Transaction Context Managers in Python
Environment artifacts: virtual environments, cache folders; Lesson 1996 — The .gitignore File
Environment details: Which worker, timestamp, resource usage; Lesson 1851 — Error Logging and Notifications
Environment differences: Code runs differently on different machines (Code and Environment Management); Lesson 30 — The Reproducibility Crisis and Solutions
Environment files: that list all software dependencies and versions; Lesson 29 — Code and Environment Management
Environment management: means recording *exactly* which software versions you used.; Lesson 29 — Code and Environment Management
Environment variables: and configurations; Lesson 2038 — What is Environment Management and Why It Matters
Environment-driven iteration: External changes (new regulations, market shifts, updated systems) force you to revisit earlier decisions.; Lesson 2092 — Iteration and Feedback Loops in Practice
Environment-specific: Your local paths won't work on a colleague's machine or cloud server; Lesson 2072 — Configuration Files vs Hard-Coded Values
environment.yml: Lesson 2043 — Creating and Exporting Environment Specifications Lesson 2044 — Recreating Environments from Specifications
Epidemiological data: that helps track disease spread can reveal individuals' health status or movements, enabling discrimination or persecution.; Lesson 1919 — Defining Dual Use in Data Science
Equal information: Posterior sits roughly halfway between; Lesson 1567 — Posterior Mean as Weighted Average
Equal opportunity: Among qualified individuals, do groups succeed at similar rates?; Lesson 1884 — Detecting Bias in Your Data
Equal probability: (typically 50/50); Lesson 1487 — Simple Random Assignment
Equal to minimum: `WHERE value = (SELECT MIN(value) FROM table)`; Lesson 964 — Subqueries with Aggregate Functions
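
A runnable version of this pattern, assuming Python's `sqlite3` module and a hypothetical `readings` table. Note that the outer query returns every row tied for the minimum, not just one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 3.0), ("b", 1.5), ("c", 1.5), ("d", 9.0)],
)

# The scalar subquery computes the minimum once; the outer query filters on it.
lowest = conn.execute(
    """
    SELECT sensor
    FROM readings
    WHERE value = (SELECT MIN(value) FROM readings)
    ORDER BY sensor
    """
).fetchall()
```
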
equal variances: Lesson 361 — Pooled Variance t-Test Lesson 398 — Choosing Between Parametric and Non-Parametric Tests Lesson 447 — Conducting One-Way ANOVA in Practice
Equal-area: Preserves area ratios but distorts shapes; Lesson 1308 — Geographic Data Types and Coordinate Systems
Equal-tailed intervals: always put 2.; Lesson 1577 — When HDI and Equal-Tailed Intervals Differ
Equal-width vs equal-frequency: Different bin strategies tell different stories; Lesson 1245 — Misleading Aggregations and Binning
Equality checks: Lesson 960 — Subqueries Returning Single Values
Equality searches: (`WHERE id = 100`) jump straight to the target; Lesson 1079 — B-Tree Indexes: Structure and Mechanics
Erasure isn't absolute: GDPR includes exemptions when you must keep data:; Lesson 1909 — Right to Erasure and Data Retention Policies
Ergodicity: Long-run averages converge to expectations under the stationary distribution; Lesson 1589 — Markov Chains: The Foundation of MCMC
ERROR: Failures requiring attention; Lesson 1857 — Logging Best Practices
Error bars: attach vertical or horizontal lines to a point (often a mean) showing ±1 standard deviation, ±2 SE, or confidence intervals.; Lesson 55 — Visualizing Spread Lesson 1244 — Omitting Uncertainty and Variability
Error classification: Transient network issue or data quality problem?; Lesson 1851 — Error Logging and Notifications
Error handling: Logs failures, sends alerts, and implements retry logic; Lesson 1822 — What is a Data Pipeline?
Error measures: For numerical predictions, how far off were your guesses on average?; Lesson 14 — Model Evaluation and Validation
Escalation patterns: Moving from chat to phone to "speak to a manager"; Lesson 1673 — Leading Indicators of Churn
ESD: on what remains; Lesson 1408 — Handling Multiple Seasonal Periods
Establish coordination protocols: When do teams need to align before taking action?; Lesson 1625 — Cross-Functional Metric Dependencies
estimate: population parameters.; Lesson 229 — Defining Samples and Statistics Lesson 607 — Confidence Intervals for Coefficients Lesson 1449 — Coarsened Exact Matching (CEM)
Estimate densities: `stat_density()` creates smooth distribution curves; Lesson 1352 — Statistical Transformations with stat_* Layers
Eta-squared: is the most straightforward effect size for ANOVA.; Lesson 445 — Effect Size: Eta-Squared and Omega-Squared
Ethical collection: – Was this data gathered with people's informed consent?; Lesson 36 — Responsible Data Sourcing and Use
Ethical Principles: .; Lesson 33 — Transparency and Explainability
Ethical violations: if you mishandle sensitive data or misrepresent uncertainty; Lesson 34 — Recognizing Boundaries of Competence
ETL: stands for **Extract, Transform, Load**—a traditional data integration pattern that moves data from source systems into a data warehouse or analytics platform.; Lesson 1815 — What is ETL? Extract, Transform, Load Explained
Etsy: Gross Merchandise Sales (GMS) — captures value for both buyers (finding unique items) and sellers (making sales).; Lesson 1606 — Examples of North Star Metrics by Industry
Evaluate improvement: Does deviance drop meaningfully?; Lesson 703 — Sequential Model Building Strategy
Evaluate trade-offs: Parametric tests have higher power when assumptions hold; non-parametric tests are safer when assumptions are questionable; Lesson 398 — Choosing Between Parametric and Non-Parametric Tests
Evaluates segments: For each possible segmentation (sets of change-points), calculates a cost based on how well each segment fits the data; Lesson 1416 — PELT Algorithm: Pruned Exact Linear Time
Evaluation: Lesson 9 — The Data Science Lifecycle Overview
event: is simply a collection of one or more outcomes from your sample space.; Lesson 78 — Events as Subsets of the Sample Space Lesson 803 — Defining the Event and Time Origin Lesson 835 — Customer Churn Prediction with Survival Analysis Lesson 840 — Loan Default Timing and Credit Risk
Event frequencies: Average 2.; Lesson 1647 — Building a Cohort Table
Event indicator: (1 = event occurred, 0 = censored); Lesson 828 — Fitting the Cox Model
Event logs: specific actions like "clicked_ad", "opened_email", "viewed_product"; Lesson 1719 — The Customer Journey and Touchpoints
Events: subjects who experienced the event (death, churn, failure) at that exact time; Lesson 812 — Handling Event Times and Censoring Lesson 1679 — Defining Funnel Steps and Events
Every selection is independent: picking one individual doesn't affect who else gets picked; Lesson 234 — Simple Random Sampling
Everything-to-Target: Always examine relationships *with* your target variable first.; Lesson 1210 — Relationship Exploration: Correlation and Association
evidence: (your observations), and produce **posterior beliefs** (your updated understanding).; Lesson 116 — From Bayes' Theorem to Bayesian Inference Lesson 1536 — The Evidence (Marginal Likelihood) Lesson 1546 — The Role of the Normalizing Constant
Evidence is strong: Overwhelming data swamps your initial belief; Lesson 115 — Prior Sensitivity Analysis
Evidence is weak: A mildly positive test result won't overcome a very skeptical or very confident prior; Lesson 115 — Prior Sensitivity Analysis
Evidence package: Lesson 1946 — Supporting Your Claims with Evidence
Evolving data: Show how metrics change over time; Lesson 1327 — Creating Animations with FuncAnimation
EWMA: applies weighted averaging where recent observations matter more than older ones.; Lesson 1403 — CUSUM and EWMA Charts
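
A minimal sketch of the recursion behind an exponentially weighted moving average: each smoothed value is `alpha` times the new observation plus `1 - alpha` times the previous smoothed value. Starting the series at the first observation is an assumption made here; control-chart implementations often start from a target mean instead.

```python
def ewma(values, alpha):
    """Exponentially weighted moving average of a sequence."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for x in values:
        if not smoothed:
            smoothed.append(x)  # assumed start: first observation
        else:
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


series = ewma([10, 10, 20], alpha=0.5)
```

A larger `alpha` tracks recent observations more closely; a smaller one gives older observations more influence.
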
Exact duplicate detection: Find rows where *all* columns match exactly—these are often accidental copies from data loading errors.; Lesson 1154 — Uniqueness and Duplication Checks
Exact p-values: use the true, theoretical probability distribution without approximation.; Lesson 322 — Exact vs Asymptotic P-Values
Exact pinning: guarantees that everyone running your code uses identical package versions, maximizing reproducibility.; Lesson 2050 — Pinning Versions vs Flexible Ranges
Exact probabilities: P(X = k) — exactly k successes; Lesson 130 — Calculating Binomial Probabilities
Exact probability: uses the binomial PMF directly.; Lesson 130 — Calculating Binomial Probabilities Lesson 431 — When Chi-Squared Assumptions Fail
exactly: the same measured characteristics as treated units.; Lesson 1446 — Exact Matching Lesson 1993 — The Three States: Working Directory, Staging, Repository
Exactly 4 accept: Calculate P(X=4) directly with the PMF; Lesson 130 — Calculating Binomial Probabilities
exactly the same: as taking the Pearson correlation coefficient between X and Y and squaring it.; Lesson 534 — R-Squared vs Correlation Squared Lesson 647 — Impact on Model Results and Reporting
Exactly two outcomes: Success or failure, no middle ground; Lesson 123 — Bernoulli Trial Definition and Properties
Examine edge cases: Look for missing groups entirely—this reveals coverage error.; Lesson 250 — Strategies for Bias Detection and Mitigation
Examine the coefficient magnitude: in its real-world units; Lesson 609 — Practical vs Statistical Significance
Examine transformations: Review each transformation step—was a join dropping records?; Lesson 1870 — Root Cause Analysis for Quality Issues
Example (left-tailed): H₀: μ = 100 vs H₁: μ < 100; Lesson 311 — One-Sided vs Two-Sided Alternatives
Example (right-tailed): H₀: μ = 100 vs H₁: μ > 100; Lesson 311 — One-Sided vs Two-Sided Alternatives
Example 1: In educational research, improving test scores by d = 0.; Lesson 386 — Effect Size Interpretation Guidelines
Example 2: In pharmaceutical trials, a pain medication with d = 0.; Lesson 386 — Effect Size Interpretation Guidelines
Example 2: Video Autoplay: Lesson 1521 — Risks of Optimizing for Surrogates
Example 3: In physics experiments measuring fundamental constants, even d = 0.; Lesson 386 — Effect Size Interpretation Guidelines
Example analogy: A company's sales look higher in stores with fewer employees.; Lesson 1194 — Simpson's Paradox and Confounding Lesson 1937 — The Hero's Journey: Making Your Audience the Hero
Example context: Lesson 265 — Using Standard Error in Practice
Example handling: Lesson 1100 — Handling NULL Values and Data Types
Example interpretation: Lesson 730 — Interpreting PACF Plots
Example intuition: Imagine you're testing if a coin is fair (H₀: p = 0.; Lesson 335 — Calculating Type II Error Probability (Beta) Lesson 1513 — Always-Valid Inference and Confidence Sequences
Example output: Lesson 1008 — RANK(): Handling Ties with Gaps
Example pattern: Calculating daily revenue across millions of transactions.; Lesson 1786 — Data Processing Patterns Best Suited for Spark Lesson 1865 — Data Quality Checks in Pipelines
Example scenario: You want to find average order values by region, but only for orders placed in 2023.; Lesson 895 — Combining WHERE and GROUP BY Lesson 898 — HAVING Clause Fundamentals Lesson 1600 — Business Examples: Revenue vs Pipeline
Example scenarios: Lesson 36 — Responsible Data Sourcing and Use
Example structure: Lesson 891 — Single Column Grouping Lesson 1977 — Design Principles for Dashboards Lesson 2071 — Modular Code: Functions and Scripts
Example use case: Instead of joining `orders` and `products` and summing totals every time someone checks monthly sales, create a materialized view that stores those monthly totals.; Lesson 1076 — Materialized Views and Summary Tables
Example Values: A few representative samples; Lesson 2064 — Creating Data Dictionaries
Example violation: In a customer churn study, if high-risk customers are more likely to stop using your product *and* more likely to unsubscribe from your tracking emails (causing censoring), your results will be biased.; Lesson 821 — Assumptions of the Log-Rank Test
Example with numbers: Lesson 859 — IN Operator for Multiple Values
Example with text values: Lesson 859 — IN Operator for Multiple Values
Excel: adds formatting overhead and reads even slower than CSV, especially with multiple sheets.; Lesson 1133 — Performance Considerations Across Formats
Excess Kurtosis (Pearson's): Sometimes the calculation stops before the final "-3" adjustment.; Lesson 67 — Calculating Kurtosis
Excessive borders: and boxes around every element; Lesson 1963 — Removing Chartjunk
Excessive grid lines: Too many or overly prominent gridlines; Lesson 1246 — Visual Clutter and Chartjunk
exchangeability: if the groups had swapped assignments, we'd expect the same average outcome.; Lesson 1438 — Ensuring Balance Between Groups Lesson 1443 — Observational Studies vs Randomized Experiments
Excluding multiple values: Lesson 868 — The NOT Operator
Excluding pattern matches: Lesson 868 — The NOT Operator
Executes the subquery first: and gets back multiple rows (each with one value); Lesson 961 — IN Operator with Subqueries
Execution order chaos: Tasks might run simultaneously when they should be sequential, causing some to fail because required data isn't ready yet.; Lesson 1840 — What is Dependency Management in Pipelines?
Execution stage: Lesson 1857 — Logging Best Practices
Executive Summary: (1 page): Key findings, recommendations, business impact; Lesson 1966 — Report Structure and Executive Summary
Executive/Business stakeholders: Lead with directional findings and practical significance.; Lesson 1953 — Adjusting Statistical Depth by Audience
Executives: making strategic decisions need clean, simple charts that communicate the main point at a glance: think bar charts showing three key metrics or a single trend line.; Lesson 1954 — Tailoring Visualizations to Audience Needs
Executives and Business Leaders: focus on:; Lesson 1951 — Understanding Stakeholder Priorities and Constraints
Executor: Determines *how* tasks run (locally, distributed, etc.; Lesson 1833 — Introduction to Apache Airflow
Exercise and health: Do healthier people exercise more, or does exercise make people healthier?; Lesson 1424 — Reverse Causality
Existing knowledge: What do subject-matter experts already know?; Lesson 1168 — Understanding Domain Context
EXISTS: stops searching as soon as it finds *any* matching row.; Lesson 985 — EXISTS vs IN: Performance Considerations
Exit: Always happens, ensuring clean boundaries; Lesson 1114 — Transaction Context Managers in Python
Exogeneity: means that your predictor variable X is determined *outside* the model and is completely independent of the error term ε.; Lesson 553 — Exogeneity: X Must Be Independent of Errors
Expanding funnel: Residuals start tight on the left and fan out wider on the right.; Lesson 559 — Detecting Heteroscedasticity (Non-Constant Variance)
Expectations and Success Criteria: Document what each stakeholder considers "success.; Lesson 2101 — Identifying and Mapping Stakeholders
Expected: The count you would expect if the null hypothesis were true (which you calculated in the previous lesson); Lesson 417 — The Chi-Squared Test Statistic Formula
expected frequencies: across all categories.; Lesson 414 — Introduction to Chi-Squared Goodness of Fit Test Lesson 416 — Calculating Expected Frequencies Lesson 423 — Contingency Tables and Expected Frequencies
Expected Frequency Requirement: Lesson 426 — Assumptions and Sample Size Requirements
Expected Impact: Prevent 20-30 account cancellations/month ($50K-75K MRR saved); Lesson 1948 — The Recommendation Slide: Making It Actionable
Expected interviews needed: 3/0.; Lesson 136 — Expectation and Variance of the Negative Binomial
Expected loss: answers: "If I pick this variant and it's wrong, how much will I lose on average?; Lesson 1584 — Expected Loss and Decision Making Lesson 1586 — Multi-Armed Bandit Connections Lesson 1587 — Bayesian A/B Testing in Practice
Expected loss threshold: Stop when the expected loss of choosing variant B is below $X; Lesson 1585 — Early Stopping in Bayesian Tests
Expected outputs: What files or results should appear; Lesson 1989 — Best Practices for Sharing Reproducible Reports
Expected uniqueness violated: An ID column contains repeats; Lesson 1154 — Uniqueness and Duplication Checks
expected value: (often written as E(X) or μ) is the long-run average you'd expect if you could repeat a random process infinitely many times.; Lesson 121 — Expected Value of Discrete Random Variables Lesson 122 — Variance and Standard Deviation of Discrete Random Variables Lesson 125 — Bernoulli Mean and Variance Lesson 147 — Expected Value of Discrete Random Variables Lesson 152 — Decision Making Under Uncertainty Lesson 255 — Expected Value of Sample Statistics
Expected value (mean): E(X) = 1/p; Lesson 133 — Expectation and Variance of the Geometric Distribution
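A quick numerical check of E(X) = 1/p, assuming X counts trials up to and including the first success. The function name and the truncation point are illustrative choices, not part of the lesson.

```python
def geometric_mean_by_sum(p, max_k=10_000):
    """Approximate E(X) by summing k * P(X = k), with P(X = k) = (1 - p)**(k - 1) * p.

    max_k truncates the infinite series; the remaining tail is negligible
    for the value of p used below.
    """
    return sum(k * (1 - p) ** (k - 1) * p for k in range(1, max_k + 1))


p = 0.2
approx = geometric_mean_by_sum(p)  # truncated series
exact = 1 / p                      # closed form from the entry above
```

The two values agree to within floating-point error, which is the long-run average number of trials the formula describes.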
Experiment: with different sample sizes, population shapes, or statistics; Lesson 259 — Simulating Sampling Distributions Lesson 498 — Bradford Hill Criteria for Causation
Experiment branches: (e.; Lesson 2035 — Branching Strategies for Experiments
Experiment snapshots: `exp-baseline-xgboost`, `exp-feature-engineering-v3`; Lesson 2037 — Tagging Releases and Experiment Snapshots
Experimental tracking nightmare: Hard to remember which parameter combinations you've tested; Lesson 2072 — Configuration Files vs Hard-Coded Values
Experimentation: Create branches to test new approaches without breaking working code; Lesson 1990 — What is Version Control and Why Git? Lesson 2005 — What are Branches and Why Use Them?
Expert input: (what practitioners say "never happens"); Lesson 75 — Domain-Specific Outlier Rules
Explanatory visualization: is your public communication tool.; Lesson 1213 — Exploratory vs Explanatory Visualization
Explicit transactions: You manually control when a group of statements is committed or rolled back.; Lesson 1111 — Autocommit Mode vs Explicit Transactions
Exploitation: playing the machine you currently believe is best; Lesson 1586 — Multi-Armed Bandit Connections
Exploration: trying different machines to learn which pays best; Lesson 1586 — Multi-Armed Bandit Connections
Exploration & Analysis: Lesson 9 — The Data Science Lifecycle Overview
Exploratory Analysis: means investigating your data to discover patterns, spot anomalies, and understand relationships between different pieces of information.; Lesson 13 — Exploratory Analysis and Modeling Lesson 38 — What is Central Tendency? Lesson 1395 — When to Use Grubbs' Test Lesson 1727 — Linear Attribution Model
Exploratory data analysis: where you need to see data distributions, plot trends, and test hypotheses interactively; Lesson 2074 — Notebooks vs Scripts: When to Use Each
Exploratory research: Might use α = 0.; Lesson 342 — Alpha Level Trade-offs
Exploratory visualization: is your private investigation tool.; Lesson 1213 — Exploratory vs Explanatory Visualization
Exploratory vs confirmatory: research; Lesson 324 — Common Significance Levels: 0.05, 0.01, and 0.10
Explore constraints explicitly: Ask about:; Lesson 2102 — Understanding Stakeholder Goals and Constraints
Exploring data: You're getting familiar with a new table and want to see what's in it; Lesson 851 — Selecting All Columns with Asterisk
Exponential: works when failure or arrival rates are constant over time (memoryless property).; Lesson 193 — Choosing Between Distributions in Practice Lesson 664 — What is the Exponential Family of Distributions?
Exponential complexity: With multiple attributes, the number of subgroups grows quickly; Lesson 1893 — Intersectionality in Fairness
Exponential decay: Sharp initial drop, then gradual decline (common in digital marketing); Lesson 1639 — Time Windows and Attribution Decay
exponential distribution: flips this around—it models *how long you wait* until the next event occurs.; Lesson 164 — The Exponential Distribution Lesson 182 — Special Cases: Exponential and Chi-Squared
exponential family: is a special class of probability distributions that can all be written in the same mathematical form.; Lesson 664 — What is the Exponential Family of Distributions? Lesson 666 — Natural Parameter and Sufficient Statistics Lesson 690 — The Poisson Distribution as a GLM
Exponential smoothing: uses a declining weight scheme controlled by parameter `α`.; Lesson 764 — Exponential Smoothing vs Moving Averages
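A minimal sketch of the declining weight scheme, assuming simple exponential smoothing initialised with the first observation. Both function names and the sample series are hypothetical.

```python
def simple_exponential_smoothing(values, alpha):
    """Return the smoothed series s_t = alpha * y_t + (1 - alpha) * s_{t-1}."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [values[0]]  # assumption: start from the first observation
    for y in values[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed


def observation_weights(alpha, n_lags):
    """Weight placed on the observation k steps back: alpha * (1 - alpha)**k."""
    return [alpha * (1 - alpha) ** k for k in range(n_lags)]


series = [10.0, 12.0, 11.0, 15.0]
smoothed = simple_exponential_smoothing(series, alpha=0.5)
weights = observation_weights(0.5, 3)
```

With `alpha=0.5` the weights halve at every lag, whereas a moving average gives every observation in its window the same weight.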
Exponentiate the bounds: Transform to odds ratio scale:; Lesson 685 — Confidence Intervals for Odds Ratios
Exponentiate the coefficient: exp(β); Lesson 681 — Interpreting Logistic Regression Coefficients
Expose data issues early: Basic analysis quickly reveals data quality problems; Lesson 2110 — The Minimum Viable Analysis (MVA)
Extended: Multiple months if seasonal patterns matter to your metric; Lesson 1484 — Duration and Timing Considerations
Extended evidence: Lesson 1971 — Appendices and Technical Details
External benchmarks: for your product category; Lesson 1657 — Day-1, Day-7, Day-30 Benchmarks
External conditions: A task queries an API to see which regions have new data, then dynamically creates one downstream task per region.; Lesson 1844 — Dynamic Dependencies
External factors: (business closure, budget cuts); Lesson 1675 — Churn Attribution and Root Cause Analysis Lesson 1741 — Controlling for Seasonality and External Factors
External task dependencies: explicitly declare that a task in Pipeline A depends on a task in Pipeline B.; Lesson 1845 — Cross-Pipeline Dependencies
External validity: asks: *Do these results apply beyond your study's specific conditions?; Lesson 1441 — Internal vs External Validity
Extra transparency: about consequences of declining; Lesson 1918 — Special Populations and Vulnerable Groups
Extract: data from source systems; Lesson 1817 — Historical Context: Why ETL Came First
Extract and lightly transform: sensitive data (masking PII) before loading to comply with regulations; Lesson 1821 — Hybrid Approaches and Modern Data Stacks
Extract the Irregular (I): Subtract both trend and seasonal components from the original: `I = Y - T - S`.; Lesson 744 — Classical Decomposition Methods
Extract the timezone: Determine what zone a timestamp uses; Lesson 1042 — Working with Timestamps and Time Zones
Extract the Trend (T): Apply a moving average to smooth out the data.; Lesson 744 — Classical Decomposition Methods
Extract, Transform, Load: a traditional data integration pattern that moves data from source systems into a data warehouse or analytics platform.; Lesson 1815 — What is ETL? Extract, Transform, Load Explained
Extracting: data from operational databases, APIs, or files; Lesson 1816 — What is ELT? Extract, Load, Transform Explained
Extraction timestamp: Exact date and time you pulled the data; Lesson 1161 — Documenting Data Sources
Extraction tools: (Fivetran, Airbyte) that load raw data with minimal transformation; Lesson 1821 — Hybrid Approaches and Modern Data Stacks
Extreme outliers: Even with larger samples, severe outliers can distort the mean and inflate the standard error, making your t-statistic misleading.; Lesson 390 — When Parametric Tests Fail: Violations of Assumptions
Extreme predictions: When estimating values far from your data's center; Lesson 550 — Normality of Residuals

F

F < 10: Weak instrument—your second-stage estimates may be severely biased; Lesson 1467 — Testing Instrument Strength and Validity
F > 20–30: Strong instrument; Lesson 1467 — Testing Instrument Strength and Validity
F ≥ 10: Generally acceptable strength; Lesson 1467 — Testing Instrument Strength and Validity
F-distribution: .; Lesson 440 — The F-Statistic and Its Distribution
F-ratio: is simply the ratio of these two mean squares:; Lesson 443 — Mean Squares and the F-Ratio
F-statistic: and **p-value** in the "Between Groups" row tell you whether your groups differ significantly.; Lesson 444 — The ANOVA Table Lesson 464 — Main Effects in Two-Way ANOVA Lesson 1467 — Testing Instrument Strength and Validity
F-statistic is large: and the **p-value is small** (typically < 0.; Lesson 627 — The F-Test for Model Comparison
F-test: (covered in lesson 363), or simply inspect side-by-side boxplots.; Lesson 379 — The Assumption of Equal Variances (Homoscedasticity) Lesson 654 — Testing Interaction Significance
F-test for model comparison: gives you a statistical answer.; Lesson 627 — The F-Test for Model Comparison
F(Factor A): = MS_A / MS_within; Lesson 467 — Two-Way ANOVA F-Tests
F(Factor B): = MS_B / MS_within; Lesson 467 — Two-Way ANOVA F-Tests
F(Interaction): = MS_A×B / MS_within; Lesson 467 — Two-Way ANOVA F-Tests
F(x): , answers: "What's P(X ≤ x)?; Lesson 120 — Cumulative Distribution Functions (CDF) for Discrete Variables
Facebook/Meta: Monthly Active Users (MAU) or Daily Active Users (DAU) — engagement captures the platform's value through connection and content sharing.; Lesson 1606 — Examples of North Star Metrics by Industry
Faceted plots: split your data by a third variable, showing if the pattern changes across groups; Lesson 1195 — Interaction Effects Between Variables
Faceting: means creating multiple small charts—one per category—arranged in a grid.; Lesson 1193 — Conditional Distributions and Faceting Lesson 1289 — Regression Plots: regplot and lmplot Lesson 1356 — What Are Facets and Small Multiples?
Facets: Small multiples—splitting data into separate subplots; Lesson 1340 — The Seven Layers of Grammar Lesson 1362 — When to Use Facets vs. Other Approaches
Facets work best when: Lesson 1362 — When to Use Facets vs. Other Approaches
fact table: (containing measurements like sales amounts, quantities, or counts) connects to multiple **dimension tables** (containing descriptive attributes like customer names, product details, or dates).; Lesson 956 — Star Schema Joins Lesson 1808 — Star Schema and Fact Tables
Factor A: Teaching method (online, hybrid, in-person); Lesson 463 — Introduction to Two-Way ANOVA
Factor A main effect: Does Factor A matter, overall?; Lesson 464 — Main Effects in Two-Way ANOVA
Factor B: Time of day (morning, afternoon); Lesson 463 — Introduction to Two-Way ANOVA
Factor B main effect: Does Factor B matter, overall?; Lesson 464 — Main Effects in Two-Way ANOVA
Factor impact: Do email campaigns accelerate conversion compared to ads?; Lesson 839 — Time-to-Conversion in Marketing Funnels
Fail to Reject H₀: | Correct | Type II Error (β) |; Lesson 338 — What is Statistical Power? Lesson 358 — Worked Example: One-Sample t-Test in Practice
Failed jobs create duplicates: Rerunning after a crash might insert the same records twice; Lesson 1847 — What is Idempotency?
Failure: Rolls back automatically if an exception occurs; Lesson 1114 — Transaction Context Managers in Python
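One way to see this behaviour is with the standard-library `sqlite3` module, whose connection object works as a transaction context manager. The table and account names below are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")
conn.commit()

# Success path: the block finishes normally, so the insert is committed.
with conn:
    conn.execute("INSERT INTO accounts VALUES ('alice', 100.0)")

# Failure path: the exception triggers an automatic rollback of the insert.
try:
    with conn:
        conn.execute("INSERT INTO accounts VALUES ('bob', 50.0)")
        raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    pass

names = [row[0] for row in conn.execute("SELECT name FROM accounts ORDER BY name")]
conn.close()
```

Only the first insert survives. Note that `with conn:` manages the transaction, not the connection, so the connection still has to be closed explicitly.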
Fair dice or spinners: Are all six faces equally likely?; Lesson 421 — Applications: Uniform, Genetic Ratios, and Distributions
fairness: in how credit flows, enabling better resource allocation and learning.; Lesson 1643 — Building Attribution Frameworks Lesson 1878 — What is Bias in Data?
Fairness audits: Test model outcomes across demographic groups; Lesson 1883 — Protected Classes and Proxy Variables
Fairness through awareness: takes the opposite approach: explicitly include sensitive attributes so you can measure and correct for disparate impact.; Lesson 1892 — Fairness Through Unawareness vs Awareness
Fairness through unawareness: sounds intuitive—if the model can't see protected attributes, it can't discriminate, right?; Lesson 1892 — Fairness Through Unawareness vs Awareness
FALSE: (already false regardless); Lesson 871 — NULL Handling with Logical Operators
False Discovery Rate (FDR): Controls the expected proportion of false positives among all significant results; Lesson 512 — Testing Significance in Correlation Matrices Lesson 1505 — False Discovery Rate (FDR) Lesson 1506 — Benjamini-Hochberg Procedure
false negative: or Type II error.; Lesson 1495 — Power Analysis Fundamentals Lesson 1529 — Running Underpowered Tests
False Negatives (FN): Missed actual change-points; Lesson 1418 — Evaluating Change-Point Detection Methods
False Positive Rate: You should see "significant" results only at your chosen alpha level (e.; Lesson 1483 — Pre-Experiment Validation
false positives: for stationarity.; Lesson 717 — KPSS Test Lesson 1518 — The Relationship Between Surrogate and Business Metrics
False Positives (FP): Flagged changes where none exist (false alarms); Lesson 1418 — Evaluating Change-Point Detection Methods
False precision: The mathematical convenience might tempt you to use an inappropriate prior; Lesson 1555 — Advantages and Limitations of Conjugate Priors
Falsifiability: Can be proven wrong with data; Lesson 1200 — Formulating Specific, Testable Hypotheses
Familiar API: If you know SQL or pandas, DataFrames feel natural; Lesson 1778 — DataFrames and Spark SQL Basics
family-wise error rate: (the probability of making *any* Type I error across all tests) balloons.; Lesson 337 — Error Rates in Practice: Multiple Testing Lesson 1501 — The Multiple Testing Problem
Family-Wise Error Rate (FWER): is the probability of making **at least one false discovery** (Type I error) across a "family" of hypothesis tests conducted simultaneously.; Lesson 1502 — Family-Wise Error Rate (FWER) Lesson 1505 — False Discovery Rate (FDR)
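To show how quickly this probability grows, the sketch below assumes the tests are independent and each uses the same α, in which case FWER = 1 − (1 − α)^m. The function name is illustrative.

```python
def family_wise_error_rate(alpha, m):
    """P(at least one Type I error) across m independent tests at level alpha.

    Assumes independence between tests; dependent tests behave differently.
    """
    return 1 - (1 - alpha) ** m


fwer_single = family_wise_error_rate(0.05, 1)   # one test: just alpha
fwer_twenty = family_wise_error_rate(0.05, 20)  # twenty tests at the same alpha
```

A single test keeps the error rate at 5%, but twenty tests at the same level push the chance of at least one false discovery to roughly 64%.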
Fan-in: Multiple tasks must complete before one starts; Lesson 1843 — Declaring Dependencies in Orchestration Tools
Fan-out: One task triggers multiple parallel tasks; Lesson 1843 — Declaring Dependencies in Orchestration Tools
Fast-moving funnel: Users zip through in minutes (smooth experience); Lesson 1681 — Time-Based Funnel Analysis
Faster payback: = more cash to reinvest in growth; Lesson 1757 — Payback Period: Definition and Importance
Fault isolation: If one node crashes, others keep running.; Lesson 1771 — Shared-Nothing Architecture
Favor parsimony: When models perform similarly, choose the simpler one (Occam's Razor); Lesson 633 — Practical Model Selection Strategy
Feather: is a lightweight columnar format optimized for speed.; Lesson 1129 — Parquet and Feather: Columnar Formats
Feature adoption: without connecting to retention or expansion revenue; Lesson 1616 — Metrics Divorced from Revenue Lesson 1646 — Defining Cohort Start Events Lesson 1696 — Feature Adoption and Usage Frequency
Feature bloat: Models train slower and become harder to explain when filled with irrelevant features; Lesson 2135 — Dead Experimental Code and Feature Sprawl
Feature branches: you're working on alone; Lesson 2020 — The Golden Rule of Rebase
Feature development: Build new model features without disrupting production code; Lesson 2005 — What are Branches and Why Use Them?
Feature engineering needs: "Create interaction term between X and Y"; Lesson 1212 — EDA Summary Documentation and Next Steps
Feature engineering repeats: New behaviors may require new features entirely; Lesson 2128 — Data Distribution Shifts Frequently
Feature requests: Stakeholders inevitably want new views or filters; Lesson 1979 — Maintenance and Sustainability Considerations
Feature selection discipline: After adding features, measure their importance and remove low-contributors before the next iteration.; Lesson 2135 — Dead Experimental Code and Feature Sprawl
Feature sprawl: happens when you keep accumulating features for models without ever pruning the ones that don't contribute value.; Lesson 2135 — Dead Experimental Code and Feature Sprawl
Features: Early behavior metrics like days to first purchase, first-order value, login frequency in week one, number of products viewed, engagement with onboarding emails; Lesson 1668 — Predictive LTV Models
Feedback loops: A biased model's decisions become tomorrow's training data (e.; Lesson 1882 — Algorithmic Amplification of Bias Lesson 1923 — Algorithmic Amplification of Harm
Few covariates: to match on (2-4 variables); Lesson 1446 — Exact Matching
fewer samples: you need.; Lesson 405 — Sample Size and Power for Proportion Tests Lesson 1515 — Trade-offs: Sample Size, Speed, and Complexity
Fewer Type I errors: – You're less likely to reject H₀ when it's actually true; Lesson 342 — Alpha Level Trade-offs
Field conventions: (what does your discipline expect?; Lesson 324 — Common Significance Levels: 0.05, 0.01, and 0.10
Field standards: Psychology and social sciences commonly use α = 0.; Lesson 342 — Alpha Level Trade-offs
Figure: is the entire building—the blank canvas or container that holds everything.; Lesson 1255 — The Anatomy of a Matplotlib Figure
Fill rate: % of buyer requests successfully matched; Lesson 1630 — Marketplace Metrics: GMV, Take Rate, and Liquidity
Filling gaps: Use LEAD to preview the next non-null value; Lesson 1023 — Introduction to Window Functions: LAG and LEAD
Filter: raw tables to relevant rows; Lesson 994 — CTEs for Simplifying Complex Joins Lesson 1827 — Transformation Patterns: Map, Filter, Aggregate
Filter conditions: WHERE clauses applied at different stages.; Lesson 1084 — Reading and Interpreting Query Execution Plans
Filter data: (e.; Lesson 1302 — Interactive Controls: Dropdown Menus and Buttons
Filter early: Use `WHERE` before `DISTINCT` or `ORDER BY` to reduce the data volume; Lesson 880 — Performance Considerations and Best Practices Lesson 997 — CTE Best Practices and Performance
Filtering: Writing matching rows to a new file; Lesson 1800 — Chunked Reading with read_csv
Filtering conditions: Applying `WHERE` clauses early reduces row counts before joining; Lesson 951 — Join Order and Performance
Filtering early: Place restrictive `WHERE` conditions before or with early joins; Lesson 951 — Join Order and Performance
Finance: R² = 0.; Lesson 533 — Interpreting R-Squared Values Lesson 1412 — What is Change-Point Detection?
Financial analysis: (comparing stocks with vastly different price ranges); Lesson 200 — Comparing Values Across Different Distributions
Financial conflicts: You're analyzing sales data for a product your company desperately needs to succeed.; Lesson 35 — Conflicts of Interest and Independence Lesson 1930 — Managing Conflicts of Interest
Financial costs: matter too.; Lesson 1500 — Practical Considerations and Trade-offs
Financial portfolios: Assets with different investment amounts; Lesson 43 — Weighted Mean and Its Applications
Financing decisions: Investors scrutinize this metric to assess capital efficiency; Lesson 1757 — Payback Period: Definition and Importance
Find minimal adjustment sets: the smallest set of variables that, when conditioned on, blocks all backdoor paths; Lesson 1475 — Using DAGs to Guide Analysis
Find only common rows: between queries; Lesson 998 — Introduction to Set Operations
Find shared drivers: Do two metrics both depend on the same underlying factor?; Lesson 1625 — Cross-Functional Metric Dependencies
Find the column: for the second decimal place (e.; Lesson 198 — Using Z-Tables for Probability
Find the largest i: where pᵢ ≤ (i/m) × α; Lesson 1506 — Benjamini-Hochberg Procedure
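This step can be sketched in plain Python as follows, assuming the usual step-up rule in which every hypothesis ranked at or below the largest qualifying i is rejected. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one reject/keep flag per p-value, in the original input order."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p.
    order = sorted(range(m), key=lambda idx: p_values[idx])

    # Find the largest rank i whose p-value is at or below (i / m) * alpha.
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_rank = rank

    # Reject every hypothesis ranked at or below that i.
    rejected = [False] * m
    for idx in order[:largest_rank]:
        rejected[idx] = True
    return rejected


p_values = [0.02, 0.03, 0.035, 0.5]
decisions = benjamini_hochberg(p_values, alpha=0.05)
```

Here the two smallest p-values miss their own thresholds (0.0125 and 0.025), but the third meets its threshold (0.0375), so all three are rejected. That is why the rule looks for the largest i rather than stopping at the first failure.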
Find the midpoint: of each group (class interval); Lesson 45 — Central Tendency for Grouped Data
Find the p-value: from the chi-squared distribution (df = 1); Lesson 436 — Conducting McNemar's Test Lesson 447 — Conducting One-Way ANOVA in Practice
Find the quantiles: of your posterior distribution that capture that probability mass; Lesson 1562 — Credible Intervals for Proportions Lesson 1575 — Computing Equal-Tailed Credible Intervals
Find the row: corresponding to the first two digits of your Z-score (e.; Lesson 198 — Using Z-Tables for Probability
Finding duplicates: Match rows where key fields are identical; Lesson 947 — Self-Joins for Comparisons Within a Table
Finding gaps: Identify products in inventory but never sold; Lesson 1002 — EXCEPT: Finding Differences
Finding Overlapping Date Ranges: Lesson 948 — Self-Joins with Inequality Conditions
Finding probabilities: P(a < X ≤ b) = F(b) - F(a); Lesson 157 — Cumulative Distribution Functions (CDFs) for Continuous Variables
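A small worked instance of this identity, using the exponential distribution's CDF F(x) = 1 − e^(−λx) as an assumed example. The rate and interval endpoints are arbitrary.

```python
import math


def exponential_cdf(x, rate):
    """CDF of an exponential distribution: P(X <= x)."""
    return 1 - math.exp(-rate * x) if x > 0 else 0.0


def interval_probability(cdf, a, b):
    """P(a < X <= b) = F(b) - F(a) for any CDF F."""
    return cdf(b) - cdf(a)


rate = 0.5
prob = interval_probability(lambda x: exponential_cdf(x, rate), 1.0, 3.0)
```

The same `interval_probability` helper works with any continuous CDF, since only the two endpoint values are needed.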
Finding Sequential Records: Lesson 948 — Self-Joins with Inequality Conditions
Finding unique categories: What products do we sell?; Lesson 873 — Understanding DISTINCT: Removing Duplicate Rows
Findings: Your main insights with supporting visualizations; Lesson 1966 — Report Structure and Executive Summary
Finite Population Correction (FPC): factor adjusts for this by *shrinking* the standard error to reflect the extra precision:; Lesson 264 — Finite Population Correction
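As an illustration, the sketch below uses a common textbook form of the factor, sqrt((N − n)/(N − 1)), applied to the standard error of a mean. The function names and sample figures are assumptions made for this example.

```python
import math


def finite_population_correction(population_size, sample_size):
    """FPC factor sqrt((N - n) / (N - 1)) for sampling without replacement."""
    if population_size < 2 or not 1 <= sample_size <= population_size:
        raise ValueError("need N >= 2 and 1 <= n <= N")
    return math.sqrt((population_size - sample_size) / (population_size - 1))


def corrected_standard_error(sample_sd, population_size, sample_size):
    """Standard error of the mean, shrunk by the FPC factor."""
    uncorrected = sample_sd / math.sqrt(sample_size)
    return uncorrected * finite_population_correction(population_size, sample_size)


fpc = finite_population_correction(1000, 200)  # sample is 20% of the population
se = corrected_standard_error(10.0, 1000, 200)
```

The factor is below 1 whenever more than one unit is sampled, and it reaches 0 when the whole population is sampled, because a census leaves no sampling error.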
Firewall Issues: block traffic between your application and database.; Lesson 1093 — Troubleshooting Connection Issues
First batch of data: Start with Beta(2, 2) prior → observe 10 successes, 15 failures → get Beta(12, 17) posterior; Lesson 1563 — Sequential Updating with New Data
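The update in this entry is simple addition on the Beta parameters. In the sketch below the second batch (5 successes, 5 failures) is a hypothetical continuation, added only to show that updating in stages matches updating all at once.

```python
def update_beta(prior, successes, failures):
    """Conjugate update: Beta(a, b) plus binomial data -> Beta(a + s, b + f)."""
    a, b = prior
    return (a + successes, b + failures)


prior = (2, 2)
after_first = update_beta(prior, successes=10, failures=15)
after_second = update_beta(after_first, successes=5, failures=5)  # hypothetical batch
all_at_once = update_beta(prior, successes=15, failures=20)

posterior_mean = after_first[0] / (after_first[0] + after_first[1])
```

Because each posterior becomes the next prior, the order and grouping of the batches do not change the final distribution.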
First difference: Treatment group's change = (After - Before); Lesson 1452 — The Difference-in-Differences Setup
First difference (Control group): Calculate the change in the control group over the same period: `(Y_control_after - Y_control_before)`; Lesson 1454 — Calculating the DiD Estimator
First difference (Treatment group): Calculate the change in the treatment group from before to after the intervention: `(Y_treatment_after - Y_treatment_before)`; Lesson 1454 — Calculating the DiD Estimator
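The two first differences combine into the DiD estimate by subtracting the control group's change from the treatment group's change. A minimal sketch with made-up group means:

```python
def did_estimate(treat_before, treat_after, control_before, control_after):
    """Difference-in-differences from four group means."""
    treatment_change = treat_after - treat_before    # first difference, treatment
    control_change = control_after - control_before  # first difference, control
    return treatment_change - control_change


# Hypothetical average outcomes before and after the intervention.
effect = did_estimate(
    treat_before=100.0, treat_after=130.0,
    control_before=90.0, control_after=105.0,
)
```

In this example the treatment group rose by 30 and the control group by 15, so 15 of the increase is attributed to the intervention, assuming both groups would otherwise have followed parallel trends.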
First evidence (fingerprint found): Apply Bayes' Theorem → posterior becomes 60%.; Lesson 114 — Sequential Updating
First join: (LEFT): All customers appear, even those without orders (orders columns show NULL); Lesson 952 — Mixing Join Types
First meaningful action: When a user performs a core action that indicates true engagement; Lesson 1646 — Defining Cohort Start Events
First name: can reveal gender or ethnic background; Lesson 1883 — Protected Classes and Proxy Variables
First or last names: → ethnicity, religion, gender; Lesson 1889 — Proxy Variables and Redlining
First purchase: When a user converts from browser to buyer; Lesson 1646 — Defining Cohort Start Events
First Quartile (Q1): the 25th percentile; Lesson 59 — The Five-Number Summary and Box Plots Lesson 1383 — Understanding the Interquartile Range (IQR)
First touch: (30%) – The initial interaction that brings awareness; Lesson 1730 — W-Shaped Attribution Model
First touch matters: Someone discovered you somehow—that channel deserves significant credit; Lesson 1729 — Position-Based (U-Shaped) Attribution
First-Pass Yield: measures the percentage of units that pass quality checks without rework on the first attempt.; Lesson 1636 — Manufacturing Metrics: OEE, Yield, and Cycle Time
First-touch: Credit goes to the initial interaction; Lesson 1637 — What is Metric Attribution? Lesson 1724 — Limitations of Single-Touch Attribution
First-touch attribution: credits the initial discovery channel.; Lesson 1723 — Comparing Single-Touch Models Lesson 1725 — Implementing Single-Touch Attribution
Fisher's exact test: (for small samples); Lesson 419 — Assumptions and Minimum Expected Frequencies Lesson 434 — Fisher's Exact vs Chi-Squared: When to Use Each Lesson 437 — Applications: Clinical Trials and Market Research
Fisher's z-transformation: , which converts *r* into a value *z'* that *is* approximately normally distributed:; Lesson 503 — Confidence Intervals for Correlation Coefficients
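A sketch of how the transformation yields a confidence interval, assuming the standard form z' = atanh(r) with standard error 1/sqrt(n − 3), then a back-transformation with tanh. The function name and inputs are illustrative.

```python
import math
from statistics import NormalDist


def correlation_confidence_interval(r, n, confidence=0.95):
    """Confidence interval for a Pearson correlation via Fisher's z."""
    if n <= 3 or not -1 < r < 1:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = math.atanh(r)                # Fisher's z-transformation of r
    se = 1 / math.sqrt(n - 3)        # approximate standard error of z
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Build the interval on the z scale, then map back to the r scale.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


low, high = correlation_confidence_interval(r=0.6, n=50)
```

The resulting interval is not symmetric around *r*, which reflects the skewed sampling distribution of the correlation coefficient.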
Fit models: Kaplan-Meier for overall survival curves; Cox models to test effects of predictors like manufacturing date, component supplier, or usage intensity; Lesson 837 — Product Warranty and Failure Analysis
Fit the Holt-Winters model: with each combination; Lesson 772 — Holt-Winters Parameter Optimization
Fitness trackers: revealing military base locations through jogging patterns; Lesson 1922 — Surveillance and Secondary Data Uses
Fitted value (Ŷᵢ): = β₀ + β₁Xᵢ; Lesson 538 — What Are Fitted Values?
Fitted value (ŷ): "Here's what the model *predicts* based on the linear relationship"; Lesson 543 — Residuals as Unexplained Variation
fitted values: (group means) on the x-axis and **residuals** (observed minus predicted) on the y-axis.; Lesson 451 — Diagnostic Plots for ANOVA Lesson 538 — What Are Fitted Values?
Fix: Always use specific join conditions.; Lesson 949 — Avoiding Common Self-Join Pitfalls
Fixed aspect ratios: ensure equal spacing (crucial for maps); Lesson 1344 — Scales and Coordinate Systems
Fixed n: Known number of trials (products, patients, voters, visitors); Lesson 131 — Real-World Applications of Binomial Distributions
fixed number of trials: (n); Lesson 146 — When to Use Poisson vs Other Distributions Lesson 154 — Real-World Use Cases: Customer Behavior and Events
Fixed probability: The probability p stays the same each time; Lesson 123 — Bernoulli Trial Definition and Properties
Fixed random seeds: where randomness is involved; Lesson 1981 — What Makes a Report Reproducible?
Fixing inconsistencies: Standardizing formats (like dates, phone numbers, or categories) so everything follows the same pattern.; Lesson 12 — Data Cleaning and Preparation
Flag: Create indicator columns for "was missing"; Lesson 1207 — Missing Data Assessment and Strategy
Flag legitimate values: as outliers in skewed data; Lesson 1390 — Assumptions of Grubbs' Test
Flat (uniform) priors: over a reasonable range; Lesson 1565 — Prior Distributions for Normal Means
Flat rolling mean: = constant average (good!; Lesson 715 — Visual Tests for Stationarity
Flat rolling std: = constant variability (good!; Lesson 715 — Visual Tests for Stationarity
flexibility: to capture these non-linear patterns without abandoning regression entirely.; Lesson 657 — What Are Polynomial Features? Lesson 662 — Polynomial Features vs Splines Lesson 1816 — What is ELT? Extract, Load, Transform Explained
Flexible: Combines the strengths of multiple sampling techniques you've already learned; Lesson 238 — Multistage Sampling Lesson 1557 — The Beta-Binomial Model
Flexible ranges: allow newer patch or minor versions, enabling automatic security fixes and bug patches without manual intervention.; Lesson 2050 — Pinning Versions vs Flexible Ranges
Flipped coordinates: swap x and y axes for horizontal layouts; Lesson 1344 — Scales and Coordinate Systems
Flipping two coins: Lesson 78 — Events as Subsets of the Sample Space
FLOAT: or **DECIMAL**: Numbers with decimals (e.; Lesson 846 — Tables, Schemas, and Data Types
Focus: your conclusions on these specific relationships; Lesson 428 — Post-Hoc Analysis and Residuals Lesson 1215 — Characteristics of Explanatory Visualizations
Focus indicators: are critical: users must see *where* they are in the interface at all times.; Lesson 1253 — Interactive Accessibility: Keyboard Navigation
Focus on large-data scenarios: With abundant data, the likelihood dominates and priors matter less (robustness naturally increases); Lesson 1572 — Sensitivity Analysis and Prior Robustness
Focus on the slope: The slope still tells you about the relationship *within your data range*; Lesson 526 — When the Intercept Has No Meaning
Folium: and **Plotly** transform your geographic data into engaging web visualizations.; Lesson 1313 — Interactive Maps with Folium and Plotly
Follow ethical guidelines: established by your organization or profession; Lesson 35 — Conflicts of Interest and Independence
Follow multiple channels: academic papers for cutting-edge research, industry blogs for practical applications, documentation for tool updates, and community forums for real-world problem-solving patterns.; Lesson 2143 — Continuous Learning and Skill Development
Follow up with nonrespondents: Send reminders, offer incentives, or use different contact methods to reduce nonresponse bias.; Lesson 250 — Strategies for Bias Detection and Mitigation
Follow-up analysis: needed to refine the approach; Lesson 1970 — Recommendations and Next Steps
Follow-ups: If confirmed, design experiment to optimize marketing for that segment; Lesson 1204 — From Hypothesis to Analysis Plan
Font face: Make titles bold with `face = "bold"` or italicize annotations; Lesson 1364 — Customizing Text Elements
Font family: The typeface itself (e.; Lesson 1297 — Font Properties and Text Styling Lesson 1364 — Customizing Text Elements
Font size: Measured in points; larger for titles, smaller for tick labels; Lesson 1297 — Font Properties and Text Styling Lesson 1364 — Customizing Text Elements
Font sizing: At publication dimensions, default fonts become tiny.; Lesson 1369 — Publication-Ready Plot Styling
Font style: 'normal', 'italic', or 'oblique'; Lesson 1297 — Font Properties and Text Styling
Font weight: 'normal', 'bold', 'light', or numeric values (100-900); Lesson 1297 — Font Properties and Text Styling
Foot traffic: customers entering stores—acts as a leading indicator.; Lesson 1634 — Retail Metrics: Same-Store Sales and Inventory Turnover
For `random` module: Lesson 2057 — Setting Seeds in Python and R
For a one-sided test: at α = 0.; Lesson 326 — Critical Values
For a two-sided test: at α = 0.; Lesson 326 — Critical Values
For comparisons: between categories → use bar charts or column charts; Lesson 1230 — Choosing the Right Chart Type
For distributions: of continuous data → use histograms or box plots; Lesson 1230 — Choosing the Right Chart Type
For each p-value: , calculate its threshold: (i/m) × α, where i is its rank, m is the total number of tests, and α is your target FDR level (e.; Lesson 1506 — Benjamini-Hochberg Procedure
For executives: Lesson 1954 — Tailoring Visualizations to Audience Needs
For expensive aggregations: approximate when possible, or use `.; Lesson 1796 — Limitations and Differences from Pandas
For floats: If precision beyond ~7 significant digits isn't critical for your analysis, `float32` cuts memory in half with minimal impact on most calculations.; Lesson 1799 — Optimal Data Types and Downcasting
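
The memory and precision trade-off can be checked without pandas. The sketch below is an assumption-laden stand-in: it uses the standard-library `array` and `struct` modules (type codes `"d"` for 64-bit and `"f"` for 32-bit floats) rather than pandas dtypes, and the sample values are made up.

```python
from array import array
import struct

# Hypothetical column of 1,000 floating-point values.
values = [0.1 * i for i in range(1000)]

as_float64 = array("d", values)  # 8 bytes per value
as_float32 = array("f", values)  # 4 bytes per value

bytes64 = as_float64.itemsize * len(as_float64)
bytes32 = as_float32.itemsize * len(as_float32)
print(bytes64, bytes32)

# Round-trip one value through 32 bits to see how much precision is lost.
original = 0.123456789012345
roundtrip = struct.unpack("f", struct.pack("f", original))[0]
error = abs(roundtrip - original)
print(original, roundtrip, error)
```

The round-tripped value agrees with the original for roughly the first seven significant digits and diverges after that, which is the limit the entry refers to.
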
For index operations: use `.; Lesson 1796 — Limitations and Differences from Pandas
For lag 1: Lesson 721 — Computing ACF Values
For lag 2: Lesson 721 — Computing ACF Values
For nested models: Use the **Partial F-Test** (which you learned in lesson 623) to formally test whether the extra predictors significantly improve the model; Lesson 626 — Nested vs Non-Nested Models
For non-nested models: Compare using **Adjusted R-Squared**, **AIC**, or **BIC** — but you cannot use the Partial F-Test; Lesson 626 — Nested vs Non-Nested Models
For part-to-whole composition: → use stacked bar charts (or reluctantly, pie charts); Lesson 1230 — Choosing the Right Chart Type
For performance: `UNION ALL` skips the expensive duplicate-checking step, making it significantly faster on large datasets; Lesson 1000 — UNION ALL: Preserving Duplicates
For positive values: It applies a transformation similar to Box-Cox; Lesson 215 — Yeo-Johnson Transformation
For relationships: between two numeric variables → use scatter plots; Lesson 1230 — Choosing the Right Chart Type
For sorting: minimize sorts or do them after filtering to smaller datasets.; Lesson 1796 — Limitations and Differences from Pandas
For strings: The `category` dtype stores each unique value once plus integer codes—massive savings when cardinality is low relative to row count.; Lesson 1799 — Optimal Data Types and Downcasting
For technical stakeholders: Lesson 1954 — Tailoring Visualizations to Audience Needs
For the intercept (b₀): Lesson 518 — Deriving the Least Squares Estimators
For the slope (b₁): Lesson 518 — Deriving the Least Squares Estimators
For three variables: → use bubble charts or heatmaps; Lesson 1230 — Choosing the Right Chart Type
For trends over time: → use line charts; Lesson 1230 — Choosing the Right Chart Type
Forecast accurately: Build models on the stable, adjusted series; Lesson 748 — Seasonally Adjusted Data
Forecast ahead: Generate predictions for the held-out period; Lesson 790 — Out-of-Sample Forecast Evaluation
Forecast future churn: more accurately using cohort-specific curves; Lesson 1672 — Cohort-Based Churn Analysis
Forecast future values: based on historical direction; Lesson 706 — Trend: Long-Term Direction
Forecast(t): is the previous forecast; Lesson 758 — Simple Exponential Smoothing (SES)
forecasting: because they only use information available *at that moment*.; Lesson 753 — Centered vs Trailing Moving Averages Lesson 1571 — Posterior Predictive Distribution for New Data
Foreign Key: A column that references the primary key in another table (like `customer_id` in an orders table); Lesson 843 — Relational Database Concepts Lesson 921 — Primary and Foreign Key Relationships Lesson 1051 — Introduction to Foreign Keys
foreign keys: (concepts you've already learned).; Lesson 1061 — Introduction to Normalization Lesson 1148 — Handling Multiple Types in One Table Lesson 1808 — Star Schema and Fact Tables
Form: Is the relationship linear (straight-line pattern) or curved?; Lesson 480 — Scatterplots and Visual Assessment
Form a Hypothesis: Lesson 25 — The Scientific Method in Data Science
Formal definition: A **sampling distribution** is the probability distribution of a statistic (like the mean, median, or proportion) computed from *all possible samples* of a fixed size drawn from the same population.; Lesson 251 — What is a Sampling Distribution?
formal hypothesis test: that helps you determine whether your data is normally distributed.; Lesson 205 — Shapiro-Wilk Test Lesson 1389 — What is Grubbs' Test?
Formal reviews: Monthly presentations for key milestones and decision points; Lesson 2104 — Communication Cadence and Updates
Formal test second: Does it confirm major concerns, or is it just picking up minor noise?; Lesson 570 — Q-Q Plots vs Formal Normality Tests: When Visual Checks Matter
Format errors: Malformed email addresses or phone numbers; Lesson 1109 — Input Validation and Defense in Depth
Format expectations: You assume dates come in one format, numeric codes have certain meanings, or null values are handled consistently—until they're not.; Lesson 2133 — Undocumented Data Dependencies
Format inconsistencies: Dates in different formats, mixed capitalizations; Lesson 1150 — What is Data Validation?
Format selection: Vector formats (PDF, SVG) scale perfectly for print; PNG works for web at 300+ DPI.; Lesson 1369 — Publication-Ready Plot Styling
Formula: Lesson 1383 — Understanding the Interquartile Range (IQR) Lesson 1451 — Estimating Treatment Effects from Matched Samples Lesson 1627 — E-commerce Metrics: AOV, Cart Abandonment, and RPV Lesson 1890 — Measuring Disparate Impact
Formula connection: Lesson 182 — Special Cases: Exponential and Chi-Squared
Formula structure: Lesson 284 — Confidence Intervals for the Difference Between Two Means
Foundation (base layer): Your data and aesthetic mappings using `ggplot()`; Lesson 1347 — Understanding Layers in ggplot2
Foundation for inference: Understanding the sampling distribution lets us say things like "we're 95% confident the true population mean is between X and Y"—which we'll explore in future lessons.; Lesson 251 — What is a Sampling Distribution?
Four columns: → you see where this goes!; Lesson 911 — Performance Considerations with Multiple Groups
Fragmentation: occurs when data pages become scattered physically on disk due to inserts, updates, and deletes.; Lesson 1086 — Index Maintenance and Monitoring
frame: the subset of rows within your partition that the function operates on.; Lesson 1015 — ROWS vs RANGE Frame Specifications Lesson 1016 — Cumulative Sums and Running Totals
Frame count or data: how many frames to generate; Lesson 1327 — Creating Animations with FuncAnimation
Framing the technical problem: Is this supervised learning?; Lesson 2085 — Stage 1: Problem Definition and Scoping
Frequency: How often do they purchase?; Lesson 1703 — RFM Analysis: Recency, Frequency, Monetary Value
Frequency order (descending): Best for spotting the most/least common categories at a glance—this is usually recommended for EDA; Lesson 1178 — Bar Charts for Categorical Data
Frequentist: "If we ran this test repeatedly, 95% of intervals constructed this way would capture the true rate.; Lesson 1564 — Comparing Bayesian and Frequentist Proportion Inference
Frequentist A/B testing: treats the true conversion rate as a fixed (but unknown) parameter.; Lesson 1580 — Bayesian vs Frequentist A/B Testing
Frequentist interpretation: treats probability as a **long-run frequency**.; Lesson 1540 — Comparing Bayesian and Frequentist Interpretations
Frequently accessed lookup values: Product categories, customer names, or status labels that rarely change but are queried constantly.; Lesson 1074 — Duplicating Data Across Tables
Freshness: Maximum age of data (e.; Lesson 1869 — Data Quality Metrics and SLAs
Friction zone: 2-10 GB datasets may work but cause slowdowns and memory pressure; Lesson 1783 — Data Size Thresholds: When Pandas Isn't Enough
Friedman test: .; Lesson 474 — Friedman Test: Non-Parametric Repeated Measures ANOVA
FROM: SQL identifies which table(s) to use; Lesson 896 — GROUP BY Execution Order Lesson 912 — Fundamental Difference: Filter Timing
From domain expertise: If historical data suggests 100 conversions from 500 trials, you could use α = 100, β = 400 as an informative prior that reflects actual experience.; Lesson 1558 — Choosing Informative Priors for Proportions
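
To make the prior in this entry concrete, the sketch below computes the mean of a Beta(100, 400) prior and then applies the standard Beta-Binomial conjugate update. The new observations (30 conversions in 100 trials) are hypothetical, and the update rule comes from Beta-Binomial conjugacy rather than from this entry.

```python
# Informative prior from the entry: 100 conversions in 500 historical trials.
prior_alpha = 100
prior_beta = 400
prior_mean = prior_alpha / (prior_alpha + prior_beta)

# Hypothetical new experiment: 30 conversions out of 100 trials.
successes = 30
failures = 70

# Conjugate update: add successes to alpha and failures to beta.
post_alpha = prior_alpha + successes
post_beta = prior_beta + failures
post_mean = post_alpha / (post_alpha + post_beta)

print(prior_mean, post_mean)
```

The posterior mean moves only slightly toward the new 30% rate because the prior carries the weight of 500 earlier trials.
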
From percentile to z-score: Work backwards.; Lesson 199 — Finding Percentiles with Z-Scores
FROM subqueries: are embedded inline:; Lesson 974 — When to Use FROM Subqueries vs CTEs
FROM table1: Your starting table (often called the "left" table); Lesson 919 — Basic INNER JOIN Syntax
From z-score to percentile: Look up your z-score in a z-table.; Lesson 199 — Finding Percentiles with Z-Scores
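
Converting between z-scores and percentiles can also be done in code instead of with a printed z-table. This is a small sketch using `statistics.NormalDist` from the Python standard library; the specific z-score and percentile are only example inputs.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# z-score to percentile: the cumulative probability below z.
percentile = std_normal.cdf(1.96)

# Percentile to z-score: work backwards with the inverse CDF.
z_score = std_normal.inv_cdf(0.95)

print(round(percentile, 4), round(z_score, 4))
```
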
Full model: Adds more predictors (e.; Lesson 623 — Partial F-Tests for Nested Models Lesson 654 — Testing Interaction Significance Lesson 684 — Likelihood Ratio Tests for Model Comparison
FULL OUTER JOIN: (also called FULL JOIN) returns all rows from both tables, regardless of whether there's a matching row in the other table.; Lesson 935 — What is a FULL OUTER JOIN? Lesson 936 — FULL OUTER JOIN Syntax
Funnel analysis: is a method that visualizes and measures how users move through a defined sequence of steps (a "funnel") toward completing a desired action—like making a purchase, signing up, or subscribing.; Lesson 1678 — What is Funnel Analysis?
Funnel or cone shape: Variance increases or decreases with predictions; Lesson 560 — Scale-Location Plot (Spread-Location Plot)
Funnel shapes: Variance changes as predictions increase (violates homoscedasticity); Lesson 556 — What Are Residuals and Why Plot Them?
Future-proofing: You can resurrect old projects years later; Lesson 2047 — What is Dependency Management?
Fuzzy: A job training program *offered* to all unemployed workers over age 55.; Lesson 1461 — Sharp vs Fuzzy RDD
FWER: = 1 - (1 - α)^m; Lesson 1502 — Family-Wise Error Rate (FWER)
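
A quick calculation shows how fast this formula grows with the number of tests. The sketch assumes the m tests are independent, which is the setting in which the formula holds exactly; the function name `family_wise_error_rate` is hypothetical.

```python
def family_wise_error_rate(alpha, m):
    """Probability of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m


for m in (1, 5, 10, 20):
    print(m, round(family_wise_error_rate(0.05, m), 4))
```

At α = 0.05, twenty independent tests already carry a better-than-even chance of at least one false positive.
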

G

Games-Howell test: comes in.; Lesson 461 — Games-Howell Test for Unequal Variances
Gaming the System: A credit scoring model could be reverse-engineered by fraudsters who game input features to appear creditworthy.; Lesson 1920 — Anticipating Misuse of Data Products
Gamma: (for positive continuous values); Lesson 664 — What is the Exponential Family of Distributions? Lesson 669 — The Dispersion Parameter φ Lesson 676 — Canonical vs Non-Canonical Links Lesson 769 — Smoothing Parameters: Alpha, Beta, Gamma
Gamma distribution: is a continuous probability distribution that describes positive real numbers (values greater than zero).; Lesson 181 — Gamma Distribution: Shape and Rate Parameters Lesson 1552 — Gamma-Poisson Conjugacy
Gamma-Poisson: conjugacy:; Lesson 1554 — Updating Conjugate Priors with Data
Gap analysis: Find when values changed from the prior period; Lesson 1024 — LAG Function: Accessing Previous Row Values
gaps: (empty bins) that might signal data collection issues or natural separations, and **outliers** (isolated bars far from the main cluster).; Lesson 1175 — Histograms for Distribution Shape Lesson 1220 — Histograms for Continuous Distributions
Gaussian mechanism: adds noise from a normal (Gaussian) distribution.; Lesson 1899 — Adding Noise for Privacy
GDPR principles: , your organization's **conflicts of interest** policy, or industry standards rather than personal objections.; Lesson 1931 — When to Push Back on Requests
Gender: (Male vs Female) on recovery time.; Lesson 653 — Interpreting Categorical × Categorical Interactions
Gender and sex: Lesson 1888 — Protected Classes and Sensitive Attributes
General Multiplication Rule: , and it works for *any* two events—dependent or independent.; Lesson 88 — General Multiplication Rule
Generalization: replaces specific values with broader categories: exact ages become age ranges (25-30), precise locations become regions, exact salaries become brackets.; Lesson 1895 — Data Anonymization Basics Lesson 1896 — K-Anonymity
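
As an illustration of generalizing exact ages into ranges, here is a minimal sketch. The helper `generalize_age` and the choice of non-overlapping five-year buckets are assumptions made for the example; a real anonymization scheme would choose bucket boundaries based on its privacy requirements.

```python
def generalize_age(age, width=5):
    """Replace an exact age with a non-overlapping range of the given width."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


exact_ages = [27, 30, 44, 61]
print([generalize_age(age) for age in exact_ages])
```
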
Generalized Linear Models: that handle non-normal outcomes.; Lesson 664 — What is the Exponential Family of Distributions?
Generalized Linear Models (GLMs): , which extend regression to non-normal outcomes.; Lesson 668 — Common Distributions as Exponential Family Members
Generate a randomization: using your chosen method; Lesson 1492 — Rerandomization and Practical Implementation
Generate replicated data: For each posterior sample of parameters, simulate a new dataset; Lesson 1596 — Posterior Predictive Checks and Model Comparison
Generate scenarios: "What if budget increases 20%?; Lesson 1742 — Budget Optimization Using MMM
Generate the file automatically: from your current environment; Lesson 1987 — Environment and Dependency Management
Generated outputs: model binaries (`.; Lesson 1996 — The .gitignore File
Genetics: You expect a 9:3:3:1 ratio of phenotypes.; Lesson 414 — Introduction to Chi-Squared Goodness of Fit Test
GeoDataFrame: like a pandas DataFrame, but with a special `geometry` column containing the actual shapes.; Lesson 1311 — Working with Shapefiles and GeoJSON
Geographic limitations: Missing homeless populations or remote areas; Lesson 249 — Coverage Error and Undercoverage
Geographic regions: (`country`, `region`) — when analysis is region-specific; Lesson 1812 — Partitioning and Clustering Strategies
Geography: Country, region, timezone; Lesson 1682 — Segmenting Funnels by User Attributes
GeoJSON: is a newer, web-friendly format built on JSON.; Lesson 1311 — Working with Shapefiles and GeoJSON
Geometric: "How many random calls until I reach someone who owns an electric vehicle?; Lesson 138 — Real-World Applications: Quality Control and Surveys Lesson 154 — Real-World Use Cases: Customer Behavior and Events
geometric distribution: tells you the probability of waiting exactly *k* tries before your first success happens.; Lesson 132 — The Geometric Distribution: Waiting for the First Success Lesson 137 — Geometric vs Negative Binomial: Key Differences Lesson 138 — Real-World Applications: Quality Control and Surveys Lesson 154 — Real-World Use Cases: Customer Behavior and Events
Geometric objects (geoms): The actual visual marks—points, lines, bars, polygons—that represent your data; Lesson 1339 — What is the Grammar of Graphics? Lesson 1342 — Geometric Objects (geoms)
Geometries (geom): The visual marks representing data (points, lines, bars, boxes); Lesson 1340 — The Seven Layers of Grammar
ggplot2: ships with a distinctive gray background with white gridlines—a deliberate choice to reduce visual clutter while maintaining reference lines.; Lesson 1371 — Default Aesthetics and Design Choices Lesson 1373 — Statistical Transformations: Built-in vs Manual
Ghost ads: and **PSA (Public Service Announcement) tests** are incrementality testing techniques where you show *neutral content* instead of your real ads to a control group.; Lesson 1747 — Ghost Ads and PSA Tests
Git: tracks changes to code, including transformation scripts.; Lesson 1164 — Tools for Lineage Tracking Lesson 1990 — What is Version Control and Why Git?
Global F-test: Asks "Does *at least one* predictor help explain the outcome?; Lesson 622 — Relationship Between F-Test and t-Tests
Go deeper: when you need to diagnose problems in a specific area (e.; Lesson 1623 — Depth vs Breadth in Metric Trees
Go wider: when you need comprehensive coverage (e.; Lesson 1623 — Depth vs Breadth in Metric Trees
Goal: fit the line `y = a + b*time`, then subtract it; Lesson 738 — Linear Detrending Lesson 1400 — Control Limits vs Specification Limits
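
The fit-then-subtract step can be sketched with ordinary least squares in plain Python. This is an illustrative version only: it assumes equally spaced time steps 0, 1, 2, ..., the function name `linear_detrend` is hypothetical, and the sample series is made up.

```python
def linear_detrend(y):
    """Fit y = a + b*time by least squares and return (a, b, residuals)."""
    n = len(y)
    time = list(range(n))  # assumed equally spaced time index
    t_mean = sum(time) / n
    y_mean = sum(y) / n

    # Least-squares slope and intercept.
    b = sum((t - t_mean) * (v - y_mean) for t, v in zip(time, y)) / sum(
        (t - t_mean) ** 2 for t in time
    )
    a = y_mean - b * t_mean

    # Subtract the fitted line to leave the detrended series.
    residuals = [v - (a + b * t) for t, v in zip(time, y)]
    return a, b, residuals


# Hypothetical series: upward trend plus an alternating wiggle.
series = [3.0 + 2.0 * t + (1.0 if t % 2 == 0 else -1.0) for t in range(10)]
a, b, residuals = linear_detrend(series)
print(round(a, 3), round(b, 3))
```
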
Goals or targets: "We've achieved 85% of our annual target"; Lesson 1962 — Contextualizing Numbers
gold standard: for creating representative samples.; Lesson 234 — Simple Random Sampling Lesson 1442 — Limitations and Practical Constraints
Good: Random cloud of points scattered evenly around the horizontal line at y=0, with consistent spread; Lesson 557 — The Residuals vs Fitted Values Plot Lesson 562 — Index Plots and Time-Ordered Residuals Lesson 1857 — Logging Best Practices
Good (pyramid): "Reducing churn requires increasing early-user engagement.; Lesson 1942 — The Pyramid Principle: Starting with the Conclusion
Good example: "Bar chart comparing Q4 sales across five regions.; Lesson 1250 — Text Alternatives and Screen Reader Compatibility
Good hypothesis: "Changing the checkout button from blue to green will increase conversion rate by at least 2 percentage points.; Lesson 1479 — Formulating Hypotheses
Good independence: Lesson 448 — Independence of Observations
Goodhart's Law: *"When a measure becomes a target, it ceases to be a good measure.; Lesson 1521 — Risks of Optimizing for Surrogates
Goodness-of-fit: How well the model explains the data (measured via likelihood); Lesson 629 — Akaike Information Criterion (AIC) Lesson 700 — AIC and BIC for Model Selection
Goodness-of-fit tests: Formal tests compare observed versus expected frequencies.; Lesson 693 — Overdispersion in Count Data
Governance and Approval: Lesson 1643 — Building Attribution Frameworks
GPL: Requires derivative works to also be open source (more restrictive); Lesson 2082 — Choosing a License for Data Science Projects
graceful degradation: the pipeline remains healthy and productive while you handle edge cases systematically.; Lesson 1852 — Dead Letter Queues Lesson 1854 — Testing Error Handling
Grafana: , and **Datadog** automate this process, offering dashboards that show pipeline status at a glance and trigger alerts when thresholds are breached.; Lesson 1861 — Monitoring Tools and Dashboards
Grammar of Graphics: is a systematic approach to creating visualizations by combining independent building blocks, rather than selecting from a fixed menu of chart types.; Lesson 1339 — What is the Grammar of Graphics?
Graph: Visual representation of task relationships; Lesson 1833 — Introduction to Apache Airflow
Graph algorithms: Computing connected components or PageRank involves recursive traversals; Lesson 1784 — Computation Complexity: Beyond Data Size
Great Expectations: is the leading Python library for this purpose.; Lesson 1158 — Automated Validation Frameworks Lesson 1164 — Tools for Lineage Tracking
Greater sensitivity to effects: With less noise in your estimate, even small differences between your null hypothesis and reality become detectable.; Lesson 340 — Power and Sample Size Relationship
greater than: a certain value using cumulative distribution functions.; Lesson 143 — Cumulative Poisson Probabilities Lesson 857 — Comparison Operators: Greater and Less Than
Greenwood's formula: gives us the standard error (SE) of the Kaplan-Meier estimator at any time t.; Lesson 814 — Standard Errors and Confidence Intervals
GridSpec: lets you treat your figure like a flexible grid where subplots can span multiple cells, much like merging cells in a spreadsheet.; Lesson 1278 — GridSpec for Complex Layouts
Gross Profit Margin: Revenue minus cost of goods sold; Lesson 1516 — Business Metrics: Definition and Examples
Group A: Values tightly clustered around 50 (median = 50); Lesson 394 — Interpreting Rank-Based Tests: Medians vs Distributions
Group B: Values spread widely from 20 to 80 (median = 50); Lesson 394 — Interpreting Rank-Based Tests: Medians vs Distributions
GROUP BY: to create rich summaries of grouped data.; Lesson 892 — GROUP BY with Different Aggregate Functions Lesson 896 — GROUP BY Execution Order Lesson 903 — Combining WHERE and HAVING Lesson 912 — Fundamental Difference: Filter Timing
GROUP BY groups: The remaining rows are organized into groups; Lesson 915 — Combining WHERE and HAVING
Group customers into cohorts: by their acquisition date (e.; Lesson 1664 — Cohort-Based LTV Calculation Lesson 1758 — Cohort-Based Payback Analysis
Group your data: by the categorical variable (e.; Lesson 1185 — Grouped Summary Statistics
Grouped (side-by-side) bars: excel when you want to compare specific values across categories.; Lesson 1188 — Stacked and Grouped Bar Charts
Grouped analyses: compare correlation coefficients or slopes between subgroups; Lesson 1195 — Interaction Effects Between Variables
Grouped bar charts: place bars side-by-side for easy direct comparison between groups.; Lesson 1188 — Stacked and Grouped Bar Charts
grouped bars: when precise, side-by-side comparison of subcategories is your priority.; Lesson 1226 — Stacked and Grouped Bar Charts Lesson 1266 — Bar Plots: Categorical Comparisons
Growth stage startups: often tolerate higher CAC and longer payback because they're prioritizing market capture.; Lesson 1759 — Optimizing ROAS, CAC, and Payback Together
Grubbs' test tables: (organized by sample size and α level) or calculate them using formulas involving the t-distribution.; Lesson 1392 — Critical Values and Significance Testing
guardrails: are defensive metrics designed to catch these problems before they damage your business.; Lesson 1624 — Counter-Metrics and Guardrails Lesson 1925 — Mitigation Strategies and Responsible Disclosure
Guide onboarding: Focus new users on high-value features first; Lesson 1696 — Feature Adoption and Usage Frequency
Guidelines: Lesson 1623 — Depth vs Breadth in Metric Trees

H

Hₐ: At least one method produces a different average score; Lesson 439 — ANOVA Hypotheses and Research Questions
H statistic: that measures how much the rank sums vary between groups.; Lesson 471 — Kruskal-Wallis H Test: The Non-Parametric One-Way ANOVA
H₀: μ = some specific value; Lesson 310 — Writing Hypotheses for Different Parameters Lesson 347 — One-Tailed Tests: Testing for a Specific Direction Lesson 354 — Setting Up Hypotheses for One-Sample t-Test Lesson 358 — Worked Example: One-Sample t-Test in Practice Lesson 363 — Testing Equality of Variances Lesson 415 — Setting Up Hypotheses for Goodness of Fit Lesson 439 — ANOVA Hypotheses and Research Questions Lesson 447 — Conducting One-Way ANOVA in Practice (+1 more)
H₀ (Null Hypothesis): All groups have equal variances; Lesson 380 — Testing Equal Variances: Levene's and Bartlett's Tests
H₀: p = 0.10: (the claim is correct); Lesson 401 — Setting Up Hypotheses for Proportions
H₀: μ_d = 0: Lesson 373 — Hypotheses for Paired t-Tests
H₁: μ ≠ some value (two-sided) *or* μ > value *or* μ < value (one-sided); Lesson 310 — Writing Hypotheses for Different Parameters Lesson 347 — One-Tailed Tests: Testing for a Specific Direction Lesson 354 — Setting Up Hypotheses for One-Sample t-Test Lesson 358 — Worked Example: One-Sample t-Test in Practice Lesson 363 — Testing Equality of Variances Lesson 415 — Setting Up Hypotheses for Goodness of Fit Lesson 447 — Conducting One-Way ANOVA in Practice Lesson 1511 — Sequential Probability Ratio Test (SPRT)
H₁ (Alternative Hypothesis): At least one group has a different variance; Lesson 380 — Testing Equal Variances: Levene's and Bartlett's Tests
H₁: p > 0.10: (nausea rate exceeds the claim); Lesson 401 — Setting Up Hypotheses for Proportions
H₁: μ_d > 0: (training increases scores on average); Lesson 373 — Hypotheses for Paired t-Tests
Hadoop MapReduce: was the original distributed processing engine—breaking jobs into "map" (process chunks independently) and "reduce" (combine results) phases.; Lesson 1764 — The Big Data Technology Landscape
Halt (Fail Fast): Lesson 1866 — Handling Failed Quality Checks
Hamiltonian Monte Carlo (HMC): borrows physics concepts to guide sampling intelligently.; Lesson 1593 — Hamiltonian Monte Carlo and NUTS
Handle edge cases explicitly: New scenario?; Lesson 2128 — Data Distribution Shifts Frequently
Handle staggered timing: Each unit's treatment effect is estimated relative to its own adoption date; Lesson 1457 — Multiple Time Periods and Staggered Adoption
Handling missing values: Deciding what to do when data points are absent—should you fill them in, remove those rows, or use another strategy?; Lesson 12 — Data Cleaning and Preparation
Hansen J-test: Lesson 1467 — Testing Instrument Strength and Validity
Hard to enforce rules: You can't easily prevent invalid combinations (like a refund with no related sale); Lesson 1148 — Handling Multiple Types in One Table
Harder interpretation: when you're drowning in similar variables; Lesson 1197 — Identifying Variable Importance and Redundancy
Harder to judge accurately: this is why pie charts are often criticized.; Lesson 1231 — Channels of Visual Encoding
HARKing: (Hypothesizing After Results are Known), where you retrofit explanations to unexpected patterns.; Lesson 1485 — Documentation and Pre-Registration
Hash joins: excel with large tables that fit in memory—they're fast but require space to build the hash structure.; Lesson 957 — Join Strategies: Nested Loop, Hash, Merge
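
The build-then-probe idea behind a hash join can be sketched in Python with a dictionary standing in for the in-memory hash structure. This is a conceptual illustration of an inner equi-join, not how any particular database engine implements it; the function name `hash_join` and the sample tables are hypothetical.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, key):
    """Inner-join two lists of dicts on `key` using a hash table."""
    # Build phase: hash one table (ideally the smaller one) by the join key.
    buckets = defaultdict(list)
    for row in build_rows:
        buckets[row[key]].append(row)

    # Probe phase: scan the other table once and look up matches.
    joined = []
    for row in probe_rows:
        for match in buckets.get(row[key], []):
            joined.append({**match, **row})
    return joined


customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
    {"customer_id": 3, "name": "Linus"},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 1},
    {"order_id": 12, "customer_id": 3},
    {"order_id": 13, "customer_id": 4},
]
result = hash_join(customers, orders, "customer_id")
print(result)
```

The dictionary is the part that has to fit in memory, which is why building on the smaller table is usually the cheaper choice.
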
HAVING: Filters groups after aggregation; Lesson 898 — HAVING Clause Fundamentals Lesson 899 — HAVING vs WHERE: Key Differences Lesson 903 — Combining WHERE and HAVING Lesson 912 — Fundamental Difference: Filter Timing
HAVING filters last: It removes entire groups based on their aggregated values; Lesson 915 — Combining WHERE and HAVING
HDI: always includes the most probable values—the densest region.; Lesson 1577 — When HDI and Equal-Tailed Intervals Differ Lesson 1579 — Practical Computation of Credible Intervals
HEAD: points to your current branch; Lesson 1992 — Creating a Repository with git init
Header row: The first line often contains column names (`name`, `age`, `city`); Lesson 1125 — CSV Files: Structure and Common Issues
Health monitoring: Disease onset or treatment effectiveness; Lesson 1412 — What is Change-Point Detection?
Health standards: Is the average blood pressure of patients in a clinic different from the national average of 120 mmHg?; Lesson 351 — When to Use a One-Sample t-Test
Health studies: Volunteers may be more health-conscious than average; Lesson 246 — Volunteer and Self-Selection Bias
Healthcare: Predicting disease outbreaks or personalizing treatment plans; Lesson 6 — Common Data Science Applications
Heatmaps: solve this by color-coding correlation strengths:; Lesson 510 — Correlation Matrices: Construction and Display Lesson 1192 — Correlation Matrices and Heatmaps
heavier tails: meaning more probability in the extremes.; Lesson 268 — Critical Values and the t-Distribution Lesson 352 — The t-Distribution and Degrees of Freedom
Heavy grid lines: that compete with your data points; Lesson 1963 — Removing Chartjunk
Heavy gridlines: Use subtle, minimal guides only when necessary; Lesson 1237 — Chart Junk and Data-Ink Ratio
heavy tails: , meaning a small number of extreme values dominate the total.; Lesson 191 — Pareto Principle and the 80/20 Rule Lesson 567 — Common Q-Q Plot Patterns: Heavy Tails and Light Tails
Heavy-tailed distributions: (with extreme outliers): even larger samples required; Lesson 220 — Sample Size Requirements for the CLT Lesson 1379 — Assumptions and Limitations
Hedging: protects against channel-specific risks (platform bans, seasonal dips); Lesson 1716 — Channel Mix and Portfolio Thinking
Height: Taller bars (positive or negative) indicate stronger correlation; Lesson 722 — ACF Plots and Interpretation
Height and Weight: If you're predicting adult weight from height, the intercept represents the predicted weight when height = 0 inches.; Lesson 526 — When the Intercept Has No Meaning
Heroku: is a general-purpose cloud platform that works with both Streamlit and Dash.; Lesson 1338 — Deployment and Sharing Dashboards
heteroscedasticity: (non-constant variance) — a violation of this assumption.; Lesson 549 — Homoscedasticity: Constant Variance of Residuals Lesson 559 — Detecting Heteroscedasticity (Non-Constant Variance) Lesson 591 — When and Why to Transform Variables
Hidden randomness: Random processes without fixed seeds produce varying results (Random Seeds); Lesson 30 — The Reproducibility Crisis and Solutions
Hidden subgroups: Averaging diverse populations together (Simpson's Paradox territory); Lesson 1245 — Misleading Aggregations and Binning
Hide cyclicality: Show only the upswing of a seasonal pattern while ignoring the inevitable downturn; Lesson 1241 — Cherry-Picking Time Ranges
Hide technical depth strategically: Methodology, statistical tests, and data quality checks belong in appendices or backup slides (lesson 1949).; Lesson 1965 — Progressive Disclosure Techniques
Hiding data: Adding a dense geom (like `geom_ribbon()`) last can hide points underneath; Lesson 1355 — Layer Order and Plot Composition
Hiding trade-offs: Your values might mask important considerations others prioritize differently; Lesson 1927 — Separating Analysis from Advocacy
Hierarchical: Clear parent-child or directed flow relationships; Lesson 1318 — Network Layout Algorithms
Hierarchical relationships: If you have `city`, `state`, and `country` columns, does "Boston" really belong to "Texas" or "Canada"?; Lesson 1155 — Consistency Checks Across Fields
High correlations between predictors: (e.; Lesson 513 — Applications: Feature Selection and Multicollinearity
High influence: = actually changes the fitted model (unusual X *and* unusual Y given that X); Lesson 574 — Influence: Impact on Fitted Model
high leverage: not because their score is unusual, but because their study time is far from the typical range.; Lesson 572 — Leverage: Distance in X-Space Lesson 574 — Influence: Impact on Fitted Model
High missingness: (>50%): Variable may be unusable; Lesson 1179 — Identifying Missing Values Patterns
High noise-to-signal ratio: When errors dominate true patterns, models learn randomness instead of relationships.; Lesson 2124 — Insufficient or Low-Quality Data
High p-value (> 0.05): Good news!; Lesson 787 — Ljung-Box Test for Residual Autocorrelation
High p-value (≥ α): The observed frequencies are reasonably close to expected frequencies.; Lesson 420 — Interpreting Chi-Squared Test Results
High power: to detect non-normality makes it a go-to choice.; Lesson 378 — Testing Normality: Statistical Tests
High-resolution export: in required format; Lesson 1369 — Publication-Ready Plot Styling
High-risk periods: When churn spikes (e.; Lesson 835 — Customer Churn Prediction with Survival Analysis
High-risk, engaged recently: In-product interventions (tooltips, feature prompts) based on usage gaps; Lesson 1676 — Win-Back and Retention Strategies
High-risk, high-value: Personalized outreach, account manager check-ins, or special loyalty offers; Lesson 1676 — Win-Back and Retention Strategies
Higher adjusted R-squared: suggests a better balance of fit and simplicity; Lesson 615 — Comparing Models with Adjusted R-Squared
Higher alpha: (e.; Lesson 1409 — Setting Detection Parameters
Higher alpha (0.10): Like a more-sensitive detector.; Lesson 334 — Setting Alpha: Choosing Your Significance Level
Higher confidence (e.g., 99%): More reliable method, but wider intervals; Lesson 267 — Interpreting Confidence Levels
Higher confidence level: → Wider margin (you cast a wider net to be more certain); Lesson 294 — Margin of Error and Its Components
Higher evidence bar: – Only stronger signals will be deemed "significant"; Lesson 342 — Alpha Level Trade-offs
Higher variance/standard deviation: = outcomes are more spread out, more unpredictable; Lesson 148 — Variance and Standard Deviation of Discrete Distributions
Higher λ: means events happen more frequently → shorter waiting times; Lesson 164 — The Exponential Distribution
Highest Density Interval (HDI): takes a smarter approach: it finds the *shortest possible* interval that still contains your desired probability mass (say, 95%).; Lesson 1576 — Highest Density Intervals (HDI)
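
One common way to approximate an HDI from posterior samples is to sort them and slide a window that holds the desired share of points, keeping the narrowest one. The sketch below follows that approach under stated assumptions: the function name `hdi` is hypothetical, the sample values are made up, and the method assumes a single-peaked distribution.

```python
import math


def hdi(samples, mass=0.95):
    """Return the shortest interval containing `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    window = math.ceil(mass * n)  # number of points the interval must cover

    # Slide the window across the sorted samples and keep the narrowest span.
    best_start = min(
        range(n - window + 1),
        key=lambda i: ordered[i + window - 1] - ordered[i],
    )
    return ordered[best_start], ordered[best_start + window - 1]


# Hypothetical right-skewed samples: dense near the low end, sparse tail.
skewed = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
          20, 30, 40, 50, 60, 70, 80, 90, 100, 110]
print(hdi(skewed, mass=0.5))
```

For skewed samples like these, the shortest interval hugs the dense region rather than cutting equal probability from each tail.
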
Highlight, don't decorate: Use bold or saturated colors for the 1-3 most important data points you want your audience to notice first.; Lesson 1961 — Color as Communication Tool
Highly Objective: Lesson 1598 — Characteristics of Lagging Indicators
Highly skewed distributions: (like income data): you may need n = 50, 100, or more; Lesson 220 — Sample Size Requirements for the CLT
Hill function: (also called logistic or S-curve):; Lesson 1740 — Saturation Curves and Diminishing Returns
histogram: shows the frequency of values in bins.; Lesson 377 — Testing Normality: Visual Methods Lesson 788 — Checking Residual Normality Lesson 1267 — Histograms and Distribution Plots
Histogram of residuals: Should look roughly bell-shaped; Lesson 449 — Normality of Residuals
Histograms: divide your data into bins and show the frequency of observations in each bin as bars.; Lesson 203 — Visual Assessment: Histograms and Density Plots Lesson 290 — Assumptions and Diagnostics for Difference Intervals Lesson 1208 — Distribution Checks for All Variables Lesson 1343 — Statistical Transformations
Historical Data: Lesson 297 — Handling Unknown Population Parameters Lesson 1534 — The Prior Distribution
Historical patterns: If a job normally takes 10-15 minutes, alert at 30+ minutes, not 16; Lesson 1858 — Alerting Strategies Lesson 1878 — What is Bias in Data?
Historical performance: "This is our best quarter in three years"; Lesson 1962 — Contextualizing Numbers
Historical snapshots: Copying current product prices into order records preserves what the customer actually paid, even if prices change later.; Lesson 1074 — Duplicating Data Across Tables
Historical Trends: Show how values change over time.; Lesson 1939 — Context and Comparison: Making Numbers Meaningful
Holm-Bonferroni: (also called "step-down Bonferroni") is a sequential method:; Lesson 459 — Holm-Bonferroni and Šidák Methods Lesson 512 — Testing Significance in Correlation Matrices Lesson 1507 — Multiple Testing in A/B Test Variations
Holt-Winters exponential smoothing: comes in.; Lesson 765 — Introduction to Holt-Winters Method
Holt-Winters Multiplicative Model: is designed for time series where seasonal fluctuations change in size as the overall level of the series changes.; Lesson 768 — Holt-Winters Multiplicative Model
Holt's Method: adds a second equation to track the trend separately.; Lesson 761 — Double Exponential Smoothing (Holt's Method)
Homepage Visit: (entry point); Lesson 1679 — Defining Funnel Steps and Events
homoscedasticity: (homo = same, scedasticity = scatter).; Lesson 379 — The Assumption of Equal Variances (Homoscedasticity) Lesson 450 — Homogeneity of Variance (Homoscedasticity) Lesson 546 — The Five Core Assumptions of Linear Regression Lesson 557 — The Residuals vs Fitted Values Plot Lesson 601 — Assumptions for Multiple Linear Regression Lesson 782 — Residual Diagnostics for ARIMA
Horizontal patterns: One cohort performing differently across all periods indicates something unique about that acquisition group; Lesson 1649 — Visualizing Cohort Data with Heatmaps
Horizontal scaling (scale-out): means distributing work across multiple machines working in parallel.; Lesson 1767 — Scale-Up vs Scale-Out Architectures
Horizontal trend line: Good news—variance is roughly constant (homoscedasticity); Lesson 560 — Scale-Location Plot (Spread-Location Plot)
Hospital studies: Disease severity and access to care both lead to hospitalization.; Lesson 1473 — Conditioning on Colliders: Selection Bias
Hourly data: with daily seasonality → period = 24; Lesson 746 — Choosing Seasonal Period
Hover tooltips: displaying data values when you move your mouse over points; Lesson 1300 — Creating Basic Interactive Charts with Plotly Express
how: to calculate it, the critical question becomes: *what does the number actually mean?*; Lesson 533 — Interpreting R-Squared Values Lesson 920 — Understanding Join Conditions with ON Lesson 1346 — The Grammar vs Traditional Plotting Lesson 1830 — Documentation and Metadata Management Lesson 1850 — Retry Strategies Lesson 2023 — Creating a Pull Request
How it works: Lesson 1457 — Multiple Time Periods and Staggered Adoption Lesson 1828 — Incremental vs Full Load Strategies
How long: you'll retain it; Lesson 1908 — Data Subject Access Requests (DSARs)
How many extra parameters: you added (degrees of freedom cost); Lesson 627 — The F-Test for Model Comparison
How much: is missing per column?; Lesson 1207 — Missing Data Assessment and Strategy Lesson 2137 — Refactoring Strategies and Debt Paydown
How much better: the full model fits the data (lower RSS—residual sum of squares); Lesson 627 — The F-Test for Model Comparison
How strongly: it would need to relate to the treatment (exposure); Lesson 1434 — Sensitivity Analysis for Confounding
How to check: Lesson 374 — Assumptions of the Paired t-Test Lesson 552 — Zero Conditional Mean of Errors
How to report bugs: Where should users file issues?; Lesson 2083 — Contributing Guidelines and Contact Information
How to suggest features: Is there a template or discussion forum?; Lesson 2083 — Contributing Guidelines and Contact Information
HR < 1: Decreased hazard (protective effect).; Lesson 827 — Hazard Ratios and Interpretation
HR = 1: No effect.; Lesson 827 — Hazard Ratios and Interpretation
HR > 1: Increased hazard.; Lesson 827 — Hazard Ratios and Interpretation
HubSpot: Weekly active teams using the platform — activation and ongoing engagement signal product-market fit.; Lesson 1606 — Examples of North Star Metrics by Industry
Hue: is what we typically call "color": red, green, blue, purple, etc.; Lesson 1234 — Color: Hue, Saturation, and Luminance Lesson 1238 — Matching Encoding to Data Type
Human-readability: CSV > JSON > Excel > Parquet/Feather; Lesson 1133 — Performance Considerations Across Formats
Human-readable units: Same units as your original data; Lesson 49 — Standard Deviation: Interpretable Spread
Hybrid approach: – handling both global and local anomalies in one framework; Lesson 1405 — What is Seasonal Hybrid ESD?
Hypotheses: Lesson 471 — Kruskal-Wallis H Test: The Non-Parametric One-Way ANOVA Lesson 787 — Ljung-Box Test for Residual Autocorrelation Lesson 1508 — Pre-Registration and Correction Strategy
Hypothesis: The specific change and expected directional effect; Lesson 1485 — Documentation and Pre-Registration
Hypothesis Testing: Z-scores help us ask "Is this result surprising?"; Lesson 201 — Z-Score Applications and Limitations
Hypothesize: Based on funnel analysis, identify a bottleneck; Lesson 1692 — Statistical Significance and Iteration
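
Worked example for the Highest Density Interval (HDI) entry above: a minimal Python sketch, assuming the distribution is available as a list of samples (for instance, posterior draws). The function name `hdi` and the sliding-window approach are illustrative assumptions rather than anything taken from Lesson 1576, and the sketch assumes a single-peaked distribution because it returns one contiguous interval.

```python
import math


def hdi(samples, mass=0.95):
    """Return (low, high) for the narrowest interval covering `mass` of the samples.

    Hypothetical helper for illustration only; assumes a unimodal sample.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < mass <= 1:
        raise ValueError("mass must be in (0, 1]")

    xs = sorted(samples)
    n = len(xs)
    # Number of consecutive sorted points the interval has to contain.
    k = max(1, math.ceil(mass * n))

    # Slide a window of k consecutive points over the sorted data
    # and keep the start index whose window is narrowest.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


# Half of these ten values cluster between 10 and 14, so the 50% HDI lands there.
draws = [0, 10, 11, 12, 13, 14, 30, 40, 50, 60]
print(hdi(draws, mass=0.5))
```

Because the window slides over the sorted samples, a skewed sample gives an interval shifted toward the dense region instead of one with equal tail areas.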

I

I (Integrated) - d: Lesson 773 — Introduction to ARIMA: Components and Notation
I Chart (Individuals Chart): Plots each single measurement and tracks whether the process mean is stable.; Lesson 1404 — Control Charts for Individual Observations
I-MR charts: (Individual and Moving Range charts) come in.; Lesson 1404 — Control Charts for Individual Observations
idempotency: so rerunning doesn't corrupt data, **checkpointing** to resume mid-pipeline, and **monitoring/alerts** for quick detection.; Lesson 1825 — Designing Pipeline Architecture Lesson 1847 — What is Idempotency? Lesson 1850 — Retry Strategies Lesson 1853 — Partial Failure Recovery
Identify: cells with residuals beyond ±2 (moderate) or ±3 (strong); Lesson 428 — Post-Hoc Analysis and Residuals
Identify "flattening": When curves level off, you've found your core retained users—the ones likely to stick around long-term.; Lesson 1656 — Visualizing Retention Curves
Identify all relevant periods: hourly (24), daily (7), weekly, etc.; Lesson 1408 — Handling Multiple Seasonal Periods
Identify all systems: holding that person's data (data lineage helps here!); Lesson 1909 — Right to Erasure and Data Retention Policies
Identify backdoor paths: between treatment and outcome; Lesson 1475 — Using DAGs to Guide Analysis
Identify conflicts: Run `git status` to see which files have conflicts (marked as "both modified"); Lesson 2018 — Resolving Conflicts During Rebase
Identify core value actions: What behaviors indicate someone is getting value?; Lesson 1693 — Defining User Engagement
Identify data sources: Where should each variable come from?; Lesson 2098 — Identifying Data Availability Gaps Early
Identify direct causal relationships: (does X directly cause Y?); Lesson 1469 — Building a Simple Causal DAG
Identify direct links: Does your metric directly influence another team's metric?; Lesson 1625 — Cross-Functional Metric Dependencies
Identify meaningful strata: Divide your population into non-overlapping groups based on important characteristics (age, income, region, education level, etc.); Lesson 236 — Stratified Sampling
Identify outliers: Values with |z| > 3 are typically considered unusual; Lesson 195 — Z-Score Definition and Interpretation Lesson 542 — Computing Fitted Values and Residuals
Identify patterns: A single representative value helps you spot trends over time or differences between categories; Lesson 38 — What is Central Tendency?
Identify power features: High adoption + high frequency = core value drivers; Lesson 1696 — Feature Adoption and Usage Frequency
Identify problematic cohorts: that need intervention; Lesson 1672 — Cohort-Based Churn Analysis
Identify stratification variables: (usually 1-3 key covariates); Lesson 1489 — Stratified Randomization Fundamentals
Identify the confounder: (from your previous analysis); Lesson 1430 — Controlling for Confounders: Stratification
Identify the pre-rebase state: Look for the entry just before you started the problematic rebase; Lesson 2021 — Recovering from Rebase Mistakes
Identify the reference distribution: (standard normal for Z, t-distribution for t, etc.); Lesson 319 — Calculating P-Values from Test Statistics
Identify Unused Indexes: Query your database's system catalogs to find indexes that are never or rarely used.; Lesson 1086 — Index Maintenance and Monitoring
Identifying actionable next steps: What should the business *do* differently?; Lesson 2090 — Stage 6: Interpretation and Insight Generation
Identifying trends: Detect consecutive increases or decreases; Lesson 1023 — Introduction to Window Functions: LAG and LEAD
Identity: Coefficients are direct additive effects (simplest interpretation).; Lesson 678 — Choosing the Right Link Function
identity link: is the simplest possible link function: it does absolutely nothing!; Lesson 672 — The Identity Link Lesson 677 — Interpreting Coefficients Under Different Links Lesson 678 — Choosing the Right Link Function
If it changes direction: The control variable was suppressing the true relationship (a suppressor effect); Lesson 508 — Interpreting Partial Correlations
If it remains strong: The relationship between your two variables is genuine, independent of the control variable(s); Lesson 508 — Interpreting Partial Correlations
If p < α: (e.; Lesson 367 — Interpreting Two-Sample Test Results
If p ≥ α: (e.; Lesson 367 — Interpreting Two-Sample Test Results
If p-value < α: Reject H₀ (the result is "statistically significant"); Lesson 323 — What is a Significance Level (α)?
If p-value > α: Fail to reject H₀ (insufficient evidence against the null); Lesson 327 — Decision Rules: Reject or Fail to Reject Lesson 356 — Making Decisions and Stating Conclusions Lesson 404 — Making Decisions and Drawing Conclusions
If p-value ≤ α: Reject H₀ (the data are unlikely under the null hypothesis); Lesson 327 — Decision Rules: Reject or Fail to Reject Lesson 356 — Making Decisions and Stating Conclusions
If p-value ≥ α: Fail to reject H₀ (insufficient evidence); Lesson 323 — What is a Significance Level (α)?
If violated: This is serious.; Lesson 383 — Diagnostic Workflow: When to Proceed or Switch Tests
If you reject H₀: You have sufficient evidence to support the alternative hypothesis.; Lesson 404 — Making Decisions and Drawing Conclusions
Ignore baseline context: Start your chart at an unusual low point to make normal recovery look exceptional; Lesson 1241 — Cherry-Picking Time Ranges
Ignored anomalies: (revenue drops 15%, but who investigates?); Lesson 1619 — What is Metric Ownership?
Ignoring geographic size bias: Large empty regions dominate visually even with low values.; Lesson 1309 — Choropleth Maps: Basics and Best Practices
Ignoring the base rates: When calculating P(A|B), people forget that the prior probability P(A) matters enormously.; Lesson 100 — Common Conditional Probability Mistakes
Ignoring the clock: You're two weeks past the deadline chasing marginal improvements while stakeholders have moved on or made decisions without you.; Lesson 2119 — Signs You're Over-Engineering
Immediate actions: (this week); Lesson 1970 — Recommendations and Next Steps
Immutable Data Patterns: Rather than updating records in place, append new versions with timestamps or version numbers.; Lesson 1848 — Designing Idempotent Operations
Impact: Number of rows affected, distribution changes, or new/dropped columns; Lesson 1162 — Documenting Transformations Lesson 1883 — Protected Classes and Proxy Variables Lesson 1966 — Report Structure and Executive Summary
Imperfect measurement instruments: A broken thermometer that reads 2°C high introduces systematic error.; Lesson 1880 — Measurement and Label Bias
Implement: Roll out winning variation; Lesson 1692 — Statistical Significance and Iteration
Implement access controls: that enforce purpose-based restrictions; Lesson 1915 — Secondary Use and Scope Creep
Implement pagination: (showing results in batches, like "page 1 of 100"); Lesson 877 — LIMIT: Restricting the Number of Rows Returned
Implementation bugs: Maybe your randomization code has an off-by-one error or timestamp issues; Lesson 1524 — Sample Ratio Mismatch (SRM)
Implementing Safeguards: Lesson 1925 — Mitigation Strategies and Responsible Disclosure
Implicit transformations: You depend on data that's already been filtered or aggregated upstream, but that logic changes without notice.; Lesson 2133 — Undocumented Data Dependencies
Important nuances: Lesson 1451 — Estimating Treatment Effects from Matched Samples
Important requirements: Lesson 1001 — INTERSECT: Finding Common Rows
Impractical test duration: – If your required sample size is large but your daily traffic is small, you'll need to run the test for weeks or months.; Lesson 1493 — Why Sample Size Matters in A/B Tests
Improve Data Integrity: When that customer moves, you update one row in one table, not dozens of scattered records.; Lesson 1061 — Introduction to Normalization
Improve interpretability: Clearer story about what drives your outcome; Lesson 585 — Remedies: Variable Selection
Improve your measurement precision: Lesson 332 — The Trade-off Between Type I and Type II Errors
Improving trends: Later cohorts retain better than earlier ones.; Lesson 1650 — Comparing Cohorts Over Time
Impute: Replace with mean, median, mode, or modeled values; Lesson 1207 — Missing Data Assessment and Strategy
IN: typically builds a complete list of values first, then checks membership.; Lesson 985 — EXISTS vs IN: Performance Considerations
In business: Lesson 802 — What is Survival Analysis?
In coordinated sequences: (extract → transform → load, in order); Lesson 1831 — What is Job Scheduling?
In final deliverables: (reports, presentations, dashboards): explanation; Lesson 1216 — Choosing the Right Purpose
In Production: ML systems degrade over time as data distributions shift.; Lesson 2130 — No Clear Success Metric or Feedback Loop
In Python: Lesson 646 — Reference Categories in Statistical Software
In R: Lesson 646 — Reference Categories in Statistical Software
In science: Lesson 802 — What is Survival Analysis?
In-place modifications: aren't supported—methods like `df.; Lesson 1796 — Limitations and Differences from Pandas
Incapacitated individuals: People with cognitive impairments, dementia, or mental health conditions may not fully comprehend what they're consenting to; Lesson 1918 — Special Populations and Vulnerable Groups
Include a quick-start section: that gets someone from zero to a working result in under five minutes—this builds confidence and engagement.; Lesson 2080 — Usage Examples and Running Your Code
Include notebook workflows: Lesson 2080 — Usage Examples and Running Your Code
Include null results: that show no effect or relationship; Lesson 1929 — Avoiding Cherry-Picking Results
Includes the row: if there's a match; Lesson 961 — IN Operator with Subqueries
including: items at exactly $10 and exactly $50.; Lesson 860 — BETWEEN Operator for Ranges Lesson 1892 — Fairness Through Unawareness vs Awareness
inclusive: , meaning they include the boundary value itself.; Lesson 857 — Comparison Operators: Greater and Less Than Lesson 860 — BETWEEN Operator for Ranges
Inconsistent definitions: across teams (is "active user" last 7 or 30 days?); Lesson 1619 — What is Metric Ownership?
Inconsistent formats: User-entered data with typos, duplicates, or conflicting values; Lesson 1762 — Extended Dimensions: Veracity and Value
Inconsistent standards: If your data collection team changes definitions midway (e.; Lesson 1880 — Measurement and Label Bias
Incorporate offline touchpoints: Sales calls, conferences, or direct mail that standard models ignore; Lesson 1731 — Custom Rule-Based Attribution
Incorporates uncertainty: It's a full distribution, not just a point estimate; Lesson 1537 — The Posterior Distribution
Incorporates uncertainty naturally: The width of your posterior reflects how confident you are; Lesson 1570 — Comparing Two Means: Bayesian Approach
Incorrect conclusions: that harm decision-making; Lesson 34 — Recognizing Boundaries of Competence
Increase I/O: transferring massive result sets; Lesson 911 — Performance Considerations with Multiple Groups
Increase your sample size: (collect more data); Lesson 332 — The Trade-off Between Type I and Type II Errors Lesson 340 — Power and Sample Size Relationship
Increased CAC pressure: You need constant acquisition just to maintain size, let alone grow; Lesson 1670 — What is Churn and Why It Matters
Increased storage: from duplicate data; Lesson 1071 — When to Denormalize: Performance Trade-offs
Increases cognitive load: people work harder to extract meaning; Lesson 1963 — Removing Chartjunk
Increases statistical power: by focusing only on within-pair changes; Lesson 370 — Differences as the Unit of Analysis
Incremental collaboration: Breaking large features into reviewable chunks while still working; Lesson 2029 — Draft Pull Requests and WIP Workflows
Incremental efficiency: does adding channel X improve overall LTV:CAC?; Lesson 1716 — Channel Mix and Portfolio Thinking
Incremental testing: runs initiatives sequentially or uses holdout groups to isolate each team's effect.; Lesson 1640 — Attribution in Multi-Team Environments
Incrementality: asks: "What would have happened *without* this channel?"; Lesson 1717 — Incrementality and True Channel Impact Lesson 1718 — Introduction to Marketing Attribution Lesson 1743 — What is Incrementality? Lesson 1744 — Incrementality vs Attribution
Incrementality correlation: Do the model's channel credits align with incrementality tests (like those control group experiments you learned)?; Lesson 1734 — Comparing and Validating Attribution Models
Independence: means making decisions based solely on data and sound methodology—not on what others want to hear.; Lesson 35 — Conflicts of Interest and Independence Lesson 131 — Real-World Applications of Binomial Distributions Lesson 218 — What the Central Limit Theorem States Lesson 382 — Robustness of t-Tests to Assumption Violations Lesson 398 — Choosing Between Parametric and Non-Parametric Tests Lesson 400 — Assumptions and Conditions for Proportion Tests Lesson 419 — Assumptions and Minimum Expected Frequencies Lesson 447 — Conducting One-Way ANOVA in Practice (+5 more)
Independence of Observations: Lesson 426 — Assumptions and Sample Size Requirements Lesson 448 — Independence of Observations Lesson 470 — When Parametric ANOVA Assumptions Fail
Independence of Paired Differences: Lesson 374 — Assumptions of the Paired t-Test
Independence violated: → Reconsider your analysis approach entirely; Lesson 383 — Diagnostic Workflow: When to Proceed or Switch Tests
independent: when the outcome of one doesn't affect the outcome of the other.; Lesson 87 — Multiplication Rule for Independent Events Lesson 88 — General Multiplication Rule Lesson 111 — Spam Filtering with Naive Bayes Lesson 126 — From Bernoulli to Binomial: Multiple Trials Lesson 144 — Poisson Applications: Arrivals and Events Lesson 176 — Sum of Independent Normal Variables Lesson 359 — Two-Sample t-Test Overview Lesson 361 — Pooled Variance t-Test (+4 more)
Independent advocates: for incapacitated individuals; Lesson 1918 — Special Populations and Vulnerable Groups
Independent groups: different subjects in each group, not repeated measures; Lesson 438 — When to Use One-Way ANOVA
independent observations: and a **sufficiently large sample size** (typically n ≥ 30, though this depends on the population distribution).; Lesson 225 — CLT for Sums and Other Statistics Lesson 1389 — What is Grubbs' Test?
Independent samples: come from two different, unrelated groups.; Lesson 360 — Independent vs. Dependent Samples
Independent variable: time (often just the index: 1, 2, 3, .; Lesson 738 — Linear Detrending
index: is a separate data structure that the database maintains to help find rows quickly without scanning the entire table.; Lesson 1078 — What Are Indexes and Why They Matter Lesson 1804 — Index Optimization and Reset Strategies
Index bloat: happens when deleted records leave empty space that isn't automatically reclaimed, making indexes larger than necessary.; Lesson 1086 — Index Maintenance and Monitoring
Index plots: of residuals to spot specific observation numbers; Lesson 587 — Identifying Outliers in Regression Context
Index supporting columns: Ensure columns referenced in correlated conditions are indexed; Lesson 969 — Performance Considerations for SELECT Subqueries
Index usage: Are your indexes still effective?; Lesson 1077 — Measuring Performance Impact of Denormalization
Index-based selection: is severely limited.; Lesson 1796 — Limitations and Differences from Pandas
Indexes: Using indexed columns can make certain join orders faster; Lesson 951 — Join Order and Performance
Individual t-tests: Ask "Does *this specific* predictor add value?"; Lesson 622 — Relationship Between F-Test and t-Tests
Industry norms: What metrics matter in this field?; Lesson 1168 — Understanding Domain Context
Industry research: Published studies, competitor analyses, domain blogs; Lesson 1201 — Domain Knowledge as a Hypothesis Source
Industry standards: "We're 20% above market average"; Lesson 1962 — Contextualizing Numbers
Inference: When you need trustworthy hypothesis tests and prediction intervals; Lesson 550 — Normality of Residuals Lesson 1594 — PyMC: Probabilistic Programming in Python
Inflated standard errors: The uncertainty around coefficient estimates increases dramatically; Lesson 580 — What is Multicollinearity?
Inflating r: Imagine plotting height vs.; Lesson 481 — Outliers and Their Impact on r
Influence: is about *actual impact*—how much the regression line would change if you removed that observation.; Lesson 571 — What Are Leverage and Influence? Lesson 574 — Influence: Impact on Fitted Model Lesson 2101 — Identifying and Mapping Stakeholders
Influence vs. Interest Matrix: Plot stakeholders on two axes:; Lesson 2101 — Identifying and Mapping Stakeholders
Influenced by time trends: Both variables increase over time independently; Lesson 494 — Spurious Correlations and Coincidence
INFO: Normal operations (job started, file processed); Lesson 1857 — Logging Best Practices
Info/Log only: Minor retries succeeded, small delays—for forensic review later; Lesson 1858 — Alerting Strategies
informative prior: .; Lesson 1543 — Defining Prior Distributions Lesson 1581 — Setting Priors for A/B Tests
Informative priors: reflect strong beliefs.; Lesson 1534 — The Prior Distribution Lesson 1544 — Informative vs Uninformative Priors
Informed: Clear, jargon-free explanation of:; Lesson 1912 — What is Informed Consent in Data Science?
Infrastructure costs: Hosting, computing resources, and database connections; Lesson 1979 — Maintenance and Sustainability Considerations
Infrastructure debt: Manual processes that should be automated; Lesson 2131 — What is Technical Debt in Data Science?
Initial belief (prior): Maybe there's a 20% chance the suspect is guilty based on background.; Lesson 114 — Sequential Updating
INNER JOIN: is SQL's way of bringing together information from two separate tables based on a relationship between them.; Lesson 918 — What is an INNER JOIN? Lesson 928 — LEFT JOIN vs INNER JOIN: When to Use Each
INNER JOIN table2: The table you're joining to (the "right" table); Lesson 919 — Basic INNER JOIN Syntax
INNER JOINs: to connect each relevant dimension; Lesson 956 — Star Schema Joins
Inner query: Sum sales by department; Lesson 973 — Nested Subqueries in FROM
Inner query alias: (`inner`): identifies columns from the subquery; Lesson 976 — Basic Correlated Subquery Syntax
Input data context: What data was being processed when it broke?; Lesson 1851 — Error Logging and Notifications
Input metadata: (file names, row counts, date ranges); Lesson 1857 — Logging Best Practices
Input(s): The component property you're monitoring (e.; Lesson 1335 — Dash Callbacks: Adding Interactivity
Inputs: Lesson 1737 — Aggregate-Level Data in MMM
INSERT protection: You cannot add a child record unless the referenced parent exists; Lesson 1052 — Foreign Key Constraints
INSERT/UPDATE: The database verifies the foreign key value exists in the parent table; Lesson 1060 — Trade-offs: Performance vs Integrity
Inserting: new records; Lesson 844 — What is SQL? Lesson 1124 — Insert, Update, Delete, and Bulk Operations
Insertion Anomalies: Lesson 1062 — Data Anomalies: Insert, Update, Delete
Inside the bounds: (between the dashed lines): The autocorrelation is **not statistically significant**—it could easily be random noise; Lesson 723 — Significance Bounds in ACF Plots
Inspect source data: Go to the original data source.; Lesson 1870 — Root Cause Analysis for Quality Issues
Inspect your data first: If an integer column only contains values between 0 and 100, you don't need `int64`—`int8` (range: -128 to 127) suffices.; Lesson 1799 — Optimal Data Types and Downcasting
Installation Instructions: Step-by-step commands to set up the environment; Lesson 2077 — The Purpose and Anatomy of a Good README
Instead of: "The slope coefficient β₁ = 2.; Lesson 530 — Communicating Results to Non-Technical Audiences Lesson 1955 — Framing Insights in Business Language
Institutional review: (like ethics boards) before data collection; Lesson 1918 — Special Populations and Vulnerable Groups
Instrumentation Issues: Logging errors, tracking bugs, or data pipeline problems often surface during A/A tests; Lesson 1483 — Pre-Experiment Validation
Insurance claims: Total claim amounts in a period; Lesson 181 — Gamma Distribution: Shape and Rate Parameters
INT: Whole numbers (e.; Lesson 846 — Tables, Schemas, and Data Types
INTEGER: or **INT**: Whole numbers (e.; Lesson 846 — Tables, Schemas, and Data Types
Integer division: Some databases may truncate decimal places if the column is an integer type; Lesson 884 — AVG: Computing Averages
Integrated (Main Body): Lesson 1947 — Handling Methodology and Technical Details
Integrates segments: into downstream workflows like marketing automation, pricing engines, or customer support tools; Lesson 1710 — Operationalizing Segments: Scoring and Deployment
Integrity and Confidentiality: Lesson 1905 — Core Principles of GDPR
Intent matters: Ask yourself: "Am I creating this visualization to inform or to persuade dishonestly?"; Lesson 1247 — The Ethics of Visualization Design
Intent-to-Treat (ITT): means you analyze every participant in the group they were *originally randomized to*, regardless of what they actually did.; Lesson 1439 — Intent-to-Treat Analysis Lesson 1748 — Intent-to-Treat Analysis
interaction: where the effect of one factor depends on the level of the other.; Lesson 463 — Introduction to Two-Way ANOVA Lesson 466 — Visualizing Interactions Lesson 561 — Residuals vs Predictor Plots
Interaction analysis: (do color and size amplify each other's effects?); Lesson 1482 — Control and Treatment Design
interaction effect: .; Lesson 465 — Interaction Effects Lesson 1195 — Interaction Effects Between Variables
interaction effects: worth exploring (e.; Lesson 1201 — Domain Knowledge as a Hypothesis Source Lesson 1531 — Interference from Concurrent Tests Lesson 1689 — Multivariate Testing and Personalization
Interaction effects analysis: (Lesson 1195) shows whether variables work together or independently; Lesson 1197 — Identifying Variable Importance and Redundancy
Interaction is available: (rotation helps overcome perspective distortion); Lesson 1323 — Introduction to 3D Plotting in Matplotlib
Interaction plots: make these non-additive effects visible at a glance.; Lesson 466 — Visualizing Interactions
interaction term: represents a relationship where the effect of one predictor on your outcome variable *depends on* the level or value of another predictor.; Lesson 648 — What are Interaction Terms? Lesson 653 — Interpreting Categorical × Categorical Interactions Lesson 1455 — DiD with Regression
Interactive 2D plots: Let users filter and explore without perspective distortion; Lesson 1329 — Effective Use and Pitfalls of 3D Visualizations
Interactive dashboards are primary: While R has Shiny, Python's Streamlit and Dash often integrate more naturally into broader Python ecosystems.; Lesson 1375 — Choosing Tools: When to Use R vs Python for Visualization
Interactive zoom: Let users explore crowded areas at different scales; Lesson 1310 — Point Maps and Scatter Plots on Maps
Interest: How much do they care about the outcome?; Lesson 2101 — Identifying and Mapping Stakeholders
Interleave explanation with code: Write markdown cells that introduce your analysis approach, then show the actual code that implements it; Lesson 1982 — Literate Programming with Notebooks
Intermediate outputs: Cleaned datasets, feature engineering results; Lesson 2065 — Tracking Data Lineage
Internal databases: Your organization's own records (sales, customer info, logs); Lesson 11 — Data Collection and Acquisition
Internal first: Alert your organization's leadership and legal/ethics teams; Lesson 1925 — Mitigation Strategies and Responsible Disclosure
Internal validity: asks: *Are the results truly caused by what you think caused them?*; Lesson 1441 — Internal vs External Validity
Interpret: Positive residuals mean more observations than expected; negative means fewer; Lesson 428 — Post-Hoc Analysis and Residuals Lesson 436 — Conducting McNemar's Test Lesson 685 — Confidence Intervals for Odds Ratios Lesson 740 — Choosing Between Differencing and Detrending
Interpret in Context: Lesson 447 — Conducting One-Way ANOVA in Practice
Interpretability: Results speak directly about means—easier to communicate and understand in most contexts.; Lesson 475 — Choosing Between Parametric and Non-Parametric Tests Lesson 1555 — Advantages and Limitations of Conjugate Priors Lesson 2102 — Understanding Stakeholder Goals and Constraints Lesson 2123 — Simple Rules Beat Complex Models
Interpretation: What does "revenue" mean—gross or net?; Lesson 23 — Data Provenance and Metadata Lesson 443 — Mean Squares and the F-Ratio Lesson 533 — Interpreting R-Squared Values Lesson 647 — Impact on Model Results and Reporting Lesson 691 — Interpreting Poisson Coefficients Lesson 827 — Hazard Ratios and Interpretation Lesson 1580 — Bayesian vs Frequentist A/B Testing Lesson 1629 — SaaS Growth Metrics: Quick Ratio and Net Revenue Retention
Interpretation cells: Discuss what results mean (markdown referencing outputs above); Lesson 1982 — Literate Programming with Notebooks
Interpretation guideline: context (small/medium/large, or domain-specific benchmarks); Lesson 389 — Reporting Effect Sizes in Practice
Interpretation guidelines: Lesson 445 — Effect Size: Eta-Squared and Omega-Squared Lesson 472 — Interpreting Kruskal-Wallis Results and Effect Size
Interpreting the condition number: Lesson 583 — Condition Number and Eigenvalues
Interpreting variability: Standard deviation and variance assume certain shapes.; Lesson 63 — Understanding Distribution Shape
Interquartile Range (IQR): is a measure of variability that tells you how spread out the middle half of your data is.; Lesson 51 — Interquartile Range (IQR) Lesson 56 — Understanding Percentiles and Their Interpretation Lesson 1176 — Box Plots for Spread and Outliers Lesson 1383 — Understanding the Interquartile Range (IQR) Lesson 1384 — The IQR Outlier Detection Rule
Intersection (∩): "A **and** B"; Lesson 80 — Set Operations: Union, Intersection, and Complement
Intersectionality: recognizes that a Black woman's experience isn't just "being Black" plus "being a woman"—it's a unique combined experience.; Lesson 1893 — Intersectionality in Fairness
Interval: milliseconds between frames; Lesson 1327 — Creating Animations with FuncAnimation
Interval censoring: means you know the event occurred within a specific time window, but not the precise moment.; Lesson 805 — Left and Interval Censoring
Interval data: (numeric with no true zero: temperature in Celsius, dates) suits:; Lesson 1238 — Matching Encoding to Data Type
Interval/ratio data: Meaningful numeric measurements; Lesson 398 — Choosing Between Parametric and Non-Parametric Tests
Introduces scope creep: that derails core objectives; Lesson 2107 — Saying No and Pushing Back Constructively
Introduction: Problem statement, objectives, context; Lesson 1966 — Report Structure and Executive Summary
Introduction cells: State the question and context (markdown); Lesson 1982 — Literate Programming with Notebooks
Intuition: If you expect 3 heads in 10 fair coin flips (10 × 0.; Lesson 129 — Binomial Mean and Variance Lesson 136 — Expectation and Variance of the Negative Binomial
Intuitive interpretation: The Beta parameters have natural meanings:; Lesson 1551 — Beta-Binomial Conjugacy
Invalid inference: Hypothesis tests and confidence intervals are incorrect; Lesson 734 — Why Differencing and Detrending Matter
Invalid statistical inference: Standard errors, confidence intervals, and hypothesis tests become meaningless because they assume stability that isn't there.; Lesson 713 — Why Stationarity Matters
Inventory turnover ratio: measures how many times you sell and replace stock annually:; Lesson 1634 — Retail Metrics: Same-Store Sales and Inventory Turnover
Inverse-Gamma: part models uncertainty about σ²; Lesson 1568 — Unknown Variance: Normal-Inverse-Gamma Model
Inverted S-shape: Light-tailed distribution (fewer extreme values); Lesson 565 — What Q-Q Plots Show: Comparing Residual Distribution to Normal Lesson 566 — Reading Q-Q Plots: Interpreting Points Along the Reference Line
Invest in data collection: if possible, but accept that sometimes you need to deliver value *now* with what you have.; Lesson 2124 — Insufficient or Low-Quality Data
Investigate First: Lesson 579 — What to Do with Influential Points
Investment advice: based only on winning stocks ignores all the losers that went to zero; Lesson 247 — Survivorship Bias
Involuntary churn: occurs without customer intent—usually from failed payments, expired credit cards, or technical issues.; Lesson 1670 — What is Churn and Why It Matters Lesson 1671 — Churn Rate Calculation Methods
IoT sensor data: Temperature, energy consumption, or manufacturing metrics with predictable rhythms; Lesson 1411 — Applications and Limitations
IQR: (Interquartile Range) shines.; Lesson 54 — When to Use Each Measure
IQR method: makes no such assumption—it relies on quartiles and is robust to skewed or non-normal distributions.; Lesson 1386 — IQR Method vs Z-Score: When to Use Each
IQR methods: give you rules of thumb for flagging outliers, whereas Grubbs' Test takes a more rigorous approach.; Lesson 1389 — What is Grubbs' Test?
Irreducibility: The chain can eventually reach any state from any other state; Lesson 1589 — Markov Chains: The Foundation of MCMC
Irregular: components, you face a fundamental choice: do these pieces combine by adding or by multiplying?; Lesson 710 — Additive vs Multiplicative Models Lesson 744 — Classical Decomposition Methods
Irreversibility matters: Decisions are costly to reverse; Lesson 1522 — Balancing Speed and Accuracy in Metric Selection
Isolate Seasonality (S): Average the detrended values for each season (e.; Lesson 744 — Classical Decomposition Methods
Isolate the problem: Test from different machines or networks to rule out local issues; Lesson 1093 — Troubleshooting Connection Issues
Isolates brand impact: The PSA has no commercial intent, so any lift from your real ad is truly incremental; Lesson 1747 — Ghost Ads and PSA Tests
Isolating the treatment effect: Differences in outcomes are more likely due to treatment, not pre-existing differences; Lesson 1445 — The Matching Framework
Isolation: Concurrent transactions don't interfere with each other; Lesson 1110 — What Are Database Transactions?
Issue tracker: Direct link to GitHub Issues or your bug tracking system; Lesson 2083 — Contributing Guidelines and Contact Information
It doesn't: Lesson 1883 — Protected Classes and Proxy Variables
It slows down: every new feature or improvement; Lesson 2132 — Pipeline Glue Code and Complexity Creep
It's hard to test: in isolation; Lesson 2132 — Pipeline Glue Code and Complexity Creep
It's poorly documented: ("I'll remember what this does"); Lesson 2132 — Pipeline Glue Code and Complexity Creep
It's tightly coupled: to specific data formats or versions; Lesson 2132 — Pipeline Glue Code and Complexity Creep
iterate: .; Lesson 15 — Deployment, Monitoring, and Iteration Lesson 25 — The Scientific Method in Data Science
Iteration: means deliberately refining your approach based on what you learned—testing a new feature, adjusting model complexity, or exploring a different angle after stakeholder feedback.; Lesson 2112 — Iteration vs Rework: Learning from Each Cycle Lesson 2142 — Interviewing: Technical and Behavioral Prep
Iteration is critical: You need to test dozens of variants quickly; Lesson 1522 — Balancing Speed and Accuracy in Metric Selection
Iterative algorithms: Machine learning models that require hundreds of passes over the data; Lesson 1784 — Computation Complexity: Beyond Data Size
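
As a companion to the Interquartile Range (IQR) and IQR method entries in this section, the following is a minimal sketch of the IQR calculation and the outlier rule built on it. The 1.5 multiplier, the "inclusive" quantile convention, the helper names, and the sample data are assumptions chosen for illustration, not values taken from the lessons.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (q1, q3, iqr, lower_fence, upper_fence) for a list of numbers.

    k=1.5 is the conventional multiplier; it is an assumption here.
    """
    # The "inclusive" method is one of several quantile conventions;
    # other conventions give slightly different quartiles.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1, q3, iqr, q1 - k * iqr, q3 + k * iqr


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the k * IQR fences."""
    _q1, _q3, _iqr, lower, upper = iqr_bounds(values, k)
    return [v for v in values if v < lower or v > upper]


sample = [2, 4, 4, 5, 6, 7, 8, 9, 10, 40]  # made-up data for illustration
q1, q3, iqr, lower, upper = iqr_bounds(sample)
print(f"Q1={q1}, Q3={q3}, IQR={iqr}, fences=({lower}, {upper})")
print("flagged:", iqr_outliers(sample))
```

Because the fences depend only on quartiles, the single extreme value in the sample does not widen them, which is the robustness the IQR method entry refers to.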

J

Jarque-Bera test: takes a unique approach: it specifically looks at two shape characteristics—**skewness** and **kurtosis**—and combines them into a single test statistic.; Lesson 208 — Jarque-Bera Test
Jeffreys prior: , `Beta(0.; Lesson 1559 — Uninformative and Weakly Informative Priors
Jitter: them (each person shifts slightly so all faces are visible); Lesson 1353 — Position Adjustments: Dodge, Stack, and Jitter
Jittering: Slightly randomize positions (when exact location isn't critical); Lesson 1310 — Point Maps and Scatter Plots on Maps
Job performance studies: Both competence and charisma can lead to promotion.; Lesson 1473 — Conditioning on Colliders: Selection Bias
Joining tables: Multiple tables might have overlapping column names, causing confusion; Lesson 851 — Selecting All Columns with Asterisk
JSON: handles nested data well but creates significant memory overhead with all its bracket and quote characters.; Lesson 1133 — Performance Considerations Across Formats Lesson 1779 — Reading and Writing Data in Spark Lesson 2072 — Configuration Files vs Hard-Coded Values
Just right: Reveals the true distribution shape clearly; Lesson 1267 — Histograms and Distribution Plots
Justified Removal (Last Resort): Lesson 579 — What to Do with Influential Points
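
The Jarque-Bera test entry in this section describes combining skewness and kurtosis into a single test statistic. A minimal sketch of that combination is shown below, using the standard form JB = n/6 · (S² + (K − 3)²/4) with population (biased) moment estimates. The function name and the sample data are hypothetical, and a real analysis would more likely rely on a tested library implementation.

```python
def jarque_bera_statistic(values):
    """Return (skewness, kurtosis, jb) computed from central moments."""
    n = len(values)
    mean = sum(values) / n
    # Central moments with 1/n normalization (biased estimates).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    return skewness, kurtosis, jb


sample = [1, 2, 3, 4, 5]  # made-up, perfectly symmetric data
skewness, kurtosis, jb = jarque_bera_statistic(sample)
print(f"skewness={skewness:.3f}, kurtosis={kurtosis:.3f}, JB={jb:.4f}")
```

Under the null hypothesis of normality the statistic is compared against a chi-squared distribution with 2 degrees of freedom; that approximation is unreliable for a sample as small as this one, so the example only illustrates the arithmetic.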

K

k < 1: Decreasing failure rate (infant mortality—defects fail early); Lesson 187 — The Weibull Distribution: Shape, Scale, and Survival Lesson 188 — Weibull Distribution: Hazard Function and Reliability Lesson 189 — Fitting Weibull Models to Lifetime Data
k = 1: Constant failure rate (becomes the exponential distribution—random failures); Lesson 187 — The Weibull Distribution: Shape, Scale, and Survival Lesson 188 — Weibull Distribution: Hazard Function and Reliability Lesson 189 — Fitting Weibull Models to Lifetime Data
k > 1: Increasing failure rate (wear-out phase—things break down over time); Lesson 187 — The Weibull Distribution: Shape, Scale, and Survival Lesson 188 — Weibull Distribution: Hazard Function and Reliability Lesson 189 — Fitting Weibull Models to Lifetime Data
K-1 degrees of freedom: .; Lesson 824 — Multiple Group Comparisons
K-anonymity: ensures that each record is indistinguishable from at least *k-1* other records when considering quasi-identifiers (age, ZIP code, gender).; Lesson 1895 — Data Anonymization Basics Lesson 1896 — K-Anonymity Lesson 1897 — L-Diversity and T-Closeness Lesson 1911 — GDPR Compliance for Data Scientists
Kaplan-Meier: to estimate response probability curves by segment; Lesson 841 — Campaign Response Time Analysis
Kaplan-Meier curves: Lesson 836 — Employee Turnover and Retention Analysis
Kaplan-Meier estimator: , you can plot conversion curves that account for censoring (prospects still "alive" but not yet converted).; Lesson 839 — Time-to-Conversion in Marketing Funnels
KDE (Kernel Density Estimate): adds a smooth curve that estimates the underlying probability distribution, helping you see trends the blocky bins might obscure.; Lesson 1267 — Histograms and Distribution Plots
Keep conditions simple: Complex nested CASE statements are hard to maintain and slower to execute.; Lesson 1037 — CASE Best Practices and Performance
Keep CTEs focused: Each CTE should represent one logical step; Lesson 997 — CTE Best Practices and Performance
Keep it simple: Single-column integer keys perform best for joins and indexing; Lesson 1050 — Choosing Effective Primary Keys Lesson 1679 — Defining Funnel Steps and Events
Keep separate: Analyze complete vs incomplete groups; Lesson 1207 — Missing Data Assessment and Strategy
Kendall: when you have outliers, skewed distributions, or ordinal (ranked) data.; Lesson 1184 — Correlation Coefficients in Bivariate Analysis
Kendall correlation: also uses ranks but counts how often pairs of observations agree in their ordering.; Lesson 1184 — Correlation Coefficients in Bivariate Analysis
Kendall's Tau: counts *concordant and discordant pairs*—comparing every possible pair of observations to see if they agree in direction.; Lesson 490 — Kendall's Tau vs Spearman's Rho
Kendall's Tau (τ): .; Lesson 489 — Kendall's Tau Correlation Coefficient
Kernel Density Estimation: is the mathematical technique behind these visualizations.; Lesson 1312 — Heatmaps and Density Maps for Spatial Data
Kernel Density Estimation (KDE): is a technique that creates a smooth curve approximating your data's probability distribution.; Lesson 1177 — Density Plots and KDE
Kernel Density Plots: (or density curves) smooth out the histogram into a continuous curve.; Lesson 203 — Visual Assessment: Histograms and Density Plots
Kernel Matching: uses a weighted average of *all* control units, with weights based on distance from each treated unit's propensity score.; Lesson 1448 — Propensity Score Matching Methods
Key advantage: Simple correction; coefficients remain interpretable as log-rate ratios.; Lesson 694 — Quasi-Poisson and Negative Binomial Models
Key characteristic: Unlike the sampling distribution of the mean (which becomes approximately normal thanks to the CLT), the sampling distribution of the variance follows a scaled **chi-squared distribution** when the population is normal: (n − 1)s²/σ² is chi-squared with n − 1 degrees of freedom.; Lesson 254 — Sampling Distribution of the Sample Variance
Key conditions for convergence: Lesson 1589 — Markov Chains: The Foundation of MCMC
key difference: is simply what you're waiting for:; Lesson 137 — Geometric vs Negative Binomial: Key Differences Lesson 229 — Defining Samples and Statistics
Key factors affecting significance: Lesson 1692 — Statistical Significance and Iteration
Key findings: (2-3 bullet points with numbers); Lesson 1966 — Report Structure and Executive Summary
Key insight: ".; Lesson 1250 — Text Alternatives and Screen Reader Compatibility
Key lesson: Always ask "what else might explain this pattern?"; Lesson 1426 — Real-World Examples: Correlation vs Causation
Key partitioning principles: Lesson 1782 — Spark Performance Basics: Partitions and Caching
Key properties: Lesson 572 — Leverage: Distance in X-Space
Key Results: are 2-5 specific, measurable outcomes that define *how* you'll know you've succeeded.; Lesson 1607 — Introduction to OKRs (Objectives and Key Results)
Key thresholds: Lesson 1629 — SaaS Growth Metrics: Quick Ratio and Net Revenue Retention
Kill the jargon: Replace technical variable names like "churn_propensity_score_v2" with "Customer Risk Level."; Lesson 1958 — Simplifying Visual Complexity
Know your exit options: Sometimes you must escalate to leadership or, in extreme cases, consider **responsible disclosure** or changing roles.; Lesson 1931 — When to Push Back on Requests
Knowledge spreads: Reviewers learn about changes they didn't write; contributors get feedback that improves their skills; Lesson 2022 — Understanding Pull Requests
Known Unknowns: "We don't yet know if historical patterns hold post-merger"; Lesson 2100 — Documenting Assumptions and Open Questions
Known variance structure: The variance function follows directly from the exponential family form; Lesson 670 — Why Exponential Family Matters for GLMs
Kolmogorov-Smirnov: to check if numeric distributions match theoretical ones (like normal distribution).; Lesson 1208 — Distribution Checks for All Variables
Kolmogorov-Smirnov (K-S) test: takes a slightly different approach.; Lesson 206 — Kolmogorov-Smirnov Test
KPSS fails to reject: (high p-value) → Evidence *for* stationarity; Lesson 717 — KPSS Test
KPSS rejects: (low p-value) → Evidence *against* stationarity; Lesson 717 — KPSS Test
KPSS test: High p-value (> 0.; Lesson 718 — Interpreting Stationarity Test Results Lesson 741 — Testing Stationarity After Transformation
Kruskal-Wallis test: The non-parametric cousin of one-way ANOVA, comparing groups using ranks (often read as a comparison of medians when the group distributions have similar shapes); Lesson 470 — When Parametric ANOVA Assumptions Fail
Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test: is a statistical test for stationarity that works *opposite* to the Augmented Dickey-Fuller test you just learned.; Lesson 717 — KPSS Test
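
The Kendall's Tau entry in this section describes counting concordant and discordant pairs across every possible pair of observations. The sketch below shows that counting directly. It computes the simplest variant (tau-a, which makes no adjustment for ties); the function name and the sample data are hypothetical, and which variant the lessons use is not stated here.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Return Kendall's tau-a: (concordant - discordant) / number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    concordant = discordant = 0
    # Compare every pair of observations once.
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x2 - x1) * (y2 - y1)
        if direction > 0:
            concordant += 1  # both variables move the same way
        elif direction < 0:
            discordant += 1  # the variables move in opposite ways
        # direction == 0 is a tie and counts toward neither total
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


x = [1, 2, 3, 4]  # made-up data for illustration
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(f"tau-a = {tau:.3f}")
```

The pairwise loop is quadratic in the number of observations, which is acceptable for a demonstration but slow on large datasets.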

L

L well-represented values: for sensitive attributes.; Lesson 1897 — L-Diversity and T-Closeness
L-Diversity: addresses this by requiring that each equivalence class (group of indistinguishable records) contains at least **L well-represented values** for sensitive attributes.; Lesson 1897 — L-Diversity and T-Closeness
L(θ | data): or **P(data | θ)**; Lesson 1535 — The Likelihood Function
LA's coefficient (say, -5): means LA is 5 units lower than Boston; Lesson 643 — Interpreting Coefficients Relative to Reference
Label bias: happens when subjective human judgment creates inconsistent or skewed labels for supervised learning.; Lesson 1880 — Measurement and Label Bias
Labels: Add text to critical nodes.; Lesson 1319 — Styling Network Visualizations
Labels + Color: Add direct text labels or annotations to clarify what colors represent; Lesson 1251 — Avoiding Reliance on Color Alone
Labels and Titles: Use `set_xlabel()`, `set_ylabel()`, and `set_title()` to give context.; Lesson 1270 — Customizing Axes: Labels, Limits, and Scales
LAG: and **LEAD** window functions make this trivial by letting you "peek" at other rows directly.; Lesson 1023 — Introduction to Window Functions: LAG and LEAD
Lag 1: Correlation between consecutive observations (today vs.; Lesson 719 — What is Autocorrelation? Lesson 720 — The Autocorrelation Function (ACF)
Lag 2: Correlation between observations two steps apart (today vs.; Lesson 719 — What is Autocorrelation? Lesson 720 — The Autocorrelation Function (ACF)
Lag Effect: Lesson 756 — Limitations of Moving Averages
Lag k: Correlation at any time lag *k*; Lesson 719 — What is Autocorrelation?
Lagging: Ultimate business metrics (revenue per user, customer lifetime value); Lesson 1601 — Balancing Leading and Lagging Metrics
Lagging indicators: are outcome-focused metrics that tell you *what has already happened*.; Lesson 1597 — What Are Leading and Lagging Indicators?
Lagging metrics provide accountability: They're your scoreboard, measuring final outcomes.; Lesson 1601 — Balancing Leading and Lagging Metrics
lambda (λ): , called the **rate parameter**.; Lesson 139 — The Poisson Process and Rate Parameter Lesson 214 — Box-Cox Transformation
Language-agnostic: The same DataFrame operations work in Python, Scala, R, and Java; Lesson 1778 — DataFrames and Spark SQL Basics
Laplace mechanism: adds noise drawn from a Laplace distribution.; Lesson 1899 — Adding Noise for Privacy
large: , you fail to reject—the equal variance assumption seems reasonable.; Lesson 380 — Testing Equal Variances: Levene's and Bartlett's Tests Lesson 1871 — Why Version Control for Data? Lesson 2070 — Separating Data from Code
Large Cook's Distance: Point substantially changes the regression line; Lesson 578 — Visualizing Leverage and Influence
Large data files: store these separately (cloud storage, data lakes); Lesson 1996 — The .gitignore File
Large datasets: where finding exact twins is feasible; Lesson 1446 — Exact Matching
Large effect: d ≈ 0.; Lesson 385 — Cohen's d for Standardized Mean Differences Lesson 386 — Effect Size Interpretation Guidelines Lesson 429 — Effect Size: Cramér's V and Phi
Large magnitude values: (typically |residual| > 2 or 3): potential outliers or poorly-fit observations; Lesson 701 — Deviance Residuals
Large p-value: (≥ 0.; Lesson 606 — Statistical Significance of Individual Coefficients
Large p-value (≥ 0.05): Weak evidence.; Lesson 619 — Interpreting the F-Statistic and P-Value
Large p-value (e.g., 0.40): Your data isn't unusual at all under H₀.; Lesson 318 — What is a P-Value?
large sample sizes: , even tiny, meaningless slopes become statistically significant.; Lesson 529 — Practical vs Statistical Significance Lesson 1386 — IQR Method vs Z-Score: When to Use Each
large samples: (n > 5000): Tests often reject normality due to trivial deviations that won't affect your downstream analyses.; Lesson 209 — Sample Size Considerations in Normality Tests Lesson 550 — Normality of Residuals
Large Standard Errors: Lesson 581 — Symptoms of Multicollinearity
Large tables: Retrieving unnecessary columns wastes bandwidth and memory; Lesson 851 — Selecting All Columns with Asterisk
Large λ: (e.; Lesson 1403 — CUSUM and EWMA Charts
Larger sample size: → Narrower margin (more data gives better estimates); Lesson 294 — Margin of Error and Its Components
larger sample sizes: and is particularly sensitive to differences in the middle of the distribution.; Lesson 206 — Kolmogorov-Smirnov Test Lesson 630 — Bayesian Information Criterion (BIC) Lesson 1482 — Control and Treatment Design
Larger ε: Weaker privacy (less noise, more accuracy); Lesson 1898 — Differential Privacy Fundamentals
Last Non-Direct: Add `WHERE channel !; Lesson 1725 — Implementing Single-Touch Attribution
Last Non-Direct Click: would credit: LinkedIn Ad; Lesson 1722 — Last Non-Direct Click Attribution Lesson 1723 — Comparing Single-Touch Models
Last Non-Direct Click Attribution: credits the last touchpoint in the customer journey *before* conversion, **excluding any direct traffic**.; Lesson 1722 — Last Non-Direct Click Attribution
Last touch: (30%) – The final interaction before purchase; Lesson 1730 — W-Shaped Attribution Model
Last touch matters: Something convinced them to finally convert; Lesson 1729 — Position-Based (U-Shaped) Attribution
Last-touch: Credit goes entirely to the final action before conversion; Lesson 1637 — What is Metric Attribution? Lesson 1722 — Last Non-Direct Click Attribution Lesson 1724 — Limitations of Single-Touch Attribution
last-touch attribution: .; Lesson 1721 — Last-Touch Attribution Model Lesson 1723 — Comparing Single-Touch Models Lesson 1725 — Implementing Single-Touch Attribution
Latency matters: Fraud detection must happen in seconds, not overnight; Lesson 1788 — Streaming Data and Real-Time Requirements
Latency periods: for disease incubation; Lesson 179 — When Variables Are Log-Normally Distributed
Latitude: measures north-south position from the equator (0°) to the poles (±90°).; Lesson 1308 — Geographic Data Types and Coordinate Systems
Law of Total Probability: lets you split the sample space into separate, non-overlapping scenarios (a partition), calculate the probability within each scenario, and then add them up to get your answer.; Lesson 90 — The Law of Total Probability Lesson 97 — Law of Total Probability
Lawful basis for processing: You can't just collect data because it's convenient—you need explicit consent or a legitimate legal reason; Lesson 1904 — What is GDPR and Why It Matters
Lawfulness, Fairness, and Transparency: Lesson 1905 — Core Principles of GDPR
Layer in supporting evidence: Once the headline lands, show the *why*—perhaps a single clear visualization with strong annotation (lessons 1960-1961).; Lesson 1965 — Progressive Disclosure Techniques
Layer your information: .; Lesson 1967 — Writing Clear and Concise Analysis Sections
Lazy loading: means only computing what's currently visible, deferring expensive operations until absolutely necessary.; Lesson 1337 — Dashboard Performance and Caching
LEAD: window functions make this trivial by letting you "peek" at other rows directly.; Lesson 1023 — Introduction to Window Functions: LAG and LEAD
Lead conversion: (30%) – The moment a prospect becomes a qualified lead (e.; Lesson 1730 — W-Shaped Attribution Model
Lead generation: Form submission or demo request; Lesson 1686 — Defining Conversions and Conversion Rate
Lead with findings: Put results before methodology when possible; Lesson 1967 — Writing Clear and Concise Analysis Sections
Leading: Surrogate metrics (click-through rate, engagement time, sign-up rate); Lesson 1601 — Balancing Leading and Lagging Metrics
leading indicator: that correlates strongly with your actual business goal but can be measured sooner, more frequently, or with less noise.; Lesson 1517 — Surrogate Metrics: When Direct Measurement is Impractical Lesson 1604 — What is a North Star Metric? Lesson 1605 — Characteristics of Good North Star Metrics Lesson 1628 — SaaS Metrics: MRR, ARR, and Logo Churn Lesson 1632 — Financial Services Metrics: AUM, NIM, and Credit Metrics
Leading indicators: are predictive, forward-looking metrics that signal *what is likely to happen in the future*.; Lesson 1597 — What Are Leading and Lagging Indicators?
Leading indicators of disengagement: are behavioral signals that precede actual churn—like smoke before fire.; Lesson 1700 — Leading Indicators of Disengagement
Learn & Repeat: Use insights to generate next hypothesis; Lesson 1692 — Statistical Significance and Iteration
Learn strategically: by identifying skill gaps; Lesson 34 — Recognizing Boundaries of Competence
Learning the structure: Analyze the original data's distributions, correlations, and statistical properties; Lesson 1901 — Synthetic Data Generation
least squares criterion: says: choose the line that minimizes the **sum of squared residuals**.; Lesson 517 — The Least Squares Criterion Lesson 518 — Deriving the Least Squares Estimators
Left censoring: occurs when you know an event *has already occurred* before your observation period began, but you don't know exactly when.; Lesson 805 — Left and Interval Censoring
LEFT JOIN: Returns **all** rows from the left table, plus matching rows from the right (or NULL if no match); Lesson 928 — LEFT JOIN vs INNER JOIN: When to Use Each Lesson 936 — FULL OUTER JOIN Syntax Lesson 946 — Self-Joins for Hierarchical Data
Left pane: Your changes (current branch); Lesson 2019 — Using Diff Tools for Conflict Resolution
Left tail: α/2 (e.; Lesson 346 — Two-Tailed Tests: Testing for Any Difference
left to right: (though the optimizer may reorder them internally).; Lesson 950 — Chaining Multiple Joins Lesson 952 — Mixing Join Types
Left-only rows: Right-side columns are NULL; Lesson 937 — Identifying Matched vs Unmatched Rows
Left-skewed (negative skew): A long tail to the left; most values cluster high (e.; Lesson 1175 — Histograms for Distribution Shape
Legacy enterprise systems: → XML; Lesson 22 — File Formats: CSV, JSON, and Beyond
Legacy systems: Some older SQL environments don't support CTEs; Lesson 974 — When to Use FROM Subqueries vs CTEs
Legal compliance: – Are you following laws like GDPR, CCPA, or HIPAA that govern data use in different regions and industries?; Lesson 36 — Responsible Data Sourcing and Use Lesson 2062 — Why Data Source Documentation Matters
Legal Obligation: Lesson 1906 — Legal Bases for Processing Personal Data
Legend interactivity: to show/hide data series by clicking; Lesson 1300 — Creating Basic Interactive Charts with Plotly Express
Legends: identify what different visual elements represent—especially crucial when you have multiple lines, colors, or groups.; Lesson 1271 — Adding Legends, Annotations, and Text
Legitimate Interests: Lesson 1906 — Legal Bases for Processing Personal Data
Lends: an available connection when your code requests one; Lesson 1092 — Connection Pooling Basics
LENGTH: measures how many characters are in a string; Lesson 1044 — String Manipulation: CONCAT, LENGTH, and SUBSTRING Lesson 1232 — Perceptual Accuracy Hierarchy Lesson 1238 — Matching Encoding to Data Type
Length of stay (LOS): tracks average days hospitalized.; Lesson 1633 — Healthcare Metrics: Patient Outcomes and Operational Efficiency
Lengthening Time-to-Return: The gap between visits grows longer.; Lesson 1700 — Leading Indicators of Disengagement
Leptokurtic: (kurtosis > 3 or excess kurtosis > 0): Heavy tails and a sharp peak.; Lesson 66 — Kurtosis: Definition and Interpretation
Less effective when: Lesson 1727 — Linear Attribution Model
Less SQL boilerplate: You write Python code, not SQL strings; Lesson 1117 — What is an ORM and Why Use It?
Less typing: You save keystrokes, reducing errors and speeding up query writing.; Lesson 924 — Using Table Aliases in Joins
Lesson 804: , you learned about right censoring (when someone drops out before the event happens).; Lesson 805 — Left and Interval Censoring
Let supporting details orbit: around these three points, but never introduce a fourth major message; Lesson 1940 — The Rule of Three in Data Storytelling
level: of the series; Lesson 740 — Choosing Between Differencing and Detrending Lesson 765 — Introduction to Holt-Winters Method Lesson 767 — Holt-Winters Additive Model Lesson 770 — Initializing Holt-Winters Components Lesson 771 — Forecasting with Holt-Winters
Level (L₀): Lesson 770 — Initializing Holt-Winters Components
Level A: Minimum accessibility (e.; Lesson 1254 — Testing Visualizations for Accessibility
Level AA: Recommended target (e.; Lesson 1254 — Testing Visualizations for Accessibility
Level AAA: Enhanced accessibility (e.; Lesson 1254 — Testing Visualizations for Accessibility
Level equation: The current baseline value, adjusted for trend; Lesson 761 — Double Exponential Smoothing (Holt's Method) Lesson 767 — Holt-Winters Additive Model Lesson 768 — Holt-Winters Multiplicative Model
Level shifts: Sudden jumps to a new baseline that persists; Lesson 715 — Visual Tests for Stationarity
Levene's test: or the **F-test** (covered in lesson 363), or simply inspect side-by-side boxplots.; Lesson 379 — The Assumption of Equal Variances (Homoscedasticity) Lesson 380 — Testing Equal Variances: Levene's and Bartlett's Tests
Leverage: refers to an observation's *position* in the predictor space—specifically, how far its X-value is from the mean of all X-values.; Lesson 571 — What Are Leverage and Influence? Lesson 573 — Calculating and Interpreting Hat Values Lesson 574 — Influence: Impact on Fitted Model Lesson 575 — Cook's Distance
Leverage associations: Use culturally familiar color meanings: red for danger/stop/negative, green for go/positive, blue for neutral/calm.; Lesson 1961 — Color as Communication Tool
License: The legal terms under which you can use and share the data.; Lesson 2063 — Essential Metadata to Capture
Lightweight artifacts: Small CSVs of feature importance, confusion matrices, or performance metrics belong in version control; Lesson 2034 — Committing Data Artifacts and Model Outputs
Likelihood: If someone has the disease, how likely is a positive test?; Lesson 107 — Bayes' Theorem Formula and Components Lesson 682 — Maximum Likelihood Estimation in Logistic Regression Lesson 697 — Deviance: A Measure of Model Fit Lesson 1417 — Bayesian Change-Point Detection Lesson 1550 — What Are Conjugate Priors?Lesson 1566 — Conjugate Normal-Normal Model Lesson 1594 — PyMC: Probabilistic Programming in Python
Likelihood P(Evidence | Guilty): probability of seeing this evidence if guilty; Lesson 112 — Legal Evidence and Jury Reasoning
Likelihood Ratio Test: compares two **nested models**—where one model (the simpler one) is a special case of the other (the more complex one).; Lesson 699 — The Likelihood Ratio Test Lesson 791 — Comparing Nested and Non-Nested Models Lesson 830 — Testing Coefficient Significance
likelihood ratio test (LRT): compares two nested models by examining how well each explains the data.; Lesson 628 — Likelihood Ratio Tests Lesson 684 — Likelihood Ratio Tests for Model Comparison
Likelihood: P(B|A): How probable the evidence B is *if* A is true; Lesson 107 — Bayes' Theorem Formula and Components
Limit CTE reuse: If you reference a CTE many times, consider a temp table instead; Lesson 997 — CTE Best Practices and Performance
Limit result set size: Filter rows in the outer query before applying expensive subqueries; Lesson 969 — Performance Considerations for SELECT Subqueries
Limit your palette: Too many colors create cognitive overload—your audience spends mental energy decoding the legend instead of understanding your insight.; Lesson 1961 — Color as Communication Tool
Limitation: It doesn't give you a true likelihood, so some model comparison tools (like AIC) won't work.; Lesson 694 — Quasi-Poisson and Negative Binomial Models Lesson 1226 — Stacked and Grouped Bar Charts Lesson 1744 — Incrementality vs Attribution Lesson 1767 — Scale-Up vs Scale-Out Architectures
Limitations and confidence levels: honesty builds trust; Lesson 2091 — Stage 7: Communication and Handoff
Limitations and uncertainties: Where might the model fail?; Lesson 1917 — Transparency in Analysis and Models
Limited control: You can't influence outcomes that already crystallized; Lesson 1617 — The Danger of Lagging-Only Metrics
Limited flexibility: Your prior beliefs must fit the conjugate family's shape, even if reality suggests otherwise; Lesson 1555 — Advantages and Limitations of Conjugate Priors
Limits: the total number of concurrent connections to prevent overwhelming the database; Lesson 1092 — Connection Pooling Basics
Limits and breaks: controlling what range displays and where tick marks appear; Lesson 1344 — Scales and Coordinate Systems
Line charts: Showing trends over time (monthly revenue, daily user counts); Lesson 1959 — Choosing Familiar Chart Types
Line style + Color: Vary dashed, dotted, and solid lines in addition to color; Lesson 1251 — Avoiding Reliance on Color Alone
Line Styles: control how lines appear:; Lesson 1272 — Colors, Markers, and Line Styles
Line type: `linetype` — solid, dashed, dotted; Lesson 1341 — Data and Aesthetic Mappings
lineage: information, so if a partition fails, it can rebuild just that piece—providing fault tolerance without constant replication overhead.; Lesson 1774 — What is Apache Spark and Why Use It? Lesson 1871 — Why Version Control for Data?
linear: relationships.; Lesson 476 — What is Pearson Correlation? Lesson 477 — Interpreting the Correlation Coefficient Lesson 680 — The Logit Link Function and Odds Lesson 1196 — Dimensionality Reduction for Visualization
Linear decay: Attribution drops steadily over time (e.; Lesson 1639 — Time Windows and Attribution Decay
Linear interpolation: draws an imaginary line between the 7th and 8th values and picks the point halfway between them.; Lesson 58 — Calculating Percentiles: Methods and Algorithms
Linear pattern: Points should roughly follow a straight line, not a curve; Lesson 480 — Scatterplots and Visual Assessment
Linear scalability: Add more nodes, get proportionally more capacity.; Lesson 1771 — Shared-Nothing Architecture
Linearity: Lesson 546 — The Five Core Assumptions of Linear Regression Lesson 557 — The Residuals vs Fitted Values Plot Lesson 601 — Assumptions for Multiple Linear Regression
Linearity assumption: Are patterns randomly scattered, or do residuals show curves that suggest a non-linear relationship?; Lesson 544 — The Role of Residuals in Diagnostics Lesson 547 — Linearity: The Relationship Must Be Linear Lesson 558 — Identifying Non-Linearity in Residual Plots
Lines: (`geom_line`) connecting observations in sequence; Lesson 1342 — Geometric Objects (geoms)
Linestyle: controls whether your line is solid, dashed, dotted, or dash-dotted.; Lesson 1258 — Customizing Lines: Colors, Styles, and Markers
link function: transforms the expected value of your response variable so it can be modeled with a linear predictor.; Lesson 671 — What is a Link Function? Lesson 672 — The Identity Link Lesson 690 — The Poisson Distribution as a GLM
Link to business costs: "Type I vs Type II errors" becomes "cost of investigating false alarms vs cost of missing real problems"; Lesson 2105 — Translating Between Technical and Business Language
Linked selections: Selecting points in one plot highlights them in others; Lesson 1304 — Subplots and Linked Interactions
List all possible outcomes: and their probabilities; Lesson 152 — Decision Making Under Uncertainty
List relevant variables: in your research question; Lesson 1469 — Building a Simple Causal DAG
List required features: What specific variables does your analysis need?; Lesson 2098 — Identifying Data Availability Gaps Early
Live: Build in pauses for questions.; Lesson 1957 — Adapting Delivery Format: Live vs Async
Live presentations: Keep slides sparse.; Lesson 1957 — Adapting Delivery Format: Live vs Async
Ljung-Box test: is a formal hypothesis test that checks whether residuals show significant autocorrelation at multiple lags at once.; Lesson 783 — Ljung-Box Test for Residual Independence Lesson 799 — Fitting and Diagnosing SARIMA Models
Load: only clean, aggregated, ready-to-query data into the warehouse; Lesson 1817 — Historical Context: Why ETL Came First
Load balancing: assigning work so no worker sits idle while others are overloaded; Lesson 1769 — Task Parallelism and Work Distribution
Load raw data: into your cloud warehouse for most sources; Lesson 1821 — Hybrid Approaches and Modern Data Stacks
Loading: raw data directly into staging tables in the warehouse; Lesson 1816 — What is ELT? Extract, Load, Transform Explained
Loans: table tracks who borrowed what; Lesson 1051 — Introduction to Foreign Keys
Local branches: that exist only on your machine; Lesson 2020 — The Golden Rule of Rebase
Local control: Changes in one region don't affect distant regions; Lesson 662 — Polynomial Features vs Splines
Locks: Constraints can hold locks longer, blocking concurrent operations; Lesson 1060 — Trade-offs: Performance vs Integrity
log: link); Lesson 671 — What is a Link Function? Lesson 678 — Choosing the Right Link Function
log link: solves this by connecting the linear predictor to the expected outcome through a logarithm.; Lesson 675 — The Log Link Lesson 677 — Interpreting Coefficients Under Different Links Lesson 678 — Choosing the Right Link Function Lesson 690 — The Poisson Distribution as a GLM
Log transformation: If log-transformed data looks normal, consider log-normal.; Lesson 193 — Choosing Between Distributions in Practice Lesson 212 — Log Transformations Lesson 591 — When and Why to Transform Variables
Log transformation of X: If you modeled `Y = β₀ + β₁log(X)`, then β₁ represents the change in Y when X is *multiplied* by some factor (like doubling).; Lesson 594 — Interpreting Models After Transformation
Log transformation of Y: If you modeled `log(Y) = β₀ + β₁X`, the coefficient β₁ represents the *proportional* change in Y.; Lesson 594 — Interpreting Models After Transformation
Log-normal: suits variables that are products of many small multiplicative factors—like incomes, stock prices, or city sizes.; Lesson 193 — Choosing Between Distributions in Practice
log-odds: Lesson 673 — The Logit Link Lesson 677 — Interpreting Coefficients Under Different Links Lesson 680 — The Logit Link Function and Odds Lesson 681 — Interpreting Logistic Regression Coefficients Lesson 686 — Assumptions and Diagnostics in Logistic Regression
log-rank test: is the most common statistical test for answering this question.; Lesson 818 — What is the Log-Rank Test? Lesson 823 — Log-Rank Test vs Other Tests Lesson 836 — Employee Turnover and Retention Analysis
Log-rank tests: to compare response timing across different campaign variants; Lesson 841 — Campaign Response Time Analysis
Logical consistency: Lesson 1211 — Domain Validation and Sanity Checks
Logically Connected: Every recommendation must flow directly from your analysis.; Lesson 1970 — Recommendations and Next Steps
Login frequency: (weight: 0.; Lesson 1699 — Engagement Scoring Systems
Logistic regression: is designed specifically for binary outcomes.; Lesson 679 — Logistic Regression Setup and the Binary Response Lesson 1447 — Propensity Score: Concept and Estimation Lesson 1674 — Churn Prediction Models
logit: link); Lesson 671 — What is a Link Function? Lesson 674 — The Probit Link Lesson 678 — Choosing the Right Link Function
logit link: bridges this gap.; Lesson 673 — The Logit Link Lesson 674 — The Probit Link Lesson 677 — Interpreting Coefficients Under Different Links
logit link function: you just learned transforms these probabilities into a scale where linear modeling works, then transforms back to give valid probabilities.; Lesson 679 — Logistic Regression Setup and the Binary Response Lesson 680 — The Logit Link Function and Odds
Logo churn: counts *how many customers* cancel (e.; Lesson 1628 — SaaS Metrics: MRR, ARR, and Logo Churn
Long flat sections: Periods with no events or only censored observations; Lesson 815 — Survival Curve Plots and Interpretation
Long format: (the tidy version) stacks these observations vertically, using one column for the variable name (`month`) and another for its value (`sales`).; Lesson 1144 — Common Violations: Wide vs Long Format
Long-term business viability: A 2% floor vs 20% changes unit economics dramatically; Lesson 1658 — Flattening and Asymptotic Behavior
Long, complex journeys: where no single touchpoint dominates; Lesson 1727 — Linear Attribution Model
Longer-term fluctuations: tied to economic or business cycles, but *without* a fixed period.; Lesson 705 — The Four Classical Components
Longitude: measures east-west position from the Prime Meridian (0°) through ±180°.; Lesson 1308 — Geographic Data Types and Coordinate Systems
LOO: (Leave-One-Out cross-validation) to compare them; Lesson 1596 — Posterior Predictive Checks and Model Comparison
Look for: Lesson 701 — Deviance Residuals
Look for confounders: What third factor might drive both metrics?; Lesson 1615 — Correlation Without Causation
Look for patterns: Are curves converging?; Lesson 1659 — Comparing Retention Across Cohorts
Look for subgroup patterns: Split your data by the suspected confounder—does the treatment-outcome relationship change or reverse?; Lesson 1429 — Identifying Confounders in Practice
Looks unprofessional: in serious analytical contexts; Lesson 1963 — Removing Chartjunk
Lookups: Fetching related data without a JOIN; Lesson 967 — Subqueries in the SELECT Clause
Losing credibility: Decision-makers can't tell where facts end and opinions begin; Lesson 1927 — Separating Analysis from Advocacy
Love plots: (or balance plots) display SMDs before and after matching, making it easy to see which covariates improved and which remain problematic.; Lesson 1450 — Assessing Balance After Matching
Low baseline rate example: If your current conversion is 2%, improving to 3% (a 50% relative lift!; Lesson 1499 — Adjusting for Baseline Conversion Rates
Low p-value (< α): Your observed frequencies differ significantly from expected frequencies.; Lesson 420 — Interpreting Chi-Squared Test Results
Low p-value (≤ 0.05): Warning!; Lesson 787 — Ljung-Box Test for Residual Autocorrelation
Low statistical power: You might miss real effects (false negatives).; Lesson 1493 — Why Sample Size Matters in A/B Tests
Lower alpha: (e.; Lesson 1409 — Setting Detection Parameters
Lower alpha (0.01): Like a less-sensitive detector.; Lesson 334 — Setting Alpha: Choosing Your Significance Level
Lower bound only: Mean - t*(SE); Lesson 275 — One-Sided Confidence Bounds
Lower boundary: Q1 - 1.; Lesson 1384 — The IQR Outlier Detection Rule
Lower confidence: to 95% (slightly riskier, smaller sample needed); Lesson 295 — Trade-offs: Precision, Confidence, and Cost
Lower confidence (e.g., 90%): Narrower intervals, but less reliable method; Lesson 267 — Interpreting Confidence Levels
Lower Control Limit (LCL): Typically 3 standard deviations below the mean; Lesson 1396 — Introduction to Control Charts Lesson 1397 — Shewhart Control Chart Basics Lesson 1398 — Control Charts for Means (X-bar Charts)
Lower fence: = Q1 - (1.; Lesson 72 — IQR Method and Tukey's Fences Lesson 1385 — Calculating IQR Fences in Practice
Lower is better: The model with the smallest AIC or BIC is preferred; Lesson 781 — Information Criteria: AIC and BIC
Lower peak: The center is slightly flatter than a normal curve; Lesson 352 — The t-Distribution and Degrees of Freedom
Lower threshold (A): Based on acceptable Type II error (β, false negative rate); Lesson 1511 — Sequential Probability Ratio Test (SPRT)
Lower values are better: they indicate a superior balance of fit and simplicity.; Lesson 785 — Information Criteria: AIC and BIC
Lower variance/standard deviation: = outcomes cluster tightly around the expected value, more predictable; Lesson 148 — Variance and Standard Deviation of Discrete Distributions
Lower λ: means events happen less frequently → longer waiting times; Lesson 164 — The Exponential Distribution
LTV:CAC ratio: divides lifetime value by customer acquisition cost (CAC) to reveal whether you're spending wisely.; Lesson 1667 — LTV:CAC Ratio and Profitability Lesson 1669 — LTV Segmentation and Targeting Lesson 1756 — LTV:CAC Ratio as a Health Metric
LTV:CAC ratios: Lesson 1715 — Comparing Channel Performance
Luminance: (or lightness/value) is how bright or dark the color appears, from near-black to near-white.; Lesson 1234 — Color: Hue, Saturation, and Luminance
lurking variable: also called a **hidden confounder**.; Lesson 497 — The Third Variable Problem Lesson 1423 — The Third Variable Problem
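
Worked illustration for the "Log transformation of X" and "Log transformation of Y" entries above. This is a minimal sketch, not taken from the lessons: the coefficients `B0` and `B1` are hypothetical, and the natural logarithm is assumed. It checks numerically that in `log(Y) = β₀ + β₁X` a one-unit rise in X multiplies Y by exp(β₁), and that in `Y = β₀ + β₁log(X)` doubling X adds β₁·ln(2) to Y.

```python
import math

# Hypothetical coefficients, chosen only for illustration.
B0, B1 = 1.0, 0.05


def log_y_model(x: float) -> float:
    """Y implied by log(Y) = B0 + B1 * X (natural log assumed)."""
    return math.exp(B0 + B1 * x)


def log_x_model(x: float) -> float:
    """Y implied by Y = B0 + B1 * log(X) (natural log assumed)."""
    return B0 + B1 * math.log(x)


def y_ratio_per_unit_x(beta1: float) -> float:
    """Log-transformed Y: factor applied to Y when X rises by one unit."""
    return math.exp(beta1)


def y_change_when_x_scaled(beta1: float, factor: float) -> float:
    """Log-transformed X: amount added to Y when X is multiplied by `factor`."""
    return beta1 * math.log(factor)


# Ratio of Y at X = 11 versus X = 10 under the log(Y) model.
ratio = log_y_model(11.0) / log_y_model(10.0)

# Difference in Y when X doubles from 10 to 20 under the log(X) model.
delta = log_x_model(20.0) - log_x_model(10.0)

print("multiplier on Y per unit of X:", ratio, y_ratio_per_unit_x(B1))
print("change in Y when X doubles:", delta, y_change_when_x_scaled(B1, 2.0))
```

For small β₁, exp(β₁) − 1 is close to β₁, which is why the coefficient is often read directly as an approximate proportional change in Y.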

M

M_X(t) = E[e^(tX)]: Lesson 150 — Moment Generating Functions
MA process: ACF cuts off sharply; PACF decays gradually; Lesson 731 — PACF for AR Process Identification
MA(1): Uses only the most recent error; Lesson 775 — Moving Average (MA) Models Lesson 777 — Identifying MA Order (q) Using ACF
MA(2): Uses the two most recent errors; Lesson 775 — Moving Average (MA) Models Lesson 777 — Identifying MA Order (q) Using ACF
MA(q): process shows **gradual exponential decay** or a damped sinusoidal pattern in the PACF—no clean cutoff.; Lesson 732 — PACF Patterns for Common Models Lesson 775 — Moving Average (MA) Models Lesson 777 — Identifying MA Order (q) Using ACF
Machine failures: For certain systems, past survival time doesn't reduce future failure risk; Lesson 167 — Memoryless Property of Exponential
Machine Learning (ML): These are techniques that let computers find patterns and make predictions automatically.; Lesson 7 — The Data Science Skill Stack
Machine Learning Feature Scaling: Many algorithms (like k-nearest neighbors or neural networks) perform better when features are standardized to similar ranges.; Lesson 201 — Z-Score Applications and Limitations
Machine learning methods: (Isolation Forest, autoencoders) for complex multivariate patterns; Lesson 1411 — Applications and Limitations
MAD: (Mean Absolute Deviation) is useful when you want interpretability similar to standard deviation but with less sensitivity to outliers.; Lesson 54 — When to Use Each Measure
MAD (Median Absolute Deviation): instead of standard deviation (a robust measure of spread you learned earlier); Lesson 73 — Modified Z-Score Using MAD
MAE (Mean Absolute Error): Average of absolute differences; easy to interpret in original units; Lesson 790 — Out-of-Sample Forecast Evaluation
magnitude: of the difference between proportions.; Lesson 413 — Effect Size and Practical Significance Lesson 637 — Interpreting Dummy Variable Coefficients
Mahalanobis distance: measures how far a point is from the center of a multivariate distribution, accounting for correlations between variables.; Lesson 74 — Multivariate Outlier Detection Lesson 1381 — Multivariate Z-Score Methods
Main branch (`main`): Your production-ready, validated code.; Lesson 2035 — Branching Strategies for Experiments
Main effect of degree: the intercept difference between groups; Lesson 652 — Interpreting Categorical × Continuous Interactions
Main effect of experience: the baseline slope (for the reference group, no degree); Lesson 652 — Interpreting Categorical × Continuous Interactions
Main effects: test whether each factor matters *on its own*, averaging across all levels of the other factor.; Lesson 464 — Main Effects in Two-Way ANOVA Lesson 465 — Interaction Effects Lesson 1689 — Multivariate Testing and Personalization
Main ingredients (geom layers): Points, lines, bars added with `geom_*()` functions; Lesson 1347 — Understanding Layers in ggplot2
Maintain integrity: by being transparent about what you can and cannot deliver; Lesson 34 — Recognizing Boundaries of Competence
Maintain predictive power: Often little loss in R-squared; Lesson 585 — Remedies: Variable Selection
Maintain Specification Files: Lesson 2046 — Best Practices for Environment Management in Teams
Maintain valid α: and power guarantees; Lesson 1510 — Sequential Testing Overview
Maintainability: Adding or reordering parameters won't break your code; Lesson 1106 — Parameter Placeholders: Named Parameters
Maintenance: Rule-based systems are trivial to update.; Lesson 2123 — Simple Rules Beat Complex Models
Maintenance burden: Outdated code may break when dependencies change, triggering false alarms; Lesson 2135 — Dead Experimental Code and Feature Sprawl
Maintenance scheduling: When k > 1, rising hazard rates signal when preventive maintenance is cost-effective; Lesson 188 — Weibull Distribution: Hazard Function and Reliability
Make a Decision: Lesson 447 — Conducting One-Way ANOVA in Practice
Make better decisions: React to actual changes, not predictable cycles; Lesson 748 — Seasonally Adjusted Data
Make decisions: Knowing the typical outcome helps guide business choices and predictions; Lesson 38 — What is Central Tendency? Lesson 2121 — Timeboxing and Deadlines
Make decisions faster: on average; Lesson 1510 — Sequential Testing Overview
Make probabilistic statements: Calculate P(μ₁ > μ₂ | data) or credible intervals for δ; Lesson 1570 — Comparing Two Means: Bayesian Approach
Makes everything positive: no more negative values, so you're only looking at spread magnitude; Lesson 560 — Scale-Location Plot (Spread-Location Plot)
Making findings accessible: Translate technical metrics into business language.; Lesson 2090 — Stage 6: Interpretation and Insight Generation
Making multiplicative relationships additive: Easier to model and interpret; Lesson 212 — Log Transformations
Manager: Lead a team (4-8 people), conduct 1-on-1s, handle performance reviews, remove blockers; Lesson 2140 — Individual Contributor vs Management Tracks
Manages memory: by processing chunks sequentially when needed; Lesson 1790 — What is Dask and When to Use It
Managing execution order: through task graphs or workflow engines; Lesson 1769 — Task Parallelism and Work Distribution
Mann-Whitney U test: (also called the Wilcoxon Rank-Sum test) offers a robust alternative to the two-sample t-test.; Lesson 393 — Mann-Whitney U Test (Wilcoxon Rank-Sum)
Manual entry: Recording observations or measurements; Lesson 11 — Data Collection and Acquisition
Manual line-by-line parsing: Lesson 1141 — Recovering from Corrupted or Partially Broken Data
Manual transformations: (typical Python workflow); Lesson 1373 — Statistical Transformations: Built-in vs Manual
Manually edit: Choose which changes to keep, or combine them, then remove the conflict markers; Lesson 2018 — Resolving Conflicts During Rebase
Manufacturing: Predicting when machines need maintenance before they break; Lesson 6 — Common Data Science Applications Lesson 1412 — What is Change-Point Detection?
Manufacturing Defects: A production line averages 2 defects per 1,000 units.; Lesson 144 — Poisson Applications: Arrivals and Events
Map: and **Reduce**.; Lesson 1770 — The MapReduce Programming Model Lesson 1827 — Transformation Patterns: Map, Filter, Aggregate
Map common path variations: use path analysis or Sankey diagrams to visualize popular alternate routes; Lesson 1683 — Multi-Path and Non-Linear Funnels
Map projections: are mathematical transformations that flatten the globe onto a plane—like peeling an orange and trying to lay the peel flat.; Lesson 1308 — Geographic Data Types and Coordinate Systems
Map secondary applications: What could someone build *on top of* your work that causes harm?; Lesson 1924 — Red Team Thinking for Data Scientists
MAR (Missing at Random): Missingness relates to *observed* data (e.; Lesson 1207 — Missing Data Assessment and Strategy
marginal effect: the change in your outcome variable when that predictor increases by one unit, *while holding all other predictors constant*.; Lesson 604 — Marginal Effects and Ceteris Paribus Lesson 659 — Interpreting Polynomial Regression Coefficients
marginal likelihood: is the denominator that normalizes the posterior distribution.; Lesson 1536 — The Evidence (Marginal Likelihood) Lesson 1546 — The Role of the Normalizing Constant
Markdown text: Plain text with simple formatting (headers, lists, bold, italics); Lesson 1983 — R Markdown for Dynamic Reports
Marker size reduction: Smaller points reduce overlap; Lesson 1310 — Point Maps and Scatter Plots on Maps
Marker styles: change the shape of each point (circles, squares, triangles, etc.); Lesson 1265 — Scatter Plots: Relationships Between Variables
Markers: are symbols that appear at each data point along your line.; Lesson 1258 — Customizing Lines: Colors, Styles, and Markers Lesson 1272 — Colors, Markers, and Line Styles
Market or product changes: Launching in new markets, adding product lines, or facing competitive threats requires rethinking which metrics matter most and how they interconnect.; Lesson 1626 — Maintaining and Evolving Metric Trees
Marketing: Identifying which customers are likely to cancel subscriptions; Lesson 6 — Common Data Science Applications
Marketing effectiveness: Compare cohorts from different channels; Lesson 1644 — What is Cohort Analysis?
Marketing expenses: ad spend across all channels, content creation, marketing tools and software; Lesson 1753 — Customer Acquisition Cost (CAC): Components and Calculation
Marketplace/Transactional: Lesson 1657 — Day-1, Day-7, Day-30 Benchmarks
Marketplaces: More sellers (treatment) improve selection for all buyers; Lesson 1527 — Ignoring Network Effects
Markov chain: is a sequence of states where the next state depends *only* on the current state, not on how you got there.; Lesson 1589 — Markov Chains: The Foundation of MCMC Lesson 1733 — Markov Chain Attribution Models
Mask volatility: Show a smooth period while hiding the chaos before and after; Lesson 1241 — Cherry-Picking Time Ranges
masking: Lesson 1394 — Sequential Application of Grubbs' Test Lesson 1407 — The ESD Component
Massively scalable compute engines: (BigQuery, Snowflake, Redshift): These could query petabytes of data in seconds, separating storage from compute power; Lesson 1818 — The Rise of ELT: Cloud Storage and Compute
Match: Apply exact matching on these coarsened bins—much easier now!; Lesson 1449 — Coarsened Exact Matching (CEM)
Match regions: Pair test and control geographies with similar historical sales, demographics, and seasonality; Lesson 1746 — Geo-Lift Experiments
Match the question: If you care about *means*, try parametric first.; Lesson 398 — Choosing Between Parametric and Non-Parametric Tests
Match the Response Distribution: Lesson 678 — Choosing the Right Link Function
Match user intent: Steps should reflect meaningful progress, not just technical page loads.; Lesson 1679 — Defining Funnel Steps and Events
Matched Pairs: Lesson 369 — When to Use a Paired t-Test Lesson 435 — McNemar's Test: Paired Categorical Data
Matched records with discrepancies: (e.; Lesson 941 — Use Cases: Data Reconciliation
Matched rows: Both sides have real values (no NULLs in key columns); Lesson 937 — Identifying Matched vs Unmatched Rows
Mathematical convenience: Some priors (called *conjugate priors*) make calculations simpler; Lesson 1534 — The Prior Distribution
Mathematical dependencies: Subtotals should equal their parts.; Lesson 1155 — Consistency Checks Across Fields
Mathematical elegance: Squared terms have smooth derivatives, making it possible to solve for the optimal slope and intercept using calculus.; Lesson 517 — The Least Squares Criterion
Mathematically elegant: If your prior is Beta(α, β) and you observe `k` successes in `n` trials (Binomial likelihood), your posterior is simply Beta(α + k, β + n - k).; Lesson 1551 — Beta-Binomial Conjugacy
Mathematics & Statistics: Lesson 1 — Defining Data Science
Matplotlib: historically had a more technical, MATLAB-inspired look with white backgrounds and primary colors (blue, orange, green).; Lesson 1371 — Default Aesthetics and Design Choices Lesson 1373 — Statistical Transformations: Built-in vs Manual
Matplotlib charts: use `st.; Lesson 1333 — Displaying Charts and Tables in Streamlit
Matplotlib subplots: Best when you need completely different plot types, custom layouts, or fine-grained control over individual panels; Lesson 1372 — Faceting: ggplot2 vs Seaborn and Matplotlib Subplots
Matplotlib's Object-Oriented Interface: treats plotting like building with objects.; Lesson 1370 — Syntax Philosophy: Grammar of Graphics vs Object-Oriented
Matplotlib's subplots: , however, are more imperative and manual.; Lesson 1372 — Faceting: ggplot2 vs Seaborn and Matplotlib Subplots
Matrix plots: Visualize data tables (heatmaps, cluster maps); Lesson 1281 — Introduction to Seaborn's Statistical Plots
Mature businesses: optimize for efficiency—targeting shorter payback (under 12 months), higher ROAS (3x+), and stable CAC.; Lesson 1759 — Optimizing ROAS, CAC, and Payback Together
MAU: tracks monthly uniques.; Lesson 1631 — Social Media Metrics: DAU/MAU and Content Engagement
MAX: used together with **GROUP BY** to create rich summaries of grouped data.; Lesson 892 — GROUP BY with Different Aggregate Functions Lesson 894 — NULL Values in GROUP BY
Maximize ROAS: You'll likely reduce spend, raise CAC (fewer efficient channels), and shorten payback (only safest bets); Lesson 1759 — Optimizing ROAS, CAC, and Payback Together
Maximizing external validity: You recruit diverse students from multiple schools, allow natural variation in implementation, and study real-world conditions.; Lesson 1441 — Internal vs External Validity
Maximizing internal validity: You recruit a highly homogeneous group of students, control every aspect of the environment, use strict protocols, and carefully monitor compliance.; Lesson 1441 — Internal vs External Validity
Maximum: 75°F; Lesson 47 — Range: The Simplest Measure
Maximum (Max): the largest value; Lesson 59 — The Five-Number Summary and Box Plots
maximum likelihood estimation: Lesson 628 — Likelihood Ratio Tests Lesson 780 — ARIMA Model Estimation
Maximum test: Only checks if the *largest* value is an outlier; Lesson 1393 — Two-Sided vs One-Sided Grubbs' Test
McFadden's R²: Lesson 702 — Pseudo R-Squared Measures
McNemar's Test: is specifically designed for situations where:; Lesson 435 — McNemar's Test: Paired Categorical Data Lesson 437 — Applications: Clinical Trials and Market Research
MDE: = minimum detectable effect (absolute difference in means); Lesson 1498 — Sample Size Formulas for Continuous Metrics
mean: (or arithmetic average) is a single number that represents the "center" of a dataset by distributing the total value equally across all observations.; Lesson 39 — The Mean (Arithmetic Average) Lesson 40 — The Median: Middle Value Lesson 42 — Comparing Mean, Median, and Mode Lesson 52 — Mean Absolute Deviation (MAD) Lesson 141 — Mean and Variance of Poisson Distribution Lesson 147 — Expected Value of Discrete Random Variables Lesson 163 — Uniform Distribution: Mean and Variance Lesson 174 — Symmetry and the Mode, Median, Mean (+5 more)
Mean (Expected Value): E(X) = *np*; Lesson 129 — Binomial Mean and Variance Lesson 161 — The Continuous Uniform Distribution
Mean = 0: The center is at zero; Lesson 194 — The Standard Normal Distribution
Mean = 1/λ: The average waiting time is simply the inverse of the rate; Lesson 166 — Exponential Distribution: Mean and Variance
Mean Absolute Error (MAE): on historical data.; Lesson 759 — Choosing the Smoothing Parameter α Lesson 763 — Evaluating Exponential Smoothing Models
Mean of X: Lesson 180 — Parameters and Moments of the Log-Normal
Mean Square Between (MSB): Variance *between* groups; Lesson 443 — Mean Squares and the F-Ratio
Mean Square Within (MSW): Variance *within* groups (also called Mean Square Error, MSE); Lesson 443 — Mean Squares and the F-Ratio
Mean Squared Error (MSE): or **Mean Absolute Error (MAE)** on historical data.; Lesson 759 — Choosing the Smoothing Parameter α Lesson 763 — Evaluating Exponential Smoothing Models
mean squares: (variance estimates) by dividing sum of squares by their respective df.; Lesson 442 — Degrees of Freedom in ANOVA Lesson 443 — Mean Squares and the F-Ratio
Mean time between crashes: 1/0.; Lesson 166 — Exponential Distribution: Mean and Variance
Mean-variance relationship: For Poisson, `Var(Y) = μ` (variance equals the mean); Lesson 690 — The Poisson Distribution as a GLM
Measurability: Quantifiable outcomes; Lesson 1200 — Formulating Specific, Testable Hypotheses
Measurable: What metrics define success?; Lesson 1166 — Defining the Business Question Lesson 1478 — Defining Success Metrics Lesson 1605 — Characteristics of Good North Star Metrics Lesson 2094 — Defining Success Metrics Upfront
Measurable Success Criteria: "We need 70% accuracy" or "Reduce customer churn by 15%"; Lesson 10 — Problem Definition and Scoping
Measure impact: Did that feature launch in March improve Day-30 retention for the March cohort compared to February?; Lesson 1659 — Comparing Retention Across Cohorts
Measure lift: Compare actual outcomes in test regions vs predicted outcomes (based on control region trends); Lesson 1746 — Geo-Lift Experiments
Measure outcomes: Track your KPI for both groups over the same time window; Lesson 1641 — Isolating Effects with Control Groups
Measure the outcome: (purchases, sign-ups, visits) for both groups; Lesson 1747 — Ghost Ads and PSA Tests
Measurement: Salary data that systematically underreports gig economy earnings; Lesson 1878 — What is Bias in Data?
Measurement bias: occurs when your data collection instruments, procedures, or definitions consistently produce inaccurate values.; Lesson 1880 — Measurement and Label Bias
measurement error: Lesson 589 — Deciding Whether to Remove Outliers Lesson 1464 — Instrumental Variables: The Endogeneity Problem
Measurement error in X: Recording mistakes correlate with unpredictable noise; Lesson 553 — Exogeneity: X Must Be Independent of Errors
Measures customer value directly: It reflects how much value customers extract from your product; Lesson 1604 — What is a North Star Metric?
Medcouple-based detection: measures skewness more robustly than traditional methods; Lesson 1388 — Limitations and Alternatives to IQR Detection
Media Mix Modeling (MMM): and **attribution modeling** help marketers understand marketing effectiveness, they examine the problem from fundamentally different angles—like comparing satellite imagery to street-level photography.; Lesson 1736 — MMM vs Attribution: Key Differences
median: income is $35,000 (the middle value—much more representative of the typical person); Lesson 40 — The Median: Middle Value Lesson 42 — Comparing Mean, Median, and Mode Lesson 56 — Understanding Percentiles and Their Interpretation Lesson 73 — Modified Z-Score Using MAD Lesson 174 — Symmetry and the Mode, Median, Mean Lesson 306 — Bootstrap for Non-Standard Problems Lesson 1173 — Numerical Variable Summary Statistics Lesson 1380 — Modified Z-Score Using Median
Median (Q2): the 50th percentile (middle value); Lesson 59 — The Five-Number Summary and Box Plots
Median Absolute Deviation (MAD): (a robust measure of spread); Lesson 1380 — Modified Z-Score Using Median
Median survival times: for each group (from lesson 816); Lesson 817 — Comparing Multiple Survival Curves
Median time-to-conversion: How long until half your prospects convert?; Lesson 839 — Time-to-Conversion in Marketing Funnels
mediator: sits *on* the causal path between treatment and outcome.; Lesson 1471 — Mediators and Colliders Lesson 1476 — Common DAG Patterns and Pitfalls
Mediators: On the path (X → M → Y).; Lesson 1471 — Mediators and Colliders
Medical datasets: excluding or underrepresenting certain populations; Lesson 1881 — Historical and Societal Bias
Medical measurements: (comparing height, weight, and blood pressure on the same scale); Lesson 200 — Comparing Values Across Different Distributions
Medical trials: that only track patients who completed treatment miss those who got too sick to continue; Lesson 247 — Survivorship Bias
Medium effect: d ≈ 0.; Lesson 385 — Cohen's d for Standardized Mean Differences Lesson 386 — Effect Size Interpretation Guidelines Lesson 429 — Effect Size: Cramér's V and Phi
Medium-risk: Automated email campaigns highlighting underused features that correlate with retention; Lesson 1676 — Win-Back and Retention Strategies
Meetups and conferences: offer face-to-face learning and relationship building.; Lesson 2144 — Networking and Community Engagement
Memorable and motivating: Teams should *want* to achieve it; Lesson 1609 — Setting Effective Objectives
Memory efficiency: Parquet/Feather > CSV > JSON > Excel; Lesson 1133 — Performance Considerations Across Formats Lesson 1802 — Filtering During Read with dtype and Converters
memoryless property: when r = 1, but only the geometric distribution is truly memoryless in the classical sense.; Lesson 137 — Geometric vs Negative Binomial: Key Differences Lesson 167 — Memoryless Property of Exponential
Mental Model: Humans naturally think "group by region, then by product" differently than "group by product, then by region"; Lesson 906 — Order Matters: Column Sequence in GROUP BY
Mercator: Preserves angles (useful for navigation) but distorts area dramatically near poles; Lesson 1308 — Geographic Data Types and Coordinate Systems
Merge: creates a new commit that combines two branches, preserving the complete history of both branches.; Lesson 2014 — Understanding Git Rebase vs Merge Lesson 2026 — Merge Strategies: Merge vs Squash vs Rebase
merge conflict: (covered in upcoming lessons).; Lesson 2009 — Three-Way Merges Lesson 2010 — Merge Conflicts: What They Are Lesson 2017 — Understanding Merge Conflicts
Merge joins: are efficient for pre-sorted data or when the database can sort cheaply, advancing through both tables in lockstep.; Lesson 957 — Join Strategies: Nested Loop, Hash, Merge
Mesokurtic: (kurtosis ≈ 3 or excess kurtosis ≈ 0): Matches the normal distribution.; Lesson 66 — Kurtosis: Definition and Interpretation
Message: The explanation you provided with `git commit`; Lesson 1999 — Viewing Commit History
Message passing coordination: Nodes use network protocols to exchange data.; Lesson 1771 — Shared-Nothing Architecture
Messaging apps: Features that change how one person messages affect the recipient; Lesson 1527 — Ignoring Network Effects
Metadata: is "data about data"—the descriptive information that explains what each piece of data means.; Lesson 23 — Data Provenance and Metadata Lesson 1163 — Metadata and Data Dictionaries Lesson 1871 — Why Version Control for Data?
Metadata Database: Stores DAG runs, task states, and logs; Lesson 1833 — Introduction to Apache Airflow
Method: Interview whoever agrees until each quota is full; Lesson 240 — Quota Sampling
Method calls: generate SQL behind the scenes; Lesson 1117 — What is an ORM and Why Use It?
Methodological transparency: Show your process, not just results; Lesson 2141 — Building a Portfolio and Personal Brand
Methodology: (brief): High-level approach without technical minutiae; Lesson 1966 — Report Structure and Executive Summary
Methods used: Which algorithms?; Lesson 1917 — Transparency in Analysis and Models
Metric attribution: is the process of assigning credit to the specific drivers that caused a metric to move.; Lesson 1637 — What is Metric Attribution?
Metric Stability: Your success metrics shouldn't show statistically significant differences between identical groups; Lesson 1483 — Pre-Experiment Validation
metric tree: (also called a "metric hierarchy" or "decomposition tree") is a visual framework that breaks a single top-level metric—often your North Star Metric—into the sub-metrics that mathematically drive it.; Lesson 1621 — Metric Trees: Structure and Purpose Lesson 1632 — Financial Services Metrics: AUM, NIM, and Credit Metrics
Metrics/Measures: Numerical values you aggregate (sum, average, count); Lesson 1808 — Star Schema and Fact Tables
Mid-level managers: Add one layer of confidence.; Lesson 1953 — Adjusting Statistical Depth by Audience
Mid-period acquisitions: Should new customers acquired *during* the period count in the denominator?; Lesson 1671 — Churn Rate Calculation Methods
Middle touches aren't irrelevant: They nurtured the relationship and kept your brand top-of-mind; Lesson 1729 — Position-Based (U-Shaped) Attribution
MIN: An aggregate function that returns the smallest value in each group, often combined with **MAX** and other aggregates under **GROUP BY** to create rich summaries of grouped data.; Lesson 892 — GROUP BY with Different Aggregate Functions Lesson 894 — NULL Values in GROUP BY
Minimize: data movement across the network; Lesson 1780 — Transformations vs Actions in Spark
Minimize CAC: ROAS might drop (reaching less-qualified audiences), and payback could extend; Lesson 1759 — Optimizing ROAS, CAC, and Payback Together
Minimize color: Use color purposefully to highlight the key finding, not to decorate every category.; Lesson 1958 — Simplifying Visual Complexity
Minimize payback: You may sacrifice ROAS (focusing on quick wins, not best returns) and accept higher CAC initially; Lesson 1759 — Optimizing ROAS, CAC, and Payback Together
Minimum: The smallest value in a dataset (for example, 65°F in a set of temperatures); Lesson 47 — Range: The Simplest Measure Lesson 1484 — Duration and Timing Considerations
Minimum (Min): the smallest value; Lesson 59 — The Five-Number Summary and Box Plots
Minimum Detectable Effect (MDE): is the smallest effect size your A/B test is designed to reliably detect.; Lesson 1480 — Minimum Detectable Effect (MDE) Lesson 1493 — Why Sample Size Matters in A/B Tests Lesson 1494 — Effect Size: The Minimum Detectable Effect
Minimum sizes matter: For digital displays, axis labels should typically be at least 10–12 points; titles 14–18 points.; Lesson 1252 — Font Size, Typeface, and Readability
Minimum test: Only checks if the *smallest* value is an outlier; Lesson 1393 — Two-Sided vs One-Sided Grubbs' Test
Minimum Wage Changes: One of the most famous DiD studies compared New Jersey (which raised minimum wage) to Pennsylvania (which didn't).; Lesson 1459 — Real-World DiD Applications
Minor departures matter more: With large samples, slight skewness or a few outliers become "statistically significant"; Lesson 209 — Sample Size Considerations in Normality Tests
Misaligned incentives: The team running the test is measured on engagement, not profitability, so they optimize for engagement.; Lesson 1530 — Mismatched Metrics and Goals
Misallocating budget: based on attribution models that reward proximity, not causation; Lesson 1717 — Incrementality and True Channel Impact
Misallocating resources: to ineffective initiatives; Lesson 1637 — What is Metric Attribution?
Miss true anomalies: in heavy-tailed distributions; Lesson 1390 — Assumptions of Grubbs' Test
Missed opportunities: Early signals of success or failure go unnoticed; Lesson 1617 — The Danger of Lagging-Only Metrics
Missing context: Always include a clear legend, data source, and what the metric represents.; Lesson 1309 — Choropleth Maps: Basics and Best Practices
Missing details: Analysis steps aren't fully documented; Lesson 30 — The Reproducibility Crisis and Solutions
Missing hidden drivers: that actually moved the needle; Lesson 1637 — What is Metric Attribution?
Missing information: needed to proceed; Lesson 1681 — Time-Based Funnel Analysis
Missing ON Clause: Lesson 955 — Avoiding Cartesian Products
Missing patterns: Gaps or unusual concentrations?; Lesson 1208 — Distribution Checks for All Variables
Missing required values: NULLs where they shouldn't exist; Lesson 1150 — What is Data Validation?
Missing Value Codes: How nulls or missing data are represented (NA, -999, blank); Lesson 2064 — Creating Data Dictionaries
Missing values: STL handles them more gracefully than classical decomposition methods; Lesson 745 — STL Decomposition (Seasonal-Trend Loess) Lesson 1762 — Extended Dimensions: Veracity and Value
Mistaking correlation patterns: Seeing A and Y correlate and assuming A → Y, when really both are caused by unmeasured C.; Lesson 1476 — Common DAG Patterns and Pitfalls
MIT or Apache 2.0: Permissive licenses allowing commercial use with minimal restrictions; Lesson 2082 — Choosing a License for Data Science Projects
Mitigation measures: technical safeguards (encryption, access controls) and organizational policies (training, audits); Lesson 1910 — Data Protection Impact Assessments (DPIAs)
Mixed (Numeric × Categorical): When one variable is numeric and the other categorical (like "salary" across "department"), you're comparing distributions of the numeric variable across groups.; Lesson 1182 — Choosing Analysis Methods by Variable Types
Mixed ARMA Process: Lesson 733 — Using ACF and PACF Together
ML Engineers: focus on *production systems and scale*.; Lesson 2138 — Data Analyst vs Data Scientist vs ML Engineer
MLlib: provides scalable machine learning algorithms that work on distributed data; Lesson 1775 — Spark Components: Core, SQL, MLlib, Streaming
MMM: works at the aggregate level, analyzing total spend and performance across channels over time (typically weeks or months).; Lesson 1736 — MMM vs Attribution: Key Differences
Modality: One peak (unimodal), two (bimodal), or many?; Lesson 63 — Understanding Distribution Shape
mode: is the value that appears most frequently in your data.; Lesson 41 — The Mode: Most Frequent Value Lesson 42 — Comparing Mean, Median, and Mode Lesson 174 — Symmetry and the Mode, Median, Mean Lesson 1173 — Numerical Variable Summary Statistics
Model 1: Start with just square footage; Lesson 617 — Practical Example: Variable Selection
Model 1 (Binary): Predicts whether an observation is a "certain zero" (structural) using logistic regression.; Lesson 695 — Zero-Inflated Models
Model 2: Add bedrooms; Lesson 617 — Practical Example: Variable Selection
Model 2 (Count): For those who *can* have the event, predicts the count using Poisson or negative binomial regression.; Lesson 695 — Zero-Inflated Models
Model 3: Add house age; Lesson 617 — Practical Example: Variable Selection
Model 4: Add distance to coffee shop; Lesson 617 — Practical Example: Variable Selection
Model A: Predicts house price using square footage (R² = 0.; Lesson 612 — Why R-Squared Alone Is Misleading
model artifacts: For model artifacts, version everything: the serialized model file, training code commit hash, hyperparameters, training data version, and evaluation metrics.; Lesson 1877 — Versioning Strategies for Different Data Types Lesson 2091 — Stage 7: Communication and Handoff
Model artifacts and cache: Lesson 2031 — Using .gitignore for Data Science Projects
Model B: Predicts house price using square footage + owner's favorite color (R² = 0.; Lesson 612 — Why R-Squared Alone Is Misleading Lesson 630 — Bayesian Information Criterion (BIC)
Model checking: Does your fitted model generate realistic fake data?; Lesson 1571 — Posterior Predictive Distribution for New Data
Model comparison: Test models with different subsets and compare fit metrics; Lesson 585 — Remedies: Variable Selection
Model debt: Quick fixes to model performance without understanding why they work; Lesson 2131 — What is Technical Debt in Data Science?
Model Development: involves selecting appropriate statistical methods or machine learning algorithms, training them on your prepared data, and tuning parameters.; Lesson 2089 — Stage 5: Model Development and Validation
Model insights: A baseline model might show your problem is too easy (100% accuracy suggests data leakage) or impossibly hard (random performance suggests the question can't be answered with available data).; Lesson 2109 — Why Data Science is Inherently Iterative
Model metadata: Training parameters, hyperparameters, dataset versions, and evaluation scores (stored in JSON or YAML); Lesson 2034 — Committing Data Artifacts and Model Outputs
Model mismatch: Real problems often involve non-conjugate likelihoods or complex dependencies that conjugates can't handle; Lesson 1555 — Advantages and Limitations of Conjugate Priors
Model performance metrics: "This model is 85% accurate on the test set"; Lesson 2122 — When Uncertainty Is Acceptable
Model performance plateaus: Your validation accuracy improves from 0.; Lesson 2116 — Diminishing Returns and the 80/20 Rule
Model selection: (certain algorithms handle discrete vs continuous differently); Lesson 18 — Numerical Variables: Discrete and Continuous
Model staleness: Your model was trained on 2022 data.; Lesson 2136 — Monitoring Gaps and Silent Failures
Model versioning gaps: occur when you can't reliably reproduce a model's results because critical information wasn't tracked: which exact code version, which data snapshot, which hyperparameters, which library versions, or even which random seed was used.; Lesson 2134 — Model Versioning and Reproducibility Gaps
Model versions: `model-churn-v2`, `fraud-detector-deployed`; Lesson 2037 — Tagging Releases and Experiment Snapshots
Model-ready format: Machine learning libraries expect features in columns and samples in rows.; Lesson 1149 — Benefits of Tidy Data for Downstream Work
Modeling: Lesson 9 — The Data Science Lifecycle Overview Lesson 13 — Exploratory Analysis and Modeling
Modeling considerations: "Imbalanced classes—use stratified sampling"; Lesson 1212 — EDA Summary Documentation and Next Steps
Modeling relationships: Capture how variables relate to each other; Lesson 1901 — Synthetic Data Generation
Moderate baseline rate example: If your baseline is 50%, improving to 51% has much higher variance (0.; Lesson 1499 — Adjusting for Baseline Conversion Rates
Moderately skewed data: n ≥ 30 usually works; Lesson 220 — Sample Size Requirements for the CLT
Modern best practice: Use Welch's t-test by default for two independent samples.; Lesson 362 — Welch's t-Test for Unequal Variances
Modern tools available: MCMC samplers and probabilistic programming libraries handle non-conjugate cases well; Lesson 1556 — Choosing Between Conjugate and Non-Conjugate Priors
Modern, Python-first orchestration: focused on developer experience.; Lesson 1839 — Alternative Orchestration Tools
Modes: `overwrite`, `append`, `ignore`, `errorIfExists`; Lesson 1779 — Reading and Writing Data in Spark
Modified box plots: adjust the fence calculations to account for skewness; Lesson 1388 — Limitations and Alternatives to IQR Detection
Modified files: Files you've changed since your last commit but haven't staged yet; Lesson 1998 — Checking Repository Status
modified Z-score: replaces the vulnerable mean and standard deviation with robust alternatives: the median and the median absolute deviation (MAD); Lesson 73 — Modified Z-Score Using MAD Lesson 1380 — Modified Z-Score Using Median
Modules: (`.; Lesson 2071 — Modular Code: Functions and Scripts
Monetary Value: How much do they spend?; Lesson 1703 — RFM Analysis: Recency, Frequency, Monetary Value
Monitor cohort-level performance: to detect when trade-offs shift; Lesson 1759 — Optimizing ROAS, CAC, and Payback Together
Monitor Index Statistics: Track metrics like index size, scan counts vs seek counts, and fragmentation percentage.; Lesson 1086 — Index Maintenance and Monitoring
Monitoring & Maintenance: Lesson 9 — The Data Science Lifecycle Overview
Monitoring infrastructure: You need systems to check stopping conditions regularly (daily, hourly, or continuously); Lesson 1515 — Trade-offs: Sample Size, Speed, and Complexity
Monitoring overhead: You need infrastructure to detect drift before it damages performance; Lesson 2128 — Data Distribution Shifts Frequently
Monitoring recommendations: what to watch when it's live; Lesson 2091 — Stage 7: Communication and Handoff
Monotonic: S(t) never increases; it either stays flat or decreases; Lesson 810 — The Survival Function S(t)
monotonic relationships: where one variable consistently increases as the other increases (positive monotonic) or consistently decreases (negative monotonic)—even if the relationship isn't a straight line.; Lesson 486 — Spearman's Rank Correlation Coefficient Lesson 490 — Kendall's Tau vs Spearman's Rho
Monotonic vs Linear Relationships: Lesson 487 — When to Use Spearman vs Pearson
Monthly Active Users (MAU): counts the unique users who were active within a 30-day window, just as DAU does for a single day.; Lesson 1694 — Daily Active Users (DAU) and Monthly Active Users (MAU)
Monthly cycles: Credit card spending spikes at month-end; Lesson 707 — Seasonality: Regular Periodic Patterns
Monthly data: with yearly seasonality → period = 12; Lesson 746 — Choosing Seasonal Period
Monthly Recurring Revenue (MRR): a common top-level metric for a metric tree, which decomposes it into the sub-metrics that drive it; Lesson 1621 — Metric Trees: Structure and Purpose
Monthly vs. annual churn: A 5% monthly churn rate doesn't equal 60% annual churn.; Lesson 1671 — Churn Rate Calculation Methods
More complex write logic: to keep redundant data synchronized; Lesson 1071 — When to Denormalize: Performance Trade-offs
More normal-looking: as sample size increases; Lesson 252 — Sampling Distribution of the Sample Mean
More Type II errors: You're more likely to miss real effects (false negatives increase); Lesson 342 — Alpha Level Trade-offs
More variability: → Wider margin (unpredictable data means less precision); Lesson 294 — Margin of Error and Its Components
Most accurate: Position is the most accurate channel for quantitative data, because humans excel at comparing positions along a common scale.; Lesson 1231 — Channels of Visual Encoding
Most Common Category: Lesson 644 — Choosing a Reference Category
most critical: assumption for chi-squared tests.; Lesson 426 — Assumptions and Sample Size Requirements Lesson 550 — Normality of Residuals
Most Likely Values: Lesson 1539 — Interpreting Posterior Probabilities
Motion: Movement catches the eye before anything else; Lesson 1235 — Pre-Attentive Attributes
moving average: structure of order q.; Lesson 726 — Using ACF for Model Identification Lesson 750 — What is a Moving Average? Lesson 753 — Centered vs Trailing Moving Averages Lesson 1017 — Moving Averages with Window Frames
Moving Average (MA) models: use previous *forecast errors* (also called residuals or shocks).; Lesson 775 — Moving Average (MA) Models
Moving averages: give equal weight to all observations in the window.; Lesson 764 — Exponential Smoothing vs Moving Averages
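MPP (massively parallel processing): an architecture that splits a query's work across many nodes, each processing its share of the data in parallel.; Lesson 1813 — Modern Cloud Data Warehouses: Snowflake, BigQuery, Redshift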
MRR: is the normalized monthly value of all active subscriptions.; Lesson 1628 — SaaS Metrics: MRR, ARR, and Logo Churn
MS (Mean Square): SS divided by df—the average variation per degree of freedom; Lesson 444 — The ANOVA Table
Much higher than 3:1: (e.; Lesson 1756 — LTV:CAC Ratio as a Health Metric
Multi-panel dashboards: where consistent scales prevent visual confusion; Lesson 1276 — Sharing Axes Between Subplots
Multi-path funnels: recognize that users can reach the same endpoint through different sequences of events.; Lesson 1683 — Multi-Path and Non-Linear Funnels
Multi-step ahead forecasting: projects multiple periods into the future; Lesson 794 — Forecasting Concepts and Horizons
Multi-touch: Credit is distributed across multiple touchpoints (linear, time-decay, position-based); Lesson 1637 — What is Metric Attribution?
multicollinearity: (highly correlated predictors) before modeling; Lesson 510 — Correlation Matrices: Construction and Display Lesson 511 — Reading and Interpreting Correlation Matrices Lesson 513 — Applications: Feature Selection and Multicollinearity Lesson 622 — Relationship Between F-Test and t-Tests Lesson 661 — Centering Predictors for Polynomials Lesson 1192 — Correlation Matrices and Heatmaps
multimodal: describes a distribution with multiple peaks (modes).; Lesson 41 — The Mode: Most Frequent Value Lesson 1175 — Histograms for Distribution Shape
Multimodal data: Multiple clusters make mean/std deviation misleading; Lesson 1379 — Assumptions and Limitations
Multiple columns: "What's the average salary for each job title *within* each department?"; Lesson 905 — Grouping by Multiple Columns: Basics
Multiple linear regression: extends the same least squares framework to include several predictor variables simultaneously.; Lesson 595 — From Simple to Multiple Linear Regression
Multiple lines: Often overlay several cohorts or segments for comparison; Lesson 1653 — What are Retention Curves?
Multiple outliers expected: Use robust methods like Modified Z-score or clustering techniques; Lesson 1395 — When to Use Grubbs' Test
Multiple subqueries: execute independently instead of sharing work; Lesson 966 — Performance Considerations for WHERE Subqueries
Multiple Testing: Lesson 430 — Common Applications and Pitfalls
Multiple testing correction method: (Bonferroni, Holm-Bonferroni, Benjamini-Hochberg, etc.); Lesson 1508 — Pre-Registration and Correction Strategy
Multiple views: of the same data are needed (charts, tables, maps together); Lesson 1330 — Introduction to Interactive Dashboards
multiplication rule: gives the joint probability of two events: P(A ∩ B) = P(A) × P(B|A).; Lesson 95 — Chain Rule for Multiple Events Lesson 107 — Bayes' Theorem Formula and Components
multiplicative: describes a model in which components combine by multiplication, so effects scale proportionally with the level of the series rather than adding a fixed amount.; Lesson 710 — Additive vs Multiplicative Models Lesson 744 — Classical Decomposition Methods Lesson 765 — Introduction to Holt-Winters Method Lesson 825 — What is the Cox Proportional Hazards Model?
Multiplicative forecasting formula: Lesson 771 — Forecasting with Holt-Winters
Multiplicative model: `Observed = Trend × Seasonality × Irregular`; Lesson 710 — Additive vs Multiplicative Models Lesson 742 — Components of Seasonal Decomposition Lesson 748 — Seasonally Adjusted Data Lesson 749 — Using Decomposition for Forecasting Lesson 770 — Initializing Holt-Winters Components
Multiplicative models: assume components are multiplied: `Observed = Trend × Seasonality × Irregular`; Lesson 743 — Additive vs Multiplicative Models
Multiplicative seasonality: means seasonal swings grow or shrink proportionally with the trend level.; Lesson 766 — Additive vs Multiplicative Seasonality
Multiply: each midpoint by its frequency (how many values fall in that range); Lesson 45 — Central Tendency for Grouped Data Lesson 96 — Conditional Probability in Tree Diagrams Lesson 178 — Log-Normal Distribution: Definition and Properties
Multiply by the likelihood: `P(data|θ)` — how probable the observed data is for each possible θ value; Lesson 1545 — Calculating the Posterior Distribution
Multiplying by a constant: If you multiply *X* by constant *a*, the expectation scales proportionally: E[*aX*] = *a*·E[*X*]; Lesson 149 — Properties of Expectation and Variance
Multiprocessing Scheduler: Lesson 1795 — Distributed Schedulers and Client Setup
must: In the negative binomial setting, the final trial must be a success (that's your *r*th success); Lesson 135 — The Negative Binomial Distribution: Waiting for r Successes Lesson 274 — Confidence Intervals for Small Samples Lesson 799 — Fitting and Diagnosing SARIMA Models Lesson 971 — Aliasing Derived Tables Lesson 1056 — Foreign Key Constraints in Practice Lesson 1841 — Upstream and Downstream Dependencies
Must control: for C to isolate A's effect on Y.; Lesson 1476 — Common DAG Patterns and Pitfalls
Mutual independence: (also called *joint independence*) is stronger than pairwise independence.; Lesson 103 — Mutual Independence vs Pairwise Independence
mutually exclusive: describes events that cannot occur at the same time, so P(A ∩ B) = 0.; Lesson 80 — Set Operations: Union, Intersection, and Complement Lesson 83 — Partitions of the Sample Space Lesson 86 — General Addition Rule for Overlapping Events Lesson 89 — The Complement Rule Lesson 106 — Common Misconceptions About Independence Lesson 309 — Complementary Nature of Hypotheses
MySQL: is another open-source option, popular for web applications and simpler projects.; Lesson 845 — Database Management Systems (DBMS) Lesson 862 — Case Sensitivity in Text Filtering Lesson 940 — Database Support and Alternatives Lesson 1041 — Formatting and Parsing Dates
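
Worked example for the "Monthly vs. annual churn" entry above. This is a minimal sketch, assuming a constant monthly churn rate applied to a fixed starting cohort with no new customers added; the function name `annual_churn` is hypothetical and does not come from the course material. It shows why multiplying a monthly rate by 12 overstates annual churn: each month's churn applies to a smaller remaining base.

```python
def annual_churn(monthly_churn: float, months: int = 12) -> float:
    """Compound a constant monthly churn rate over `months` months.

    Assumption (illustrative): the same rate applies every month and
    no customers are added to the cohort during the period.
    """
    if not 0.0 <= monthly_churn <= 1.0:
        raise ValueError("monthly_churn must be between 0 and 1")
    # Share of the starting cohort still active after `months` months.
    retained = (1.0 - monthly_churn) ** months
    return 1.0 - retained


naive = 0.05 * 12                 # simple multiplication gives 60%
compounded = annual_churn(0.05)   # compounding gives roughly 46%
print(f"naive: {naive:.2%}, compounded: {compounded:.2%}")
```

Under these assumptions a 5% monthly rate compounds to roughly 46% annual churn rather than 60%.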

N

N-1: (one less than your sample size); Lesson 50 — Population vs Sample Variance
Nagelkerke R²: Adjusts Cox & Snell to range fully from 0 to 1; Lesson 702 — Pseudo R-Squared Measures
Name: The actual column identifier; Lesson 1163 — Metadata and Data Dictionaries
Named parameters: give each placeholder a meaningful name using the `:name` syntax, making your queries self-documenting and easier to modify.; Lesson 1106 — Parameter Placeholders: Named Parameters Lesson 1108 — Handling IN Clauses Safely
Naming conventions prevent chaos: Establish patterns like `YYYY-MM-DD_project_dataset_version`; Lesson 2068 — Data Provenance Best Practices
Narrower: than the original population distribution; Lesson 252 — Sampling Distribution of the Sample Mean
National origin: Lesson 1888 — Protected Classes and Sensitive Attributes
Natural keys: Existing unique values like `email` or `ssn`; Lesson 1048 — What Are Primary Keys? Lesson 1050 — Choosing Effective Primary Keys
Natural order: For ordinal categories like "Small, Medium, Large" or months, preserve logical sequence; Lesson 1178 — Bar Charts for Categorical Data
Natural workflow: Mirrors how we actually learn—bit by bit, not all at once; Lesson 1538 — Updating Beliefs with Sequential Data
Navigate with keyboard only: can you reach all interactive features?; Lesson 1254 — Testing Visualizations for Accessibility
Near-duplicates: Similar records that might represent the same entity; Lesson 1154 — Uniqueness and Duplication Checks
Near-real-time: (minutes) allows micro-batches.; Lesson 1825 — Designing Pipeline Architecture
Near-term lags: (1, 2, 3.; Lesson 796 — Identifying Seasonal Patterns
Nearest-Neighbor Matching: pairs each treated unit with the control unit(s) having the closest propensity score.; Lesson 1448 — Propensity Score Matching Methods
Necessity assessment: why this processing is needed, what legal basis applies; Lesson 1910 — Data Protection Impact Assessments (DPIAs)
Needs improvement: Below 25th percentile; Lesson 61 — Using Percentiles for Comparison and Benchmarking
Negative: Your model overpredicted (observed < fitted); Lesson 539 — What Are Residuals? Lesson 652 — Interpreting Categorical × Continuous Interactions
Negative (Left) Skewness: Lesson 64 — Skewness: Definition and Interpretation
Negative binomial: "How many calls until I get *10* EV owners for my focus group?"; Lesson 138 — Real-World Applications: Quality Control and Surveys Lesson 694 — Quasi-Poisson and Negative Binomial Models
negative binomial distribution: answers this question: "How many trials will I need to get exactly *r* successes?"; Lesson 135 — The Negative Binomial Distribution: Waiting for r Successes Lesson 136 — Expectation and Variance of the Negative Binomial Lesson 137 — Geometric vs Negative Binomial: Key Differences Lesson 138 — Real-World Applications: Quality Control and Surveys
Negative coefficient: → that category has a lower average outcome than the reference; Lesson 637 — Interpreting Dummy Variable Coefficients
Negative correlation: Points trend downward (as one increases, the other decreases); Lesson 1222 — Scatter Plots for Relationships
Negative r: Variables move in opposite directions (car age and resale value, temperature and heating bills); Lesson 477 — Interpreting the Correlation Coefficient
Negative residuals: (`e_i < 0`) occur when the actual value is *below* the fitted line.; Lesson 540 — The Residual Formula
Negative slope: X and Y move opposite (X ↑ means Y ↓); Lesson 524 — The Meaning of the Slope
Negative values: = lighter tails than normal (platykurtic); Lesson 67 — Calculating Kurtosis Lesson 720 — The Autocorrelation Function (ACF)
Neglecting the complement: When you know P(A|B), don't assume you automatically know P(A|not B).; Lesson 100 — Common Conditional Probability Mistakes
Neither: recognizes the critical middle steps that move customers down the funnel; Lesson 1724 — Limitations of Single-Touch Attribution
nested: when one model is a special case of the other.; Lesson 626 — Nested vs Non-Nested Models Lesson 1334 — Dash Basics: App Layout with Components
nested models: a smaller model (with fewer predictors) and a larger model (with additional predictors).; Lesson 623 — Partial F-Tests for Nested Models Lesson 627 — The F-Test for Model Comparison Lesson 699 — The Likelihood Ratio Test Lesson 791 — Comparing Nested and Non-Nested Models
Net Profit: Revenue minus all expenses; Lesson 1516 — Business Metrics: Definition and Examples
Net Revenue Retention: measures revenue retention *including* expansion from existing customers:; Lesson 1629 — SaaS Growth Metrics: Quick Ratio and Net Revenue Retention
Netflix: Hours watched — reflects content value and reduces churn risk.; Lesson 1606 — Examples of North Star Metrics by Industry
Network effects: In social features, randomizing by user may "leak" treatment effects to control users who interact with treated users.; Lesson 1481 — Unit of Randomization Lesson 1923 — Algorithmic Amplification of Harm
Network Errors: happen when Python can't reach the database server.; Lesson 1093 — Troubleshooting Connection Issues
Neural networks: Many deep learning frameworks expect one-hot encoded inputs; Lesson 638 — One-Hot Encoding Overview
Never: choose your tail configuration after seeing your results.; Lesson 350 — Choosing the Right Tail Configuration
Never extrapolate: interpretation beyond your data range.; Lesson 523 — The Meaning of the Intercept
Never hardcode credentials: Store them in environment variables or configuration files; Lesson 1090 — Establishing a Connection with psycopg2 (PostgreSQL)
New Customers: Recent converters who've made their first purchase or subscription.; Lesson 1704 — Customer Lifecycle Stages
New hypothesis: Email engagement is the driver, not day-of-week effects; Lesson 1201 — Domain Knowledge as a Hypothesis Source
New insights from data: Analysis might reveal that what you thought was a key driver (a branch metric) actually has minimal impact on your North Star.; Lesson 1626 — Maintaining and Evolving Metric Trees
No arbitrary cutoff: All past data contributes (just with declining weight); Lesson 757 — Introduction to Exponential Smoothing
No autocorrelation: values shouldn't predict future values; Lesson 709 — Irregular Component: Random Noise
No change: "Customer satisfaction hasn't changed after the redesign" (before = after); Lesson 307 — Defining the Null Hypothesis (H₀)
no correlation: between the two variables in the population.; Lesson 500 — Hypothesis Testing Framework for Correlation Lesson 1222 — Scatter Plots for Relationships
No decay (flat): Equal weight throughout the window—simpler but less realistic; Lesson 1639 — Time Windows and Attribution Decay
No difference: "The mean weight of Group A equals the mean weight of Group B" (μ₁ = μ₂); Lesson 307 — Defining the Null Hypothesis (H₀)
No effect: "This new drug has no effect on blood pressure" (effect = 0); Lesson 307 — Defining the Null Hypothesis (H₀)
No extreme outliers: A single outlier can dramatically distort r; Lesson 480 — Scatterplots and Visual Assessment
No manual editing: of intermediate files between steps; Lesson 1981 — What Makes a Report Reproducible?
No Multicollinearity: Lesson 546 — The Five Core Assumptions of Linear Regression
No one person understands: the entire chain anymore; Lesson 2132 — Pipeline Glue Code and Complexity Creep
No outliers: Mean provides more information because it uses all data points.; Lesson 42 — Comparing Mean, Median, and Mode
No patterns remaining: if you see structure in the residuals, you've missed something; Lesson 709 — Irregular Component: Random Noise
No Perfect Multicollinearity: Lesson 601 — Assumptions for Multiple Linear Regression
No relationship: "There's no correlation between study hours and test scores" (correlation = 0); Lesson 307 — Defining the Null Hypothesis (H₀)
No repeating groups: there are no columns like "Phone1", "Phone2", "Phone3" storing similar data; Lesson 1064 — First Normal Form (1NF)
No seasonality: (no repeating patterns); Lesson 758 — Simple Exponential Smoothing (SES)
No selection bias: (nothing observable or unobservable influences assignment); Lesson 1487 — Simple Random Assignment
No significant spikes: → No clear direct autoregressive pattern; Lesson 730 — Interpreting PACF Plots
no trend: and **constant seasonality** can be stationary; Lesson 712 — What is Stationarity? Lesson 758 — Simple Exponential Smoothing (SES)
No trend (flat): Values fluctuate around a stable mean with no persistent direction; Lesson 706 — Trend: Long-Term Direction
Node Color: Use color to encode categories (communities, types) or continuous values (temperature scales for metrics).; Lesson 1319 — Styling Network Visualizations
Node Size: Scale nodes by importance metrics (degree centrality, betweenness) or attributes (population, budget).; Lesson 1319 — Styling Network Visualizations
Nodes: represent variables (like treatment, outcome, confounders); Lesson 1468 — Introduction to Directed Acyclic Graphs (DAGs)
Noise addition: introduces random perturbation to numerical data, making exact values uncertain while preserving statistical properties for aggregate analysis.; Lesson 1895 — Data Anonymization Basics
Nominal: variables are categories without any inherent ranking or order.; Lesson 17 — Categorical Variables: Nominal and Ordinal
Nominal data: (categories with no order: fruit types, countries, product names) pairs best with:; Lesson 1238 — Matching Encoding to Data Type
Non-canonical links: are any other valid link functions you might choose for that distribution.; Lesson 676 — Canonical vs Non-Canonical Links
Non-correlated SELECT subqueries: typically run once, independent of the outer query's rows; Lesson 969 — Performance Considerations for SELECT Subqueries
Non-correlated subqueries: are completely independent.; Lesson 968 — Correlated vs Non-Correlated Subqueries in SELECT
Non-correlated with JOIN: Lesson 980 — Converting Correlated to Non-Correlated Subqueries
non-directional: describes a hypothesis or test that looks for a difference in either direction (two-tailed).; Lesson 345 — Directionality in Hypothesis Testing Lesson 415 — Setting Up Hypotheses for Goodness of Fit
Non-directional (two-tailed): "The new landing page will *change* sign-ups"; Lesson 1479 — Formulating Hypotheses
Non-independence: If observations are related in ways you haven't accounted for (clustered data, time series correlations), the independence assumption fails completely, and your p-values become meaningless.; Lesson 390 — When Parametric Tests Fail: Violations of Assumptions
Non-independent pairs: Results are unreliable; reconsider your study design; Lesson 374 — Assumptions of the Paired t-Test
Non-informative (flat) priors: essentially say "I know nothing" — they let the data dominate the analysis completely.; Lesson 1534 — The Prior Distribution
Non-linear funnels: acknowledge that users don't always move forward.; Lesson 1683 — Multi-Path and Non-Linear Funnels
Non-linear relationships: R-squared measures *linear* fit.; Lesson 537 — When R-Squared is Not Enough
Non-linearity: The relationship between X and Y curves rather than forming a straight line; Lesson 591 — When and Why to Transform Variables
Non-negative: Can't go below zero; Lesson 689 — When to Use Poisson Regression
non-nested: when neither is a special case of the other.; Lesson 626 — Nested vs Non-Nested Models Lesson 629 — Akaike Information Criterion (AIC)
Non-nested models: are competitors that can't be simplified into one another.; Lesson 791 — Comparing Nested and Non-Nested Models
Non-normal data: Switch to IQR-based detection; it's distribution-agnostic; Lesson 1395 — When to Use Grubbs' Test
Non-normal differences (large n): Paired t-test is usually still robust; Lesson 374 — Assumptions of the Paired t-Test
Non-normal differences (small n): Consider the Wilcoxon signed-rank test (a non-parametric alternative); Lesson 374 — Assumptions of the Paired t-Test
Non-Normal Distributions: Lesson 487 — When to Use Spearman vs Pearson Lesson 1379 — Assumptions and Limitations
Non-normal residuals: The Q-Q plot shows heavy tails, skewness, or other departures from normality; Lesson 591 — When and Why to Transform Variables
Non-normality with small samples: If your sample size is small (typically n < 30) and your data show strong skewness, heavy outliers, or non-normal distributions (confirmed through visual checks or tests like Shapiro-Wilk), the t-test's results become unreliable.; Lesson 390 — When Parametric Tests Fail: Violations of Assumptions
Non-nullability: A primary key can never be `NULL`.; Lesson 1048 — What Are Primary Keys?
Non-parametric part: The baseline hazard function (risk over time for someone with all covariates = 0) is **not assumed** to follow any distribution—it's left flexible.; Lesson 825 — What is the Cox Proportional Hazards Model?
non-probability sampling: method where you select individuals or items simply because they're easy to reach.; Lesson 239 — Convenience Sampling Lesson 242 — Probability vs Non-Probability Sampling Lesson 247 — Survivorship Bias
Non-probability sampling advantages: Lesson 242 — Probability vs Non-Probability Sampling
Non-probability sampling limitations: Lesson 242 — Probability vs Non-Probability Sampling
Non-regression contexts: Classification tasks, clustering, or similarity calculations; Lesson 638 — One-Hot Encoding Overview
Non-repeatable reads: Reading the same row twice and getting different values; Lesson 1116 — Transaction Isolation and Concurrency
Non-response bias: When certain groups don't respond to your survey.; Lesson 244 — Selection Bias and Its Causes
Non-stationarity: The statistical properties (mean, variance) often change over time—seasonal patterns, trends, and structural breaks are common; Lesson 704 — What Makes Time Series Data Different?
non-stationary: .; Lesson 716 — Augmented Dickey-Fuller Test Lesson 725 — Decay Rates in ACF Lesson 726 — Using ACF for Model Identification
Non-technical audiences: (executives, stakeholders, general public) typically:; Lesson 1950 — Identifying Your Audience: Technical vs Non-Technical
Non-trivial: `StudentID → StudentName` (actually tells us something); Lesson 1063 — Functional Dependencies
Nonlinear relationships: Curves, U-shapes, or other patterns that aren't straight lines; Lesson 1222 — Scatter Plots for Relationships
nonresponse bias: .; Lesson 245 — Response Bias and Nonresponse Bias Lesson 247 — Survivorship Bias
Normal: (for continuous outcomes); Lesson 664 — What is the Exponential Family of Distributions? Lesson 669 — The Dispersion Parameter φ Lesson 1568 — Unknown Variance: Normal-Inverse-Gamma Model
normal distribution: (also called the Gaussian distribution) is a continuous probability distribution that creates a distinctive **bell curve** shape when graphed.; Lesson 169 — The Normal Distribution: Definition and Properties Lesson 180 — Parameters and Moments of the Log-Normal Lesson 676 — Canonical vs Non-Canonical Links Lesson 1395 — When to Use Grubbs' Test Lesson 1566 — Conjugate Normal-Normal Model
Normal posterior: Use the mean ± (z-score × standard deviation) where z-score comes from the normal distribution.; Lesson 1579 — Practical Computation of Credible Intervals
normal prior: for means because:; Lesson 1565 — Prior Distributions for Normal Means Lesson 1566 — Conjugate Normal-Normal Model
Normal-Inverse-Gamma (NIG): distribution is a conjugate prior for the normal likelihood when both mean (μ) and variance (σ²) are unknown.; Lesson 1568 — Unknown Variance: Normal-Inverse-Gamma Model
Normal-Normal: conjugacy (known variance σ²):; Lesson 1554 — Updating Conjugate Priors with Data
Normal-Normal conjugacy: Normal prior + Normal likelihood = Normal posterior.; Lesson 1553 — Normal-Normal Conjugacy
Normality: Each group's data is roughly normally distributed (or sample size is large); Lesson 447 — Conducting One-Way ANOVA in Practice Lesson 544 — The Role of Residuals in Diagnostics Lesson 546 — The Five Core Assumptions of Linear Regression Lesson 601 — Assumptions for Multiple Linear Regression Lesson 782 — Residual Diagnostics for ARIMA
Normality checks: Residuals should be roughly normally distributed (histogram, Q-Q plot); Lesson 799 — Fitting and Diagnosing SARIMA Models
Normality holds: Your data (or sampling distribution) is approximately normal, especially with small samples (n < 30); Lesson 398 — Choosing Between Parametric and Non-Parametric Tests
Normality of Differences: Lesson 374 — Assumptions of the Paired t-Test
Normality required: Your data must be approximately normally distributed (Grubbs' is parametric); Lesson 1389 — What is Grubbs' Test?
Normality violated: Severe skewness, outliers, or small samples from non-normal populations; Lesson 398 — Choosing Between Parametric and Non-Parametric Tests
Normality violated, small sample: → Switch to non-parametric alternative (Mann-Whitney, Wilcoxon signed-rank); Lesson 383 — Diagnostic Workflow: When to Proceed or Switch Tests
Normality violations: The Central Limit Theorem saves us here.; Lesson 382 — Robustness of t-Tests to Assumption Violations
Normalization: splitting tables so each stores one logical entity with proper relationships maintained through primary and foreign keys.; Lesson 1062 — Data Anomalies: Insert, Update, Delete
Normalize by the evidence: `P(data)` — a scaling constant that ensures probabilities sum to 1; Lesson 1545 — Calculating the Posterior Distribution
Normalize each metric: to a 0–100 scale first; Lesson 1699 — Engagement Scoring Systems
Normalize to UTC: Convert any timezone-aware timestamp to UTC for storage; Lesson 1042 — Working with Timestamps and Time Zones
Normalize Your Data: Lesson 1309 — Choropleth Maps: Basics and Best Practices
Normalizing values: means replacing variations with a single standard form.; Lesson 1138 — Cleaning and Standardizing Text Fields
normally distributed: (bell-shaped).; Lesson 71 — Z-Score Method for Outlier Detection Lesson 224 — CLT for Proportions
North Star Metric: (NSM) is the one metric that best captures the core value your product or service delivers to customers.; Lesson 1604 — What is a North Star Metric?
not: A"; Lesson 80 — Set Operations: Union, Intersection, and Complement Lesson 81 — Mutually Exclusive Events Lesson 267 — Interpreting Confidence Levels Lesson 548 — Independence of Observations Lesson 865 — Introduction to Logical Operators in SQL Lesson 868 — The NOT Operator Lesson 870 — Operator Precedence and Parentheses Lesson 884 — AVG: Computing Averages (+4 more)
Not a partition: Lesson 83 — Partitions of the Sample Space
Not collectively exhaustive: (missing 4 and 5); Lesson 82 — Collectively Exhaustive Events
NOT Operator: Lesson 871 — NULL Handling with Logical Operators
Not quite: Different libraries often maintain **separate random number generators** with independent states.; Lesson 2058 — Seed Scope and Multiple Libraries
Not reproducible: Others can't easily adapt your paths and settings; Lesson 2072 — Configuration Files vs Hard-Coded Values
Not Robust: For non-normal data, percentiles or IQR-based methods often work better than z-scores.; Lesson 201 — Z-Score Applications and Limitations
Not so fast: You've ignored the thousands of failed startups that *also* took big risks but went bankrupt.; Lesson 247 — Survivorship Bias
Not sure: Try multiple window sizes and compare how well they balance smoothness with responsiveness for your specific problem; Lesson 752 — Choosing the Window Size
Notebooks vs code: Use notebooks (`notebooks/`) for exploration and communication.; Lesson 2069 — Project Directory Structure
Novelty bias: Users often react differently to changes initially—either with excitement (novelty effect) or resistance (change aversion).; Lesson 1484 — Duration and Timing Considerations
Novelty Effect: Users interact more with something *because it's new and different*, not because it's actually better.; Lesson 1525 — Novelty and Primacy Effects
NPS surveys: that don't correlate with renewal rates in your specific business; Lesson 1616 — Metrics Divorced from Revenue
Null (H₀): The two variables are independent (no association); Lesson 433 — Conducting Fisher's Exact Test Lesson 787 — Ljung-Box Test for Residual Autocorrelation
Null deviance: measures how poorly an intercept-only model (just predicting the overall mean/rate) fits your data.; Lesson 698 — Null and Residual Deviance
Null hypothesis: The factor has no effect on the outcome (all group means for that factor are equal); Lesson 464 — Main Effects in Two-Way ANOVA Lesson 474 — Friedman Test: Non-Parametric Repeated Measures ANOVA Lesson 654 — Testing Interaction Significance Lesson 819 — Null Hypothesis in the Log-Rank Test Lesson 1467 — Testing Instrument Strength and Validity
Null hypothesis (H₀): The data comes from a normal distribution; Lesson 205 — Shapiro-Wilk Test Lesson 207 — Anderson-Darling Test Lesson 307 — Defining the Null Hypothesis (H₀) Lesson 354 — Setting Up Hypotheses for One-Sample t-Test Lesson 378 — Testing Normality: Statistical Tests Lesson 401 — Setting Up Hypotheses for Proportions Lesson 406 — Two-Sample Proportion Test Setup Lesson 415 — Setting Up Hypotheses for Goodness of Fit (+9 more)
NULL values: from unmatched rows can affect your counts differently than expected; Lesson 933 — Aggregating with LEFT JOINs
Number at risk: = all subjects who haven't yet had the event *and* haven't been censored before time *t*; Lesson 812 — Handling Event Times and Censoring
Number of categories: How many distinct groups or bins your data falls into; Lesson 418 — Degrees of Freedom in Goodness of Fit
Number of events: = only those who actually experienced the event at time *t*; Lesson 812 — Handling Event Times and Censoring
Number of failures: ≥ 10; Lesson 411 — Sample Size Requirements
Number of Groups: Lesson 446 — Power and Sample Size for ANOVA
Number of rows scanned: Fewer is better; Lesson 1077 — Measuring Performance Impact of Denormalization
Number of successes: ≥ 10; Lesson 411 — Sample Size Requirements
Number of variables: Are you showing one variable, comparing two, or exploring relationships among three or more?; Lesson 1230 — Choosing the Right Chart Type
Numeric × Numeric: When both variables are continuous or discrete numbers, you want to assess linear relationships, strength, and direction.; Lesson 1182 — Choosing Analysis Methods by Variable Types
Numeric columns: Find the lowest and highest numbers; Lesson 885 — MIN and MAX: Finding Extremes
Numeric-to-Categorical: Compare distributions using grouped summary statistics and visualizations (box plots by group).; Lesson 1210 — Relationship Exploration: Correlation and Association
Numeric-to-Numeric: Use correlation coefficients (Pearson, Spearman) and correlation matrices to spot linear and monotonic relationships.; Lesson 1210 — Relationship Exploration: Correlation and Association
Numerical data: (continuous or discrete); Lesson 39 — The Mean (Arithmetic Average) Lesson 41 — The Mode: Most Frequent Value
Numerical stability: Stan's implementation of Hamiltonian Monte Carlo (NUTS) includes automatic differentiation and careful numerical engineering; Lesson 1595 — Stan: High-Performance Bayesian Inference
NumPy: (`numpy.; Lesson 2058 — Seed Scope and Multiple Libraries
NUTS (No-U-Turn Sampler): is an advanced version of HMC that automatically tunes a critical parameter: how long to let the "ball" roll.; Lesson 1593 — Hamiltonian Monte Carlo and NUTS
NYC's coefficient: (say, +15) means NYC is 15 units higher than Boston, the reference category; Lesson 643 — Interpreting Coefficients Relative to Reference
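
Worked example for the "Normal posterior" entry above: a minimal sketch of the mean ± (z-score × standard deviation) rule, assuming the posterior is already summarized by its mean and standard deviation. The function name `normal_credible_interval` and the example numbers are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist


def normal_credible_interval(mean, sd, level=0.95):
    """Equal-tailed credible interval for a normal posterior."""
    # z-score that leaves (1 - level) / 2 probability in each tail
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * sd, mean + z * sd


# Hypothetical posterior: mean 0.12, standard deviation 0.02
low, high = normal_credible_interval(0.12, 0.02)
print(f"95% credible interval: ({low:.4f}, {high:.4f})")
```

For a 95% interval the z-score is about 1.96, so the interval spans roughly two posterior standard deviations on each side of the mean.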

O

O'Brien-Fleming: Spends very little alpha early (conservative early looks), saving most for the final analysis; Lesson 1512 — Group Sequential Testing
Objective: is a clear, inspiring goal that describes *what* you want to achieve.; Lesson 1607 — Introduction to OKRs (Objectives and Key Results)
Objective Example: Lesson 1607 — Introduction to OKRs (Objectives and Key Results)
Objectives: are the qualitative, inspirational statements that describe *what* you want to achieve.; Lesson 1609 — Setting Effective Objectives
Observations are independent: Each observation doesn't influence others; Lesson 399 — When to Use the One-Sample Z-Test for Proportions
Observe: Collect data for a period (say, one day's worth of conversions); Lesson 1582 — Updating Beliefs with Test Data
Observed: The actual count you got in each category; Lesson 417 — The Chi-Squared Test Statistic Formula
Observed - Expected: the raw difference; Lesson 428 — Post-Hoc Analysis and Residuals
observed frequencies: are from your **expected frequencies** across all categories.; Lesson 414 — Introduction to Chi-Squared Goodness of Fit Test Lesson 416 — Calculating Expected Frequencies Lesson 423 — Contingency Tables and Expected Frequencies
odds: of success.; Lesson 673 — The Logit Link Lesson 680 — The Logit Link Function and Odds
Odds ratio: for proportions; Lesson 384 — What is Effect Size? Lesson 677 — Interpreting Coefficients Under Different Links
Offline: Can use computationally intensive methods; you can look both forward *and* backward from any point; Lesson 1414 — Offline vs Online Change-Point Detection
Offline (batch) change-point detection: works like a detective reviewing cold cases.; Lesson 1414 — Offline vs Online Change-Point Detection
offset: is a predictor whose coefficient is fixed at 1.; Lesson 692 — Offset Terms for Exposure Lesson 1023 — Introduction to Window Functions: LAG and LEAD Lesson 1024 — LAG Function: Accessing Previous Row Values Lesson 1025 — LEAD Function: Accessing Next Row Values
Omega-squared: provides a less biased, more conservative estimate by adjusting for sample size.; Lesson 445 — Effect Size: Eta-Squared and Omega-Squared
Omitted variable bias: A third variable influences both X and Y, creating a spurious relationship; Lesson 553 — Exogeneity: X Must Be Independent of Errors
Omitted variables: Important confounders are missing from your model and hide in the error term; Lesson 1464 — Instrumental Variables: The Endogeneity Problem
ON: is required for complex conditions (inequalities, multiple different columns); Lesson 953 — Join Conditions: ON vs USING
ON condition: The matching rule, usually comparing a column from each table; Lesson 919 — Basic INNER JOIN Syntax
On Linux: Use your package manager:; Lesson 1991 — Installing Git and Initial Configuration
On macOS: Open Terminal and type `git --version`.; Lesson 1991 — Installing Git and Initial Configuration
On macOS/Linux: Lesson 2040 — Creating and Activating Virtual Environments with venv
On Windows: Download the installer from [git-scm.; Lesson 1991 — Installing Git and Initial Configuration Lesson 2040 — Creating and Activating Virtual Environments with venv
Onboarding completion: When a user finishes setup steps; Lesson 1646 — Defining Cohort Start Events
once: in the output, not twice.; Lesson 953 — Join Conditions: ON vs USING Lesson 1003 — Set Operation Requirements and Rules Lesson 2057 — Setting Seeds in Python and R
once per row: in the outer query; Lesson 967 — Subqueries in the SELECT Clause Lesson 969 — Performance Considerations for SELECT Subqueries Lesson 978 — Correlated Subqueries in SELECT Clauses
Once spent, it's gone: You cannot query indefinitely—eventually you exhaust your budget and must stop; Lesson 1900 — Privacy Budget and Composition
One categorical independent variable: (the "factor") with **three or more levels/groups**; Lesson 438 — When to Use One-Way ANOVA
One continuous dependent variable: (the outcome you're measuring); Lesson 438 — When to Use One-Way ANOVA
One idea per paragraph: Don't mix method explanation with result interpretation; Lesson 1967 — Writing Clear and Concise Analysis Sections
One numerical, one categorical: Do salaries differ by department?; Lesson 1181 — What is Bivariate Analysis?
One sample: of data; Lesson 351 — When to Use a One-Sample t-Test Lesson 370 — Differences as the Unit of Analysis
One slide, one message: Don't cram.; Lesson 1944 — Executive Summary Best Practices
One-hot encoding: takes a different approach: it creates k dummy variables for k categories—one for *every* level, with no reference category left out.; Lesson 638 — One-Hot Encoding Overview
One-sided: When theory, cost, or practical concerns make only one direction meaningful; Lesson 311 — One-Sided vs Two-Sided Alternatives Lesson 345 — Directionality in Hypothesis Testing Lesson 401 — Setting Up Hypotheses for Proportions
One-sided (greater): "The parameter is *greater* than the null value" (>); Lesson 308 — Defining the Alternative Hypothesis (H₁ or Hₐ) Lesson 373 — Hypotheses for Paired t-Tests
One-sided (less): "The parameter is *less* than the null value" (<); Lesson 308 — Defining the Alternative Hypothesis (H₁ or Hₐ) Lesson 373 — Hypotheses for Paired t-Tests
One-sided (maximum): You're testing product dimensions where oversized items break downstream machinery, but undersized items are fine.; Lesson 1393 — Two-Sided vs One-Sided Grubbs' Test
One-sided (minimum): You're checking server response times where slow responses matter, but faster-than-expected times are welcomed.; Lesson 1393 — Two-Sided vs One-Sided Grubbs' Test
One-sided (one-tailed) tests: These focus on a specific direction:; Lesson 1393 — Two-Sided vs One-Sided Grubbs' Test
one-sided test: , you only care about one tail direction.; Lesson 319 — Calculating P-Values from Test Statistics Lesson 325 — The Rejection Region
One-size-fits-all: A news app *should* have high DAU/MAU; tax software shouldn't; Lesson 1694 — Daily Active Users (DAU) and Monthly Active Users (MAU)
One-step ahead forecasting: predicts just the next immediate time period (e.; Lesson 794 — Forecasting Concepts and Horizons
One-Step-Ahead Forecasting Only: Lesson 756 — Limitations of Moving Averages
One-tailed: H₁: p₁ > p₂ or H₁: p₁ < p₂ (testing for a specific direction); Lesson 406 — Two-Sample Proportion Test Setup
one-tailed test: (also called a one-sided test).; Lesson 347 — One-Tailed Tests: Testing for a Specific Direction Lesson 348 — P-Value Calculation Differences Lesson 349 — Power Advantages and Trade-offs Lesson 354 — Setting Up Hypotheses for One-Sample t-Test Lesson 433 — Conducting Fisher's Exact Test
ongoing monitoring: of processes over time.; Lesson 1397 — Shewhart Control Chart Basics Lesson 1975 — When to Build a Dashboard
Online: Must be fast enough to keep pace with incoming data; can only look backward at history; Lesson 1414 — Offline vs Online Change-Point Detection
Online (real-time) change-point detection: is like a security guard monitoring live camera feeds.; Lesson 1414 — Offline vs Online Change-Point Detection
Online communities: (Reddit's r/datascience, Twitter/X, LinkedIn, Discord servers, Stack Overflow) provide daily touchpoints.; Lesson 2144 — Networking and Community Engagement
Online reviews: Only people with strong opinions (very happy or very angry) typically write reviews; Lesson 246 — Volunteer and Self-Selection Bias
Online-only surveys: Excluding people without internet access; Lesson 249 — Coverage Error and Undercoverage
only: applies when events are independent.; Lesson 87 — Multiplication Rule for Independent Events Lesson 866 — The AND Operator Lesson 928 — LEFT JOIN vs INNER JOIN: When to Use Each Lesson 1821 — Hybrid Approaches and Modern Data Stacks
Opacity: De-emphasize less important points or show density; Lesson 1310 — Point Maps and Scatter Plots on Maps Lesson 1923 — Algorithmic Amplification of Harm
Open conflicting files: Look for conflict markers (`<<<<<<<`, `=======`, `>>>>>>>`) showing both versions; Lesson 2018 — Resolving Conflicts During Rebase
Open Questions: "Does the marketing team track promo codes consistently?; Lesson 2100 — Documenting Assumptions and Open Questions
Open-source contributions: demonstrate your skills publicly while improving tools others use.; Lesson 2144 — Networking and Community Engagement
Opening balance method: Use only customers at period start (simpler, more conservative); Lesson 1671 — Churn Rate Calculation Methods
OpenLineage: (open standard) embed lineage capture directly into your code.; Lesson 1164 — Tools for Lineage Tracking
OpenStreetMap: is the most popular open-source tile provider, offering street-level detail perfect for urban data visualization.; Lesson 1314 — Basemaps and Map Tiles
Operational alignment: Does the model match what your marketing team observes qualitatively?; Lesson 1734 — Comparing and Validating Attribution Models
Operational databases: Lesson 1807 — Data Warehouse vs Database: Architecture and Purpose
operators: (templates for tasks like PythonOperator, BashOperator, or SQLOperator).; Lesson 1833 — Introduction to Apache Airflow Lesson 1835 — Airflow Operators and Tasks
Opportunity: Can you convert core users to power users?; Lesson 1698 — Power User Curves and Engagement Distribution
Opportunity cost: What are you giving up by choosing one option?; Lesson 152 — Decision Making Under Uncertainty Lesson 1586 — Multi-Armed Bandit Connections Lesson 2118 — Cost-Benefit Analysis for Continued Work
Optimal bandwidth selectors: Methods like Imbens-Kalyanaraman or Calonico-Cattaneo-Titiunik that balance bias and variance; Lesson 1463 — RDD Bandwidth Selection and Local Estimation
Optimal intervention timing: Reach out *before* the high-risk window; Lesson 835 — Customer Churn Prediction with Survival Analysis
Optimization: Each channel has unique conversion funnels and drop-off patterns you can improve; Lesson 1711 — What Are Acquisition Channels? Lesson 1716 — Channel Mix and Portfolio Thinking
Optimization pressure: Algorithms optimize for accuracy on biased data, which means they get *better* at replicating and intensifying discriminatory patterns; Lesson 1882 — Algorithmic Amplification of Bias
Optimization traps: You improve the surrogate at the expense of the business metric (e.; Lesson 1518 — The Relationship Between Surrogate and Business Metrics
Optimize: your query by rearranging or combining operations; Lesson 1780 — Transformations vs Actions in Spark
Optimize execution: Tasks with no mutual dependencies can run in parallel; Lesson 1841 — Upstream and Downstream Dependencies
Optimize timing: See how much time passes between interactions; Lesson 1719 — The Customer Journey and Touchpoints
Optimize within bounds: for maximum revenue or profit, not individual metrics; Lesson 1759 — Optimizing ROAS, CAC, and Payback Together
Optimized execution: Spark's Catalyst optimizer rewrites your queries for performance; Lesson 1778 — DataFrames and Spark SQL Basics
Optimizes computation: by eliminating redundant operations; Lesson 1790 — What is Dask and When to Use It
Optimizing warranty periods: By fitting a Cox model or Kaplan-Meier curve to historical failure data, you can estimate what percentage of products will fail within 1 year, 2 years, etc.; Lesson 837 — Product Warranty and Failure Analysis
Oracle: Lesson 940 — Database Support and Alternatives
Orchestration: to schedule and monitor the entire pipeline; Lesson 1821 — Hybrid Approaches and Modern Data Stacks Lesson 1832 — Orchestration vs Scheduling
Orchestration layer: Manages task scheduling, dependencies, retries, and monitoring; Lesson 1822 — What is a Data Pipeline?
order: (q) tells you how many previous error terms to include.; Lesson 777 — Identifying MA Order (q) Using ACF Lesson 951 — Join Order and Performance
ORDER BY: Sorts the final result; Lesson 896 — GROUP BY Execution Order Lesson 912 — Fundamental Difference: Filter Timing
Order conditions by likelihood: Place the most frequently matched conditions first to minimize unnecessary evaluations.; Lesson 1037 — CASE Best Practices and Performance
Order Confirmed: (conversion); Lesson 1679 — Defining Funnel Steps and Events
order matters: .; Lesson 789 — Overfitting and Cross-Validation for Time Series Lesson 1355 — Layer Order and Plot Composition
Order reversal: The reciprocal transformation **reverses the order** of your values.; Lesson 216 — Reciprocal and Inverse Transformations
Order your p-values: from smallest to largest: p₁ ≤ p₂ ≤ .; Lesson 1504 — Holm-Bonferroni Method Lesson 1506 — Benjamini-Hochberg Procedure
Ordered position: (left-to-right or top-to-bottom); Lesson 1238 — Matching Encoding to Data Type
Ordering all event times: from earliest to latest; Lesson 809 — Introduction to the Kaplan-Meier Estimator
Orders: is a foreign key that must match a `customer_id` in **Customers**.; Lesson 1051 — Introduction to Foreign Keys
Orders table: order ID, customer ID, order amount; Lesson 918 — What is an INNER JOIN?
Ordinal: variables have categories with a natural, meaningful order or ranking.; Lesson 17 — Categorical Variables: Nominal and Ordinal Lesson 392 — Wilcoxon Signed-Rank Test
Ordinal data: Ranks matter, but exact distances don't (e.; Lesson 398 — Choosing Between Parametric and Non-Parametric Tests Lesson 487 — When to Use Spearman vs Pearson Lesson 1238 — Matching Encoding to Data Type
Organic: Unpaid search traffic from Google, Bing, etc.; Lesson 1712 — Common Channel Categories
Organic search: (SEO-driven traffic); Lesson 1711 — What Are Acquisition Channels?
Organizational conflicts: Your employer wants data to support a predetermined decision.; Lesson 35 — Conflicts of Interest and Independence
Organizational pressure: Your employer wants a particular conclusion to justify a strategy they've already committed to publicly.; Lesson 1930 — Managing Conflicts of Interest
Organize strategically: .; Lesson 2034 — Committing Data Artifacts and Model Outputs
Organize your 2×2 table: and identify b and c (the off-diagonal counts); Lesson 436 — Conducting McNemar's Test
Orientation: Tilted lines among horizontal lines pop out; Lesson 1235 — Pre-Attentive Attributes
Orientation matters: Rotating the view can completely change the story your data tells—a sign the visualization isn't robust; Lesson 1329 — Effective Use and Pitfalls of 3D Visualizations
Origin: Database name, URL, file path, API endpoint, or vendor name; Lesson 1161 — Documenting Data Sources
Original data: (the combined signal); Lesson 711 — Visualizing Components with Decomposition Plots Lesson 747 — Interpreting Decomposition Plots
ORM: (Object-Relational Mapper) is built on top of it.; Lesson 1118 — SQLAlchemy Core vs ORM
ORM (Object-Relational Mapper): is a tool that lets you interact with database tables using Python objects and classes instead of writing raw SQL queries.; Lesson 1117 — What is an ORM and Why Use It?
Ornamental borders: Keep focus on the data itself; Lesson 1237 — Chart Junk and Data-Ink Ratio
Ornamental illustrations: (like pictures of coins on financial charts); Lesson 1963 — Removing Chartjunk
Ornate borders and frames: Decorative elements around the chart; Lesson 1246 — Visual Clutter and Chartjunk
Orphaned tasks: (nodes with no connections); Lesson 1846 — Testing and Validating Dependency Graphs
Orthographic projection: All objects maintain their size regardless of distance from the camera.; Lesson 1326 — Viewing Angles and Projection Types
Other cloud platforms: like AWS, Google Cloud, or Azure offer the most flexibility and scalability but demand more technical knowledge around servers, containers, and networking.; Lesson 1338 — Deployment and Sharing Dashboards
Other Western Electric Rules: Lesson 1401 — Detecting Out-of-Control Signals
out of control: (something changed).; Lesson 1396 — Introduction to Control Charts Lesson 1400 — Control Limits vs Specification Limits
Outcome: Test scores; Lesson 463 — Introduction to Two-Way ANOVA
Outcome-focused: – Measures results, not activities; Lesson 1610 — Defining Measurable Key Results
Outcomes are mutually exclusive: You can't have both success and failure simultaneously; Lesson 123 — Bernoulli Trial Definition and Properties
Outdated lists: Phone directories missing new residents or unlisted numbers; Lesson 249 — Coverage Error and Undercoverage
Outer query: Average those department sums; Lesson 973 — Nested Subqueries in FROM
Outer query alias: (`outer`): identifies columns from the main query; Lesson 976 — Basic Correlated Subquery Syntax
outlier: in regression is an observation with an unusual **Y value** given its X value—it doesn't follow the pattern of the other data points.; Lesson 587 — Identifying Outliers in Regression Context Lesson 1389 — What is Grubbs' Test?
Outlier Detection: If a data point has a z-score beyond ±3, it's unusual enough to investigate.; Lesson 201 — Z-Score Applications and Limitations Lesson 1157 — Statistical Anomaly Detection in QA
Outliers: Are there unusual points far from the main pattern?; Lesson 480 — Scatterplots and Visual Assessment Lesson 487 — When to Use Spearman vs Pearson Lesson 537 — When R-Squared is Not Enough Lesson 556 — What Are Residuals and Why Plot Them? Lesson 745 — STL Decomposition (Seasonal-Trend Loess) Lesson 1175 — Histograms for Distribution Shape Lesson 1176 — Box Plots for Spread and Outliers Lesson 1183 — Scatter Plots for Two Numeric Variables (+5 more)
Outliers and influential points: Which observations have unusually large residuals that might distort your model?; Lesson 544 — The Role of Residuals in Diagnostics
Outliers present: The median is *robust*—extreme values don't affect it.; Lesson 42 — Comparing Mean, Median, and Mode
Output: A chi-squared-like test statistic with degrees of freedom = (k - 1), where k = number of conditions; Lesson 474 — Friedman Test: Non-Parametric Repeated Measures ANOVA Lesson 1580 — Bayesian vs Frequentist A/B Testing
Output files: Lesson 2031 — Using .gitignore for Data Science Projects
Output(s): The component property you'll update (e.; Lesson 1335 — Dash Callbacks: Adding Interactivity
Outputs: Lesson 1737 — Aggregate-Level Data in MMM Lesson 1874 — DVC Pipelines and Stages
Outputs are generated: Everything in `reports/` and `models/` should be reproducible from code—don't edit these files manually.; Lesson 2069 — Project Directory Structure
Outside [a, b]: f(x) = 0; Lesson 161 — The Continuous Uniform Distribution
Outside the bounds: The autocorrelation is **statistically significant**—there's likely a real pattern at that lag; Lesson 723 — Significance Bounds in ACF Plots
Over-controlling: Adding every available variable to your model without checking the DAG.; Lesson 1476 — Common DAG Patterns and Pitfalls
Over-crediting vanity metrics: that didn't drive real outcomes; Lesson 1637 — What is Metric Attribution?
Over-differencing: can introduce unnecessary complexity and make patterns harder to model.; Lesson 736 — Higher-Order Differencing
Over-investing: in channels that capture existing demand rather than create it; Lesson 1717 — Incrementality and True Channel Impact
Over-Optimizing Proxies: Lesson 1603 — Common Pitfalls in Indicator Selection
Over-smoothing: Mean income by state masks extreme inequality within states; Lesson 1245 — Misleading Aggregations and Binning
Overall Equipment Effectiveness (OEE): is the gold standard for measuring production efficiency.; Lesson 1636 — Manufacturing Metrics: OEE, Yield, and Cycle Time
Overall model F-test: The global significance doesn't change; Lesson 647 — Impact on Model Results and Reporting
Overdispersion: occurs when the actual variance in your data significantly exceeds the mean—violating this core Poisson assumption.; Lesson 693 — Overdispersion in Count Data
Overfit: to your specific dataset's noise; Lesson 632 — Parsimony and Occam's Razor Lesson 2124 — Insufficient or Low-Quality Data
overfitting: .; Lesson 14 — Model Evaluation and Validation Lesson 785 — Information Criteria: AIC and BIC Lesson 1938 — Using Metaphors and Analogies
Overfitting risk: increases with unnecessary predictors; Lesson 1197 — Identifying Variable Importance and Redundancy
Overhead allocation: portion of office space, utilities for marketing/sales teams; Lesson 1753 — Customer Acquisition Cost (CAC): Components and Calculation
Overlap: Too many bubbles or extreme size differences can create clutter—consider transparency or interactive tooltips; Lesson 1229 — Bubble Charts for Three Variables
Overlay geoms: Additional data layers for comparison; Lesson 1355 — Layer Order and Plot Composition
Owner: Customer Success team lead; Lesson 1948 — The Recommendation Slide: Making It Actionable
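
A rough check for the Overdispersion entry above can be sketched in Python. This is only an illustrative heuristic, not a formal test: the helper name `dispersion_ratio` and the sample counts are assumptions made up for this example. Under a Poisson model the variance equals the mean, so a variance-to-mean ratio well above 1 hints at overdispersion.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Return sample variance divided by sample mean for count data.

    Hypothetical helper for illustration; values near 1 are consistent
    with a Poisson model, values well above 1 suggest overdispersion.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; the ratio is undefined")
    return variance(counts) / m


# Made-up count data (e.g. events per day), for illustration only.
counts = [0, 1, 0, 2, 9, 0, 1, 12, 0, 3]
ratio = dispersion_ratio(counts)
print(f"variance/mean ratio: {ratio:.2f}")
```

A ratio this far above 1 would usually prompt a look at alternatives to plain Poisson regression, but the cutoff for "significantly exceeds" depends on sample size and context.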

P

p̂: (p-hat) is your sample proportion; Lesson 278 — Confidence Interval Formula for One Proportion Lesson 402 — Calculating the Test Statistic for Proportions
p < 0.05: (common threshold): Statistically significant evidence that the survival curves differ.; Lesson 822 — Interpreting Log-Rank Test Results Lesson 1692 — Statistical Significance and Iteration
p = 0.5: Perfectly symmetric (like a fair coin).; Lesson 128 — Binomial Parameters n and p Lesson 293 — Sample Size for Estimating a Proportion
p-hacking: manipulating the analysis until they get p < 0.; Lesson 329 — Choosing α Before Analysis Lesson 1485 — Documentation and Pre-Registration Lesson 1508 — Pre-Registration and Correction Strategy Lesson 1926 — The Honest Broker Role
p-value: Lesson 205 — Shapiro-Wilk Test Lesson 206 — Kolmogorov-Smirnov Test Lesson 207 — Anderson-Darling Test Lesson 208 — Jarque-Bera Test Lesson 317 — Sampling Distribution of the Test Statistic Lesson 318 — What is a P-Value? Lesson 319 — Calculating P-Values from Test Statistics Lesson 348 — P-Value Calculation Differences (+18 more)
P-value < 0.05: (or your chosen α): Reject the null → series is **stationary**; Lesson 716 — Augmented Dickey-Fuller Test
P-value ≥ 0.05: Fail to reject → a unit root cannot be ruled out, so treat the series as **non-stationary**; Lesson 716 — Augmented Dickey-Fuller Test
P-Value Approach: Lesson 327 — Decision Rules: Reject or Fail to Reject
p-values: may be unreliable (too high or too low); Lesson 202 — Why Test for Normality? Lesson 345 — Directionality in Hypothesis Testing Lesson 355 — Finding Critical Values and P-Values Lesson 1938 — Using Metaphors and Analogies
P(A ∩ B): is the probability that *both* A and B occur (the intersection); Lesson 92 — Definition and Notation of Conditional Probability
P(A): = your **prior belief** (what you thought before seeing evidence); Lesson 108 — Updating Beliefs with New Evidence
P(A) = Σ P(A|Bᵢ) × P(Bᵢ): Lesson 97 — Law of Total Probability
P(A|B): , read as "the probability of A *given* B.; Lesson 92 — Definition and Notation of Conditional Probability Lesson 108 — Updating Beliefs with New Evidence
P(A|B) = P(A): .; Lesson 105 — Independence in Conditional Probability
P(B): is the probability that B occurs (and must be greater than 0); Lesson 92 — Definition and Notation of Conditional Probability Lesson 108 — Updating Beliefs with New Evidence
P(B|A): Lesson 108 — Updating Beliefs with New Evidence
P(B|A) = P(B): , then A and B are independent; Lesson 105 — Independence in Conditional Probability
P(both Aces): = (4/52) × (3/51) ≈ 0.; Lesson 88 — General Multiplication Rule
P(data | θ): Lesson 1535 — The Likelihood Function
P(event): alongside **P(positive test | event)**; Lesson 110 — Base Rate Fallacy
P(Evidence | Innocent): probability of seeing this evidence if innocent; Lesson 112 — Legal Evidence and Jury Reasoning
P(positive test | event): Lesson 110 — Base Rate Fallacy
P(X < a): Probability that X is less than some value *a* — the area to the *left* of *a*; Lesson 173 — Calculating Probabilities with the Normal Distribution
P(X = k): The probability that random variable X equals exactly k successes; Lesson 127 — Binomial Distribution PMF
P(X > b): Probability that X is greater than *b* — the area to the *right* of *b*; Lesson 173 — Calculating Probabilities with the Normal Distribution
P(X > k): "More than k events" (complement of cumulative); Lesson 143 — Cumulative Poisson Probabilities
P(X ≤ k): "At most k events" (cumulative probability); Lesson 143 — Cumulative Poisson Probabilities
P(Z < −1.23): = same as P(Z > 1.; Lesson 198 — Using Z-Tables for Probability
P(Z > 1.23): = 1 − 0.; Lesson 198 — Using Z-Tables for Probability
p̂₁: and **p̂₂** are your two sample proportions; Lesson 287 — Confidence Intervals for the Difference Between Two Proportions Lesson 409 — Z-Test Statistic for Two Proportions
p̂₂: are your two sample proportions; Lesson 287 — Confidence Intervals for the Difference Between Two Proportions Lesson 409 — Z-Test Statistic for Two Proportions
PACF: (Partial Autocorrelation Function), however, measures **only the direct relationship** at lag k, controlling for all intermediate lags.; Lesson 728 — PACF vs ACF: Key Differences Lesson 733 — Using ACF and PACF Together Lesson 798 — SARIMA Model Selection
PACF of residuals: , you're checking the same thing:; Lesson 786 — ACF and PACF of Residuals
PACF plot: to identify **p** (AR order).; Lesson 779 — The Box-Jenkins Methodology
Package versions: (exact versions of every library you import); Lesson 2038 — What is Environment Management and Why It Matters
Page 1: `LIMIT 25 OFFSET 0` (rows 1-25); Lesson 878 — OFFSET: Skipping Rows for Pagination
Page 2: `LIMIT 25 OFFSET 25` (rows 26-50); Lesson 878 — OFFSET: Skipping Rows for Pagination
Page 3: `LIMIT 25 OFFSET 50` (rows 51-75); Lesson 878 — OFFSET: Skipping Rows for Pagination
Page Design: Lesson 1690 — Landing Page and CTA Optimization
Page view: → **CTA click** → **Conversion**; Lesson 1690 — Landing Page and CTA Optimization
Page views: (without engagement depth or conversion); Lesson 1612 — What Are Vanity Metrics? Lesson 1616 — Metrics Divorced from Revenue
Paid: Any channel where you pay for placement (Google Ads, Facebook Ads, display networks, sponsored content); Lesson 1712 — Common Channel Categories
Paid advertising: (Google Ads, Facebook, display networks); Lesson 1711 — What Are Acquisition Channels?
Paid CAC: isolates only the costs and customers from *paid advertising channels*:; Lesson 1754 — Blended CAC vs Paid CAC
Paid Search: has a 4-month payback, while **Referral** pays back in 6 months.; Lesson 1758 — Cohort-Based Payback Analysis
Paired or Repeated Measurements: If you measure the same subjects twice (before/after treatment), those measurements aren't independent—they're linked to the same person.; Lesson 381 — Independence Assumption and Its Violations
paired t-test: , which analyzes the *differences* within each pair, effectively reducing the problem to a one-sample test on those differences; Lesson 360 — Independent vs. Dependent Samples Lesson 369 — When to Use a Paired t-Test Lesson 375 — Paired t-Test vs Two-Sample t-Test
Paired t-tests: Remember, it's the *differences* that need to be normally distributed, not the original paired observations.; Lesson 376 — The Assumption of Normality in t-Tests
Pairing comparable units: Each treated unit gets matched with one or more control units based on observed characteristics (covariates); Lesson 1445 — The Matching Framework
Pairing related items: Identify rows that share common attributes but differ in others; Lesson 947 — Self-Joins for Comparisons Within a Table
Pairwise comparisons: which groups are being compared; Lesson 462 — Interpreting and Reporting Post-Hoc Results Lesson 469 — Follow-Up Tests for Two-Way ANOVA
Pairwise independence: means every *pair* of events is independent.; Lesson 103 — Mutual Independence vs Pairwise Independence
Pan and zoom: capabilities for exploring dense datasets; Lesson 1300 — Creating Basic Interactive Charts with Plotly Express
Pandas: , you use `pivot()` or `pivot_table()`:; Lesson 1146 — Pivoting Data Wider (Cast)
Paper submissions: `paper-submission-neurips2024`; Lesson 2037 — Tagging Releases and Experiment Snapshots
Parallel lines: → No interaction; effects are independent; Lesson 466 — Visualizing Interactions
parallel trends: without treatment, both groups would have changed similarly over time—a critical assumption you'll need to verify in practice.; Lesson 1452 — The Difference-in-Differences Setup Lesson 1453 — The Parallel Trends Assumption Lesson 1746 — Geo-Lift Experiments
Parameter uncertainty: The spread of the distribution shows how confident you are.; Lesson 1547 — Interpreting Posterior Distributions
Parameters: are numerical characteristics that describe a population.; Lesson 228 — Defining Populations and Parameters Lesson 229 — Defining Samples and Statistics
Parametric part: The model assumes covariates affect hazard through a mathematical formula with parameters (coefficients) you estimate.; Lesson 825 — What is the Cox Proportional Hazards Model?
Parental/guardian consent: for children, plus age-appropriate explanations; Lesson 1918 — Special Populations and Vulnerable Groups
Pareto: describes heavy-tailed phenomena where extreme values are common—wealth distributions, file sizes on servers, or social network connections.; Lesson 193 — Choosing Between Distributions in Practice
Pareto distribution: , which you learned about in the previous lesson.; Lesson 191 — Pareto Principle and the 80/20 Rule
Pareto Principle: , also called the **80/20 rule**.; Lesson 191 — Pareto Principle and the 80/20 Rule
Parquet: is a compressed, column-oriented format designed for efficiency.; Lesson 1129 — Parquet and Feather: Columnar Formats Lesson 1779 — Reading and Writing Data in Spark
Parquet and Feather: are columnar formats optimized for analytics.; Lesson 1133 — Performance Considerations Across Formats
Partial: Part of a composite key determines an attribute; Lesson 1063 — Functional Dependencies
Partial autocorrelation: (PACF) solves this by measuring the *direct* correlation between observations separated by k time steps, *after removing* the influence of all the intermediate lags.; Lesson 727 — What is Partial Autocorrelation?
Partial correlation: measures the relationship between two variables *after removing the influence of one or more other variables*.; Lesson 506 — Introduction to Partial Correlation Lesson 508 — Interpreting Partial Correlations Lesson 509 — Confounding Variables and Control Lesson 513 — Applications: Feature Selection and Multicollinearity
Partial duplicate detection: Identify rows that match on key fields (like name and birthdate) but differ elsewhere—these might represent the same entity entered multiple ways.; Lesson 1154 — Uniqueness and Duplication Checks
Partial F-Test: (which you learned in lesson 623) to formally test whether the extra predictors significantly improve the model; Lesson 626 — Nested vs Non-Nested Models
Partial failure recovery: uses **checkpoints** and **transaction boundaries** to save progress at strategic points.; Lesson 1853 — Partial Failure Recovery
Partial Failure Risk: What if update #1 succeeds but update #2 fails?; Lesson 1075 — Handling Data Consistency in Denormalized Schemas
Partial reads and chunking: Lesson 1141 — Recovering from Corrupted or Partially Broken Data
partition: of a sample space is a special collection of events that satisfies two critical properties simultaneously:; Lesson 83 — Partitions of the Sample Space Lesson 97 — Law of Total Probability Lesson 1782 — Spark Performance Basics: Partitions and Caching
Partitioning: and **clustering** tell the warehouse how to physically organize your data so queries can skip entire chunks of irrelevant data.; Lesson 1812 — Partitioning and Clustering Strategies
partitions: and processes them in parallel.; Lesson 1791 — Dask DataFrame Basics Lesson 1794 — Working with Partitions
Partnerships: (co-marketing, affiliate programs); Lesson 1711 — What Are Acquisition Channels?
Past interactions: Previous purchases, feature usage patterns; Lesson 1689 — Multivariate Testing and Personalization
Patient satisfaction scores: capture experience quality through surveys such as Net Promoter Score (NPS) or HCAHPS, serving as leading indicators for loyalty and reputation.; Lesson 1633 — Healthcare Metrics: Patient Outcomes and Operational Efficiency
Pattern: Points curve **above** the line at the upper-right end and **below** the line at the lower-left end, like a gentle S-curve.; Lesson 567 — Common Q-Q Plot Patterns: Heavy Tails and Light Tails Lesson 722 — ACF Plots and Interpretation Lesson 726 — Using ACF for Model Identification
Pattern + Color: In bar charts or area plots, add hatching, dots, or line patterns alongside color fills; Lesson 1251 — Avoiding Reliance on Color Alone
Pattern over-generalization: Models find and exploit subtle correlations in biased data that humans might overlook (e.; Lesson 1882 — Algorithmic Amplification of Bias
patterns: univariate analysis misses; Lesson 1181 — What is Bivariate Analysis? Lesson 1191 — Scatter Plot Matrices and Pairplots Lesson 1222 — Scatter Plots for Relationships Lesson 1867 — Data Profiling and Monitoring Lesson 2087 — Stage 3: Exploratory Data Analysis
Patterns in plots: non-random patterns suggest model misspecification (e.; Lesson 701 — Deviance Residuals
Payment Submitted: Lesson 1679 — Defining Funnel Steps and Events
PCA: when you want speed, interpretability, and care about global structure.; Lesson 1196 — Dimensionality Reduction for Visualization
PDF: (`.; Lesson 1262 — Saving Figures to Files
PDF acts like weights: , telling you which regions contribute more to the average.; Lesson 159 — Expected Value and Variance for Continuous Variables
Pearson: detects *linear* relationships: as X increases by a constant amount, Y changes by a constant amount; Lesson 487 — When to Use Spearman vs Pearson Lesson 1184 — Correlation Coefficients in Bivariate Analysis
Pearson correlation: is your go-to for linear relationships between normally distributed variables.; Lesson 1184 — Correlation Coefficients in Bivariate Analysis
Pearson's r: measures the strength and direction of the linear relationship between two variables; Lesson 534 — R-Squared vs Correlation Squared
Peek frequently: without invalidating your test; Lesson 1510 — Sequential Testing Overview
Peer groups: "Among similar-sized companies, we rank in the top 10%"; Lesson 1962 — Contextualizing Numbers
Pennies-per-terabyte storage: (Amazon S3, Google Cloud Storage): Storing raw data became so cheap that the cost of keeping everything in its original form was negligible; Lesson 1818 — The Rise of ELT: Cloud Storage and Compute
Percentage contribution: `value / SUM(value) OVER (PARTITION BY category)`; Lesson 1019 — Comparing Values to Window Aggregates
Percentage of total: `sale_amount / regional_total * 100`; Lesson 1019 — Comparing Values to Window Aggregates
percentile: tells you what percentage of the data falls *below* a specific value.; Lesson 56 — Understanding Percentiles and Their Interpretation Lesson 199 — Finding Percentiles with Z-Scores
percentiles: (100 groups).; Lesson 57 — Quantiles: Quartiles, Deciles, and Beyond Lesson 62 — Percentiles vs Z-Scores: Complementary Position Measures Lesson 1173 — Numerical Variable Summary Statistics
Perfect: multicollinearity means two or more predictors are perfectly linearly related—one can be expressed as an exact linear combination of the others.; Lesson 551 — No Perfect Multicollinearity in Simple Regression
Performance: Each JOIN operation has a computational cost.; Lesson 1070 — When to Stop Normalizing Lesson 1092 — Connection Pooling Basics Lesson 1636 — Manufacturing Metrics: OEE, Yield, and Cycle Time
Performance benchmarks: and expected behavior; Lesson 2091 — Stage 7: Communication and Handoff
Performance bottlenecks: When specific queries consistently timeout or slow down user experience despite indexing and optimization.; Lesson 1071 — When to Denormalize: Performance Trade-offs
Performance monitoring: Query optimization as data volume grows; Lesson 1979 — Maintenance and Sustainability Considerations
Performance needs: (C-based drivers like psycopg2 are faster than pure-Python alternatives); Lesson 1087 — Database Drivers and Connection Libraries
Performance thresholds: "The model must achieve at least 85% accuracy" or "reduce processing time by 30%"; Lesson 2117 — Defining 'Good Enough' with Stakeholders
Performance-driven iteration: Your model doesn't meet accuracy thresholds, prompting cycles through feature engineering, data collection, or even problem rescoping.; Lesson 2092 — Iteration and Feedback Loops in Practice
Period 0: Usually 100% (the acquisition event itself); Lesson 1648 — Cohort Retention Rates
Period 1: % who returned in the first subsequent period; Lesson 1648 — Cohort Retention Rates
Period 2: % who returned in the second period, and so on; Lesson 1648 — Cohort Retention Rates
Period selection: Choose periods that match your business cycle.; Lesson 1671 — Churn Rate Calculation Methods
Periodic patterns: Cyclical or wave-like relationships; Lesson 1189 — Detecting Nonlinear Relationships
Permanent: Fail fast, log the issue, alert immediately, possibly route to a dead-letter queue for investigation; Lesson 1849 — Transient vs Permanent Failures
Permanent failures: usually involve:; Lesson 1849 — Transient vs Permanent Failures
Permissions and licensing: Who authorized access?; Lesson 1161 — Documenting Data Sources
Permissive licenses: (like MIT, BSD, and Apache 2.; Lesson 2081 — Understanding Open Source Licenses
Permutation methods: are useful when testing whether two groups differ.; Lesson 291 — Non-Parametric Alternatives for Difference Intervals
Permutation or bootstrap approaches: Distribution-free methods that don't assume normality; Lesson 470 — When Parametric ANOVA Assumptions Fail
Permutation tests: offer a clever alternative: they use resampling to build a reference distribution from your own data.; Lesson 502 — Permutation Tests for Correlation
Person-time: Modeling disease incidence rates with different follow-up durations; Lesson 692 — Offset Terms for Exposure
Personal conflicts: A friend asks you to help prove their startup idea will work.; Lesson 35 — Conflicts of Interest and Independence
Personal relationships: You're analyzing data about a friend's project or a competitor of someone close to you.; Lesson 1930 — Managing Conflicts of Interest
Personalization: High-value segments might receive premium support, exclusive offers, or early access to features.; Lesson 1669 — LTV Segmentation and Targeting
Personalize experiences: Tailor messaging based on where users are in their journey; Lesson 1719 — The Customer Journey and Touchpoints
Perspective projection: (default): Objects farther away appear smaller, mimicking how human eyes see the world.; Lesson 1326 — Viewing Angles and Projection Types
Peto test: (Peto-Peto modification) uses a weighting scheme between log-rank and Wilcoxon.; Lesson 823 — Log-Rank Test vs Other Tests
Phantom reads: A query returns different rows on repeat execution because another transaction inserted/deleted data; Lesson 1116 — Transaction Isolation and Concurrency
Phi: is a special case used exclusively for 2×2 contingency tables.; Lesson 429 — Effect Size: Cramér's V and Phi
Physical constraints: (negative ages, impossible dates); Lesson 75 — Domain-Specific Outlier Rules Lesson 1211 — Domain Validation and Sanity Checks
Physics/chemistry: R² > 0.; Lesson 533 — Interpreting R-Squared Values
Pick a population distribution: (any shape—uniform, exponential, skewed, bimodal, doesn't matter); Lesson 222 — Visualizing the CLT with Simulations
Pick comparable cohorts: Same definition (e.; Lesson 1659 — Comparing Retention Across Cohorts
Pie charts: Displaying parts of a whole (market share, budget allocation) — use sparingly and only with 2-5 slices; Lesson 1959 — Choosing Familiar Chart Types
Pilot Studies: Lesson 297 — Handling Unknown Population Parameters
Pin versions explicitly: (e.; Lesson 1987 — Environment and Dependency Management
Pipeline delays: Data usually arrives at 6 AM but starts arriving at 9 AM.; Lesson 2136 — Monitoring Gaps and Silent Failures
Pipeline Runtime: How long each run takes from start to finish.; Lesson 1856 — Key Metrics to Monitor
Pipeline validation: Detect unexpected changes in upstream data sources; Lesson 1871 — Why Version Control for Data?
Pipeline/job identifier: Lesson 1857 — Logging Best Practices
Pipenv: (along with **Poetry**) a modern dependency manager that treats your project like a publishable package from day one.; Lesson 2051 — Poetry and Modern Python Tools
Pitfall: Conditioning on M blocks the path from A to Y, hiding the causal effect you want to measure.; Lesson 1476 — Common DAG Patterns and Pitfalls
Plan changes: Modifying a task affects everything downstream; Lesson 1841 — Upstream and Downstream Dependencies
Planned vs. exploratory comparisons: Pre-specified contrasts vs.; Lesson 824 — Multiple Group Comparisons
Platform differences: compound the problem: a package compiled for Windows may behave differently from its macOS build, and ARM-based Macs require different binaries than Intel-based ones.; Lesson 2048 — The Dependency Hell Problem
Platform-friendly: Ad networks often require the slot to be filled; Lesson 1747 — Ghost Ads and PSA Tests
Platform-specific optimizations: Lesson 1691 — Mobile vs Desktop Conversion Analysis
Platykurtic: (kurtosis < 3 or excess kurtosis < 0): Light tails and a flatter peak.; Lesson 66 — Kurtosis: Definition and Interpretation
Plausibility: Does a reasonable mechanism explain *how* X could cause Y, given current scientific knowledge?; Lesson 498 — Bradford Hill Criteria for Causation
Plot your data: boxplots and histograms for each group; Lesson 290 — Assumptions and Diagnostics for Difference Intervals
Plotly: transform your geographic data into engaging web visualizations.; Lesson 1313 — Interactive Maps with Folium and Plotly Lesson 1321 — Interactive Network Graphs with Plotly and Pyvis Lesson 1371 — Default Aesthetics and Design Choices
Plotly charts: use `st.; Lesson 1333 — Displaying Charts and Tables in Streamlit
Plotly Express: by specifying an `animation_frame` parameter pointing to your time or category column.; Lesson 1306 — Animation and Time-Based Transitions
Plotting utilities: (`utils/plotting.; Lesson 2075 — Utility Modules and Helper Functions
PNG: (`.; Lesson 1262 — Saving Figures to Files
Pocock: Distributes alpha more evenly across all looks; Lesson 1512 — Group Sequential Testing
Poetry: (and similarly, **Pipenv**) are modern dependency managers that treat your project like a publishable package from day one.; Lesson 2051 — Poetry and Modern Python Tools
point estimate: of the difference is simply p̂₁ - p̂₂.; Lesson 280 — Confidence Intervals for Difference in Proportions Lesson 412 — Confidence Interval for Difference Lesson 607 — Confidence Intervals for Coefficients
Pointers: to storage locations rather than storing full copies in version control; Lesson 1871 — Why Version Control for Data?
Pointers, not files: For large models and datasets, commit references (like file hashes or DVC tracking files) rather than the actual binaries; Lesson 2034 — Committing Data Artifacts and Model Outputs
Points: (`geom_point`) for scatter plots showing individual observations; Lesson 1342 — Geometric Objects (geoms)
Points along the line: Perfect or near-perfect normality.; Lesson 566 — Reading Q-Q Plots: Interpreting Points Along the Reference Line
Points Beyond Control Limits: Lesson 1401 — Detecting Out-of-Control Signals
Points on the diagonal: Your data matches the normal distribution well; Lesson 204 — Q-Q Plots: Theory and Interpretation
Poisson: tracks "k events occurring at average rate λ"; Lesson 142 — Poisson as Limit of Binomial Lesson 154 — Real-World Use Cases: Customer Behavior and Events Lesson 664 — What is the Exponential Family of Distributions? Lesson 676 — Canonical vs Non-Canonical Links
Poisson distribution: when:; Lesson 153 — Real-World Use Cases: Quality Control and Defects Lesson 154 — Real-World Use Cases: Customer Behavior and Events Lesson 669 — The Dispersion Parameter φ
Poisson model: perfectly.; Lesson 154 — Real-World Use Cases: Customer Behavior and Events
Poisson probability tables: or statistical software.; Lesson 143 — Cumulative Poisson Probabilities
Poisson process: .; Lesson 139 — The Poisson Process and Rate Parameter Lesson 140 — Poisson Probability Mass Function
Poisson-distributed variables: (events occurring at a constant rate); Lesson 213 — Square Root and Cube Root Transformations
Polar coordinates: transform bar charts into pie charts or create radial plots; Lesson 1344 — Scales and Coordinate Systems
Polish the Presentation: Lesson 1217 — The Transition from Explore to Explain
Polynomial features: let you capture these curves *within* a linear regression framework by adding powers of your existing variables.; Lesson 657 — What Are Polynomial Features? Lesson 662 — Polynomial Features vs Splines
Pool all observations: and randomly reassign them to groups; Lesson 395 — Permutation Tests for Means and Beyond
Pooled variance: assumes both groups have the same underlying population variance.; Lesson 285 — Pooled vs Unpooled Variance Approaches
pooled variance t-test: is specifically designed for situations where you can reasonably assume both populations have the **same variance** (even if their means differ).; Lesson 361 — Pooled Variance t-Test Lesson 362 — Welch's t-Test for Unequal Variances Lesson 379 — The Assumption of Equal Variances (Homoscedasticity)
Poor decision-making: based on correlation rather than causation; Lesson 1637 — What is Metric Attribution?
Poor interpretation: (stakeholders misread what the metric actually measures); Lesson 1619 — What is Metric Ownership?
Poor model fit: Standard models assume stable variance and mean; Lesson 734 — Why Differencing and Detrending Matter
Poor objective: "Increase metrics"; Lesson 1609 — Setting Effective Objectives
population: ), you have complete information.; Lesson 50 — Population vs Sample Variance Lesson 228 — Defining Populations and Parameters Lesson 229 — Defining Samples and Statistics Lesson 232 — Notation Conventions Lesson 261 — Standard Error vs Standard Deviation Lesson 692 — Offset Terms for Exposure
Population Characteristics: Lesson 243 — Choosing the Right Sampling Method
Population distribution: is the complete album of everyone's heights in a country — every single person.; Lesson 258 — Comparing Population, Sample, and Sampling Distributions
Population mean (μ): The average of all values in the population; Lesson 228 — Defining Populations and Parameters
Population proportion (p): The fraction of the population with a certain characteristic; Lesson 228 — Defining Populations and Parameters
Population standard deviation (σ): How spread out the population values are; Lesson 228 — Defining Populations and Parameters Lesson 292 — Sample Size for Estimating a Mean
Population variability: (σ): More spread in the population → larger SE; Lesson 260 — Defining Standard Error
Population variance: Divide by **N** (total count of all values); Lesson 50 — Population vs Sample Variance
Population variance (σ²): Expected variability in each group; Lesson 289 — Sample Size Requirements for Difference Intervals
Portfolio thinking: means treating your channels like investments:; Lesson 1716 — Channel Mix and Portfolio Thinking
position: , viewers can judge "this is twice that" with about 5% error.; Lesson 1232 — Perceptual Accuracy Hierarchy Lesson 1238 — Matching Encoding to Data Type Lesson 1242 — Inappropriate Chart Types for Data Lesson 1341 — Data and Aesthetic Mappings
Position + Color: Use spatial separation or faceting along with color coding; Lesson 1251 — Avoiding Reliance on Color Alone
Position along an axis: (the most accurate encoding); Lesson 1238 — Matching Encoding to Data Type
Position along non-aligned scales: (e.; Lesson 1232 — Perceptual Accuracy Hierarchy
positive: (you can't divide by zero or use negatives without complications); Lesson 216 — Reciprocal and Inverse Transformations Lesson 539 — What Are Residuals? Lesson 652 — Interpreting Categorical × Continuous Interactions
Positive (Right) Skewness: Lesson 64 — Skewness: Definition and Interpretation
Positive coefficient: → that category has a higher average outcome than the reference; Lesson 637 — Interpreting Dummy Variable Coefficients
Positive correlation: Points trend upward from left to right (as one variable increases, so does the other); Lesson 1222 — Scatter Plots for Relationships
Positive r: Variables move together (height and weight, study time and test scores); Lesson 477 — Interpreting the Correlation Coefficient
Positive residuals: (`e_i > 0`) occur when the actual value is *above* the fitted line.; Lesson 540 — The Residual Formula
Positive slope: X and Y move together (X ↑ means Y ↑); Lesson 524 — The Meaning of the Slope
Positive values: = heavier tails than normal (leptokurtic); Lesson 67 — Calculating Kurtosis Lesson 212 — Log Transformations Lesson 720 — The Autocorrelation Function (ACF)
Post: is a binary indicator (1 if observation is from post-treatment period, 0 if pre-treatment); Lesson 1455 — DiD with Regression
Post-hoc considerations: include:; Lesson 824 — Multiple Group Comparisons
Post-hoc tests: (meaning "after this") are designed to make pairwise comparisons *after* finding a significant ANOVA result.; Lesson 455 — Why Post-Hoc Tests Are Needed After ANOVA
Posterior: Given a positive test, what's the probability they actually have the disease?; Lesson 107 — Bayes' Theorem Formula and Components Lesson 115 — Prior Sensitivity Analysis Lesson 1417 — Bayesian Change-Point Detection Lesson 1550 — What Are Conjugate Priors? Lesson 1552 — Gamma-Poisson Conjugacy Lesson 1557 — The Beta-Binomial Model
posterior distribution: is the end result of Bayesian inference—it's what you *actually care about*.; Lesson 1537 — The Posterior Distribution Lesson 1539 — Interpreting Posterior Probabilities Lesson 1563 — Sequential Updating with New Data
Posterior distributions: tell you not just "which variant is winning?; Lesson 1586 — Multi-Armed Bandit Connections Lesson 1587 — Bayesian A/B Testing in Practice
Posterior mean: a weighted average of your prior mean and the sample mean, weighted by their precisions (inverse variances); Lesson 1553 — Normal-Normal Conjugacy Lesson 1561 — Posterior Mean and Mode
Posterior Mode: The peak of the posterior distribution, also called the Maximum A Posteriori (MAP) estimate — the single most probable value.; Lesson 1561 — Posterior Mean and Mode
Posterior predictive checks: answer this by simulating new datasets from your posterior distribution and comparing them to your observed data.; Lesson 1596 — Posterior Predictive Checks and Model Comparison
Posterior variance: combines information from both the prior and the data; Lesson 1553 — Normal-Normal Conjugacy
Posterior: P(B|A): Your *updated* belief about A *after* observing evidence B; Lesson 107 — Bayes' Theorem Formula and Components
PostgreSQL: is an enterprise-grade, open-source DBMS known for handling complex queries and large datasets.; Lesson 845 — Database Management Systems (DBMS) Lesson 862 — Case Sensitivity in Text Filtering Lesson 940 — Database Support and Alternatives Lesson 1041 — Formatting and Parsing Dates
power: to detect effects in the predicted direction; Lesson 345 — Directionality in Hypothesis Testing Lesson 397 — Power and Efficiency of Non-Parametric Tests Lesson 475 — Choosing Between Parametric and Non-Parametric Tests Lesson 1495 — Power Analysis Fundamentals
Power (1 - β): , typically 0.; Lesson 296 — Sample Size for Comparing Two Groups Lesson 343 — Calculating Power for Common Tests Lesson 344 — Power Analysis in Study Design
Power analysis: is the process of determining the minimum sample size required to detect an effect of a given size with adequate statistical power, all while controlling your Type I error rate (alpha).; Lesson 344 — Power Analysis in Study Design
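As a rough illustration of the calculation a power analysis performs, the sketch below uses the normal-approximation formula for comparing two group means, n = 2((z_{1-α/2} + z_{power}) / d)² per group, where d is the standardized effect size. The function name is hypothetical, and exact t-based tools will usually return a slightly larger n.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the Type I error rate
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


n_medium = sample_size_per_group(effect_size=0.5)
n_small = sample_size_per_group(effect_size=0.2)
```

Smaller effects need far larger samples: required n scales with 1/d².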
Power-imbalanced contexts: Employees consenting to employer tracking, students in research studies, prisoners, or patients in medical settings; Lesson 1918 — Special Populations and Vulnerable Groups
powerful: when your data is truly normal, but it's very sensitive to non-normality—it might reject equal variances simply because your data isn't perfectly bell-shaped, not because variances actually differ.; Lesson 380 — Testing Equal Variances: Levene's and Bartlett's Tests Lesson 450 — Homogeneity of Variance (Homoscedasticity)
Practical: Works when no complete population list exists; Lesson 238 — Multistage Sampling
Practical implementation: Lesson 2070 — Separating Data from Code
Practical limit: Most effective trees are 3-5 levels deep with 3-7 branches per node; Lesson 1623 — Depth vs Breadth in Metric Trees
practical significance: is the difference large enough to matter in context?; Lesson 367 — Interpreting Two-Sample Test Results Lesson 386 — Effect Size Interpretation Guidelines Lesson 387 — Confidence Intervals for Effect Sizes Lesson 389 — Reporting Effect Sizes in Practice Lesson 529 — Practical vs Statistical Significance Lesson 609 — Practical vs Statistical Significance Lesson 1480 — Minimum Detectable Effect (MDE)
Praise good work: When you see clever solutions or clear code, say so!; Lesson 2024 — Code Review Best Practices
Pre-creates: a set number of connections when your application starts; Lesson 1092 — Connection Pooling Basics
Pre-experiment validation: means running tests to ensure your randomization works properly and your metrics behave as expected *before* you expose users to actual treatment differences.; Lesson 1483 — Pre-Experiment Validation
Pre-filtering problems: Different groups experiencing different dropout rates during assignment; Lesson 1524 — Sample Ratio Mismatch (SRM)
Pre-register analyses: Decide your approach *before* seeing results; Lesson 30 — The Reproducibility Crisis and Solutions
Pre-register your alpha level: (usually 0.; Lesson 368 — Common Pitfalls and Best Practices
Pre-registration: means writing down your hypotheses, metrics, sample size, stopping rules, and correction methods *before* you peek at any results.; Lesson 1508 — Pre-Registration and Correction Strategy
precise control: , especially when creating multiple subplots or building complex visualizations.; Lesson 1256 — Two Interfaces: pyplot vs Object-Oriented Lesson 1277 — Adjusting Subplot Spacing and Layout
precision: of your sample statistic as an estimate of the true population parameter.; Lesson 265 — Using Standard Error in Practice Lesson 295 — Trade-offs: Precision, Confidence, and Cost Lesson 387 — Confidence Intervals for Effect Sizes Lesson 389 — Reporting Effect Sizes in Practice Lesson 1418 — Evaluating Change-Point Detection Methods Lesson 1567 — Posterior Mean as Weighted Average
Precision & Recall: For classification problems, how many relevant items did you catch, and how many false alarms did you trigger?; Lesson 14 — Model Evaluation and Validation
Precision is needed: Reading exact values from 3D axes is significantly harder than 2D; Lesson 1329 — Effective Use and Pitfalls of 3D Visualizations
Precision matters: viewers need to read exact values or make close comparisons; Lesson 1233 — Position as the Most Effective Channel
Predicting product lifespans: Manufacturers use k < 1 to model defects caught in early testing; Lesson 188 — Weibull Distribution: Hazard Function and Reliability
prediction intervals: a range where we expect the true value to fall with a certain confidence level (often 80% or 95%).; Lesson 794 — Forecasting Concepts and Horizons Lesson 800 — Generating Forecasts with SARIMA
Prediction intervals grow wider: as the forecast horizon extends.; Lesson 800 — Generating Forecasts with SARIMA
Predictions: Every observation gets the identical predicted value; Lesson 647 — Impact on Model Results and Reporting
Predictive parity: When the model predicts success, is it equally accurate across groups?; Lesson 1884 — Detecting Bias in Your Data
Predictive problems: answer "What will happen?; Lesson 2096 — Distinguishing Descriptive, Diagnostic, and Prescriptive Problems
Predicts sustainable growth: When it improves, revenue and retention typically follow; Lesson 1604 — What is a North Star Metric?
Prefect: A workflow orchestration tool that, like **Dagster**, logs every execution step.; Lesson 1164 — Tools for Lineage Tracking Lesson 1843 — Declaring Dependencies in Orchestration Tools
Pregnancy status: Lesson 1888 — Protected Classes and Sensitive Attributes
Preliminary evidence: correlation coefficients, group comparisons, or statistical summaries that suggest the hypothesis may hold (e.; Lesson 1203 — Documenting Hypotheses and Evidence
Prepare Your Data: Lesson 447 — Conducting One-Way ANOVA in Practice
Preprocessing utilities: (`utils/preprocessing.; Lesson 2075 — Utility Modules and Helper Functions
Prerequisites: Required software, packages, and versions; Lesson 1989 — Best Practices for Sharing Reproducible Reports
Prescriptive problems: answer "What should we do?; Lesson 2096 — Distinguishing Descriptive, Diagnostic, and Prescriptive Problems
Present contradictory evidence: that challenges your hypothesis; Lesson 1929 — Avoiding Cherry-Picking Results
Preserves slower-moving patterns: (the trend component); Lesson 755 — Moving Averages for Trend Estimation
Prevalence: the base rate of the disease in the population; Lesson 109 — Medical Diagnostic Testing Lesson 116 — From Bayes' Theorem to Bayesian Inference
Prevention: Use additive decomposition or agreed-upon attribution rules *before* initiatives launch.; Lesson 1642 — Attribution Pitfalls and Common Errors
Prevents Direct Pushes: No one can use `git push` directly to protected branches—all changes must go through pull requests.; Lesson 2027 — Protecting Branches and Required Reviews
Prevents Force Pushes: Protects against accidental history rewrites that could break reproducibility.; Lesson 2027 — Protecting Branches and Required Reviews
Preview data structure: without downloading entire tables; Lesson 877 — LIMIT: Restricting the Number of Rows Returned
Price sensitivity: (competitor pricing, perceived value); Lesson 1675 — Churn Attribution and Root Cause Analysis
Pricing optimization: Test whether annual plans reduce hazard rates compared to monthly; Lesson 838 — Subscription and Membership Duration Modeling
Primacy Effect: Conversely, existing users are *already comfortable with the old version*.; Lesson 1525 — Novelty and Primacy Effects
Primary and secondary metrics: you'll measure; Lesson 1508 — Pre-Registration and Correction Strategy
Primary contact: Your email or project maintainer's handle; Lesson 2083 — Contributing Guidelines and Contact Information
Primary data geoms: The main visual elements (points, lines, bars); Lesson 1355 — Layer Order and Plot Composition
Primary Key: A unique identifier for each row (like `customer_id`); Lesson 843 — Relational Database Concepts Lesson 921 — Primary and Foreign Key Relationships Lesson 1048 — What Are Primary Keys? Lesson 1051 — Introduction to Foreign Keys
primary metric: (or success metric) must directly align with your business goal.; Lesson 1478 — Defining Success Metrics Lesson 1485 — Documentation and Pre-Registration
Primary test: Compare mean purchase frequency between age groups using appropriate statistical tests; Lesson 1204 — From Hypothesis to Analysis Plan
Principal Data Scientist: Strategic technical direction, influence company-wide architecture, recognized external expert; Lesson 2140 — Individual Contributor vs Management Tracks
Prior: How common is the disease in the population?; Lesson 107 — Bayes' Theorem Formula and Components Lesson 1417 — Bayesian Change-Point Detection Lesson 1550 — What Are Conjugate Priors? Lesson 1552 — Gamma-Poisson Conjugacy
prior belief: (what you thought before seeing evidence); Lesson 108 — Updating Beliefs with New Evidence Lesson 112 — Legal Evidence and Jury Reasoning Lesson 115 — Prior Sensitivity Analysis Lesson 1417 — Bayesian Change-Point Detection Lesson 1557 — The Beta-Binomial Model Lesson 1566 — Conjugate Normal-Normal Model
prior distribution: quantifies your beliefs about a parameter *before* you observe any data.; Lesson 1534 — The Prior Distribution Lesson 1543 — Defining Prior Distributions Lesson 1544 — Informative vs Uninformative Priors Lesson 1563 — Sequential Updating with New Data Lesson 1565 — Prior Distributions for Normal Means Lesson 1581 — Setting Priors for A/B Tests
Prior knowledge: Ignored in frequentist A/B testing; incorporated explicitly in Bayesian A/B testing; Lesson 1580 — Bayesian vs Frequentist A/B Testing
Prior mean (μ₀): Your best guess for the population mean before seeing data; Lesson 1565 — Prior Distributions for Normal Means
Prior precision: How concentrated is your prior distribution?; Lesson 1549 — Prior-Likelihood Trade-offs
Prior probability P(Guilty): base rate of guilt before evidence; Lesson 112 — Legal Evidence and Jury Reasoning
Prior standard deviation (σ₀): How uncertain you are about that guess; Lesson 1565 — Prior Distributions for Normal Means
Prior: P(A): Your initial belief about A *before* seeing evidence B; Lesson 107 — Bayes' Theorem Formula and Components
Prioritize by pain: Refactor the parts of your pipeline that cause the most frequent issues or slow you down most; Lesson 2137 — Refactoring Strategies and Debt Paydown
Prioritize Interpretability: Lesson 678 — Choosing the Right Link Function
Prioritize ruthlessly: what matters most; Lesson 2121 — Timeboxing and Deadlines
Prioritized: Rank recommendations by impact, feasibility, or urgency.; Lesson 1970 — Recommendations and Next Steps
Priority level: based on business impact and testability, which hypotheses deserve formal testing first?; Lesson 1203 — Documenting Hypotheses and Evidence
Priors: Specify distributions for unknown parameters; Lesson 1594 — PyMC: Probabilistic Programming in Python
Priors are extreme: Starting at 0.; Lesson 115 — Prior Sensitivity Analysis
Priors are similar: Starting at 30% versus 35% won't create huge differences; Lesson 115 — Prior Sensitivity Analysis
Priors matter less when: Lesson 115 — Prior Sensitivity Analysis
Privacy Attacks: Models trained on sensitive data might leak information through inference attacks, even if you've applied privacy techniques.; Lesson 1920 — Anticipating Misuse of Data Products
Privacy budget (ε): How much privacy you're willing to "spend" (smaller ε = more noise = more privacy); Lesson 1899 — Adding Noise for Privacy
Privacy-preserving machine learning: where training data stays encrypted throughout; Lesson 1903 — Secure Multi-Party Computation
Proactive monitoring: Owner spots anomalies and drives root-cause analysis; Lesson 1619 — What is Metric Ownership?
Probability Density Function (PDF): .; Lesson 155 — Definition and Properties of Continuous Random Variables Lesson 156 — Probability Density Functions (PDFs) Lesson 162 — Uniform Distribution: PDF and CDF
Probability Mass Function (PMF): comes in.; Lesson 118 — Probability Mass Functions (PMF) Lesson 119 — Properties of Valid PMFs Lesson 120 — Cumulative Distribution Functions (CDF) for Discrete Variables Lesson 124 — Bernoulli Distribution PMF and Parameters
Probability of Being Best: directly answers this question by computing the probability that a given variant has the highest true conversion rate (or other metric) compared to all other variants.; Lesson 1583 — Probability of Being Best Lesson 1586 — Multi-Armed Bandit Connections
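One common way to estimate this probability is Monte Carlo sampling from each variant's posterior. The sketch below assumes conversion data with a uniform Beta(1, 1) prior on each rate; the function name, the prior, and the counts are illustrative assumptions rather than a prescribed method.

```python
import random


def prob_best(successes, trials, draws=20000, seed=0):
    """Estimate P(variant has the highest true rate) from Beta posteriors."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    wins = [0] * len(successes)
    for _ in range(draws):
        # One joint draw: a plausible conversion rate for every variant.
        samples = [rng.betavariate(1 + s, 1 + (t - s))
                   for s, t in zip(successes, trials)]
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]


# Hypothetical test: A converts 100 of 1,000 users, B converts 130 of 1,000.
probs = prob_best(successes=[100, 130], trials=[1000, 1000])
```

Each entry of `probs` is the share of joint draws in which that variant had the highest sampled rate.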
Probability sampling: means every member of the population has a *known, non-zero chance* of being selected.; Lesson 242 — Probability vs Non-Probability Sampling
Probability sampling advantages: Lesson 242 — Probability vs Non-Probability Sampling
Probability sampling challenges: Lesson 242 — Probability vs Non-Probability Sampling
Probability sampling methods: give you statistical validity.; Lesson 243 — Choosing the Right Sampling Method
Probability statements: You can make direct claims like "There's a 95% probability the conversion rate is between 0.; Lesson 1547 — Interpreting Posterior Distributions
Probability threshold: Stop when P(B better than A | data) > 0.; Lesson 1585 — Early Stopping in Bayesian Tests
Probe edge cases: "What happens if the model is wrong?; Lesson 2102 — Understanding Stakeholder Goals and Constraints
Probit: has thinner tails (based on the normal distribution); Lesson 674 — The Probit Link Lesson 678 — Choosing the Right Link Function
probit link: does the same job but uses the cumulative distribution function (CDF) of the standard normal distribution instead.; Lesson 674 — The Probit Link Lesson 676 — Canonical vs Non-Canonical Links Lesson 677 — Interpreting Coefficients Under Different Links
Problem Definition: Lesson 9 — The Data Science Lifecycle Overview Lesson 10 — Problem Definition and Scoping
Process: Each worker applies the same operation to its chunk independently; Lesson 1768 — Data Parallelism Fundamentals
Process everything, every time: Lesson 1828 — Incremental vs Full Load Strategies
Processing speed: One-pass transformation instead of read-then-transform; Lesson 1802 — Filtering During Read with dtype and Converters
Product A: 10,000 new users/month, 10% retention → 1,000 active users; Lesson 1614 — Growth Without Retention
Product B: 2,000 new users/month, 70% retention → 1,400 active users; Lesson 1614 — Growth Without Retention
Product categories: An item categorized as "electronics" cannot also be "clothing" (assuming mutually exclusive classification); Lesson 81 — Mutually Exclusive Events
Product changes: Measure impact on cohorts before vs after a launch; Lesson 1644 — What is Cohort Analysis?
Product feedback: Early adopters who volunteer feedback aren't typical users; Lesson 246 — Volunteer and Self-Selection Bias
Product focus: Should you optimize for retention of casual users or delight of power users?; Lesson 1698 — Power User Curves and Engagement Distribution
Product gaps: (missing features, usability issues); Lesson 1675 — Churn Attribution and Root Cause Analysis
Product Launches: Companies use DiD when rolling out features to some markets first.; Lesson 1459 — Real-World DiD Applications
Product Managers: prioritize:; Lesson 1951 — Understanding Stakeholder Priorities and Constraints
Product Page View: Lesson 1679 — Defining Funnel Steps and Events
Product recommendations: Finding pairs of products from the same `products` table; Lesson 945 — Introduction to Self-Joins
Product reviews: skew positive when only satisfied customers bother to write them; Lesson 247 — Survivorship Bias
Product Team Objective: Improve discovery experience; Lesson 1608 — Connecting North Star Metrics to OKRs
Product-market fit quality: Higher floors suggest stronger fit; Lesson 1658 — Flattening and Asymptotic Behavior
Production code: Applications should specify exactly which columns they need for clarity and performance; Lesson 851 — Selecting All Columns with Asterisk
Production deployment: Compiled models are easier to integrate into non-Python systems; Lesson 1595 — Stan: High-Performance Bayesian Inference
Production pipelines: that run automatically (ETL, model training, inference); Lesson 2074 — Notebooks vs Scripts: When to Use Each
Production pipelines dominate: Python integrates better with web services, APIs, and deployment infrastructure.; Lesson 1375 — Choosing Tools: When to Use R vs Python for Visualization
Production releases: `v1.; Lesson 2037 — Tagging Releases and Experiment Snapshots
Productivity: Focus on business logic, not query construction; Lesson 1117 — What is an ORM and Why Use It? Lesson 1469 — Building a Simple Causal DAG
Professional color palettes: (ColorBrewer, viridis); Lesson 1369 — Publication-Ready Plot Styling
Professionalism: Lesson 1292 — Introduction to Styling: Why Aesthetics Matter
Profiling reports: go deeper: statistics for numeric columns (mean, min, max), cardinality for categorical fields, missing value percentages, and distribution summaries.; Lesson 2067 — Automating Documentation with Code
Profitability: Lesson 1516 — Business Metrics: Definition and Examples
Profitability focus: By identifying unprofitable segments (LTV < CAC), you can adjust targeting criteria, reduce spend, or experiment with lower-cost channels.; Lesson 1669 — LTV Segmentation and Targeting
Programming: You'll need to write code to clean, analyze, and visualize data.; Lesson 7 — The Data Science Skill Stack
Project portability: Each project carries its own dependency specification, making deployment predictable; Lesson 2039 — Virtual Environments: Concept and Benefits
Project Structure: Brief overview of directory organization; Lesson 2077 — The Purpose and Anatomy of a Good README
Project templates: solve this by providing a blueprint—a cookie cutter, if you will—that stamps out a consistent structure every time you start fresh.; Lesson 2076 — Code Organization Templates and Cookiecutter
Project Title and Description: One-line summary and brief explanation of the project's purpose; Lesson 2077 — The Purpose and Anatomy of a Good README
Project-Join Normal Form: Another name for fifth normal form (5NF); it eliminates **join dependencies**.; Lesson 1068 — Higher Normal Forms: 4NF and 5NF
Project-level: "Final presentation is in 3 weeks—no exceptions"; Lesson 2121 — Timeboxing and Deadlines
Prometheus: Along with **Grafana** and **Datadog**, automates pipeline monitoring, offering dashboards that show pipeline status at a glance and trigger alerts when thresholds are breached.; Lesson 1861 — Monitoring Tools and Dashboards
Proportion test: When your metric is a conversion rate or percentage; Lesson 1749 — Measuring Statistical Significance
Proportional allocation: assigns credit based on estimated contribution size (e.; Lesson 1640 — Attribution in Multi-Team Environments
proportions: like the percentage of customers who click an ad, or the fraction of defective products?; Lesson 224 — CLT for Proportions Lesson 253 — Sampling Distribution of the Sample Proportion Lesson 297 — Handling Unknown Population Parameters Lesson 315 — Common Test Statistics: Z, t, Chi-Square, and F Lesson 1187 — Contingency Tables and Cross-Tabulations
Propose: a new location nearby (a candidate parameter value); Lesson 1590 — The Metropolis-Hastings Algorithm
Pros: Lightning-fast reads, simplified queries; Lesson 1076 — Materialized Views and Summary Tables
Prospects: People who've shown interest but haven't purchased yet.; Lesson 1704 — Customer Lifecycle Stages
Protanopia/Protanomaly: (red-weak): similar red-green confusion; Lesson 1248 — Color Blindness and Color Palette Design
Protected classes: are groups of people shielded by law from discrimination.; Lesson 1888 — Protected Classes and Sensitive Attributes
Protection from SQL injection: Parameterization is automatic; Lesson 1117 — What is an ORM and Why Use It?
Prototyping models: and experimenting with different approaches; Lesson 2074 — Notebooks vs Scripts: When to Use Each
Provenance questions: Can you trust data from third-party APIs or scraped sources?; Lesson 1762 — Extended Dimensions: Veracity and Value
Provide context: Explain *why* something matters.; Lesson 2024 — Code Review Best Practices
Provide fast retrieval: through indexing and optimized queries; Lesson 842 — What is a Database?
Proximity: Elements placed close together are perceived as related.; Lesson 1236 — Gestalt Principles in Visualization
Proxy validation is skipped: Teams assume a surrogate metric correlates with the real goal without validating that relationship (remember lesson 1520: Validating Surrogate Metrics).; Lesson 1530 — Mismatched Metrics and Goals
proxy variable: is a feature that correlates strongly with a protected attribute, allowing a model to infer sensitive information indirectly.; Lesson 1883 — Protected Classes and Proxy Variables Lesson 1889 — Proxy Variables and Redlining
Prunes intelligently: Eliminates candidate change-points that can never be part of the optimal solution, based on proven mathematical conditions; Lesson 1416 — PELT Algorithm: Pruned Exact Linear Time
Pseudonymization: replaces identifiers with artificial labels—Patient A, Patient B—allowing you to track the same individual across records without knowing their real identity.; Lesson 1895 — Data Anonymization Basics
Public datasets: Government databases, research repositories, open data portals; Lesson 11 — Data Collection and Acquisition
Public Task: Lesson 1906 — Legal Bases for Processing Personal Data
Purchase Frequency: counts how many purchases the typical customer makes in a given period (say, per year).; Lesson 1663 — Simple LTV: Average Revenue Per Customer
Pure AR (Autoregressive) Process: Lesson 733 — Using ACF and PACF Together
Pure coincidence: Random chance, especially with small samples or cherry-picked data; Lesson 493 — The Fundamental Difference: Association vs Cause-and-Effect Lesson 494 — Spurious Correlations and Coincidence
Purpose: What question does this report answer?; Lesson 1989 — Best Practices for Sharing Reproducible Reports Lesson 2007 — Branch Naming Conventions
Purpose limitation: Data collected for one purpose can't be repurposed for unrelated analytics without new consent; Lesson 1904 — What is GDPR and Why It Matters Lesson 1905 — Core Principles of GDPR
put it back: , shake the bag, and draw again.; Lesson 298 — The Bootstrap Method: Resampling Your Data Lesson 299 — How Bootstrap Resampling Works
Pyramid Principle: , developed by Barbara Minto at McKinsey, flips the traditional "journey" narrative on its head.; Lesson 1942 — The Pyramid Principle: Starting with the Conclusion Lesson 1944 — Executive Summary Best Practices Lesson 1945 — Logical Flow: From Question to Answer Lesson 1952 — The Pyramid Principle: Leading with Conclusions
Python: , `plotly.; Lesson 1374 — Interactivity: plotly in R vs Python and Integration Patterns Lesson 1987 — Environment and Dependency Management Lesson 2073 — Naming Conventions for Files and Functions
Python (pandas/statsmodels): Lesson 646 — Reference Categories in Statistical Software
Python class: represents a database **table**; Lesson 1117 — What is an ORM and Why Use It?
Python version: (or R, Julia, etc.; Lesson 2038 — What is Environment Management and Why It Matters
Python with NumPy: Lesson 482 — Calculating Pearson Correlation in Practice
Python with Pandas: Lesson 482 — Calculating Pearson Correlation in Practice
Python with SciPy: Lesson 482 — Calculating Pearson Correlation in Practice
Python's approach: is like learning the second language natively from the start.; Lesson 1374 — Interactivity: plotly in R vs Python and Integration Patterns
Python's built-in: `random` module; Lesson 2058 — Seed Scope and Multiple Libraries
PyTorch: (`torch.; Lesson 2058 — Seed Scope and Multiple Libraries
Pyvis: is purpose-built for network visualization.; Lesson 1321 — Interactive Network Graphs with Plotly and Pyvis

Q

Q-Q linearity: Points hugging the diagonal reference line; Lesson 377 — Testing Normality: Visual Methods
Q-Q plot: compares your data's quantiles against a theoretical normal distribution.; Lesson 377 — Testing Normality: Visual Methods Lesson 565 — What Q-Q Plots Show: Comparing Residual Distribution to Normal
Q-Q plot (quantile-quantile plot): Residuals should fall along a straight diagonal line; Lesson 449 — Normality of Residuals Lesson 788 — Checking Residual Normality
Q-Q plot first: Does the pattern look problematic for your purposes?; Lesson 570 — Q-Q Plots vs Formal Normality Tests: When Visual Checks Matter
Q-Q plots: Used with **histograms** and tests like **Shapiro-Wilk** to check the normality assumption.; Lesson 290 — Assumptions and Diagnostics for Difference Intervals Lesson 587 — Identifying Outliers in Regression Context
Q1: (25th percentile): 25% of data falls below this value; Lesson 1383 — Understanding the Interquartile Range (IQR)
Q1 (First Quartile): The value at the 25% mark — one quarter of your data falls below this point; Lesson 51 — Interquartile Range (IQR)
Q3: (75th percentile): 75% of data falls below this value; Lesson 1383 — Understanding the Interquartile Range (IQR)
Q3 (Third Quartile): The value at the 75% mark — three quarters of your data falls below this point; Lesson 51 — Interquartile Range (IQR)
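A minimal sketch of computing Q1, Q3, and the IQR with Python's standard library follows. Quartile conventions differ between tools; this example assumes the `inclusive` method of `statistics.quantiles`, and the data are made up.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 returns the three cut points that split the data into quarters.
q1, median, q3 = quantiles(data, n=4, method="inclusive")

iqr = q3 - q1  # spread of the middle 50% of the data
```

Switching to `method="exclusive"` (the default) can give different cut points on small samples, so it helps to state which convention a report uses.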
Quadratic or polynomial trends: When your data curves upward or downward in an accelerating pattern; Lesson 736 — Higher-Order Differencing
Quadrupling your sample size: cuts the standard error in half; Lesson 223 — Standard Error and the CLT
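The halving follows from the standard error formula SE = σ / √n. A short check, with an illustrative σ of 10:

```python
from math import sqrt


def standard_error(sigma, n):
    """Standard error of the sample mean for population SD sigma and sample size n."""
    return sigma / sqrt(n)


se_100 = standard_error(sigma=10, n=100)
se_400 = standard_error(sigma=10, n=400)  # four times the sample size
ratio = se_400 / se_100
```

Because n sits under a square root, halving the error again would take 1,600 observations.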
Qualitative and aspirational: "Transform user onboarding" beats "Improve metrics"; Lesson 1609 — Setting Effective Objectives
Quality: Good units ÷ total units produced (capturing defects); Lesson 1636 — Manufacturing Metrics: OEE, Yield, and Cycle Time
Quality checks: Validation rules applied, records removed; Lesson 2065 — Tracking Data Lineage
Quality control: A manufacturing process with high variability produces inconsistent products; Lesson 46 — What is Variability? Lesson 351 — When to Use a One-Sample t-Test
Quality control pass rates: (proportion of acceptable products); Lesson 184 — Beta Distribution: Bounded Between 0 and 1
Quality gates exist: Automated tests can run, and approval requirements can block poor code from merging; Lesson 2022 — Understanding Pull Requests
Quality metrics: Products manufactured in different batch sizes; Lesson 43 — Weighted Mean and Its Applications
Quantify business outcomes: Lesson 1969 — Translating Technical Findings for Business Audiences
Quantiles: are the general family of cut-points that divide ranked data into *any* equal-sized groups.; Lesson 57 — Quantiles: Quartiles, Deciles, and Beyond Lesson 306 — Bootstrap for Non-Standard Problems
Quarantine: bad records for review (flexible approach); Lesson 1826 — Data Validation and Schema Enforcement Lesson 1866 — Handling Failed Quality Checks
Quarantine new work: Apply strict standards to new features while gradually improving old ones; Lesson 2137 — Refactoring Strategies and Debt Paydown
Quarterly cycles: Business revenues influenced by fiscal quarters; Lesson 707 — Seasonality: Regular Periodic Patterns
Quarterly data: with yearly seasonality → period = 4; Lesson 746 — Choosing Seasonal Period
Quartiles: (4 groups): Cut your data into quarters.; Lesson 57 — Quantiles: Quartiles, Deciles, and Beyond
Quartiles (4 groups): Lesson 1010 — NTILE(): Dividing Rows into Buckets
Quartiles or deciles: Divide customers into equal-sized groups (top 10%, next 10%, etc.); Lesson 1669 — LTV Segmentation and Targeting
Query Complexity: Higher normalization means more tables.; Lesson 1070 — When to Stop Normalizing
Query execution time: The obvious metric, but run queries multiple times to account for caching; Lesson 1077 — Measuring Performance Impact of Denormalization
Query performance: Only read columns you need.; Lesson 1811 — Columnar Storage and Query Optimization
Queue theory: Time until multiple service completions; Lesson 181 — Gamma Distribution: Shape and Rate Parameters
Quick ad-hoc queries: You're doing temporary analysis and speed matters more than precision; Lesson 851 — Selecting All Columns with Asterisk
Quick fix: Use `tight_layout()`; Lesson 1277 — Adjusting Subplot Spacing and Layout
Quick Ratio: measures how much new and expansion revenue you gain versus how much you lose:; Lesson 1629 — SaaS Growth Metrics: Quick Ratio and Net Revenue Retention
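As a hedged sketch, the commonly used SaaS definition divides revenue gained (new plus expansion MRR) by revenue lost (churned plus contraction MRR). The lesson may define the components slightly differently, so the argument names below are assumptions.

```python
def quick_ratio(new_mrr, expansion_mrr, churned_mrr, contraction_mrr):
    """Revenue gained divided by revenue lost over the same period."""
    gained = new_mrr + expansion_mrr
    lost = churned_mrr + contraction_mrr
    if lost == 0:
        raise ValueError("Quick Ratio is undefined when no revenue was lost")
    return gained / lost


# Hypothetical month: $80k new, $20k expansion, $20k churned, $5k contraction.
ratio = quick_ratio(new_mrr=80_000, expansion_mrr=20_000,
                    churned_mrr=20_000, contraction_mrr=5_000)
```

A ratio above 1 means the period added more recurring revenue than it lost.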
Quick updates: Brief email summaries or Slack messages for "no blockers, progressing as planned"; Lesson 2104 — Communication Cadence and Updates
Quintiles: (5 groups): Split data into fifths, useful in economic studies and portfolio analysis.; Lesson 57 — Quantiles: Quartiles, Deciles, and Beyond
Quota: 30 people aged 18-35, 30 aged 36-55, 40 aged 56+; Lesson 240 — Quota Sampling

R

r = -1: Perfect negative linear relationship (as one variable increases, the other decreases proportionally); Lesson 476 — What is Pearson Correlation? Lesson 477 — Interpreting the Correlation Coefficient
r = +1: Perfect positive linear relationship (as one variable increases, the other increases proportionally); Lesson 476 — What is Pearson Correlation? Lesson 477 — Interpreting the Correlation Coefficient
r = 0: No linear relationship (the variables don't follow a straight-line pattern together); Lesson 476 — What is Pearson Correlation? Lesson 477 — Interpreting the Correlation Coefficient
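The three reference values of r can be reproduced with a small hand-rolled Pearson function. The data are made up to hit each case, and the function name is illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


r_pos = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])                # straight line, upward
r_neg = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])                # straight line, downward
r_zero = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])       # y = x**2, symmetric
```

The last case has a perfect (quadratic) relationship yet r is 0, which is why r = 0 only rules out a *linear* pattern.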
R Charts: Best for small subgroups (n ≤ 10).; Lesson 1399 — Control Charts for Variability (R and S Charts)
R Charts (Range Charts): track the difference between the highest and lowest values in each sample group.; Lesson 1399 — Control Charts for Variability (R and S Charts)
R-hat statistic: Compares variance within and between multiple chains; values near 1.; Lesson 1592 — Burn-in, Thinning, and Convergence Diagnostics
R-squared: (written as R² or r²) tells you the **proportion of variance in Y that is explained by X**.; Lesson 531 — What is R-Squared? Lesson 543 — Residuals as Unexplained Variation
R-squared and adjusted R-squared: Model fit is unchanged; Lesson 647 — Impact on Model Results and Reporting
R's approach: is like having an interpreter who translates your speech (ggplot2 code) into another language (plotly).; Lesson 1374 — Interactivity: plotly in R vs Python and Integration Patterns
R's base: random generator vs.; Lesson 2058 — Seed Scope and Multiple Libraries
R²: measures the proportion of variance in Y explained by your regression model; Lesson 534 — R-Squared vs Correlation Squared Lesson 613 — The Adjusted R-Squared Formula
R² = 0: Your model explains none of the variance; you might as well use the mean of Y as your prediction; Lesson 531 — What is R-Squared? Lesson 533 — Interpreting R-Squared Values
R² = 0.15: Only 15% of variance is explained; 85% remains unexplained.; Lesson 533 — Interpreting R-Squared Values
R² = 0.7: Your model explains 70% of the variance in Y; Lesson 531 — What is R-Squared?
R² = 0.85: Your model explains 85% of the variance—most of the variation is captured by your regression line.; Lesson 533 — Interpreting R-Squared Values
R² = 1: Your model perfectly predicts every Y value (rare in real life!); Lesson 531 — What is R-Squared?
R² = 1.0: Perfect fit.; Lesson 533 — Interpreting R-Squared Values
R² = r²: .; Lesson 531 — What is R-Squared? Lesson 534 — R-Squared vs Correlation Squared
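In simple linear regression (one predictor, with an intercept), R² computed from the residuals equals the square of the Pearson correlation. The sketch below checks this numerically on made-up data; the helper names are illustrative.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys):
    """R² = 1 - SS_residual / SS_total for the fitted line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mean_x) ** 2 for x in xs)
                      * sum((y - mean_y) ** 2 for y in ys))


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
r2 = r_squared(xs, ys)
r = pearson_r(xs, ys)
```

The identity does not carry over unchanged to multiple regression, where R² is instead the squared correlation between observed and fitted values.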
Race and ethnicity: Lesson 1888 — Protected Classes and Sensitive Attributes
Radio silence after complaints: Sometimes the absence of follow-up signals they've given up; Lesson 1673 — Leading Indicators of Churn
Radioactive decay: An atom that hasn't decayed for an hour is no more "due" to decay than a fresh atom; Lesson 167 — Memoryless Property of Exponential
Rainbow palettes: They suggest order where none exists and aren't colorblind-friendly.; Lesson 1309 — Choropleth Maps: Basics and Best Practices
Rainfall models: Amount of rain over time; Lesson 181 — Gamma Distribution: Shape and Rate Parameters
RAM the dataset consumes: .; Lesson 1206 — Initial Data Profiling: Shape, Types, and Memory
Random Assignment: Each participant has an equal chance of being assigned to either group; Lesson 1435 — What is a Randomized Controlled Trial? Lesson 1486 — Why Randomization Matters in A/B Tests
Random failures: (constant hazard rate).; Lesson 189 — Fitting Weibull Models to Lifetime Data
Random number generators: Does each digit appear with equal frequency?; Lesson 421 — Applications: Uniform, Genetic Ratios, and Distributions
Random sampling: Your data comes from a random process; Lesson 419 — Assumptions and Minimum Expected Frequencies
Random scatter: Good!; Lesson 556 — What Are Residuals and Why Plot Them?
random seeds: come to the rescue.; Lesson 28 — Random Seeds and Deterministic Computation Lesson 29 — Code and Environment Management
Randomization: Lesson 400 — Assumptions and Conditions for Proportion Tests Lesson 499 — Why Controlled Experiments Are Needed Lesson 1436 — The Gold Standard for Causality
Randomization Quality: Both groups should have similar characteristics (demographics, behavior patterns) if randomization works correctly; Lesson 1483 — Pre-Experiment Validation
Randomization unit: User, session, or other unit you defined; Lesson 1485 — Documentation and Pre-Registration
Randomize assignment: Split users randomly into control and treatment groups (e.; Lesson 1641 — Isolating Effects with Control Groups
Randomized Controlled Trial (RCT): is an experimental method where participants are randomly assigned to either a **treatment group** (receives the intervention) or a **control group** (does not receive the intervention).; Lesson 1435 — What is a Randomized Controlled Trial? Lesson 1677 — Measuring Churn Reduction Impact
Randomizing by session: gives you more experimental units (higher power), but risks violating independence assumptions and creates inconsistent experiences.; Lesson 1481 — Unit of Randomization
Randomizing by user: gives cleaner results and consistent experience, but requires more users to detect effects.; Lesson 1481 — Unit of Randomization
Randomly assign users: to treatment (real ad) or control (PSA/ghost ad); Lesson 1747 — Ghost Ads and PSA Tests
randomly assigned: to treatment or control groups.; Lesson 1436 — The Gold Standard for Causality Lesson 1526 — Selection Bias in Opt-In Tests
Randomly select: some clusters; Lesson 237 — Cluster Sampling
range: .; Lesson 47 — Range: The Simplest Measure Lesson 54 — When to Use Each Measure Lesson 266 — What is a Confidence Interval?
Range analysis: "What's the age range of our customers?"; Lesson 885 — MIN and MAX: Finding Extremes
Range queries: (`WHERE age BETWEEN 25 AND 35`) find the starting point, then scan consecutive sorted leaves; Lesson 1079 — B-Tree Indexes: Structure and Mechanics
Range retention: User was active *at any point* from start through that period (cumulative); Lesson 1648 — Cohort Retention Rates
Range sliders: excel with time-series data or any ordered sequence where users need to examine specific intervals (e.; Lesson 1303 — Range Sliders and Zoom Controls
Range violations: Negative ages or dates in the future; Lesson 1109 — Input Validation and Defense in Depth Lesson 1150 — What is Data Validation?
Rank the absolute values: of differences from smallest to largest; Lesson 392 — Wilcoxon Signed-Rank Test
Rank them: from 1 (smallest) to n (largest), averaging tied ranks; Lesson 393 — Mann-Whitney U Test (Wilcoxon Rank-Sum)
Rank users: by their activity level (highest to lowest); Lesson 1698 — Power User Curves and Engagement Distribution
RANK(): produces: 1, 2, 2, 4 (gap!); Lesson 1009 — DENSE_RANK(): Ranking Without Gaps
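
To make the gap concrete, here is a small Python sketch that mimics what SQL's `RANK()` and `DENSE_RANK()` return when ordering by a value in descending order. The function names and the `scores` list are hypothetical; this illustrates the numbering rule only and is not how any database implements it.

```python
def rank_with_gaps(values):
    """Mimic RANK(): tied values share a rank, and the next rank skips ahead."""
    ordered = sorted(values, reverse=True)
    # The rank is the 1-based position of the first occurrence of each value.
    return [ordered.index(v) + 1 for v in ordered]


def dense_rank(values):
    """Mimic DENSE_RANK(): tied values share a rank, with no gap afterwards."""
    ordered = sorted(values, reverse=True)
    distinct = sorted(set(values), reverse=True)
    return [distinct.index(v) + 1 for v in ordered]


scores = [95, 90, 90, 85]          # hypothetical scores with one tie
print(rank_with_gaps(scores))      # [1, 2, 2, 4]
print(dense_rank(scores))          # [1, 2, 2, 3]
```
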
ranking: you pool all observations from both groups, assign ranks from smallest to largest (ignoring which group they came from), then sum the ranks for each group.; Lesson 393 — Mann-Whitney U Test (Wilcoxon Rank-Sum) Lesson 474 — Friedman Test: Non-Parametric Repeated Measures ANOVA Lesson 488 — Computing Spearman Correlation
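
The pooling-and-ranking step can be sketched in a few lines of Python. This assumes tied values receive the average of the ranks they span, as the related "Rank them" entry describes; the function name `pooled_rank_sums` and the two sample groups are hypothetical.

```python
def pooled_rank_sums(group_a, group_b):
    """Pool both groups, rank smallest to largest (ties averaged), sum per group."""
    pooled = sorted(group_a + group_b)

    # Map each distinct value to the average of the 1-based ranks it occupies.
    avg_rank = {}
    for value in set(pooled):
        first_rank = pooled.index(value) + 1
        count = pooled.count(value)
        avg_rank[value] = first_rank + (count - 1) / 2

    sum_a = sum(avg_rank[v] for v in group_a)
    sum_b = sum(avg_rank[v] for v in group_b)
    return sum_a, sum_b


a = [12, 15, 18]                   # hypothetical group A
b = [15, 20, 22]                   # hypothetical group B
sum_a, sum_b = pooled_rank_sums(a, b)
print(sum_a, sum_b)                # 7.5 13.5
```

As a sanity check, the two rank sums always add up to n(n + 1)/2 for n pooled observations, which is 21 here.
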
Rankings: (1st, 2nd, 3rd) alongside the actual data values; Lesson 1005 — Introduction to Window Functions
Rankings are important: ordering items from high to low; Lesson 1233 — Position as the Most Effective Channel
Ranks all observations: from smallest to largest across *all* groups combined (ignoring group membership temporarily); Lesson 471 — Kruskal-Wallis H Test: The Non-Parametric One-Way ANOVA
Rare Events: Earthquakes per year, typos per page, or accidents per month—anything that happens occasionally but at a predictable average rate.; Lesson 144 — Poisson Applications: Arrivals and Events
Raster formats: (like PNG, JPG) store pixels.; Lesson 1273 — Saving Figures: Formats and Resolution
rate: is stable but individual occurrences are unpredictable; Lesson 153 — Real-World Use Cases: Quality Control and Defects Lesson 692 — Offset Terms for Exposure Lesson 1552 — Gamma-Poisson Conjugacy
Rate data: Counts per unit of time, space, or population (e.; Lesson 689 — When to Use Poisson Regression
Rate limiting: Prevent bulk misuse of APIs; Lesson 1925 — Mitigation Strategies and Responsible Disclosure
rate parameter: .; Lesson 139 — The Poisson Process and Rate Parameter Lesson 165 — Exponential Distribution: PDF and CDF Lesson 1552 — Gamma-Poisson Conjugacy
Rate parameter (β, "beta"): Controls how quickly probability "decays" or spreads out.; Lesson 181 — Gamma Distribution: Shape and Rate Parameters
rate parameter λ: (lambda), you can calculate the probability of observing *exactly* k events in your interval.; Lesson 140 — Poisson Probability Mass Function Lesson 166 — Exponential Distribution: Mean and Variance
rates: when your observations have unequal exposure times or denominators.; Lesson 692 — Offset Terms for Exposure Lesson 1613 — Raw Counts vs. Rates and Ratios
Ratio data: (numeric with meaningful zero: height, count, salary) leverages:; Lesson 1238 — Matching Encoding to Data Type
Ratio to partition average: `value / AVG(value) OVER (PARTITION BY category)`; Lesson 1019 — Comparing Values to Window Aggregates
Ratios: (like revenue per customer); Lesson 306 — Bootstrap for Non-Standard Problems Lesson 1613 — Raw Counts vs. Rates and Ratios
Raw Kurtosis (Fisher's): The complete formula above, which subtracts 3 at the end.; Lesson 67 — Calculating Kurtosis
RDD (Resilient Distributed Dataset): is Spark's core data structure—a collection of objects distributed across the nodes in your cluster.; Lesson 1777 — RDDs: Resilient Distributed Datasets Fundamentals
React Slowly to Changes: Lesson 1598 — Characteristics of Lagging Indicators
Reactivations: If churned customers return, do you subtract them from "customers lost"?; Lesson 1671 — Churn Rate Calculation Methods
Reactive mode: You're constantly firefighting instead of preventing fires; Lesson 1617 — The Danger of Lagging-Only Metrics
Read both versions carefully: to understand what each branch changed; Lesson 2011 — Resolving Merge Conflicts
Read Committed: You only see committed data, but values can change during your transaction; Lesson 1116 — Transaction Isolation and Concurrency
Read the intersection: this value is P(Z ≤ your Z-score); Lesson 198 — Using Z-Tables for Probability
Read Uncommitted: You can see other transactions' uncommitted changes (risky!); Lesson 1116 — Transaction Isolation and Concurrency
Read-heavy workloads: If a table is queried 10,000 times daily but updated once, duplicating data to avoid joins is worthwhile.; Lesson 1071 — When to Denormalize: Performance Trade-offs Lesson 1073 — Storing Computed Values and Aggregates
Readability: Listing columns in a logical hierarchy makes your query easier to understand; Lesson 906 — Order Matters: Column Sequence in GROUP BY Lesson 924 — Using Table Aliases in Joins Lesson 974 — When to Use FROM Subqueries vs CTEs Lesson 1106 — Parameter Placeholders: Named Parameters Lesson 1292 — Introduction to Styling: Why Aesthetics Matter
Readability matters: Can you understand what the code does?; Lesson 2024 — Code Review Best Practices
Readmission rate: measures the percentage of patients returning within 30 days—a lagging indicator of both care quality and discharge planning effectiveness.; Lesson 1633 — Healthcare Metrics: Patient Outcomes and Operational Efficiency
Real-time: (milliseconds-to-seconds) demands streaming pipelines with immediate processing.; Lesson 1825 — Designing Pipeline Architecture
Real-time learning: Update beliefs as information arrives rather than waiting; Lesson 1538 — Updating Beliefs with Sequential Data
Real-world example: Consider medical testing.; Lesson 100 — Common Conditional Probability Mistakes
Real-world examples: Lesson 805 — Left and Interval Censoring
Real-World Needs: If you're building an analytics dashboard that constantly needs customer names with their order totals, joining `customers` and `orders` thousands of times per hour might waste resources.; Lesson 1070 — When to Stop Normalizing
Realism matters more: Your domain knowledge doesn't fit standard conjugate families; Lesson 1556 — Choosing Between Conjugate and Non-Conjugate Priors
Reassess consent: before any new use case—even internal ones; Lesson 1915 — Secondary Use and Scope Creep
Rebalance quarterly: as business conditions evolve; Lesson 1759 — Optimizing ROAS, CAC, and Payback Together
Rebase: rewrites history by moving your branch's commits to start from a different point.; Lesson 2014 — Understanding Git Rebase vs Merge Lesson 2016 — Rebasing Feature Branches
Rebuild Fragmented Indexes: When fragmentation exceeds 30-40%, rebuild the index to reorganize data pages.; Lesson 1086 — Index Maintenance and Monitoring
Recalculate the test statistic: for this permuted dataset; Lesson 395 — Permutation Tests for Means and Beyond
Recalculates centers: based on the customers assigned to them; Lesson 1705 — K-Means Clustering for Segmentation
Recall: TP / (TP + FN) — of all real changes, how many did you catch?; Lesson 1418 — Evaluating Change-Point Detection Methods
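
A rough sketch of the recall formula for change-point detection. It assumes a true change counts as caught when some detection lands within a fixed tolerance window; that matching rule, the function name `change_point_recall`, and the example indices are all assumptions made for illustration, not something the lesson specifies.

```python
def change_point_recall(true_points, detected_points, tolerance=2):
    """Recall = TP / (TP + FN), with TP counted via a tolerance window."""
    # A true change point is a true positive if any detection is close enough.
    caught = sum(
        any(abs(t - d) <= tolerance for d in detected_points)
        for t in true_points
    )
    missed = len(true_points) - caught   # false negatives
    return caught / (caught + missed)


true_points = [50, 120, 200]       # hypothetical true change locations
detected_points = [51, 125, 199]   # hypothetical detector output
recall_value = change_point_recall(true_points, detected_points)
print(round(recall_value, 3))      # 0.667
```

The function raises `ZeroDivisionError` when `true_points` is empty, since recall is undefined if there are no real changes to catch.
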
Recency: How recently did they make a purchase?; Lesson 1703 — RFM Analysis: Recency, Frequency, Monetary Value
Reciprocal: (`1/Y`) can handle extreme heteroscedasticity but changes interpretation dramatically.; Lesson 591 — When and Why to Transform Variables
Recognizing boundaries of competence: means honestly assessing what you know versus what a problem requires, and making responsible decisions about whether to proceed alone, seek help, or decline the work entirely.; Lesson 34 — Recognizing Boundaries of Competence
Recommendation: Launch automated alerts for at-risk accounts; Lesson 1948 — The Recommendation Slide: Making It Actionable
Recommendations: Actionable next steps tied to findings; Lesson 1966 — Report Structure and Executive Summary
Recommendations backed by evidence: , not just observations; Lesson 2091 — Stage 7: Communication and Handoff
Recommended action: (specific and time-bound); Lesson 1966 — Report Structure and Executive Summary
Reconcile findings: Lesson 210 — Combining Visual and Statistical Methods
Record time-to-event: Days/months until failure (or censoring if still working at study end); Lesson 837 — Product Warranty and Failure Analysis
Recovery from encoding issues: Lesson 1141 — Recovering from Corrupted or Partially Broken Data
Recovery is risky: Fixing an error by rerunning might make things worse; Lesson 1847 — What is Idempotency?
Recursive Member: The self-referencing query that adds the next "layer" by joining back to what you've already found.; Lesson 996 — Recursive CTEs: Introduction
Recursive operations: CTEs support recursion; subqueries don't; Lesson 974 — When to Use FROM Subqueries vs CTEs
Recuse yourself: from projects where you can't be objective; Lesson 35 — Conflicts of Interest and Independence
Recycles: the connection back to the pool when you're done (via `close()` or context manager); Lesson 1092 — Connection Pooling Basics
Red flags: include:; Lesson 562 — Index Plots and Time-Ordered Residuals Lesson 584 — Correlation Matrices for Predictors
Red flags for non-stationarity: Lesson 715 — Visual Tests for Stationarity
Redshift: Offers both traditional nodes and newer "Spectrum" for separated storage; Lesson 1813 — Modern Cloud Data Warehouses: Snowflake, BigQuery, Redshift
Reduce: .; Lesson 1770 — The MapReduce Programming Model
Reduce multicollinearity: VIF values drop for remaining predictors; Lesson 585 — Remedies: Variable Selection
Reduce noise: – Random variations get averaged out; Lesson 750 — What is a Moving Average?
Reduce Redundancy: Instead of storing a customer's address in every order record, you store it once in a `customers` table and reference it using a foreign key.; Lesson 1061 — Introduction to Normalization
Reduce wasted effort: If the simple answer settles the question, you saved days of work; Lesson 2110 — The Minimum Viable Analysis (MVA) Lesson 2111 — Fast Feedback Loops with Stakeholders
Reduced data redundancy: Category names aren't repeated for every product; Lesson 1810 — Snowflake Schema and Normalization Trade-offs
Reduced Feature Adoption: When active users stop exploring new features or abandon key workflows they once used regularly, disengagement may be brewing.; Lesson 1700 — Leading Indicators of Disengagement
Reduced human error: No forgotten runs or copy-paste mistakes; Lesson 1986 — Automated Report Generation
Reduced LTV: Shorter customer lifespans mean less total revenue per customer; Lesson 1670 — What is Churn and Why It Matters
Reduced model: Uses only your baseline predictors (e.; Lesson 623 — Partial F-Tests for Nested Models Lesson 654 — Testing Interaction Significance
Reduced opportunity cost: of running inferior variants; Lesson 1515 — Trade-offs: Sample Size, Speed, and Complexity
Reduced peak memory: Never materialize the "wrong" version; Lesson 1802 — Filtering During Read with dtype and Converters
Reduced power: – Your ability to detect true effects decreases; Lesson 342 — Alpha Level Trade-offs
Reduced power per test: With the same overall sample size, each pairwise comparison has less data and thus less ability to detect real effects; Lesson 1528 — Testing Too Many Variants
Reduced sampling variability: Larger samples produce statistics (like means) that cluster more tightly around the true population value.; Lesson 340 — Power and Sample Size Relationship
Reduced statistical significance: Even though your overall model might fit well (good R-squared), individual predictors may appear non-significant; Lesson 580 — What is Multicollinearity?
Reduces clarity: by creating visual noise; Lesson 1963 — Removing Chartjunk
Reducing skewness: – Converting the stretched-out tail into a more symmetric bell shape; Lesson 212 — Log Transformations
Redundant labels: If the axis already shows values, don't repeat them on every bar; Lesson 1237 — Chart Junk and Data-Ink Ratio Lesson 1246 — Visual Clutter and Chartjunk Lesson 1963 — Removing Chartjunk
Redundant variables: highly correlated features that provide similar information; Lesson 1192 — Correlation Matrices and Heatmaps
reference category: or **baseline**.; Lesson 636 — The Reference Category Lesson 642 — What is a Reference Category? Lesson 644 — Choosing a Reference Category
Reference lines: Add `geom_hline()` or `geom_vline()` early so data appears over them, or late to emphasize thresholds; Lesson 1355 — Layer Order and Plot Composition Lesson 1962 — Contextualizing Numbers
referential integrity: they guarantee that relationships between tables remain valid.; Lesson 1051 — Introduction to Foreign Keys Lesson 1055 — What is Referential Integrity? Lesson 1150 — What is Data Validation?
Referral: Traffic from links on other websites (blogs, news articles, partner sites); Lesson 1712 — Common Channel Categories Lesson 1758 — Cohort-Based Payback Analysis
Referrals: (word-of-mouth, referral programs); Lesson 1711 — What Are Acquisition Channels?
Referrer Headers: are automatically sent by browsers, telling your server which website the user came from.; Lesson 1713 — Tracking Users by Channel
Reflects Customer Value: Lesson 1605 — Characteristics of Good North Star Metrics
Reframe, don't just refuse: Instead of "I can't do that," try:; Lesson 1931 — When to Push Back on Requests
Regression and Feature Importance: Lesson 1602 — Identifying Leading Indicators for Your Metrics
Regression models: treat LTV as a continuous outcome.; Lesson 1668 — Predictive LTV Models
Regression plots: Fit and display linear models; Lesson 1281 — Introduction to Seaborn's Statistical Plots
Regular: | 80 | 40 | 120 |; Lesson 423 — Contingency Tables and Expected Frequencies
Regular aggregate (collapses rows): Lesson 1014 — Introduction to Window Aggregation Functions
Regular audits: Schedule quarterly reviews to identify unused notebooks, deprecated feature columns, and abandoned model variants.; Lesson 2135 — Dead Experimental Code and Feature Sprawl
Regular Sync Points: Lesson 2046 — Best Practices for Environment Management in Teams
Regular, predictable patterns: that repeat at fixed intervals—daily, weekly, monthly, or yearly.; Lesson 705 — The Four Classical Components
Regularization: adds a penalty to the model that discourages large coefficient values, stabilizing estimates even when predictors overlap.; Lesson 586 — Remedies: Regularization Preview Lesson 1569 — Shrinkage and Regularization Effects
Regularly: (daily sales reports); Lesson 1831 — What is Job Scheduling?
Regulatory constraints: Are there legal requirements (HIPAA, GDPR) or industry standards that limit what you can analyze or recommend?; Lesson 1168 — Understanding Domain Context
Regulatory context: (HIPAA for healthcare, SOX for finance); Lesson 2145 — Transitioning Between Industries and Domains
Reject: the entire batch (strict pipelines); Lesson 1826 — Data Validation and Schema Enforcement
Reject all hypotheses: up to (but not including) that stopping point; Lesson 1504 — Holm-Bonferroni Method Lesson 1506 — Benjamini-Hochberg Procedure
Reject H₀: | Type I Error (α) | Correct (Power = 1-β) |; Lesson 338 — What is Statistical Power?
reject the null hypothesis: .; Lesson 327 — Decision Rules: Reject or Fail to Reject Lesson 427 — Interpreting Chi-Squared Test Results
Rejecting invalid inserts: You can't add a row with a foreign key value that doesn't exist in the parent table; Lesson 1055 — What is Referential Integrity?
rejection region: the specific zone in your test statistic's distribution where the evidence is strong enough to reject the null hypothesis.; Lesson 325 — The Rejection Region Lesson 336 — Visualizing Error Types with Sampling Distributions Lesson 345 — Directionality in Hypothesis Testing
Rejection region shrinks: – Fewer test statistics will fall in the "reject H₀" zone; Lesson 342 — Alpha Level Trade-offs
Related issues: (links to tickets or prior discussions); Lesson 2023 — Creating a Pull Request
Relational plots: Explore relationships between variables (scatter, line plots with confidence intervals); Lesson 1281 — Introduction to Seaborn's Statistical Plots
relationships: between tables.; Lesson 843 — Relational Database Concepts Lesson 1121 — Column Types, Constraints, and Relationships Lesson 1316 — Introduction to Network Graphs and Graph Theory Basics Lesson 2087 — Stage 3: Exploratory Data Analysis
Relevance: Was it collected recently enough for your problem?; Lesson 23 — Data Provenance and Metadata
Relevant Scales: Help audiences grasp magnitude.; Lesson 1939 — Context and Comparison: Making Numbers Meaningful
Reliability: is the probability a system survives beyond time *t*.; Lesson 188 — Weibull Distribution: Hazard Function and Reliability Lesson 1822 — What is a Data Pipeline?
Religion: Lesson 1888 — Protected Classes and Sensitive Attributes
Remainder: (or residual): Everything left over (like improvisations)—the noise and potential anomalies; Lesson 1406 — Decomposing Seasonality
Remove all conflict markers: (`<<<<<<<`, `=======`, `>>>>>>>`); Lesson 2011 — Resolving Merge Conflicts
Remove chart junk: Delete unnecessary gridlines (keep only what's needed for reading values), drop borders, eliminate 3D effects, and ditch decorative fills.; Lesson 1958 — Simplifying Visual Complexity
Remove zeros: (ties where difference = 0); Lesson 392 — Wilcoxon Signed-Rank Test
Removes between-subject variability: (some people naturally weigh more); Lesson 370 — Differences as the Unit of Analysis
Removing duplicates: Identifying and eliminating repeated entries that could skew your analysis.; Lesson 12 — Data Cleaning and Preparation
Removing outliers: Identifying unusual values that might be errors or genuinely extreme cases requiring special handling.; Lesson 12 — Data Cleaning and Preparation
Removing redundancy: Cleaning up result sets with unwanted duplicates; Lesson 873 — Understanding DISTINCT: Removing Duplicate Rows
Repeat: steps 2-3 thousands of times (e.; Lesson 395 — Permutation Tests for Means and Beyond Lesson 703 — Sequential Model Building Strategy Lesson 1492 — Rerandomization and Practical Implementation Lesson 1582 — Updating Beliefs with Test Data Lesson 1590 — The Metropolis-Hastings Algorithm Lesson 1591 — Gibbs Sampling for Multivariate Posteriors
Repeatable Read: Once you read a value, it stays the same in your transaction; Lesson 1116 — Transaction Isolation and Concurrency
Repeated Measures: Lesson 369 — When to Use a Paired t-Test
Repeats: until clusters stabilize; Lesson 1705 — K-Means Clustering for Segmentation
Replace metrics with outcomes: "5% improvement in precision" becomes "prevents 50 wasted sales calls per month"; Lesson 2105 — Translating Between Technical and Business Language
Report all preregistered analyses: , not just "successful" ones; Lesson 1929 — Avoiding Cherry-Picking Results
Report Effect Size: Lesson 447 — Conducting One-Way ANOVA in Practice
Reporting and analytics: Dashboards often aggregate data from many tables.; Lesson 1071 — When to Denormalize: Performance Trade-offs
Reports excel at explanation: , providing the context, methodology, and recommendations that dashboards can't accommodate.; Lesson 1980 — Hybrid Approaches and When to Use Both
Repository: The sealed, labeled package that's been officially sent and recorded; Lesson 1993 — The Three States: Working Directory, Staging, Repository
Representativeness: Does your dataset reflect the full population or just a subset?; Lesson 1169 — Clarifying Assumptions and Constraints
reproducibility: and **replicability** sound similar but mean different things—and both are essential for trustworthy science.; Lesson 26 — Reproducibility vs. Replicability Lesson 29 — Code and Environment Management Lesson 33 — Transparency and Explainability Lesson 1643 — Building Attribution Frameworks Lesson 1871 — Why Version Control for Data? Lesson 1990 — What is Version Control and Why Git? Lesson 2039 — Virtual Environments: Concept and Benefits Lesson 2047 — What is Dependency Management? (+1 more)
reproducible: when someone else (or future-you) can take the same raw data and the same code, run it again, and get *exactly* the same results, tables, figures, and conclusions.; Lesson 1981 — What Makes a Report Reproducible? Lesson 2036 — Code Review Practices for Data Science
Reproducible code: Clean GitHub repos with proper READMEs (as you've learned); Lesson 2141 — Building a Portfolio and Personal Brand
Request more budget: (often not feasible); Lesson 295 — Trade-offs: Precision, Confidence, and Cost
Required for self-joins: When joining a table to itself (covered later), aliases become essential.; Lesson 924 — Using Table Aliases in Joins
Required sample size: (what you're solving for); Lesson 388 — Effect Size in Sample Size Planning
Required transformations: "Log-transform `income` to reduce skewness"; Lesson 1212 — EDA Summary Documentation and Next Steps
Requirements: Python/R version, key dependencies or link to `requirements.; Lesson 2077 — The Purpose and Anatomy of a Good README
requirements.txt: file:; Lesson 2043 — Creating and Exporting Environment Specifications Lesson 2044 — Recreating Environments from Specifications
Requires Reviews: You can mandate that 1, 2, or more team members approve a pull request before it can merge.; Lesson 2027 — Protecting Branches and Required Reviews
Rerandomization: is a technique where you check covariate balance *before* starting your experiment, and if balance is poor, you rerandomize until you get acceptable balance.; Lesson 1492 — Rerandomization and Practical Implementation
Resample your data: with replacement many times (typically 1,000–10,000 times); Lesson 306 — Bootstrap for Non-Standard Problems
Research Goals: Lesson 243 — Choosing the Right Sampling Method
Research sharing: Publish datasets for reproducibility without exposing participants; Lesson 1901 — Synthetic Data Generation
Reset to that state: `git reset --hard <commit-hash>` restores your branch to that exact point; Lesson 2021 — Recovering from Rebase Mistakes
residual: (or error).; Lesson 515 — What Makes a 'Best Fit' Line? Lesson 539 — What Are Residuals? Lesson 542 — Computing Fitted Values and Residuals Lesson 711 — Visualizing Components with Decomposition Plots Lesson 742 — Components of Seasonal Decomposition
Residual (eᵢ): = Yᵢ - Ŷᵢ (the difference you learned about earlier); Lesson 538 — What Are Fitted Values?
Residual (e): "Here's how much the *actual* value differs from that prediction"; Lesson 543 — Residuals as Unexplained Variation
Residual Autocorrelation: Lesson 782 — Residual Diagnostics for ARIMA
Residual component: (leftover random noise); Lesson 711 — Visualizing Components with Decomposition Plots Lesson 742 — Components of Seasonal Decomposition
Residual deviance: measures how poorly your *fitted* model (with all predictors) fits.; Lesson 698 — Null and Residual Deviance
Residual patterns: A high R-squared can coexist with systematic patterns in your residuals—violations of the core assumptions that make your predictions unreliable.; Lesson 537 — When R-Squared is Not Enough Lesson 1189 — Detecting Nonlinear Relationships
Residual plots: to check for patterns and violations; Lesson 537 — When R-Squared is Not Enough Lesson 657 — What Are Polynomial Features?
Residual Standard Error (RSE): comes in.; Lesson 536 — Residual Standard Error (RSE) Lesson 537 — When R-Squared is Not Enough
residuals: the differences between each observation and its group mean—follow a normal distribution.; Lesson 449 — Normality of Residuals Lesson 451 — Diagnostic Plots for ANOVA Lesson 516 — Residuals: The Distance from Prediction Lesson 543 — Residuals as Unexplained Variation Lesson 550 — Normality of Residuals Lesson 556 — What Are Residuals and Why Plot Them? Lesson 575 — Cook's Distance Lesson 593 — Box-Cox Transformation (+4 more)
Resilient: RDDs automatically recover from node failures.; Lesson 1777 — RDDs: Resilient Distributed Datasets Fundamentals
Resilient Distributed Dataset (RDD): a fault-tolerant collection partitioned across nodes.; Lesson 1774 — What is Apache Spark and Why Use It?
Resilient Distributed Datasets (RDDs): Lesson 1775 — Spark Components: Core, SQL, MLlib, Streaming
resistant to outliers: and extreme values.; Lesson 51 — Interquartile Range (IQR) Lesson 54 — When to Use Each Measure Lesson 1383 — Understanding the Interquartile Range (IQR)
Resolution (action): What specific decision should stakeholders make based on this evidence?; Lesson 1933 — The Power of Narrative in Data Communication
Resource allocation: High-LTV customers justify higher acquisition costs (CAC) and more personalized outreach.; Lesson 1669 — LTV Segmentation and Targeting Lesson 1711 — What Are Acquisition Channels?
Resource Constraints: Lesson 243 — Choosing the Right Sampling Method
Resource management: Prevents exhausting database connection limits; Lesson 1092 — Connection Pooling Basics
Resource Utilization: CPU, memory, disk I/O, and network usage during pipeline execution.; Lesson 1856 — Key Metrics to Monitor
Resource waste: Running tasks that depend on failed upstream tasks wastes compute resources and makes debugging harder.; Lesson 1840 — What is Dependency Management in Pipelines?
Resourced: Include rough estimates of time, cost, or personnel needed.; Lesson 1970 — Recommendations and Next Steps
Response expectations: "We typically respond within 48 hours"; Lesson 2083 — Contributing Guidelines and Contact Information
Response variable: Counts (0, 1, 2, 3, .; Lesson 690 — The Poisson Distribution as a GLM
Responsible Disclosure: Lesson 1925 — Mitigation Strategies and Responsible Disclosure Lesson 1931 — When to Push Back on Requests
Restore the signs: to each rank (positive or negative); Lesson 392 — Wilcoxon Signed-Rank Test
RESTRICT: (or NO ACTION) prevents the parent operation if children exist:; Lesson 1054 — Cascading Actions: DELETE and UPDATE Lesson 1057 — ON DELETE and ON UPDATE Actions
Result: All quotas filled, but the sample includes only shoppers willing to stop and talk; Lesson 240 — Quota Sampling Lesson 1566 — Conjugate Normal-Normal Model
Results/Output: What the project produces and where to find it; Lesson 2077 — The Purpose and Anatomy of a Good README
Retailer loyalty programs: data sold to data brokers who build detailed consumer profiles; Lesson 1922 — Surveillance and Secondary Data Uses
retention: strategies aim to prevent at-risk customers from leaving in the first place.; Lesson 1676 — Win-Back and Retention Strategies Lesson 1696 — Feature Adoption and Usage Frequency
Retention curves: plot the percentage of users who *remain active* over time (Day-1: 60%, Day-7: 40%, Day-30: 25%).; Lesson 1660 — Retention Curves vs Churn Analysis Lesson 1661 — What is Customer Lifetime Value (LTV)? Lesson 1678 — What is Funnel Analysis?
Retention insights: See if customers stick around longer over time; Lesson 1644 — What is Cohort Analysis?
Retention rates: 45% of Jan cohort returned in Week 3; Lesson 1647 — Building a Cohort Table
Retraining is constant: You must retrain models regularly to capture new patterns; Lesson 2128 — Data Distribution Shifts Frequently
Retrieving: data (asking questions); Lesson 844 — What is SQL?
Retry exhaustion: Slack channel with link to logs; Lesson 1851 — Error Logging and Notifications
retry logic: for transient errors, **idempotency** so rerunning doesn't corrupt data, **checkpointing** to resume mid-pipeline, and **monitoring/alerts** for quick detection.; Lesson 1825 — Designing Pipeline Architecture Lesson 1854 — Testing Error Handling
Reusability: When you'll reference the same result set multiple times; Lesson 974 — When to Use FROM Subqueries vs CTEs Lesson 1106 — Parameter Placeholders: Named Parameters
Reusable functions and modules: that multiple projects import; Lesson 2074 — Notebooks vs Scripts: When to Use Each
Reveal trends: – The underlying direction becomes clearer; Lesson 750 — What is a Moving Average?
Revealing sequences: How rankings shift over years; Lesson 1306 — Animation and Time-Based Transitions
Revenue: is the quintessential lagging indicator—it tells you what already happened.; Lesson 1600 — Business Examples: Revenue vs Pipeline
Revenue accuracy: Which model's channel weights best predict revenue when you shift budget?; Lesson 1734 — Comparing and Validating Attribution Models
Revenue churn: measures *how much MRR* you lost from cancellations.; Lesson 1628 — SaaS Metrics: MRR, ARR, and Logo Churn
Revenue forecasting: Estimate lifetime value by modeling expected subscription duration; Lesson 838 — Subscription and Membership Duration Modeling Lesson 1644 — What is Cohort Analysis?
Revenue generation: How much money comes in; Lesson 1516 — Business Metrics: Definition and Examples
Revenue per user: = total revenue / users (not just "made $50k!"); Lesson 1613 — Raw Counts vs. Rates and Ratios
Revenue-focused: Lesson 1516 — Business Metrics: Definition and Examples
reverse: conditional probabilities—it lets you flip P(A|B) into P(B|A).; Lesson 107 — Bayes' Theorem Formula and Components Lesson 430 — Common Applications and Pitfalls
Reverse causality: occurs when two variables are correlated, but the direction of influence is the reverse of what you thought.; Lesson 496 — Reverse Causality Lesson 553 — Exogeneity: X Must Be Independent of Errors Lesson 1424 — Reverse Causality Lesson 1464 — Instrumental Variables: The Endogeneity Problem
Reverse causation: Maybe Y causes X, not X causes Y; Lesson 493 — The Fundamental Difference: Association vs Cause-and-Effect
Reverse geocoding: works the opposite direction: you have coordinates (42.; Lesson 1315 — Geocoding and Reverse Geocoding
Reversibility is high: Changes can be rolled back easily if problems emerge later; Lesson 1522 — Balancing Speed and Accuracy in Metric Selection
Reversing range logic: Lesson 868 — The NOT Operator
Reversing the Hypotheses: Lesson 313 — Common Pitfalls in Hypothesis Formulation
Review against WCAG checklist: document what passes and what needs fixing; Lesson 1254 — Testing Visualizations for Accessibility
Review checkpoint: Show results to stakeholders at sprint end; Lesson 2113 — Timeboxing and Sprint Planning for Data Projects
Review logs: Examine both application logs and database server logs for detailed error messages; Lesson 1093 — Troubleshooting Connection Issues
Review notebook-specific PRs carefully: Understand that diffs may still be noisy even with best practices.; Lesson 2030 — Version Control for Notebooks: Challenges and Solutions
Review promptly: Respect the author's time by reviewing within a day or two.; Lesson 2024 — Code Review Best Practices
Review recent changes: Check pipeline code commits, configuration changes, or dependency updates around when the issue started; Lesson 1870 — Root Cause Analysis for Quality Issues
Reweighting: Adjust training data by giving higher weight to underrepresented or historically disadvantaged groups.; Lesson 1894 — Auditing and Remediation Strategies
Rework: means repeating work because something was missed, misunderstood, or poorly executed the first time—rerunning analysis because you forgot to document your seed, rebuilding features because requirements weren't clarified, or re-validating a model b...; Lesson 2112 — Iteration vs Rework: Learning from Each Cycle
Rideshare apps: Drivers in treatment might reduce wait times for riders in control; Lesson 1527 — Ignoring Network Effects
Ridge regression: modifies least squares by adding a penalty proportional to the *squared* coefficient values.; Lesson 586 — Remedies: Regularization Preview
Right: H₀: The drug has no effect (μ = 0), H₁: The drug works (μ > 0); Lesson 313 — Common Pitfalls in Hypothesis Formulation
RIGHT JOIN: returns *every row from the right (second) table*, along with matching data from the left (first) table where available.; Lesson 929 — RIGHT JOIN Syntax and Semantics Lesson 936 — FULL OUTER JOIN Syntax
Right pane: Their changes (incoming branch); Lesson 2019 — Using Diff Tools for Conflict Resolution
Right tail: α/2 (e.; Lesson 346 — Two-Tailed Tests: Testing for Any Difference
Right to erasure: ("right to be forgotten"): People can request deletion of their data, impacting training datasets and model retraining; Lesson 1904 — What is GDPR and Why It Matters Lesson 1909 — Right to Erasure and Data Retention Policies Lesson 1911 — GDPR Compliance for Data Scientists
Right to explanation: Individuals can demand to understand automated decisions affecting them—black-box models become problematic; Lesson 1904 — What is GDPR and Why It Matters
Right to Withdraw: Lesson 1913 — Elements of Valid Consent
Right-continuous: It's continuous from the right side at jump points; Lesson 810 — The Survival Function S(t)
Right-only rows: Left-side columns are NULL; Lesson 937 — Identifying Matched vs Unmatched Rows
Right-skewed: The distribution has a long tail extending to the right (high values); Lesson 178 — Log-Normal Distribution: Definition and Properties
Right-skewed (positive skew): A long tail stretches to the right; most values cluster at the lower end (e.; Lesson 1175 — Histograms for Distribution Shape
Risk: With small samples, you might accidentally get imbalanced groups (e.; Lesson 1437 — Randomization Mechanisms
Risk assessment: Two investments with the same average return might have wildly different risks; Lesson 46 — What is Variability?
Risk evaluation: how likely?; Lesson 1910 — Data Protection Impact Assessments (DPIAs)
Risk identification: what could go wrong?; Lesson 1910 — Data Protection Impact Assessments (DPIAs)
Risk of gaming exists: Surrogates might improve while harming long-term value; Lesson 1522 — Balancing Speed and Accuracy in Metric Selection
Risk tolerance: High variance might be unacceptable even with better expected value; Lesson 152 — Decision Making Under Uncertainty
Risk-adjusted returns: balancing profitability with stability; Lesson 1716 — Channel Mix and Portfolio Thinking
River One: Statistics (1800s–1900s): Lesson 5 — The Evolution of Data Science
River Two: Computing (1950s–1990s): Lesson 5 — The Evolution of Data Science
ROAS < 1: You're losing money directly on ad spend (spending more than you earn); Lesson 1751 — Return on Ad Spend (ROAS): Definition and Calculation
ROAS = 1: Breaking even on ad spend (but likely unprofitable after other costs); Lesson 1751 — Return on Ad Spend (ROAS): Definition and Calculation
ROAS > 1: Generating positive return, but profitability depends on margins; Lesson 1751 — Return on Ad Spend (ROAS): Definition and Calculation
robust: to extreme values than standard deviation because it doesn't square deviations (which amplifies outliers).; Lesson 52 — Mean Absolute Deviation (MAD) Lesson 115 — Prior Sensitivity Analysis Lesson 363 — Testing Equality of Variances Lesson 380 — Testing Equal Variances: Levene's and Bartlett's Tests Lesson 450 — Homogeneity of Variance (Homoscedasticity) Lesson 1572 — Sensitivity Analysis and Prior Robustness
Robust regression: techniques offer an alternative: they fit models that automatically downweight or ignore outliers during estimation, so extreme points don't drag your fitted line off course.; Lesson 590 — Robust Regression Techniques
robustness: .; Lesson 397 — Power and Efficiency of Non-Parametric Tests Lesson 452 — Consequences of Assumption Violations Lesson 475 — Choosing Between Parametric and Non-Parametric Tests
Robustness Testing: ensures your model performs consistently.; Lesson 2089 — Stage 5: Model Development and Validation
ROI measurement: Understand true return on marketing investment; Lesson 1718 — Introduction to Marketing Attribution
Role-play each audience type: with a colleague; Lesson 1956 — Anticipating and Addressing Audience Questions
Rollback Mechanisms: Simulate a mid-pipeline failure during a database write or transformation.; Lesson 1854 — Testing Error Handling
Rolling a die: Lesson 78 — Events as Subsets of the Sample Space Lesson 82 — Collectively Exhaustive Events
Rolling statistics: Mean and variance shouldn't drift systematically; Lesson 741 — Testing Stationarity After Transformation
Rolling window: Train on a fixed-size window (e.; Lesson 789 — Overfitting and Cross-Validation for Time Series
Root cause analysis: becomes nearly impossible when you discover issues weeks later; Lesson 2136 — Monitoring Gaps and Silent Failures
Rotating 3D views: Spin a 3D plot to reveal all angles; Lesson 1327 — Creating Animations with FuncAnimation
Roughly constant variance: the noise level should be stable; Lesson 709 — Irregular Component: Random Noise
row: in the table; Lesson 1117 — What is an ORM and Why Use It? Lesson 1358 — facet_grid() for Two Variables
Row proportion: 50/60 = 0.; Lesson 98 — Conditional Probability with Tables
Row proportions: divide each cell by its row total.; Lesson 98 — Conditional Probability with Tables
Row Total: Lesson 423 — Contingency Tables and Expected Frequencies
Row-level aggregations: Comparing individual values to group statistics; Lesson 967 — Subqueries in the SELECT Clause
Row-level analytics: that require context from other rows without losing detail; Lesson 1005 — Introduction to Window Functions
Rows: Estimated vs actual row counts.; Lesson 1084 — Reading and Interpreting Query Execution Plans Lesson 1647 — Building a Cohort Table
Rows (Records): Each row represents a single instance or observation.; Lesson 843 — Relational Database Concepts
RSS: = Residual Sum of Squares (the sum of all squared residuals); Lesson 536 — Residual Standard Error (RSE)
Rule: Check that both np ≥ 10 *and* n(1-p) ≥ 10, where n is your sample size and p is your sample proportion.; Lesson 282 — Checking Assumptions for Proportion Intervals
Rule 1: Each drawer label describes one type of information (not "Age&Address").; Lesson 1143 — The Three Rules of Tidy Data Lesson 1402 — Western Electric Rules
Rule 2: Each folder holds one person's complete record (not scattered pieces).; Lesson 1143 — The Three Rules of Tidy Data Lesson 1402 — Western Electric Rules
Rule 3: Employee files and project files live in separate cabinets (not jumbled together).; Lesson 1143 — The Three Rules of Tidy Data Lesson 1402 — Western Electric Rules
Rule 4: Eight consecutive points on one side of the centerline (even if within 1σ); Lesson 1402 — Western Electric Rules
Rule of thumb: When sampling without replacement, your sample size should be less than 10% of the population to maintain approximate independence.; Lesson 282 — Checking Assumptions for Proportion Intervals Lesson 577 — DFBETAS: Influence on Individual Coefficients Lesson 1467 — Testing Instrument Strength and Validity Lesson 1481 — Unit of Randomization
Rule-of-thumb approaches: Use formulas based on sample size and variance; Lesson 1463 — RDD Bandwidth Selection and Local Estimation
Run optimization: using constrained optimization algorithms (like scipy's `minimize` with bounds); Lesson 1742 — Budget Optimization Using MMM
Run Robustness Checks: Lesson 579 — What to Do with Influential Points
Run statistical tests: Apply Shapiro-Wilk (for smaller samples) or Anderson-Darling (for general use).; Lesson 210 — Combining Visual and Statistical Methods
Run tests longer: Allow time for behaviors to stabilize (typically 2-4 weeks minimum for behavioral changes); Lesson 1525 — Novelty and Primacy Effects
Run the experiment: Increase marketing in test regions for a fixed period; Lesson 1746 — Geo-Lift Experiments
Running hypothesis tests: (t-tests, z-tests) that rely on normal theory; Lesson 202 — Why Test for Normality?
Running totals: or moving averages while preserving individual transactions; Lesson 1005 — Introduction to Window Functions
Runs: Lesson 1401 — Detecting Out-of-Control Signals
Runs in linear time: Under typical conditions, achieves O(n) complexity instead of O(n²)—a massive speedup for large datasets; Lesson 1416 — PELT Algorithm: Pruned Exact Linear Time
Runtime overhead: Lesson 1785 — Cost-Benefit Analysis: Spark Overhead vs Performance Gains
Russian nesting dolls: the innermost subquery runs first, its result becomes a table for the next level up, and so on.; Lesson 973 — Nested Subqueries in FROM
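
Worked example for the three ROAS entries above (ROAS < 1, ROAS = 1, ROAS > 1). This is a minimal sketch, assuming the standard definition of ROAS as ad-attributed revenue divided by ad spend. The function names and the dollar figures are hypothetical and chosen only for illustration; they do not come from Lesson 1751.

```python
def roas(revenue: float, ad_spend: float) -> float:
    """Return on ad spend: ad-attributed revenue divided by ad spend."""
    if ad_spend <= 0:
        raise ValueError("ad_spend must be positive")
    return revenue / ad_spend


def interpret_roas(value: float) -> str:
    """Map a ROAS value onto the three glossary cases."""
    if value < 1:
        return "losing money directly on ad spend"
    if value == 1:
        return "breaking even on ad spend"
    return "positive return; profitability depends on margins"


if __name__ == "__main__":
    # Hypothetical campaigns: (revenue, ad spend) in the same currency.
    campaigns = {"A": (500.0, 1000.0), "B": (1000.0, 1000.0), "C": (3000.0, 1000.0)}
    for name, (revenue, spend) in campaigns.items():
        value = roas(revenue, spend)
        print(name, value, interpret_roas(value))
```

Because ROAS ignores costs other than ad spend, a value above 1 is necessary but not sufficient for profit, which is why the third case is reported with a caveat rather than as "profitable".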

S

S Charts: Preferred for larger subgroups (n > 10) where range becomes less efficient at capturing true variability.; Lesson 1399 — Control Charts for Variability (R and S Charts)
S-shaped curve: Data is skewed (right skew = curve bends up on right; left skew = bends down on left); Lesson 204 — Q-Q Plots: Theory and Interpretation Lesson 565 — What Q-Q Plots Show: Comparing Residual Distribution to Normal Lesson 566 — Reading Q-Q Plots: Interpreting Points Along the Reference Line
S(∞) = 0: Eventually, everyone experiences the event (in theory); Lesson 810 — The Survival Function S(t)
S(0) = 1: Everyone starts "alive" or event-free; Lesson 810 — The Survival Function S(t)
SaaS: Trial signup or paid subscription; Lesson 1686 — Defining Conversions and Conversion Rate
SaaS Products: Lesson 1657 — Day-1, Day-7, Day-30 Benchmarks
SaaS Sign-up: Landing Page → Sign-up Form → Email Verification → Onboarding → First Use; Lesson 1678 — What is Funnel Analysis?
SaaS tools: 10-30% (depends on use case); Lesson 1694 — Daily Active Users (DAU) and Monthly Active Users (MAU)
Sales Analysis: Lesson 908 — Multi-Level Grouping in Business Analytics
Sales expenses: sales team salaries and commissions, sales software (CRM, outreach tools), travel and entertainment; Lesson 1753 — Customer Acquisition Cost (CAC): Components and Calculation
Sales pipeline metrics: , on the other hand, are leading indicators.; Lesson 1600 — Business Examples: Revenue vs Pipeline
Sales(t): is your outcome variable at time *t* (weekly sales, conversions, etc.; Lesson 1738 — The Core MMM Regression Model
same number of columns: with **compatible data types**.; Lesson 998 — Introduction to Set Operations Lesson 1001 — INTERSECT: Finding Common Rows
same variance: (even if their means differ).; Lesson 361 — Pooled Variance t-Test Lesson 379 — The Assumption of Equal Variances (Homoscedasticity)
Same-store sales (SSS): , also called "comparable store sales" or "comps," isolates growth from stores open at least 12-13 months, revealing organic performance by controlling for expansion.; Lesson 1634 — Retail Metrics: Same-Store Sales and Inventory Turnover
sample: a subset meant to represent the population.; Lesson 50 — Population vs Sample Variance Lesson 229 — Defining Samples and Statistics Lesson 230 — Why We Sample Instead of Census Lesson 232 — Notation Conventions Lesson 237 — Cluster Sampling Lesson 261 — Standard Error vs Standard Deviation
Sample distribution: is one snapshot from that album — maybe 100 randomly selected people.; Lesson 258 — Comparing Population, Sample, and Sampling Distributions
Sample from each stratum: Use simple random sampling *within* each stratum, maintaining the correct proportions; Lesson 236 — Stratified Sampling
Sample Mean (x̄): The expected value of the sample mean equals the population mean (μ).; Lesson 255 — Expected Value of Sample Statistics
Sample Proportion (p̂): The expected value equals the true population proportion (p).; Lesson 255 — Expected Value of Sample Statistics
Sample quantiles: (your actual residual values, sorted) on the y-axis; Lesson 565 — What Q-Q Plots Show: Comparing Residual Distribution to Normal
Sample size: (n): Larger samples → smaller SE; Lesson 260 — Defining Standard Error Lesson 294 — Margin of Error and Its Components Lesson 324 — Common Significance Levels: 0.05, 0.01, and 0.10 Lesson 389 — Reporting Effect Sizes in Practice Lesson 1549 — Prior-Likelihood Trade-offs Lesson 1692 — Statistical Significance and Iteration Lesson 1749 — Measuring Statistical Significance
Sample size (n): Larger samples → smaller standard error → smaller margin of error.; Lesson 271 — Margin of Error Lesson 335 — Calculating Type II Error Probability (Beta) Lesson 343 — Calculating Power for Common Tests Lesson 344 — Power Analysis in Study Design Lesson 1496 — The Four Parameters of Sample Size Calculation
Sample size calculation: Based on your Minimum Detectable Effect and power; Lesson 1485 — Documentation and Pre-Registration Lesson 1494 — Effect Size: The Minimum Detectable Effect Lesson 1508 — Pre-Registration and Correction Strategy
Sample size challenges: Intersectional groups may be small, making statistical analysis harder; Lesson 1893 — Intersectionality in Fairness
Sample size is large: More observations make the data speak louder than assumptions; Lesson 115 — Prior Sensitivity Analysis
Sample size is small: With little data, your starting belief dominates; Lesson 115 — Prior Sensitivity Analysis
Sample size limitations: "Based on 500 customers, we're confident in the direction but not precise magnitude"; Lesson 2122 — When Uncertainty Is Acceptable
Sample size matters: Typically, n ≥ 30 is considered sufficient for the CLT to "kick in," though it depends on how non-normal the original population is.; Lesson 218 — What the Central Limit Theorem States
Sample Size Per Group: Lesson 446 — Power and Sample Size for ANOVA
Sample sizes: (how much data supports this?; Lesson 1244 — Omitting Uncertainty and Variability
Sample variance: Divide by **N-1** (one less than your sample size); Lesson 50 — Population vs Sample Variance Lesson 255 — Expected Value of Sample Statistics
Sampling: Training a facial recognition model primarily on one demographic; Lesson 1878 — What is Bias in Data? Lesson 2055 — Why Randomness Matters in Data Science
Sampling bias: is a systematic error in how you collect your sample that pushes your results in one direction, away from the truth.; Lesson 248 — Sampling Error vs Sampling Bias Lesson 249 — Coverage Error and Undercoverage Lesson 1879 — Selection Bias and Sampling Bias
sampling distribution: is the probability distribution of a statistic (like the mean, median, or proportion) computed from *all possible samples* of a fixed size drawn from the same population.; Lesson 251 — What is a Sampling Distribution? Lesson 257 — Shape of Sampling Distributions Lesson 258 — Comparing Population, Sample, and Sampling Distributions
Sampling error: is the natural, random variation you get just because you didn't measure everyone.; Lesson 248 — Sampling Error vs Sampling Bias
Sampling new records: Generate fresh rows that follow the learned patterns but represent no actual person; Lesson 1901 — Synthetic Data Generation
Sampling zeros: People who *could* experience it but happened not to (e.; Lesson 695 — Zero-Inflated Models
Sargan: or **Hansen J-test**:; Lesson 1467 — Testing Instrument Strength and Validity
SARIMA: (Seasonal ARIMA) adds a second layer of similar components that operate specifically on the seasonal lags.; Lesson 795 — Seasonal ARIMA (SARIMA) Structure
Satellite imagery: Shows actual photographs from above; Lesson 1314 — Basemaps and Map Tiles
Saturated model: Perfect fit with one parameter per observation; Lesson 697 — Deviance: A Measure of Model Fit
Saturation: is the intensity or purity of the color, ranging from vivid/vibrant to dull/grayish.; Lesson 1234 — Color: Hue, Saturation, and Luminance
Saturation/luminance: (lighter to darker shades); Lesson 1238 — Matching Encoding to Data Type
Say: "For every additional hour of study time, we expect students' test scores to increase by about 2.; Lesson 530 — Communicating Results to Non-Technical Audiences Lesson 1955 — Framing Insights in Business Language
Scalability: Handles concurrent requests efficiently in multi-threaded or async applications; Lesson 1092 — Connection Pooling Basics Lesson 1816 — What is ELT? Extract, Load, Transform Explained Lesson 1822 — What is a Data Pipeline?
Scale parameter (λ): Stretches or compresses the distribution along the time axis; Lesson 187 — The Weibull Distribution: Shape, Scale, and Survival Lesson 189 — Fitting Weibull Models to Lifetime Data
Scale Transformations: Switch to logarithmic scales with `set_xscale('log')` when data spans multiple orders of magnitude (think: population sizes from villages to countries).; Lesson 1270 — Customizing Axes: Labels, Limits, and Scales
Scale-Location plot: solves this by plotting the *square root* of the *absolute value* of standardized residuals against fitted values.; Lesson 560 — Scale-Location Plot (Spread-Location Plot)
Scaled fonts: that remain legible; Lesson 1369 — Publication-Ready Plot Styling
Scatter plot matrices: (Lesson 1191) visually show near-perfect linear relationships; Lesson 1197 — Identifying Variable Importance and Redundancy
Scatter plots: remain your most powerful tool here.; Lesson 1189 — Detecting Nonlinear Relationships Lesson 1284 — Pair Plots for Multivariate Exploration
Schedule quarterly reviews: with stakeholders to assess whether the tree still represents reality and strategy.; Lesson 1626 — Maintaining and Evolving Metric Trees
Scheduler: Monitors DAGs and triggers tasks when dependencies are met; Lesson 1833 — Introduction to Apache Airflow
Scheduling: is like setting alarm clocks: "Run this job every day at 2 AM.; Lesson 1832 — Orchestration vs Scheduling
schema: is an organizational container that groups related tables together.; Lesson 846 — Tables, Schemas, and Data Types Lesson 1151 — Schema Validation
Schema assumptions: Your code expects a column named `user_id`, but upstream decides to rename it to `customer_id`.; Lesson 2133 — Undocumented Data Dependencies
Schema awareness: Spark knows your column names and data types; Lesson 1778 — DataFrames and Spark SQL Basics
Schema Changes: Tracking modifications to data structure (new columns, type changes, renamed fields).; Lesson 1856 — Key Metrics to Monitor Lesson 2136 — Monitoring Gaps and Silent Failures
Schema extraction: pulls structural information: column names, data types, primary keys, constraints.; Lesson 2067 — Automating Documentation with Code
Schema validation: checks structural requirements:; Lesson 1826 — Data Validation and Schema Enforcement
scikit-learn: for prediction-focused workflows and machine learning pipelines.; Lesson 545 — Extracting Residuals and Fitted Values in Python Lesson 2058 — Seed Scope and Multiple Libraries
Scope: "What route are we taking?; Lesson 2103 — Managing Expectations and Defining Success
Scoped: "Identify the top 3 pages where users abandon our checkout process, so we can redesign them to increase completed purchases by 10%"; Lesson 10 — Problem Definition and Scoping Lesson 1166 — Defining the Business Question
Scoping: means setting clear boundaries: What will you measure?; Lesson 10 — Problem Definition and Scoping
Scoping constraints: What data is available?; Lesson 2085 — Stage 1: Problem Definition and Scoping
Score Test: Lesson 830 — Testing Coefficient Significance
Screen reader testing: with tools like NVDA, JAWS, or VoiceOver reveals whether your alternative text and data tables are actually helpful; Lesson 1254 — Testing Visualizations for Accessibility
Scripts: are executable files that run a complete workflow — useful for automation and reproducibility.; Lesson 2071 — Modular Code: Functions and Scripts
SD(X) = √λ: (standard deviation); Lesson 141 — Mean and Variance of Poisson Distribution
SE: is the standard error of the mean; Lesson 269 — Confidence Interval Formula for One Mean Lesson 287 — Confidence Intervals for the Difference Between Two Proportions Lesson 353 — Calculating the t-Statistic Lesson 402 — Calculating the Test Statistic for Proportions Lesson 409 — Z-Test Statistic for Two Proportions
SE(p̂): is the standard error of the proportion: √(p̂(1-p̂)/n); Lesson 278 — Confidence Interval Formula for One Proportion
Seaborn: was built specifically to improve on Matplotlib's defaults.; Lesson 1371 — Default Aesthetics and Design Choices Lesson 1373 — Statistical Transformations: Built-in vs Manual
Seaborn FacetGrid: Similar benefits to ggplot2, with convenient statistical plotting functions built in; Lesson 1372 — Faceting: ggplot2 vs Seaborn and Matplotlib Subplots
Seaborn's FacetGrid: follows a similar declarative philosophy.; Lesson 1372 — Faceting: ggplot2 vs Seaborn and Matplotlib Subplots
Seamless visualization: Plotting libraries expect data in predictable formats.; Lesson 1149 — Benefits of Tidy Data for Downstream Work
Search and matching failures: "café" might not match "cafe" in pattern searches.; Lesson 1139 — Dealing with Special Characters and Unicode
Searched CASE: is more flexible—each WHEN clause can contain any boolean condition.; Lesson 1031 — Simple CASE vs Searched CASE
Seasonal: Fixed period (always 365 days for annual patterns); Lesson 708 — Cyclical Patterns: Non-Fixed Fluctuations Lesson 711 — Visualizing Components with Decomposition Plots Lesson 742 — Components of Seasonal Decomposition Lesson 744 — Classical Decomposition Methods Lesson 747 — Interpreting Decomposition Plots Lesson 767 — Holt-Winters Additive Model Lesson 795 — Seasonal ARIMA (SARIMA) Structure
Seasonal AR terms: appear as significant spikes in the PACF at seasonal lags that cut off, while the ACF shows a gradual decay at those seasonal intervals.; Lesson 796 — Identifying Seasonal Patterns
seasonal component: , you might misinterpret normal variation as something special (or vice versa).; Lesson 707 — Seasonality: Regular Periodic Patterns Lesson 711 — Visualizing Components with Decomposition Plots Lesson 742 — Components of Seasonal Decomposition
Seasonal decomposition: separating the data into trend, seasonal, and residual components; Lesson 1405 — What is Seasonal Hybrid ESD?
Seasonal differencing: works the same way, but instead of subtracting adjacent points, you subtract observations that are *one full season apart*.; Lesson 737 — Seasonal Differencing Lesson 797 — Seasonal Differencing
Seasonal effects: If your business has monthly billing cycles, holiday shopping patterns, or fiscal calendar impacts, your test duration should span these periods.; Lesson 1484 — Duration and Timing Considerations
Seasonal equation: Updates the seasonal pattern for each period; Lesson 767 — Holt-Winters Additive Model Lesson 768 — Holt-Winters Multiplicative Model
Seasonal fluctuations remain constant: in absolute size regardless of the trend level; Lesson 743 — Additive vs Multiplicative Models
Seasonal Hybrid ESD: approach you've learned extends to multiple periods by iteratively or simultaneously accounting for each cycle.; Lesson 1408 — Handling Multiple Seasonal Periods
Seasonal lags: (12, 24, 36.; Lesson 796 — Identifying Seasonal Patterns
Seasonal MA terms: show up as significant spikes in the ACF at seasonal lags (12, 24, 36) while cutting off after a certain seasonal lag.; Lesson 796 — Identifying Seasonal Patterns
seasonal pattern: evolves over time.; Lesson 769 — Smoothing Parameters: Alpha, Beta, Gamma Lesson 771 — Forecasting with Holt-Winters
seasonal patterns: that need specialized modeling.; Lesson 726 — Using ACF for Model Identification Lesson 760 — Forecasting with Simple Exponential Smoothing
Seasonality: Do ice cream sales spike every summer?; Lesson 19 — Temporal Data and Time Series Lesson 708 — Cyclical Patterns: Non-Fixed Fluctuations Lesson 710 — Additive vs Multiplicative Models Lesson 711 — Visualizing Components with Decomposition Plots Lesson 765 — Introduction to Holt-Winters Method Lesson 1406 — Decomposing Seasonality Lesson 1412 — What is Change-Point Detection? Lesson 1694 — Daily Active Users (DAU) and Monthly Active Users (MAU) (+1 more)
Seasonally adjusted data: is your original time series with the seasonal component removed, leaving you with just the trend and irregular components.; Lesson 748 — Seasonally Adjusted Data
Second batch arrives: Use Beta(12, 17) as your new prior → observe 5 successes, 8 failures → get Beta(17, 25) posterior; Lesson 1563 — Sequential Updating with New Data
Second difference: Control group's change = (After - Before); Lesson 1452 — The Difference-in-Differences Setup
Second difference (DiD): Subtract the control group's change from the treatment group's change:; Lesson 1454 — Calculating the DiD Estimator
Second evidence (witness testimony): Use that 60% as your new prior → apply Bayes' Theorem again → posterior becomes 85%.; Lesson 114 — Sequential Updating
Second join: (INNER): Only rows where payment exists stay in the result; Lesson 952 — Mixing Join Types
Second layer: Three supporting pillars—"Customer surveys show strong demand," "A/B test validated the prediction," "Risk analysis shows minimal downside.; Lesson 1952 — The Pyramid Principle: Leading with Conclusions
Second-order differencing: means you difference the already-differenced data:; Lesson 736 — Higher-Order Differencing
Secondary metrics: protect you from winning the battle but losing the war.; Lesson 1478 — Defining Success Metrics Lesson 1485 — Documentation and Pre-Registration
Secondary use: occurs when data collected for one specific purpose gets repurposed for something else—often without obtaining fresh consent from the individuals involved.; Lesson 1915 — Secondary Use and Scope Creep
Secure auctions: where bids remain secret until the winner is determined; Lesson 1903 — Secure Multi-Party Computation
Security updates: Credentials refresh, access control adjustments; Lesson 1979 — Maintenance and Sustainability Considerations
See dynamic effects: Does the policy effect grow or fade over time?; Lesson 1457 — Multiple Time Periods and Staggered Adoption
Seek peer review: from colleagues with no stake in the outcome; Lesson 35 — Conflicts of Interest and Independence
Segment by path type: compare conversion rates across different journey patterns; Lesson 1683 — Multi-Path and Non-Linear Funnels
Segment by user tenure: Compare new users (no primacy effect) separately from existing users; Lesson 1525 — Novelty and Primacy Effects
Segment by user type: Power users, casual users, and at-risk users have different engagement profiles; Lesson 1693 — Defining User Engagement
Segment differences: Compare curves using the log-rank test to see which groups need different retention strategies; Lesson 835 — Customer Churn Prediction with Survival Analysis
Segment insights: Do paid users stick around longer than free users?; Lesson 1659 — Comparing Retention Across Cohorts
Segmented analysis: By customer, product line, or geography; Lesson 1984 — Parameterized Reports
SELECT: Applies aggregate functions to each group and projects columns; Lesson 896 — GROUP BY Execution Order Lesson 909 — Combining Multiple Groups with SELECT Lesson 912 — Fundamental Difference: Filter Timing
SELECT columns: Choose which columns to display from either or both tables; Lesson 919 — Basic INNER JOIN Syntax
Select every kth element: Starting from position 3, select every 10th element: the 3rd, 13th, 23rd, 33rd.; Lesson 235 — Systematic Sampling
Select the numeric variable: you want to summarize; Lesson 1185 — Grouped Summary Statistics
Select the parameters: that minimize the chosen error metric; Lesson 772 — Holt-Winters Parameter Optimization
Select What Matters: Lesson 1217 — The Transition from Explore to Explain
Selectboxes: provide dropdown menus for choosing from predefined options:; Lesson 1332 — Streamlit Widgets: Inputs and Controls
Selecting features: that matter most: removing redundant or irrelevant variables that add noise without signal, reducing dimensionality while preserving information.; Lesson 2088 — Stage 4: Feature Engineering and Preparation
selection bias: and **nonresponse bias** you've already learned—survivorship bias is a specific type where the "non-survivors" physically can't be in your dataset.; Lesson 247 — Survivorship Bias Lesson 1432 — Colliders and Bad Controls Lesson 1473 — Conditioning on Colliders: Selection Bias Lesson 1526 — Selection Bias in Opt-In Tests Lesson 1879 — Selection Bias and Sampling Bias Lesson 1938 — Using Metaphors and Analogies
Selective reporting: Hiding inconvenient findings or uncertainty; Lesson 1926 — The Honest Broker Role
Selectivity: is how well a query condition narrows down the result set.; Lesson 1083 — Index Selectivity and Cardinality
Self-contained logic: Keep related calculations within a single query instead of multiple separate queries; Lesson 959 — Introduction to Subqueries in WHERE
Self-selection: When people choose whether to participate.; Lesson 244 — Selection Bias and Its Causes Lesson 1444 — Selection Bias and Treatment Assignment
Seller utilization: % of available supply actually transacted; Lesson 1630 — Marketplace Metrics: GMV, Take Rate, and Liquidity
Senior Data Scientist: Own complex projects end-to-end, mentor juniors informally; Lesson 2140 — Individual Contributor vs Management Tracks
Senior Manager/Director: Manage multiple teams or managers, set team strategy, align with business; Lesson 2140 — Individual Contributor vs Management Tracks
sensitive: `WHERE name = 'John'` only matches "John"; Lesson 862 — Case Sensitivity in Text Filtering Lesson 1478 — Defining Success Metrics
Sensitive attributes: leak through proxy variables—attributes correlated with protected classes.; Lesson 1888 — Protected Classes and Sensitive Attributes
Sensitive to all values: Every number in your dataset affects the mean—change one value, and the mean changes; Lesson 39 — The Mean (Arithmetic Average)
Sensitivity: probability the test is positive *given* you have the disease (true positive rate); Lesson 109 — Medical Diagnostic Testing Lesson 216 — Reciprocal and Inverse Transformations Lesson 1534 — The Prior Distribution Lesson 1899 — Adding Noise for Privacy
Sensitivity analyses: showing how results change under different assumptions; Lesson 1949 — Anticipating Questions: Building in Appendices
Sensitivity analysis: is the practice of deliberately varying your prior choices and observing how the posterior distribution responds.; Lesson 1572 — Sensitivity Analysis and Prior Robustness
Sensors: are specialized operators that continuously check for specific conditions—like whether another pipeline has completed or if a particular file exists in storage.; Lesson 1845 — Cross-Pipeline Dependencies
Sensors and IoT devices: Real-time measurements from physical equipment; Lesson 11 — Data Collection and Acquisition
Separate must-fix from suggestions: Use tags like "critical" vs "nit" or "optional."; Lesson 2024 — Code Review Best Practices
Separate signal from noise: by isolating the long-term pattern from short-term variability; Lesson 706 — Trend: Long-Term Direction
Separate when: Lesson 1147 — Separating and Uniting Columns
Separating columns: means splitting one column containing compound data (like "Smith, John" or "2024-01-15 14:30:00") into multiple columns ("LastName", "FirstName" or "Date", "Time").; Lesson 1147 — Separating and Uniting Columns
separation of concerns: the statistical calculation is independent of how you choose to visualize it.; Lesson 1352 — Statistical Transformations with stat_* Layers Lesson 2069 — Project Directory Structure
Sequence validation: Check if values follow expected patterns; Lesson 1024 — LAG Function: Accessing Previous Row Values
Sequential: No gaps in numbering (always 1, 2, 3.; Lesson 1007 — ROW_NUMBER(): Assigning Unique Row Numbers
Sequential analysis: Analyze patterns across adjacent time periods; Lesson 1023 — Introduction to Window Functions: LAG and LEAD
Sequential chains: `task_a >> task_b >> task_c`; Lesson 1843 — Declaring Dependencies in Orchestration Tools
Sequential decomposition: Remove the strongest seasonal component first, then detect weaker ones in the residuals; Lesson 1408 — Handling Multiple Seasonal Periods
Sequential events: Comparing different timestamps or events within a single `events` table; Lesson 945 — Introduction to Self-Joins
Sequential ordering: Time flows in one direction; past observations may predict future ones, but not vice versa; Lesson 704 — What Makes Time Series Data Different?
Sequential testing: (also called *sequential analysis* or *continuous monitoring*) provides statistical methods that account for continuous or repeated looks at accumulating data.; Lesson 1510 — Sequential Testing Overview
Sequential updating: means applying Bayes' Theorem iteratively: your **posterior probability after one update becomes the prior probability for the next update**.; Lesson 114 — Sequential Updating Lesson 116 — From Bayes' Theorem to Bayesian Inference Lesson 1555 — Advantages and Limitations of Conjugate Priors Lesson 1570 — Comparing Two Means: Bayesian Approach Lesson 1586 — Multi-Armed Bandit Connections
sequentially: each new lag builds on previous calculations.; Lesson 729 — Calculating Partial Autocorrelations Lesson 1037 — CASE Best Practices and Performance Lesson 1531 — Interference from Concurrent Tests
Serializable: Transactions run as if they're completely alone (safest but slowest); Lesson 1116 — Transaction Isolation and Concurrency
Server metrics monitoring: CPU, memory, or network traffic that follows daily business cycles; Lesson 1411 — Applications and Limitations
Service Level Agreement (SLA): is a formal promise made to stakeholders or customers about minimum service levels, often with consequences if broken.; Lesson 1860 — SLA and SLO Definitions
Service Level Objective (SLO): is a specific, measurable target for a service's performance—think of it as your internal goal.; Lesson 1860 — SLA and SLO Definitions
Session: Each visit to your site gets randomized independently; Lesson 1481 — Unit of Randomization
Session data: timestamps, referral sources, pages visited; Lesson 1719 — The Customer Journey and Touchpoints
Session Depth: counts the number of actions or page views within a session.; Lesson 1695 — Session-Based Engagement Metrics
session duration: ?; Lesson 1624 — Counter-Metrics and Guardrails Lesson 1695 — Session-Based Engagement Metrics
Session Frequency: measures how often a user starts new sessions over a given period (e.; Lesson 1695 — Session-Based Engagement Metrics
Session Recency: measures the time since a user's last session.; Lesson 1695 — Session-Based Engagement Metrics
Sessions: are your workspace for database operations.; Lesson 1122 — Creating Tables and Session Management
Set alpha accordingly: Lower if Type I is costly; higher if Type II is costly.; Lesson 334 — Setting Alpha: Choosing Your Significance Level
Set constraints: based on cash position (max payback acceptable); Lesson 1759 — Optimizing ROAS, CAC, and Payback Together
SET DEFAULT: Similar to SET NULL, but sets the foreign key to a predefined default value instead.; Lesson 1057 — ON DELETE and ON UPDATE Actions
SET NULL: clears the foreign key in child records:; Lesson 1054 — Cascading Actions: DELETE and UPDATE Lesson 1057 — ON DELETE and ON UPDATE Actions
Set priors: for each group's mean (often using the Normal-Inverse-Gamma or Normal-Normal models you've learned); Lesson 1570 — Comparing Two Means: Bayesian Approach
Set random seeds: Make randomness predictable; Lesson 30 — The Reproducibility Crisis and Solutions
Set thresholds in advance: document your acceptance criteria before seeing data; Lesson 1492 — Rerandomization and Practical Implementation
Set time windows carefully: Decide if a 30-day journey with loops still counts as a single funnel attempt; Lesson 1683 — Multi-Path and Non-Linear Funnels
Set up hypotheses: H₀: The probability of switching in either direction is equal; Lesson 436 — Conducting McNemar's Test
Set your objective: maximize total conversions, revenue, or profit; Lesson 1742 — Budget Optimization Using MMM
Setup (context): What problem motivated this analysis?; Lesson 1933 — The Power of Narrative in Data Communication
Setup cells: Import libraries and load data (code + output); Lesson 1982 — Literate Programming with Notebooks
Setup overhead: Lesson 1785 — Cost-Benefit Analysis: Spark Overhead vs Performance Gains
shape: of continuous data: where values cluster, how spread out they are, and whether the distribution is symmetric or lopsided.; Lesson 1175 — Histograms for Distribution Shape Lesson 1183 — Scatter Plots for Two Numeric Variables Lesson 1208 — Distribution Checks for All Variables Lesson 1220 — Histograms for Continuous Distributions Lesson 1238 — Matching Encoding to Data Type Lesson 1341 — Data and Aesthetic Mappings
Shape + Color: In scatter plots, use different point shapes (circles, triangles, squares) in addition to different colors for categories; Lesson 1251 — Avoiding Reliance on Color Alone
Shape information: (is it symmetric?; Lesson 300 — Bootstrap Distribution of a Statistic
Shape parameter: α_new = α_prior + Σx (add all observed counts); Lesson 1552 — Gamma-Poisson Conjugacy
Shape parameter (k): Controls how the failure rate changes over time; Lesson 187 — The Weibull Distribution: Shape, Scale, and Survival Lesson 189 — Fitting Weibull Models to Lifetime Data
Shape parameter (α, "alpha"): Controls the shape of the curve.; Lesson 181 — Gamma Distribution: Shape and Rate Parameters
Shapefiles: (`.; Lesson 1311 — Working with Shapefiles and GeoJSON
Shapiro-Wilk: .; Lesson 290 — Assumptions and Diagnostics for Difference Intervals Lesson 1208 — Distribution Checks for All Variables
Shapiro-Wilk test: Tests the null hypothesis that residuals are normally distributed; Lesson 449 — Normality of Residuals Lesson 570 — Q-Q Plots vs Formal Normality Tests: When Visual Checks Matter
Share of Voice: tracks your brand's mentions versus competitors—critical for measuring platform or brand dominance within a category.; Lesson 1631 — Social Media Metrics: DAU/MAU and Content Engagement
Share your environment: Specify exact software versions; Lesson 30 — The Reproducibility Crisis and Solutions
Shared axes: Zooming on the x-axis affects all subplots; Lesson 1304 — Subplots and Linked Interactions
Shared credit models: treat the outcome as jointly owned, rewarding collaboration.; Lesson 1640 — Attribution in Multi-Team Environments
Sharp: A scholarship given to *all* students scoring ≥70 on an entrance exam.; Lesson 1461 — Sharp vs Fuzzy RDD
sharp cutoff: .; Lesson 730 — Interpreting PACF Plots Lesson 732 — PACF Patterns for Common Models
Shift+Tab: (to move backward), **Enter** or **Space** (to activate), and **arrow keys** (for fine control).; Lesson 1253 — Interactive Accessibility: Keyboard Navigation
Ship something: rather than nothing; Lesson 2121 — Timeboxing and Deadlines
Shopping patterns: → protected class membership; Lesson 1889 — Proxy Variables and Redlining
Short queries: When the subquery is just a few lines; Lesson 974 — When to Use FROM Subqueries vs CTEs
Short-term initiatives: (this quarter); Lesson 1970 — Recommendations and Next Steps
Shortened Session Duration: Sessions getting briefer over time suggest decreasing value extraction—users aren't finding what they need or are losing interest.; Lesson 1700 — Leading Indicators of Disengagement
Should: we keep working on this?; Lesson 2118 — Cost-Benefit Analysis for Continued Work
Show sensitivity analyses: that reveal how fragile findings are; Lesson 1929 — Avoiding Cherry-Picking Results
Show, don't tell: Give viewers the chart without explaining it.; Lesson 1964 — Testing Visualizations with Audiences
Showing temporal change: Population growth, stock prices, disease spread; Lesson 1306 — Animation and Time-Based Transitions
Showing uncertainty: through confidence intervals; Lesson 1288 — Point Plots for Trend Visualization
Shrinkage: means your posterior estimate gets "pulled" away from extreme sample values toward your prior belief.; Lesson 1569 — Shrinkage and Regularization Effects
sign test: offers a simple, robust alternative.; Lesson 391 — The Sign Test for Medians Lesson 392 — Wilcoxon Signed-Rank Test
Signal strength: How informative is each observation?; Lesson 1549 — Prior-Likelihood Trade-offs
Significance bounds: (also called confidence intervals) help you answer this question.; Lesson 723 — Significance Bounds in ACF Plots
Significance indicators: often asterisks or yes/no flags; Lesson 462 — Interpreting and Reporting Post-Hoc Results
significance level: , denoted by the Greek letter **α** (alpha), is a predetermined probability threshold you set *before* conducting a hypothesis test.; Lesson 323 — What is a Significance Level (α)? Lesson 388 — Effect Size in Sample Size Planning
Significance level (α): , typically 0.; Lesson 296 — Sample Size for Comparing Two Groups Lesson 328 — The Relationship Between α and Confidence Level Lesson 335 — Calculating Type II Error Probability (Beta) Lesson 343 — Calculating Power for Common Tests Lesson 405 — Sample Size and Power for Proportion Tests Lesson 446 — Power and Sample Size for ANOVA Lesson 1496 — The Four Parameters of Sample Size Calculation
Signs of productive iteration: Lesson 2112 — Iteration vs Rework: Learning from Each Cycle
Signs of wasteful rework: Lesson 2112 — Iteration vs Rework: Learning from Each Cycle
Signup date: When a user creates an account (classic acquisition cohort); Lesson 1646 — Defining Cohort Start Events
Silent failures: If Task A fails but Task B runs anyway (because it doesn't know to wait), you'll process incomplete or corrupted data without realizing it.; Lesson 1840 — What is Dependency Management in Pipelines?
silhouette scores: to quantify segment quality at each cut point.; Lesson 1706 — Hierarchical Clustering for Segmentation Lesson 1708 — Choosing the Number of Segments
Similarity: Objects sharing visual properties (color, shape, size) are seen as belonging together.; Lesson 1236 — Gestalt Principles in Visualization
Simple area chart: When you want to emphasize cumulative growth or magnitude over time; Lesson 1227 — Area Charts and Stacked Area Charts
Simple CASE: works like a switch statement in programming.; Lesson 1031 — Simple CASE vs Searched CASE
Simple composition: If you run 10 queries each with ε=0.; Lesson 1900 — Privacy Budget and Composition
Simple Exponential Smoothing: for level-only data and **Double Exponential Smoothing (Holt's Method)** for data with trend.; Lesson 765 — Introduction to Holt-Winters Method
simple linear regression: (one predictor).; Lesson 534 — R-Squared vs Correlation Squared Lesson 595 — From Simple to Multiple Linear Regression Lesson 622 — Relationship Between F-Test and t-Tests
Simple Moving Average (SMA): smooths out short-term fluctuations in your time series by averaging the most recent *n* data points.; Lesson 751 — Simple Moving Average (SMA)
Simple random sampling: keeps it straightforward.; Lesson 243 — Choosing the Right Sampling Method
Simple ratio check: Calculate the ratio of residual deviance to degrees of freedom from your fitted Poisson model.; Lesson 693 — Overdispersion in Count Data
Simple tables: → CSV; Lesson 22 — File Formats: CSV, JSON, and Beyond
Simple, one-time transformations: When you need a quick intermediate step and won't reference it again; Lesson 974 — When to Use FROM Subqueries vs CTEs
Simplify and Focus: Lesson 1217 — The Transition from Explore to Explain
Simplify communication: "Our average customer is 34 years old" is clearer than showing a spreadsheet of 10,000 ages; Lesson 38 — What is Central Tendency?
Simpson's Paradox: Lesson 430 — Common Applications and Pitfalls Lesson 1194 — Simpson's Paradox and Confounding Lesson 1893 — Intersectionality in Fairness
Simulate color blindness: on your chart—can distinctions still be seen?; Lesson 1254 — Testing Visualizations for Accessibility
Simulation tools: let you preview how your visualizations appear under different accessibility conditions:; Lesson 1254 — Testing Visualizations for Accessibility
Simulation visualization: Animate particle movements or algorithm steps; Lesson 1327 — Creating Animations with FuncAnimation
Simultaneity: X and Y determine each other simultaneously; Lesson 553 — Exogeneity: X Must Be Independent of Errors
Simultaneous decomposition: Use methods like STL (Seasonal-Trend decomposition using Loess) with multiple seasonal periods specified; Lesson 1408 — Handling Multiple Seasonal Periods
Single auto-incrementing integer: `customer_id` (1, 2, 3, .; Lesson 1048 — What Are Primary Keys?
Single column: "What's the average salary per department?"; Lesson 905 — Grouping by Multiple Columns: Basics
Single outlier: Designed to detect one outlier at a time; Lesson 1389 — What is Grubbs' Test?
Single peak: One clear mode, not multiple humps; Lesson 377 — Testing Normality: Visual Methods
Single samples vary: Your one sample mean might be 170 cm, but someone else's might be 168 cm.; Lesson 251 — What is a Sampling Distribution?
Single source of truth: One person ensures consistent calculation and definition; Lesson 1619 — What is Metric Ownership?
Single trial: One Bernoulli trial = one observation; Lesson 123 — Bernoulli Trial Definition and Properties
Sinks: Lesson 1823 — Pipeline Components: Sources, Transformations, Sinks
size: of differences while remaining non-parametric (no normality assumption required).; Lesson 392 — Wilcoxon Signed-Rank Test Lesson 1229 — Bubble Charts for Three Variables Lesson 1235 — Pre-Attentive Attributes Lesson 1238 — Matching Encoding to Data Type Lesson 1310 — Point Maps and Scatter Plots on Maps Lesson 1341 — Data and Aesthetic Mappings
Size of the gap: between groups over time; Lesson 817 — Comparing Multiple Survival Curves
Size perception: Humans judge area imperfectly, so don't encode critical comparisons in bubble size alone; Lesson 1229 — Bubble Charts for Three Variables
Size variation: can represent a third numeric variable—larger bubbles for higher values create a "bubble chart" effect.; Lesson 1265 — Scatter Plots: Relationships Between Variables
Skeptical stakeholders: Meet them where they are.; Lesson 1953 — Adjusting Statistical Depth by Audience
skewed: bootstrap distributions or have systematic bias.; Lesson 304 — BCa Bootstrap Intervals: Bias Correction Lesson 503 — Confidence Intervals for Correlation Coefficients Lesson 568 — Skewness in Q-Q Plots: Left and Right Deviations
Skewed distributions: (lopsided): The mean gets "pulled" toward extreme values.; Lesson 42 — Comparing Mean, Median, and Mode Lesson 221 — CLT for Different Population Distributions
Skewness: Does one tail stretch longer?; Lesson 63 — Understanding Distribution Shape Lesson 208 — Jarque-Bera Test
Skewness direction: (long tail left or right); Lesson 1286 — Violin Plots and Distribution Shape
Skip tasks conditionally: using trigger rules; Lesson 1836 — Task Dependencies and Flow Control
Skip the jargon: No one outside your team needs to hear "coefficient" or "residuals"; Lesson 530 — Communicating Results to Non-Technical Audiences
SLA misses: Dashboard alert for stakeholders; Lesson 1851 — Error Logging and Notifications
Slack: Messages sent by teams (value = collaboration enabled); Lesson 1604 — What is a North Star Metric? Lesson 1606 — Examples of North Star Metrics by Industry
Sleep Quality: → **Alertness** (poor sleep reduces alertness); Lesson 1469 — Building a Simple Causal DAG
Sliders: let users select numeric values within a range—perfect for filtering years, adjusting thresholds, or setting parameters:; Lesson 1332 — Streamlit Widgets: Inputs and Controls
Slope (β₁): +0.; Lesson 529 — Practical vs Statistical Significance
Slow down dramatically: processing scales with the product, not the sum; Lesson 943 — CROSS JOIN Results: Size and Structure
Slow onboarding: It's unclear which files or features represent the "real" solution; Lesson 2135 — Dead Experimental Code and Feature Sprawl
Slow sorting: operations (especially with `ORDER BY`); Lesson 911 — Performance Considerations with Multiple Groups
Slow-moving funnel: Users take days between steps (friction, confusion, or decision paralysis); Lesson 1681 — Time-Based Funnel Analysis
Slower payback: = need more capital or slower scaling; Lesson 1757 — Payback Period: Definition and Importance
Slowly decaying ACF: Bars decrease gradually → suggests a trend or non-stationarity; Lesson 722 — ACF Plots and Interpretation
small: (< 100 rows typically); Lesson 943 — CROSS JOIN Results: Size and Structure Lesson 1356 — What Are Facets and Small Multiples? Lesson 2034 — Committing Data Artifacts and Model Outputs
Small drop: Your predictors may not be useful—the intercept-only model was nearly as good.; Lesson 698 — Null and Residual Deviance
Small effect: d ≈ 0.; Lesson 385 — Cohen's d for Standardized Mean Differences Lesson 386 — Effect Size Interpretation Guidelines Lesson 429 — Effect Size: Cramér's V and Phi
Small Expected Frequencies: Lesson 430 — Common Applications and Pitfalls
Small multiples: Show different "slices" of your data in separate 2D panels; Lesson 1329 — Effective Use and Pitfalls of 3D Visualizations
small p-value: (typically < 0.; Lesson 380 — Testing Equal Variances: Levene's and Bartlett's Tests Lesson 606 — Statistical Significance of Individual Coefficients Lesson 717 — KPSS Test
Small p-value (e.g., 0.01): Your observed data would be very rare if H₀ were true.; Lesson 318 — What is a P-Value?
Small sample sizes: (n < 30): Your confidence intervals and p-values rely heavily on the normality assumption; Lesson 550 — Normality of Residuals
small samples: (n < 50): Tests may fail to detect real non-normality (low power—you might miss problems).; Lesson 209 — Sample Size Considerations in Normality Tests Lesson 265 — Using Standard Error in Practice Lesson 398 — Choosing Between Parametric and Non-Parametric Tests Lesson 554 — Consequences of Violating Assumptions Lesson 1379 — Assumptions and Limitations
Small tables: The table has few columns and you genuinely need all of them; Lesson 851 — Selecting All Columns with Asterisk
Small λ: (e.; Lesson 1403 — CUSUM and EWMA Charts
Smaller storage footprint: Less duplication means less disk space; Lesson 1810 — Snowflake Schema and Normalization Trade-offs
Smaller ε: = stronger privacy (more noise); Lesson 1898 — Differential Privacy Fundamentals
Smart home devices: recording conversations used for product development (and sometimes reviewed by humans); Lesson 1922 — Surveillance and Secondary Data Uses
Smooth lines: use `stat_smooth()` to fit regression or loess curves; Lesson 1343 — Statistical Transformations
Smooth seasonal patterns: Short-term irregularities fade away; Lesson 750 — What is a Moving Average?
Smooth trends: `stat_smooth()` fits regression lines or curves; Lesson 1352 — Statistical Transformations with stat_* Layers
Snapshots: rather than patches (you can't "diff" binary files meaningfully); Lesson 1871 — Why Version Control for Data? Lesson 2044 — Recreating Environments from Specifications
Snowflake: Pure separation; pause compute clusters without affecting data; Lesson 1813 — Modern Cloud Data Warehouses: Snowflake, BigQuery, Redshift
Social: Unpaid clicks from social media platforms (Facebook, Twitter, LinkedIn, Instagram); Lesson 1712 — Common Channel Categories
Social media: 50-60% (daily habit); Lesson 1694 — Daily Active Users (DAU) and Monthly Active Users (MAU) Lesson 1711 — What Are Acquisition Channels?
Social media followers: (without reach or influence); Lesson 1612 — What Are Vanity Metrics?
Social media likes: without measuring conversion or brand lift; Lesson 1616 — Metrics Divorced from Revenue
Social sciences: R² = 0.; Lesson 533 — Interpreting R-Squared Values
Software-defined assets: approach treats data assets as first-class citizens.; Lesson 1839 — Alternative Orchestration Tools
Solution: Filter out NULLs in the subquery:; Lesson 962 — NOT IN with Subqueries Lesson 1068 — Higher Normal Forms: 4NF and 5NF Lesson 1765 — Big Data vs Big Compute
Some aggregations: that require full dataset knowledge are unavailable or slow.; Lesson 1796 — Limitations and Differences from Pandas
Sorted retrieval: (`ORDER BY`) comes nearly free since data is already ordered; Lesson 1079 — B-Tree Indexes: Structure and Mechanics
Source: Which table, file, or API it came from; Lesson 1163 — Metadata and Data Dictionaries Lesson 1823 — Pipeline Components: Sources, Transformations, Sinks
Source connectors: Extract data from databases, APIs, cloud storage, or streaming services; Lesson 1822 — What is a Data Pipeline?
Source information: Original data location, collection date, version; Lesson 2065 — Tracking Data Lineage
Source URL or Location: The exact web address, API endpoint, database connection string, or file path where you obtained the data.; Lesson 2063 — Essential Metadata to Capture
Source/Derivation: Where the data came from or how it was calculated; Lesson 2064 — Creating Data Dictionaries
Sources: Lesson 1823 — Pipeline Components: Sources, Transformations, Sinks
Space: (to activate), and **arrow keys** (for fine control).; Lesson 1253 — Interactive Accessibility: Keyboard Navigation
Spark Core: is the foundation of the entire framework.; Lesson 1775 — Spark Components: Core, SQL, MLlib, Streaming
Spark SQL: brings structured data processing to Spark.; Lesson 1775 — Spark Components: Core, SQL, MLlib, Streaming Lesson 1778 — DataFrames and Spark SQL Basics
Spark Streaming: enables real-time data processing through micro-batching:; Lesson 1775 — Spark Components: Core, SQL, MLlib, Streaming
Spatial Correlation: Geographic data points near each other (neighboring counties, adjacent plots of land) tend to be more similar than distant ones.; Lesson 381 — Independence Assumption and Its Violations
Spatial data: Neighboring geographic areas influence each other; Lesson 548 — Independence of Observations
Spatial grouping: (separate positions, but not ordered); Lesson 1238 — Matching Encoding to Data Type
Spatial heatmaps: and **density maps** solve this by showing *where* activity is most concentrated, creating smooth gradients that reveal patterns invisible in raw point data.; Lesson 1312 — Heatmaps and Density Maps for Spatial Data
Spearman: .; Lesson 487 — When to Use Spearman vs Pearson Lesson 1184 — Correlation Coefficients in Bivariate Analysis
Spearman correlation: works with ranked data instead of raw values.; Lesson 1184 — Correlation Coefficients in Bivariate Analysis
Spearman's Rho: correlates the *ranks* of your data, essentially asking "how well does a linear relationship fit the ranked data?"; Lesson 490 — Kendall's Tau vs Spearman's Rho
Special populations: include:; Lesson 1918 — Special Populations and Vulnerable Groups
specific: .; Lesson 228 — Defining Populations and Parameters Lesson 1166 — Defining the Business Question Lesson 1912 — What is Informed Consent in Data Science? Lesson 2094 — Defining Success Metrics Upfront
Specific and Actionable: Avoid vague advice like "improve customer retention."; Lesson 1970 — Recommendations and Next Steps
Specific and quantifiable: Uses numbers, percentages, or binary outcomes; Lesson 1610 — Defining Measurable Key Results
Specification Limits: are the "voice of the customer."; Lesson 1400 — Control Limits vs Specification Limits
Specificity: "Increase sales" becomes "Predict which existing customers are likely to purchase Product X in the next 30 days"; Lesson 10 — Problem Definition and Scoping Lesson 109 — Medical Diagnostic Testing Lesson 498 — Bradford Hill Criteria for Causation Lesson 1200 — Formulating Specific, Testable Hypotheses
Speed: Parquet/Feather > CSV > JSON > Excel; Lesson 1133 — Performance Considerations Across Formats Lesson 2123 — Simple Rules Beat Complex Models
Speed and Scale: A biased recommendation algorithm can expose millions to harmful content in hours, far beyond what human curation could achieve.; Lesson 1923 — Algorithmic Amplification of Harm
Speed and simplicity: No transformation bottleneck during load—get data in fast, ask questions later.; Lesson 1816 — What is ELT? Extract, Load, Transform Explained
Speed matters: You need rapid inference or real-time updates; Lesson 1556 — Choosing Between Conjugate and Non-Conjugate Priors Lesson 1595 — Stan: High-Performance Bayesian Inference
Speed up development: by working with small, fast result sets; Lesson 877 — LIMIT: Restricting the Number of Rows Returned
Spikes at regular intervals: (e.; Lesson 722 — ACF Plots and Interpretation
Spillovers: happen when the treatment affects the control group indirectly.; Lesson 1458 — Common DiD Pitfalls
Splines: and **piecewise methods** offer an alternative approach with some key advantages.; Lesson 662 — Polynomial Features vs Splines
Split: Partition your data into independent chunks (often by rows); Lesson 1768 — Data Parallelism Fundamentals
Split each party's data: into encrypted "shares" distributed among participants; Lesson 1903 — Secure Multi-Party Computation
Split your data: Reserve the last portion (e.; Lesson 790 — Out-of-Sample Forecast Evaluation
Split your dataset: into strata based on confounder values (e.; Lesson 1430 — Controlling for Confounders: Stratification
Spot critical drop-off points: Is Week 1 your danger zone?; Lesson 1656 — Visualizing Retention Curves
Spot early warning signs: when new cohorts show unusual churn patterns; Lesson 1672 — Cohort-Based Churn Analysis
Spot real trends: See if unemployment is genuinely rising or just following seasonal patterns; Lesson 748 — Seasonally Adjusted Data
Spot trends over time: Are newer cohorts retaining better than older ones?; Lesson 1659 — Comparing Retention Across Cohorts
Spot underutilized gems: Low adoption but high frequency among adopters suggests poor discoverability; Lesson 1696 — Feature Adoption and Usage Frequency
Spotify: Time spent listening (value = entertainment delivered); Lesson 1604 — What is a North Star Metric? Lesson 1606 — Examples of North Star Metrics by Industry
Spotify's lightweight framework: that emphasizes simplicity and file-based targets.; Lesson 1839 — Alternative Orchestration Tools
Spread: Lesson 1172 — What is Univariate Analysis? Lesson 1176 — Box Plots for Spread and Outliers Lesson 1208 — Distribution Checks for All Variables Lesson 1220 — Histograms for Continuous Distributions
Spreads: Which group shows more variability (wider IQR)?; Lesson 1186 — Box Plots and Violin Plots by Group
Spring: Unknown structure, want to discover communities; Lesson 1318 — Network Layout Algorithms
Sprint goal: "Deliver initial churn prediction baseline with three features"; Lesson 2113 — Timeboxing and Sprint Planning for Data Projects
Sprint-level: "This week's goal is baseline model only"; Lesson 2121 — Timeboxing and Deadlines
spurious correlation: occurs when two variables appear statistically related but have no genuine cause-and-effect relationship.; Lesson 494 — Spurious Correlations and Coincidence Lesson 1422 — Spurious Correlations
Spurious relationships: We might detect patterns or correlations that don't actually exist, leading to false confidence in our forecasts.; Lesson 713 — Why Stationarity Matters Lesson 734 — Why Differencing and Detrending Matter
SQL and Stats Tests: often come first as screeners.; Lesson 2142 — Interviewing: Technical and Behavioral Prep
SQL Server: Often case-insensitive, but depends on collation settings; Lesson 862 — Case Sensitivity in Text Filtering Lesson 940 — Database Support and Alternatives
SQLAlchemy Core: provides a *SQL Expression Language*—a Pythonic way to write SQL queries using functions and methods instead of raw strings.; Lesson 1118 — SQLAlchemy Core vs ORM
SQLAlchemy ORM: provides a higher-level abstraction where you work with *Python classes and objects* instead of tables and rows.; Lesson 1118 — SQLAlchemy Core vs ORM
SQLite: is a lightweight DBMS that stores your entire database in a single file.; Lesson 845 — Database Management Systems (DBMS) Lesson 940 — Database Support and Alternatives Lesson 1041 — Formatting and Parsing Dates
Square root: (`sqrt(Y)`) is gentler than log and works well for count data.; Lesson 591 — When and Why to Transform Variables
Square Root Transformation: (`sqrt(x)`) works particularly well for:; Lesson 213 — Square Root and Cube Root Transformations
SS (Sum of Squares): How much total variation comes from each source; Lesson 444 — The ANOVA Table
Stability: Less erratic behavior at data boundaries; Lesson 662 — Polynomial Features vs Splines Lesson 1734 — Comparing and Validating Attribution Models
Stability over time: The relationship shouldn't suddenly shift; Lesson 1518 — The Relationship Between Surrogate and Business Metrics
Stabilize coefficient estimates: Less wobbling between models; Lesson 585 — Remedies: Variable Selection
Stabilizing variance: Making the spread of data more consistent across different ranges; Lesson 212 — Log Transformations
Stable patterns: All cohorts behave similarly.; Lesson 1650 — Comparing Cohorts Over Time
stack: the bars on top of each other, or **group** them side-by-side.; Lesson 1226 — Stacked and Grouped Bar Charts Lesson 1353 — Position Adjustments: Dodge, Stack, and Jitter
Stack traces: The full path of execution leading to the failure; Lesson 1851 — Error Logging and Notifications
Stacked bar charts: pile segments on top of each other to show both part-to-whole relationships and totals.; Lesson 1188 — Stacked and Grouped Bar Charts
Stacked bars: work best for showing composition and totals simultaneously.; Lesson 1188 — Stacked and Grouped Bar Charts Lesson 1226 — Stacked and Grouped Bar Charts Lesson 1266 — Bar Plots: Categorical Comparisons
Staff Data Scientist: Technical leadership across multiple projects, set standards, solve org-wide problems; Lesson 2140 — Individual Contributor vs Management Tracks
Stage: new users vs.; Lesson 1701 — What is Customer Segmentation? Lesson 1874 — DVC Pipelines and Stages
Stage 1: Use cluster sampling to randomly select a few states; Lesson 238 — Multistage Sampling
Stage 2: Use stratified sampling to select universities within those states (ensuring you get different types: public, private, large, small); Lesson 238 — Multistage Sampling
Stage 3: Use simple random sampling to select individual students from each chosen university; Lesson 238 — Multistage Sampling
Stage a single file: Lesson 1994 — Staging Changes with git add
Stage multiple files: Lesson 1994 — Staging Changes with git add
Stage the resolved files: with `git add <filename>` (or `git add .; Lesson 2011 — Resolving Merge Conflicts Lesson 2018 — Resolving Conflicts During Rebase
Staged files: Files you've added to the staging area with `git add`, ready for the next commit; Lesson 1998 — Checking Repository Status
Staging Area: (Index): The box where you arrange items you've decided to ship; Lesson 1993 — The Three States: Working Directory, Staging, Repository
Stakeholder Alignment: Everyone agrees on what "success" looks like before you start; Lesson 10 — Problem Definition and Scoping Lesson 1973 — Report Review and Quality Checklist
Stakeholder communication: Inform affected communities before public release when possible; Lesson 1925 — Mitigation Strategies and Responsible Disclosure
Stakeholder confidence: They wonder why you're still working instead of moving forward; Lesson 2120 — The Opportunity Cost of Iteration
Stakeholder indifference: Additional precision doesn't change the business decision; Lesson 2116 — Diminishing Returns and the 80/20 Rule
Stakeholder learning: Non-technical partners often don't fully understand what they need until they see something concrete.; Lesson 2109 — Why Data Science is Inherently Iterative
Stakeholder management: Translating technical work into business impact; Lesson 2142 — Interviewing: Technical and Behavioral Prep
Stakeholder-driven iteration: Business users see preliminary results and refine requirements.; Lesson 2092 — Iteration and Feedback Loops in Practice
Stakeholders need self-service analytics: (executives checking KPIs, analysts exploring trends); Lesson 1330 — Introduction to Interactive Dashboards
Stakes are high: Major feature launches, pricing changes, or algorithm overhauls; Lesson 1522 — Balancing Speed and Accuracy in Metric Selection Lesson 1556 — Choosing Between Conjugate and Non-Conjugate Priors
Stakes are low: Minor UI tweaks, button colors, or copy changes; Lesson 1522 — Balancing Speed and Accuracy in Metric Selection
Stale tracking: (data pipeline breaks, no one notices for weeks); Lesson 1619 — What is Metric Ownership?
Stamen Terrain: Emphasizes topography and natural features; Lesson 1314 — Basemaps and Map Tiles
Standard: 2-4 weeks to account for novelty bias; Lesson 1484 — Duration and Timing Considerations
Standard Attribution Logic: Lesson 1643 — Building Attribution Frameworks
Standard deviation: solves this by taking the square root of the variance, returning the measure to the original units:; Lesson 49 — Standard Deviation: Interpretable Spread Lesson 52 — Mean Absolute Deviation (MAD) Lesson 54 — When to Use Each Measure Lesson 122 — Variance and Standard Deviation of Discrete Random Variables Lesson 136 — Expectation and Variance of the Negative Binomial Lesson 141 — Mean and Variance of Poisson Distribution Lesson 148 — Variance and Standard Deviation of Discrete Distributions Lesson 166 — Exponential Distribution: Mean and Variance (+5 more)
Standard Deviation (SD): measures how spread out the *individual values* in your dataset are from the mean.; Lesson 261 — Standard Error vs Standard Deviation
Standard deviation = 1: One unit on the horizontal axis equals one standard deviation; Lesson 194 — The Standard Normal Distribution
standard error: ) is:; Lesson 223 — Standard Error and the CLT Lesson 224 — CLT for Proportions Lesson 256 — Variability of Sample Statistics Lesson 260 — Defining Standard Error Lesson 271 — Margin of Error Lesson 276 — Sampling Distribution of a Proportion Lesson 277 — Standard Error for Proportions Lesson 300 — Bootstrap Distribution of a Statistic (+3 more)
Standard Error (SE): measures how spread out the *sample means* would be if you took many samples from the same population.; Lesson 261 — Standard Error vs Standard Deviation
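The difference between the two measures is easiest to see side by side. The sketch below uses made-up `values` and the usual formula for the standard error of the mean, SE = SD / √n.

```python
import math
import statistics

values = [4, 8, 6, 5, 3, 7, 9, 6]  # hypothetical sample

# Standard deviation: square root of the sample variance (spread of individual values)
sd = math.sqrt(statistics.variance(values))

# Standard error of the mean: expected spread of sample means across repeated samples
se = sd / math.sqrt(len(values))

print(f"SD = {sd:.3f}, SE = {se:.3f}")
```

The SE is always smaller than the SD for n > 1, and it shrinks as the sample grows while the SD does not.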
Standard error (unpooled): Lesson 412 — Confidence Interval for Difference
standard normal distribution: is a special case of the normal distribution with a **mean (μ) of 0** and a **standard deviation (σ) of 1**.; Lesson 194 — The Standard Normal Distribution Lesson 403 — Finding P-Values for Proportion Tests
Standard normal tables: (Z-tables) after converting to Z-scores; Lesson 173 — Calculating Probabilities with the Normal Distribution
Standardization for Comparison: Comparing SAT scores (mean 1050, SD 200) to ACT scores (mean 21, SD 5) directly is meaningless.; Lesson 201 — Z-Score Applications and Limitations
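A short sketch of the comparison, using the means and standard deviations quoted above. The two student scores (1250 and 28) are invented for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - mean) / sd


z_sat = z_score(1250, mean=1050, sd=200)  # hypothetical SAT score
z_act = z_score(28, mean=21, sd=5)        # hypothetical ACT score

# On the common z scale the ACT result is the stronger one
better = "ACT" if z_act > z_sat else "SAT"
```

Here the SAT score sits 1.0 standard deviations above its mean and the ACT score 1.4 above its own, so the ACT score is relatively higher.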
Standardize the Approach: Lesson 2046 — Best Practices for Environment Management in Teams
Standardized: divided by the standard error of that cell; Lesson 428 — Post-Hoc Analysis and Residuals Lesson 588 — Standardized and Studentized Residuals
Standardized coefficients: (also called **beta weights** or **β weights**) put all predictors on the same scale by expressing them in standard deviation units.; Lesson 608 — Standardized Coefficients (Beta Weights)
Standardized residuals: divide each residual by an estimate of its standard deviation:; Lesson 563 — Standardized and Studentized Residuals Lesson 588 — Standardized and Studentized Residuals
Standardizing capitalization: ensures "Apple", "APPLE", and "apple" are recognized as the same.; Lesson 1138 — Cleaning and Standardizing Text Fields
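One simple way to do this in Python is to normalize case before comparing. This is a minimal sketch; `str.casefold()` is used here as a slightly more aggressive alternative to `str.lower()`, and the whitespace stripping is an extra assumption about the data.

```python
raw_names = ["Apple", "APPLE", "apple", " apple "]

# Strip surrounding whitespace and normalize case so the variants collapse together
normalized = {name.strip().casefold() for name in raw_names}
```

After normalization the set holds a single value, `"apple"`.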
star schema: is a common data warehouse design where one central **fact table** (containing measurements like sales amounts, quantities, or counts) connects to multiple **dimension tables** (containing descriptive attributes like customer names, product detail...; Lesson 956 — Star Schema Joins Lesson 1808 — Star Schema and Fact Tables
start: a transaction block and how to **commit** it to save your work.; Lesson 1112 — Starting and Committing Transactions Lesson 1582 — Updating Beliefs with Test Data
Start small and targeted: Don't attempt to rewrite everything at once.; Lesson 2137 — Refactoring Strategies and Debt Paydown
Start visual: Create a histogram and Q-Q plot.; Lesson 210 — Combining Visual and Statistical Methods
Start with d=0: Check if your original series is already stationary using visual inspection and the Augmented Dickey-Fuller or KPSS tests you learned earlier.; Lesson 778 — Determining Differencing Order (d)
Start with domain knowledge: Which predictors make theoretical sense?; Lesson 633 — Practical Model Selection Strategy
Start with initial guesses: for all parameters (θ₁, θ₂, .; Lesson 1591 — Gibbs Sampling for Multivariate Posteriors
Start with the answer: Lead with your key finding or recommendation (remember the Pyramid Principle from lesson 1952).; Lesson 1965 — Progressive Disclosure Techniques
Start with your prior: `P(θ)` — your belief about parameter θ before seeing data; Lesson 1545 — Calculating the Posterior Distribution
Starts at 0: F(-∞) = 0 (no probability accumulated yet); Lesson 157 — Cumulative Distribution Functions (CDFs) for Continuous Variables
State conclusions in context: , not just statistical jargon; Lesson 368 — Common Pitfalls and Best Practices
State your hypotheses: For example, H₀: median = 50 vs H₁: median ≠ 50; Lesson 391 — The Sign Test for Medians Lesson 396 — Bootstrap Hypothesis Testing Lesson 447 — Conducting One-Way ANOVA in Practice
Static validation: Parse your DAG definition without executing it.; Lesson 1846 — Testing and Validating Dependency Graphs
Stationarity: means that a time series has **constant statistical properties over time**.; Lesson 712 — What is Stationarity? Lesson 740 — Choosing Between Differencing and Detrending Lesson 1169 — Clarifying Assumptions and Constraints
stationary: Lesson 716 — Augmented Dickey-Fuller Test Lesson 725 — Decay Rates in ACF Lesson 734 — Why Differencing and Detrending Matter
Statistical confirmation: Run stationarity tests after differencing.; Lesson 778 — Determining Differencing Order (d)
Statistical exploration is central: R's grammar of graphics makes iterative statistical visualization seamless.; Lesson 1375 — Choosing Tools: When to Use R vs Python for Visualization
Statistical hypothesis testing: lets you quantify whether observed differences are likely real effects or just sampling noise.; Lesson 1684 — Statistical Significance in Funnel Comparisons
Statistical independence: Sessions from the same user aren't independent—they're correlated.; Lesson 1481 — Unit of Randomization
Statistical methods: you choose (some tests only work for continuous data); Lesson 18 — Numerical Variables: Discrete and Continuous Lesson 1209 — Outlier Detection and Investigation
Statistical power: is the probability that your hypothesis test will correctly reject a false null hypothesis.; Lesson 338 — What is Statistical Power? Lesson 375 — Paired t-Test vs Two-Sample t-Test Lesson 397 — Power and Efficiency of Non-Parametric Tests Lesson 405 — Sample Size and Power for Proportion Tests Lesson 446 — Power and Sample Size for ANOVA Lesson 505 — Sample Size and Power for Correlation Tests Lesson 1493 — Why Sample Size Matters in A/B Tests Lesson 1529 — Running Underpowered Tests
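As a rough illustration, power can be computed in closed form for a two-sided one-sample z-test with known σ. That choice of test, the function name `z_test_power`, and the example numbers are assumptions for this sketch; other tests need their own power formulas.

```python
import math
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test to detect a true mean shift `effect`."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)        # critical value, about 1.96 for alpha = 0.05
    shift = effect * math.sqrt(n) / sigma    # true effect measured in standard errors
    # Probability that the test statistic lands in either rejection region
    return z.cdf(-z_crit + shift) + z.cdf(-z_crit - shift)


power = z_test_power(effect=0.5, sigma=1.0, n=32)
print(f"power = {power:.2f}")
```

With a true effect of half a standard deviation and 32 observations, power is close to the conventional 80% target. With no true effect the function returns α, the false positive rate.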
Statistical power increases: the test becomes better at detecting *any* deviation; Lesson 209 — Sample Size Considerations in Normality Tests Lesson 341 — Effect Size and Power
Statistical power varies: Some comparisons have more precision than others; Lesson 468 — Balanced vs Unbalanced Designs
Statistical significance: (p = 0.; Lesson 389 — Reporting Effect Sizes in Practice Lesson 529 — Practical vs Statistical Significance Lesson 609 — Practical vs Statistical Significance Lesson 1858 — Alerting Strategies
Statistical significance testing: answers the question: "Is this predictor's coefficient reliably different from zero, or could I have gotten this result just from random variation?; Lesson 606 — Statistical Significance of Individual Coefficients
Statistical sophistication: Teams must understand alpha spending functions, confidence sequences, or group boundaries, not just basic t-tests; Lesson 1515 — Trade-offs: Sample Size, Speed, and Complexity
Statistical test: Run a DiD-style regression using only pre-treatment data, with placebo "treatment" dates.; Lesson 1456 — Testing Parallel Trends
Statistical testing: you're testing whether categories differ from the reference, not whether they differ from zero; Lesson 643 — Interpreting Coefficients Relative to Reference
Statistical tests: (Shapiro-Wilk, Kolmogorov-Smirnov, Anderson-Darling, Jarque-Bera) give you *objective numbers* with p-values.; Lesson 210 — Combining Visual and Statistical Methods Lesson 217 — Evaluating Transformation Effectiveness Lesson 734 — Why Differencing and Detrending Matter Lesson 788 — Checking Residual Normality Lesson 1491 — Covariate Balance and Diagnostics
statistical transformations: (or "stats").; Lesson 1343 — Statistical Transformations Lesson 1352 — Statistical Transformations with stat_* Layers
Statistical validation: A test confirming the change isn't random noise; Lesson 1946 — Supporting Your Claims with Evidence
statistically significant: doesn't mean it's **practically meaningful**.; Lesson 609 — Practical vs Statistical Significance Lesson 723 — Significance Bounds in ACF Plots
Statistics: focuses on testing hypotheses and understanding uncertainty with mathematical rigor.; Lesson 1 — Defining Data Science Lesson 7 — The Data Science Skill Stack Lesson 229 — Defining Samples and Statistics
Statistics (stat): Transformations applied to data (means, counts, smoothing); Lesson 1340 — The Seven Layers of Grammar
Status dependencies: Certain field combinations are impossible.; Lesson 1155 — Consistency Checks Across Fields
Stay interpretable: Stakeholders understand exactly what changed and why; Lesson 2128 — Data Distribution Shifts Frequently
Steep drops: Many events happening at specific times; Lesson 815 — Survival Curve Plots and Interpretation
Step 1: Check Independence: Lesson 383 — Diagnostic Workflow: When to Proceed or Switch Tests
Step 1: Decompose: your historical data into trend, seasonal, and remainder components using your chosen method (classical or STL).; Lesson 749 — Using Decomposition for Forecasting
Step 2: Achieve 1NF: Lesson 1069 — Normalization Process Step-by-Step
Step 2: Assess Normality: Lesson 383 — Diagnostic Workflow: When to Proceed or Switch Tests
Step 2: Calculate IQR: Lesson 1385 — Calculating IQR Fences in Practice
Step 4: Flag Outliers: Lesson 1385 — Calculating IQR Fences in Practice
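The IQR and flagging steps can be sketched with the standard library. The data are invented, and the 1.5 × IQR multiplier is the common convention rather than something confirmed by the lesson. Quartile values also depend on the interpolation method; `method="inclusive"` is one reasonable choice.

```python
import statistics

data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]  # hypothetical measurements

# Quartiles (Q1, median, Q3) using the inclusive interpolation method
q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Calculate IQR
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles (conventional multiplier, assumed here)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Flag outliers
outliers = [x for x in data if x < lower_fence or x > upper_fence]
```

For this sample Q1 = 12 and Q3 = 13.75, so the fences are 9.375 and 16.375 and only the value 102 is flagged.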
Step 4: Reach 3NF: Lesson 1069 — Normalization Process Step-by-Step
Step 5: Recombine: using your model type:; Lesson 749 — Using Decomposition for Forecasting
step function: that drops at each event time, creating the characteristic "survival curve" you'll visualize.; Lesson 809 — Introduction to the Kaplan-Meier Estimator Lesson 815 — Survival Curve Plots and Interpretation Lesson 1639 — Time Windows and Attribution Decay
Step-by-step instructions: How to run the analysis from start to finish; Lesson 1989 — Best Practices for Sharing Reproducible Reports
STL: stands for **S**easonal-**T**rend decomposition using **L**oess.; Lesson 745 — STL Decomposition (Seasonal-Trend Loess)
Stop: at the first p-value that fails to reject; all subsequent tests are also not rejected; Lesson 1504 — Holm-Bonferroni Method
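The stopping rule is easier to follow in code. The sketch below implements the standard Holm step-down procedure: sort the p-values, compare the k-th smallest against α / (m − k + 1), and stop at the first failure. The function name and example p-values are made up.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a reject/keep decision for each hypothesis, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Threshold loosens as we move down the list: alpha/m, alpha/(m-1), ...
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # first failure: this and all larger p-values are not rejected
    return reject


decisions = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
```

Here 0.005 and 0.01 pass their thresholds (0.0125 and about 0.0167), 0.03 fails against 0.025, and 0.04 is never tested.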
Stop and accept H₀: (no significant difference); Lesson 1511 — Sequential Probability Ratio Test (SPRT)
Stop and reject H₀: (declare a winner); Lesson 1511 — Sequential Probability Ratio Test (SPRT)
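The two stopping decisions above, plus the third option of continuing to sample, can be sketched for a Bernoulli outcome such as conversion. This uses Wald's approximate boundaries; the function name, the rates `p0` and `p1`, and the error rates are illustrative assumptions.

```python
import math


def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.20):
    """Sequential test of H0: p = p0 against H1: p = p1 on 0/1 observations."""
    upper = math.log((1 - beta) / alpha)   # crossing this boundary rejects H0
    lower = math.log(beta / (1 - alpha))   # crossing this boundary accepts H0
    llr = 0.0                              # running log-likelihood ratio
    for n, x in enumerate(observations, start=1):
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "reject H0", n
        if llr <= lower:
            return "accept H0", n
    return "continue", len(observations)   # not enough evidence either way yet


result_hits = sprt_bernoulli([1] * 10, p0=0.1, p1=0.3)
result_misses = sprt_bernoulli([0] * 20, p0=0.1, p1=0.3)
```

A run of successes crosses the upper boundary after 3 observations, while a run of failures needs 7 observations to cross the lower one.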
Stop early: when evidence is strong (saving time and resources); Lesson 1510 — Sequential Testing Overview
Stop when stationary: Don't difference more than necessary—if your tests confirm stationarity, stop there.; Lesson 778 — Determining Differencing Order (d)
Stopping: Frequentist tests need a fixed sample size or a sequential correction, while Bayesian tests update sequentially and can stop anytime; Lesson 1580 — Bayesian vs Frequentist A/B Testing
Stopping rules: (fixed horizon?; Lesson 1508 — Pre-Registration and Correction Strategy
Storage Limitation: Lesson 1905 — Core Principles of GDPR
Storage space: You're duplicating information that could be derived; Lesson 1073 — Storing Computed Values and Aggregates Lesson 1074 — Duplicating Data Across Tables Lesson 1077 — Measuring Performance Impact of Denormalization
Store data separately: Use cloud storage (S3, Google Cloud), shared drives, or dedicated data warehouses; Lesson 2070 — Separating Data from Code
Straight line: Normality assumption holds; Lesson 565 — What Q-Q Plots Show: Comparing Residual Distribution to Normal
Strain applications: displaying or processing thousands of rows; Lesson 911 — Performance Considerations with Multiple Groups
strata: (homogeneous subgroups) and then sampling proportionally from each stratum.; Lesson 236 — Stratified Sampling Lesson 817 — Comparing Multiple Survival Curves
Strategic boundaries: Choosing cutoffs that produce desired patterns rather than natural ones; Lesson 1245 — Misleading Aggregations and Binning
Strategic Callouts: Lesson 1960 — Annotation and Labeling Best Practices
Strategic goals: Long-term company objectives; Lesson 1516 — Business Metrics: Definition and Examples
Strategic planning: Identify which touchpoints work best at different customer journey stages; Lesson 1718 — Introduction to Marketing Attribution
Strategically aligned: Connect directly to your North Star Metric or broader business priorities; Lesson 1609 — Setting Effective Objectives
Strategy 4: Column-by-column parsing: Lesson 1136 — Handling Mixed Encodings in a Single Dataset
Stratified Cox models: allow you to account for a variable's effect on survival *without* assuming proportional hazards for that variable.; Lesson 832 — Stratified Cox Models
Stratified or adjusted approaches: More sophisticated corrections that balance power and error control; Lesson 824 — Multiple Group Comparisons
Stratified randomization: solves this by first dividing your sample into homogeneous subgroups (strata) based on key covariates, then randomizing *within* each stratum.; Lesson 1489 — Stratified Randomization Fundamentals
Stratified sampling: solves this by dividing your population into **strata** (homogeneous subgroups) and then sampling proportionally from each stratum.; Lesson 236 — Stratified Sampling Lesson 237 — Cluster Sampling Lesson 240 — Quota Sampling Lesson 243 — Choosing the Right Sampling Method Lesson 1885 — Mitigation Strategies: Data Collection
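A minimal sketch of proportional stratified sampling. The strata names and sizes are hypothetical, and `stratified_sample` is an illustrative helper, not a library function.

```python
import random


def stratified_sample(strata, fraction, seed=0):
    """Draw the same fraction from every stratum so the sample mirrors the population."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        k = round(len(members) * fraction)   # proportional allocation
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical population of 100 people split into three strata
strata = {
    "undergraduate": list(range(0, 60)),
    "graduate": list(range(60, 90)),
    "faculty": list(range(90, 100)),
}
sample = stratified_sample(strata, fraction=0.10)
sizes = {name: len(members) for name, members in sample.items()}
```

A 10% sample takes 6, 3, and 1 members from the three strata, preserving the 60/30/10 split.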
Streaming is essential when: Lesson 1824 — Batch vs Streaming Pipelines
Streaming pipelines: work like a phone call—process information instantly as it flows through.; Lesson 1824 — Batch vs Streaming Pipelines
Streamlit: prioritizes **simplicity and speed**.; Lesson 1330 — Introduction to Interactive Dashboards
Streamlit Cloud: is the easiest option for Streamlit apps—simply connect your GitHub repository, and it deploys automatically.; Lesson 1338 — Deployment and Sharing Dashboards
Strength: Are points tightly clustered along a line, or scattered widely?; Lesson 480 — Scatterplots and Visual Assessment Lesson 498 — Bradford Hill Criteria for Causation Lesson 1183 — Scatter Plots for Two Numeric Variables
strong: when it's much more likely to appear if the person is guilty than if innocent.; Lesson 112 — Legal Evidence and Jury Reasoning Lesson 1610 — Defining Measurable Key Results
Strong correlation: Changes in the surrogate should consistently predict changes in the business metric; Lesson 1518 — The Relationship Between Surrogate and Business Metrics
Strong relationships: jump out as values near +1 or -1.; Lesson 511 — Reading and Interpreting Correlation Matrices
Strong validation exists: Your surrogate has proven correlation with business outcomes; Lesson 1522 — Balancing Speed and Accuracy in Metric Selection
Structural zeros: People who would *never* experience the event (e.; Lesson 695 — Zero-Inflated Models
Structure: How many columns?; Lesson 1151 — Schema Validation
Structure your narrative: around these three pillars—each becomes a mini-story within your larger presentation; Lesson 1940 — The Rule of Three in Data Storytelling
Structured data: is information organized into rows and columns, like a spreadsheet or database table.; Lesson 16 — Structured vs Unstructured Data Lesson 20 — Primary Data Sources: Databases and Data Warehouses Lesson 22 — File Formats: CSV, JSON, and Beyond
Structured Query Language: .; Lesson 844 — What is SQL?
Student's t distributions: for heavier tails (more robust to outliers); Lesson 1565 — Prior Distributions for Normal Means
Studentized residuals: go further: they refit the model *without* that specific observation and see how much it differs:; Lesson 563 — Standardized and Studentized Residuals Lesson 588 — Standardized and Studentized Residuals
Students: (StudentID, StudentName); Lesson 1065 — Second Normal Form (2NF)
Style and consistency: Does it follow team conventions?; Lesson 2024 — Code Review Best Practices
Subgroup analyses: you plan to run, if any; Lesson 1508 — Pre-Registration and Correction Strategy
Subgroup analysis: Always disaggregate your fairness metrics across combinations of protected attributes (gender × race, age × disability status, etc.; Lesson 1893 — Intersectionality in Fairness
Subject Matter Expertise: Lesson 1602 — Identifying Leading Indicators for Your Metrics
Subject matter experts: Talk to salespeople, operations staff, customers; Lesson 1201 — Domain Knowledge as a Hypothesis Source
Subjective labeling: When humans label training data (tagging images, rating sentiment, or classifying documents), their personal biases, cultural backgrounds, and varying interpretations create inconsistency.; Lesson 1880 — Measurement and Label Bias
Subscriber Acquisition Cost: Marketing spend divided by new subscribers, but media-specific: track which content drives sign-ups.; Lesson 1635 — Media and Content Metrics: Watch Time and Content Performance
Subscription duration modeling: treats cancellation as the "event" and subscription length as the "time" variable, letting you predict when customers are most likely to churn and what drives retention.; Lesson 838 — Subscription and Membership Duration Modeling
Subscription start: When a user begins a paid plan; Lesson 1646 — Defining Cohort Start Events
SUBSTRING: extracts a specific portion of a string; Lesson 1044 — String Manipulation: CONCAT, LENGTH, and SUBSTRING
Subtract 1: Because once you know the counts for all but one category, the last one is determined (they must sum to your total sample size); Lesson 418 — Degrees of Freedom in Goodness of Fit
Subtract estimated parameters: If you had to estimate any population parameters from your data (like a mean or proportion), you lose additional degrees of freedom; Lesson 418 — Degrees of Freedom in Goodness of Fit
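The two subtractions above combine into one formula. As a sketch, write $k$ for the number of categories and $m$ for the number of parameters estimated from the data (both symbols are introduced here for illustration):

```latex
df = k - 1 - m
```

For example, testing a six-sided die for fairness with no estimated parameters gives $df = 6 - 1 - 0 = 5$. If one parameter had to be estimated from the same data first, the result would be $df = 6 - 1 - 1 = 4$.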
Subtract the mean: (x - μ): This centers your data point.; Lesson 196 — Calculating Z-Scores from Raw Data
Subtracting intervals: Lesson 1040 — Date Arithmetic and INTERVAL Operations
Success: Commits automatically when the block completes; Lesson 1114 — Transaction Context Managers in Python
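With Python's built-in `sqlite3` module, the connection object itself acts as the transaction context manager. The sketch below uses an in-memory database and a made-up `accounts` table to show both outcomes: commit on success and rollback on an exception.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")

# Success: the block finishes normally, so the insert is committed
with conn:
    conn.execute("INSERT INTO accounts VALUES ('alice', 100)")

# Failure: the exception triggers a rollback, so the update is discarded
try:
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - 500 WHERE name = 'alice'")
        raise ValueError("insufficient funds")
except ValueError:
    pass

balance = conn.execute(
    "SELECT balance FROM accounts WHERE name = 'alice'"
).fetchone()[0]
```

The balance is still 100 afterwards. Note that `with conn:` manages the transaction only; it does not close the connection.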
success criteria: the numbers that tell you whether your experimental change actually improved things.; Lesson 1516 — Business Metrics: Definition and Examples Lesson 2093 — Translating Business Questions into Analytical Questions Lesson 2103 — Managing Expectations and Defining Success
Success metrics: to track implementation; Lesson 1970 — Recommendations and Next Steps
Success-Failure Condition: Lesson 400 — Assumptions and Conditions for Proportion Tests Lesson 411 — Sample Size Requirements
Sudden shifts: equipment calibration changes, policy updates, or batch effects; Lesson 562 — Index Plots and Time-Ordered Residuals
Sum: all those products; Lesson 45 — Central Tendency for Grouped Data Lesson 225 — CLT for Sums and Other Statistics Lesson 892 — GROUP BY with Different Aggregate Functions Lesson 894 — NULL Values in GROUP BY
Sum of absolute residuals: Better, but mathematically difficult to work with (no smooth derivative).; Lesson 517 — The Least Squares Criterion
Sum of raw residuals: No—positive and negative errors cancel out.; Lesson 517 — The Least Squares Criterion
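A tiny example shows why raw residuals fail as a criterion. The three data points and the two candidate lines below are invented: one line fits perfectly and the other is a flat line through the mean.

```python
xs = [1, 2, 3]
ys = [1, 2, 3]


def residuals(predict):
    """Observed minus predicted value for each point."""
    return [y - predict(x) for x, y in zip(xs, ys)]


perfect = residuals(lambda x: x)  # y = x fits every point: [0, 0, 0]
flat = residuals(lambda x: 2)     # y = 2 misses two points: [-1, 0, 1]

raw_sums = (sum(perfect), sum(flat))
abs_sums = (sum(abs(r) for r in perfect), sum(abs(r) for r in flat))
squared_sums = (sum(r * r for r in perfect), sum(r * r for r in flat))
```

The raw sums are 0 for both lines, so they cannot tell the good fit from the bad one. The absolute and squared sums (0 versus 2) both can, and the squared version is the one that is smooth to minimize.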
Sum the positive ranks: (W⁺) and **negative ranks** (W⁻); Lesson 392 — Wilcoxon Signed-Rank Test
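A sketch of this step for the signed-rank test: rank the absolute differences, then add up the ranks that belong to positive and to negative differences. Dropping zero differences and averaging tied ranks are the usual conventions, assumed here; the example differences are made up.

```python
def signed_rank_sums(differences):
    """Return (sum of positive ranks, sum of negative ranks)."""
    nonzero = [d for d in differences if d != 0]   # zero differences are dropped
    ordered = sorted(nonzero, key=abs)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        # Tied absolute values share the average of ranks i+1 .. j
        ranks[abs(ordered[i])] = (i + 1 + j) / 2
        i = j
    w_plus = sum(ranks[abs(d)] for d in nonzero if d > 0)
    w_minus = sum(ranks[abs(d)] for d in nonzero if d < 0)
    return w_plus, w_minus


w_plus, w_minus = signed_rank_sums([3, -1, 4, -2, 5])
```

The absolute differences rank 1 to 5, so the positive ranks sum to 3 + 4 + 5 = 12 and the negative ranks to 1 + 2 = 3. As a check, the two sums always total n(n + 1) / 2, here 15.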
Sum the ranks: for each group separately; Lesson 393 — Mann-Whitney U Test (Wilcoxon Rank-Sum)
SUM(): naturally ignores NULLs, so unmatched rows contribute nothing (which is usually what you want); Lesson 933 — Aggregating with LEFT JOINs
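One detail worth seeing in practice: when every row in a group is unmatched, `SUM()` has nothing to add and returns NULL rather than 0, so `COALESCE` is often wrapped around it. The sketch below runs the query through Python's `sqlite3` module against hypothetical `customers` and `orders` tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (10, 1, 10), (11, 1, 20);  -- Grace has no orders
""")

rows = conn.execute("""
SELECT c.name,
       SUM(o.amount)              AS total,       -- NULL when nothing matched
       COALESCE(SUM(o.amount), 0) AS total_or_0,  -- NULL replaced by 0
       COUNT(o.id)                AS n_orders     -- counts only matched rows
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name
ORDER BY c.id
""").fetchall()
```

Ada's two orders sum to 30. Grace still appears in the result thanks to the LEFT JOIN, with a NULL total (`None` in Python), 0 after `COALESCE`, and an order count of 0.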
Sums of measurements: (total wait time, cumulative sales); Lesson 225 — CLT for Sums and Other Statistics
Support complex relationships: between different types of information (customers → orders → products); Lesson 842 — What is a Database?
Supporting observations: the patterns, anomalies, or visualizations that sparked the hypothesis (e.; Lesson 1203 — Documenting Hypotheses and Evidence
Suppression: removes certain values entirely when they're too identifying—like removing ZIP codes for rural areas where few people live.; Lesson 1895 — Data Anonymization Basics Lesson 1896 — K-Anonymity
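A minimal sketch of suppression: blank out a field whenever too few records share its value. The helper name `suppress_rare`, the threshold, and the records are all hypothetical, and a real release would need more than this single step.

```python
from collections import Counter


def suppress_rare(records, field, min_count):
    """Replace `field` with None in records whose value appears fewer than min_count times."""
    counts = Counter(r[field] for r in records)
    return [
        {**r, field: None} if counts[r[field]] < min_count else dict(r)
        for r in records
    ]


records = [
    {"zip": "10001", "age": 34},
    {"zip": "10001", "age": 51},
    {"zip": "10001", "age": 29},
    {"zip": "59925", "age": 47},  # the only record in this ZIP code
]
released = suppress_rare(records, "zip", min_count=3)
```

The three records sharing a ZIP code keep it, while the lone record has its ZIP code removed. The original `records` list is left unchanged.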
Surface plots: provide a solid, colored representation that emphasizes the overall shape and makes valleys and peaks immediately visible.; Lesson 1325 — 3D Surface and Wireframe Plots
Surrogate: 30-day engagement score or feature adoption rate; Lesson 1517 — Surrogate Metrics: When Direct Measurement is Impractical
Surrogate keys: are artificial identifiers created solely for database purposes—typically auto-incrementing integers or UUIDs.; Lesson 1050 — Choosing Effective Primary Keys
surrogate metrics: come in.; Lesson 1517 — Surrogate Metrics: When Direct Measurement is Impractical Lesson 1519 — Common Surrogate Metrics in A/B Testing Lesson 1522 — Balancing Speed and Accuracy in Metric Selection
Survey data: When respondents represent different population sizes; Lesson 43 — Weighted Mean and Its Applications
Survey response rates: (proportion who respond); Lesson 184 — Beta Distribution: Bounded Between 0 and 1
Surveys and questionnaires: Directly asking people for information; Lesson 11 — Data Collection and Acquisition
Survival analysis: Time to failure after multiple stresses; Lesson 181 — Gamma Distribution: Shape and Rate Parameters Lesson 1674 — Churn Prediction Models
Survival bias: only certain types complete treatment and remain observable; Lesson 1444 — Selection Bias and Treatment Assignment
Survival models: predict both *how long* a customer will remain active and *how much* they'll spend during that time.; Lesson 1668 — Predictive LTV Models
Survival times: in medical studies; Lesson 179 — When Variables Are Log-Normally Distributed Lesson 187 — The Weibull Distribution: Shape, Scale, and Survival
Survivorship bias: Only studying "survivors" or successes.; Lesson 244 — Selection Bias and Its Causes Lesson 1532 — Survivorship Bias and Attrition
SVG: (`.; Lesson 1262 — Saving Figures to Files
Swamping: A valid point gets falsely flagged because outliers distort the statistics; Lesson 1407 — The ESD Component
Switch datasets: (e.; Lesson 1302 — Interactive Controls: Dropdown Menus and Buttons
Symmetric: around the mean (left side mirrors the right); Lesson 169 — The Normal Distribution: Definition and Properties Lesson 194 — The Standard Normal Distribution Lesson 1175 — Histograms for Distribution Shape
Symmetric distributions: (bell-shaped): Mean, median, and mode are roughly equal—use any, though mean is most common.; Lesson 42 — Comparing Mean, Median, and Mode Lesson 220 — Sample Size Requirements for the CLT Lesson 221 — CLT for Different Population Distributions
Symmetrical (No Skew): Lesson 64 — Skewness: Definition and Interpretation
Symmetrically distributed data: without extreme outliers; Lesson 39 — The Mean (Arithmetic Average)
Symmetry: Does the left mirror the right?; Lesson 63 — Understanding Distribution Shape Lesson 174 — Symmetry and the Mode, Median, Mean Lesson 377 — Testing Normality: Visual Methods Lesson 1176 — Box Plots for Spread and Outliers
Symmetry around zero: well-specified models should show roughly symmetric deviance residuals; Lesson 701 — Deviance Residuals
System dependencies: (compilers, system libraries); Lesson 2038 — What is Environment Management and Why It Matters
Systematic deviations: Non-normal distribution; Lesson 204 — Q-Q Plots: Theory and Interpretation
Systematic sampling: is efficient and easy.; Lesson 243 — Choosing the Right Sampling Method

T

T-Closeness: goes further: the distribution of sensitive attributes in each group must be **close to the overall distribution** in the dataset (within threshold T).; Lesson 1897 — L-Diversity and T-Closeness
t-distribution: comes in.; Lesson 268 — Critical Values and the t-Distribution Lesson 272 — When to Use Z vs t Lesson 351 — When to Use a One-Sample t-Test Lesson 352 — The t-Distribution and Degrees of Freedom
t-statistic: is the core calculation in a one-sample t-test.; Lesson 353 — Calculating the t-Statistic Lesson 606 — Statistical Significance of Individual Coefficients Lesson 621 — Interpreting t-Statistics and Confidence Intervals Lesson 654 — Testing Interaction Significance
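For the one-sample case the calculation is t = (x̄ − μ₀) / (s / √n). The sketch below applies it to a tiny made-up sample; the function name is illustrative.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t-statistic for testing whether the population mean equals mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)            # sample standard deviation (n - 1 denominator)
    standard_error = s / math.sqrt(n)
    return (mean - mu0) / standard_error


t = one_sample_t([2, 4, 6], mu0=2)
```

Here the sample mean is 4, s is 2, and the standard error is 2 / √3, so t = √3 ≈ 1.73 with n − 1 = 2 degrees of freedom.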
t-test: with these hypotheses:; Lesson 606 — Statistical Significance of Individual Coefficients Lesson 1749 — Measuring Statistical Significance
T2D3: Triple, Triple, Double, Double, Double.; Lesson 1629 — SaaS Growth Metrics: Quick Ratio and Net Revenue Retention
Tab: key (to move forward), **Shift+Tab** (to move backward), **Enter** or **Space** (to activate), and **arrow keys** (for fine control).; Lesson 1253 — Interactive Accessibility: Keyboard Navigation
table: is like a spreadsheet in a database—it stores data in rows and columns.; Lesson 846 — Tables, Schemas, and Data Types Lesson 1117 — What is an ORM and Why Use It?
table aliases: .; Lesson 945 — Introduction to Self-Joins Lesson 976 — Basic Correlated Subquery Syntax
Table name qualification: means prefixing column names with their table name using dot notation:; Lesson 922 — Selecting Columns from Joined Tables
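A small runnable illustration using Python's `sqlite3` module. The `customers` and `orders` tables are hypothetical. Both have an `id` column, which is exactly the situation where qualification is needed to avoid ambiguity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0);
""")

# Each column is prefixed with its table name using dot notation
rows = conn.execute("""
SELECT customers.name, orders.id, orders.amount
FROM customers
JOIN orders ON customers.id = orders.customer_id
ORDER BY orders.id
""").fetchall()
```

Writing just `id` in the SELECT list would be rejected as ambiguous, because the database cannot tell which table's `id` is meant.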
Table sizes: Joining smaller tables first reduces intermediate result sets; Lesson 951 — Join Order and Performance
tables: with rows and columns, making it easy to store large volumes of information efficiently and access it reliably.; Lesson 842 — What is a Database? Lesson 843 — Relational Database Concepts
Tail behavior: Are extremes rare or common?; Lesson 63 — Understanding Distribution Shape Lesson 193 — Choosing Between Distributions in Practice
Take-Home Projects: test end-to-end skills: EDA, feature engineering, modeling, and communication.; Lesson 2142 — Interviewing: Technical and Behavioral Prep
Target ROAS: Varies by industry and margins, but often 3-4+ for healthy profitability; Lesson 1751 — Return on Ad Spend (ROAS): Definition and Calculation Lesson 1752 — Target ROAS and Break-Even Analysis
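A short sketch of how a target relates to break-even. It uses the standard definition ROAS = revenue / ad spend and the common simplification that break-even ROAS is 1 / gross margin, ignoring other costs. The campaign figures and the 40% margin are invented.

```python
def roas(revenue, ad_spend):
    """Revenue earned per unit of currency spent on ads."""
    return revenue / ad_spend


def break_even_roas(gross_margin):
    """ROAS at which gross profit exactly covers ad spend (simplified: no other costs)."""
    return 1 / gross_margin


campaign_roas = roas(revenue=12_000, ad_spend=3_000)
threshold = break_even_roas(gross_margin=0.40)
profitable = campaign_roas > threshold
```

This campaign returns 4.0 against a break-even of 2.5. The lower the margin, the higher the ROAS needed before ad spend pays for itself, which is why targets vary by industry.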
Target variable: Actual LTV (from mature cohorts where you've observed full lifecycles); Lesson 1668 — Predictive LTV Models
Targeted interventions: Identify high-risk periods (e.; Lesson 838 — Subscription and Membership Duration Modeling
Task-level: "Spend 2 hours exploring correlations, then move on"; Lesson 2121 — Timeboxing and Deadlines
tasks: (individual units of work) and **operators** (templates for tasks like PythonOperator, BashOperator, or SQLOperator).; Lesson 1833 — Introduction to Apache Airflow Lesson 1835 — Airflow Operators and Tasks
Tau-a: Simplest, doesn't adjust for ties (rarely used); Lesson 491 — Handling Ties in Rank Correlations
Tau-b: Adjusts for ties in both variables (most common); Lesson 491 — Handling Ties in Rank Correlations
Tau-c: Adjusts for table size in contingency tables; Lesson 491 — Handling Ties in Rank Correlations
Tax Reforms: When a city or state changes tax policy, neighboring regions serve as control groups.; Lesson 1459 — Real-World DiD Applications
Teaching and documentation: where the process matters as much as the result; Lesson 2074 — Notebooks vs Scripts: When to Use Each
Teaching and prototyping: Perfect for learning Bayesian concepts or quickly testing ideas; Lesson 1555 — Advantages and Limitations of Conjugate Priors
Team alignment: Give marketing, product, and leadership a shared view of what's working; Lesson 1718 — Introduction to Marketing Attribution Lesson 1727 — Linear Attribution Model
Team capacity: Your colleagues who depend on your work are blocked; Lesson 2120 — The Opportunity Cost of Iteration
Team-Level Key Results: Each team then defines 3-5 measurable Key Results that directly influence the North Star.; Lesson 1608 — Connecting North Star Metrics to OKRs
Technical: What systems must the solution integrate with?; Lesson 2102 — Understanding Stakeholder Goals and Constraints
Technical → Business: When stakeholders ask "How accurate is the model?; Lesson 2105 — Translating Between Technical and Business Language
Technical attributes: Browser, operating system, connection speed; Lesson 1682 — Segmenting Funnels by User Attributes
Technical audiences: (data scientists, engineers, analysts) typically:; Lesson 1950 — Identifying Your Audience: Technical vs Non-Technical
Technical costs: landing pages, tracking infrastructure, A/B testing tools; Lesson 1753 — Customer Acquisition Cost (CAC): Components and Calculation
Technical deep-dives: for the data-savvy audience members; Lesson 1949 — Anticipating Questions: Building in Appendices
Technical friction: (slow loading, complex forms); Lesson 1681 — Time-Based Funnel Analysis
Technical methodology details: Lesson 1971 — Appendices and Technical Details
Technical peers: , on the other hand, often need diagnostic depth: distributions, error bars, residual plots, correlation matrices.; Lesson 1954 — Tailoring Visualizations to Audience Needs
Technical peers/data scientists: Show your work.; Lesson 1953 — Adjusting Statistical Depth by Audience
Technical reviewers: can evaluate your conclusion before diving into methods; Lesson 1942 — The Pyramid Principle: Starting with the Conclusion
temperature: (or season).; Lesson 495 — Confounding Variables Lesson 509 — Confounding Variables and Control Lesson 1427 — What is a Confounding Variable?
Temperature readings: If you're monitoring a freezer that must stay below 0°C, a reading of 5°C is an outlier *by definition*, even if it's close to the mean due to equipment malfunction.; Lesson 75 — Domain-Specific Outlier Rules
Templates and Tooling: Lesson 1643 — Building Attribution Frameworks
Templates are your foundation: Create standardized templates for data documentation that include:; Lesson 2068 — Data Provenance Best Practices
Temporal dependence: Values at time *t* depend on values at *t-1*, *t-2*, etc.; Lesson 704 — What Makes Time Series Data Different?
Temporality: The cause must come *before* the effect—this is the only non-negotiable criterion.; Lesson 498 — Bradford Hill Criteria for Causation
TensorFlow: (`tf.; Lesson 2058 — Seed Scope and Multiple Libraries
Tenure and LTV: High-LTV churners warrant more personalized, generous offers; Lesson 1676 — Win-Back and Retention Strategies
Terms below were extracted from bolded phrases in lesson content.
Terms of Service: present another illusion.; Lesson 1914 — Consent in Digital Contexts
Terms of service respect: – If you're scraping a website or using an API, are you honoring the platform's rules?; Lesson 36 — Responsible Data Sourcing and Use
Test: Run controlled experiment until statistical significance; Lesson 1692 — Statistical Significance and Iteration
Test before replacing: When proposing a new branch, validate that it truly influences parent metrics before permanently adding it to the tree.; Lesson 1626 — Maintaining and Evolving Metric Trees
Test causality: Run experiments where you deliberately move the metric and observe effects.; Lesson 1615 — Correlation Without Causation
Test credentials: Try connecting with a database client tool (like `psql` or SQLite browser) using the same credentials; Lesson 1093 — Troubleshooting Connection Issues
Test duration: creates its own problems.; Lesson 1500 — Practical Considerations and Trade-offs
Test parallel trends visually: Pre-treatment coefficients should be near zero; Lesson 1457 — Multiple Time Periods and Staggered Adoption
Test queries safely: by previewing just a handful of rows; Lesson 877 — LIMIT: Restricting the Number of Rows Returned
Test restoration: by recreating the environment on a fresh machine; Lesson 1987 — Environment and Dependency Management
Test segments: scores should separate retained vs churned cohorts clearly; Lesson 1699 — Engagement Scoring Systems
Test sequentially: starting with the smallest p-value; Lesson 1504 — Holm-Bonferroni Method
Test set: Fresh data held back until the very end for a final, unbiased evaluation; Lesson 14 — Model Evaluation and Validation
Test significance: using likelihood ratio tests, Wald tests, or AIC/BIC comparisons—tools you've already learned.; Lesson 703 — Sequential Model Building Strategy
Test small first: Use `LIMIT` while developing queries to avoid long waits; Lesson 880 — Performance Considerations and Best Practices
Test statistic: A calculated value measuring deviation from normality (larger = less normal); Lesson 207 — Anderson-Darling Test Lesson 314 — What is a Test Statistic? Lesson 319 — Calculating P-Values from Test Statistics Lesson 716 — Augmented Dickey-Fuller Test Lesson 818 — What is the Log-Rank Test?
Test statistic (t): (38 - 35) / 1.; Lesson 316 — Calculating Test Statistics from Sample Data
Test with multiple people: One person's confusion might be unique; three people struggling with the same element reveals a design problem.; Lesson 1964 — Testing Visualizations with Audiences
Test your work: Use CVD simulation tools to preview your visualizations as colorblind viewers see them.; Lesson 1248 — Color Blindness and Color Palette Design
Testable hypothesis: "Customers in Segment A have an average purchase frequency at least 20% higher than Segment B customers.; Lesson 1200 — Formulating Specific, Testable Hypotheses
Testing and development: Engineers can work with realistic data without privacy concerns; Lesson 1901 — Synthetic Data Generation
Testing becomes impossible: You can't validate pipeline logic if each run changes the outcome; Lesson 1847 — What is Idempotency?
Testing Multiple Claims Simultaneously: Lesson 313 — Common Pitfalls in Hypothesis Formulation
TEXT: Text strings (e.; Lesson 846 — Tables, Schemas, and Data Types
Text columns: Find alphabetically first and last values (based on sorting order); Lesson 885 — MIN and MAX: Finding Extremes
Text inputs: capture free-form text for searches or custom labels:; Lesson 1332 — Streamlit Widgets: Inputs and Controls
Text labels: add context anywhere on your plot.; Lesson 1271 — Adding Legends, Annotations, and Text
Text processing: Regular expressions, tokenization, or NLP on millions of documents; Lesson 1784 — Computation Complexity: Beyond Data Size
That's analysis: Recommending Policy A because "efficiency matters most" is **advocacy**—it injects your (or your organization's) values into the decision.; Lesson 1927 — Separating Analysis from Advocacy
Their branch's latest commit: The tip of the branch you're merging in; Lesson 2009 — Three-Way Merges
themes: (overall aesthetic) and **contexts** (size scaling).; Lesson 1294 — Seaborn Themes and Context Settings Lesson 1340 — The Seven Layers of Grammar
Then aggregate again: at a different level; Lesson 973 — Nested Subqueries in FROM
Then filter: those aggregates; Lesson 973 — Nested Subqueries in FROM
Then join: the clean, preprocessed results; Lesson 994 — CTEs for Simplifying Complex Joins
Theoretical quantiles: (what we'd expect from a perfect normal distribution) on the x-axis; Lesson 565 — What Q-Q Plots Show: Comparing Residual Distribution to Normal
Theory: Does domain knowledge suggest this predictor matters?; Lesson 625 — Practical Workflow: Testing and Interpreting Predictors
They all still apply: when you add more predictors.; Lesson 601 — Assumptions for Multiple Linear Regression
They generate moments: Taking derivatives at t=0 gives you the "raw moments" of the distribution.; Lesson 150 — Moment Generating Functions
They penalize complexity: Adding unnecessary parameters increases the score; Lesson 781 — Information Criteria: AIC and BIC
They simplify algebra: MGFs make it easier to prove properties about sums of independent random variables (like that the sum of independent Poisson variables is also Poisson).; Lesson 150 — Moment Generating Functions
They uniquely identify distributions: If two random variables have the same MGF, they have the same probability distribution—no other function needed!; Lesson 150 — Moment Generating Functions
They're correlated: and run against large tables (thousands of executions); Lesson 966 — Performance Considerations for WHERE Subqueries
They're reasonable: The conjugate family genuinely captures your prior knowledge; Lesson 1556 — Choosing Between Conjugate and Non-Conjugate Priors
Thin tails: No extreme outliers pulling away; Lesson 377 — Testing Normality: Visual Methods
Think about costs: of acting on this information; Lesson 609 — Practical vs Statistical Significance
Think of it as: Knocking on someone's front door and asking politely for information they're willing to share.; Lesson 21 — APIs and Web Scraping
Think of it like: A visual IQR calculator that also flags unusual values.; Lesson 55 — Visualizing Spread Lesson 567 — Common Q-Q Plot Patterns: Heavy Tails and Light Tails Lesson 1786 — Data Processing Patterns Best Suited for Spark
Thinning: means keeping only every *k*th sample (e.; Lesson 1592 — Burn-in, Thinning, and Convergence Diagnostics
Third batch arrives: Use Beta(17, 25) as prior → and so on.; Lesson 1563 — Sequential Updating with New Data
Third evidence (alibi confirmed): Use 85% as the new prior → posterior drops to 30%.; Lesson 114 — Sequential Updating
Third Normal Form (3NF): eliminates *transitive dependencies*, where a non-key attribute depends on another non-key attribute, which in turn depends on the primary key.; Lesson 1066 — Third Normal Form (3NF)
Third Quartile (Q3): the 75th percentile; Lesson 59 — The Five-Number Summary and Box Plots Lesson 1383 — Understanding the Interquartile Range (IQR)
third variable: here is temperature (or summer season).; Lesson 497 — The Third Variable Problem Lesson 506 — Introduction to Partial Correlation Lesson 1423 — The Third Variable Problem Lesson 1426 — Real-World Examples: Correlation vs Causation
Third-party providers: Companies that sell or license data; Lesson 11 — Data Collection and Acquisition
This is backwards: Lesson 106 — Common Misconceptions About Independence
This is your default: When in doubt, use two-tailed—it's more conservative and widely accepted.; Lesson 350 — Choosing the Right Tail Configuration
This uncertainty matters: When we estimate σ from a small sample, our confidence interval needs to be *wider* to account for the extra uncertainty.; Lesson 268 — Critical Values and the t-Distribution
Thompson Sampling: directly sample from posterior distributions to make allocation decisions—a natural Bayesian approach.; Lesson 1586 — Multi-Armed Bandit Connections
Threaded Scheduler: (default for single machine); Lesson 1795 — Distributed Schedulers and Client Setup
Three columns: with 100 values each → up to 1,000,000 potential groups; Lesson 911 — Performance Considerations with Multiple Groups
Threshold adjustment: Use different decision thresholds for different groups to equalize outcomes.; Lesson 1894 — Auditing and Remediation Strategies
Threshold effects: Variables behave differently above/below a certain value; Lesson 1189 — Detecting Nonlinear Relationships
Tick Marks and Labels: Customize where tick marks appear and what they say using `set_xticks()` and `set_xticklabels()`.; Lesson 1270 — Customizing Axes: Labels, Limits, and Scales
Tick marks or crosses: Often indicate censored observations; Lesson 815 — Survival Curve Plots and Interpretation
Tidy data: is a standardized way of organizing datasets that follows three simple rules:; Lesson 1142 — What is Tidy Data?
tidy data principles: and creates maintenance nightmares.; Lesson 1148 — Handling Multiple Types in One Table Lesson 1151 — Schema Validation
Time: Months (or days) since loan origination; Lesson 840 — Loan Default Timing and Credit Risk Lesson 2102 — Understanding Stakeholder Goals and Constraints
Time and resource limits: "We need an answer in two weeks, even if it's rough"; Lesson 2117 — Defining 'Good Enough' with Stakeholders
Time for Spark: When datasets exceed available RAM or when processing takes hours instead of minutes; Lesson 1783 — Data Size Thresholds: When Pandas Isn't Enough
Time intervals: span durations: "January 2024 to March 2024" or "Q1 2023"; Lesson 19 — Temporal Data and Time Series
Time investment explodes: Simple features took hours; the next marginal improvement requires days of engineering; Lesson 2116 — Diminishing Returns and the 80/20 Rule
Time limitations: Do you have days or months?; Lesson 1169 — Clarifying Assumptions and Constraints
time origin: is your starting line—the moment when the clock begins for each subject.; Lesson 803 — Defining the Event and Time Origin Lesson 835 — Customer Churn Prediction with Survival Analysis
Time periods: Sales in months with different numbers of days; Lesson 692 — Offset Terms for Exposure
Time plot of residuals: Should look randomly scattered around zero with constant variance; Lesson 799 — Fitting and Diagnosing SARIMA Models
Time series comparisons: multiple metrics over the same time period; Lesson 1276 — Sharing Axes Between Subplots
Time Series Data: Measurements taken over time (stock prices, daily temperatures) often show autocorrelation: today's value relates to yesterday's value.; Lesson 381 — Independence Assumption and Its Violations Lesson 548 — Independence of Observations
Time series plot: Should show constant mean and variance over time; Lesson 741 — Testing Stationarity After Transformation
Time since churn: Fresh churners respond better than those gone 6+ months; Lesson 1676 — Win-Back and Retention Strategies
Time trends and seasonality: Lesson 1741 — Controlling for Seasonality and External Factors
Time windows: set boundaries—how far back you look for attributable touchpoints.; Lesson 1639 — Time Windows and Attribution Decay
Time-based rules: Sales of winter coats in July might look like outliers, but they could be legitimate clearance sales or southern hemisphere orders.; Lesson 75 — Domain-Specific Outlier Rules
Time-based variations: Weekly, monthly, quarterly reports; Lesson 1984 — Parameterized Reports
Time-bound: Set a clear horizon (quarterly, annually) so urgency is built in; Lesson 1609 — Setting Effective Objectives Lesson 1610 — Defining Measurable Key Results
Time-Lagged Analysis: Lesson 1602 — Identifying Leading Indicators for Your Metrics
Time-to-conversion: analysis models the journey from first contact (lead acquisition) to purchase, treating non-converters as **censored observations** because they didn't experience the "event" (conversion) during your observation window.; Lesson 839 — Time-to-Conversion in Marketing Funnels
Time-to-event: data; Lesson 828 — Fitting the Cox Model
Time-to-match: How long until a buyer finds a seller; Lesson 1630 — Marketplace Metrics: GMV, Take Rate, and Liquidity
Time-varying covariates: allow your survival model to reflect these dynamic changes.; Lesson 833 — Time-Varying Covariates
Timebox tasks: 2 days EDA, 2 days feature prep, 1 day modeling; Lesson 2113 — Timeboxing and Sprint Planning for Data Projects
Timeboxing: means allocating a fixed duration—say, three days for EDA or one week for initial modeling—and forcing yourself to produce *something* deliverable when time runs out, even if it's imperfect.; Lesson 2113 — Timeboxing and Sprint Planning for Data Projects Lesson 2121 — Timeboxing and Deadlines
Timeline: Prototype in 3 weeks, deploy in 6 weeks; Lesson 1948 — The Recommendation Slide: Making It Actionable Lesson 2103 — Managing Expectations and Defining Success
timeliness: , **validity**, and **uniqueness**.; Lesson 1863 — Data Quality Dimensions Lesson 1867 — Data Profiling and Monitoring Lesson 1869 — Data Quality Metrics and SLAs Lesson 1986 — Automated Report Generation Lesson 2086 — Stage 2: Data Acquisition and Assessment
Timely insights: Market conditions change; delays reduce relevance; Lesson 2120 — The Opportunity Cost of Iteration
Timeout Errors: occur when connections take too long to establish or queries run longer than allowed.; Lesson 1093 — Troubleshooting Connection Issues
TIMESTAMP: Date and time values; Lesson 846 — Tables, Schemas, and Data Types
Timestamps: mark exact moments: "2024-03-15 14:32:05" (year-month-day hour:minute:second); Lesson 19 — Temporal Data and Time Series Lesson 1857 — Logging Best Practices Lesson 1988 — Embedding Data Lineage and Metadata Lesson 2065 — Tracking Data Lineage
Timestamps and Version Fields: Add `created_at` and `updated_at` timestamps to your data.; Lesson 1848 — Designing Idempotent Operations
Too few bins: You lose detail and may miss important patterns; Lesson 1267 — Histograms and Distribution Plots
Too few examples: Training a neural network with 50 samples?; Lesson 2124 — Insufficient or Low-Quality Data
Too large: Each partition takes a long time to process, limiting parallelism.; Lesson 1794 — Working with Partitions
Too many bins: The plot becomes noisy and hard to interpret; Lesson 1267 — Histograms and Distribution Plots
Too many color bins: 3-7 bins is ideal.; Lesson 1309 — Choropleth Maps: Basics and Best Practices
Too narrow: Creates noisy, overfit patterns from random variation; Lesson 1245 — Misleading Aggregations and Binning
Too noisy: Revenue varies wildly day-to-day, drowning out true effects; Lesson 1517 — Surrogate Metrics: When Direct Measurement is Impractical
Too rare: Conversions on high-ticket items are infrequent; Lesson 1517 — Surrogate Metrics: When Direct Measurement is Impractical
too small: .; Lesson 693 — Overdispersion in Count Data Lesson 1794 — Working with Partitions
Too wide: Bins like "0-100" collapse all variation; Lesson 1245 — Misleading Aggregations and Binning
Top layer: "We should launch.; Lesson 1952 — The Pyramid Principle: Leading with Conclusions
Top performers: 90th percentile and above; Lesson 61 — Using Percentiles for Comparison and Benchmarking
Top-of-funnel optimization: Where should you invest to grow your audience?; Lesson 1720 — First-Touch Attribution Model
Total: | **0.; Lesson 93 — Calculating Conditional Probabilities Lesson 444 — The ANOVA Table
Total downloads: (without usage or monetization); Lesson 1612 — What Are Vanity Metrics?
Total registered users: (without knowing active users or retention); Lesson 1612 — What Are Vanity Metrics?
Total Revenue: The sum of all money earned; Lesson 1516 — Business Metrics: Definition and Examples
Trace plots: Visualize the chain over iterations—it should look like random noise around a stable mean, not trending or stuck; Lesson 1592 — Burn-in, Thinning, and Convergence Diagnostics
Track improvements: Overlay cohorts from before and after a product change to see if retention improved.; Lesson 1656 — Visualizing Retention Curves
Track randomization seed: always save the random seed used for reproducibility; Lesson 1492 — Rerandomization and Practical Implementation
Track step repetition frequency: – identify which steps users commonly revisit; Lesson 1683 — Multi-Path and Non-Linear Funnels
Track what you learn: .; Lesson 2143 — Continuous Learning and Skill Development
Tracking Only Lagging Metrics: Lesson 1603 — Common Pitfalls in Indicator Selection
Tracking Pixels: are tiny, invisible images embedded in emails or third-party sites.; Lesson 1713 — Tracking Users by Channel
Trade-off: Slightly lower power (5-15% efficiency loss if data *were* normal), and results describe distributions or medians, not means.; Lesson 475 — Choosing Between Parametric and Non-Parametric Tests Lesson 1767 — Scale-Up vs Scale-Out Architectures
Trade-offs: Lesson 1620 — Single vs Shared Ownership Models
Tradeoffs: Choosing "good enough" over perfection; Lesson 2142 — Interviewing: Technical and Behavioral Prep
Traditional methods: work beautifully when:; Lesson 305 — When to Use Bootstrap vs Traditional Methods
Traffic source: Organic search, paid ads, social media, email, direct; Lesson 1682 — Segmenting Funnels by User Attributes
Traffic volume: is often your biggest limitation.; Lesson 1500 — Practical Considerations and Trade-offs Lesson 1714 — Channel-Level Metrics
Trailing moving averages: (also called "backward-looking") use only past data points.; Lesson 753 — Centered vs Trailing Moving Averages
Train on the rest: Build your ARIMA, Holt-Winters, or other model using only the training portion; Lesson 790 — Out-of-Sample Forecast Evaluation
Trained model files: (.; Lesson 2033 — Git Large File Storage (LFS) for Data Assets
Training set: Data the model learns from; Lesson 14 — Model Evaluation and Validation
Transaction amounts: A \$0.; Lesson 75 — Domain-Specific Outlier Rules
Transform: it on cheaper servers or ETL tools (like Informatica or DataStage); Lesson 1817 — Historical Context: Why ETL Came First
Transform back: to the correlation scale using the inverse transformation; Lesson 503 — Confidence Intervals for Correlation Coefficients
Transform within the warehouse: using SQL-based tools like **dbt** (data build tool); Lesson 1821 — Hybrid Approaches and Modern Data Stacks
Transform your data: to reflect the null hypothesis being true (e.; Lesson 396 — Bootstrap Hypothesis Testing
Transform Your Variables: Lesson 564 — What to Do When Residual Plots Show Problems
Transformation History: What cleaning or calculations were applied; Lesson 1163 — Metadata and Data Dictionaries
Transformation layers: like **dbt** that version-control SQL transformations, run tests, and document data models; Lesson 1821 — Hybrid Approaches and Modern Data Stacks
Transformation logic: Clean, join, aggregate, or enrich data (the "T" in ETL/ELT); Lesson 1822 — What is a Data Pipeline?
Transformations: Has someone already cleaned or filtered it?; Lesson 23 — Data Provenance and Metadata Lesson 1189 — Detecting Nonlinear Relationships Lesson 1344 — Scales and Coordinate Systems Lesson 1774 — What is Apache Spark and Why Use It? Lesson 1780 — Transformations vs Actions in Spark Lesson 1800 — Chunked Reading with read_csv Lesson 1823 — Pipeline Components: Sources, Transformations, Sinks Lesson 2065 — Tracking Data Lineage
Transformations are simpler: Operations like filtering, grouping, and summarizing follow predictable patterns; Lesson 1142 — What is Tidy Data?
Transformed coordinates: apply mathematical transformations to the entire space; Lesson 1344 — Scales and Coordinate Systems
Transforming: using SQL queries within the warehouse itself; Lesson 1816 — What is ELT? Extract, Load, Transform Explained
Transforming features: to meet model assumptions or improve performance: scaling numerical features, encoding categorical variables, handling skewed distributions, or creating polynomial terms.; Lesson 2088 — Stage 4: Feature Engineering and Preparation
Transient: Implement exponential backoff, retry 3-5 times; Lesson 1849 — Transient vs Permanent Failures
Transient failures: typically include:; Lesson 1849 — Transient vs Permanent Failures Lesson 1850 — Retry Strategies
Transitive: `A → B` and `B → C`, so `A → C`; Lesson 1063 — Functional Dependencies
Transitive dependencies: are the hidden culprit: Package A depends on Package B version 2, but Package C needs Package B version 3.; Lesson 2048 — The Dependency Hell Problem
Transparency: Don't hide limitations.; Lesson 1247 — The Ethics of Visualization Design Lesson 1341 — Data and Aesthetic Mappings Lesson 1643 — Building Attribution Frameworks Lesson 1816 — What is ELT? Extract, Load, Transform Explained Lesson 1931 — When to Push Back on Requests Lesson 2029 — Draft Pull Requests and WIP Workflows
Transparency (alpha): prevents overplotting in dense datasets.; Lesson 1265 — Scatter Plots: Relationships Between Variables
Transparency/alpha: Let overlapping points blend, showing density through darker areas; Lesson 1310 — Point Maps and Scatter Plots on Maps
Transportation: Optimizing delivery routes or predicting traffic patterns; Lesson 6 — Common Data Science Applications
Treated: is a binary indicator (1 if unit is in treatment group, 0 if control); Lesson 1455 — DiD with Regression
Treated × Post: is the **interaction term** between the two indicators; Lesson 1455 — DiD with Regression
treatment: (version B—a new feature, design, or intervention), while the other receives the **control** (version A—the current state or baseline).; Lesson 1477 — Core Principles of A/B Testing Lesson 1482 — Control and Treatment Design
Treatment Effect Estimation: calculates the difference in average outcomes between those who received the treatment and those who didn't.; Lesson 1440 — Treatment Effect Estimation
treatment group: (receives the intervention) or a **control group** (does not receive the intervention).; Lesson 1435 — What is a Randomized Controlled Trial? Lesson 1641 — Isolating Effects with Control Groups Lesson 1677 — Measuring Churn Reduction Impact Lesson 1688 — A/B Testing for Conversion Optimization Lesson 1745 — Holdout Groups and Test Design
Treatment group, after intervention: Lesson 1452 — The Difference-in-Differences Setup
Treatment group, before intervention: (baseline); Lesson 1452 — The Difference-in-Differences Setup
Treatment Type: (Drug A vs Drug B) and **Gender** (Male vs Female) on recovery time.; Lesson 653 — Interpreting Categorical × Categorical Interactions
Tree-based models: (decision trees, random forests): These algorithms don't use the same linear framework as regression and can handle all k variables without issues; Lesson 638 — One-Hot Encoding Overview
trend: a general direction the data is moving.; Lesson 706 — Trend: Long-Term Direction Lesson 710 — Additive vs Multiplicative Models Lesson 711 — Visualizing Components with Decomposition Plots Lesson 715 — Visual Tests for Stationarity Lesson 742 — Components of Seasonal Decomposition Lesson 744 — Classical Decomposition Methods Lesson 747 — Interpreting Decomposition Plots Lesson 761 — Double Exponential Smoothing (Holt's Method) (+6 more)
Trend (b₀): Lesson 770 — Initializing Holt-Winters Components
Trend component: (the long-term direction); Lesson 711 — Visualizing Components with Decomposition Plots Lesson 742 — Components of Seasonal Decomposition Lesson 769 — Smoothing Parameters: Alpha, Beta, Gamma
Trend equation: The current rate of change, smoothed over time; Lesson 761 — Double Exponential Smoothing (Holt's Method) Lesson 767 — Holt-Winters Additive Model Lesson 768 — Holt-Winters Multiplicative Model
Trend or pattern: "Sales increased steadily from January to December"; Lesson 1250 — Text Alternatives and Screen Reader Compatibility
Trend Signals: Lesson 1401 — Detecting Out-of-Control Signals
Trends: Are sales climbing over the year?; Lesson 19 — Temporal Data and Time Series Lesson 562 — Index Plots and Time-Ordered Residuals Lesson 760 — Forecasting with Simple Exponential Smoothing Lesson 1183 — Scatter Plots for Two Numeric Variables
Trends in rolling stats: = non-stationary (needs fixing!; Lesson 715 — Visual Tests for Stationarity
Triggers: allow one pipeline to programmatically start another pipeline upon completion.; Lesson 1845 — Cross-Pipeline Dependencies
Trimming whitespace: removes leading and trailing spaces that creep in from manual data entry or faulty exports.; Lesson 1138 — Cleaning and Standardizing Text Fields
Tritanopia: (blue-yellow, rare): difficulty with blue and yellow; Lesson 1248 — Color Blindness and Color Palette Design
Trivial: `StudentID → StudentID` (always true, not useful); Lesson 1063 — Functional Dependencies
Troubleshoot failures: If a downstream task fails, check its upstream dependencies first; Lesson 1841 — Upstream and Downstream Dependencies
true: probability without relying on large-sample approximations.; Lesson 432 — Fisher's Exact Test: The Logic Lesson 871 — NULL Handling with Logical Operators
True metric: Annual subscription renewal rate; Lesson 1517 — Surrogate Metrics: When Direct Measurement is Impractical
True Positives (TP): Correctly identified change-points; Lesson 1418 — Evaluating Change-Point Detection Methods
Truncate trends: End the chart before a reversal occurs; Lesson 1241 — Cherry-Picking Time Ranges
Trust erosion: with stakeholders when they catch problems before you do; Lesson 2136 — Monitoring Gaps and Silent Failures
Trustworthiness: Is this data from a reliable source?; Lesson 23 — Data Provenance and Metadata
Try common encodings explicitly: UTF-8 (most modern), Latin-1 (ISO-8859-1, Western European), or CP1252 (Windows); Lesson 1135 — Detecting and Fixing Encoding Issues
Try d=1: If non-stationary, apply first-order differencing (subtracting the previous value from each value).; Lesson 778 — Determining Differencing Order (d)
Try different combinations: of alpha, beta, and gamma values (typically between 0 and 1); Lesson 772 — Holt-Winters Parameter Optimization
Try multiple reasonable priors: Use informative, weakly informative, and uninformative priors for the same problem; Lesson 1572 — Sensitivity Analysis and Prior Robustness
Tukey's fences: use the IQR to build "boundary lines" beyond which data points are considered outliers.; Lesson 72 — IQR Method and Tukey's Fences
TV(t), Radio(t), Digital(t): are your marketing spend amounts in each channel at time *t*; Lesson 1738 — The Core MMM Regression Model
two categorical variables: (like color preference and age group); Lesson 422 — Introduction to Chi-Squared Test of Independence Lesson 1181 — What is Bivariate Analysis?
Two columns: with 100 values each → up to 10,000 potential groups; Lesson 911 — Performance Considerations with Multiple Groups
two groups: Lesson 361 — Pooled Variance t-Test Lesson 824 — Multiple Group Comparisons
Two numerical variables: Does house size relate to price?; Lesson 1181 — What is Bivariate Analysis?
two-sample t-test: , the process is similar but accounts for both group sizes and their combined variability.; Lesson 343 — Calculating Power for Common Tests Lesson 359 — Two-Sample t-Test Overview Lesson 360 — Independent vs. Dependent Samples Lesson 375 — Paired t-Test vs Two-Sample t-Test
Two-sided: "The parameter is *different* from the null value" (≠); Lesson 308 — Defining the Alternative Hypothesis (H₁ or Hₐ) Lesson 311 — One-Sided vs Two-Sided Alternatives Lesson 345 — Directionality in Hypothesis Testing Lesson 373 — Hypotheses for Paired t-Tests Lesson 401 — Setting Up Hypotheses for Proportions Lesson 1393 — Two-Sided vs One-Sided Grubbs' Test
Two-sided (two-tailed) test: This tests whether the *most extreme value* — either the maximum OR minimum — is an outlier.; Lesson 1393 — Two-Sided vs One-Sided Grubbs' Test
two-sided test: , you calculate the probability in *both* tails (values as extreme or more extreme in either direction).; Lesson 319 — Calculating P-Values from Test Statistics Lesson 325 — The Rejection Region
Two-tailed: H₁: p₁ ≠ p₂ (testing for *any* difference); Lesson 406 — Two-Sample Proportion Test Setup Lesson 433 — Conducting Fisher's Exact Test
Two-tailed test: You care about differences in *either* direction (bigger or smaller).; Lesson 348 — P-Value Calculation Differences Lesson 354 — Setting Up Hypotheses for One-Sample t-Test Lesson 410 — P-Value Calculation and Interpretation Lesson 433 — Conducting Fisher's Exact Test
Type 0 (Fixed): Never update.; Lesson 1809 — Dimension Tables and Slowly Changing Dimensions
Type 1 (Overwrite): Replace the old value with the new one.; Lesson 1809 — Dimension Tables and Slowly Changing Dimensions
Type 3 (Add Column): Store both current and previous values in separate columns (e.; Lesson 1809 — Dimension Tables and Slowly Changing Dimensions
Type I error: occurs when you **reject a true null hypothesis**.; Lesson 330 — Understanding Type I Error (False Positive) Lesson 333 — Consequences of Type I and Type II Errors Lesson 624 — Multiple Testing Considerations
Type I Error (α): appears as the shaded area *under the null curve* that falls into the rejection region.; Lesson 336 — Visualizing Error Types with Sampling Distributions
Type II error: occurs when you **fail to reject a false null hypothesis**.; Lesson 331 — Understanding Type II Error (False Negative) Lesson 333 — Consequences of Type I and Type II Errors
Type II Error (β): appears as the shaded area *under the alternative curve* that falls *outside* the rejection region (where you fail to reject H₀).; Lesson 336 — Visualizing Error Types with Sampling Distributions
Type mismatches: Passing `"hello"` when you expect an integer; Lesson 1109 — Input Validation and Defense in Depth Lesson 1150 — What is Data Validation?
Type of phone: (iPhone vs Android) can correlate with socioeconomic status; Lesson 1883 — Protected Classes and Proxy Variables
Type safety: Your IDE can catch errors before runtime; Lesson 1117 — What is an ORM and Why Use It?
Types of contributions welcome: Documentation fixes?; Lesson 2083 — Contributing Guidelines and Contact Information
Typical pattern: Lesson 1113 — Rolling Back Transactions
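
Worked example for the "Tukey's fences" entry above: a minimal Python sketch of how the IQR can be turned into outlier boundaries. The function names `tukey_fences` and `find_outliers`, the sample data, and the multiplier `k=1.5` are illustrative assumptions (1.5 is the conventional choice, but the glossary entry does not state a value), and the quartiles use the standard library's "inclusive" method, which is only one of several quartile conventions.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower_fence, upper_fence) built from the IQR.

    k=1.5 is an assumed, conventional multiplier; it is not taken
    from the glossary entry itself.
    """
    # quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return the values that fall outside the fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample: nine ordinary readings and one extreme value.
data = [10, 12, 12, 13, 14, 15, 15, 16, 18, 45]
lower, upper = tukey_fences(data)
outliers = find_outliers(data)
print(f"fences: ({lower}, {upper}), outliers: {outliers}")
```

A different quartile method or a larger `k` (for example 3.0 for "far out" points) would move the fences, so the flagged values depend on those choices as well as on the data.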

U

Uber: Rides completed — directly measures successful matching of drivers and riders.; Lesson 1606 — Examples of North Star Metrics by Industry
unbiased: .; Lesson 255 — Expected Value of Sample Statistics Lesson 521 — Properties of Least Squares Estimators Lesson 552 — Zero Conditional Mean of Errors Lesson 554 — Consequences of Violating Assumptions
Unbounded above: Theoretically no maximum limit (though rare events in practice); Lesson 689 — When to Use Poisson Regression
UNBOUNDED FOLLOWING: End at the very last row of the partition; Lesson 1020 — UNBOUNDED and CURRENT ROW Keywords
UNBOUNDED PRECEDING: Start at the very first row of the partition; Lesson 1020 — UNBOUNDED and CURRENT ROW Keywords
Unbounded Retention: (also called "Return on or After Day N") measures the percentage of users who come back *any time on or after* Day N.; Lesson 1654 — Classic vs Unbounded Retention
Uncertainty is present: The relationship between surrogate and business metric is unproven; Lesson 1522 — Balancing Speed and Accuracy in Metric Selection
Uncertainty Quantification: Lesson 1539 — Interpreting Posterior Probabilities
Under-controlling: Ignoring confounders because they seem unimportant or weren't measured.; Lesson 1476 — Common DAG Patterns and Pitfalls
Under-investing: in channels with high incremental value but lower raw volume; Lesson 1717 — Incrementality and True Channel Impact
Undercoverage: Your sampling frame (the list you sample from) doesn't include part of the population.; Lesson 244 — Selection Bias and Its Causes Lesson 249 — Coverage Error and Undercoverage
Undermining trust: Stakeholders may feel manipulated rather than informed; Lesson 1927 — Separating Analysis from Advocacy
Understand: where your model succeeds and fails; Lesson 542 — Computing Fitted Values and Residuals
Understand complexity: Most conversions aren't one-click decisions; they involve multiple channels and interactions; Lesson 1719 — The Customer Journey and Touchpoints
Understand conditional relationships: When relationships hold under specific circumstances; Lesson 1190 — Introduction to Multivariate Analysis
Understand decision-maker constraints: Your stakeholder might need results before quarterly board meetings, end-of-month planning sessions, or annual budget reviews.; Lesson 2099 — Aligning with Business Timelines and Decision Points
Understand structural changes: in your domain (markets expanding, behaviors shifting); Lesson 706 — Trend: Long-Term Direction
Understanding cardinality: Join tables that produce smaller results first when possible; Lesson 951 — Join Order and Performance
Understanding patterns: Knowing that average temperature is 70°F doesn't tell you if you need both winter coats and shorts; Lesson 46 — What is Variability?
Understanding the business context: What decision will this analysis inform?; Lesson 2085 — Stage 1: Problem Definition and Scoping
Understanding the real world: Shape reveals the story behind your numbers.; Lesson 63 — Understanding Distribution Shape
Undirected graphs: show symmetrical relationships.; Lesson 1316 — Introduction to Network Graphs and Graph Theory Basics
Unequal variances: → Use Welch's t-test; Lesson 383 — Diagnostic Workflow: When to Proceed or Switch Tests Lesson 390 — When Parametric Tests Fail: Violations of Assumptions Lesson 398 — Choosing Between Parametric and Non-Parametric Tests Lesson 461 — Games-Howell Test for Unequal Variances
Unexpected duplicates: The same transaction or observation recorded multiple times; Lesson 1154 — Uniqueness and Duplication Checks
Unexpected paths: (tasks that shouldn't depend on each other); Lesson 1846 — Testing and Validating Dependency Graphs
Unexpected Patterns: Look for broken correlations (height and weight usually relate; if they suddenly don't, check your data), unusual counts (suddenly 200 records instead of the usual 50), or rare category values appearing too frequently.; Lesson 1157 — Statistical Anomaly Detection in QA
unexpected relationships: in your data; Lesson 1181 — What is Bivariate Analysis? Lesson 1192 — Correlation Matrices and Heatmaps
Unicode: is the universal character encoding standard that assigns a unique number to every character across all writing systems.; Lesson 1139 — Dealing with Special Characters and Unicode
Unimodal: (has one peak at the mean); Lesson 169 — The Normal Distribution: Definition and Properties Lesson 1175 — Histograms for Distribution Shape
uninformative prior: that assigns equal probability across all plausible values.; Lesson 1543 — Defining Prior Distributions Lesson 1581 — Setting Priors for A/B Tests
Union (∪): "A **or** B"; Lesson 80 — Set Operations: Union, Intersection, and Complement
Unique identifier validation: Verify that ID columns contain no duplicates.; Lesson 1154 — Uniqueness and Duplication Checks
Uniqueness: Each value in the primary key column must be unique across the entire table.; Lesson 1048 — What Are Primary Keys? Lesson 1863 — Data Quality Dimensions Lesson 1865 — Data Quality Checks in Pipelines
Unit tests for dependencies: Write tests that assert specific relationships exist.; Lesson 1846 — Testing and Validating Dependency Graphs
Unite when: Lesson 1147 — Separating and Uniting Columns
Uniting columns: is the reverse: combining multiple columns into one when they represent a single logical unit.; Lesson 1147 — Separating and Uniting Columns
Units: Currency (USD), measurements (kg, meters), percentages; Lesson 1163 — Metadata and Data Dictionaries Lesson 2064 — Creating Data Dictionaries
UNKNOWN: (represented by NULL).; Lesson 871 — NULL Handling with Logical Operators
Unnatural constraints: Sometimes the conjugate form doesn't match your actual prior knowledge; Lesson 1555 — Advantages and Limitations of Conjugate Priors
Unnecessary legends: Label directly when possible; Lesson 1237 — Chart Junk and Data-Ink Ratio
Unpooled variance: treats each group's variance as unique.; Lesson 285 — Pooled vs Unpooled Variance Approaches
Unreliable forecasts: Predictions become meaningless outside your training period; Lesson 734 — Why Differencing and Detrending Matter
Unreliable predictions: Since the underlying process is changing, our model's parameters, estimated from past data, won't accurately describe future behavior.; Lesson 713 — Why Stationarity Matters
Unrepresentative samples: If your data doesn't reflect the real-world distribution, predictions will fail in production.; Lesson 2124 — Insufficient or Low-Quality Data
Unresolved issues: Tickets closed without satisfaction; Lesson 1673 — Leading Indicators of Churn
Unstable Coefficient Estimates: Lesson 581 — Symptoms of Multicollinearity
Unstable coefficients: Small changes in your data can lead to large swings in the estimated regression coefficients; Lesson 580 — What is Multicollinearity?
Unstructured data: doesn't fit neatly into tables.; Lesson 16 — Structured vs Unstructured Data
Untracked data sources: Multiple teams pull from the same database table, but nobody coordinates when structure or semantics change.; Lesson 2133 — Undocumented Data Dependencies
Untracked files: Files Git doesn't know about yet (never staged or committed).; Lesson 1997 — Viewing Repository State with git status Lesson 1998 — Checking Repository Status
Unused indexes: consume storage and slow down writes (INSERT, UPDATE, DELETE) without providing query benefits.; Lesson 1086 — Index Maintenance and Monitoring
Update: Apply Bayes' theorem to compute the posterior using that data; Lesson 1582 — Updating Beliefs with Test Data
Update Anomalies: Lesson 1062 — Data Anomalies: Insert, Update, Delete
Update complexity: requiring changes in multiple places; Lesson 1071 — When to Denormalize: Performance Trade-offs Lesson 1074 — Duplicating Data Across Tables
UPDATE protection: You cannot change a foreign key to point to a non-existent parent; Lesson 1052 — Foreign Key Constraints
Update visual properties: (e.; Lesson 1302 — Interactive Controls: Dropdown Menus and Buttons
Update with data: from each group separately to get two posterior distributions: one for μ₁ and one for μ₂; Lesson 1570 — Comparing Two Means: Bayesian Approach
Updated beliefs: Compare the posterior to your prior.; Lesson 1547 — Interpreting Posterior Distributions
Updates belief: Strong data can overcome weak priors; strong priors resist contradictory weak data; Lesson 1537 — The Posterior Distribution
Updates segment membership: as customer behavior evolves; Lesson 1710 — Operationalizing Segments: Scoring and Deployment
Updating: existing information; Lesson 844 — What is SQL? Lesson 1124 — Insert, Update, Delete, and Bulk Operations
Upper bound only: Mean + t*(SE); Lesson 275 — One-Sided Confidence Bounds
Upper boundary: Q3 + 1.; Lesson 1384 — The IQR Outlier Detection Rule
Upper Control Limit (UCL): Typically 3 standard deviations above the mean; Lesson 1396 — Introduction to Control Charts Lesson 1397 — Shewhart Control Chart Basics Lesson 1398 — Control Charts for Means (X-bar Charts)
Upper fence: = Q3 + (1.; Lesson 72 — IQR Method and Tukey's Fences Lesson 1385 — Calculating IQR Fences in Practice
Upper threshold (B): Based on acceptable Type I error (α, false positive rate); Lesson 1511 — Sequential Probability Ratio Test (SPRT)
Upserts (Update or Insert): Instead of blindly inserting records, use operations that update existing records if they're already present.; Lesson 1848 — Designing Idempotent Operations
Upstream: `clean_data` and `extract_raw_data` (direct and transitive); Lesson 1841 — Upstream and Downstream Dependencies
Upstream dependencies: are the tasks that must run *before* your current task.; Lesson 1841 — Upstream and Downstream Dependencies
Upward (positive) trend: Values generally increase over time (e.; Lesson 706 — Trend: Long-Term Direction
Upward or downward slope: Warning—variance is changing systematically as fitted values increase; Lesson 560 — Scale-Location Plot (Spread-Location Plot)
Usage: How to run scripts, notebooks, or generate reports; Lesson 2077 — The Purpose and Anatomy of a Good README
Use ±2: when missing real anomalies is costly (e.; Lesson 1378 — Setting Z-Score Thresholds
Use ±3: when false positives are costly (e.; Lesson 1378 — Setting Z-Score Thresholds
Use a random mechanism: to select your sample (random number generator, lottery-style draw); Lesson 234 — Simple Random Sampling
Use accessible uncertainty language: .; Lesson 1928 — Communicating Uncertainty Honestly
Use active voice: "We tested three models" beats "Three models were tested"; Lesson 1967 — Writing Clear and Concise Analysis Sections
Use additive when: Lesson 766 — Additive vs Multiplicative Seasonality
Use asymptotic p-values when: Lesson 322 — Exact vs Asymptotic P-Values
Use binomial logic: Under H₀, positive and negative signs are equally likely (p = 0.; Lesson 391 — The Sign Test for Medians
Use Binomial when: Lesson 146 — When to Use Poisson vs Other Distributions
Use blocking first: rerandomization works best *after* applying stratification—it fine-tunes balance within strata; Lesson 1492 — Rerandomization and Practical Implementation
Use case: Three or more related groups (repeated measures); Lesson 474 — Friedman Test: Non-Parametric Repeated Measures ANOVA Lesson 1437 — Randomization Mechanisms
Use CASE when: You need inline conditional logic for 3-10 possible outcomes within a query.; Lesson 1037 — CASE Best Practices and Performance
Use charset detection libraries: that analyze byte patterns to suggest likely encodings; Lesson 1135 — Detecting and Fixing Encoding Issues
Use colorblind-friendly palettes: Tools like ColorBrewer, Viridis, and palette simulators help you test combinations.; Lesson 1248 — Color Blindness and Color Palette Design
Use concrete examples: Instead of explaining regularization abstractly, say "prevents the model from memorizing noise in the training data"; Lesson 2105 — Translating Between Technical and Business Language
Use concrete units: Always include what you're measuring ("dollars," "pounds," "hours"); Lesson 530 — Communicating Results to Non-Technical Audiences
Use configuration files: Create a `config.; Lesson 2070 — Separating Data from Code
Use consistent formatting: APA, IEEE, or your organization's standard; Lesson 1972 — Citations and References in Data Science Reports
Use custom values: (like ±2.; Lesson 1378 — Setting Z-Score Thresholds
Use descriptive, hierarchical patterns: Lesson 2073 — Naming Conventions for Files and Functions
Use exact p-values when: Lesson 322 — Exact vs Asymptotic P-Values
Use Exact Versions: Lesson 2046 — Best Practices for Environment Management in Teams
Use explicit JOIN syntax: with `ON` clauses instead of comma-separated table lists; Lesson 955 — Avoiding Cartesian Products
Use Fisher's Exact Test: as an alternative (for 2×2 tables); Lesson 426 — Assumptions and Sample Size Requirements
Use Geometric when: Lesson 137 — Geometric vs Negative Binomial: Key Differences Lesson 146 — When to Use Poisson vs Other Distributions
Use informative priors when: Lesson 1544 — Informative vs Uninformative Priors
Use merge when: Lesson 2014 — Understanding Git Rebase vs Merge
Use multiple channels strategically: Lesson 2104 — Communication Cadence and Updates
Use multiplicative when: Lesson 766 — Additive vs Multiplicative Seasonality
Use Negative Binomial when: Lesson 137 — Geometric vs Negative Binomial: Key Differences Lesson 146 — When to Use Poisson vs Other Distributions
Use OO interface: for production code, complex layouts, multiple subplots, or when functions need to accept specific axes to plot on; Lesson 1256 — Two Interfaces: pyplot vs Object-Oriented
Use Paired t-Test when: Lesson 375 — Paired t-Test vs Two-Sample t-Test
Use percentiles when: Lesson 62 — Percentiles vs Z-Scores: Complementary Position Measures
Use plain language: Avoid jargon like "feature importance" or "p-values.; Lesson 1944 — Executive Summary Best Practices
Use Poisson when: Lesson 146 — When to Use Poisson vs Other Distributions
Use pooled variance when: Lesson 285 — Pooled vs Unpooled Variance Approaches
Use pyplot: for quick exploratory visualizations and simple single plots; Lesson 1256 — Two Interfaces: pyplot vs Object-Oriented
Use Python when: Lesson 1375 — Choosing Tools: When to Use R vs Python for Visualization
Use rank tests: when: data are skewed, outliers present, small samples where you can't verify normality, or you care about distribution shifts beyond just means; Lesson 397 — Power and Efficiency of Non-Parametric Tests
Use rebase when: Lesson 2014 — Understanding Git Rebase vs Merge
Use relative paths: `data/raw/sales.; Lesson 2070 — Separating Data from Code
Use Robust Methods: Lesson 564 — What to Do When Residual Plots Show Problems
Use sequential testing methods: specifically designed for interim analysis (like Group Sequential Testing or Always-Valid Inference from earlier lessons); Lesson 1523 — Peeking at Results Early
Use standardized coefficients when: Lesson 528 — Standardized vs Unstandardized Coefficients
Use stratified sampling: When you know certain groups are underrepresented, deliberately sample more from those groups to balance things out.; Lesson 250 — Strategies for Bias Detection and Mitigation
Use t: if you must *estimate* σ from your sample (using sample standard deviation s) — almost always the case; Lesson 272 — When to Use Z vs t
Use t-tests: when: data are approximately normal, moderate sample sizes, you want maximum power from clean data; Lesson 397 — Power and Efficiency of Non-Parametric Tests
Use table aliases carefully: ensure your `ON` clause references columns from *both* tables, not just one; Lesson 955 — Avoiding Cartesian Products
Use the appropriate test: for your data structure; Lesson 368 — Common Pitfalls and Best Practices
Use the bootstrap distribution: to build a confidence interval (percentile method, BCa, etc.; Lesson 306 — Bootstrap for Non-Standard Problems
Use the CDF: to find the area beyond your test statistic; Lesson 319 — Calculating P-Values from Test Statistics
Use Two-Sample t-Test when: Lesson 375 — Paired t-Test vs Two-Sample t-Test
Use uninformative priors when: Lesson 1544 — Informative vs Uninformative Priors
Use unpooled variance when: Lesson 285 — Pooled vs Unpooled Variance Approaches
Use unstandardized coefficients when: Lesson 528 — Standardized vs Unstandardized Coefficients
Use when: Your research question is "Is there a difference?; Lesson 345 — Directionality in Hypothesis Testing Lesson 475 — Choosing Between Parametric and Non-Parametric Tests Lesson 2026 — Merge Strategies: Merge vs Squash vs Rebase
Use WHERE subqueries: When filtering data, subqueries in WHERE typically outperform SELECT subqueries; Lesson 969 — Performance Considerations for SELECT Subqueries
Use z: if you *know* the population standard deviation (σ) — rare in real life; Lesson 272 — When to Use Z vs t
Use z-scores when: Lesson 62 — Percentiles vs Z-Scores: Complementary Position Measures
User behavior shifts: People interact with systems differently over time; Lesson 15 — Deployment, Monitoring, and Iteration
User confusion: about what to do next; Lesson 1681 — Time-Based Funnel Analysis
User demographics: Age group, gender, language preference; Lesson 1682 — Segmenting Funnels by User Attributes
User Engagement: Lesson 908 — Multi-Level Grouping in Business Analytics
User experience consistency: Randomizing by session means the same user might see different versions on different visits, creating confusion.; Lesson 1481 — Unit of Randomization
User identifiers: to stitch touchpoints together into coherent journeys; Lesson 1719 — The Customer Journey and Touchpoints
User input matters: (filtering by date range, region, or product category); Lesson 1330 — Introduction to Interactive Dashboards
User support: Answering questions about metrics and functionality; Lesson 1979 — Maintenance and Sustainability Considerations
User/Customer: Each individual person gets one experience; Lesson 1481 — Unit of Randomization
Uses the one-sample t-test: on the differences (simpler than two-sample methods); Lesson 370 — Differences as the Unit of Analysis
USING: only works when column names match exactly; Lesson 953 — Join Conditions: ON vs USING
Using linear regression: where residuals should be approximately normal; Lesson 202 — Why Test for Normality?
Using scipy: Lesson 569 — Creating Q-Q Plots: Tools in Python and R
Using specific columns: Select only needed columns to reduce memory overhead; Lesson 951 — Join Order and Performance
Using statsmodels: Lesson 569 — Creating Q-Q Plots: Tools in Python and R
UTC (Coordinated Universal Time): is the universal baseline—think of it as the "source of truth" for time.; Lesson 1042 — Working with Timestamps and Time Zones
UTM Parameters: are tags appended to URLs that capture campaign details.; Lesson 1713 — Tracking Users by Channel
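
The "Upserts (Update or Insert)" entry above describes loading records so that re-running the same load leaves the table unchanged. A minimal sketch of that idea follows, using Python's built-in `sqlite3` module and SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` clause (available in SQLite 3.24 and later). The table `daily_sales`, its columns, and the helper `load_batch` are hypothetical names chosen for illustration; they do not come from the course.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per day, keyed by the date string.
conn.execute("CREATE TABLE daily_sales (day TEXT PRIMARY KEY, revenue REAL)")


def load_batch(conn, rows):
    """Insert each (day, revenue) row, or update revenue if the day already exists."""
    conn.executemany(
        """
        INSERT INTO daily_sales (day, revenue) VALUES (?, ?)
        ON CONFLICT(day) DO UPDATE SET revenue = excluded.revenue
        """,
        rows,
    )
    conn.commit()


batch = [("2024-01-01", 120.0), ("2024-01-02", 95.5)]
load_batch(conn, batch)
load_batch(conn, batch)  # running the same batch again adds no duplicate rows

rows_after_rerun = conn.execute(
    "SELECT day, revenue FROM daily_sales ORDER BY day"
).fetchall()

# A corrected value for an existing day overwrites the old one instead of duplicating it.
load_batch(conn, [("2024-01-02", 99.0)])
rows_after_fix = conn.execute(
    "SELECT day, revenue FROM daily_sales ORDER BY day"
).fetchall()
```

A plain `INSERT` would fail on the second run because of the primary key, and an unkeyed table would silently accumulate duplicates; keying the upsert on a natural identifier is what makes the load safe to retry.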

V

Vague: "Our website isn't doing well"; Lesson 10 — Problem Definition and Scoping Lesson 2093 — Translating Business Questions into Analytical Questions Lesson 2094 — Defining Success Metrics Upfront
Vague observation: "Customer behavior looks different between segments.; Lesson 1200 — Formulating Specific, Testable Hypotheses
Vague or Undefined Parameters: Lesson 313 — Common Pitfalls in Hypothesis Formulation
Valid partition: Lesson 83 — Partitions of the Sample Space
Valid Range: Min/max values, allowed categories, or regex patterns; Lesson 1163 — Metadata and Data Dictionaries
validate: that your leading indicator actually predicts the outcome you care about.; Lesson 1603 — Common Pitfalls in Indicator Selection Lesson 1692 — Statistical Significance and Iteration
Validate assumptions early: Does your preliminary analysis match stakeholder intuition?; Lesson 2111 — Fast Feedback Loops with Stakeholders
Validate with cross-validation: Ensure the model generalizes to unseen data; Lesson 633 — Practical Model Selection Strategy
Validate with stakeholders: Product, engineering, and analytics teams must agree on definitions.; Lesson 1679 — Defining Funnel Steps and Events
Validated: against incoming data batches in your pipeline; Lesson 1868 — Great Expectations Framework
Validating accuracy: Checking that values make sense—for example, ensuring ages aren't negative or dates aren't in the future.; Lesson 12 — Data Cleaning and Preparation
Validating updates: Changes to foreign key values are checked against the parent table; Lesson 1055 — What is Referential Integrity?
Validation: Test on held-out data or different time periods; Lesson 1204 — From Hypothesis to Analysis Plan
Validation becomes complex: What metrics indicate your retrained model is "good"?; Lesson 2128 — Data Distribution Shifts Frequently
Validation set: Data you use to check performance during development; Lesson 14 — Model Evaluation and Validation
Validation utilities: (`utils/validation.; Lesson 2075 — Utility Modules and Helper Functions
validity: , and **uniqueness**.; Lesson 1863 — Data Quality Dimensions Lesson 1865 — Data Quality Checks in Pipelines
value: of the ordering column, not physical position.; Lesson 1015 — ROWS vs RANGE Frame Specifications Lesson 1701 — What is Customer Segmentation? Lesson 1762 — Extended Dimensions: Veracity and Value
Value constraints: Non-null requirements, allowed categories; Lesson 1151 — Schema Validation
Values near zero: suggest little to no linear relationship at that lag; Lesson 720 — The Autocorrelation Function (ACF)
Vanity metrics: are measurements that appear impressive at first glance—often large, growing numbers—but don't connect to actionable business outcomes or inform strategic decisions.; Lesson 1612 — What Are Vanity Metrics? Lesson 1614 — Growth Without Retention
Var(X) = (1-p)/p²: Lesson 151 — Expected Value and Variance for Common Distributions
Var(X) = np(1-p): Lesson 151 — Expected Value and Variance for Common Distributions
Var(X) = p(1-p): Lesson 151 — Expected Value and Variance for Common Distributions
Var(X) = r(1-p)/p²: Lesson 136 — Expectation and Variance of the Negative Binomial
Var(X) = λ: (variance); Lesson 141 — Mean and Variance of Poisson Distribution Lesson 151 — Expected Value and Variance for Common Distributions
VARCHAR: or **TEXT**: Text strings (e.; Lesson 846 — Tables, Schemas, and Data Types
variability: (how much data points differ from each other), let's start with the simplest way to measure it: **range**.; Lesson 47 — Range: The Simplest Measure Lesson 294 — Margin of Error and Its Components Lesson 296 — Sample Size for Comparing Two Groups
Variability in the data: More spread (higher standard deviation) → larger standard error → larger margin of error.; Lesson 271 — Margin of Error
Variable distributions: Shape and spread along the diagonal; Lesson 1191 — Scatter Plot Matrices and Pairplots
Variable Name: The exact column name as it appears in your data; Lesson 2064 — Creating Data Dictionaries
Variables: ".; Lesson 1250 — Text Alternatives and Screen Reader Compatibility
Variables to exclude: "Drop `user_id` (high cardinality, no predictive value)"; Lesson 1212 — EDA Summary Documentation and Next Steps
variance: (which squares deviations) and **standard deviation** (which takes the square root of variance), MAD works directly with the actual distances.; Lesson 52 — Mean Absolute Deviation (MAD) Lesson 54 — When to Use Each Measure Lesson 122 — Variance and Standard Deviation of Discrete Random Variables Lesson 125 — Bernoulli Mean and Variance Lesson 129 — Binomial Mean and Variance Lesson 133 — Expectation and Variance of the Geometric Distribution Lesson 136 — Expectation and Variance of the Negative Binomial Lesson 141 — Mean and Variance of Poisson Distribution (+6 more)
Variance = 1/λ²: The variance is the square of the mean (1/λ); Lesson 166 — Exponential Distribution: Mean and Variance
Variance inequality: Two-sample t-tests are more sensitive to unequal variances when sample sizes differ between groups.; Lesson 382 — Robustness of t-Tests to Assumption Violations
Variance Inflation Factor (VIF): quantifies this problem by measuring how much the variance of a coefficient estimate is "inflated" due to correlation with other predictors.; Lesson 582 — Variance Inflation Factor (VIF)
Variance inspection: Directly compare the empirical variance and mean of your count variable across groups.; Lesson 693 — Overdispersion in Count Data
Variance of X: Lesson 180 — Parameters and Moments of the Log-Normal Lesson 519 — Computing β₁: The Slope Estimate
Variety: captures the diversity of data types and sources.; Lesson 1760 — Defining Big Data: The Three Vs
Vector formats: (like PDF, SVG, EPS) store mathematical descriptions of shapes.; Lesson 1273 — Saving Figures: Formats and Resolution
Vectorized operations: Modern CPUs process columns of uniform data types far faster than mixed-type rows.; Lesson 1811 — Columnar Storage and Query Optimization
Velocity: describes the speed at which data arrives and must be processed.; Lesson 1760 — Defining Big Data: The Three Vs
Verdict: We either reject innocence (guilty) or fail to reject it (not guilty — notice we don't say "innocent"); Lesson 312 — Hypothesis Testing as a Legal Analogy
Verifiable: Anyone can check if it was achieved; Lesson 1610 — Defining Measurable Key Results
verify: using the multiplication rule: P(A and B) = P(A) × P(B); Lesson 106 — Common Misconceptions About Independence Lesson 259 — Simulating Sampling Distributions Lesson 542 — Computing Fitted Values and Residuals Lesson 741 — Testing Stationarity After Transformation
Verify balance: across both stratification variables and other covariates; Lesson 1489 — Stratified Randomization Fundamentals
Verify basics: Can you ping the database host?; Lesson 1093 — Troubleshooting Connection Issues
Verify data collection: Is this data being captured at all?; Lesson 2098 — Identifying Data Availability Gaps Early
Verify independence: review your sampling method and data collection; Lesson 290 — Assumptions and Diagnostics for Difference Intervals
Verify residuals: Check that the sum of residuals equals zero (or very close); Lesson 522 — Implementing Least Squares from Scratch
Verify the value: against source data—is it a recording error?; Lesson 1209 — Outlier Detection and Investigation
Verify your configuration: Lesson 1991 — Installing Git and Initial Configuration
Version: If the dataset has explicit versioning (like "v2.; Lesson 2063 — Essential Metadata to Capture
Version control: for tracking code changes over time; Lesson 29 — Code and Environment Management
Version control it: alongside your report code; Lesson 1987 — Environment and Dependency Management
Version drift: means that installing "the latest" packages today gives you a different environment than "the latest" six months ago, breaking reproducibility even when you follow the same steps.; Lesson 2048 — The Dependency Hell Problem
Version your data: Track which dataset version you used; Lesson 30 — The Reproducibility Crisis and Solutions
Version-controlled code: that documents every transformation; Lesson 1981 — What Makes a Report Reproducible?
Vertical bars: Each bar represents the correlation at a specific lag; Lesson 722 — ACF Plots and Interpretation
Vertical patterns: All cohorts struggling at the same time period (e.; Lesson 1649 — Visualizing Cohort Data with Heatmaps
Vertical scaling (scale-up): means upgrading to a more powerful single machine—more RAM, more CPU cores, faster disks.; Lesson 1767 — Scale-Up vs Scale-Out Architectures
View the reflog: `git reflog` shows recent `HEAD` movements with timestamps and commit hashes; Lesson 2021 — Recovering from Rebase Mistakes
VIF = 1: No correlation with other predictors (ideal); Lesson 582 — Variance Inflation Factor (VIF)
VIF = 1–5: Moderate correlation (usually acceptable); Lesson 582 — Variance Inflation Factor (VIF)
VIF = 5–10: High correlation (concerning, investigate further); Lesson 582 — Variance Inflation Factor (VIF)
VIF > 10: Severe multicollinearity (action needed); Lesson 582 — Variance Inflation Factor (VIF)
VIF-guided removal: Remove the predictor with highest VIF, recalculate, repeat; Lesson 585 — Remedies: Variable Selection
Violate anti-discrimination laws: (e.; Lesson 1888 — Protected Classes and Sensitive Attributes
Violates ethical standards: (e.; Lesson 2107 — Saying No and Pushing Back Constructively
Violation examples: Lesson 448 — Independence of Observations
violin plot: combines a boxplot with a smoothed density curve mirrored on both sides.; Lesson 55 — Visualizing Spread Lesson 1268 — Box Plots and Violin Plots Lesson 1286 — Violin Plots and Distribution Shape
Violin plots: go further by showing the **full probability density** of the data.; Lesson 1223 — Box Plots and Violin Plots Lesson 1268 — Box Plots and Violin Plots
Virality Coefficient (k): = Invites Sent per User × Conversion Rate; Lesson 1631 — Social Media Metrics: DAU/MAU and Content Engagement
Viridis palettes: are perceptually uniform and colorblind-friendly; Lesson 1368 — Color Scales and Palettes
Visual check: Plot the series after each differencing step.; Lesson 778 — Determining Differencing Order (d) Lesson 1456 — Testing Parallel Trends
Visual checks: Lesson 217 — Evaluating Transformation Effectiveness
Visual Diagnostics: Histograms or density plots overlaying treatment and control distributions make imbalances immediately visible.; Lesson 1491 — Covariate Balance and Diagnostics
Visual inspection: Plot a histogram.; Lesson 193 — Choosing Between Distributions in Practice Lesson 734 — Why Differencing and Detrending Matter Lesson 1209 — Outlier Detection and Investigation
Visual inspection first: Does your plot show obvious trend or changing variance?; Lesson 718 — Interpreting Stationarity Test Results
Visual methods: (histograms, density plots, Q-Q plots) give you the *intuitive picture*.; Lesson 210 — Combining Visual and Statistical Methods Lesson 377 — Testing Normality: Visual Methods
Visual proof: A chart that makes the trend immediately visible; Lesson 1946 — Supporting Your Claims with Evidence
Visual separation: of confidence bands (non-overlapping suggests real differences); Lesson 817 — Comparing Multiple Survival Curves
Visual storytelling: Plots, dashboards, or interactive demos; Lesson 2141 — Building a Portfolio and Personal Brand
visualization: and **description**.; Lesson 817 — Comparing Multiple Survival Curves Lesson 1656 — Visualizing Retention Curves
Visualization decisions: (bar charts vs.; Lesson 18 — Numerical Variables: Discrete and Continuous
Visualization tools work smoothly: Libraries like Pandas and plotting tools expect tidy structure; Lesson 1142 — What is Tidy Data?
Visualizations: Use bar charts comparing segment characteristics side-by-side, box plots showing distributions of key metrics within segments, or radar charts displaying multiple dimensions simultaneously.; Lesson 1709 — Segment Profiling and Interpretation
Visualizations over tables: charts speak louder than numbers; Lesson 2091 — Stage 7: Communication and Handoff
Visualize: what "sampling variability" really means; Lesson 259 — Simulating Sampling Distributions
Visualize demographics: Plot key characteristics of your sample against the population.; Lesson 250 — Strategies for Bias Detection and Mitigation
Vital Interests: Lesson 1906 — Legal Bases for Processing Personal Data
Volume: refers to the sheer scale of data.; Lesson 1760 — Defining Big Data: The Three Vs Lesson 2086 — Stage 2: Data Acquisition and Assessment
Volume spike: Multiple complaints in a short window; Lesson 1673 — Leading Indicators of Churn
Volume/Cubes: (e.; Lesson 1232 — Perceptual Accuracy Hierarchy
Voluntariness: Lesson 1913 — Elements of Valid Consent
Voluntary: No coercion or pressure.; Lesson 1912 — What is Informed Consent in Data Science?
Voluntary churn: happens when customers actively choose to leave.; Lesson 1670 — What is Churn and Why It Matters
Volunteer bias: (also called **self-selection bias**) occurs when people choose whether or not to participate in a study, and those who volunteer differ in important ways from those who don't.; Lesson 246 — Volunteer and Self-Selection Bias
VP and above: Org-wide vision, resource allocation, executive influence; Lesson 2140 — Individual Contributor vs Management Tracks
Vulnerability: Over-reliance on power users means losing a few hurts badly; Lesson 1698 — Power User Curves and Engagement Distribution
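
The VIF entries above give interpretation bands but not the calculation. The standard definition is VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on the other predictors. The sketch below covers only the simplest case, two predictors, where R_j² equals the squared Pearson correlation between them. The function names, the simulated data, and the handling of the band edges at exactly 5 and 10 are assumptions made for illustration.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


def vif_band(vif):
    """Map a VIF to the glossary's bands (edge handling at 5 and 10 is an assumption)."""
    if vif > 10:
        return "severe"
    if vif >= 5:
        return "high"
    return "moderate or none"


random.seed(0)
x1 = [random.gauss(0, 1) for _ in range(500)]
x_independent = [random.gauss(0, 1) for _ in range(500)]    # unrelated to x1
x_near_copy = [v + 0.1 * random.gauss(0, 1) for v in x1]    # x1 plus small noise

vif_independent = vif_two_predictors(x1, x_independent)
vif_near_copy = vif_two_predictors(x1, x_near_copy)
```

Here `vif_independent` should land close to 1 and `vif_near_copy` well above 10. With three or more predictors the pairwise correlation is no longer enough, and each R_j² has to come from a full regression of predictor j on all the others.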

W

W-shaped attribution model: recognizes that not all touchpoints are equally important.; Lesson 1730 — W-Shaped Attribution Model
WAIC: (Widely Applicable Information Criterion) or **LOO** (Leave-One-Out cross-validation) to compare them; Lesson 1596 — Posterior Predictive Checks and Model Comparison
Wait for external conditions: before proceeding; Lesson 1836 — Task Dependencies and Flow Control
Wald Test: Lesson 830 — Testing Coefficient Significance
Wald tests: with **z-statistics** (because we're using maximum likelihood estimation, not least squares).; Lesson 683 — Hypothesis Tests for Individual Coefficients
Warning signs: before model fitting goes wrong; Lesson 584 — Correlation Matrices for Predictors
Warning/Email: Elevated error rate, slower performance, approaching thresholds—investigate during business hours; Lesson 1858 — Alerting Strategies
Warranty planning: Understanding failure patterns helps set optimal warranty periods; Lesson 188 — Weibull Distribution: Hazard Function and Reliability
Wasted computational resources: on variables that don't add value; Lesson 1197 — Identifying Variable Importance and Redundancy
Wasted effort: Including redundant features adds complexity without improving predictions; Lesson 513 — Applications: Feature Selection and Multicollinearity
Wasted resources: on flawed approaches; Lesson 34 — Recognizing Boundaries of Competence Lesson 1518 — The Relationship Between Surrogate and Business Metrics
Wasted space: Many columns contain `NULL` for half the rows; Lesson 1148 — Handling Multiple Types in One Table
Watch for hesitation: If someone pauses, squints, or re-reads labels, you've found friction.; Lesson 1964 — Testing Visualizations with Audiences
Watch Time: (or Listen Time): Total hours users spend consuming content.; Lesson 1635 — Media and Content Metrics: Watch Time and Content Performance
WCAG 2.1 Level AA: compliance.; Lesson 1254 — Testing Visualizations for Accessibility
Weak: "Improve customer satisfaction"; Lesson 1610 — Defining Measurable Key Results
Weak or no relationships: appear as values near 0, suggesting variables are independent of each other.; Lesson 511 — Reading and Interpreting Correlation Matrices
Weakly Informative Prior: Use `Beta(2, 20)` or similar if you expect roughly 10% conversion but aren't certain.; Lesson 1581 — Setting Priors for A/B Tests
Weakly informative priors: gently guide the analysis away from unrealistic values (like 99% conversion) without imposing strong opinions.; Lesson 1534 — The Prior Distribution Lesson 1559 — Uninformative and Weakly Informative Priors Lesson 1565 — Prior Distributions for Normal Means
Wealth distribution: A few people hold most wealth; Lesson 190 — The Pareto Distribution: Heavy Tails and Power Laws Lesson 191 — Pareto Principle and the 80/20 Rule
Weaponization: A facial recognition system built for user authentication could be repurposed for mass surveillance or stalking.; Lesson 1920 — Anticipating Misuse of Data Products
Wear-out failures: (aging).; Lesson 189 — Fitting Weibull Models to Lifetime Data
Web Mercator: What you see in Google Maps and most web applications; Lesson 1308 — Geographic Data Types and Coordinate Systems
Web scraping: Extracting information from web pages; Lesson 11 — Data Collection and Acquisition Lesson 21 — APIs and Web Scraping
Web sources: Websites, social media, online reviews; Lesson 11 — Data Collection and Acquisition
Web traffic: Marketing campaigns or service outages; Lesson 1412 — What is Change-Point Detection?
Web traffic analysis: Detecting unusual spikes beyond typical weekday/weekend patterns or holiday seasons; Lesson 1411 — Applications and Limitations
Web UI: Visual dashboard for monitoring pipelines; Lesson 1833 — Introduction to Apache Airflow
Website Traffic: Your blog gets an average of 8 visits per hour.; Lesson 144 — Poisson Applications: Arrivals and Events Lesson 190 — The Pareto Distribution: Heavy Tails and Power Laws Lesson 191 — Pareto Principle and the 80/20 Rule Lesson 421 — Applications: Uniform, Genetic Ratios, and Distributions Lesson 746 — Choosing Seasonal Period
Website traffic and sales: Marketing spend might drive both independently; Lesson 1423 — The Third Variable Problem Lesson 1424 — Reverse Causality
Week 4 retention: for January cohort: 45%; Lesson 1650 — Comparing Cohorts Over Time
Weekly cycles: Retail sales peak on weekends, drop on Mondays; Lesson 707 — Seasonality: Regular Periodic Patterns Lesson 1484 — Duration and Timing Considerations
Weibull: extends exponential by allowing failure rates to change over time (shape parameter).; Lesson 193 — Choosing Between Distributions in Practice
Weight by predictive power: use churn models or LTV correlations to guide weights; Lesson 1699 — Engagement Scoring Systems
Weight your data: If you can't get a perfect sample, assign weights to underrepresented groups so they count more in your analysis—this mathematically corrects for imbalance.; Lesson 250 — Strategies for Bias Detection and Mitigation
weighted average: of past observations, where recent values matter more than older ones.; Lesson 757 — Introduction to Exponential Smoothing Lesson 1566 — Conjugate Normal-Normal Model
Welch's: approach.; Lesson 364 — Degrees of Freedom in Two-Sample Tests
Welch's ANOVA: Handles unequal variances without requiring transformations; Lesson 470 — When Parametric ANOVA Assumptions Fail
Welch's t-test: method and doesn't assume equal variances.; Lesson 285 — Pooled vs Unpooled Variance Approaches Lesson 362 — Welch's t-Test for Unequal Variances Lesson 363 — Testing Equality of Variances Lesson 379 — The Assumption of Equal Variances (Homoscedasticity) Lesson 380 — Testing Equal Variances: Levene's and Bartlett's Tests
what: R-squared is and **how** to calculate it, the critical question becomes: *what does the number actually mean?*; Lesson 533 — Interpreting R-Squared Values Lesson 1346 — The Grammar vs Traditional Plotting Lesson 1830 — Documentation and Metadata Management Lesson 1861 — Monitoring Tools and Dashboards Lesson 1948 — The Recommendation Slide: Making It Actionable Lesson 2023 — Creating a Pull Request
What automated decisions: involve their data (if any); Lesson 1908 — Data Subject Access Requests (DSARs)
what changed: , **why you changed it**, and **what assumptions you made** at each processing step.; Lesson 1162 — Documenting Transformations Lesson 1955 — Framing Insights in Business Language
What data: you hold about them (copy of all personal data); Lesson 1908 — Data Subject Access Requests (DSARs)
What did you find: State the key insight in one clear sentence.; Lesson 1944 — Executive Summary Best Practices
What follow-up analyses: you'll run based on different outcomes; Lesson 1204 — From Hypothesis to Analysis Plan
what happened: (facts) from **the context** (dimensions).; Lesson 956 — Star Schema Joins Lesson 1675 — Churn Attribution and Root Cause Analysis
What it means: Your residuals have more extreme values (outliers) than a normal distribution would predict.; Lesson 567 — Common Q-Q Plot Patterns: Heavy Tails and Light Tails
What metric defines success: (e.; Lesson 1167 — Identifying Success Criteria
What should we do: Give the top 1–3 recommendations.; Lesson 1944 — Executive Summary Best Practices Lesson 1955 — Framing Insights in Business Language
What they actually test: Whether two groups have **identical distributions**.; Lesson 394 — Interpreting Rank-Based Tests: Medians vs Distributions
What this means: Hat values range from `1/n` to 1.; Lesson 573 — Calculating and Interpreting Hat Values
What threshold constitutes improvement: (e.; Lesson 1167 — Identifying Success Criteria
What timeframe matters: (e.; Lesson 1167 — Identifying Success Criteria
What to do: (recommendation); Lesson 1952 — The Pyramid Principle: Leading with Conclusions
What validation approach: you'll apply; Lesson 1204 — From Hypothesis to Analysis Plan
What variables: you'll analyze; Lesson 1204 — From Hypothesis to Analysis Plan
What would constitute evidence: for or against your hypothesis; Lesson 1204 — From Hypothesis to Analysis Plan
What's the impact: Quantify the business outcome.; Lesson 1944 — Executive Summary Best Practices
WhatsApp: Number of messages sent — directly measures the utility users get from communication.; Lesson 1606 — Examples of North Star Metrics by Industry
when: it happened.; Lesson 19 — Temporal Data and Time Series Lesson 838 — Subscription and Membership Duration Modeling Lesson 840 — Loan Default Timing and Credit Risk Lesson 841 — Campaign Response Time Analysis Lesson 1111 — Autocommit Mode vs Explicit Transactions Lesson 1162 — Documenting Transformations Lesson 1850 — Retry Strategies Lesson 1948 — The Recommendation Slide: Making It Actionable
When differences emerge: (curves may start together then diverge); Lesson 817 — Comparing Multiple Survival Curves
When duplicates are meaningful: Combining sales records, event logs, or time-series data where each row represents a distinct occurrence; Lesson 1000 — UNION ALL: Preserving Duplicates
When satisfied: Your β₀ and β₁ estimates are **unbiased**—on average, they hit the true population values.; Lesson 552 — Zero Conditional Mean of Errors
When to pin exactly: Lesson 2050 — Pinning Versions vs Flexible Ranges
When to shift focus: Once flattened, optimize retention earlier in the curve rather than fighting churn at the tail; Lesson 1658 — Flattening and Asymptotic Behavior
When to use: Simple tabular data, easy human readability, compatibility with almost any tool.; Lesson 22 — File Formats: CSV, JSON, and Beyond Lesson 453 — Transformations to Meet Assumptions Lesson 1645 — Types of Cohorts: Acquisition vs Behavioral Lesson 1828 — Incremental vs Full Load Strategies
When to use it: Lesson 44 — Geometric and Harmonic Means
When to use ranges: Lesson 2050 — Pinning Versions vs Flexible Ranges
When to use which: Report eta-squared for descriptive purposes with your current sample; use omega-squared when making inferences about population-level effects.; Lesson 445 — Effect Size: Eta-Squared and Omega-Squared
When violated: Predictions are systematically wrong at certain X ranges; coefficient estimates are misleading.; Lesson 552 — Zero Conditional Mean of Errors
When you reject H₀: Lesson 356 — Making Decisions and Stating Conclusions
where: you are along the X-axis.; Lesson 659 — Interpreting Polynomial Regression Coefficients Lesson 896 — GROUP BY Execution Order Lesson 898 — HAVING Clause Fundamentals Lesson 899 — HAVING vs WHERE: Key Differences Lesson 903 — Combining WHERE and HAVING Lesson 912 — Fundamental Difference: Filter Timing Lesson 1908 — Data Subject Access Requests (DSARs) Lesson 2137 — Refactoring Strategies and Debt Paydown
WHERE filters first: It eliminates individual rows from the raw table before any grouping or aggregation happens; Lesson 915 — Combining WHERE and HAVING
Where the center is: (median); Lesson 59 — The Five-Number Summary and Box Plots
Which columns: have missing values?; Lesson 1207 — Missing Data Assessment and Strategy
Which specific pairs: are problematic; Lesson 584 — Correlation Matrices for Predictors
White noise: is purely random data with no temporal structure.; Lesson 724 — ACF Patterns for Different Processes Lesson 786 — ACF and PACF of Residuals Lesson 799 — Fitting and Diagnosing SARIMA Models
Who: you've shared it with (recipients or categories); Lesson 1908 — Data Subject Access Requests (DSARs) Lesson 1948 — The Recommendation Slide: Making It Actionable
Who bears the cost: Sometimes the aggregate accuracy loss is small, but one subgroup's performance drops significantly.; Lesson 1891 — Fairness-Accuracy Tradeoffs
Who drives value: Are 20% of users responsible for 70% of activity?; Lesson 1698 — Power User Curves and Engagement Distribution
Who might challenge this: Peers and auditors need enough detail to validate your rigor.; Lesson 1947 — Handling Methodology and Technical Details
Why: The chi-squared distribution is an *approximation* that only works well when expected counts are sufficiently large.; Lesson 426 — Assumptions and Sample Size Requirements Lesson 1162 — Documenting Transformations Lesson 1675 — Churn Attribution and Root Cause Analysis Lesson 1830 — Documentation and Metadata Management Lesson 1908 — Data Subject Access Requests (DSARs) Lesson 2023 — Creating a Pull Request Lesson 2137 — Refactoring Strategies and Debt Paydown
Why it matters: With finite populations, sampling without replacement affects probabilities as you go.; Lesson 233 — Populations in Practice Lesson 541 — Properties of Residuals
Why it works: It handles multiple predictors simultaneously, quantifies each feature's impact, and produces interpretable coefficients.; Lesson 1674 — Churn Prediction Models
Why it's powerful: Lesson 1079 — B-Tree Indexes: Structure and Mechanics
Why this works: The regression "controls for" the intermediate lags, removing their influence and revealing only the direct relationship.; Lesson 729 — Calculating Partial Autocorrelations
Why you're confident: (key evidence); Lesson 1952 — The Pyramid Principle: Leading with Conclusions
Wide confidence bands: High uncertainty (small risk set); Lesson 815 — Survival Curve Plots and Interpretation
Wide format: spreads observations across multiple columns.; Lesson 1144 — Common Violations: Wide vs Long Format Lesson 1145 — Pivoting Data Longer (Melt)
Widely understood: The standard language for discussing variability across fields; Lesson 49 — Standard Deviation: Interpretable Spread
Wilcoxon Signed-Rank Test: improves on this by incorporating the **size** of differences while remaining non-parametric (no normality assumption required).; Lesson 392 — Wilcoxon Signed-Rank Test
Wilcoxon test: (also called Breslow test) weights earlier time points more heavily because more subjects are at risk early on.; Lesson 823 — Log-Rank Test vs Other Tests
Win-back: strategies target customers who've already churned, while **retention** strategies aim to prevent at-risk customers from leaving in the first place.; Lesson 1676 — Win-Back and Retention Strategies
Win-Back Candidates: A subset of churned customers worth targeting for reactivation—perhaps they left for fixable reasons or represent high LTV potential.; Lesson 1704 — Customer Lifecycle Stages
Wireframe plots: show the underlying grid structure more clearly and reduce visual clutter when you need to see through the surface or understand the data's resolution.; Lesson 1325 — 3D Surface and Wireframe Plots
with: observation i included: ŷᵢ; Lesson 576 — DFFITS: Influence on Fitted Values Lesson 990 — Basic CTE Syntax and Structure
With adjustment: Include age in your regression model.; Lesson 1431 — Controlling for Confounders: Adjustment
With AVG: Lesson 900 — Using HAVING with Aggregate Functions
With COUNT: Lesson 900 — Using HAVING with Aggregate Functions
With CTE: Lesson 989 — What are Common Table Expressions (CTEs)?
With MIN/MAX: Lesson 900 — Using HAVING with Aggregate Functions
With ownership: , you get:; Lesson 1619 — What is Metric Ownership?
With partitions: Lesson 1007 — ROW_NUMBER(): Assigning Unique Row Numbers
With SUM: Lesson 900 — Using HAVING with Aggregate Functions
Within Groups: (or "Error"): Variation due to random differences within groups; Lesson 444 — The ANOVA Table
Within-Group Variability: Lesson 446 — Power and Sample Size for ANOVA
Within-group variance (denominator): Measures the average variability within each group (pooled across all groups); Lesson 440 — The F-Statistic and Its Distribution
without: observation i: ŷᵢ₍ᵢ₎; Lesson 576 — DFFITS: Influence on Fitted Values Lesson 825 — What is the Cox Proportional Hazards Model?
Without adjustment: Exercise appears negatively associated with blood pressure, but is that real or just because older people do both less?; Lesson 1431 — Controlling for Confounders: Adjustment
Without manipulation: means doing so honestly, proportionally, and with full context—not weaponizing emotion to bypass critical thinking or hide inconvenient truths.; Lesson 1941 — Emotional Connection Without Manipulation
Without manual oversight: (weekends, holidays, overnight); Lesson 1831 — What is Job Scheduling?
Without ownership: , metrics suffer:; Lesson 1619 — What is Metric Ownership?
Without partitions: Lesson 1007 — ROW_NUMBER(): Assigning Unique Row Numbers
Word frequency: A few words appear constantly; most are rare; Lesson 190 — The Pareto Distribution: Heavy Tails and Power Laws
Work with ordinal data: (survey ratings like "good, better, best"); Lesson 486 — Spearman's Rank Correlation Coefficient
Working Directory: Your desk where you're actively working on documents; Lesson 1993 — The Three States: Working Directory, Staging, Repository
Working sessions: Bi-weekly meetings to review preliminary findings and get rapid feedback; Lesson 2104 — Communication Cadence and Updates
Working with date ranges: Lesson 1040 — Date Arithmetic and INTERVAL Operations
Worst-case scenarios: Can you survive the potential losses?; Lesson 152 — Decision Making Under Uncertainty
Write complexity: Every time underlying data changes, you must update the aggregate; Lesson 1073 — Storing Computed Values and Aggregates Lesson 1075 — Handling Data Consistency in Denormalized Schemas
Write operation cost: How much slower are inserts and updates?; Lesson 1077 — Measuring Performance Impact of Denormalization
Wrong: This ignores the base rate.; Lesson 110 — Base Rate Fallacy Lesson 313 — Common Pitfalls in Hypothesis Formulation Lesson 1103 — The Dangers of String Formatting in SQL
Wrong Coefficient Signs: Lesson 581 — Symptoms of Multicollinearity
Wrong data type: Applying `AVG()` to non-numeric columns causes errors; Lesson 884 — AVG: Computing Averages
Wrong interpretation: "Going to the hospital makes people sick."; Lesson 496 — Reverse Causality
Wrong period: Seasonal spikes get flagged as false positives, or real anomalies blend into "normal" variation; Lesson 1409 — Setting Detection Parameters
Wrong summary: Using mean when data has outliers (better: median); Lesson 1245 — Misleading Aggregations and Binning
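
The Welch's t-test entry above notes that the method does not assume equal variances. As a minimal sketch of what that means in practice, the snippet below computes the Welch t-statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. The function name `welch_t` and the sample data are illustrative assumptions, not taken from the course material.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test.

    `a` and `b` are sequences of numbers with at least two values each.
    """
    na, nb = len(a), len(b)
    # Sample variances (n - 1 in the denominator), kept separate per group
    # rather than pooled, which is what distinguishes Welch's approach.
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1)
    )
    return t, df


# Hypothetical data: the second group is clearly more variable.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

The degrees of freedom come out as a non-integer that is smaller than the pooled value of `na + nb - 2`, which is how the test accounts for unequal variances.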

X

x̄: (x-bar): Sample mean — the average of *your specific sample*; Lesson 232 — Notation Conventions Lesson 269 — Confidence Interval Formula for One Mean Lesson 353 — Calculating the t-Statistic Lesson 520 — Computing β₀: The Intercept Estimate Lesson 1391 — The Grubbs' Test Statistic
X → Y: (causal path) and **Z → X → Y** plus **Z → Y** (a confounder creating a backdoor path **X ← Z → Y**), controlling for **Z** blocks the backdoor while preserving the causal arrow.; Lesson 1472 — The Backdoor Criterion
X value: but may fit the pattern perfectly.; Lesson 587 — Identifying Outliers in Regression Context
X-axis: Time periods since the initial event (Day 0, Day 7, Day 30, etc.); Lesson 1653 — What are Retention Curves?
X': The transpose of X (flip rows and columns); Lesson 598 — Estimating Coefficients with Least Squares
X(t): instead of just **X**.; Lesson 833 — Time-Varying Covariates
X₁, X₂, ..., X: Your predictor variables (independent variables); Lesson 596 — The Multiple Regression Equation

Y

Y value: given its X value—it doesn't follow the pattern of the other data points.; Lesson 587 — Identifying Outliers in Regression Context
Y-axis: Percentage of the original cohort still active (0-100%); Lesson 1653 — What are Retention Curves?
Y-units per X-unit: , and this determines how you communicate your findings.; Lesson 525 — Units and Scale in Interpretation
YAML: Human-readable, great for hierarchical settings; Lesson 2072 — Configuration Files vs Hard-Coded Values
YAML header: Metadata at the top specifying output format, title, author, and date; Lesson 1983 — R Markdown for Dynamic Reports
Yearly cycles: Ice cream sales peak every summer, heating costs rise every winter; Lesson 707 — Seasonality: Regular Periodic Patterns
Years of experience: may serve as an age proxy; Lesson 1883 — Protected Classes and Proxy Variables
You: stay focused on what actually matters to your audience; Lesson 1942 — The Pyramid Principle: Starting with the Conclusion
You can say: "Given our data and prior beliefs, there's a 95% probability the true conversion rate is between 45% and 74%."; Lesson 1562 — Credible Intervals for Proportions Lesson 1578 — Interpreting Credible Intervals
You CANNOT say: Lesson 1578 — Interpreting Credible Intervals
You have limited data: Conjugacy helps when likelihood is weak; Lesson 1556 — Choosing Between Conjugate and Non-Conjugate Priors
You look prepared: , not defensive; Lesson 1949 — Anticipating Questions: Building in Appendices
You miss real effects: Even if your new feature genuinely improves conversion by 2%, your test might conclude "no significant difference" simply because you didn't collect enough data.; Lesson 1529 — Running Underpowered Tests
You stay in control: of the narrative instead of improvising; Lesson 1949 — Anticipating Questions: Building in Appendices
You use NOT IN: with nullable columns (can miss results and run slowly); Lesson 966 — Performance Considerations for WHERE Subqueries
You want stable variance: Some transformations stabilize variance across different data ranges, meeting another key assumption; Lesson 211 — Why Transform Data to Normality?
You waste resources: Your engineering team built the feature, you split traffic for weeks, analyzed results.; Lesson 1529 — Running Underpowered Tests
You're building linear models: Transforming the response variable can improve model fit and prediction accuracy; Lesson 211 — Why Transform Data to Normality?
You're doing exploratory work: The interactive nature of Pandas in Jupyter makes rapid iteration easier than Spark's batch-oriented workflows.; Lesson 1787 — When to Optimize Pandas Instead
You're exploring: Early-stage analysis where perfect precision isn't critical; Lesson 1556 — Choosing Between Conjugate and Non-Conjugate Priors
Your branch's latest commit: The tip of your current branch; Lesson 2009 — Three-Way Merges
Your data is categorical/binary: Each observation falls into one of two categories (success/failure, yes/no, clicked/didn't click); Lesson 399 — When to Use the One-Sample Z-Test for Proportions
Your data is skewed: Income data, reaction times, or count data often pile up on one side; Lesson 211 — Why Transform Data to Normality?
Your fitted model: The GLM you actually built; Lesson 697 — Deviance: A Measure of Model Fit
Your operations are vectorized: Pandas, built on NumPy, excels at vectorized operations.; Lesson 1787 — When to Optimize Pandas Instead
Your outcome is categorical: predicting "yes/no" or categories requires logistic regression or classification methods instead; Lesson 555 — When Regression Is and Isn't Appropriate
Your own experience: Previous projects or work in adjacent fields; Lesson 1201 — Domain Knowledge as a Hypothesis Source
Your own historical cohorts: to track improvement; Lesson 1657 — Day-1, Day-7, Day-30 Benchmarks
Your sample size (n): Larger datasets have different thresholds; Lesson 1392 — Critical Values and Significance Testing
Your significance level (α): Typically 0.; Lesson 1392 — Critical Values and Significance Testing
Your significance level α: (and whether your test is one-tailed or two-tailed); Lesson 355 — Finding Critical Values and P-Values
Yule-Walker equations: .; Lesson 729 — Calculating Partial Autocorrelations

Z

Z → X: (third variable influences X); Lesson 1423 — The Third Variable Problem
Z → Y: (third variable influences Y); Lesson 1423 — The Third Variable Problem Lesson 1472 — The Backdoor Criterion
Z ≈ 0: The value is close to average; Lesson 1376 — What is the Z-Score Method?
Z_α/2: = critical value for your confidence level; Lesson 296 — Sample Size for Comparing Two Groups Lesson 1497 — Sample Size Formulas for Proportions Lesson 1498 — Sample Size Formulas for Continuous Metrics
Z_β: = critical value for your desired power; Lesson 296 — Sample Size for Comparing Two Groups Lesson 1497 — Sample Size Formulas for Proportions Lesson 1498 — Sample Size Formulas for Continuous Metrics
z-score: (or standard score) tells you how many standard deviations a data point is away from the mean.; Lesson 195 — Z-Score Definition and Interpretation Lesson 199 — Finding Percentiles with Z-Scores Lesson 200 — Comparing Values Across Different Distributions Lesson 1376 — What is the Z-Score Method? Lesson 1389 — What is Grubbs' Test?
z-score method: uses this to flag outliers: if a data point is *too many* standard deviations away from the mean, it's probably an outlier.; Lesson 71 — Z-Score Method for Outlier Detection Lesson 1386 — IQR Method vs Z-Score: When to Use Each
Z-scores: (which you'll learn to calculate soon) tell you how many standard deviations away from the mean you are.; Lesson 62 — Percentiles vs Z-Scores: Complementary Position Measures Lesson 1209 — Outlier Detection and Investigation
z-statistic: for proportions.; Lesson 402 — Calculating the Test Statistic for Proportions Lesson 683 — Hypothesis Tests for Individual Coefficients
Z-table: (also called a standard normal table) is a reference chart that shows cumulative probabilities for the standard normal distribution.; Lesson 198 — Using Z-Tables for Probability
Z-test: Used when you have large samples or known population variance; Lesson 1749 — Measuring Statistical Significance
z-test statistic: .; Lesson 409 — Z-Test Statistic for Two Proportions Lesson 410 — P-Value Calculation and Interpretation
Z-tests: for proportions determine if selection rate differences are statistically meaningful; Lesson 1890 — Measuring Disparate Impact
Zero: = normal distribution; Lesson 67 — Calculating Kurtosis Lesson 280 — Confidence Intervals for Difference in Proportions Lesson 539 — What Are Residuals? Lesson 984 — NOT EXISTS for Finding Missing Relationships
Zero residuals: (`e_i = 0`) mean your prediction was exactly correct (rare in practice!); Lesson 540 — The Residual Formula
Zero slope: X has no linear relationship with Y; Lesson 524 — The Meaning of the Slope
ZIP code: often correlates with race and income due to historical segregation patterns; Lesson 1883 — Protected Classes and Proxy Variables Lesson 1889 — Proxy Variables and Redlining
Zip code or address: → race, income, immigration status; Lesson 1889 — Proxy Variables and Redlining
Zombie users: Automated scripts or bots inflate counts without real engagement; Lesson 1694 — Daily Active Users (DAU) and Monthly Active Users (MAU)
Zoom controls: allow users to magnify regions of interest by scrolling or clicking-and-dragging, making dense visualizations navigable.; Lesson 1303 — Range Sliders and Zoom Controls
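
The z-score and z-score method entries above describe measuring how many standard deviations a value lies from the mean and flagging values that are too far away. A minimal sketch of that idea follows. The function names, the default threshold of 3, and the choice of the population standard deviation (`pstdev`) are assumptions made for illustration, so adjust them to match the lesson you are working from.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return the z-score of each value: (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD; use stdev() for sample SD
    if sigma == 0:
        raise ValueError("all values are identical, so z-scores are undefined")
    return [(x - mu) / sigma for x in values]


def flag_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds `threshold`."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Hypothetical data with mean 5 and population SD 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_scores(data))
print(flag_outliers(data, threshold=1.8))
```

With very small samples no point can reach a z-score of 3, so the example passes a lower threshold purely to show the mechanics.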