BiteSizedChunks.com: Learn one small thing at a time.

Course contents

Machine Learning and Deep Learning

1. Scalars, Vectors, and Matrices: Definitions
2. Vector Operations: Addition and Scalar Multiplication
3. Dot Product and Vector Similarity
4. Vector Norms and Distance Metrics
5. Matrix-Vector Multiplication
6. Matrix-Matrix Multiplication
7. Matrix Transpose and Symmetry
8. Identity Matrix and Matrix Inverse
9. Systems of Linear Equations
10. Linear Independence and Span
11. Basis and Dimension
12. Column Space and Null Space
13. Rank of a Matrix
14. Determinants and Their Properties
15. Trace of a Matrix
16. Eigenvalues and Eigenvectors: Definitions
17. Computing Eigenvalues and Eigenvectors
18. Eigendecomposition of Matrices
19. Diagonalization and Its Applications
20. Orthogonality and Orthonormal Vectors
21. Orthogonal Matrices and Their Properties
22. Singular Value Decomposition (SVD): Concept
23. Computing and Interpreting SVD
24. Matrix Approximation with SVD
25. Positive Definite and Semidefinite Matrices
26. Quadratic Forms
27. Matrix Calculus: Gradients of Matrix Expressions
28. Numerical Stability in Linear Algebra
29. Functions and Continuity
30. Limits: The Foundation of Derivatives
31. The Derivative Definition
32. Geometric Interpretation of Derivatives
33. Basic Differentiation Rules
34. Product and Quotient Rules
35. The Chain Rule
36. Derivatives of Exponential Functions
37. Derivatives of Logarithmic Functions
38. Derivatives of Trigonometric Functions
39. Higher-Order Derivatives
40. Implicit Differentiation
41. Partial Derivatives: Introduction
42. The Gradient Vector
43. Directional Derivatives
44. The Multivariable Chain Rule
45. Critical Points and Extrema
46. The Hessian Matrix
47. Second Derivative Test in Multiple Dimensions
48. Taylor Series and Approximations
49. L'Hôpital's Rule
50. The Jacobian Matrix
51. Integration Fundamentals
52. Numerical Differentiation
53. Sample Spaces and Events
54. Probability Axioms and Basic Rules
55. Conditional Probability
56. Independence of Events
57. Bayes' Theorem
58. Random Variables: Discrete and Continuous
59. Probability Mass Functions
60. Probability Density Functions
61. Cumulative Distribution Functions
62. Expectation and Mean
63. Variance and Standard Deviation
64. Common Discrete Distributions: Bernoulli and Binomial
65. Poisson Distribution
66. Uniform Distribution
67. Normal (Gaussian) Distribution
68. Exponential and Gamma Distributions
69. Joint Probability Distributions
70. Marginal and Conditional Distributions
71. Covariance and Correlation
72. Independence of Random Variables
73. Law of Large Numbers
74. Central Limit Theorem
75. Population vs Sample
76. Descriptive Statistics: Central Tendency
77. Descriptive Statistics: Spread and Variability
78. Percentiles and Quantiles
79. Covariance and Correlation
80. The Law of Large Numbers
81. Central Limit Theorem
82. Sampling Distributions
83. Point Estimation Fundamentals
84. Bias and Variance of Estimators
85. Maximum Likelihood Estimation
86. Method of Moments
87. Confidence Intervals
88. Bootstrap Resampling
89. Hypothesis Testing Framework
90. Type I and Type II Errors
91. Common Statistical Tests
92. Multiple Testing Correction
93. What is Mathematical Optimization?
94. Unconstrained vs Constrained Optimization
95. Local vs Global Optima
96. Convex Sets
97. Convex Functions
98. First-Order Optimality Conditions
99. Second-Order Optimality Conditions
100. The Gradient Descent Algorithm
101. Learning Rate and Step Size
102. Convergence Guarantees for Gradient Descent
103. Lipschitz Continuity and Smoothness
104. Strong Convexity
105. Stochastic Gradient Descent Basics
106. Momentum Methods
107. Newton's Method
108. Quasi-Newton Methods
109. Coordinate Descent
110. Constrained Optimization and Lagrange Multipliers
111. KKT Conditions
112. Subgradients and Non-Smooth Optimization

Lesson 9 of 3,538 · 1. Mathematical Foundations for Machine Learning · Free lesson

Systems of Linear Equations

Represent systems as Ax=b, understand when solutions exist, and connect to matrix rank.

What you'll learn: How to represent multiple linear equations as a single matrix equation and determine when solutions exist.

The Core Idea

Imagine you're trying to work out how many calories each ingredient adds to a mystery smoothie. You blend three different batches, each with a different known mix of strawberries, bananas, and protein powder, and you know the total calories of each batch. Each batch gives you one clue (an equation), and the calories per unit of each ingredient are the unknowns you need to solve for.

In mathematics, a set of linear equations that share the same unknowns is called a system of linear equations. Instead of writing the equations separately, we can represent the entire system compactly as:

Ax = b

Where:

A is the matrix containing all the coefficients (one row per equation, one column per unknown)
x is the vector of unknowns you're solving for
b is the vector of right-hand-side values, one per equation

For example, these two equations:

2x₁ + 3x₂ = 8
1x₁ + 4x₂ = 9

Become:

A = [[2, 3],    x = [x₁],    b = [8]
     [1, 4]]         [x₂]         [9]
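
As a minimal sketch of this example, the system can be written down and solved with plain Python lists. The helper names `matvec` and `solve_2x2` are hypothetical, chosen only for illustration, and Cramer's rule is just one convenient way to solve a 2×2 system; it is not prescribed by this lesson.

```python
from fractions import Fraction

# Coefficient matrix A and right-hand side b for the system
#   2*x1 + 3*x2 = 8
#   1*x1 + 4*x2 = 9
A = [[2, 3],
     [1, 4]]
b = [8, 9]

def matvec(A, x):
    """Multiply matrix A (a list of rows) by vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def solve_2x2(A, b):
    """Solve a 2x2 system with Cramer's rule (needs a nonzero determinant)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if det == 0:
        raise ValueError("determinant is zero: no unique solution")
    x1 = Fraction(b[0] * A[1][1] - A[0][1] * b[1], det)
    x2 = Fraction(A[0][0] * b[1] - b[0] * A[1][0], det)
    return [x1, x2]

x = solve_2x2(A, b)          # x1 = 1, x2 = 2
print(matvec(A, x) == b)     # True: multiplying A by x reproduces b
```

Substituting back confirms the result: 2(1) + 3(2) = 8 and 1(1) + 4(2) = 9.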

When Do Solutions Exist?

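Not every system has a solution! Every linear system falls into exactly one of three scenarios:

One unique solution – the equations intersect at exactly one point
Infinitely many solutions – the equations overlap, for example by describing the same line or plane
No solution – the equations contradict each other, for example parallel lines that never intersect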

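The key is matrix rank: the number of linearly independent rows (equivalently, columns) of a matrix. Let n be the number of unknowns, and let [A|b] be the augmented matrix formed by appending b to A as an extra column. A solution exists exactly when rank(A) = rank([A|b]). If that shared rank equals n, the solution is unique; if it is less than n, there are infinitely many solutions. If rank(A) < rank([A|b]), the equations contradict each other and there is no solution.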
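
The rank test can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not a production routine: `rank` and `classify` are hypothetical helper names, the rank is computed by Gaussian elimination on exact fractions (an assumption made here to avoid floating-point rounding), and the second and third example systems are made up for the demonstration.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) via Gaussian elimination on exact fractions."""
    rows = [[Fraction(v) for v in row] for row in M]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        # Look for a row at or below position r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate column c from every row below the pivot row.
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * p for a, p in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def classify(A, b):
    """Classify Ax = b as 'unique', 'infinite', or 'none' by comparing ranks."""
    augmented = [row + [b_i] for row, b_i in zip(A, b)]  # the matrix [A|b]
    rank_A, rank_Ab = rank(A), rank(augmented)
    n_unknowns = len(A[0])
    if rank_A < rank_Ab:
        return "none"
    return "unique" if rank_A == n_unknowns else "infinite"

print(classify([[2, 3], [1, 4]], [8, 9]))   # unique
print(classify([[1, 2], [2, 4]], [3, 6]))   # infinite (second equation is twice the first)
print(classify([[1, 2], [2, 4]], [3, 7]))   # none (parallel lines)
```

In practice a numerical library would usually compute the rank with a tolerance, since floating-point rounding can hide exact dependence between rows.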

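Key Takeaway: Writing a system of linear equations as Ax = b compresses multiple equations into one matrix equation. Whether a solution exists, and whether it is unique, depends on how the rank of A compares with the rank of the augmented matrix [A|b] and with the number of unknowns. This is a fundamental tool throughout machine learning for solving parameter optimization problems.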