BiteSizedChunks.com: Learn one small thing at a time.

Course contents

Machine Learning and Deep Learning

1. Scalars, Vectors, and Matrices: Definitions
2. Vector Operations: Addition and Scalar Multiplication
3. Dot Product and Vector Similarity
4. Vector Norms and Distance Metrics
5. Matrix-Vector Multiplication
6. Matrix-Matrix Multiplication
7. Matrix Transpose and Symmetry
8. Identity Matrix and Matrix Inverse
9. Systems of Linear Equations
10. Linear Independence and Span
11. Basis and Dimension
12. Column Space and Null Space
13. Rank of a Matrix
14. Determinants and Their Properties
15. Trace of a Matrix
16. Eigenvalues and Eigenvectors: Definitions
17. Computing Eigenvalues and Eigenvectors
18. Eigendecomposition of Matrices
19. Diagonalization and Its Applications
20. Orthogonality and Orthonormal Vectors
21. Orthogonal Matrices and Their Properties
22. Singular Value Decomposition (SVD): Concept
23. Computing and Interpreting SVD
24. Matrix Approximation with SVD
25. Positive Definite and Semidefinite Matrices
26. Quadratic Forms
27. Matrix Calculus: Gradients of Matrix Expressions
28. Numerical Stability in Linear Algebra
29. Functions and Continuity
30. Limits: The Foundation of Derivatives
31. The Derivative Definition
32. Geometric Interpretation of Derivatives
33. Basic Differentiation Rules
34. Product and Quotient Rules
35. The Chain Rule
36. Derivatives of Exponential Functions
37. Derivatives of Logarithmic Functions
38. Derivatives of Trigonometric Functions
39. Higher-Order Derivatives
40. Implicit Differentiation
41. Partial Derivatives: Introduction
42. The Gradient Vector
43. Directional Derivatives
44. The Multivariable Chain Rule
45. Critical Points and Extrema
46. The Hessian Matrix
47. Second Derivative Test in Multiple Dimensions
48. Taylor Series and Approximations
49. L'Hôpital's Rule
50. The Jacobian Matrix
51. Integration Fundamentals
52. Numerical Differentiation
53. Sample Spaces and Events
54. Probability Axioms and Basic Rules
55. Conditional Probability
56. Independence of Events
57. Bayes' Theorem
58. Random Variables: Discrete and Continuous
59. Probability Mass Functions
60. Probability Density Functions
61. Cumulative Distribution Functions
62. Expectation and Mean
63. Variance and Standard Deviation
64. Common Discrete Distributions: Bernoulli and Binomial
65. Poisson Distribution
66. Uniform Distribution
67. Normal (Gaussian) Distribution
68. Exponential and Gamma Distributions
69. Joint Probability Distributions
70. Marginal and Conditional Distributions
71. Covariance and Correlation
72. Independence of Random Variables
73. Law of Large Numbers
74. Central Limit Theorem
75. Population vs Sample
76. Descriptive Statistics: Central Tendency
77. Descriptive Statistics: Spread and Variability
78. Percentiles and Quantiles
79. Covariance and Correlation
80. The Law of Large Numbers
81. Central Limit Theorem
82. Sampling Distributions
83. Point Estimation Fundamentals
84. Bias and Variance of Estimators
85. Maximum Likelihood Estimation
86. Method of Moments
87. Confidence Intervals
88. Bootstrap Resampling
89. Hypothesis Testing Framework
90. Type I and Type II Errors
91. Common Statistical Tests
92. Multiple Testing Correction
93. What is Mathematical Optimization?
94. Unconstrained vs Constrained Optimization
95. Local vs Global Optima
96. Convex Sets
97. Convex Functions
98. First-Order Optimality Conditions
99. Second-Order Optimality Conditions
100. The Gradient Descent Algorithm
101. Learning Rate and Step Size
102. Convergence Guarantees for Gradient Descent
103. Lipschitz Continuity and Smoothness
104. Strong Convexity
105. Stochastic Gradient Descent Basics
106. Momentum Methods
107. Newton's Method
108. Quasi-Newton Methods
109. Coordinate Descent
110. Constrained Optimization and Lagrange Multipliers
111. KKT Conditions
112. Subgradients and Non-Smooth Optimization

Lesson 6 of 3,538 · 1. Mathematical Foundations for Machine Learning

Matrix-Matrix Multiplication

Learn the rules for multiplying matrices, including dimension requirements and computational complexity.

What you'll learn: How to multiply two matrices together and why this operation is fundamental to machine learning computations.

The Core Idea

Matrix-matrix multiplication combines two matrices to produce a third matrix. Think of it as performing many dot products at once: each element in the result comes from taking the dot product of a row from the first matrix with a column from the second matrix.

Just like matrix-vector multiplication (which you've already learned), there's a strict rule: the number of columns in the first matrix must equal the number of rows in the second matrix.
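The compatibility rule can be checked from the shapes alone, before any arithmetic happens. The following is a minimal sketch in plain Python; the function name `can_multiply` and the `(rows, columns)` tuple convention are assumptions made for illustration, not part of any library.

```python
def can_multiply(shape_a, shape_b):
    """Return True if a matrix with shape_a can be multiplied by one with shape_b.

    Shapes are (rows, columns) tuples. This helper is hypothetical and
    exists only to illustrate the rule.
    """
    rows_a, cols_a = shape_a
    rows_b, cols_b = shape_b
    # Columns of the first matrix must equal rows of the second.
    return cols_a == rows_b


print(can_multiply((3, 2), (2, 4)))  # 2 columns vs 2 rows: compatible
print(can_multiply((2, 4), (3, 2)))  # 4 columns vs 3 rows: not compatible
```

Note that order matters: a 3×2 matrix can be multiplied by a 2×4 matrix, but swapping the two breaks the rule.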

How It Works

If matrix A has dimensions (m × n) and matrix B has dimensions (n × p):

The result C will have dimensions (m × p)
Each element C[i,j] = dot product of row i from A with column j from B

For example, if A is 3×2 and B is 2×4, the result will be 3×4. You compute 12 dot products total (3 rows × 4 columns).
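To make the row-times-column rule concrete, here is a small sketch that multiplies a 3×2 matrix by a 2×4 matrix using nested lists. The function name `matmul` and the sample values in `A` and `B` are made up for illustration; in practice you would normally rely on a numerical library rather than hand-written loops.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows (illustrative sketch)."""
    m, n = len(a), len(a[0])
    if len(b) != n:
        raise ValueError("columns of the first matrix must equal rows of the second")
    p = len(b[0])
    c = [[0] * p for _ in range(m)]
    for i in range(m):          # each row of a
        for j in range(p):      # each column of b
            # C[i][j] is the dot product of row i of a with column j of b.
            c[i][j] = sum(a[i][k] * b[k][j] for k in range(n))
    return c


A = [[1, 2],
     [3, 4],
     [5, 6]]           # 3 x 2 (sample values)
B = [[1, 0, 2, 1],
     [0, 1, 3, 1]]     # 2 x 4 (sample values)

C = matmul(A, B)
for row in C:
    print(row)
# [1, 2, 8, 3]
# [3, 4, 18, 7]
# [5, 6, 28, 11]
```

The result has 3 rows and 4 columns, so 12 entries, and each entry is one dot product of length 2.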

Why It Matters

Matrix-matrix multiplication lets you apply the same linear transformation to many vectors in a single operation. In machine learning, you'll often process entire batches of data at once: each row might represent one training example, and multiplying by a weight matrix transforms all examples in parallel.
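The batch idea can be illustrated with a tiny sketch: a batch matrix `X` with one example per row, and a weight matrix `W`. The names `X`, `W`, and `matmul`, along with all the numbers, are assumptions chosen for the example. The point it shows is that multiplying the whole batch gives the same rows as transforming each example on its own.

```python
def matmul(a, b):
    """Compact list-of-rows matrix product (illustrative sketch)."""
    # zip(*b) iterates over the columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]    # 2 training examples, 3 features each (sample values)
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]         # maps 3 features to 2 outputs (sample values)

# Transform the whole batch in one multiplication: (2 x 3) times (3 x 2) -> (2 x 2).
batch_out = matmul(X, W)

# Transform each example separately, as a 1 x 3 matrix, and collect the rows.
one_by_one = [matmul([example], W)[0] for example in X]

print(batch_out)                # [[4.0, 5.0], [10.0, 11.0]]
print(batch_out == one_by_one)  # True
```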

Computational Cost

Multiplying an (m × n) matrix by an (n × p) matrix with the standard row-by-column method requires m × n × p individual multiplications: n for each of the m × p dot products, plus a similar number of additions. This grows quickly with size, which is why efficient matrix multiplication is crucial for training large neural networks.
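The count can be checked by walking the same three nested loops that the standard method uses and tallying one multiplication per innermost step. This is a sketch only; `count_multiplications` is a hypothetical helper, and it counts operations for the straightforward algorithm, not for optimized library routines.

```python
def count_multiplications(m, n, p):
    """Count scalar multiplications in the standard (m x n) by (n x p) product."""
    count = 0
    for _ in range(m):          # rows of the first matrix
        for _ in range(p):      # columns of the second matrix
            for _ in range(n):  # terms in one dot product
                count += 1
    return count


print(count_multiplications(3, 2, 4))     # 24, which equals 3 * 2 * 4
print(count_multiplications(10, 10, 10))  # 1000
print(count_multiplications(20, 20, 20))  # 8000
```

Doubling every dimension of a square multiplication from 10 to 20 raises the count from 1,000 to 8,000, a factor of eight.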

Key Takeaway: Matrix-matrix multiplication produces a new matrix by computing dot products between rows of the first matrix and columns of the second. The dimensions must be compatible (columns of first = rows of second), and for two square n × n matrices the cost of the standard method scales cubically with n.