BiteSizedChunks.com: Learn one small thing at a time.

Lesson 5 of 3,538 · 1. Mathematical Foundations for Machine Learning · Free lesson

Matrix-Vector Multiplication

Master multiplying matrices by vectors: mechanics, dimensions, and interpretation as linear transformations.

What you'll learn: How to multiply a matrix by a vector to transform data—a fundamental operation underlying predictions in machine learning.

What Is Matrix-Vector Multiplication?

When you multiply a matrix by a vector, you apply a linear transformation: the matrix can scale, rotate, shear, or project the vector, and it can change the number of dimensions. Each row of the matrix is combined with the vector through a dot product, producing one element of the output vector.

Think of it like a recipe card system: each recipe (matrix row) takes your ingredients (input vector) and combines them with specific weights to create one dish (output element).
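What makes the transformation "linear" can be checked numerically: transforming a weighted combination of two vectors gives the same result as transforming each vector first and combining afterwards. The sketch below uses a small helper, `matvec`, and made-up values for `A`, `u`, `v`, and `c`; none of these names or numbers come from the lesson.

```python
def matvec(A, x):
    """Row-by-row dot products: one output element per row of A."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Illustrative values only (assumptions, not taken from the lesson).
A = [[1, 2],
     [0, 3]]
u = [1, 0]
v = [2, 5]
c = 4

# Option 1: build the combination c*u + v, then transform it.
combined = [c * ui + vi for ui, vi in zip(u, v)]
left = matvec(A, combined)

# Option 2: transform u and v separately, then combine the results.
Au = matvec(A, u)
Av = matvec(A, v)
right = [c * a + b for a, b in zip(Au, Av)]

print(left)   # [16, 15]
print(right)  # [16, 15]
```

Both routes give the same vector, which is the defining property of a linear transformation.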

The Mechanics

Given a matrix A with dimensions m × n and a vector x with n elements:

1. Check dimensions: The number of columns in A must equal the length of x.
2. Compute each output element: The i-th element of the result equals the dot product of the i-th row of A with x.
3. Result shape: You get an output vector with m elements.
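The three steps above can be sketched in plain Python. This is a minimal illustration, assuming the matrix is stored as a list of rows; the function name `matvec` and the sample numbers are hypothetical, chosen only for this example.

```python
def matvec(A, x):
    """Multiply matrix A (a list of m rows, each of length n) by vector x (length n)."""
    n = len(x)
    # Check dimensions: every row must have as many entries as x has elements.
    for i, row in enumerate(A):
        if len(row) != n:
            raise ValueError(f"row {i} has {len(row)} entries, expected {n}")
    # Compute each output element: dot product of row i with x.
    # Result shape: one element per row, so the output has m elements.
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# A 3x2 matrix times a 2-element vector gives a 3-element vector.
result = matvec([[1, 2], [3, 4], [5, 6]], [10, 1])
print(result)  # [12, 34, 56]
```

Raising an error on a dimension mismatch mirrors the rule that the product is undefined when the number of columns and the vector length differ.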

Example

Matrix A (2×3):

[2  1  3]
[0  4  1]

Vector x (3 elements):

[1]
[2]
[3]

Result (2 elements):

First element: (2×1) + (1×2) + (3×3) = 2 + 2 + 9 = 13
Second element: (0×1) + (4×2) + (1×3) = 0 + 8 + 3 = 11

Output: [13, 11]
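The same example can be checked with NumPy, assuming it is installed. The `@` operator performs the matrix-vector product, and the shapes confirm the dimension rule: a 2×3 matrix times a 3-element vector yields a 2-element vector.

```python
import numpy as np

# Matrix A (2x3) and vector x (3 elements) from the example above.
A = np.array([[2, 1, 3],
              [0, 4, 1]])
x = np.array([1, 2, 3])

y = A @ x  # matrix-vector product

print(A.shape, x.shape, y.shape)  # (2, 3) (3,) (2,)
print(y)                          # [13 11]
```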

Why It Matters

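A single neuron in a neural network computes a dot product between its learned weights and the input features, so a whole layer of neurons performs a matrix-vector multiplication: each row of the weight matrix holds one neuron's weights. In practice the layer typically also adds a bias and applies a nonlinear activation. Stacking such layers is how a network turns raw input features into predictions.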
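As a rough sketch of how this looks for one layer, the code below treats each row of a weight matrix as one neuron. The bias vector `b`, the ReLU activation, the function name `dense_layer`, and all numeric values are assumptions added for illustration; the lesson itself only describes the weighted combination.

```python
def dense_layer(W, b, x):
    """One layer: each row of W is one neuron's weights (hypothetical setup)."""
    # Matrix-vector product plus bias: one pre-activation value per neuron.
    pre = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    # ReLU activation (an assumed choice): negative values become 0.
    return [max(0.0, z) for z in pre]


# Made-up weights and biases; in a real network these would be learned.
W = [[2, 1, 3],
     [0, 4, 1]]
b = [-10.0, -20.0]
x = [1, 2, 3]  # input features

print(dense_layer(W, b, x))  # [3.0, 0.0]
```

The list comprehension over the rows of `W` is the matrix-vector multiplication; the bias and activation are applied on top of it.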

Key Takeaway: Matrix-vector multiplication applies a linear transformation by taking dot products of each matrix row with the input vector—the dimensions must align (matrix columns = vector length), and the output vector has as many elements as the matrix has rows.