Matrix-Matrix Multiplication
What you'll learn: How to multiply two matrices together and why this operation is fundamental to machine learning computations.
The Core Idea
Matrix-matrix multiplication combines two matrices to produce a third matrix. Think of it as performing many dot products at once: each element in the result comes from taking the dot product of a row from the first matrix with a column from the second matrix.
Just like matrix-vector multiplication (which you've already learned), there's a strict rule: the number of columns in the first matrix must equal the number of rows in the second matrix.
How It Works
If matrix A has dimensions (m × n) and matrix B has dimensions (n × p):
- The result C will have dimensions (m × p)
- Each element C[i,j] = dot product of row i from A with column j from B
For example, if A is 3×2 and B is 2×4, the result will be 3×4. You compute 12 dot products total (3 rows × 4 columns).
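To make the 3×2 times 2×4 case concrete, here is a minimal sketch in plain Python. The function name matmul and the sample values in A and B are assumptions chosen for illustration; matrices are stored as lists of rows.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows (illustrative helper)."""
    m, n = len(A), len(A[0])
    if len(B) != n:
        # Columns of the first matrix must equal rows of the second.
        raise ValueError(f"shape mismatch: A has {n} columns, B has {len(B)} rows")
    p = len(B[0])
    C = [[0] * p for _ in range(m)]
    for i in range(m):          # each row of A
        for j in range(p):      # each column of B
            # C[i][j] is the dot product of row i of A with column j of B.
            C[i][j] = sum(A[i][k] * B[k][j] for k in range(n))
    return C


A = [[1, 2],
     [3, 4],
     [5, 6]]            # 3 x 2
B = [[1, 0, 2, 1],
     [0, 1, 3, 2]]      # 2 x 4

C = matmul(A, B)        # 3 x 4, built from 12 dot products
for row in C:
    print(row)
# [1, 2, 8, 5]
# [3, 4, 18, 11]
# [5, 6, 28, 17]
```

Swapping the arguments (matmul(B, A)) raises the ValueError here, because B has 4 columns while A has 3 rows. In practice you would normally use a numerical library rather than nested loops; in NumPy, for instance, the same product is written A @ B.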
Why It Matters
Matrix-matrix multiplication applies the same transformation to many vectors at once. In machine learning, you'll often process entire batches of data together: if each row of the first matrix is one training example, multiplying by a weight matrix transforms every example in a single operation, and row i of the result is exactly what you would get by transforming example i on its own.
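The batch idea can be checked with a small sketch. The names X (a batch of 3 examples with 2 features each) and W (a weight matrix mapping 2 inputs to 3 outputs) and their values are made up for illustration, and matmul is a hypothetical helper, not a library function.

```python
def matmul(A, B):
    """Row-by-column product for matrices stored as lists of rows."""
    # zip(*B) iterates over the columns of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


X = [[1.0, 2.0],
     [0.0, 1.0],
     [3.0, -1.0]]         # 3 examples x 2 features
W = [[2.0, 0.0, 1.0],
     [1.0, 1.0, 0.0]]     # 2 inputs x 3 outputs

# Transform the whole batch with one matrix-matrix product.
batch_out = matmul(X, W)

# Transform each example separately (a 1 x 2 matrix times W).
one_at_a_time = [matmul([x], W)[0] for x in X]

print(batch_out)                    # [[4.0, 2.0, 1.0], [1.0, 1.0, 0.0], [5.0, -1.0, 3.0]]
print(batch_out == one_at_a_time)   # True
```

Both routes give the same numbers; the batched form simply expresses all of the per-example dot products as one operation, which optimized libraries and hardware can then execute efficiently.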
Computational Cost
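Multiplying an (m × n) matrix by an (n × p) matrix with the standard row-by-column method requires m × n × p individual multiplications: each of the m × p entries in the result is a dot product of length n. For two square n × n matrices that is n³ multiplications, so doubling n makes the work about 8 times larger. This rapid growth is why efficient matrix multiplication is crucial for training large neural networks.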
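You can confirm the m × n × p count by instrumenting the straightforward triple loop. This is a sketch only: matmul_counting and ones are hypothetical helpers, and the count applies to this schoolbook method, not to how optimized libraries actually organize the work.

```python
def matmul_counting(A, B):
    """Schoolbook product that also counts scalar multiplications."""
    m, n, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(m)]
    mults = 0
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
                mults += 1      # one scalar multiplication per inner step
    return C, mults


def ones(rows, cols):
    """Build a rows x cols matrix filled with 1s (illustrative test data)."""
    return [[1] * cols for _ in range(rows)]


_, small = matmul_counting(ones(3, 2), ones(2, 4))
_, sq4 = matmul_counting(ones(4, 4), ones(4, 4))
_, sq8 = matmul_counting(ones(8, 8), ones(8, 8))

print(small)        # 24  (3 * 2 * 4)
print(sq4, sq8)     # 64 512  (4**3 and 8**3)
print(sq8 // sq4)   # 8   (doubling the size costs 8x the multiplications)
```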
Key Takeaway: Matrix-matrix multiplication produces a new matrix by computing dot products between rows of the first matrix and columns of the second; the dimensions must be compatible (columns of first = rows of second), and the cost of m × n × p multiplications grows cubically when all three dimensions grow together.