Tuesday, May 20, 2025

Understanding Singular Value Decomposition (SVD) and Its Implementation in Pure Python


Introduction to SVD


Singular Value Decomposition (SVD) is one of the most powerful and versatile techniques in linear algebra. It decomposes any matrix into three factors, revealing fundamental properties of the original matrix: its rank, null space, and range. SVD also underpins many techniques in machine learning and data analysis, from principal component analysis to recommender systems.

This article explores the theory behind SVD and presents a pure Python implementation that doesn't rely on external libraries like NumPy.


The Mathematics of SVD


For any matrix A of dimensions m×n, the SVD decomposes it as:


A = U · Σ · V^T


Where:

- U is an m×m orthogonal matrix whose columns are the left singular vectors

- Σ is an m×n rectangular diagonal matrix whose diagonal entries, the singular values, appear in descending order

- V^T is the transpose of an n×n orthogonal matrix V whose columns are the right singular vectors


The singular values (the diagonal entries of Σ) represent the "stretching factors" of the transformation represented by matrix A. They reveal the rank of the matrix and provide a measure of how information is distributed across different dimensions.
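
As a quick worked example: the diagonal matrix A = [[3, 0], [0, 1]] is already in SVD form, with U = I, Σ = diag(3, 1), and V = I. Its singular values are 3 and 1, meaning the transformation stretches the first axis by a factor of 3 and leaves the second unchanged.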


Applications of SVD


Before diving into the implementation, it's worth understanding why SVD is so valuable:


1. Dimensionality Reduction: By keeping only the largest singular values and their corresponding vectors, we can create low-rank approximations of matrices (the foundation of techniques like PCA).


2. Image Compression: SVD allows us to represent images using fewer coefficients while preserving essential features.


3. Recommendation Systems: The Netflix Prize competition demonstrated how matrix factorization methods derived from SVD can be effective for collaborative filtering.


4. Noise Reduction: By eliminating components associated with small singular values, we can often reduce noise in data.


5. Pseudoinverse Calculation: SVD provides a numerically stable way to compute the Moore-Penrose pseudoinverse of a matrix (a minimal sketch follows this list).
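
As an illustration of the last point, here is a minimal pure-Python sketch of the pseudoinverse built from an SVD. It assumes an svd routine like the one developed below, returning U, S, and Vt with A = U · S · Vt; the name pinv_from_svd and the tol parameter are invented for this example.

def pinv_from_svd(U, S, Vt, tol=1e-10):
    """Moore-Penrose pseudoinverse A+ = V * S+ * U^T, given A = U * S * Vt."""
    m, n = len(S), len(S[0])
    # S+ is n x m: invert singular values above the tolerance, zero the rest
    S_pinv = [[0.0] * m for _ in range(n)]
    for i in range(min(m, n)):
        if S[i][i] > tol:
            S_pinv[i][i] = 1.0 / S[i][i]
    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
                 for j in range(len(Y[0]))] for i in range(len(X))]
    V = [[Vt[j][i] for j in range(n)] for i in range(n)]   # V = Vt transposed
    Ut = [[U[j][i] for j in range(m)] for i in range(m)]   # U transposed
    return matmul(matmul(V, S_pinv), Ut)

Inverting only the singular values above the tolerance is exactly what makes this computation stable for rank-deficient matrices.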


Understanding Our Implementation


Our Python implementation demonstrates how to compute the SVD using only the Python standard library (the math and random modules). Let's examine the key components:


The Power Iteration Method


At the heart of our implementation is the power iteration method, an algorithm for finding the dominant eigenvalue and eigenvector of a matrix. For SVD, we apply it implicitly to A·A^T or A^T·A (the code multiplies by A and A^T in alternation rather than forming either product) to find the singular values and vectors.


The power method works by repeatedly multiplying a vector by the matrix and normalizing, which causes the vector to converge to the eigenvector corresponding to the largest eigenvalue:


# Power iteration simplified
x = random_vector
for _ in range(iterations):
    y = matrix_multiply(A, x)
    x = normalize(y)
# x is now (approximately) the dominant eigenvector
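
The same loop, written so it can be run as-is (a self-contained sketch; the name power_iteration and the symmetric test matrix are chosen for this example):

import random

def power_iteration(M, iterations=100):
    # M is a square matrix given as a list of lists
    n = len(M)
    x = [random.random() for _ in range(n)]
    for _ in range(iterations):
        y = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]  # y = M * x
        norm = sum(val * val for val in y) ** 0.5
        if norm == 0:
            break
        x = [val / norm for val in y]
    return x  # approximately the dominant eigenvector of M

# The dominant eigenvector of [[2, 1], [1, 2]] is a multiple of [1, 1]
print(power_iteration([[2, 1], [1, 2]]))

For the SVD below, the matrix being iterated on is A·A^T or A^T·A, whose dominant eigenvector is the leading left or right singular vector.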



Matrix Deflation


After finding each singular triplet (a singular value σ with its corresponding left and right singular vectors u and v), we subtract that rank-one component from the working matrix, A ← A − σ · u · v^T, a step known as deflation:



# Deflation: subtract the rank-one component sigma * u * v^T
for j in range(m):
    for l in range(n):
        A_work[j][l] -= sigma * u[j] * v[l]



This allows us to find subsequent singular triplets in order of decreasing singular values.


Adaptive Dimensionality


The implementation also chooses, based on the matrix dimensions, which side of the decomposition the power iteration tracks:



use_AAt = m >= n



Neither product is ever formed explicitly: each power-iteration step applies A and A^T in turn, at cost O(m·n) regardless of the branch taken. The flag decides whether the iteration state (and the orthogonalization against previously found singular vectors) lives on the left side, in R^m, or on the right side, in R^n.
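
To make this concrete, one application of A·A^T to a vector can be computed as two matrix-vector products, never building the m×m product (an illustrative sketch; aat_apply is not part of the listing below):

def aat_apply(A, u):
    """Compute (A * A^T) * u as A * (A^T * u), avoiding the m x m Gram matrix."""
    m, n = len(A), len(A[0])
    v = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]      # v = A^T * u
    return [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]   # A * v

Each call costs O(m·n), whereas forming A·A^T up front would cost O(m²·n).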


Completing Orthonormal Bases


For matrices that are not full rank (like our rank-2 test matrix [1,2,3; 4,5,6; 7,8,9]), we need to complete the orthonormal bases for both U and V:



# Complete basis for U if necessary
while len(U_vectors) < m:
    u = random_orthogonal_vector(U_vectors, m)
    U_vectors.append(u)



This ensures that U and V are proper orthogonal matrices as required by the SVD definition.
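
The completion step is one pass of Gram-Schmidt against the vectors found so far. Here is a standalone variant of the helper (unlike the version in the full listing, this sketch retries if the random draw lands, numerically, inside the span of the existing vectors):

import random

def random_orthogonal_unit_vector(vectors, size, tol=1e-10):
    # Draw a random vector and project out every existing direction
    v = [random.random() for _ in range(size)]
    for w in vectors:
        dot = sum(v[i] * w[i] for i in range(size))
        v = [v[i] - dot * w[i] for i in range(size)]
    norm = sum(x * x for x in v) ** 0.5
    if norm < tol:
        return random_orthogonal_unit_vector(vectors, size, tol)  # unlucky draw; retry
    return [x / norm for x in v]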


Technical Challenges and Solutions


Implementing SVD without specialized libraries presented several challenges:


Numerical Stability


Floating-point arithmetic errors can accumulate in iterative methods. We addressed this with several techniques:


1. Explicit orthogonalization against previous vectors

2. Careful normalization with tolerance checks

3. Convergence criteria based on vector differences rather than just eigenvalue estimates (see the sketch after this list)
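
The third point is worth a concrete sketch: comparing successive iterates directly also catches the case where the eigenvalue estimate has stabilized while the vector is still drifting. The pattern, extracted into a standalone helper (step stands in for one multiply-and-normalize pass; the name is invented for this example):

def iterate_until_stable(step, x, max_iterations=100, tolerance=1e-10):
    for _ in range(max_iterations):
        x_new = step(x)
        # Compare the iterates themselves, not just the eigenvalue estimate
        if sum((a - b) ** 2 for a, b in zip(x_new, x)) < tolerance:
            return x_new
        x = x_new
    return x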


Handling Rank-Deficient Matrices


Many matrices in practice aren't full rank. Our implementation handles this by:


1. Detecting near-zero singular values based on a tolerance parameter (see the small helper after this list)

2. Generating appropriate random orthogonal vectors to complete the bases

3. Properly initializing the singular value matrix with zeros
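
For example, the numerical rank can be read straight off the singular value matrix returned by the implementation (a small helper sketched here; numerical_rank is not part of the listing):

def numerical_rank(S, tol=1e-10):
    # Count diagonal entries of S above the tolerance
    return sum(1 for i in range(min(len(S), len(S[0]))) if S[i][i] > tol)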


Edge Cases


Our code includes specific handling for edge cases such as:


- Matrices with zero singular values

- Vectors becoming zero during orthogonalization

- Different matrix dimensions (m >> n or n >> m)


Performance Considerations


This pure Python implementation prioritizes clarity and correctness over performance. For large matrices, several optimizations could be made:


1. Sparse Matrix Representation: For matrices with many zeros, a sparse representation would reduce memory usage and computation time.


2. Block Methods: Processing multiple vectors simultaneously can better utilize cache locality and improve convergence for clustered singular values.


3. Randomized SVD: For very large matrices, randomized algorithms can approximate the leading singular triplets with significantly less computation (sketched below).
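
To give a flavor of the third idea, here is a compact sketch of a randomized range finder. The names are invented for this example, it reuses the svd function from the listing below on the much smaller projected matrix, and a production version would add oversampling and a few power iterations for accuracy.

import random

def randomized_svd(A, k, svd):
    """Approximate rank-k SVD via a randomized range finder (illustrative only)."""
    m, n = len(A), len(A[0])
    # 1. Sample the range of A: Y = A * Omega with a random n x k test matrix
    Omega = [[random.gauss(0, 1) for _ in range(k)] for _ in range(n)]
    Y = [[sum(A[i][p] * Omega[p][j] for p in range(n)) for j in range(k)] for i in range(m)]
    # 2. Orthonormalize the columns of Y (Gram-Schmidt) to get Q
    Q = []
    for j in range(k):
        c = [Y[i][j] for i in range(m)]
        for q in Q:
            d = sum(c[i] * q[i] for i in range(m))
            c = [c[i] - d * q[i] for i in range(m)]
        norm = sum(x * x for x in c) ** 0.5
        if norm > 1e-12:
            Q.append([x / norm for x in c])
    # 3. Project down: B = Q^T * A is only len(Q) x n
    B = [[sum(q[i] * A[i][j] for i in range(m)) for j in range(n)] for q in Q]
    # 4. SVD of the small matrix, then lift U back up: U = Q * U_B
    U_B, S, Vt = svd(B)
    U = [[sum(Q[p][i] * U_B[p][j] for p in range(len(Q))) for j in range(len(U_B[0]))]
         for i in range(m)]
    return U, S, Vt

Called as randomized_svd(A, k, svd), this touches A only through a handful of products plus an SVD of a k×n matrix.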


Conclusion


Our Python implementation demonstrates how SVD can be computed from first principles without relying on specialized libraries. While libraries like NumPy and SciPy provide highly optimized SVD implementations for practical use, understanding the underlying algorithm provides valuable insights into the mathematics of matrix decomposition.


The full SVD implementation reveals the elegance of the algorithm and the power of linear algebra in extracting the fundamental structure of data. Whether you're compressing images, building recommendation systems, or analyzing high-dimensional data, SVD provides a powerful tool for understanding and transforming matrices.


By implementing SVD in pure Python, we've created an educational resource that makes this important algorithm accessible and comprehensible, providing a foundation for further exploration of linear algebra and its applications in data science and machine learning. The complete listing, together with a small test harness, follows.



import math
import random


def svd(A, max_iterations=100, tolerance=1e-10):
    """
    Compute the Singular Value Decomposition of a matrix A.
    Returns U, S, Vt such that A = U * S * Vt.

    Args:
        A: Input matrix as a list of lists
        max_iterations: Maximum number of iterations for the power method
        tolerance: Convergence tolerance

    Returns:
        U: Left singular vectors (as columns)
        S: Singular values (diagonal matrix)
        Vt: Right singular vectors (as rows, i.e. V transposed)
    """
    # Get matrix dimensions
    m = len(A)
    n = len(A[0]) if m > 0 else 0

    def transpose(matrix):
        return [[matrix[j][i] for j in range(len(matrix))] for i in range(len(matrix[0]))]

    def vector_norm(v):
        """Compute the L2 norm of a vector"""
        return math.sqrt(sum(x * x for x in v))

    def normalize_vector(v):
        """Normalize a vector to unit length"""
        norm = vector_norm(v)
        if norm < tolerance:
            return [0] * len(v)
        return [x / norm for x in v]

    def random_orthogonal_vector(vectors, size):
        """Generate a random unit vector orthogonal to all vectors in the list"""
        # Start with a random vector
        v = [random.random() for _ in range(size)]
        # Project out every existing direction
        for vector in vectors:
            dot_prod = sum(v[i] * vector[i] for i in range(size))
            for i in range(size):
                v[i] -= dot_prod * vector[i]
        # Normalize
        return normalize_vector(v)

    def full_svd(A, k):
        """Compute up to k singular triplets by power iteration with deflation."""
        m, n = len(A), len(A[0])

        # Storage for singular triplets
        singular_values = []
        U_vectors = []
        V_vectors = []

        # Work on a copy of A so deflation does not modify the input
        A_work = [row[:] for row in A]

        # Decide which side the power iteration tracks: the left vectors
        # u in R^m (implicitly A * A^T) when m >= n, otherwise the right
        # vectors v in R^n (implicitly A^T * A)
        use_AAt = m >= n

        # Iteratively find singular triplets
        for _ in range(min(k, m, n)):
            if use_AAt:
                # Random start vector, made orthogonal to previous U vectors
                u = [random.random() for _ in range(m)]
                for vec in U_vectors:
                    dot_prod = sum(u[j] * vec[j] for j in range(m))
                    u = [u[j] - dot_prod * vec[j] for j in range(m)]
                u = normalize_vector(u)

                # Power iteration: alternate v = A^T * u and u = A * v
                for _ in range(max_iterations):
                    v = [0] * n
                    for j in range(n):
                        for l in range(m):
                            v[j] += A_work[l][j] * u[l]
                    v_norm = vector_norm(v)
                    if v_norm < tolerance:
                        break  # matrix is effectively zero
                    v = [v[j] / v_norm for j in range(n)]

                    u_new = [0] * m
                    for j in range(m):
                        for l in range(n):
                            u_new[j] += A_work[j][l] * v[l]
                    sigma = vector_norm(u_new)
                    if sigma < tolerance:
                        break  # effectively zero singular value
                    u_new = [u_new[j] / sigma for j in range(m)]

                    # Converged when the iterate stops moving
                    if sum((u_new[j] - u[j]) ** 2 for j in range(m)) < tolerance:
                        u = u_new
                        break
                    u = u_new
            else:
                # Random start vector, made orthogonal to previous V vectors
                v = [random.random() for _ in range(n)]
                for vec in V_vectors:
                    dot_prod = sum(v[j] * vec[j] for j in range(n))
                    v = [v[j] - dot_prod * vec[j] for j in range(n)]
                v = normalize_vector(v)

                # Power iteration: alternate u = A * v and v = A^T * u
                for _ in range(max_iterations):
                    u = [0] * m
                    for j in range(m):
                        for l in range(n):
                            u[j] += A_work[j][l] * v[l]
                    u_norm = vector_norm(u)
                    if u_norm < tolerance:
                        break  # matrix is effectively zero
                    u = [u[j] / u_norm for j in range(m)]

                    v_new = [0] * n
                    for j in range(n):
                        for l in range(m):
                            v_new[j] += A_work[l][j] * u[l]
                    sigma = vector_norm(v_new)
                    if sigma < tolerance:
                        break  # effectively zero singular value
                    v_new = [v_new[j] / sigma for j in range(n)]

                    # Converged when the iterate stops moving
                    if sum((v_new[j] - v[j]) ** 2 for j in range(n)) < tolerance:
                        v = v_new
                        break
                    v = v_new

            # One more multiplication to get the final triplet
            if use_AAt:
                # v = A^T * u
                v = [0] * n
                for j in range(n):
                    for l in range(m):
                        v[j] += A_work[l][j] * u[l]
                sigma = vector_norm(v)
                if sigma > tolerance:
                    v = [v[j] / sigma for j in range(n)]
                else:
                    sigma = 0  # zero singular value
                    v = random_orthogonal_vector(V_vectors, n)
            else:
                # u = A * v
                u = [0] * m
                for j in range(m):
                    for l in range(n):
                        u[j] += A_work[j][l] * v[l]
                sigma = vector_norm(u)
                if sigma > tolerance:
                    u = [u[j] / sigma for j in range(m)]
                else:
                    sigma = 0  # zero singular value
                    u = random_orthogonal_vector(U_vectors, m)

            # Store the singular triplet
            singular_values.append(sigma)
            U_vectors.append(u)
            V_vectors.append(v)

            # Deflate the matrix: A_work = A_work - sigma * u * v^T
            for j in range(m):
                for l in range(n):
                    A_work[j][l] -= sigma * u[j] * v[l]

        # Complete the orthonormal bases if the matrix was rank-deficient
        while len(U_vectors) < m:
            U_vectors.append(random_orthogonal_vector(U_vectors, m))
        while len(V_vectors) < n:
            V_vectors.append(random_orthogonal_vector(V_vectors, n))

        # Diagonal matrix of singular values
        S = [[0 for _ in range(n)] for _ in range(m)]
        for i in range(min(len(singular_values), m, n)):
            S[i][i] = singular_values[i]

        # Columns of U are the left singular vectors; rows of Vt are the
        # right singular vectors, so that A = U * S * Vt
        U = transpose(U_vectors)
        Vt = [vec[:] for vec in V_vectors]

        return U, S, Vt

    # Compute the full SVD
    return full_svd(A, min(m, n))


def test_svd():
    # Create a (rank-2) test matrix
    A = [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]

    # Call the SVD function
    U, S, Vt = svd(A)

    print("Matrix A:")
    for row in A:
        print(row)

    print("\nLeft singular vectors (U):")
    for row in U:
        print([round(x, 6) for x in row])

    print("\nSingular values (S):")
    for row in S:
        print([round(x, 6) for x in row])

    print("\nRight singular vectors, transposed (Vt):")
    for row in Vt:
        print([round(x, 6) for x in row])

    # Reconstruct the original matrix to verify
    def matrix_multiply(A, B):
        C = [[0 for _ in range(len(B[0]))] for _ in range(len(A))]
        for i in range(len(A)):
            for j in range(len(B[0])):
                for k in range(len(B)):
                    C[i][j] += A[i][k] * B[k][j]
        return C

    US = matrix_multiply(U, S)
    A_reconstructed = matrix_multiply(US, Vt)

    print("\nReconstructed matrix A (U * S * Vt):")
    for row in A_reconstructed:
        print([round(x, 6) for x in row])

    # Frobenius norm of the reconstruction error
    error = 0
    for i in range(len(A)):
        for j in range(len(A[0])):
            error += (A[i][j] - A_reconstructed[i][j]) ** 2
    error = math.sqrt(error)
    print(f"\nReconstruction error: {error}")


if __name__ == "__main__":
    test_svd()
