Introduction to SVD
Singular Value Decomposition (SVD) is one of the most powerful and versatile techniques in linear algebra. It decomposes any matrix into three factors, revealing fundamental properties of the original matrix: its rank, null space, and range. SVD also underpins many methods in machine learning and data analysis, from dimensionality reduction to recommendation systems.
This article explores the theory behind SVD and presents a pure Python implementation that doesn't rely on external libraries like NumPy.
The Mathematics of SVD
For any matrix A of dimensions m×n, the SVD decomposes it as:
A = U · Σ · V^T
Where:
- U is an m×m orthogonal matrix whose columns are the left singular vectors
- Σ is an m×n rectangular diagonal matrix holding the singular values in descending order
- V^T is the transpose of an n×n orthogonal matrix V whose columns are the right singular vectors
The singular values (the diagonal entries of Σ) represent the "stretching factors" of the transformation represented by matrix A. The number of nonzero singular values equals the rank of the matrix, and their relative sizes provide a measure of how information is distributed across different dimensions.
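For a small worked example, take A = [[0, 2], [1, 0]]. Then A^T · A = [[1, 0], [0, 4]], whose eigenvalues 4 and 1 give singular values σ₁ = 2 and σ₂ = 1. One valid decomposition is U = [[1, 0], [0, 1]], Σ = [[2, 0], [0, 1]], V = [[0, 1], [1, 0]], and multiplying U · Σ · V^T reproduces A.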
Applications of SVD
Before diving into the implementation, it's worth understanding why SVD is so valuable:
1. Dimensionality Reduction: By keeping only the largest singular values and their corresponding vectors, we can create low-rank approximations of matrices (the foundation of techniques like PCA); a short sketch follows this list.
2. Image Compression: SVD allows us to represent images using fewer coefficients while preserving essential features.
3. Recommendation Systems: The Netflix Prize competition demonstrated how matrix factorization methods derived from SVD can be effective for collaborative filtering.
4. Noise Reduction: By eliminating components associated with small singular values, we can often reduce noise in data.
5. Pseudoinverse Calculation: SVD provides a stable way to compute the Moore-Penrose pseudoinverse of matrices.
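To illustrate the first point, here is a minimal sketch of a rank-k approximation built on the svd function implemented later in this article (which returns U, S, and V^T); the helper name rank_k_approximation is our own:

# Sketch: rank-k approximation via truncated SVD
# (assumes svd() from this article, returning U, S, Vt with A = U * S * Vt)
def rank_k_approximation(A, k):
    U, S, Vt = svd(A)
    m, n = len(A), len(A[0])
    # Sum only the k largest rank-1 components sigma_r * u_r * v_r^T
    return [[sum(U[i][r] * S[r][r] * Vt[r][j] for r in range(k))
             for j in range(n)] for i in range(m)]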
Understanding Our Implementation
Our Python implementation demonstrates how to compute the SVD using only built-in Python features. Let's examine the key components:
The Power Iteration Method
At the heart of our implementation is the power iteration method, an algorithm for finding the dominant eigenvalue and eigenvector of a matrix. For SVD, we apply it implicitly to A·A^T or A^T·A (the choice of branch is discussed below): the eigenvectors of A·A^T are the left singular vectors, the eigenvectors of A^T·A are the right singular vectors, and in both cases the eigenvalues are the squared singular values.
The power method works by repeatedly multiplying a vector by the matrix and normalizing, which causes the vector to converge to the eigenvector corresponding to the largest eigenvalue:
# Power iteration, simplified
x = random_vector
for _ in range(iterations):
    y = matrix_vector_multiply(A, x)
    x = normalize(y)
# x is now (approximately) the dominant eigenvector
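To make this concrete, here is a self-contained, runnable version of the same idea on a small symmetric matrix (a toy sketch of our own; power_iteration is not part of the article's implementation):

import math
import random

def power_iteration(A, iterations=100):
    n = len(A)
    x = [random.random() for _ in range(n)]
    for _ in range(iterations):
        # y = A * x
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    return x

# The dominant eigenvector of [[2, 1], [1, 2]] is approximately [0.707, 0.707]
print(power_iteration([[2.0, 1.0], [1.0, 2.0]]))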
Matrix Deflation
After finding each singular triplet (a singular value and its corresponding left and right singular vectors), we use deflation to remove that component from the working matrix:
# Deflation: subtract the rank-1 component sigma * u * v^T
for j in range(m):
    for l in range(n):
        A_work[j][l] -= sigma * u[j] * v[l]
This allows us to find subsequent singular triplets in order of decreasing singular values.
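In matrix form, deflation replaces A_work with A_work − σ·u·v^T. Since A can be written as the sum σ₁·u₁·v₁^T + σ₂·u₂·v₂^T + …, subtracting the leading term leaves σ₂ as the largest singular value of the deflated matrix, so the next round of power iteration converges to the second triplet.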
Adaptive Dimensionality
One design choice in our implementation is selecting whether the power iteration is anchored on the left vector u or the right vector v, based on the matrix dimensions:
use_AAt = m >= n
Note that neither A·A^T nor A^T·A is ever formed explicitly: each iteration performs only matrix-vector products with A and A^T, so the cost per iteration is O(m·n) in both branches. The flag determines which sequence of singular vectors the iteration tracks and orthogonalizes directly.
Completing Orthonormal Bases
For matrices that are not full rank (like our test matrix [1,2,3; 4,5,6; 7,8,9], which has rank 2 because its third row equals 2·[4,5,6] − [1,2,3]), we need to complete the orthonormal bases for both U and V:
# Complete basis for U if necessary
while len(U_vectors) < m:
    u = random_orthogonal_vector(U_vectors, m)
    U_vectors.append(u)
This ensures that U and V are proper orthogonal matrices as required by the SVD definition.
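The random_orthogonal_vector helper used above draws a random vector and runs one Gram-Schmidt pass against the vectors already found; here is a condensed version of the implementation given in full at the end of the article:

import math
import random

def random_orthogonal_vector(vectors, size):
    v = [random.random() for _ in range(size)]
    for w in vectors:
        # Subtract the projection of v onto each existing basis vector
        dot = sum(vi * wi for vi, wi in zip(v, w))
        v = [vi - dot * wi for vi, wi in zip(v, w)]
    norm = math.sqrt(sum(vi * vi for vi in v))
    return [vi / norm for vi in v] if norm > 1e-12 else v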
Technical Challenges and Solutions
Implementing SVD without specialized libraries presented several challenges:
Numerical Stability
Floating-point arithmetic errors can accumulate in iterative methods. We addressed this with several techniques:
1. Explicit orthogonalization against previous vectors
2. Careful normalization with tolerance checks
3. Convergence criteria based on vector differences rather than just eigenvalue estimates (excerpted below)
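For example, the main loop stops once successive iterates of u agree to within the tolerance (excerpt from the implementation below):

if sum((u_new[j] - u[j]) ** 2 for j in range(m)) < tolerance:
    u = u_new
    break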
Handling Rank-Deficient Matrices
Many matrices in practice aren't full rank. Our implementation handles this by:
1. Detecting near-zero singular values based on a tolerance parameter
2. Generating appropriate random orthogonal vectors to complete the bases
3. Properly initializing the singular value matrix with zeros, as shown below
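The last point looks like this in the implementation: S starts as an all-zero m×n matrix, and only the computed singular values are written onto the diagonal:

S = [[0 for _ in range(n)] for _ in range(m)]
for i in range(min(len(singular_values), m, n)):
    S[i][i] = singular_values[i]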
Edge Cases
Our code includes specific handling for edge cases such as:
- Matrices with zero singular values
- Vectors becoming zero during orthogonalization
- Different matrix dimensions (m >> n or n >> m)
Performance Considerations
This pure Python implementation prioritizes clarity and correctness over performance. For large matrices, several optimizations could be made:
1. Sparse Matrix Representation: For matrices with many zeros, a sparse representation would reduce memory usage and computation time.
2. Block Methods: Processing multiple vectors simultaneously can better utilize cache locality and improve convergence for clustered singular values.
3. Randomized SVD: For very large matrices, randomized algorithms can approximate the SVD with significantly less computation; a minimal sketch follows this list.
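To illustrate the third idea, here is a rough sketch of the randomized range-finder approach. This is our own code, not part of the article's implementation: the helper names matmul, orthonormalize_columns, and randomized_svd are assumptions, and the small projected matrix is factored with the svd function defined below.

import math
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def orthonormalize_columns(Y):
    # Gram-Schmidt on the columns of Y; returns a matrix with orthonormal columns
    Q_cols = []
    for c in transpose(Y):
        for q in Q_cols:
            dot = sum(ci * qi for ci, qi in zip(c, q))
            c = [ci - dot * qi for ci, qi in zip(c, q)]
        norm = math.sqrt(sum(ci * ci for ci in c))
        if norm > 1e-12:
            Q_cols.append([ci / norm for ci in c])
    return transpose(Q_cols)

def randomized_svd(A, k):
    # Sample the range of A with a random n x k test matrix
    n = len(A[0])
    Omega = [[random.gauss(0, 1) for _ in range(k)] for _ in range(n)]
    Q = orthonormalize_columns(matmul(A, Omega))  # m x k orthonormal basis
    B = matmul(transpose(Q), A)                   # small k x n projection
    U_b, S, Vt = svd(B)                           # cheap SVD of the small matrix
    return matmul(Q, U_b), S, Vt                  # lift U back to m dimensions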
Conclusion
Our Python implementation demonstrates how SVD can be computed from first principles without relying on specialized libraries. While libraries like NumPy and SciPy provide highly optimized SVD implementations for practical use, understanding the underlying algorithm provides valuable insights into the mathematics of matrix decomposition.
The full SVD implementation reveals the elegance of the algorithm and the power of linear algebra in extracting the fundamental structure of data. Whether you're compressing images, building recommendation systems, or analyzing high-dimensional data, SVD provides a powerful tool for understanding and transforming matrices.
By implementing SVD in pure Python, we've created an educational resource that makes this important algorithm accessible and comprehensible, providing a foundation for further exploration of linear algebra and its applications in data science and machine learning.
The Complete Implementation
import math
import random

def svd(A, max_iterations=100, tolerance=1e-10):
    """
    Compute the Singular Value Decomposition of a matrix A.
    Returns U, S, Vt such that A = U * S * Vt (Vt is the transpose of V).

    Args:
        A: Input matrix as a list of lists
        max_iterations: Maximum number of iterations for the power method
        tolerance: Convergence tolerance

    Returns:
        U: Left singular vectors (m x m)
        S: Singular values (m x n diagonal matrix)
        Vt: Transposed right singular vectors (n x n)
    """
    # Get matrix dimensions
    m = len(A)
    n = len(A[0]) if m > 0 else 0
    # Matrix operations
    def transpose(matrix):
        return [[matrix[j][i] for j in range(len(matrix))]
                for i in range(len(matrix[0]))]

    def matrix_multiply(A, B):
        """Multiply two matrices"""
        C = [[0 for _ in range(len(B[0]))] for _ in range(len(A))]
        for i in range(len(A)):
            for j in range(len(B[0])):
                for k in range(len(B)):
                    C[i][j] += A[i][k] * B[k][j]
        return C

    def matrix_vector_multiply(A, v):
        """Multiply matrix A by vector v"""
        result = [0] * len(A)
        for i in range(len(A)):
            for j in range(len(A[0])):
                if j < len(v):
                    result[i] += A[i][j] * v[j]
        return result

    def vector_norm(v):
        """Compute the L2 norm of a vector"""
        return math.sqrt(sum(x * x for x in v))

    def normalize_vector(v):
        """Normalize a vector to unit length"""
        norm = vector_norm(v)
        if norm < tolerance:
            return [0] * len(v)
        return [x / norm for x in v]

    # Helper to generate a random vector orthogonal to a set of vectors
    def random_orthogonal_vector(vectors, size):
        """Generate a random vector orthogonal to all vectors in the list"""
        # Start with a random vector
        v = [random.random() for _ in range(size)]
        # One Gram-Schmidt pass: subtract the projection onto each vector
        for vector in vectors:
            dot_prod = sum(v[i] * vector[i] for i in range(min(len(v), len(vector))))
            for i in range(size):
                if i < len(vector):
                    v[i] -= dot_prod * vector[i]
        # Normalize (returns the zero vector if v collapsed)
        return normalize_vector(v)
    # Perform full SVD using a power iteration approach
    def full_svd(A, k):
        """
        Compute the rank-k approximation of the SVD.

        Args:
            A: Input matrix
            k: Maximum rank

        Returns:
            U: Left singular vectors
            S: Singular values
            Vt: Transposed right singular vectors
        """
        m, n = len(A), len(A[0])

        # Initialize storage for singular triplets
        singular_values = []
        U_vectors = []
        V_vectors = []

        # Create a copy of A to work with (it will be deflated in place)
        A_work = [row[:] for row in A]

        # Choose whether to anchor the iteration on u or v based on dimensions
        use_AAt = m >= n

        # Iteratively find singular triplets
        for _ in range(min(k, m, n)):
            if use_AAt:
                # Initialize a random vector for power iteration
                u = [random.random() for _ in range(m)]
                # Make it orthogonal to previous U vectors
                for vec in U_vectors:
                    dot_prod = sum(u[j] * vec[j] for j in range(m))
                    u = [u[j] - dot_prod * vec[j] for j in range(m)]
                u = normalize_vector(u)

                # Power iteration: u <- normalize(A * (A^T * u))
                for _ in range(max_iterations):
                    # v = A^T * u
                    v = [0] * n
                    for j in range(n):
                        for l in range(m):
                            v[j] += A_work[l][j] * u[l]
                    v_norm = vector_norm(v)
                    if v_norm < tolerance:
                        break  # matrix is effectively zero
                    v = [v[j] / v_norm for j in range(n)]

                    # u_new = A * v
                    u_new = [0] * m
                    for j in range(m):
                        for l in range(n):
                            u_new[j] += A_work[j][l] * v[l]

                    # sigma is the norm of A * v for a unit vector v
                    sigma = vector_norm(u_new)
                    if sigma < tolerance:
                        break  # effectively zero
                    u_new = [u_new[j] / sigma for j in range(m)]

                    # Converged once successive iterates agree
                    if sum((u_new[j] - u[j]) ** 2 for j in range(m)) < tolerance:
                        u = u_new
                        break
                    u = u_new
            else:
                # Initialize a random vector for power iteration
                v = [random.random() for _ in range(n)]
                # Make it orthogonal to previous V vectors
                for vec in V_vectors:
                    dot_prod = sum(v[j] * vec[j] for j in range(n))
                    v = [v[j] - dot_prod * vec[j] for j in range(n)]
                v = normalize_vector(v)

                # Power iteration: v <- normalize(A^T * (A * v))
                for _ in range(max_iterations):
                    # u = A * v
                    u = [0] * m
                    for j in range(m):
                        for l in range(n):
                            u[j] += A_work[j][l] * v[l]
                    u_norm = vector_norm(u)
                    if u_norm < tolerance:
                        break  # matrix is effectively zero
                    u = [u[j] / u_norm for j in range(m)]

                    # v_new = A^T * u
                    v_new = [0] * n
                    for j in range(n):
                        for l in range(m):
                            v_new[j] += A_work[l][j] * u[l]

                    sigma = vector_norm(v_new)
                    if sigma < tolerance:
                        break  # effectively zero
                    v_new = [v_new[j] / sigma for j in range(n)]

                    # Converged once successive iterates agree
                    if sum((v_new[j] - v[j]) ** 2 for j in range(n)) < tolerance:
                        v = v_new
                        break
                    v = v_new

            # One more multiplication to get the final triplet
            if use_AAt:
                # v = A^T * u
                v = [0] * n
                for j in range(n):
                    for l in range(m):
                        v[j] += A_work[l][j] * u[l]
                sigma = vector_norm(v)
                if sigma > tolerance:
                    v = [v[j] / sigma for j in range(n)]
                else:
                    # Zero singular value: pick any vector orthogonal to the rest
                    sigma = 0
                    v = random_orthogonal_vector(V_vectors, n)
            else:
                # u = A * v
                u = [0] * m
                for j in range(m):
                    for l in range(n):
                        u[j] += A_work[j][l] * v[l]
                sigma = vector_norm(u)
                if sigma > tolerance:
                    u = [u[j] / sigma for j in range(m)]
                else:
                    # Zero singular value: pick any vector orthogonal to the rest
                    sigma = 0
                    u = random_orthogonal_vector(U_vectors, m)

            # Store the singular triplet
            singular_values.append(sigma)
            U_vectors.append(u)
            V_vectors.append(v)

            # Deflate the matrix: A_work -= sigma * u * v^T
            for j in range(m):
                for l in range(n):
                    A_work[j][l] -= sigma * u[j] * v[l]

        # Complete the basis for U if necessary
        while len(U_vectors) < m:
            U_vectors.append(random_orthogonal_vector(U_vectors, m))

        # Complete the basis for V if necessary
        while len(V_vectors) < n:
            V_vectors.append(random_orthogonal_vector(V_vectors, n))

        # Create the (rectangular) diagonal singular value matrix
        S = [[0 for _ in range(n)] for _ in range(m)]
        for i in range(min(len(singular_values), m, n)):
            S[i][i] = singular_values[i]

        # Assemble U and V with the singular vectors as columns
        U = [[U_vectors[i][j] for i in range(m)] for j in range(m)]
        V = [[V_vectors[i][j] for i in range(n)] for j in range(n)]

        # Transpose V so that A = U * S * Vt
        Vt = transpose(V)
        return U, S, Vt

    # Compute the full SVD
    return full_svd(A, min(m, n))
def test_svd():
    # Create a test matrix (rank 2: row 3 = 2 * row 2 - row 1)
    A = [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]

    # Call the SVD function
    U, S, Vt = svd(A)

    print("Matrix A:")
    for row in A:
        print(row)

    print("\nLeft singular vectors (U):")
    for row in U:
        print([round(x, 6) for x in row])

    print("\nSingular values (S):")
    for row in S:
        print([round(x, 6) for x in row])

    print("\nRight singular vectors, transposed (V^T):")
    for row in Vt:
        print([round(x, 6) for x in row])

    # Reconstruct the original matrix to verify A = U * S * V^T
    def matrix_multiply(A, B):
        C = [[0 for _ in range(len(B[0]))] for _ in range(len(A))]
        for i in range(len(A)):
            for j in range(len(B[0])):
                for k in range(len(B)):
                    C[i][j] += A[i][k] * B[k][j]
        return C

    US = matrix_multiply(U, S)
    A_reconstructed = matrix_multiply(US, Vt)

    print("\nReconstructed matrix A:")
    for row in A_reconstructed:
        print([round(x, 6) for x in row])

    # Compute the Frobenius-norm reconstruction error
    error = 0
    for i in range(len(A)):
        for j in range(len(A[0])):
            error += (A[i][j] - A_reconstructed[i][j]) ** 2
    error = math.sqrt(error)
    print(f"\nReconstruction error: {error}")

if __name__ == "__main__":
    test_svd()