Tuesday, May 20, 2025

Understanding Singular Value Decomposition (SVD) and Its Implementation in Pure Python


Introduction to SVD


Singular Value Decomposition (SVD) is one of the most powerful and versatile techniques in linear algebra. It decomposes any matrix into three factors, revealing fundamental properties of the original matrix: its rank, null space, and range. SVD also underpins many techniques in machine learning and data analysis, from principal component analysis to recommender systems.

This article explores the theory behind SVD and presents a pure Python implementation that doesn't rely on external libraries like NumPy.


The Mathematics of SVD


For any matrix A of dimensions m×n, the SVD decomposes it as:


A = U · Σ · V^T


Where:

- U is an m×m orthogonal matrix whose columns are the left singular vectors

- Σ is an m×n rectangular diagonal matrix whose diagonal entries, the singular values, appear in descending order

- V^T is the transpose of an n×n orthogonal matrix V whose columns are the right singular vectors


The singular values (the diagonal entries of Σ) represent the "stretching factors" of the transformation represented by matrix A. They reveal the rank of the matrix and provide a measure of how information is distributed across different dimensions.
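
As a quick worked example: the diagonal matrix A = [[3, 0], [0, 1]] is already in SVD form, with U = I, Σ = diag(3, 1), and V = I. Its singular values are 3 and 1, meaning the transformation stretches the first axis by a factor of 3 and leaves the second unchanged.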


Applications of SVD


Before diving into the implementation, it's worth understanding why SVD is so valuable:


1. Dimensionality Reduction: By keeping only the largest singular values and their corresponding vectors, we can create low-rank approximations of matrices (the foundation of techniques like PCA).


2. Image Compression: SVD allows us to represent images using fewer coefficients while preserving essential features.


3. Recommendation Systems: The Netflix Prize competition demonstrated how matrix factorization methods derived from SVD can be effective for collaborative filtering.


4. Noise Reduction: By eliminating components associated with small singular values, we can often reduce noise in data.


5. Pseudoinverse Calculation: SVD provides a numerically stable way to compute the Moore-Penrose pseudoinverse of a matrix (a minimal sketch follows this list).
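
As an illustration of the last point, here is a minimal pure-Python sketch of the pseudoinverse built from an SVD. It assumes an svd routine like the one developed below, returning U, S, and Vt with A = U · S · Vt; the name pinv_from_svd and the tol parameter are invented for this example.

def pinv_from_svd(U, S, Vt, tol=1e-10):
    """Moore-Penrose pseudoinverse A+ = V * S+ * U^T, given A = U * S * Vt."""
    m, n = len(S), len(S[0])
    # S+ is n x m: invert singular values above the tolerance, zero the rest
    S_pinv = [[0.0] * m for _ in range(n)]
    for i in range(min(m, n)):
        if S[i][i] > tol:
            S_pinv[i][i] = 1.0 / S[i][i]
    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
                 for j in range(len(Y[0]))] for i in range(len(X))]
    V = [[Vt[j][i] for j in range(n)] for i in range(n)]   # V = Vt transposed
    Ut = [[U[j][i] for j in range(m)] for i in range(m)]   # U transposed
    return matmul(matmul(V, S_pinv), Ut)

Inverting only the singular values above the tolerance is exactly what makes this computation stable for rank-deficient matrices.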


Understanding Our Implementation


Our Python implementation demonstrates how to compute the SVD using only the Python standard library (the math and random modules). Let's examine the key components:


The Power Iteration Method


At the heart of our implementation is the power iteration method, an algorithm for finding the dominant eigenvalue and eigenvector of a matrix. For SVD, we apply it implicitly to A·A^T or A^T·A (the code multiplies by A and A^T in alternation rather than forming either product) to find the singular values and vectors.


The power method works by repeatedly multiplying a vector by the matrix and normalizing, which causes the vector to converge to the eigenvector corresponding to the largest eigenvalue:


# Power iteration simplified
x = random_vector
for _ in range(iterations):
    y = matrix_multiply(A, x)
    x = normalize(y)
# x is now (approximately) the dominant eigenvector
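
The same loop, written so it can be run as-is (a self-contained sketch; the name power_iteration and the symmetric test matrix are chosen for this example):

import random

def power_iteration(M, iterations=100):
    # M is a square matrix given as a list of lists
    n = len(M)
    x = [random.random() for _ in range(n)]
    for _ in range(iterations):
        y = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]  # y = M * x
        norm = sum(val * val for val in y) ** 0.5
        if norm == 0:
            break
        x = [val / norm for val in y]
    return x  # approximately the dominant eigenvector of M

# The dominant eigenvector of [[2, 1], [1, 2]] is a multiple of [1, 1]
print(power_iteration([[2, 1], [1, 2]]))

For the SVD below, the matrix being iterated on is A·A^T or A^T·A, whose dominant eigenvector is the leading left or right singular vector.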



Matrix Deflation


After finding each singular triplet (a singular value σ with its corresponding left and right singular vectors u and v), we subtract that rank-one component from the working matrix, A ← A − σ · u · v^T, a step known as deflation:



# Deflation: subtract the rank-one component sigma * u * v^T
for j in range(m):
    for l in range(n):
        A_work[j][l] -= sigma * u[j] * v[l]



This allows us to find subsequent singular triplets in order of decreasing singular values.


Adaptive Dimensionality


The implementation also chooses, based on the matrix dimensions, which side of the decomposition the power iteration tracks:



use_AAt = m >= n



Neither product is ever formed explicitly: each power-iteration step applies A and A^T in turn, at cost O(m·n) regardless of the branch taken. The flag decides whether the iteration state (and the orthogonalization against previously found singular vectors) lives on the left side, in R^m, or on the right side, in R^n.
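
To make this concrete, one application of A·A^T to a vector can be computed as two matrix-vector products, never building the m×m product (an illustrative sketch; aat_apply is not part of the listing below):

def aat_apply(A, u):
    """Compute (A * A^T) * u as A * (A^T * u), avoiding the m x m Gram matrix."""
    m, n = len(A), len(A[0])
    v = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]      # v = A^T * u
    return [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]   # A * v

Each call costs O(m·n), whereas forming A·A^T up front would cost O(m²·n).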


Completing Orthonormal Bases


For matrices that are not full rank (like our rank-2 test matrix [1,2,3; 4,5,6; 7,8,9]), we need to complete the orthonormal bases for both U and V:



# Complete basis for U if necessary
while len(U_vectors) < m:
    u = random_orthogonal_vector(U_vectors, m)
    U_vectors.append(u)



This ensures that U and V are proper orthogonal matrices as required by the SVD definition.
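
The completion step is one pass of Gram-Schmidt against the vectors found so far. Here is a standalone variant of the helper (unlike the version in the full listing, this sketch retries if the random draw lands, numerically, inside the span of the existing vectors):

import random

def random_orthogonal_unit_vector(vectors, size, tol=1e-10):
    # Draw a random vector and project out every existing direction
    v = [random.random() for _ in range(size)]
    for w in vectors:
        dot = sum(v[i] * w[i] for i in range(size))
        v = [v[i] - dot * w[i] for i in range(size)]
    norm = sum(x * x for x in v) ** 0.5
    if norm < tol:
        return random_orthogonal_unit_vector(vectors, size, tol)  # unlucky draw; retry
    return [x / norm for x in v]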


Technical Challenges and Solutions


Implementing SVD without specialized libraries presented several challenges:


Numerical Stability


Floating-point arithmetic errors can accumulate in iterative methods. We addressed this with several techniques:


1. Explicit orthogonalization against previous vectors

2. Careful normalization with tolerance checks

3. Convergence criteria based on vector differences rather than just eigenvalue estimates (see the sketch after this list)
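
The third point is worth a concrete sketch: comparing successive iterates directly also catches the case where the eigenvalue estimate has stabilized while the vector is still drifting. The pattern, extracted into a standalone helper (step stands in for one multiply-and-normalize pass; the name is invented for this example):

def iterate_until_stable(step, x, max_iterations=100, tolerance=1e-10):
    for _ in range(max_iterations):
        x_new = step(x)
        # Compare the iterates themselves, not just the eigenvalue estimate
        if sum((a - b) ** 2 for a, b in zip(x_new, x)) < tolerance:
            return x_new
        x = x_new
    return x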


Handling Rank-Deficient Matrices


Many matrices in practice aren't full rank. Our implementation handles this by:


1. Detecting near-zero singular values based on a tolerance parameter (see the small helper after this list)

2. Generating appropriate random orthogonal vectors to complete the bases

3. Properly initializing the singular value matrix with zeros
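
For example, the numerical rank can be read straight off the singular value matrix returned by the implementation (a small helper sketched here; numerical_rank is not part of the listing):

def numerical_rank(S, tol=1e-10):
    # Count diagonal entries of S above the tolerance
    return sum(1 for i in range(min(len(S), len(S[0]))) if S[i][i] > tol)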


Edge Cases


Our code includes specific handling for edge cases such as:


- Matrices with zero singular values

- Vectors becoming zero during orthogonalization

- Different matrix dimensions (m >> n or n >> m)


Performance Considerations


This pure Python implementation prioritizes clarity and correctness over performance. For large matrices, several optimizations could be made:


1. Sparse Matrix Representation: For matrices with many zeros, a sparse representation would reduce memory usage and computation time.


2. Block Methods: Processing multiple vectors simultaneously can better utilize cache locality and improve convergence for clustered singular values.


3. Randomized SVD: For very large matrices, randomized algorithms can approximate the leading singular triplets with significantly less computation (sketched below).
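
To give a flavor of the third idea, here is a compact sketch of a randomized range finder. The names are invented for this example, it reuses the svd function from the listing below on the much smaller projected matrix, and a production version would add oversampling and a few power iterations for accuracy.

import random

def randomized_svd(A, k, svd):
    """Approximate rank-k SVD via a randomized range finder (illustrative only)."""
    m, n = len(A), len(A[0])
    # 1. Sample the range of A: Y = A * Omega with a random n x k test matrix
    Omega = [[random.gauss(0, 1) for _ in range(k)] for _ in range(n)]
    Y = [[sum(A[i][p] * Omega[p][j] for p in range(n)) for j in range(k)] for i in range(m)]
    # 2. Orthonormalize the columns of Y (Gram-Schmidt) to get Q
    Q = []
    for j in range(k):
        c = [Y[i][j] for i in range(m)]
        for q in Q:
            d = sum(c[i] * q[i] for i in range(m))
            c = [c[i] - d * q[i] for i in range(m)]
        norm = sum(x * x for x in c) ** 0.5
        if norm > 1e-12:
            Q.append([x / norm for x in c])
    # 3. Project down: B = Q^T * A is only len(Q) x n
    B = [[sum(q[i] * A[i][j] for i in range(m)) for j in range(n)] for q in Q]
    # 4. SVD of the small matrix, then lift U back up: U = Q * U_B
    U_B, S, Vt = svd(B)
    U = [[sum(Q[p][i] * U_B[p][j] for p in range(len(Q))) for j in range(len(U_B[0]))]
         for i in range(m)]
    return U, S, Vt

Called as randomized_svd(A, k, svd), this touches A only through a handful of products plus an SVD of a k×n matrix.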


Conclusion


Our Python implementation demonstrates how SVD can be computed from first principles without relying on specialized libraries. While libraries like NumPy and SciPy provide highly optimized SVD implementations for practical use, understanding the underlying algorithm provides valuable insights into the mathematics of matrix decomposition.


The full SVD implementation reveals the elegance of the algorithm and the power of linear algebra in extracting the fundamental structure of data. Whether you're compressing images, building recommendation systems, or analyzing high-dimensional data, SVD provides a powerful tool for understanding and transforming matrices.


By implementing SVD in pure Python, we've created an educational resource that makes this important algorithm accessible and comprehensible, providing a foundation for further exploration of linear algebra and its applications in data science and machine learning. The complete listing, together with a small test harness, follows.



import math
import random


def svd(A, max_iterations=100, tolerance=1e-10):
    """
    Compute the Singular Value Decomposition of a matrix A.
    Returns U, S, Vt such that A = U * S * Vt.

    Args:
        A: Input matrix as a list of lists
        max_iterations: Maximum number of iterations for the power method
        tolerance: Convergence tolerance

    Returns:
        U: Left singular vectors (as columns)
        S: Singular values (diagonal matrix)
        Vt: Right singular vectors (as rows, i.e. V transposed)
    """
    # Get matrix dimensions
    m = len(A)
    n = len(A[0]) if m > 0 else 0

    def transpose(matrix):
        return [[matrix[j][i] for j in range(len(matrix))] for i in range(len(matrix[0]))]

    def vector_norm(v):
        """Compute the L2 norm of a vector"""
        return math.sqrt(sum(x * x for x in v))

    def normalize_vector(v):
        """Normalize a vector to unit length"""
        norm = vector_norm(v)
        if norm < tolerance:
            return [0] * len(v)
        return [x / norm for x in v]

    def random_orthogonal_vector(vectors, size):
        """Generate a random unit vector orthogonal to all vectors in the list"""
        # Start with a random vector
        v = [random.random() for _ in range(size)]
        # Project out every existing direction
        for vector in vectors:
            dot_prod = sum(v[i] * vector[i] for i in range(size))
            for i in range(size):
                v[i] -= dot_prod * vector[i]
        # Normalize
        return normalize_vector(v)

    def full_svd(A, k):
        """Compute up to k singular triplets by power iteration with deflation."""
        m, n = len(A), len(A[0])

        # Storage for singular triplets
        singular_values = []
        U_vectors = []
        V_vectors = []

        # Work on a copy of A so deflation does not modify the input
        A_work = [row[:] for row in A]

        # Decide which side the power iteration tracks: the left vectors
        # u in R^m (implicitly A * A^T) when m >= n, otherwise the right
        # vectors v in R^n (implicitly A^T * A)
        use_AAt = m >= n

        # Iteratively find singular triplets
        for _ in range(min(k, m, n)):
            if use_AAt:
                # Random start vector, made orthogonal to previous U vectors
                u = [random.random() for _ in range(m)]
                for vec in U_vectors:
                    dot_prod = sum(u[j] * vec[j] for j in range(m))
                    u = [u[j] - dot_prod * vec[j] for j in range(m)]
                u = normalize_vector(u)

                # Power iteration: alternate v = A^T * u and u = A * v
                for _ in range(max_iterations):
                    v = [0] * n
                    for j in range(n):
                        for l in range(m):
                            v[j] += A_work[l][j] * u[l]
                    v_norm = vector_norm(v)
                    if v_norm < tolerance:
                        break  # matrix is effectively zero
                    v = [v[j] / v_norm for j in range(n)]

                    u_new = [0] * m
                    for j in range(m):
                        for l in range(n):
                            u_new[j] += A_work[j][l] * v[l]
                    sigma = vector_norm(u_new)
                    if sigma < tolerance:
                        break  # effectively zero singular value
                    u_new = [u_new[j] / sigma for j in range(m)]

                    # Converged when the iterate stops moving
                    if sum((u_new[j] - u[j]) ** 2 for j in range(m)) < tolerance:
                        u = u_new
                        break
                    u = u_new
            else:
                # Random start vector, made orthogonal to previous V vectors
                v = [random.random() for _ in range(n)]
                for vec in V_vectors:
                    dot_prod = sum(v[j] * vec[j] for j in range(n))
                    v = [v[j] - dot_prod * vec[j] for j in range(n)]
                v = normalize_vector(v)

                # Power iteration: alternate u = A * v and v = A^T * u
                for _ in range(max_iterations):
                    u = [0] * m
                    for j in range(m):
                        for l in range(n):
                            u[j] += A_work[j][l] * v[l]
                    u_norm = vector_norm(u)
                    if u_norm < tolerance:
                        break  # matrix is effectively zero
                    u = [u[j] / u_norm for j in range(m)]

                    v_new = [0] * n
                    for j in range(n):
                        for l in range(m):
                            v_new[j] += A_work[l][j] * u[l]
                    sigma = vector_norm(v_new)
                    if sigma < tolerance:
                        break  # effectively zero singular value
                    v_new = [v_new[j] / sigma for j in range(n)]

                    # Converged when the iterate stops moving
                    if sum((v_new[j] - v[j]) ** 2 for j in range(n)) < tolerance:
                        v = v_new
                        break
                    v = v_new

            # One more multiplication to get the final triplet
            if use_AAt:
                # v = A^T * u
                v = [0] * n
                for j in range(n):
                    for l in range(m):
                        v[j] += A_work[l][j] * u[l]
                sigma = vector_norm(v)
                if sigma > tolerance:
                    v = [v[j] / sigma for j in range(n)]
                else:
                    sigma = 0  # zero singular value
                    v = random_orthogonal_vector(V_vectors, n)
            else:
                # u = A * v
                u = [0] * m
                for j in range(m):
                    for l in range(n):
                        u[j] += A_work[j][l] * v[l]
                sigma = vector_norm(u)
                if sigma > tolerance:
                    u = [u[j] / sigma for j in range(m)]
                else:
                    sigma = 0  # zero singular value
                    u = random_orthogonal_vector(U_vectors, m)

            # Store the singular triplet
            singular_values.append(sigma)
            U_vectors.append(u)
            V_vectors.append(v)

            # Deflate the matrix: A_work = A_work - sigma * u * v^T
            for j in range(m):
                for l in range(n):
                    A_work[j][l] -= sigma * u[j] * v[l]

        # Complete the orthonormal bases if the matrix was rank-deficient
        while len(U_vectors) < m:
            U_vectors.append(random_orthogonal_vector(U_vectors, m))
        while len(V_vectors) < n:
            V_vectors.append(random_orthogonal_vector(V_vectors, n))

        # Diagonal matrix of singular values
        S = [[0 for _ in range(n)] for _ in range(m)]
        for i in range(min(len(singular_values), m, n)):
            S[i][i] = singular_values[i]

        # Columns of U are the left singular vectors; rows of Vt are the
        # right singular vectors, so that A = U * S * Vt
        U = transpose(U_vectors)
        Vt = [vec[:] for vec in V_vectors]

        return U, S, Vt

    # Compute the full SVD
    return full_svd(A, min(m, n))


def test_svd():
    # Create a (rank-2) test matrix
    A = [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]

    # Call the SVD function
    U, S, Vt = svd(A)

    print("Matrix A:")
    for row in A:
        print(row)

    print("\nLeft singular vectors (U):")
    for row in U:
        print([round(x, 6) for x in row])

    print("\nSingular values (S):")
    for row in S:
        print([round(x, 6) for x in row])

    print("\nRight singular vectors, transposed (Vt):")
    for row in Vt:
        print([round(x, 6) for x in row])

    # Reconstruct the original matrix to verify
    def matrix_multiply(A, B):
        C = [[0 for _ in range(len(B[0]))] for _ in range(len(A))]
        for i in range(len(A)):
            for j in range(len(B[0])):
                for k in range(len(B)):
                    C[i][j] += A[i][k] * B[k][j]
        return C

    US = matrix_multiply(U, S)
    A_reconstructed = matrix_multiply(US, Vt)

    print("\nReconstructed matrix A (U * S * Vt):")
    for row in A_reconstructed:
        print([round(x, 6) for x in row])

    # Frobenius norm of the reconstruction error
    error = 0
    for i in range(len(A)):
        for j in range(len(A[0])):
            error += (A[i][j] - A_reconstructed[i][j]) ** 2
    error = math.sqrt(error)
    print(f"\nReconstruction error: {error}")


if __name__ == "__main__":
    test_svd()
