Welcome, fellow developers, to an exciting journey into the world of Autoencoders! These fascinating neural network architectures are not only powerful tools for data compression and feature learning but also a gateway to understanding more complex generative models. This tutorial will demystify autoencoders, starting from their fundamental concepts and gradually building up to a practical, runnable code example. By the end, you will have a solid grasp of how they work and how to implement them.
Section 1: Theoretical Foundations of Autoencoders
An autoencoder is a type of artificial neural network used for unsupervised learning of efficient data codings. The primary goal of an autoencoder is to learn a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore signal "noise." It attempts to reconstruct its own input. This means that the output of the autoencoder should be as close as possible to its input.
1.1 The Core Idea: Data Compression and Reconstruction
Imagine you have a large image, and you want to store it efficiently without losing too much detail. An autoencoder approaches this problem by learning to compress the image into a smaller representation and then decompressing it back to its original size. The "compressed" representation is what we call the latent space or bottleneck layer. If the autoencoder can accurately reconstruct the original image from this compressed form, it implies that the latent space effectively captures the most important features of the image.
The fundamental concept revolves around two main parts: an encoder and a decoder. The encoder takes the input data and transforms it into a lower-dimensional representation. The decoder then takes this lower-dimensional representation and transforms it back into the original data format. The entire network is trained to minimize the difference between the input and the reconstructed output.
1.2 Encoder-Decoder Architecture
The architecture of an autoencoder is symmetric around a central layer, often referred to as the "bottleneck" or "latent space."
Input Data
|
V
[ Encoder ] (Compresses data)
|
V
[Latent Space] (Bottleneck - compact representation)
|
V
[ Decoder ] (Reconstructs data)
|
V
Reconstructed Data
Let us explore each component in detail.
Encoder: The encoder is the first part of the autoencoder. It is responsible for taking the input data and transforming it into a lower-dimensional representation. This transformation is typically achieved through a series of layers, often dense (fully connected) layers or convolutional layers, depending on the type of data (e.g., tabular data versus images). Each layer in the encoder progressively extracts more abstract and compressed features from the input. The output of the encoder is the latent space representation.
Latent Space (Bottleneck): This is the most crucial part of the autoencoder. It is the layer with the fewest neurons, acting as an information bottleneck. The encoder is forced to learn the most salient features of the input data to represent it in this compressed form. If the latent space were as large as the input, the autoencoder might simply learn an identity function, copying the input directly to the output without learning any meaningful compression. The smaller the latent space, the more compressed the representation, and the more challenging it is for the autoencoder to reconstruct the input accurately, thus forcing it to learn more robust features.
Decoder: The decoder is the second part of the autoencoder. Its job is to take the compressed representation from the latent space and reconstruct the original input data. Similar to the encoder, the decoder typically consists of a series of layers that gradually expand the dimensions back to the original input size. These layers are often mirror images of the encoder layers in terms of their structure, but they perform the inverse operation (e.g., `Dense` layers or `Conv2DTranspose` layers for images). The final layer of the decoder produces the reconstructed output.
1.3 Loss Function: Measuring Reconstruction Error
To train an autoencoder, we need a way to quantify how well it reconstructs its input. This is where the loss function comes into play. The loss function measures the discrepancy between the original input and the reconstructed output. The goal during training is to minimize this loss.
For continuous data like images with pixel values, a common choice for the loss function is the Mean Squared Error (MSE). The Mean Squared Error calculates the average of the squared differences between the actual pixel values of the input image and the predicted pixel values of the reconstructed image.
MSE = (1/N) * SUM((Input_i - Reconstructed_i)^2)
Here, 'N' represents the total number of data points or pixels, and 'i' iterates through each data point. The squaring ensures that positive and negative errors do not cancel each other out and penalizes larger errors more heavily.
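To make the formula concrete, here is a minimal NumPy sketch that computes the MSE between a made-up input vector and its reconstruction (the values are illustrative, not from a trained model):

```python
import numpy as np

# Illustrative values only: a 3-"pixel" input and its imperfect reconstruction.
original = np.array([0.0, 0.5, 1.0])
reconstructed = np.array([0.1, 0.5, 0.8])

# MSE = (1/N) * SUM((Input_i - Reconstructed_i)^2)
mse = np.mean((original - reconstructed) ** 2)
print(mse)  # (0.01 + 0.00 + 0.04) / 3 ≈ 0.0167
```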
For categorical data or when dealing with probabilities, Binary Cross-Entropy might be a more suitable choice. The selection of the loss function depends on the nature of the input data and the desired output characteristics.
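For comparison, binary cross-entropy can be sketched the same way (again with invented values; the small epsilon is a standard numerical guard against log(0)):

```python
import numpy as np

y = np.array([1.0, 0.0, 1.0])   # target values in [0, 1]
p = np.array([0.9, 0.2, 0.7])   # predicted values in (0, 1)

eps = 1e-7  # numerical safety: avoid taking log of exactly 0
bce = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
print(bce)
```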
1.4 Training Process
Training an autoencoder is similar to training any other neural network. It involves an iterative process of feeding data through the network, calculating the loss, and then updating the network's weights and biases to reduce that loss.
Forward Pass: An input data point is fed into the encoder, which generates a latent space representation. This representation is then passed to the decoder, which produces a reconstructed output.
Loss Calculation: The loss function compares the reconstructed output with the original input, yielding a numerical value indicating the reconstruction error.
Backward Pass (Backpropagation): The calculated loss is then propagated backward through the network. This process determines how much each weight and bias in the network contributed to the error.
Optimization: An optimization algorithm, such as Adam or Stochastic Gradient Descent (SGD), uses the gradients computed during backpropagation to adjust the network's weights and biases. The goal is to move these parameters in a direction that minimizes the loss function.
This cycle is repeated for many epochs (full passes over the entire dataset) and batches (subsets of the dataset), gradually improving the autoencoder's ability to reconstruct its input accurately.
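The four steps above can be sketched end-to-end with a toy linear autoencoder in plain NumPy (the dimensions, learning rate, and step count are arbitrary choices for illustration; real models use a framework's automatic differentiation instead of these hand-derived gradients):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))                  # a tiny batch: 8 samples, 4 features

W_enc = rng.normal(scale=0.3, size=(4, 2))   # linear encoder (4 -> 2)
W_dec = rng.normal(scale=0.3, size=(2, 4))   # linear decoder (2 -> 4)

lr = 0.2
losses = []
for step in range(500):
    z = x @ W_enc                       # forward pass: encode ...
    x_hat = z @ W_dec                   # ... then decode
    err = x_hat - x
    losses.append(np.mean(err ** 2))    # loss calculation (MSE)
    g = 2 * err / err.size              # backward pass: dLoss/dx_hat
    grad_W_dec = z.T @ g
    grad_W_enc = x.T @ (g @ W_dec.T)
    W_enc -= lr * grad_W_enc            # optimization: gradient descent step
    W_dec -= lr * grad_W_dec

print(losses[0], losses[-1])            # reconstruction error shrinks over training
```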
1.5 Use Cases
Autoencoders are versatile tools with several practical applications:
- Dimensionality Reduction: By learning a compact latent space representation, autoencoders can effectively reduce the number of features needed to describe data, similar to Principal Component Analysis (PCA) but with the ability to learn non-linear relationships.
- Feature Learning: The latent space often contains meaningful, lower-dimensional features that capture the essential characteristics of the input data. These learned features can then be used as input for other machine learning models, potentially improving their performance.
- Anomaly Detection: Autoencoders are trained on "normal" data. When presented with anomalous data, they struggle to reconstruct it accurately, resulting in a high reconstruction error. This high error can be used as an indicator of an anomaly.
- Denoising: A denoising autoencoder is trained to reconstruct clean data from corrupted input (e.g., images with noise). By learning to remove the noise, it effectively learns robust features of the underlying clean data.
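As a tiny illustration of the anomaly-detection idea, suppose a trained autoencoder produced the per-sample reconstruction errors below (values invented for the example); a simple mean-plus-three-standard-deviations threshold flags the outlier:

```python
import numpy as np

# Hypothetical reconstruction errors: four "normal" samples and one anomaly.
errors = np.array([0.010, 0.020, 0.015, 0.012, 0.300])

# Fit a threshold on the errors of known-normal data (the first four here).
normal_errors = errors[:4]
threshold = normal_errors.mean() + 3 * normal_errors.std()

is_anomaly = errors > threshold
print(is_anomaly)   # only the last sample exceeds the threshold
```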
Section 2: Types of Autoencoders
While the core principle remains the same, various modifications to the autoencoder architecture have led to different types, each suited for specific tasks.
2.1 Vanilla Autoencoder
This is the basic autoencoder architecture we have been discussing, consisting of an encoder, a latent space, and a decoder. It aims to learn an identity function under a dimensionality constraint, meaning it tries to output exactly what it was given as input, but through a compressed representation.
2.2 Denoising Autoencoder
A denoising autoencoder is designed to be more robust to noise and to learn more meaningful representations. Instead of feeding the clean input directly, it takes a corrupted version of the input (e.g., an image with added noise) and is trained to reconstruct the original, clean input. This forces the autoencoder to learn to extract robust features that are invariant to the noise.
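A common way to create the corrupted inputs is additive Gaussian noise clipped back to the valid pixel range; the `noise_factor` value here is just an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(42)
x_clean = rng.random((5, 784)).astype('float32')   # stand-in for normalized images

noise_factor = 0.3
x_noisy = x_clean + noise_factor * rng.normal(size=x_clean.shape)
x_noisy = np.clip(x_noisy, 0.0, 1.0)   # keep pixels inside [0, 1]

# A denoising autoencoder is then trained on (corrupted, clean) pairs, e.g.:
# autoencoder.fit(x_noisy, x_clean, epochs=..., batch_size=...)
```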
2.3 Variational Autoencoder (VAE)
Variational Autoencoders are a more advanced type of autoencoder that takes a probabilistic approach to the latent space. Instead of learning a fixed latent vector, the encoder learns the parameters of a probability distribution (mean and variance) for each dimension in the latent space. The decoder then samples from this distribution to reconstruct the output. VAEs are powerful generative models capable of creating new, realistic data samples. While fascinating, they involve more complex mathematical foundations and are typically covered in more advanced tutorials.
2.4 Sparse Autoencoder
Sparse autoencoders introduce a sparsity constraint on the latent space. This means that only a small number of neurons in the latent layer are allowed to be active at any given time. This encourages the network to learn highly specialized features, where each feature detector responds to a unique pattern in the input. Sparsity can be enforced by adding a regularization term to the loss function, such as the Kullback-Leibler divergence.
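The KL-based sparsity penalty can be sketched in NumPy as follows (the helper name `kl_sparsity_penalty` and the target sparsity `rho=0.05` are our own choices for illustration; in Keras a similar effect is often obtained with an activity regularizer on the latent layer):

```python
import numpy as np

def kl_sparsity_penalty(activations, rho=0.05):
    """Sum over latent units of KL(rho || rho_hat), where rho_hat is the
    mean activation of each unit over the batch (assumed to lie in (0, 1))."""
    rho_hat = np.clip(activations.mean(axis=0), 1e-7, 1 - 1e-7)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

# Units whose average activation matches the target incur ~zero penalty;
# units that are active too often are penalized.
sparse_batch = np.full((10, 3), 0.05)
dense_batch = np.full((10, 3), 0.5)
print(kl_sparsity_penalty(sparse_batch), kl_sparsity_penalty(dense_batch))
```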
For this tutorial, we will focus on implementing a **Vanilla Autoencoder**, as it provides a solid foundation for understanding the core concepts before delving into more specialized variants.
Section 3: Building a Simple Autoencoder - A Step-by-Step Practical Guide
Now, let us get our hands dirty and implement a simple autoencoder using Python and the Keras API, which is part of TensorFlow. We will use the famous MNIST dataset, which consists of handwritten digits, as our running example. This dataset is perfect for demonstrating image reconstruction.
3.1 Setting up the Environment
First, ensure you have TensorFlow installed. If not, you can install it using pip:
pip install tensorflow numpy matplotlib
We will need `tensorflow` for building the neural network, `numpy` for numerical operations, and `matplotlib` for visualizing our results.
3.2 Data Preparation
The MNIST dataset contains 60,000 training images and 10,000 testing images of handwritten digits (0-9). Each image is a 28x28 pixel grayscale image.
Here is the code snippet to load and preprocess the MNIST dataset:
# Import necessary libraries
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# Load the MNIST dataset
# The dataset is split into training and testing sets, each containing images and their labels.
(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()
# Data Preprocessing:
# 1. Normalize pixel values: Scale images to the range [0, 1].
# Original pixel values are typically integers from 0 to 255.
# Dividing by 255.0 converts them to floating-point numbers in the desired range.
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
# 2. Reshape images: Flatten each 28x28 image into a 784-dimensional vector.
# Autoencoders often work with flattened inputs, especially dense autoencoders.
# The -1 in reshape automatically calculates the dimension based on the array's size.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))
# Print the shapes to verify the preprocessing
print("Shape of training data after preprocessing:", x_train.shape)
print("Shape of testing data after preprocessing:", x_test.shape)
The output of the print statements should look something like this, confirming that each image is now a 784-dimensional vector:
Shape of training data after preprocessing: (60000, 784)
Shape of testing data after preprocessing: (10000, 784)
3.3 Defining the Autoencoder Architecture
We will build a simple dense autoencoder. The input layer will have 784 neurons (for the flattened 28x28 images). The encoder will compress this into a smaller latent space (e.g., 32 neurons), and the decoder will expand it back to 784 neurons.
We will use the Keras Functional API, which is flexible for defining more complex models.
# Define the input dimension, which is the flattened size of an MNIST image (28*28 = 784).
input_dim = x_train.shape[1]
# Define the dimension of the latent space (bottleneck layer).
# This is a hyperparameter that determines the degree of compression.
latent_dim = 32
# Encoder Definition
# The encoder takes the input and compresses it into the latent space.
# It consists of several dense layers with activation functions.
input_layer = tf.keras.layers.Input(shape=(input_dim,))
# First encoding layer: Reduces dimensions from 784 to 128.
# 'relu' (Rectified Linear Unit) is a common activation function.
encoded_layer_1 = tf.keras.layers.Dense(128, activation='relu')(input_layer)
# Second encoding layer: Further reduces dimensions from 128 to 64.
encoded_layer_2 = tf.keras.layers.Dense(64, activation='relu')(encoded_layer_1)
# Latent space layer: The bottleneck, compressing data to 'latent_dim' (32) dimensions.
latent_space = tf.keras.layers.Dense(latent_dim, activation='relu')(encoded_layer_2)
# Decoder Definition
# The decoder takes the latent space representation and reconstructs the original input.
# It mirrors the encoder's structure, expanding dimensions.
# First decoding layer: Expands from 'latent_dim' (32) to 64.
decoded_layer_1 = tf.keras.layers.Dense(64, activation='relu')(latent_space)
# Second decoding layer: Expands from 64 to 128.
decoded_layer_2 = tf.keras.layers.Dense(128, activation='relu')(decoded_layer_1)
# Output layer: Reconstructs the original 784 dimensions.
# 'sigmoid' activation is used here because pixel values are normalized to [0, 1].
# Sigmoid squashes values between 0 and 1, which is suitable for image reconstruction.
output_layer = tf.keras.layers.Dense(input_dim, activation='sigmoid')(decoded_layer_2)
# Autoencoder Model Assembly
# Create the full autoencoder model by connecting the input to the output.
autoencoder = tf.keras.Model(inputs=input_layer, outputs=output_layer)
# Encoder Model (for extracting latent representations later)
# This model takes the input and outputs the latent space representation.
encoder = tf.keras.Model(inputs=input_layer, outputs=latent_space)
# Decoder Model (for generating reconstructions from latent representations later)
# This model takes the latent space as input and outputs the reconstructed image.
# We need a separate input layer for the decoder model.
encoded_input = tf.keras.layers.Input(shape=(latent_dim,))
# Re-use the decoder layers from the autoencoder.
decoder_output = autoencoder.layers[-3](encoded_input)  # Dense(64) layer
decoder_output = autoencoder.layers[-2](decoder_output) # Dense(128) layer
decoder_output = autoencoder.layers[-1](decoder_output) # Output Dense(784) layer
decoder = tf.keras.Model(inputs=encoded_input, outputs=decoder_output)
# Print a summary of the autoencoder model to see its architecture.
autoencoder.summary()
The `autoencoder.summary()` output will show the layers, output shapes, and number of parameters, confirming the structure we defined.
3.4 Compiling the Model
Before training, we need to compile the model. This involves specifying the optimizer and the loss function.
# Compile the autoencoder model.
# 'optimizer': 'adam' is a popular choice for its efficiency and good performance.
# 'loss': 'mse' (Mean Squared Error) is used because we are reconstructing continuous pixel values
# and want to minimize the squared difference between original and reconstructed images.
autoencoder.compile(optimizer='adam', loss='mse')
print("Autoencoder model compiled successfully.")
3.5 Training the Autoencoder
Training involves fitting the model to our preprocessed training data. Since it is an autoencoder, the input to the model is also its target output.
# Train the autoencoder model.
# 'x_train' is used as both the input and the target output for the autoencoder.
# 'epochs': Number of times to iterate over the entire training dataset. More epochs can lead to better learning,
# but also risk overfitting.
# 'batch_size': Number of samples per gradient update. Smaller batches introduce more noise but can help
# generalization. Larger batches provide a more stable gradient estimate.
# 'shuffle': Set to True to shuffle the training data before each epoch, which helps prevent the model
# from learning the order of the training examples.
# 'validation_data': Provides a separate dataset for evaluating the model's performance during training.
# This helps monitor for overfitting. Here, we use 'x_test' as validation input and target.
history = autoencoder.fit(x_train, x_train,
                          epochs=50,
                          batch_size=256,
                          shuffle=True,
                          validation_data=(x_test, x_test))
print("Autoencoder training complete.")
You will observe the loss decreasing over epochs for both training and validation data. A good sign is when both losses decrease, and the validation loss does not significantly diverge from the training loss, indicating that the model is generalizing well.
3.6 Evaluating the Autoencoder
After training, we can evaluate how well our autoencoder performs by reconstructing some images from the test set and visually comparing them to the originals.
# Make predictions on the test data.
# The autoencoder will take the test images and output their reconstructed versions.
reconstructed_images = autoencoder.predict(x_test)
# Number of images to display for comparison.
n = 10
plt.figure(figsize=(20, 4))
for i in range(n):
    # Display original images.
    ax = plt.subplot(2, n, i + 1)
    # Reshape the flattened 784-dimensional vector back to a 28x28 image.
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()  # Display in grayscale.
    ax.get_xaxis().set_visible(False)  # Hide x-axis ticks.
    ax.get_yaxis().set_visible(False)  # Hide y-axis ticks.
    ax.set_title("Original")  # Set title for original images.
    # Display reconstructed images.
    ax = plt.subplot(2, n, i + 1 + n)
    # Reshape the reconstructed 784-dimensional vector back to a 28x28 image.
    plt.imshow(reconstructed_images[i].reshape(28, 28))
    plt.gray()  # Display in grayscale.
    ax.get_xaxis().set_visible(False)  # Hide x-axis ticks.
    ax.get_yaxis().set_visible(False)  # Hide y-axis ticks.
    ax.set_title("Reconstructed")  # Set title for reconstructed images.
plt.show()
This code will display two rows of images: the top row showing original MNIST digits from the test set, and the bottom row showing their corresponding reconstructions generated by the autoencoder. You should observe that the reconstructed digits closely resemble the originals, demonstrating the autoencoder's ability to learn and reproduce the essential features of the handwritten digits.
Section 4: Advanced Concepts and Considerations
While our simple autoencoder is a great starting point, there are several ways to improve its performance and explore its capabilities further.
4.1 Hyperparameter Tuning
The performance of an autoencoder heavily depends on its hyperparameters. Experimenting with these values can significantly impact reconstruction quality and the learned latent representation.
Latent Space Dimension: The size of the bottleneck layer (`latent_dim`). A smaller dimension forces more compression but might lead to loss of detail. A larger dimension might allow the autoencoder to simply memorize the input without learning meaningful features. Finding the right balance is crucial.
Number of Layers: The depth of the encoder and decoder networks. Deeper networks can learn more complex mappings but are harder to train and prone to overfitting.
Activation Functions: The choice of activation functions (e.g., ReLU, Leaky ReLU, Sigmoid, Tanh) for hidden layers. ReLU is generally a good default for hidden layers, while Sigmoid or Tanh are often used for the output layer, especially when dealing with normalized data ranges.
Learning Rate: A parameter for the optimizer that controls the step size during weight updates. A too-high learning rate can cause the model to overshoot the optimal solution, while a too-low learning rate can make training very slow.
Batch Size and Epochs: As discussed during training, these influence the training stability and the extent of learning.
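The learning-rate trade-off in particular is easy to see on a toy one-parameter problem (a pure illustration, unrelated to the autoencoder itself):

```python
def minimize(lr, steps=50):
    """Plain gradient descent on f(w) = w**2, starting from w = 1.0."""
    w = 1.0
    for _ in range(steps):
        w -= lr * 2 * w   # gradient of w**2 is 2*w
    return w

print(minimize(0.1))   # small rate: converges toward the minimum at 0
print(minimize(1.5))   # too-large rate: overshoots and diverges
```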
4.2 Regularization Techniques
To prevent overfitting, especially in cases where the latent space is not sufficiently small, regularization techniques can be employed.
L1/L2 Regularization: These add a penalty to the loss function based on the magnitude of the model's weights. L1 regularization encourages sparsity in weights, while L2 regularization encourages smaller weights, both helping to prevent complex models that fit noise in the training data.
Dropout: During training, dropout randomly sets a fraction of neuron outputs to zero at each update. This prevents neurons from co-adapting too much and forces the network to learn more robust features.
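Dropout's mechanics can be sketched directly (this is the "inverted dropout" formulation frameworks typically use internally; the numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
activations = np.ones((4, 8))   # a batch of identical activations, for clarity

rate = 0.5
mask = rng.random(activations.shape) >= rate   # keep each unit with prob 1 - rate
# Zero out dropped units and rescale survivors so the expected value is unchanged.
dropped = activations * mask / (1.0 - rate)

print(np.unique(dropped))   # surviving units become 2.0, dropped ones 0.0
```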
4.3 Challenges and Limitations
Despite their utility, autoencoders come with their own set of challenges.
Vanishing/Exploding Gradients: In deep networks, gradients can become extremely small (vanishing) or extremely large (exploding) during backpropagation, making training difficult. Proper weight initialization, activation functions, and batch normalization can mitigate these issues.
Reconstruction Quality vs. Compression: There is an inherent trade-off. Extreme compression (very small latent space) will inevitably lead to some loss of detail in reconstruction. The goal is to find a balance where essential information is preserved while achieving significant dimensionality reduction.
Identity Function Learning: If the autoencoder is too powerful (e.g., too many neurons, large latent space, or insufficient regularization), it might simply learn to copy the input to the output without learning any meaningful compressed representation. This is why the bottleneck is crucial.
Conclusion
Congratulations! You have successfully navigated the theoretical landscape and practical implementation of autoencoders. We started by understanding the core concept of data compression and reconstruction, delved into the symmetric encoder-decoder architecture, and explored the role of the loss function and training process. We then touched upon different types of autoencoders before diving into a hands-on example using the MNIST dataset and the Keras API.
You now possess the foundational knowledge to:
- Understand the purpose and mechanics of a vanilla autoencoder.
- Implement a basic autoencoder for dimensionality reduction and feature learning.
- Prepare and preprocess data for autoencoder training.
- Evaluate the performance of an autoencoder through reconstruction quality.
Autoencoders are powerful tools that form the basis for many advanced techniques in deep learning, including generative models like Variational Autoencoders and anomaly detection systems. As you continue your journey, consider exploring convolutional autoencoders for image data, experimenting with different regularization techniques, and diving into the fascinating world of generative models. The principles you have learned here will serve as an excellent springboard for these exciting future explorations.
Keep experimenting, keep learning, and happy coding!
Addendum: Full Running Example Code
This section provides the complete, runnable Python script for the autoencoder example discussed in Section 3. You can save this code as a `.py` file (e.g., `autoencoder_mnist.py`) and run it directly.
# ==============================================================================
# Full Running Example: MNIST Autoencoder
# This script demonstrates a simple dense autoencoder using TensorFlow/Keras
# to reconstruct handwritten digits from the MNIST dataset.
# ==============================================================================
# 1. Import necessary libraries
# ------------------------------------------------------------------------------
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
print("Libraries imported successfully.")
# 2. Data Preparation
# ------------------------------------------------------------------------------
# Load the MNIST dataset.
# x_train and x_test contain the image data, while _ are placeholders for labels,
# which are not needed for unsupervised autoencoder training.
(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()
print("MNIST dataset loaded.")
# Normalize pixel values to the range [0, 1].
# Original pixel values are 0-255. Converting to float32 and dividing by 255.0
# ensures consistent input scaling for the neural network.
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
print("Pixel values normalized.")
# Reshape images from 28x28 to a flattened 784-dimensional vector.
# This is required for dense (fully connected) layers.
input_dim = np.prod(x_train.shape[1:]) # Calculate 28 * 28 = 784
x_train = x_train.reshape((len(x_train), input_dim))
x_test = x_test.reshape((len(x_test), input_dim))
print(f"Images reshaped to {input_dim}-dimensional vectors.")
print(f"Shape of training data: {x_train.shape}")
print(f"Shape of testing data: {x_test.shape}")
# 3. Define the Autoencoder Architecture
# ------------------------------------------------------------------------------
# Define the dimension of the latent space (bottleneck layer).
# This hyperparameter controls the degree of compression.
latent_dim = 32
print(f"Latent space dimension set to: {latent_dim}")
# Encoder Definition:
# The encoder takes the input and compresses it into the latent space.
# It consists of several dense layers with ReLU activation functions.
input_layer = tf.keras.layers.Input(shape=(input_dim,), name='encoder_input')
encoded_layer_1 = tf.keras.layers.Dense(128, activation='relu', name='encoder_dense_1')(input_layer)
encoded_layer_2 = tf.keras.layers.Dense(64, activation='relu', name='encoder_dense_2')(encoded_layer_1)
latent_space = tf.keras.layers.Dense(latent_dim, activation='relu', name='latent_space')(encoded_layer_2)
# Decoder Definition:
# The decoder takes the latent space representation and reconstructs the original input.
# It mirrors the encoder's structure, expanding dimensions.
decoded_layer_1 = tf.keras.layers.Dense(64, activation='relu', name='decoder_dense_1')(latent_space)
decoded_layer_2 = tf.keras.layers.Dense(128, activation='relu', name='decoder_dense_2')(decoded_layer_1)
# The output layer uses 'sigmoid' activation because pixel values are normalized to [0, 1].
output_layer = tf.keras.layers.Dense(input_dim, activation='sigmoid', name='decoder_output')(decoded_layer_2)
# Autoencoder Model Assembly:
# Create the full autoencoder model by connecting the input to the output.
autoencoder = tf.keras.Model(inputs=input_layer, outputs=output_layer, name='mnist_autoencoder')
print("\nAutoencoder model architecture defined.")
autoencoder.summary()
# Create separate encoder and decoder models for potential later use (e.g., feature extraction).
encoder = tf.keras.Model(inputs=input_layer, outputs=latent_space, name='encoder')
# To create the decoder model, we need a new input layer that matches the latent_dim.
encoded_input_for_decoder = tf.keras.layers.Input(shape=(latent_dim,))
# Re-use the decoder layers from the full autoencoder model.
# We access them by name or index. Here, we use index for simplicity.
# Note: Layer indices might change if model architecture is modified.
decoder_layers = autoencoder.layers[4:] # Assuming decoder layers start from index 4
x = encoded_input_for_decoder
for layer in decoder_layers:
    x = layer(x)
decoder = tf.keras.Model(inputs=encoded_input_for_decoder, outputs=x, name='decoder')
print("\nSeparate encoder and decoder models created.")
# 4. Compiling the Model
# ------------------------------------------------------------------------------
# Compile the autoencoder model.
# 'optimizer': 'adam' is a robust and widely used optimizer.
# 'loss': 'mse' (Mean Squared Error) is suitable for continuous output values (pixel intensities).
autoencoder.compile(optimizer='adam', loss='mse')
print("\nAutoencoder model compiled with Adam optimizer and MSE loss.")
# 5. Training the Autoencoder
# ------------------------------------------------------------------------------
print("\nStarting autoencoder training...")
# Train the autoencoder. The input data (x_train) is also its target output.
# 'epochs': Number of full passes through the training dataset.
# 'batch_size': Number of samples processed before the model's weights are updated.
# 'shuffle': Shuffles the training data before each epoch to prevent learning data order.
# 'validation_data': Used to monitor the model's performance on unseen data during training.
history = autoencoder.fit(x_train, x_train,
                          epochs=50,
                          batch_size=256,
                          shuffle=True,
                          validation_data=(x_test, x_test))
print("\nAutoencoder training complete.")
# Plot training & validation loss values
plt.figure(figsize=(10, 6))
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss Progress During Training')
plt.ylabel('Loss (Mean Squared Error)')
plt.xlabel('Epoch')
plt.legend(loc='upper right')
plt.grid(True)
plt.show()
# 6. Evaluating the Autoencoder
# ------------------------------------------------------------------------------
print("\nEvaluating autoencoder performance on test data...")
# Generate reconstructed images from the test set.
reconstructed_images = autoencoder.predict(x_test)
# Visualize original vs. reconstructed images.
n = 10 # Number of images to display.
plt.figure(figsize=(20, 4))
for i in range(n):
    # Display original images in the top row.
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))  # Reshape flattened vector back to 28x28.
    plt.gray()  # Display as grayscale.
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    ax.set_title("Original")
    # Display reconstructed images in the bottom row.
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(reconstructed_images[i].reshape(28, 28))  # Reshape reconstructed vector.
    plt.gray()  # Display as grayscale.
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    ax.set_title("Reconstructed")
plt.suptitle("Original vs. Reconstructed MNIST Digits")
plt.show()
print("Visual comparison of original and reconstructed images displayed.")
# Optional: Visualize latent space for a few images (requires dimensionality reduction
# if latent_dim > 2 or 3, e.g., using PCA or t-SNE, not included here for simplicity).
# For latent_dim = 2, you could plot directly.
if latent_dim == 2:
    encoded_imgs = encoder.predict(x_test)
    # Reload the test labels (discarded during preprocessing) so each point
    # can be colored by its digit class.
    (_, _), (_, test_labels) = tf.keras.datasets.mnist.load_data()
    plt.figure(figsize=(8, 8))
    plt.scatter(encoded_imgs[:, 0], encoded_imgs[:, 1], c=test_labels, cmap='viridis')
    plt.colorbar()
    plt.title('2D Latent Space Visualization')
    plt.xlabel('Latent Dimension 1')
    plt.ylabel('Latent Dimension 2')
    plt.grid(True)
    plt.show()
    print("2D Latent space visualization displayed.")
else:
    print(f"Latent space dimension ({latent_dim}) is not 2, skipping direct 2D visualization.")
print("\nScript execution finished.")