Welcome to an exploration of one of the most fascinating advancements in artificial intelligence: Generative Adversarial Networks, affectionately known as GANs. Imagine a world where machines can not only understand but also create new, original content that is indistinguishable from human-made creations. This is the promise and power of GANs, a revolutionary framework that has opened up new frontiers in machine learning.
The Core Idea: A Creative Forger and an Expert Detective
At its heart, a Generative Adversarial Network operates on a simple yet profound principle: competition. It involves two distinct neural networks, a "generator" and a "discriminator," locked in a continuous, adversarial game. Think of it like a skilled art forger trying to create a masterpiece that can fool an expert art detective. The forger constantly refines their technique, learning from their mistakes, while the detective sharpens their eye, becoming better at spotting fakes.
Let us illustrate this with a running example: imagine we want to train a machine to write realistic customer reviews for a fictional new product, say, a "Quantum Coffee Maker." Our goal is to generate reviews that sound so authentic, they could have been written by actual customers.
The Generator Network is our creative forger. Its job is to invent new customer reviews that appear genuine. Initially, it might produce nonsensical strings of words, like "coffee quantum good make machine." But over time, it learns from the feedback it receives, gradually improving its ability to craft believable sentences and sentiments.
The Discriminator Network is our expert detective. Its task is to examine any given review and determine whether it is a real review from an actual customer or a fake one conjured up by the generator. It is trained on a dataset of both real customer reviews and the fake reviews produced by the generator. Its ultimate aim is to become so adept that it can perfectly distinguish between the two.
How GANs Work: The Adversarial Dance
The training process of a GAN is an iterative dance between these two networks. They are trained simultaneously but with opposing goals.
In each training step, the generator first creates a batch of fake reviews from a random noise input. This noise acts as a creative spark, providing the generator with a unique starting point for each new review. Then, the discriminator is presented with a mix of these newly generated fake reviews and a batch of real customer reviews. The discriminator processes both sets and attempts to classify each review as either "real" or "fake."
The discriminator's performance is then evaluated. If it correctly identifies a real review as real and a fake review as fake, its internal parameters are adjusted to reinforce these correct classifications. If it makes a mistake, its parameters are adjusted to learn from that error. This process makes the discriminator a more discerning critic.
Following the discriminator's training, the generator gets its turn. The generator's goal is to fool the discriminator. It produces another batch of fake reviews, and these are then fed to the discriminator. However, during this phase, the discriminator's parameters are frozen; it acts as a fixed judge. The generator's parameters are adjusted based on how successful it was at tricking the discriminator. If the discriminator was fooled into thinking a fake review was real, the generator is rewarded and its internal mechanisms are adjusted to produce more reviews like that. If the discriminator easily spotted the fakes, the generator learns to improve its forgery skills.
This continuous back-and-forth, where the generator tries to improve its fakes and the discriminator tries to improve its detection, drives both networks to higher levels of performance. Eventually, the generator becomes so good at creating realistic reviews that the discriminator can no longer reliably tell the difference between real and fake ones, effectively guessing with 50 percent accuracy. At this point, the generator has learned to produce highly convincing synthetic data.
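That 50 percent figure is not just a metaphor: for a fixed generator, the optimal discriminator outputs D*(x) = p_data(x) / (p_data(x) + p_g(x)), which collapses to 0.5 everywhere once the generator's distribution matches the real one. Here is a minimal sketch of that relationship (the density values are made-up illustrative numbers, not measurements):

```python
def optimal_discriminator(p_data, p_gen):
    """Optimal discriminator output given the real and generated densities at x."""
    return p_data / (p_data + p_gen)

# Early in training the generator's distribution is far off, so the
# discriminator can be confident:
print(optimal_discriminator(0.8, 0.2))  # 0.8 -> "probably real"

# At convergence the two densities match, and the best the discriminator
# can do is a coin flip:
print(optimal_discriminator(0.5, 0.5))  # 0.5
```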
Constituents of a GAN: A Closer Look
Let us delve into the specific components that make up these two powerful networks, using our customer review example.
The Generator Network: The Creative Forger
The generator's primary function is to transform a random input, often called a "latent space vector" or "noise vector," into a data instance that resembles the training data. For our review generation task, this means turning a numerical vector into a sequence of words that form a coherent and plausible customer review.
Input: The generator typically starts with a vector of random numbers, perhaps 100 dimensions long, sampled from a simple distribution like a uniform or normal distribution. This random vector serves as the creative seed for each unique review the generator will produce. Different random vectors should ideally lead to different generated reviews.
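Drawing such noise vectors is a one-line NumPy call. A quick sketch (the 100 dimensions match the figure above; the batch size of 4 is an arbitrary illustrative choice):

```python
import numpy as np

latent_dim = 100  # dimensionality of each noise vector, as above
batch_size = 4    # how many reviews we want to seed at once

# One row of standard-normal noise per review to be generated.
noise = np.random.normal(loc=0.0, scale=1.0, size=(batch_size, latent_dim))
print(noise.shape)  # (4, 100)
```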
Output: The output of our generator will be a sequence of word identifiers, which can then be mapped back to actual words to form a review text. For instance, if our vocabulary assigns 'great' to 1, 'product' to 2, and 'love' to 3, the generator might output a sequence like [1, 2, 3] which translates to "great product love".
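Mapping those identifiers back to text is a plain dictionary lookup. A sketch using the toy vocabulary above (index 0 is conventionally reserved for padding, as in Keras's Tokenizer):

```python
# Toy vocabulary from the example above; 0 is reserved for padding.
index_to_word = {1: 'great', 2: 'product', 3: 'love'}

def decode_review(indices):
    """Turn a sequence of word identifiers into review text, skipping padding."""
    return ' '.join(index_to_word.get(i, '<UNK>') for i in indices if i != 0)

print(decode_review([1, 2, 3]))     # great product love
print(decode_review([1, 2, 0, 0]))  # great product  (padding dropped)
```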
Architecture: For text generation, recurrent neural networks (RNNs) like Long Short-Term Memory (LSTM) units or Gated Recurrent Units (GRU) are commonly used because they are excellent at handling sequential data. The generator might start with a dense layer to expand the latent noise vector, followed by a series of LSTM or GRU layers to build up the sequence, and finally a dense layer with a softmax activation function over the entire vocabulary to output probabilities for each word at each position in the sequence.
Let us look at a simplified code snippet for building such a generator using Keras:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Reshape, TimeDistributed

def build_generator(latent_dim, vocab_size, max_seq_length):
    """
    Constructs the generator model responsible for creating fake reviews.

    Parameters:
        latent_dim (int): The dimensionality of the random noise input vector.
        vocab_size (int): The total number of unique words in our vocabulary.
        max_seq_length (int): The maximum length of a generated review sequence.

    Returns:
        tf.keras.Model: The generator model.
    """
    model = Sequential()
    # The initial dense layer expands the latent noise vector to a size that can
    # be reshaped into a sequence of 'timesteps' for the LSTM. For example, with
    # max_seq_length = 10 we expand to 128 * 10 = 1280 values and reshape them
    # to (10, 128): ten timesteps with 128 features each.
    model.add(Dense(128 * max_seq_length, input_dim=latent_dim))
    model.add(Reshape((max_seq_length, 128)))  # Reshape to (timesteps, features)
    # LSTM layer to learn sequential dependencies.
    # return_sequences=True makes the LSTM output a full sequence for the next layer.
    model.add(LSTM(256, return_sequences=True))
    # TimeDistributed applies the same Dense layer to each timestep. This is
    # crucial for producing a probability distribution over the entire vocabulary
    # for each word position in the review.
    model.add(TimeDistributed(Dense(vocab_size, activation='softmax')))
    return model

# Example usage (not part of the actual running example, just for illustration)
# generator_model = build_generator(latent_dim=100, vocab_size=1000, max_seq_length=10)
# generator_model.summary()
The `build_generator` function defines our generator. It takes a random noise vector, expands it, and then uses an LSTM layer to create a sequence of outputs. The `TimeDistributed(Dense(vocab_size, activation='softmax'))` layer is particularly important here; it ensures that for every position in the generated review sequence, the model outputs a probability distribution over all possible words in our vocabulary. The word with the highest probability is then chosen for that position.
The Discriminator Network: The Expert Detective
The discriminator's role is to evaluate incoming data and classify it as either real or fake. In our review example, it must distinguish between actual customer reviews and those fabricated by the generator.
Input: The discriminator receives each review as a sequence of one-hot word vectors, one vector of length vocab_size per word position. This lets it score both real reviews (one-hot encoded) and the generator's raw softmax output in the same format.
Output: A single probability value between 0 and 1. A value close to 1 indicates the discriminator believes the input review is real, while a value close to 0 suggests it believes the review is fake.
Architecture: For text classification, the discriminator also often uses recurrent layers like LSTMs or GRUs. In a GAN, however, gradients must flow from the discriminator back into the generator, and a discrete word choice (an argmax, or an integer embedding lookup) would block them. We therefore project each one-hot or softmax word vector with a dense layer rather than an integer embedding, and finish with a dense layer and a sigmoid activation to output the single probability.
Here is a simplified code snippet for building our discriminator:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, TimeDistributed, Dropout

def build_discriminator(vocab_size, max_seq_length):
    """
    Constructs the discriminator model responsible for classifying reviews as real or fake.

    Parameters:
        vocab_size (int): The total number of unique words in our vocabulary.
        max_seq_length (int): The maximum length of a review sequence.

    Returns:
        tf.keras.Model: The compiled discriminator model.
    """
    model = Sequential()
    # Each review arrives as a sequence of one-hot word vectors of shape
    # (max_seq_length, vocab_size). A Dense projection turns each word vector
    # into a 128-dimensional representation. We avoid an integer Embedding
    # lookup so the discriminator can also score the generator's raw softmax
    # output, keeping the combined GAN model differentiable.
    model.add(TimeDistributed(Dense(128), input_shape=(max_seq_length, vocab_size)))
    # LSTM processes the sequence of word representations. We don't need
    # return_sequences=True here as we only care about the final classification
    # for the entire sequence.
    model.add(LSTM(256))
    # Dropout helps prevent overfitting by randomly setting a fraction of units
    # to 0 during training, which discourages co-adaptation of neurons.
    model.add(Dropout(0.3))
    # Final dense layer with sigmoid activation outputs a single probability
    # indicating whether the review is real (close to 1) or fake (close to 0).
    model.add(Dense(1, activation='sigmoid'))
    # Binary cross-entropy is standard for binary classification.
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# Example usage (not part of the actual running example, just for illustration)
# discriminator_model = build_discriminator(vocab_size=1000, max_seq_length=10)
# discriminator_model.summary()
The `build_discriminator` function creates a model that takes a review as a sequence of one-hot word vectors, projects each into a dense representation, processes the sequence with an LSTM, and then outputs a single probability. This probability tells us how confident the discriminator is that the input review is real.
The Training Loop: Orchestrating the Competition
The training loop is where the generator and discriminator engage in their adversarial game. It involves alternating updates to each network's weights.
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# Assume build_generator and build_discriminator are defined as above.

def build_gan(generator, discriminator):
    """
    Combines the generator and discriminator into a single GAN model for training
    the generator. During generator training, the discriminator's weights are
    frozen: the GAN model takes random noise as input and outputs the
    discriminator's classification of the generated output.

    Parameters:
        generator (tf.keras.Model): The generator model.
        discriminator (tf.keras.Model): The discriminator model.

    Returns:
        tf.keras.Model: The compiled GAN model.
    """
    # Freeze the discriminator inside the combined model. Because the
    # discriminator was compiled on its own before this flag was set, it still
    # updates normally when we call discriminator.train_on_batch directly.
    discriminator.trainable = False
    # Feed the generator's softmax output straight into the discriminator, so
    # gradients can flow back into the generator.
    gan_output = discriminator(generator.output)
    # The GAN model: input is the generator's input, output is the discriminator's verdict.
    gan_model = Model(inputs=generator.input, outputs=gan_output)
    # The loss here trains the generator, which tries to make the discriminator
    # output 'real' (label 1) for its fake samples.
    gan_model.compile(loss='binary_crossentropy',
                      optimizer=Adam(learning_rate=0.0002, beta_1=0.5))
    return gan_model

def train_gan(generator, discriminator, gan_model, real_reviews_onehot,
              latent_dim, n_epochs=100, batch_size=64, tokenizer=None):
    """
    Trains the Generative Adversarial Network.

    Parameters:
        generator (tf.keras.Model): The generator model.
        discriminator (tf.keras.Model): The discriminator model.
        gan_model (tf.keras.Model): The combined GAN model.
        real_reviews_onehot (np.array): Real reviews as one-hot vectors of shape
            (n_samples, max_seq_length, vocab_size).
        latent_dim (int): Dimensionality of the generator's noise input.
        n_epochs (int): Number of training epochs.
        batch_size (int): Size of batches for training.
        tokenizer (tf.keras.preprocessing.text.Tokenizer): The tokenizer used for text.
    """
    half_batch = batch_size // 2
    for epoch in range(n_epochs):
        # ---------------------
        #  Train Discriminator
        # ---------------------
        # Select a random half_batch of real reviews.
        n_real = min(half_batch, real_reviews_onehot.shape[0])
        idx = np.random.randint(0, real_reviews_onehot.shape[0], n_real)
        real_reviews = real_reviews_onehot[idx]
        # Generate a half_batch of fake reviews from random noise. The softmax
        # output is fed to the discriminator as-is; taking the argmax here
        # would produce discrete word IDs but destroy the gradient signal.
        noise = np.random.normal(0, 1, (half_batch, latent_dim))
        fake_reviews = generator.predict(noise, verbose=0)
        # Smoothed labels (0.9 / 0.1) discourage discriminator overconfidence.
        real_labels = np.ones((n_real, 1)) * 0.9
        fake_labels = np.zeros((half_batch, 1)) + 0.1
        d_loss_real = discriminator.train_on_batch(real_reviews, real_labels)
        d_loss_fake = discriminator.train_on_batch(fake_reviews, fake_labels)
        d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)
        # ---------------------
        #  Train Generator
        # ---------------------
        noise = np.random.normal(0, 1, (batch_size, latent_dim))
        # The generator wants the (frozen) discriminator to classify its fakes
        # as real (label 1).
        valid_y = np.ones((batch_size, 1))
        g_loss = gan_model.train_on_batch(noise, valid_y)
        # Print progress.
        print(f"Epoch {epoch}/{n_epochs} [D loss: {d_loss[0]:.4f}, "
              f"acc.: {100*d_loss[1]:.2f}%] [G loss: {g_loss:.4f}]")
        # Optionally, decode and print a sample review every few epochs.
        if epoch % 100 == 0 and tokenizer:
            sample = generator.predict(np.random.normal(0, 1, (1, latent_dim)), verbose=0)
            sample_indices = np.argmax(sample, axis=-1)[0]
            reverse_word_map = dict(map(reversed, tokenizer.word_index.items()))
            # Index 0 is reserved for padding, so skip it when decoding.
            words = [reverse_word_map.get(idx, '<UNK>') for idx in sample_indices if idx != 0]
            print(f"  Generated sample review: '{' '.join(words)}'")

# Note: the actual execution of train_gan with real data and a tokenizer
# is in the addendum. This snippet focuses on the loop logic.
The `train_gan` function orchestrates the entire training process. It iteratively:
1. Trains the discriminator: It takes half a batch of real reviews and half a batch of fake reviews generated by the current generator. It then updates the discriminator's weights to better distinguish between them.
2. Trains the generator: It generates a full batch of fake reviews and tries to fool the discriminator into classifying them as real. During this step, only the generator's weights are updated, while the discriminator acts as a fixed judge.
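The same alternating rhythm can be watched in miniature without Keras at all. The toy below is my own illustrative construction, not part of the review example: a one-parameter generator G(z) = mu + z tries to match data drawn from a Gaussian centred at 3, against a tiny logistic discriminator with hand-derived gradients. Each iteration first updates the discriminator, then freezes it and nudges the generator:

```python
import math
import random

random.seed(0)
sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))

mu = 0.0          # generator parameter: G(z) = mu + z, starts far from the truth
w, b = 0.0, 0.0   # logistic discriminator: D(x) = sigmoid(w*x + b)
lr = 0.05

for step in range(2000):
    real = random.gauss(3.0, 1.0)   # one real sample from N(3, 1)
    fake = mu + random.gauss(0.0, 1.0)

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    d_real, d_fake = sigmoid(w * real + b), sigmoid(w * fake + b)
    w += lr * ((1 - d_real) * real - d_fake * fake)
    b += lr * ((1 - d_real) - d_fake)

    # Generator step: D is frozen; move mu so that D(fake) rises toward 1.
    d_fake = sigmoid(w * fake + b)
    mu += lr * (1 - d_fake) * w

print(round(mu, 2))  # mu drifts from 0 toward the real mean of 3
```

The gradient for the generator comes from the same binary cross-entropy objective the Keras version uses; only the bookkeeping is stripped away.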
Why GANs are Useful: Beyond Fake Reviews
While our example focuses on generating customer reviews, the applications of Generative Adversarial Networks extend far beyond this niche. Their ability to generate realistic data makes them incredibly versatile.
One prominent use is in image generation. GANs can create hyper-realistic images of faces, landscapes, animals, or objects that have never existed. This capability is used in fields like entertainment for creating virtual characters or in design for generating new product concepts.
Another significant application is image-to-image translation. This involves transforming an image from one domain to another. Examples include converting sketches into photorealistic images, changing day scenes to night scenes, or even altering facial expressions while preserving identity.
GANs are also invaluable for data augmentation. In scenarios where real training data is scarce, GANs can generate synthetic data that closely mimics the real data, thereby expanding the training set and improving the performance of other machine learning models. For instance, in medical imaging, GANs can create more examples of rare disease conditions.
Furthermore, they contribute to super-resolution, enhancing the quality of low-resolution images by generating missing details, and in drug discovery, where they can propose new molecular structures with desired properties. The creative capacity of GANs is constantly being explored, leading to novel solutions across various industries.
Challenges and Considerations
Despite their power, training GANs can be challenging. One common issue is mode collapse, where the generator learns to produce only a limited variety of outputs, even if the real data is diverse. For instance, our review generator might only learn to write positive reviews, ignoring negative or neutral sentiments. This happens when the generator finds a few samples that consistently fool the discriminator and stops exploring the full data distribution.
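A cheap smoke test for mode collapse is to generate a batch of samples and count how many are distinct: a healthy generator should not keep emitting the same few sequences. A sketch (the token lists here are stand-ins for decoded review sequences):

```python
def distinct_ratio(samples):
    """Fraction of unique samples in a batch; values near 0 suggest mode collapse."""
    return len({tuple(s) for s in samples}) / len(samples)

healthy = [[1, 2, 3], [4, 5, 6], [2, 3, 1], [6, 4, 5]]
collapsed = [[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]]

print(distinct_ratio(healthy))    # 1.0  -> every sample unique
print(distinct_ratio(collapsed))  # 0.25 -> one mode repeated
```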
Another challenge is training stability. GANs are notoriously difficult to train because of the delicate balance required between the generator and discriminator. If one network becomes too powerful too quickly, the training can become unstable, leading to poor results. This often requires careful tuning of hyperparameters and architectural choices.
Finally, evaluation metrics for GANs are still an active area of research. It is hard to quantitatively assess the "realism" or "diversity" of generated content, especially for complex data like text or images. Human evaluation often remains a crucial, albeit subjective, method.
Conclusion
Generative Adversarial Networks represent a monumental leap in machine learning, enabling machines to move beyond mere analysis and into the realm of creation. By pitting two neural networks against each other in an adversarial game, GANs learn to generate highly realistic and diverse synthetic data. From crafting compelling fake customer reviews to conjuring photorealistic images and aiding scientific discovery, their potential continues to unfold. While challenges like mode collapse and training stability persist, the ongoing research and innovation in GANs promise even more astonishing applications in the years to come, further blurring the lines between artificial and authentic creation.
Addendum: Full Running Example Code for Generating Fake Customer Reviews
This complete script demonstrates how to build and train a simple GAN to generate short, fake customer reviews for our "Quantum Coffee Maker" example.
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, LSTM, Reshape, Embedding, TimeDistributed, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
# --- Configuration Parameters ---
VOCAB_SIZE = 50 # Maximum number of unique words in our vocabulary
MAX_SEQ_LENGTH = 10 # Maximum length of a review sequence (e.g., "great product love it" is 4 words)
LATENT_DIM = 100 # Dimension of the random noise input to the generator
N_EPOCHS = 2000 # Number of training iterations
BATCH_SIZE = 64 # Number of samples per training batch
BUFFER_SIZE = 10000 # For shuffling dataset
# --- 1. Define Generator Model ---
def build_generator(latent_dim, vocab_size, max_seq_length):
    """
    Constructs the generator model responsible for creating fake reviews.
    It takes a random noise vector and transforms it into a sequence of
    word-probability distributions, one per word position.
    """
    model = Sequential(name="generator")
    # Initial dense layer to expand the latent noise vector.
    # We expand it to a size that can be reshaped into a sequence of 'timesteps'
    # for the LSTM layer. The 128 here is an arbitrary feature dimension per timestep.
    model.add(Dense(128 * max_seq_length, input_dim=latent_dim))
    model.add(Reshape((max_seq_length, 128)))  # Reshape to (timesteps, features)
    # LSTM layer to process the sequence and learn sequential dependencies.
    # return_sequences=True ensures that the LSTM outputs a sequence for the next layer,
    # which is necessary when generating a sequence of words.
    model.add(LSTM(256, return_sequences=True))
    # TimeDistributed Dense layer applies a Dense layer to each timestep of the sequence.
    # This is crucial for generating a probability distribution over the entire vocabulary
    # for each word position in the review. The softmax activation converts these
    # into probabilities.
    model.add(TimeDistributed(Dense(vocab_size, activation='softmax')))
    return model
# --- 2. Define Discriminator Model ---
def build_discriminator(vocab_size, max_seq_length):
    """
    Constructs the discriminator model responsible for classifying reviews as real or fake.
    It takes each review as a sequence of one-hot word vectors, shaped
    (max_seq_length, vocab_size), and outputs a single probability. Accepting
    one-hot (or softmax) vectors rather than integer word indices lets the
    discriminator score the generator's continuous output directly, which keeps
    the combined GAN model differentiable.
    """
    model = Sequential(name="discriminator")
    # Project each one-hot (or softmax) word vector down to a dense 128-dim
    # representation. An integer Embedding lookup would block gradient flow
    # from the generator's softmax output, so we use a Dense projection instead.
    model.add(TimeDistributed(Dense(128), input_shape=(max_seq_length, vocab_size)))
    # LSTM layer processes the sequence of word representations to capture context.
    # We don't need return_sequences=True here as we only care about the final
    # classification for the entire sequence, not a sequence of outputs.
    model.add(LSTM(256))
    # Dropout layer helps prevent overfitting by randomly setting a fraction of
    # units to 0 during training, which encourages more robust features.
    model.add(Dropout(0.3))
    # Final dense layer with sigmoid activation outputs a single probability
    # indicating whether the review is real (close to 1) or fake (close to 0).
    model.add(Dense(1, activation='sigmoid'))
    # Compile the discriminator with an Adam optimizer and binary cross-entropy loss,
    # which is standard for binary classification tasks.
    discriminator_optimizer = Adam(learning_rate=0.0002, beta_1=0.5)
    model.compile(loss='binary_crossentropy', optimizer=discriminator_optimizer, metrics=['accuracy'])
    return model
# --- 3. Define Combined GAN Model ---
def build_gan(generator, discriminator):
    """
    Combines the generator and discriminator into a single GAN model for training the generator.
    When training the GAN, the discriminator's weights are frozen.
    """
    # Make the discriminator non-trainable when training the generator.
    # This is crucial: we only want to update the generator's weights based on
    # how well it fools the *current* discriminator.
    discriminator.trainable = False
    # Connect the generator output to the discriminator input.
    gan_output = discriminator(generator.output)
    # Define the GAN model: input is the generator's input (noise),
    # and the output is the discriminator's classification of the generated data.
    gan_model = Model(inputs=generator.input, outputs=gan_output, name="gan")
    # Compile the GAN model. The loss here is for the generator, which tries to make
    # the discriminator output 'real' (label 1) for its fake samples.
    gan_optimizer = Adam(learning_rate=0.0002, beta_1=0.5)
    gan_model.compile(loss='binary_crossentropy', optimizer=gan_optimizer)
    return gan_model
# --- 4. Prepare Real Data (Simulated) ---
def load_real_samples(vocab_size, max_seq_length):
    """
    Simulates loading and preprocessing real customer review data.
    In a real scenario, this would load actual text files.
    """
    # A small set of example real reviews for our "Quantum Coffee Maker"
    real_reviews_text = [
        "This quantum coffee maker is great, love the speed.",
        "Best coffee machine ever, highly recommend to everyone.",
        "Excellent product, makes amazing coffee every morning.",
        "I love my new quantum coffee maker, it's a game changer.",
        "Simply fantastic, the coffee tastes out of this world.",
        "Highly satisfied with this purchase, worth every penny.",
        "The best quantum coffee experience I've had so far.",
        "Five stars for this innovative coffee maker.",
        "My morning routine is so much better with this machine.",
        "A must-have for any coffee enthusiast, truly revolutionary."
    ]
    # Initialize a tokenizer to convert words to integers.
    # oov_token handles out-of-vocabulary words.
    tokenizer = Tokenizer(num_words=vocab_size, oov_token="<unk>")
    tokenizer.fit_on_texts(real_reviews_text)
    # Convert text reviews to sequences of integers.
    sequences = tokenizer.texts_to_sequences(real_reviews_text)
    # Pad sequences to ensure all reviews have the same length.
    # 'post' padding adds zeros at the end, 'pre' adds at the beginning.
    padded_sequences = pad_sequences(sequences, maxlen=max_seq_length, padding='post')
    print(f"Loaded {len(real_reviews_text)} real review samples.")
    print(f"Vocabulary size: {len(tokenizer.word_index) + 1}")
    print(f"Example real sequence (padded): {padded_sequences[0]}")
    return padded_sequences, tokenizer
# --- 5. Helper Functions for Training ---
def generate_latent_points(latent_dim, n_samples):
    """
    Generates random noise vectors as input for the generator.
    """
    # Generate points in the latent space from a standard normal distribution.
    x_input = np.random.normal(0, 1, (n_samples, latent_dim))
    return x_input

def generate_fake_samples(generator, latent_dim, n_samples, vocab_size):
    """
    Uses the generator to create fake review samples for the discriminator.
    The raw softmax output (one probability distribution over the vocabulary
    per word position) is returned as-is: the discriminator consumes soft word
    vectors directly, and taking the argmax here would discard the gradient
    information training depends on. vocab_size is kept in the signature for
    call-site compatibility.
    """
    # Generate random points in latent space and predict word probabilities.
    x_input = generate_latent_points(latent_dim, n_samples)
    X = generator.predict(x_input, verbose=0)
    # Smoothed 'fake' labels (0.1 rather than 0.0) for these generated samples.
    y = np.zeros((n_samples, 1)) + 0.1
    return X, y
def summarize_performance(epoch, generator, discriminator, real_reviews_sequences,
                          latent_dim, n_samples=100, vocab_size=None,
                          max_seq_length=None, tokenizer=None):
    """
    Evaluates and prints the performance of the GAN at given intervals.
    Generates sample reviews and displays them.
    """
    # Evaluate the discriminator on real samples.
    x_real, y_real = real_reviews_sequences, np.ones((len(real_reviews_sequences), 1)) * 0.9
    _, acc_real = discriminator.evaluate(x_real, y_real, verbose=0)
    # Generate fake samples and evaluate the discriminator on them.
    x_fake, y_fake = generate_fake_samples(generator, latent_dim, n_samples, vocab_size)
    _, acc_fake = discriminator.evaluate(x_fake, y_fake, verbose=0)
    # Summarize discriminator performance.
    print(f"Discriminator Accuracy: Real={acc_real*100:.2f}%, Fake={acc_fake*100:.2f}%")
    # Generate and print a few sample reviews.
    print("--- Sample Generated Reviews ---")
    sample_noise = generate_latent_points(latent_dim, 3)  # Generate 3 samples
    generated_samples_indices = generator.predict(sample_noise, verbose=0)
    generated_samples_indices = np.argmax(generated_samples_indices, axis=-1)
    generated_samples_indices = np.clip(generated_samples_indices, 0, vocab_size - 1)
    reverse_word_map = dict(map(reversed, tokenizer.word_index.items()))

    def sequence_to_text(sequence):
        # Filter out padding (0) and unknown tokens (<unk>) for cleaner output.
        words = [reverse_word_map.get(idx, '<UNK>') for idx in sequence
                 if idx != 0 and reverse_word_map.get(idx, '<UNK>') != '<unk>']
        return ' '.join(words)

    for i, seq_indices in enumerate(generated_samples_indices):
        review_text = sequence_to_text(seq_indices)
        print(f"  Sample {i+1}: '{review_text}'")
    print("------------------------------")
# --- 6. Main Training Function ---
def train_gan_model(generator, discriminator, gan_model, real_reviews_sequences,
                    latent_dim, n_epochs, batch_size, vocab_size, max_seq_length, tokenizer):
    """
    Trains the Generative Adversarial Network. real_reviews_sequences is
    expected as one-hot vectors of shape (n_samples, max_seq_length, vocab_size).
    """
    half_batch = batch_size // 2
    # Create a dataset from the real reviews for efficient shuffled batching.
    dataset = tf.data.Dataset.from_tensor_slices(real_reviews_sequences).shuffle(BUFFER_SIZE).batch(half_batch)
    for epoch in range(n_epochs):
        for i, real_batch in enumerate(dataset):
            # ---------------------
            #  Train Discriminator
            # ---------------------
            # Generate a half_batch of fake reviews (soft word distributions),
            # already paired with smoothed 'fake' labels.
            generated_reviews, fake_batch_labels = generate_fake_samples(
                generator, latent_dim, half_batch, vocab_size)
            # Size the real labels to the actual batch, which may be smaller
            # than half_batch for the final batch of an epoch. The smoothed
            # value (0.9) discourages discriminator overconfidence.
            real_batch_labels = np.ones((len(real_batch), 1)) * 0.9
            d_loss_real = discriminator.train_on_batch(real_batch, real_batch_labels)
            d_loss_fake = discriminator.train_on_batch(generated_reviews, fake_batch_labels)
            d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)
            # ---------------------
            #  Train Generator
            # ---------------------
            # Generate a full batch of noise vectors for the generator.
            noise = generate_latent_points(latent_dim, batch_size)
            # The generator wants the discriminator to classify its fakes as real (label 1).
            valid_y = np.ones((batch_size, 1))
            # Train the generator via the combined GAN model; the discriminator's
            # weights are frozen during this step due to the build_gan setup.
            g_loss = gan_model.train_on_batch(noise, valid_y)
            # Print progress every 10 batches.
            if i % 10 == 0:
                print(f"Epoch {epoch+1}/{n_epochs}, Batch {i+1} "
                      f"[D loss: {d_loss[0]:.4f}, acc.: {100*d_loss[1]:.2f}%] "
                      f"[G loss: {g_loss:.4f}]")
        # Summarize performance and generate samples every 100 epochs (and at epoch 0).
        if (epoch + 1) % 100 == 0 or epoch == 0:
            print(f"\n--- Epoch {epoch+1} Summary ---")
            summarize_performance(epoch, generator, discriminator, real_reviews_sequences,
                                  latent_dim, n_samples=10, vocab_size=vocab_size,
                                  max_seq_length=max_seq_length, tokenizer=tokenizer)
            print("-----------------------------\n")
# --- Main Execution Block ---
if __name__ == "__main__":
    # Load and preprocess real review data.
    real_reviews_sequences, tokenizer = load_real_samples(VOCAB_SIZE, MAX_SEQ_LENGTH)
    actual_vocab_size = len(tokenizer.word_index) + 1  # +1 for the padding token
    # One-hot encode the real reviews so the discriminator receives them in the
    # same format as the generator's softmax output.
    real_reviews_onehot = tf.keras.utils.to_categorical(
        real_reviews_sequences, num_classes=actual_vocab_size)
    # Build the discriminator.
    discriminator = build_discriminator(actual_vocab_size, MAX_SEQ_LENGTH)
    discriminator.summary()
    # Build the generator.
    generator = build_generator(LATENT_DIM, actual_vocab_size, MAX_SEQ_LENGTH)
    generator.summary()
    # Build the combined GAN model.
    gan_model = build_gan(generator, discriminator)
    gan_model.summary()
    # Train the GAN.
    print("\nStarting GAN training...")
    train_gan_model(generator, discriminator, gan_model, real_reviews_onehot,
                    LATENT_DIM, N_EPOCHS, BATCH_SIZE, actual_vocab_size,
                    MAX_SEQ_LENGTH, tokenizer)
    print("\nGAN training finished.")
    print("Final performance summary:")
    summarize_performance(N_EPOCHS, generator, discriminator, real_reviews_onehot,
                          LATENT_DIM, n_samples=10, vocab_size=actual_vocab_size,
                          max_seq_length=MAX_SEQ_LENGTH, tokenizer=tokenizer)
    # Optional: Save the generator model for future use.
    # generator.save('review_generator.h5')
    # print("\nGenerator model saved as 'review_generator.h5'")