Friday, April 25, 2025

How Embeddings Work for Large Language Models (LLMs)

Introduction

Embeddings are a fundamental concept in modern Natural Language Processing (NLP), powering the impressive capabilities of Large Language Models (LLMs). At their core, embeddings translate words, phrases, or entire documents into numerical vectors that capture semantic meaning. These vectors allow LLMs to perform tasks such as semantic search, sentiment analysis, clustering, and classification effectively.


What Are Embeddings?

Embeddings are dense numeric representations of text in a continuous vector space. Each word or phrase is mapped to a high-dimensional vector (typically ranging from 128 to 2048 dimensions). Words with similar meanings or contexts have embeddings that are numerically close to each other, while unrelated words have distant embeddings.
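
As a toy illustration, consider three made-up vectors with only three dimensions (real embeddings use hundreds): related words such as "king" and "queen" end up close together, while an unrelated word like "banana" ends up far away. The numbers below are invented purely for demonstration.

import torch
from torch.nn.functional import cosine_similarity

# Made-up 3-dimensional "embeddings" for illustration only
king = torch.tensor([0.80, 0.65, 0.10])
queen = torch.tensor([0.75, 0.70, 0.15])
banana = torch.tensor([0.10, 0.20, 0.90])

print(cosine_similarity(king, queen, dim=0).item())   # high score: related words
print(cosine_similarity(king, banana, dim=0).item())  # low score: unrelated words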


How Do Embeddings Work?

Embeddings are typically generated using neural networks trained on large textual datasets. Classic models such as Word2Vec, GloVe, and FastText, as well as transformer-based models (e.g., BERT, GPT), produce high-quality embeddings. Transformer-based embeddings are context-sensitive, meaning the embedding of a word can vary depending on its surrounding context.

For example, consider these sentences:

- "I deposited money in the bank."

- "I sat by the river bank."

The word "bank" will have different embeddings depending on the surrounding context. Transformer-based models capture these nuances effectively.


Using Embeddings with Hugging Face Transformers

Hugging Face provides an easy-to-use interface for generating embeddings using pre-trained transformer models. Here's how you can use it:

Step 1: Install Hugging Face libraries

pip install transformers torch


Step 2: Generate embeddings

Here's a simple example demonstrating how to generate embeddings using a popular Hugging Face model, such as "sentence-transformers/all-MiniLM-L6-v2":


from transformers import AutoTokenizer, AutoModel
import torch

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

# Function to compute embeddings
def get_embedding(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    embeddings = outputs.last_hidden_state.mean(dim=1)
    return embeddings[0]

# Example usage
sentence1 = "I deposited money in the bank."
sentence2 = "I sat by the river bank."

embedding1 = get_embedding(sentence1)
embedding2 = get_embedding(sentence2)

print("Embedding for sentence 1:", embedding1)
print("Embedding for sentence 2:", embedding2)


Step 3: Comparing embeddings

To measure semantic similarity, you can calculate cosine similarity between embeddings:


from torch.nn.functional import cosine_similarity

similarity = cosine_similarity(embedding1, embedding2, dim=0)
print("Similarity score:", similarity.item())


Cosine similarity ranges from -1 to 1: a score closer to 1 means the two sentences are more semantically similar, while a lower score means they are less related.
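
As a small end-to-end illustration of semantic search, you can embed a query and a handful of candidate sentences with the get_embedding function from Step 2 and rank the candidates by cosine similarity. The query and candidates below are made up for the example; the finance-related sentence should score highest.

query = "Where can I open a savings account?"
candidates = [
    "The bank offers checking and savings accounts.",
    "We walked along the river bank at sunset.",
    "The weather was sunny all week.",
]

query_embedding = get_embedding(query)
scored = [
    (cosine_similarity(query_embedding, get_embedding(c), dim=0).item(), c)
    for c in candidates
]

# Print candidates from most to least similar to the query
for score, text in sorted(scored, reverse=True):
    print(f"{score:.3f}  {text}")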


Conclusion


Embeddings are powerful tools that enable LLMs to understand and represent textual data numerically. With Hugging Face Transformers, you can easily generate embeddings for your NLP tasks, enabling you to perform semantic search, clustering, classification, and more.
