Phase 2B: Caption Embeddings

Model: all-MiniLM-L6-v2
Embedding Dimension: 384
Normalization: L2-normalized vectors

Computes Sentence-BERT (MiniLM) text embeddings for the memory captions: one 384-dimensional, L2-normalized vector per caption.
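
As a rough illustration of this step, the sketch below encodes a list of captions with the sentence-transformers package and scales each row to unit L2 norm. It is not the contents of pipeline/extract_caption_embeddings.py, which is not shown here. The function names `embed_captions` and `l2_normalize` are hypothetical, and the model weights are downloaded on first use.

```python
import numpy as np

MODEL_NAME = "all-MiniLM-L6-v2"  # produces 384-dimensional embeddings


def l2_normalize(vectors, eps=1e-12):
    """Scale each row to unit L2 norm; all-zero rows are left as zeros."""
    vectors = np.asarray(vectors, dtype=np.float32)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.maximum(norms, eps)


def embed_captions(captions, model_name=MODEL_NAME):
    """Encode captions into an (N, 384) float32 array with unit-length rows.

    Hypothetical helper: the real pipeline script may batch, cache, or
    configure the model differently.
    """
    # Imported lazily so l2_normalize stays usable without the package.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    embeddings = model.encode(list(captions), convert_to_numpy=True)
    return l2_normalize(embeddings)
```

For example, `embed_captions(["A walk on the beach at sunset"])` would return an array of shape (1, 384). `SentenceTransformer.encode` also accepts `normalize_embeddings=True`, which would make the explicit normalization step unnecessary.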

Run

source venv/bin/activate
pip install "sentence-transformers>=2.2.0"
python pipeline/extract_caption_embeddings.py

File	Description
`data/processed/text_embeddings.npy`	(N, 384) Sentence-BERT embeddings
`data/processed/caption_embedding_index.json`	Record ID to embedding index mapping
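
The two output files can be read back together to look up the embedding for a given record. The sketch below assumes the index file is a flat JSON object mapping each record ID to a row number in the array; that layout is a guess based on the description above, and the helper names are hypothetical.

```python
import json

import numpy as np


def load_caption_embeddings(embeddings_path, index_path):
    """Load the (N, 384) embedding array and the record-ID index."""
    embeddings = np.load(embeddings_path)
    with open(index_path, encoding="utf-8") as f:
        # Assumed layout: {"<record id>": <row index>, ...}
        index = json.load(f)
    return embeddings, index


def embedding_for(record_id, embeddings, index):
    """Return the embedding row for one record ID."""
    return embeddings[int(index[record_id])]


def cosine_similarity(a, b):
    """Dot product of two rows; equals cosine similarity for unit vectors."""
    return float(np.dot(a, b))
```

A call such as `load_caption_embeddings("data/processed/text_embeddings.npy", "data/processed/caption_embedding_index.json")` would return both objects. Because the vectors are L2-normalized, a plain dot product is enough to compare two captions.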