API Reference¶
Pipeline Modules¶
pipeline/config.py¶
Central configuration for all pipeline parameters.
pipeline/pull_data.py¶
Data acquisition from a Cloudflare D1 database.
pipeline/extract_image_embeddings.py¶
CLIP image embedding extraction.
pipeline/extract_caption_embeddings.py¶
Sentence-BERT text embedding extraction.
pipeline/extract_emotions.py¶
Emotion score extraction using transformer models.
pipeline/extract_temporal_features.py¶
Temporal feature engineering.
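As a rough illustration of what temporal feature engineering can look like, the sketch below derives a few calendar features from an ISO 8601 timestamp. The function name `temporal_features`, the feature names, and the cyclical hour encoding are assumptions made for this example; they are not taken from the module itself.

```python
import math
from datetime import datetime


def temporal_features(timestamp: str) -> dict:
    """Derive simple calendar features from an ISO 8601 timestamp (hypothetical helper)."""
    dt = datetime.fromisoformat(timestamp)
    # Map the hour onto a circle so that 23:00 and 00:00 end up close together.
    hour_angle = 2 * math.pi * dt.hour / 24
    return {
        "hour": dt.hour,
        "day_of_week": dt.weekday(),       # Monday = 0, Sunday = 6
        "is_weekend": dt.weekday() >= 5,
        "month": dt.month,
        "hour_sin": math.sin(hour_angle),
        "hour_cos": math.cos(hour_angle),
    }


features = temporal_features("2024-03-16T18:30:00")
```

The sine/cosine pair is one common way to keep a periodic quantity continuous across midnight; the actual module may use different features.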
pipeline/extract_location_clusters.py¶
DBSCAN-based location clustering.
Output Formats¶
NumPy Arrays (.npy)¶
Used for high-dimensional embeddings:
import numpy as np
embeddings = np.load('data/processed/image_embeddings.npy')
# Shape: (N, embedding_dim), with one row per record
CSV Files¶
Used for tabular features:
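A minimal sketch of reading such a file with the standard library follows. The column names (`record_id`, `hour`, `day_of_week`) and the sample rows are hypothetical, since the real schema is not listed here; an in-memory buffer stands in for the file on disk.

```python
import csv
import io

# Hypothetical file contents; the real column names are not documented here.
sample_csv = io.StringIO(
    "record_id,hour,day_of_week\n"
    "rec_001,18,5\n"
    "rec_002,9,1\n"
)

# DictReader uses the header row as keys for every following row.
rows = list(csv.DictReader(sample_csv))

# The csv module returns every value as a string, so convert numeric columns explicitly.
hours = [int(row["hour"]) for row in rows]
```

For a file on disk, the same reader works on a handle opened with `open(path, newline="")`.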
JSON Index Files¶
Map each record ID to its row index in the corresponding array:
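The lookup can be sketched as below, assuming the index is a flat JSON object of the form `{"record_id": row_index}`. That layout, the record IDs, and the helper `embedding_for` are assumptions for illustration; a small in-memory array stands in for a loaded `.npy` file.

```python
import json

import numpy as np

# Stand-in data: 3 records with 4-dimensional embeddings (hypothetical values).
embeddings = np.arange(12, dtype=np.float32).reshape(3, 4)

# Assumed index layout: {"record_id": row_index}.
index = json.loads('{"rec_001": 0, "rec_002": 1, "rec_003": 2}')


def embedding_for(record_id: str) -> np.ndarray:
    """Return the embedding row that the index assigns to record_id."""
    return embeddings[index[record_id]]


vector = embedding_for("rec_002")
```

Keeping the IDs in a separate index lets the embeddings stay in a dense numeric array while records remain addressable by ID.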