API Reference

Pipeline Modules

Central configuration for all pipeline parameters.

Data acquisition from Cloudflare D1 database.

CLIP image embedding extraction.

Sentence-BERT text embedding extraction.

Emotion score extraction using transformer models.

Temporal feature engineering.

DBSCAN-based location clustering.

NumPy `.npy` files are used for high-dimensional embeddings, stored as one row per record:

import numpy as np
embeddings = np.load('data/processed/image_embeddings.npy')
# Shape: (N, embedding_dim)

CSV files are used for tabular features:

import pandas as pd
emotions = pd.read_csv('data/processed/emotion_scores.csv')

A JSON index file maps record IDs to row indices in the corresponding embedding array:

import json
with open('data/processed/image_embedding_index.json') as f:
    index = json.load(f)
# {"record_id": array_index, ...}
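The index and the embedding array are meant to be used together, so a short sketch of a lookup may help. The helper names `load_embedding_lookup` and `get_embedding` are hypothetical and not part of the pipeline, and the sketch assumes each index value is a valid row number into the array. The demo writes small synthetic files to a temporary directory instead of reading `data/processed/`.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def load_embedding_lookup(embeddings_path, index_path):
    """Load an embedding matrix and its record-ID index (hypothetical helper)."""
    embeddings = np.load(embeddings_path)
    with open(index_path) as f:
        index = json.load(f)
    # Assumption: every index value is a row number into the matrix.
    bad = [rid for rid, row in index.items() if not 0 <= row < len(embeddings)]
    if bad:
        raise ValueError(f"index refers to rows outside the array: {bad[:5]}")
    return embeddings, index


def get_embedding(embeddings, index, record_id):
    """Return the embedding row for one record ID.

    JSON object keys are always strings, so the ID is converted before lookup.
    """
    return embeddings[index[str(record_id)]]


# Demo on synthetic data; the file names mirror the ones used above.
with tempfile.TemporaryDirectory() as tmp:
    emb_path = Path(tmp) / "image_embeddings.npy"
    idx_path = Path(tmp) / "image_embedding_index.json"
    np.save(emb_path, np.arange(12, dtype=np.float32).reshape(3, 4))
    idx_path.write_text(json.dumps({"rec_a": 0, "rec_b": 1, "rec_c": 2}))

    embeddings, index = load_embedding_lookup(emb_path, idx_path)
    vec_b = get_embedding(embeddings, index, "rec_b")
```

Checking the index against the array length at load time catches a stale index file early, which is easier to debug than an `IndexError` raised later in the pipeline.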