Pipeline Overview¶
The DREAMS pipeline consists of two main phases: data acquisition and feature extraction.
Architecture¶
flowchart TD
A[Phase 1: Data Pull] --> B[Raw Images + Metadata]
B --> C[Snapshot Freeze]
C --> D[Phase 2A: Image Embeddings]
C --> E[Phase 2B: Caption Embeddings]
C --> F[Phase 2C: Emotion Extraction]
C --> G[Phase 2D: Temporal Features]
C --> H[Phase 2E: Location Clustering]
D --> I[processed/image_embeddings.npy]
E --> J[processed/text_embeddings.npy]
F --> K[processed/emotion_scores.csv]
G --> L[processed/temporal_features.csv]
H --> M[processed/place_ids.csv] Phase Summary¶
| Phase | Input | Output | Model |
|---|---|---|---|
| 1 | D1 Database | Raw images + metadata | - |
| 2A | Images | 512-dim embeddings | CLIP ViT-B/32 |
| 2B | Captions | 384-dim embeddings | Sentence-BERT |
| 2C | Captions | Valence/arousal + emotions | DistilRoBERTa |
| 2D | Timestamps | Circadian encoding | - |
| 2E | GPS coords | Place IDs | DBSCAN |
Data Flow¶
- Pull: Download images and metadata from Cloudflare D1
- Freeze: Create immutable snapshot for experiment reproducibility
- Extract: Run feature extraction pipelines on frozen data
- Analyze: Use extracted features for research analysis