Phase 2E: Location Clustering

Clusters raw GPS coordinates into categorical Place IDs using DBSCAN.

Run

source venv/bin/activate
python pipeline/extract_location_clusters.py

The script writes data/processed/place_ids.csv with the following columns:

Design Choice

Location is treated as categorical context, not as a continuous vector. Mapping nearby fixes to a single Place ID reduces the risk that downstream models overfit to GPS noise.
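As a rough illustration of why a categorical key absorbs GPS jitter, the sketch below snaps coordinates to the 4-decimal grid listed in the parameter table. The function name `snap_to_grid` and the sample coordinates are hypothetical and are not taken from `extract_location_clusters.py`.

```python
def snap_to_grid(lat, lon, precision=4):
    """Round a (lat, lon) pair in degrees to a fixed number of decimal places."""
    return (round(lat, precision), round(lon, precision))


# Three noisy fixes of the same spot (made-up sample data).
fixes = [
    (40.712801, -74.006012),
    (40.712823, -74.005987),
    (40.712779, -74.006031),
]

# All three fixes collapse to one grid cell, so they share one categorical key.
cells = {snap_to_grid(lat, lon) for lat, lon in fixes}
print(cells)
```

Fixes that straddle a cell boundary can still land in adjacent cells, which is presumably why a clustering step follows the rounding.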

Parameter	Value	Description
Grid precision	4 decimal places	~11 m spatial buffer (0.0001° of latitude)
DBSCAN ε	7.85×10⁻⁶ rad	~50 m radius on the Earth's surface
min_samples	1	Single-point clusters allowed
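The parameter values can be sanity-checked with a little arithmetic, and the clustering behaviour can be sketched without any dependencies. The code below is a minimal sketch, not the pipeline's implementation. It assumes a mean Earth radius of 6,371 km and a haversine (great-circle) distance, neither of which is stated here. It relies on the fact that with min_samples = 1 every point is a core point, so DBSCAN's clusters are the connected components of the ε-neighborhood graph. The names `haversine_rad` and `cluster_places` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius, not stated in the source
EPS_RAD = 7.85e-6           # DBSCAN epsilon from the table, in radians
GRID_DEG = 1e-4             # 4 decimal places of a degree

# Convert the angular parameters to metres on the Earth's surface.
eps_m = EPS_RAD * EARTH_RADIUS_M                   # about 50 m
grid_m = math.radians(GRID_DEG) * EARTH_RADIUS_M   # about 11 m (along a meridian)


def haversine_rad(p, q):
    """Great-circle angle in radians between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * math.asin(math.sqrt(a))


def cluster_places(points, eps=EPS_RAD):
    """Assign Place IDs as connected components of the eps-neighborhood graph.

    This matches DBSCAN only for min_samples = 1; it compares all pairs,
    so it is O(n^2) and meant for illustration, not for large inputs.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_rad(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)

    # Relabel roots as consecutive IDs in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


# Made-up sample: two points about 22 m apart, one point a few km away.
points = [(40.7128, -74.0060), (40.7130, -74.0060), (40.7306, -73.9866)]
labels = cluster_places(points)
print(round(eps_m, 1), round(grid_m, 1), labels)
```

Because min_samples is 1, no point is ever labelled as noise: an isolated fix simply becomes its own Place ID. If the pipeline uses scikit-learn, a comparable setup would be `DBSCAN(eps=7.85e-6, min_samples=1, metric="haversine")` applied to coordinates converted to radians, but that is an assumption about the implementation.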