Quickstart
This guide walks through running a complete video mining pipeline.
Prerequisites
- Data Miner installed
- PostgreSQL running (via Docker Compose or local)
- Database initialized (data-miner init-db)
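The prerequisites above can be checked before starting. The following is a minimal sketch, not part of Data Miner itself: check_commands is a hypothetical helper that reports which commands are missing from PATH. Only data-miner comes from this guide; pg_isready is the standard PostgreSQL client utility, and assuming it is installed on the host is a guess about your setup (it may not be if PostgreSQL runs only inside Docker Compose).

```shell
# Hypothetical helper: report which of the given commands are on PATH.
# Returns 0 if all are found, 1 if at least one is missing.
check_commands() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "ok: $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_commands data-miner pg_isready && pg_isready first confirms both tools exist and then asks PostgreSQL whether it accepts connections on the default local socket or port.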
Step 1: Create Configuration
Create a config file config.yaml:
project_name: "glass_doors_demo"
output_dir: "./output"

input:
  search_queries:
    - "glass door installation tutorial"
  max_results_per_query: 10  # Start small for testing

# Reduced workers for demo
supervisor:
  download_workers: 2
  extract_workers: 1
  filter_workers: 1
  dedup_workers: 1
  detect_workers: 1

filter:
  threshold: 0.25
  positive_prompts:
    - "a glass door"
    - "a sliding glass door"
  negative_prompts:
    - "a window"
    - "a mirror"
Step 2: Initialize Database
# Initialize the database (skip if already done under Prerequisites)
data-miner init-db
Step 3: Populate Videos
# Search YouTube and add videos to database
data-miner populate --config config.yaml
# Check status
data-miner status --project glass_doors_demo
Expected output:
Step 4: Set Up Workers
# Generate supervisor config
data-miner workers setup --config config.yaml
# Verify config was created
cat /etc/supervisor/conf.d/data_miner.conf
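Rather than reading the whole file, it can be quicker to list only the programs it defines. This is a sketch that assumes the generated file uses supervisor's standard [program:NAME] section headers; list_programs is a hypothetical helper, and the worker names it prints depend on what Data Miner actually generates.

```shell
# Hypothetical helper: print the NAME of every [program:NAME] section
# found in a supervisor config file, one per line.
list_programs() {
    sed -n 's/^\[program:\(.*\)]$/\1/p' "$1"
}
```

Running list_programs /etc/supervisor/conf.d/data_miner.conf should print one line per configured worker program. An empty result suggests the file was not generated as expected.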
Step 5: Start Pipeline
# Start all workers
data-miner workers start
# Monitor progress
watch -n 5 "data-miner status --project glass_doors_demo"
Step 6: Monitor Progress
# Check worker status
data-miner workers status
# Check pipeline status
data-miner status --project glass_doors_demo
As the pipeline progresses, you'll see:
- POPULATING → Videos being downloaded/extracted
- FILTERING → Frames being filtered
- DEDUP_READY → All videos filtered, cross-dedup starting
- DETECT_READY → Dedup complete, detection starting
- COMPLETE → Pipeline finished
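For unattended runs, the stage list above can drive a simple polling loop instead of watch. The sketch below makes one assumption worth stating loudly: that the status output contains the stage name (for example COMPLETE) as plain text. wait_for_stage is a hypothetical helper, not a Data Miner command.

```shell
# Hypothetical helper: poll a command until its output mentions a stage.
# Usage: wait_for_stage STAGE INTERVAL_SECONDS MAX_TRIES COMMAND [ARGS...]
# Assumption: the command prints the stage name as plain text.
wait_for_stage() {
    stage=$1
    interval=$2
    tries=$3
    shift 3
    while [ "$tries" -gt 0 ]; do
        # Run the status command and look for the stage as a fixed string.
        if "$@" 2>/dev/null | grep -qF -- "$stage"; then
            echo "reached $stage"
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep "$interval"
        fi
    done
    echo "timed out waiting for $stage" >&2
    return 1
}
```

For example, wait_for_stage COMPLETE 30 240 data-miner status --project glass_doors_demo checks every 30 seconds and gives up after 240 attempts (about two hours). The retry limit keeps the loop from running forever if a stage never arrives.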
Step 7: View Results
output/projects/glass_doors_demo/
├── frames_filtered/          # Frames that passed filter
│   └── {video_id}/
├── frames_dedup/             # Unique frames (flat)
└── detections/
    ├── annotations.json      # COCO-format annotations
    └── visualizations/       # Bounding box images
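A quick way to sanity-check a run is to count the files each stage left behind. The sketch below relies only on the directory layout shown above; count_files and summarize_project are hypothetical helpers, and the counts include every regular file, not only images.

```shell
# Hypothetical helper: count regular files under a directory,
# printing 0 if the directory does not exist.
count_files() {
    if [ -d "$1" ]; then
        find "$1" -type f | wc -l | tr -d ' '
    else
        echo 0
    fi
}

# Hypothetical helper: print per-stage file counts for one project directory.
summarize_project() {
    project_dir=$1
    echo "filtered: $(count_files "$project_dir/frames_filtered")"
    echo "unique: $(count_files "$project_dir/frames_dedup")"
    echo "visualizations: $(count_files "$project_dir/detections/visualizations")"
}
```

For example, summarize_project output/projects/glass_doors_demo prints three lines. A unique count far below the filtered count means deduplication removed many near-identical frames.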
Common Workflows
Re-run Deduplication
Re-run Detection
Stop Pipeline
Delete and Start Over
Troubleshooting
Workers Not Starting
# Check supervisor logs
sudo tail -f /var/log/supervisor/supervisord.log
# Check worker logs
tail -f output/logs/download_*.log
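When there are many log files, it can help to see which ones mention errors before tailing any of them. This sketch assumes that failures appear in the logs as lines containing the word "error" in any letter case, which this guide does not confirm. error_counts is a hypothetical helper.

```shell
# Hypothetical helper: for each given log file, print "COUNT FILE" when the
# file has at least one line matching PATTERN (case-insensitive).
# Usage: error_counts PATTERN FILE...
error_counts() {
    pattern=$1
    shift
    for f in "$@"; do
        [ -f "$f" ] || continue
        # grep -c exits non-zero on zero matches; "|| true" keeps going.
        n=$(grep -ci -- "$pattern" "$f" || true)
        if [ "$n" -gt 0 ]; then
            echo "$n $f"
        fi
    done
    return 0
}
```

For example, error_counts error output/logs/*.log lists only the logs with matching lines, each prefixed by its match count.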
Videos Stuck
# Check for stale locks (monitor worker handles this automatically)
data-miner status --project glass_doors_demo
GPU Memory Issues
Reduce batch size in config:
Next Steps
- Configuration - All config options
- Architecture Overview - System design