RideTool

Sync & Dedup Engine

This page documents the deduplication engine, source priority order, and how rides flow from each provider. For the rider-facing version, see Connecting Strava, Garmin & Wahoo.

Why this is hard

A single ride recorded on a Garmin watch can arrive up to three times: as a Garmin webhook, as a Strava webhook (because Garmin auto-syncs to Strava), and as a manual FIT upload. Each source reports slightly different timestamps, distances, and durations, so the dedup engine has to recognize the copies as the same ride and decide which one to keep.

Three-level dedup

  1. Provider ID match — unique indexes on strava_activity_id, garmin_activity_id, etc. Same provider re-importing the same ride is rejected instantly.
  2. Dedup key match — every ride gets a fingerprint: {start_minute}||{distance_m}, stored and indexed. Same fingerprint = same ride.
  3. Fuzzy match — when the fingerprint doesn't match exactly, RideTool checks for rides within ±2 minutes start time, ±15% duration, ±10% distance. Catches cross-source duplicates where each provider reported slightly different numbers.
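
As a rough illustration of levels 2 and 3, the sketch below builds a fingerprint and runs the tolerance check. It is not the code in dedup_service.py: the Ride class, example_dedup_key, and is_fuzzy_match are hypothetical names, and rounding distance to whole metres and measuring percentages against the larger of the two values are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Ride:
    start: datetime      # ride start time
    duration_s: float    # duration in seconds
    distance_m: float    # distance in metres


def example_dedup_key(ride: Ride) -> str:
    """Level 2 fingerprint: start time truncated to the minute, then distance."""
    start_minute = ride.start.replace(second=0, microsecond=0).isoformat()
    # Rounding to whole metres is an assumption of this sketch.
    return f"{start_minute}||{round(ride.distance_m)}"


def is_fuzzy_match(a: Ride, b: Ride) -> bool:
    """Level 3: start within 2 minutes, duration within 15%, distance within 10%."""
    def within(x: float, y: float, tol: float) -> bool:
        # Relative difference against the larger value (assumption).
        bigger = max(abs(x), abs(y))
        return bigger == 0 or abs(x - y) / bigger <= tol

    return (
        abs(a.start - b.start) <= timedelta(minutes=2)
        and within(a.duration_s, b.duration_s, 0.15)
        and within(a.distance_m, b.distance_m, 0.10)
    )


# The same ride as two sources might report it (made-up numbers).
garmin = Ride(datetime(2024, 5, 1, 8, 0, 12), 3600, 40000)
strava = Ride(datetime(2024, 5, 1, 8, 1, 5), 3540, 39800)
```

Here the two fingerprints differ because the start minutes and distances differ, so level 2 misses the pair, while the fuzzy check still treats them as one ride.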

Source priority

The number in parentheses is each source's priority value. The higher value wins on conflict:

  1. Manual FIT upload (4) — raw device file, richest data
  2. Garmin (3) — device-direct webhook
  3. Wahoo (2) — device-direct when recorded on Wahoo hardware
  4. Strava (1) — aggregator, least detail

When a duplicate is detected:

  • Higher priority incoming → action: "replace" (delete existing, insert new)
  • Lower priority incoming → action: "skip"
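
The priority rule can be sketched as a lookup plus a comparison. The source keys ("fit_upload", "garmin", and so on) and the function name resolve_duplicate are hypothetical, and treating equal priority as "skip" is an assumption, since the list above only covers higher and lower.

```python
# Priority values from the list above; the dictionary keys are illustrative.
SOURCE_PRIORITY = {"fit_upload": 4, "garmin": 3, "wahoo": 2, "strava": 1}


def resolve_duplicate(existing_source: str, incoming_source: str) -> str:
    """Return the action to take when the incoming ride duplicates a stored one."""
    if SOURCE_PRIORITY[incoming_source] > SOURCE_PRIORITY[existing_source]:
        return "replace"  # delete the existing ride, insert the incoming one
    # Lower priority skips; equal priority also skips here (assumption).
    return "skip"
```

For example, a Garmin webhook arriving after the Strava copy of the same ride yields "replace", while the Strava copy arriving second yields "skip".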

Known gaps

  • Replace = delete + insert, not field-level merge. When a higher-priority source replaces a lower one, enrichment from the original (Strava ride name, social data) is lost.
  • Race condition — if two webhooks arrive simultaneously for the same ride, both may pass dedup before either is inserted.
  • Fuzzy thresholds — duplicates with timestamps or distances differing by more than the tolerances slip through.
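
The race condition is easiest to see as an interleaving. The sketch below simulates it deterministically with an in-memory list standing in for the rides collection. All names are hypothetical, and both webhooks are given the same dedup key for simplicity.

```python
rides = []  # stand-in for the stored rides


def passes_dedup(key: str) -> bool:
    """Check step: True when no stored ride has this dedup key."""
    return all(r["dedup_key"] != key for r in rides)


def insert(source: str, key: str) -> None:
    """Insert step: store the ride."""
    rides.append({"source": source, "dedup_key": key})


key = "2024-05-01T08:00:00||40000"

# Both handlers run their check before either one inserts.
garmin_ok = passes_dedup(key)
strava_ok = passes_dedup(key)
if garmin_ok:
    insert("garmin", key)
if strava_ok:
    insert("strava", key)

# Sources that ended up storing a copy of the same ride.
duplicates = [r["source"] for r in rides if r["dedup_key"] == key]
```

Because the check and the insert are separate steps, both copies are stored. Had the second check run after the first insert, it would have found the existing ride.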

When fitness recomputes

  1. After each ride upload — immediate recalculation.
  2. Nightly — around midnight in the user's local timezone (fatigue decays even on rest days).
  3. Manual rebuild — from Account → Recompute Fitness Stats.

Key files in the codebase

  • backend/services/dedup_service.py — core engine (DedupService, make_dedup_key)
  • backend/services/mongo_strava_sync_service.py — Strava sync with dedup
  • backend/services/mongo_garmin_sync_service.py — Garmin sync (async)
  • backend/services/mongo_wahoo_sync_service.py — Wahoo sync
  • tests/test_dedup_service.py — unit tests

Need help? Our Discord server is the support channel — click the Discord icon in the nav bar after logging in.
