# Case Study: Driver Dispatch & 120s TTL Optimization

**Role:** Principal Platform Engineer & Technical Writer
**Feature:** Driver Dispatch & 120s TTL (Time-To-Live)
**System context:** Go (Golang), Kubernetes, RabbitMQ, MongoDB
## Executive Summary (Non-Tech)

- **Success:** A rider requests a trip, and the closest available driver sees the request and accepts it within seconds, confirming the ride.
- **Failure (Timeout):** If no driver is found within 2 minutes, the system automatically stops searching and notifies the rider so they can try again or change their fare.
## Scenario 1: The Success Path (Match < 10s)

A rider requests a trip, and a driver accepts within the golden window.
### Technical Workflow

- **Producer:** `Trip Service` publishes `trip.requested` to the `find_available_drivers` queue in RabbitMQ.
- **Topic/Queue:** `find_available_drivers` stores the message with an `x-message-ttl` of 120,000 ms.
- **Consumer:** `Driver Service` consumes the message and pushes it to eligible drivers via WebSockets.
- **State Change:** When a driver accepts, `Trip Service` updates the MongoDB record from `status: "pending"` to `status: "accepted"`.
- **Acknowledge:** `Driver Service` acknowledges the RabbitMQ message, removing it from the queue.
### The "10x" Observer

- **OTel Spans:** `trip_request_latency`, `driver_acceptance_lag`
- **Prometheus Metric:** `match_latency_seconds` (Histogram)
- **SQL/Log Query:**

```sql
-- Find the duration between request and acceptance
SELECT trip_id, (updated_at - created_at) AS match_time
FROM trips
WHERE status = 'accepted' AND trip_id = 'TRIP_ID_123';
```
## Scenario 2: The Failure Path (Timeout @ 120s)

No driver is found within the 120-second timeout.
### Technical Workflow

- **RabbitMQ TTL:** The message sits in `find_available_drivers` for 120s with no `ACK`.
- **Dead Letter:** RabbitMQ automatically moves the message to the `dlx` exchange, then to `dead_letter_queue`.
- **DLQ Consumer:** `API Gateway` consumes the expired message from the DLQ.
- **State Change:** No change in the `trips` collection (the record remains `pending` unless explicitly canceled later).
- **Notification:** The API Gateway identifies the `OwnerID` and sends a `TripEventNoDriversFound` WebSocket event to the rider.
### The "10x" Observer

- **OTel Spans:** `queue_ttl_expiry`, `dlq_redirection`
- **Prometheus Metric:** `driver_search_timeout_total` (Counter)
- **SQL/Log Query:**

```shell
# Check logs for expired messages in the DLQ
kubectl logs api-gateway | grep "dlq consumer: expired message for user_id: USER_123"
```
## Scenario 3: The Edge Case (Race Condition - Double Accept)

Two drivers accept the same ride at the exact same millisecond.

**Can it happen?** No. We have implemented a Compare-and-Swap (CAS) strategy at the database level.
### Technical Prevention

- **Atomic Update:** The `UpdateTrip` repository logic uses a combined filter: `_id: TRIP_ID` AND `status: "pending"`.
- **First Actor Wins:** The first driver's request updates the status to `accepted`.
- **Second Actor Fails:** The second request tries to update where `status: "pending"`, but since the status is now `accepted`, `ModifiedCount` returns `0`.
- **Graceful Rejection:** The service detects `ModifiedCount == 0` and returns an error to the second driver: "Trip already assigned."
### The "10x" Observer

- **OTel Spans:** `atomic_update_check`, `concurrency_conflict`
- **Prometheus Metric:** `conflicting_acceptance_attempts_total` (Counter)
- **SQL/Log Query:**

```javascript
// Verify that only one driver is assigned to the trip
db.trips.find({ "_id": ObjectId("TRIP_ID_123") }).pretty()
```
**Future Backend Upgrades Required:** To handle route overlap efficiently at scale, the backend must implement partial path overlapping. This could involve:

- Passing the active trip polyline to PostGIS or Redis geospatial operations.
- Intersecting Trip A's remaining (untraveled) geometry with Trip B's start/end coordinates.
- Only dispatching the AMQP `DriverCmdTripRequest` if the backend confidently verifies a partial overlap, keeping irrelevant requests off the driver's WebSocket connection entirely.
## Scenario 4: Carpool Overlap Dispatching

A second rider requests a carpool while a driver is already on an active carpool trip.
### Technical Workflow

- **Producer:** `Trip Service` publishes `trip.requested` for Trip B (carpool).
- **Consumer:** `Driver Service` consumes the request.
- **Capacity Check:** Filter out drivers with `AvailableSeats < requested_seats`.
- **Geospatial Check:** For each eligible carpool driver, the service fetches their active trips via the Trip HTTP API. A bounding-box heuristic (with a ±0.005 degree / ~0.5 km tolerance) is calculated natively in Go over the driver's active route geometry, and the incoming carpool route only qualifies when all of its sampled points remain inside that expanded box.
- **Dispatch:** Only drivers whose active routes overlap with Trip B's requested route receive the `DriverCmdTripRequest` over AMQP (relayed to the driver's WebSocket). Non-overlapping drivers are silently excluded from the broadcast.
### The "10x" Observer

- **OTel Spans:** `fetch_active_trips`, `calculate_boundingBox_overlap`
- **Prometheus Metric:** `carpool_overlap_filter_droprate` (Counter)
- **SQL/Log Query:**

```shell
# Check logs for driver suitability evaluation
kubectl logs driver-service | grep "Found suitable drivers: current="
```
## Infrastructure Impact

- **HPA Scaling:** `Driver Service` scales based on the `rabbitmq_queue_depth` of `find_available_drivers`. If the queue grows, K8s spins up more pods to handle the load.
- **Database Load:** Atomic updates mean we don't need complex distributed locks (e.g., Redis Redlock), keeping the MongoDB write load predictable and lightweight.
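Vanilla HPA cannot read RabbitMQ queue depth on its own; one common way to wire it up is a queue-length trigger via KEDA. A hedged config sketch, where the scaler name, threshold, and omitted connection/auth settings are all assumptions:

```yaml
# Illustrative KEDA ScaledObject: scale driver-service on queue depth.
# Broker connection details (TriggerAuthentication / host) are omitted.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: driver-service-scaler
spec:
  scaleTargetRef:
    name: driver-service
  triggers:
    - type: rabbitmq
      metadata:
        queueName: find_available_drivers
        mode: QueueLength
        value: "50"   # target backlog per replica; tune to match latency SLOs
```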
> [!IMPORTANT]
> The distributed CAS pattern is the backbone of our 99.9% consistency in the dispatch loop, while the native Go geospatial clipping preserves driver WebSocket bandwidth during high-concurrency carpool searches.