Case Study: Driver Dispatch & 120s TTL Optimization

Role: Principal Platform Engineer & Technical Writer
Feature: Driver Dispatch & 120s TTL (Time-To-Live)
System context: Go (Golang), Kubernetes, RabbitMQ, MongoDB

Executive Summary (Non-Tech)

Success: A rider requests a trip, and the closest available driver sees the request and accepts it within seconds, confirming the ride.
Failure (Timeout): If no driver is found within 2 minutes, the system automatically stops searching and notifies the rider so they can try again or change their fare.

Scenario 1: The Success Path (Match < 10s)

A rider requests a trip, and a driver accepts within the 10-second golden window.

Technical Workflow

Producer: Trip Service publishes trip.requested to the find_available_drivers queue in RabbitMQ.
Topic/Queue: The find_available_drivers queue holds the message with an x-message-ttl of 120,000 ms (120 s).
Consumer: Driver Service consumes the message and pushes it to eligible drivers via WebSockets.
State Change: When a driver accepts, Trip Service updates the MongoDB record from status: "pending" to status: "accepted".
Acknowledge: The Driver Service acknowledges the RabbitMQ message, removing it from the queue.
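The queue settings behind this workflow can be sketched as a plain argument map. This is a minimal sketch, not the service's actual declaration: `queueArgs` is a hypothetical helper, and the `dlx` exchange name is taken from the failure path described later in this document. The map has the same shape as the argument table AMQP client libraries accept when declaring a queue.

```go
package main

import "fmt"

// queueArgs builds the optional arguments for declaring find_available_drivers.
// x-message-ttl is expressed in milliseconds; x-dead-letter-exchange names the
// exchange that receives messages once they expire.
// Hypothetical helper: the real declaration code is not shown in this document.
func queueArgs(ttlMillis int32, deadLetterExchange string) map[string]interface{} {
	return map[string]interface{}{
		"x-message-ttl":          ttlMillis,
		"x-dead-letter-exchange": deadLetterExchange,
	}
}

func main() {
	args := queueArgs(120000, "dlx")
	fmt.Println(args["x-message-ttl"], args["x-dead-letter-exchange"])
}
```

Setting the TTL on the queue rather than per message keeps the 120 s budget in one place, so the producer cannot accidentally publish with a different timeout.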

The "10x" Observer

OTel Spans: trip_request_latency, driver_acceptance_lag.
Prometheus Metric: match_latency_seconds (Histogram).

SQL/Log Query:

// Find the duration between request and acceptance (in milliseconds)
db.trips.aggregate([
  { $match: { _id: ObjectId("TRIP_ID_123"), status: "accepted" } },
  { $project: { match_time_ms: { $subtract: ["$updated_at", "$created_at"] } } }
])

Scenario 2: The Failure Path (Timeout @ 120s)

No driver is found within the 120-second timeout.

Technical Workflow

RabbitMQ TTL: The message waits in find_available_drivers for 120 s without being acknowledged, so its x-message-ttl expires.
Dead Letter: RabbitMQ automatically dead-letters the expired message to the dlx exchange, which routes it to dead_letter_queue.
DLQ Consumer: API Gateway consumes the expired message from the DLQ.
State Change: None. The record in the trips collection remains pending unless it is explicitly canceled later.
Notification: API Gateway identifies the OwnerID and sends a TripEventNoDriversFound event to the rider over WebSocket.

The "10x" Observer

OTel Spans: queue_ttl_expiry, dlq_redirection.
Prometheus Metric: driver_search_timeout_total (Counter).

SQL/Log Query:

# Check logs for expired messages in the DLQ
kubectl logs api-gateway | grep "dlq consumer: expired message for user_id: USER_123"

Scenario 3: The Edge Case (Race Condition - Double Accept)

Two drivers accept the same ride at the exact same millisecond.

Can it happen?

No. Both requests can arrive together, but only one can succeed: we have implemented a Compare-and-Swap (CAS) strategy at the database level.

Technical Prevention

Atomic Update: The UpdateTrip repository logic uses a combined filter: _id: TRIP_ID AND status: "pending".
First Actor Wins: The first driver's request updates the status to accepted.
Second Actor Fails: The second request applies the same status: "pending" filter, but the status is now accepted, so no document matches and ModifiedCount is 0.
Graceful Rejection: The service detects ModifiedCount == 0 and returns an error to the second driver: "Trip already assigned."
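The four steps above can be reproduced with an in-memory stand-in for the trips collection. This is a sketch of the CAS idea only, not the UpdateTrip implementation: `tripStore`, `acceptIfPending`, and `acceptTrip` are hypothetical names, and the mutex plays the role of MongoDB's atomic single-document update.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errAlreadyAssigned = errors.New("trip already assigned")

type trip struct {
	Status   string
	DriverID string
}

// tripStore is an in-memory stand-in for the trips collection.
type tripStore struct {
	mu    sync.Mutex
	trips map[string]*trip
}

// acceptIfPending mirrors an update filtered on {_id, status: "pending"} and
// returns the number of modified documents (0 or 1).
func (s *tripStore) acceptIfPending(tripID, driverID string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	t, ok := s.trips[tripID]
	if !ok || t.Status != "pending" {
		return 0 // filter matched nothing
	}
	t.Status, t.DriverID = "accepted", driverID
	return 1
}

// acceptTrip turns a zero modified count into the rejection the second driver sees.
// For brevity, an unknown trip ID is reported the same way.
func acceptTrip(s *tripStore, tripID, driverID string) error {
	if s.acceptIfPending(tripID, driverID) == 0 {
		return errAlreadyAssigned
	}
	return nil
}

func main() {
	store := &tripStore{trips: map[string]*trip{"TRIP_ID_123": {Status: "pending"}}}
	results := make([]error, 2)
	var wg sync.WaitGroup
	for i, driver := range []string{"driver-1", "driver-2"} {
		wg.Add(1)
		go func(i int, driver string) {
			defer wg.Done()
			results[i] = acceptTrip(store, "TRIP_ID_123", driver)
		}(i, driver)
	}
	wg.Wait()
	// Exactly one of the two values printed is true, whichever driver won.
	fmt.Println(results[0] == nil, results[1] == nil)
}
```

The key property is that the check and the write happen as one step. Reading the status first and writing afterwards would reopen the race.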

The "10x" Observer

OTel Spans: atomic_update_check, concurrency_conflict.
Prometheus Metric: conflicting_acceptance_attempts_total (Counter).

SQL/Log Query:

// Verify that only one driver is assigned to the trip
db.trips.find({ "_id": ObjectId("TRIP_ID_123") }).pretty()

Future Backend Upgrades Required: The carpool overlap check (Scenario 4) currently relies on a bounding-box heuristic. To match overlapping trips efficiently at scale, the backend must detect partial path overlap. This could involve:

Passing the active Tripline (Polyline) to PostGIS or Redis spatial operations.
Intersecting Trip A's remaining untouched geometry with Trip B's start/end coordinates.
Only dispatching the AMQP DriverCmdTripRequest if the backend confidently verifies a partial overlap, keeping irrelevant requests off the driver's WebSocket connection entirely.
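The geometric idea behind these steps can be illustrated in plain Go. This sketch is a stand-in for the PostGIS or Redis operations named above, not a replacement for them: it treats degrees as planar coordinates (reasonable only over short distances), and `locate` and `partialOverlap` are hypothetical names. It checks that Trip B's pickup and dropoff both lie near Trip A's remaining route and that the pickup comes first along it.

```go
package main

import (
	"fmt"
	"math"
)

type point struct{ Lat, Lng float64 }

// locate projects p onto the polyline and returns its position along the route
// (segment index plus the fraction travelled along that segment). ok is false
// when p is farther than tol degrees from every segment.
func locate(route []point, p point, tol float64) (pos float64, ok bool) {
	best := math.Inf(1)
	for i := 0; i+1 < len(route); i++ {
		a, b := route[i], route[i+1]
		dLat, dLng := b.Lat-a.Lat, b.Lng-a.Lng
		t := 0.0
		if lenSq := dLat*dLat + dLng*dLng; lenSq > 0 {
			// Fraction along the segment of the closest point, clamped to [0, 1].
			t = ((p.Lat-a.Lat)*dLat + (p.Lng-a.Lng)*dLng) / lenSq
			t = math.Max(0, math.Min(1, t))
		}
		dist := math.Hypot(p.Lat-(a.Lat+t*dLat), p.Lng-(a.Lng+t*dLng))
		if dist < best {
			best, pos = dist, float64(i)+t
		}
	}
	return pos, best <= tol
}

// partialOverlap reports whether both endpoints of Trip B sit on Trip A's
// remaining route, with the pickup ahead of the dropoff in travel order.
func partialOverlap(remaining []point, pickup, dropoff point, tol float64) bool {
	from, okFrom := locate(remaining, pickup, tol)
	to, okTo := locate(remaining, dropoff, tol)
	return okFrom && okTo && from < to
}

func main() {
	remaining := []point{{12.970, 77.590}, {12.980, 77.600}, {12.990, 77.610}}
	pickup, dropoff := point{12.975, 77.596}, point{12.985, 77.604}
	fmt.Println(partialOverlap(remaining, pickup, dropoff, 0.005)) // same direction
	fmt.Println(partialOverlap(remaining, dropoff, pickup, 0.005)) // reversed direction
}
```

Unlike a bounding-box test, this rejects a rider travelling against the driver's direction, which is the kind of irrelevant request the upgrade aims to keep off the driver's connection.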

Scenario 4: Carpool Overlap Dispatching

A second rider requests a carpool while a driver is already on an active carpool trip.

Technical Workflow

Producer: Trip Service publishes trip.requested for Trip B (carpool).
Consumer: Driver Service consumes the request.
Capacity Check: Filter out drivers who have AvailableSeats < requested_seats.
Geospatial Check: For each eligible carpool driver, the service fetches the driver's active trips via the Trip HTTP API. A bounding-box heuristic runs natively in Go: the box around the driver's active route geometry is expanded by ±0.005 degrees (~0.5 km), and the incoming carpool route qualifies only when all of its sampled points fall inside that expanded box.
Dispatch: Only drivers whose active routes overlap Trip B's requested route receive the DriverCmdTripRequest, which is published over AMQP and delivered over the driver's WebSocket. Non-overlapping requests are silently dropped from the broadcast.
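The geospatial check described above reduces to a few lines of Go. This is a minimal sketch assuming routes are already decoded into latitude/longitude samples; the `point`, `bbox`, `expandedBounds`, and `routeQualifies` names are hypothetical, and only the ±0.005 degree tolerance comes from the text.

```go
package main

import (
	"fmt"
	"math"
)

// toleranceDeg is the ±0.005 degree (~0.5 km) margin from the text.
const toleranceDeg = 0.005

type point struct{ Lat, Lng float64 }

type bbox struct{ MinLat, MinLng, MaxLat, MaxLng float64 }

// expandedBounds returns the bounding box of route grown by tol degrees on
// every side. route must be non-empty.
func expandedBounds(route []point, tol float64) bbox {
	b := bbox{route[0].Lat, route[0].Lng, route[0].Lat, route[0].Lng}
	for _, p := range route[1:] {
		b.MinLat, b.MaxLat = math.Min(b.MinLat, p.Lat), math.Max(b.MaxLat, p.Lat)
		b.MinLng, b.MaxLng = math.Min(b.MinLng, p.Lng), math.Max(b.MaxLng, p.Lng)
	}
	return bbox{b.MinLat - tol, b.MinLng - tol, b.MaxLat + tol, b.MaxLng + tol}
}

func (b bbox) contains(p point) bool {
	return p.Lat >= b.MinLat && p.Lat <= b.MaxLat && p.Lng >= b.MinLng && p.Lng <= b.MaxLng
}

// routeQualifies reports whether every sampled point of the incoming route
// lies inside the expanded box around the driver's active route.
func routeQualifies(active, incoming []point) bool {
	if len(active) == 0 || len(incoming) == 0 {
		return false
	}
	box := expandedBounds(active, toleranceDeg)
	for _, p := range incoming {
		if !box.contains(p) {
			return false
		}
	}
	return true
}

func main() {
	active := []point{{12.970, 77.590}, {12.990, 77.610}}
	inside := []point{{12.975, 77.595}, {12.985, 77.605}}
	strays := []point{{12.975, 77.595}, {13.050, 77.600}}
	fmt.Println(routeQualifies(active, inside), routeQualifies(active, strays))
}
```

A bounding box is cheap but coarse: it accepts any route inside the rectangle, including ones heading the opposite way, which is why it is described as a heuristic.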

The "10x" Observer

OTel Spans: fetch_active_trips, calculate_boundingBox_overlap.
Prometheus Metric: carpool_overlap_filter_droprate (Counter).

SQL/Log Query:

# Check logs for driver suitability evaluation
kubectl logs driver-service | grep "Found suitable drivers: current="

Infrastructure Impact

HPA Scaling: Driver Service scales based on the rabbitmq_queue_depth of find_available_drivers. If the queue grows, K8s spins up more pods to handle the load.
Database Load: Atomic updates ensure we don't need complex distributed locks (Redis Redlock), keeping the MongoDB write-load predictable and lightweight.
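The queue-depth scaling described above follows the standard HPA replica calculation. The sketch below assumes the metric is consumed as a per-pod average target; the target of 50 messages per pod and the replica bounds are illustrative assumptions, not values from this system, and tolerance and stabilization behavior are ignored.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the HPA rule for an average-value target:
// desired = ceil(queueDepth / targetPerPod), clamped to [minReplicas, maxReplicas].
func desiredReplicas(queueDepth, targetPerPod float64, minReplicas, maxReplicas int) int {
	want := int(math.Ceil(queueDepth / targetPerPod))
	if want < minReplicas {
		return minReplicas
	}
	if want > maxReplicas {
		return maxReplicas
	}
	return want
}

func main() {
	// Assumed target: 50 queued messages per Driver Service pod, 2 to 20 pods.
	for _, depth := range []float64{30, 420, 5000} {
		fmt.Println(depth, desiredReplicas(depth, 50, 2, 20))
	}
}
```

With these assumed numbers, a backlog of 420 messages asks for 9 pods, while a spike of 5000 is capped at the 20-pod maximum.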

[!IMPORTANT] The database-level CAS pattern is the backbone of our 99.9% consistency in the dispatch loop, while the native Go bounding-box filter preserves driver WebSocket bandwidth during high-concurrency carpool searches.
