The machine learning magic that powers Grab’s marketplace

Serene Ow, Head of Data Science

August 8, 2024 · Regional

At Grab, we’re in the process of making our marketplace more and more self-reliant, so that it can automatically adapt to a variety of environmental situations in real-time. These could include changes in weather, traffic patterns, supply-and-demand imbalances, and so on.

We call this building an ‘auto-adaptive system’. We want to get to a point where the Grab platform automatically processes and learns from historical data, tweaks its levers to adjust to real-time situations, feeds that data back into its learning process, and steadily becomes better at predicting and responding to future scenarios. ‘Levers’ could be things like ride fares, driver incentives, or allocation mechanisms. The goal is to ensure an optimal experience for everyone, including the shortest possible wait times for passengers and the least unwanted idle time for driver-partners, in spite of constraints.

What goes into the creation of such an auto-adaptive system? Interestingly, becoming a mother gave me a good framework to help me explain this.

Learning from the environment

When we had our child, we realised it’s up to us to create an environment which is optimal for him to learn and grow.

We exposed him to different environments, different textures, and smells. We showed him pictures, read him books, and took him to the zoo and parks. This rich tapestry of impressions and information expanded his understanding of the world.

For Grab’s auto-adaptive system, one important step was to build a ‘signals marketplace’ for our systems to tap into.

Similar to how we exposed our child to a variety of inputs, we essentially created continuous data streams with a variety of live data, attributes, and real-time metrics from our marketplace. Parsing this data allows our systems to more accurately make sense of what is happening on the ground.

This gives our systems robust, accurate, real-time information about the state of the surrounding context, changes to it, and their own performance. It also keeps the features used for online inference up to date across our models.
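To make the idea of a live signal more concrete, here is a minimal sketch of one such metric: a demand-to-supply ratio computed over a sliding time window. The class name, the event kinds ("booking" and "available_driver"), and the window length are all hypothetical choices for illustration, not a description of Grab's actual signals marketplace.

```python
from collections import deque


class RollingDemandSupplyRatio:
    """Hypothetical live signal: bookings per available driver in a time window.

    Assumes events arrive with non-decreasing timestamps (in seconds).
    """

    KINDS = ("booking", "available_driver")

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, kind), oldest first
        self._counts = {kind: 0 for kind in self.KINDS}

    def add(self, timestamp, kind):
        """Record one event from the stream."""
        if kind not in self._counts:
            raise ValueError(f"unknown event kind: {kind}")
        self._events.append((timestamp, kind))
        self._counts[kind] += 1
        self._evict(timestamp)

    def _evict(self, now):
        """Drop events that have fallen out of the window."""
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, kind = self._events.popleft()
            self._counts[kind] -= 1

    def value(self, now):
        """Return the current ratio, or None if no drivers are in the window."""
        self._evict(now)
        drivers = self._counts["available_driver"]
        if drivers == 0:
            return None
        return self._counts["booking"] / drivers
```

A model serving predictions could read `value(now)` at inference time, so that the feature always reflects the most recent minute of activity rather than a stale batch aggregate.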

Feedback loop

A child learns when parents provide continual feedback to actions and behaviours.

Similar to that process, we built infrastructure to support continual feedback to our models. This allows us to effectively use strategies such as online learning and reinforcement learning. The term online learning refers to models that continually retrain and update their parameters in real time as they operate, taking into account the latest information.
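As a rough illustration of online learning, the sketch below updates a small linear model one observation at a time using stochastic gradient descent on squared error. It is a generic textbook example, not the model Grab uses. The class name, the learning rate, and the choice of a linear model are assumptions made for the sake of the example.

```python
class OnlineLinearModel:
    """Illustrative linear model that learns from one observation at a time."""

    def __init__(self, n_features, learning_rate=0.01):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        """Return the current prediction for one feature vector."""
        return self.bias + sum(w * x for w, x in zip(self.weights, features))

    def update(self, features, target):
        """Take one gradient step on squared error for a single observation.

        Returns the prediction error measured before the update.
        """
        error = self.predict(features) - target
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error
        return error
```

Each call to `update` nudges the parameters towards the latest outcome, so the model keeps tracking the data as conditions shift instead of waiting for a scheduled batch retrain.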

Reinforcement learning allows models to learn and optimise based on feedback gained from deliberate experimentation, such as user response to an intervention.

With continual feedback, our child—and our auto-adaptive systems—learn about the consequences of their decisions and actions in a guided process.

Adaptive experimentation

Lastly, as parents, we also need to let our children push boundaries and learn by themselves.

Similarly, we provide the infrastructure and tools to let our models learn from adaptive experimentation.

You may be familiar with traditional A/B testing, where we compare outcomes of two distinct scenarios. With machine learning, we can automate the repeated design of experiments: both selecting the variables and learning from previous experiments, in a sequential way.

This approach enables our models to continually test new parameters in a live environment on small populations. These new parameters can then be automatically deployed. This allows our systems to iteratively learn more optimal parameters, adapting to ever-changing marketplace conditions.
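One simple way to picture this kind of sequential experimentation is an epsilon-greedy strategy: most traffic receives the best-known parameter, while a small share is used to try alternatives, and the best-known parameter is updated as results come in. The sketch below is an assumption-laden illustration of that general technique, not Grab's actual method. The class name, the `explore_share` parameter, and the reward signal are all hypothetical.

```python
import random


class EpsilonGreedyExperiment:
    """Illustrative sequential experiment over a fixed set of candidate values."""

    def __init__(self, candidates, baseline, explore_share=0.05, seed=None):
        if baseline not in candidates:
            raise ValueError("baseline must be one of the candidates")
        self.candidates = list(candidates)
        self.baseline = baseline
        self.explore_share = explore_share  # share of traffic used for testing
        self.counts = {c: 0 for c in self.candidates}
        self.means = {c: 0.0 for c in self.candidates}
        self._rng = random.Random(seed)

    def best(self):
        """Return the candidate with the highest observed mean reward.

        Falls back to the baseline until at least one outcome is recorded.
        """
        tried = [c for c in self.candidates if self.counts[c] > 0]
        if not tried:
            return self.baseline
        return max(tried, key=lambda c: self.means[c])

    def choose(self):
        """Pick the value to use for the next request."""
        if self._rng.random() < self.explore_share:
            return self._rng.choice(self.candidates)  # small test population
        return self.best()  # everyone else gets the best-known value

    def record(self, candidate, reward):
        """Fold one observed outcome into the running mean for a candidate."""
        self.counts[candidate] += 1
        self.means[candidate] += (reward - self.means[candidate]) / self.counts[candidate]
```

Because `best()` is recomputed from the recorded outcomes, a candidate that starts to outperform the others is served to the majority automatically, which mirrors the idea of new parameters being deployed without a manual hand-off.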

Observability

Like any good parent, we make sure that experimentation happens in a safe environment where we can keep an eye on things from a distance.

We’ve built simulation tools to mitigate the risks of anything going wrong once changes are deployed. Simulation tools allow us to run experiments in a non-live environment—essentially, a digital replica of our marketplace where we can study the effects of variable changes without impacting actual operations.

Our automated processes also have a high level of observability. This means we can easily monitor them and, if necessary, manually “break glass” and intervene.
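A manual intervention of this kind can be pictured as a thin wrapper around whatever value the automated system proposes. In the sketch below, an operator override always wins, and automated proposals are clamped to a permitted range. The clamping, the class name, and all field names are assumptions added for illustration. The article itself only describes monitoring and manual intervention.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GuardedParameter:
    """Hypothetical wrapper that lets an operator override an automated value."""

    safe_default: float
    lower_bound: float
    upper_bound: float
    manual_override: Optional[float] = None

    def break_glass(self, value: Optional[float] = None) -> None:
        """Pin the parameter to a manual value (the safe default if none given)."""
        self.manual_override = self.safe_default if value is None else value

    def restore_automation(self) -> None:
        """Hand control back to the automated system."""
        self.manual_override = None

    def resolve(self, proposed: float) -> float:
        """Return the value to apply for an automated proposal."""
        if self.manual_override is not None:
            return self.manual_override
        # Assumed guardrail: keep automated proposals inside a permitted range.
        return min(max(proposed, self.lower_bound), self.upper_bound)
```

For example, with `GuardedParameter(safe_default=1.0, lower_bound=0.8, upper_bound=2.0)`, a proposal of 3.0 resolves to 2.0 under automation, and to 1.0 once `break_glass()` has been called.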

Journey towards automation

The journey towards full marketplace automation takes time, just like it takes time for a child to grow towards independence. It also happens in different stages.

The stages are similar to those of a self-driving car. Stage 0 represents no automation, with full manual control. Stage 1 introduces some automation, with manual interventions. Then, step by step, more automation is introduced, to the point where the system is able to dynamically adapt to standard scenarios, such as regular spikes in food delivery demand at lunchtime, as well as special scenarios, such as unexpected road closures after flooding or traffic accidents.

So the next time you find yourself stuck in heavy rain, when more people start booking rides and driver supply drops, rest assured that our automated marketplace systems have detected the rain. They’re adjusting their parameters to fulfil your ride within the constraints of the poor external conditions, and they’re getting better at it each time.
