How Large Language Models help us make content moderation more precise

Haitao Bao, Senior Data Science Manager; Jia Chen, Head of Data Science, Integrity and Core platform

February 28, 2025 · Regional

At Grab, we want to provide a positive and safe experience for everyone using our app. Therefore, we need a solid content moderation system that helps us screen user-generated content—such as merchant catalogues and user reviews—to flag incidents of offensive language, inappropriate images, and so on.

Thankfully, the vast majority of interactions and content uploaded to our platform is harmless. And with good content moderation, we can catch the bad apples early on.

We’ve always done this with a blend of AI-supported automated content filters and human moderators weighing in on the more complex decisions. More recently, however, the rise of Large Language Models (LLMs) has made our automated systems more precise. This means we’re getting better at quick, accurate content moderation at scale, while reducing the strain on our content moderation team. Here’s how.

Content moderation is getting better thanks to LLMs

Grab’s content moderation works in a two-tier system. In the first layer, we employ small, specialised AI models that can screen large amounts of content quickly and efficiently.

An example of a Tier 1 model is keyword filtration. Based on a pre-defined list of problematic words, the system flags content that contains any of these words. Similarly, an AI model can screen images for potential violations after it has been pre-trained.
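To make the keyword filtration idea concrete, here is a minimal sketch in Python. The blocklist, the function names, and the whole-word matching rule are all assumptions made for illustration; they are not Grab's actual keyword list or implementation.

```python
import re

# Hypothetical blocklist for illustration only; a real list would be
# curated and maintained by policy teams.
BLOCKED_KEYWORDS = {"cigarette", "vape"}


def build_pattern(keywords):
    """Compile one case-insensitive regex that matches any keyword as a whole word."""
    escaped = sorted(re.escape(k) for k in keywords)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


PATTERN = build_pattern(BLOCKED_KEYWORDS)


def tier1_flag(text):
    """Return the sorted blocked keywords found in text; an empty list means 'pass'."""
    return sorted({m.group(0).lower() for m in PATTERN.finditer(text)})


print(tier1_flag("Fresh chicken rice"))
print(tier1_flag("Menthol Cigarette pack and vape pods"))
```

Whole-word matching keeps the filter fast and predictable, but it misses plurals, misspellings, and context, which is one reason flagged content goes on to a second assessment.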

Such task-specific AIs manage to “pass” or “fail” large amounts of data with high accuracy. Our Tier 1 models currently flag less than 5% of content as potential violations. These flagged items then get passed on to “large” Tier 2 models for another assessment.

Context-aware decision-making with LLMs

In Tier 2, LLMs can make more complex, context-aware decisions. We can simply prompt the LLM with our violation policies on specific topics, such as the sale of tobacco on Grab or our sexual harassment policy. This works even for fine-grained distinctions, for example differentiating traditional tobacco from e-cigarettes. The prompt for detecting all tobacco and the prompt for detecting e-cigarettes are shown in the table below. This gives us a high degree of flexibility to handle the complex scenarios defined in our policies.
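As a rough illustration of policy-driven prompting, the sketch below builds a prompt from a policy snippet and a piece of content. The policy wording, the `POLICIES` dictionary, and the `build_prompt` function are hypothetical and written for this example; they are not the prompts Grab uses, and no particular LLM API is assumed.

```python
# Hypothetical policy snippets for illustration; these are NOT the article's prompts.
POLICIES = {
    "tobacco": (
        "Flag any listing that offers tobacco products, "
        "including e-cigarettes."
    ),
    "e_cigarette": (
        "Flag only listings that offer e-cigarettes or vaping devices; "
        "traditional tobacco is out of scope."
    ),
}


def build_prompt(policy_key, content):
    """Combine one policy and one piece of content into a single moderation prompt."""
    policy = POLICIES[policy_key]
    return (
        "You are a content moderation assistant.\n"
        f"Policy: {policy}\n"
        f"Content: {content}\n"
        "Reply with a single number between 0 and 1: the likelihood "
        "that the content violates the policy."
    )


print(build_prompt("e_cigarette", "Disposable vape, mango flavour"))
```

Switching from one policy to another only changes the policy text passed in, not the surrounding code, which is where the flexibility comes from.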

The LLM can then assess how likely it is that a piece of content is in violation of one of our policies. A low score means the LLM is confident the content is safe. Meanwhile, a high score indicates the content should be filtered. A medium score means the LLM isn’t quite sure—it could go either way. That’s where human moderators step in for the final say.
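The three-way routing described above can be sketched as follows. The score range of 0 to 1 and the two threshold values are assumptions chosen for illustration; the article does not state the actual scale or cut-offs.

```python
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"            # low score: content is considered safe
    HUMAN_REVIEW = "human_review"  # medium score: a moderator has the final say
    FILTER = "filter"              # high score: content should be filtered


# Assumed thresholds on an assumed 0-to-1 score; real values would be tuned.
LOW_THRESHOLD = 0.2
HIGH_THRESHOLD = 0.8


def route(score, low=LOW_THRESHOLD, high=HIGH_THRESHOLD):
    """Map a violation-likelihood score to one of three moderation decisions."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < low:
        return Decision.APPROVE
    if score > high:
        return Decision.FILTER
    return Decision.HUMAN_REVIEW


for s in (0.05, 0.5, 0.95):
    print(s, route(s).value)
```

Widening the gap between the two thresholds sends more content to human moderators, while narrowing it automates more decisions at the cost of less human oversight on borderline cases.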

This second layer of automated moderation is where we’ve seen the most change. We started implementing LLMs in Tier 2 moderation in 2023 and gradually expanded their use to additional cases through Q3 2024. As a result, we have reduced human effort in the moderation process by 90% and cut our SLA from days to minutes.

Next steps: finetuning our models

We achieved great results leveraging LLMs, but we’re still working on improvements, specifically in two areas:

Latency: LLM latency with image input is still quite high. We therefore need to reduce it to support use cases in which users get near-instant feedback about their uploaded content.
Accuracy: LLMs don’t always have good knowledge of the topics we want to detect, and understanding local languages can be challenging. We therefore need to enhance the model’s understanding in those areas.

Leveraging our in-house data, our team is constantly finetuning our content moderation LLMs to address the challenges above, with promising initial results.
