Messages you exchange with your driver-partner in the Grab app are automatically translated. This GrabChat feature is meant to ensure that you’re able to communicate with driver- and delivery-partners seamlessly, even when you don’t speak the same language. 

When we received feedback that these translations were occasionally inaccurate, we put together an engineering squad to tackle the issue. 

An example of a bad translation. The correct translation is: ok sir.

The goal was simple: improve the quality of translations on our platform while keeping cost efficiency in mind.

Building an in-house translation model

Initially, we relied entirely on an external service to translate our chat messages. Although the third-party tool was generally effective, its translations weren't always accurate.

An example of an inaccurate translation from the third-party service recorded on Dec 19, 2023.

We realised we could create our own translation system and even aim to outperform off-the-shelf tools, which often lacked context. Our advantage? A wealth of data directly tied to the context of booking rides and ordering food, which lets us generate more precise translations.

What are users saying?

For example, we found that many of the chats centred around pickup points. Courtesies such as ‘Hi’ and ‘Thank you’ were also common. Users also frequently used GrabChat to inform each other about their arrivals.

This allowed us to build a dataset that was representative of the types of chat messages exchanged between users and driver-partners, and diverse enough to capture the nuances of those conversations.

Some examples of recurring topics include arrival notifications and pickup instructions.

By aggregating and analysing past conversations, we can pinpoint recurring themes and topics. This valuable information can then be used to enhance the training of our translation models and improve their accuracy and effectiveness.
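To make the aggregation step concrete, here is a minimal sketch (not our actual pipeline) that surfaces frequent messages by normalising and counting them; real theme analysis would involve more sophisticated clustering over far more data.

```python
# Minimal sketch: surface recurring chat themes by counting normalised messages.
# The sample messages are made up for illustration.
from collections import Counter

def normalise(message: str) -> str:
    """Lower-case and strip punctuation so near-identical messages group together."""
    return "".join(ch for ch in message.lower() if ch.isalnum() or ch.isspace()).strip()

messages = [
    "Thank you!",
    "thank you",
    "I'm at the lobby",
    "Where is the pickup point?",
    "where is the pickup point",
]

# Frequent normalised messages hint at themes such as courtesies and pickup points.
print(Counter(normalise(m) for m in messages).most_common(3))
```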

Setting the quality bar

We then worked with Grab’s localisation team to set a standard for translations. The goal wasn’t to create a dataset large enough to fully train our model, but to gather enough data to set some benchmarks.

Good vs bad translations

These translations will also serve as a guide for the model to accurately capture the desired style and tone.
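As an illustration of how candidate translations can be scored against the human-reviewed references, here is a small sketch using the chrF metric from the sacrebleu library. chrF is one common choice for this kind of benchmark; the metric and tooling we actually use aren't covered in this post.

```python
# Minimal sketch: score candidate translations against reference translations
# curated with the localisation team, using corpus-level chrF.
from sacrebleu.metrics import CHRF

def benchmark(candidates: list[str], references: list[str]) -> float:
    """Higher chrF means the candidates are closer to the human references."""
    return CHRF().corpus_score(candidates, [references]).score

# Example with a single sentence pair.
print(benchmark(["Ok sir, I'm at the lobby."], ["OK sir, I am at the lobby."]))
```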

How did we train our model?

To create our model, we needed training data. We used an open-source Large Language Model (LLM) to create artificial translation data. The model had to be large enough to produce high-quality results and capable of handling the many Southeast Asian languages across the markets we serve.
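As a rough illustration of this step, a script like the one below could prompt an open-source instruction-tuned LLM to translate chat messages. The model name is a placeholder rather than the model we actually used, and the prompt is only an example.

```python
# Minimal sketch: generate synthetic translations with an open-source LLM via
# Hugging Face transformers. MODEL_NAME is a placeholder, not our actual model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "an-open-source-instruct-llm"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)

def synth_translate(message: str, src_lang: str, tgt_lang: str) -> str:
    """Ask the LLM to translate a single chat message."""
    chat = [{
        "role": "user",
        "content": f"Translate this {src_lang} ride-hailing chat message into "
                   f"{tgt_lang}. Reply with the translation only.\n\n{message}",
    }]
    input_ids = tokenizer.apply_chat_template(
        chat, add_generation_prompt=True, return_tensors="pt"
    )
    output = model.generate(input_ids, max_new_tokens=64, do_sample=False)
    # Decode only the newly generated tokens, i.e. the translation itself.
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
```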

This was especially important for languages like Vietnamese and Thai, which use large character sets and diacritics: marks, shapes or strokes that accompany letters.
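One concrete pitfall with diacritics, independent of any particular model: the same accented letter can be encoded either as a single precomposed character or as a base letter plus combining marks. The short example below (not from our pipeline) shows how Unicode normalisation makes the two forms comparable.

```python
# Minimal sketch: the Vietnamese letter "ế" encoded two different ways compares
# unequal until both strings are normalised to the same Unicode form (NFC).
import unicodedata

precomposed = "\u1ebf"         # "ế" as a single code point
combining = "e\u0302\u0301"    # "e" + combining circumflex + combining acute

print(precomposed == combining)  # False: byte-for-byte they differ
print(unicodedata.normalize("NFC", precomposed)
      == unicodedata.normalize("NFC", combining))  # True after normalisation
```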

(Read more: Why we redesigned a typeface for Thai and Cambodian scripts)

We then used the translations with high benchmark scores as data for our model to learn from. After all, the quality of an LLM’s translations is only as good as the data it is trained on.
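Conceptually, the filtering step looks something like the sketch below. The scoring function is passed in because it stands for whichever benchmark metric is trusted for a given language pair, and the threshold is illustrative rather than a production setting.

```python
# Minimal sketch: keep only synthetic (source, translation) pairs whose quality
# score clears a threshold. The scorer and threshold are assumptions for
# illustration, not our production configuration.
from typing import Callable, Iterable, List, Tuple

def filter_synthetic_pairs(
    pairs: Iterable[Tuple[str, str]],
    score: Callable[[str, str], float],
    threshold: float = 0.85,
) -> List[Tuple[str, str]]:
    """Return the pairs judged good enough to train on."""
    return [(src, tgt) for src, tgt in pairs if score(src, tgt) >= threshold]
```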

Fine-tuning our model

Before our translation system was good to go, we also had to make sure that elements such as numbers or unique symbols were not misinterpreted or omitted during translation. This is critical as displaying incorrect numbers in a translation can confuse users.

The model pulls out all non-translatable items from the original message, tallies each occurrence, and then attempts to find a corresponding match in the translation. If a match is not found, we reject the internally generated translation and fall back to an external third-party translation service.
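A minimal sketch of that check is shown below. The regular expression is an illustrative stand-in for the real list of non-translatable patterns, but the flow is the same: tally the items in the source, confirm each one survives in the translation, and fall back otherwise.

```python
# Minimal sketch: verify that numbers and special symbols in the source message
# also appear in the translation; otherwise fall back to the external service.
import re
from collections import Counter

# Illustrative pattern only: digits and a handful of special symbols.
NON_TRANSLATABLE = re.compile(r"\d+|[#@*+%/-]")

def non_translatables(text: str) -> Counter:
    """Count every non-translatable item appearing in the text."""
    return Counter(NON_TRANSLATABLE.findall(text))

def accept_translation(source: str, translation: str) -> bool:
    """Accept the in-house translation only if every source item is preserved."""
    src, tgt = non_translatables(source), non_translatables(translation)
    return all(tgt[item] >= count for item, count in src.items())

# A dropped house number should trigger the fallback to the third-party service.
print(accept_translation("Tôi ở số 12", "I am at number 12"))     # True
print(accept_translation("Tôi ở số 12", "I am at the building"))  # False
```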

Expanding our solutions 

We believe our in-house translation models are not only more cost-effective than third-party services, but also cater more accurately to our unique use cases. We will focus on expanding these models to more languages and countries across our operating regions.

We are also exploring opportunities to apply what we’ve learned from chat translation to other Grab content. This strategy aims to deliver a seamless language experience for our rapidly expanding user base, especially travellers.

The problem of language translation and translation quality benchmarking is highly complex. If you’d like to know more, for example about language detection (how do we know which languages to translate from, and into?) and the measures required to do all of this at manageable costs, dig into this post on Grab’s engineering blog, which covers these points in more detail.
