Translation from ML Kit Supports Direct MT

The translation service from HMS Core ML Kit supports multiple languages and, when combined with other services, suits a wide range of scenarios.

The translation service is well suited to people who travel overseas. Combined with the text to speech (TTS) service, it can power an app that helps users communicate with speakers of other languages, for example when taking a taxi or ordering food. Combined with text recognition, it helps users understand menus or road signs from nothing more than a picture of them.

Translation Delivers Better Performance with a New Direct MT System

Most machine translation (MT) systems are pivot-based: They first translate the source language to a third language (named pivot language, which is usually English) and then translate text from that third language to the target language.

This process, however, compromises translation accuracy and is inefficient, because it uses more compute resources. Apps need a translation service that is both more efficient and more accurate when handling idiomatic language.
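The difference between the two designs can be illustrated with a minimal Python sketch. The lookup tables below are hypothetical toy data standing in for trained MT models, not part of ML Kit. They show how a pivot language can merge two distinct source words into one, and how the pivot route needs two model calls where the direct route needs one.

```python
# Toy lookup tables standing in for trained MT models (hypothetical data).
# Two different Chinese words collapse into the same English pivot word.
ZH_TO_EN = {"银行": "bank", "河岸": "bank"}
EN_TO_FR = {"bank": "banque"}
ZH_TO_FR = {"银行": "banque", "河岸": "rive"}

calls = {"count": 0}


def translate(table, text):
    """Look up one 'translation' and count it as one model call."""
    calls["count"] += 1
    return table[text]


def pivot_translate(text):
    # Two hops: source -> pivot (English) -> target.
    return translate(EN_TO_FR, translate(ZH_TO_EN, text))


def direct_translate(text):
    # One hop: source -> target.
    return translate(ZH_TO_FR, text)


calls["count"] = 0
pivot_result = pivot_translate("河岸")
pivot_calls = calls["count"]

calls["count"] = 0
direct_result = direct_translate("河岸")
direct_calls = calls["count"]

print(pivot_result, pivot_calls)    # banque 2
print(direct_result, direct_calls)  # rive 1
```

In this toy case the pivot route turns "riverbank" into the French word for a financial bank, because the distinction is lost in English, and it costs twice as many model calls.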

To meet such requirements, HMS Core ML Kit has strengthened its translation service by introducing a direct MT system in its new version, which supports translation between Chinese and Japanese, Chinese and German, Chinese and French, and Chinese and Russian.

Compared with MT systems that adopt English as the pivot language, the direct MT system has a number of advantages. For example, it can process 10 translation tasks of 100 characters each concurrently, with an average processing time of about 160 milliseconds, a speedup of about 100%. Translation quality improves as well. For example, when translating culture-loaded expressions in Chinese, the system produces translations that follow the idiom of the target language and are accurate and fluent.

As an entry to the shared task "Triangular MT: Using English to improve Russian-to-Chinese machine translation" at the Sixth Conference on Machine Translation (WMT21), the direct MT system adopted by ML Kit won first place.

Technical Advantages of the Direct MT System

The direct MT system builds on Huawei's research in machine translation and uses Russian-English and English-Chinese corpora for knowledge distillation. Combined with the explicit curriculum learning (CL) strategy, this yields high-quality Russian-Chinese translation models even when only a small amount of Russian-Chinese corpora exists, or none at all. In this way, the system avoids the low-resource and cold-start issues that usually trouble pivot-based MT systems.

Figure: Direct MT

Technology 1: Multi-Lingual Encoder-Decoder Enhancement

This technology overcomes the cold start issue. Take Russian-Chinese translation as an example. It imports English-Chinese corpora into a multi-lingual model and performs knowledge distillation on the corpora, to allow the decoder to better process the target language (in this example, Chinese). It also imports Russian-English corpora into the model, to help the encoder better process the source language (in this example, Russian).

Technology 2: Explicit CL for Denoising

Figure source: HW-TSC's Participation in the WMT 2021 Triangular MT Shared Task

Explicit CL is used to train the direct MT system. Based on the volume of noisy data in the corpora, the training process is divided into three phases and uses incremental learning.

In the first phase, all the corpora (including the noisy data) are used to train the system, so that it converges quickly. In the second phase, the corpora are denoised with a parallel text alignment tool, and the system is trained incrementally on the result. In the last phase, the system is trained incrementally on the denoised corpora output by the system in the second phase, until it converges.
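The three phases can be sketched as a simple data schedule. This is a minimal illustration under stated assumptions, not the actual training code: `ToyModel` only records what it was trained on, `length_ratio_ok` is a hypothetical stand-in for the parallel text alignment tool, and `accepts` is a fixed rule standing in for filtering by the phase-two system, which is one possible reading of the third phase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    source: str
    target: str


def length_ratio_ok(pair, max_ratio=2.0):
    """Hypothetical stand-in for an alignment tool:
    keep pairs whose token counts are comparable."""
    s, t = len(pair.source.split()), len(pair.target.split())
    if s == 0 or t == 0:
        return False
    return max(s, t) / min(s, t) <= max_ratio


class ToyModel:
    """Records the data it sees; a real system would update model weights."""

    def __init__(self):
        self.history = []

    def train(self, phase, pairs):
        self.history.append((phase, len(pairs)))

    def accepts(self, pair):
        # Assumption: a real system would score the pair with the trained
        # model. Here a fixed rule rejects untranslated copies instead.
        return pair.source != pair.target


def curriculum_train(corpus):
    model = ToyModel()
    # Phase 1: train on everything, noise included, to converge quickly.
    model.train("phase 1: full corpus", corpus)
    # Phase 2: denoise with the alignment stand-in, then train incrementally.
    aligned = [p for p in corpus if length_ratio_ok(p)]
    model.train("phase 2: tool-denoised", aligned)
    # Phase 3: train incrementally on data filtered by the phase-2 system.
    refined = [p for p in aligned if model.accepts(p)]
    model.train("phase 3: model-denoised", refined)
    return model


corpus = [
    Pair("привет мир", "你好 世界"),
    Pair("спасибо", "谢谢"),
    Pair("да", "这 是 一 个 很 长 的 句子"),  # badly misaligned lengths
    Pair("hello", "hello"),                   # untranslated copy
]
model = curriculum_train(corpus)
for phase, size in model.history:
    print(phase, size)
```

With this toy corpus, the training set shrinks from four pairs to three and then to two, so each phase sees cleaner data than the previous one.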

Technology 3: FTST for Data Augmentation

FTST stands for Forward Translation and Sampling Backward Translation. It uses sampling in its backward model for data augmentation, and beam search in its forward models for data balancing. In the comparison experiment, FTST delivered the best result.
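The two decoding strategies can be contrasted in a minimal Python sketch. The candidate tables are hypothetical toy data in place of trained forward and backward models, and picking the most probable candidate is a simplified stand-in for beam search.

```python
import random

# Hypothetical toy "models": each maps a sentence to candidate
# translations with probabilities.
FORWARD = {   # source -> target (Russian -> Chinese)
    "привет": [("你好", 0.7), ("您好", 0.3)],
}
BACKWARD = {  # target -> source (Chinese -> Russian)
    "谢谢": [("спасибо", 0.6), ("благодарю", 0.4)],
}


def beam_search_translate(model, sentence):
    """Stand-in for beam search: return the most probable candidate."""
    return max(model[sentence], key=lambda cand: cand[1])[0]


def sample_translate(model, sentence, rng):
    """Draw one candidate at random, weighted by its probability."""
    candidates, weights = zip(*model[sentence])
    return rng.choices(candidates, weights=weights, k=1)[0]


def ftst_augment(mono_source, mono_target, rng):
    """Build synthetic sentence pairs from monolingual data."""
    synthetic = []
    # Forward translation: real source, synthetic target (beam search).
    for src in mono_source:
        synthetic.append((src, beam_search_translate(FORWARD, src)))
    # Sampling backward translation: synthetic source, real target.
    for tgt in mono_target:
        synthetic.append((sample_translate(BACKWARD, tgt, rng), tgt))
    return synthetic


pairs = ftst_augment(["привет"], ["谢谢"], random.Random(0))
for src, tgt in pairs:
    print(src, "->", tgt)
```

The forward pair is always the same for a given input, while the backward pair can vary from run to run, which is what adds diversity to the synthetic source side.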

Figure source: HW-TSC's Participation in the WMT 2021 Triangular MT Shared Task

In addition to the mentioned languages, the translation service of ML Kit will support direct translation between Chinese and 11 languages (Korean, Portuguese, Spanish, Turkish, Thai, Arabic, Malay, Italian, Polish, Dutch, and Vietnamese) by the end of 2022. This will open up a new level of instant translation for users around the world.

The translation service can be used together with many other services from ML Kit. Check them out and see how they can help you develop an AI-powered app.