Just a few days after the Game Global Digital Summit in October 2021, we had a conversation with Konstantin Savenkov, CEO at Intento, about the release of their latest State of Machine Translation Report and the use of MT in the video game industry. Konstantin shared some interesting insights on the current state of MT for the localization industry, with a special focus on its effect on video games. Read our questions and answers below for more details.

Improvements in the Machine Translation Landscape and What They Mean for the Video Game Industry

What’s with all the buzz around machine translation – and why is it happening now?

The main reason people are talking about machine translation is that it delivers. We see that without machine translation, even with large budgets and a pool of translators, companies are only able to translate about 5% of what's needed. All real-time translation scenarios, such as live communication or on-demand translation of community content, are impossible without MT. Most content is only translated into the main languages, which ultimately excludes a huge population that doesn't have access to those languages. Machine translation replaces translation from scratch with post-editing, so the cost of human intervention is significantly reduced. This way, companies can do much more with a human touch, and the rest, including real-time content, goes with MT only. It's not perfect, but in 2021 it's 97% flawless, not 50%.

Recently we’ve also seen that the quality of off-the-shelf stock engines is constantly increasing. On our side, we have systems to detect how those models change; several major MT providers updated their models at the end of August. Since 2019 it has been possible to customize models with your own content, and more recently, since the beginning of 2020, you can also use your glossaries on top of these customized models to ensure that the relevant terminology is followed by linguists and post-editors.

Last but not least, we finally have multiple public success cases where companies see that MT's promises were kept and were not just a marketing strategy promoted by vendors. People are sharing how they applied machine translation and how they boosted translators' or post-editors' productivity by four or five times. Everyone now understands that this is an important competitive edge: if you don't embrace this new technology, your competitors will, and they will get to those new markets first.

Let’s talk about machine translation for gaming companies specifically; what’s in it for them? Is MT only used for in-game content? Or are there other use cases? 

It does work for in-game content, but it always needs post-editing, and we know that in the gaming industry you face a number of other challenges, e.g., sensitive content, NDAs, and decision-making conflicts between localization and production. However, many translators are starting to see the quality and value of using MT.

Outside of in-game content, companies operating in foreign markets have other types of content to translate aside from product-related content. When it comes to user-facing content, you have support tickets, support chats with customers, and even chats between players who want to build cross-country communities to play and talk about their favorite games. There are also a number of internal use cases, e.g., communication among development teams that might sit in different locations all over the world, software documentation, etc. These cases are typically not within the localization department's domain.

In my opinion, for a company to truly operate globally, a lot of content needs to be translated in real-time. Using machine translation takes the pressure off business operations to make this happen.

We have seen that in the case of gaming companies, machine translation can be very useful on the business side, not only for in-game content. So, what recommendations do you have for a company that is just starting to use machine translation? For example, what is the best engine to start with? 

One thing we discovered is that there is no best vendor. It might sound a bit controversial, especially from a traditional procurement point of view, but the reality is that each engine will perform differently depending on language pairs and domains. Each model is customized and trained based on the amount of data you have at your disposal.

When working with multiple language pairs, we recommend considering all available machine translation systems from around the globe, testing and customizing them on your content to see which one works best for you. A good place to find all these resources is our latest State of Machine Translation Report.

We have seen some customers select a certain system because it is faster for post-editors to work with: even if it has some translation mistakes, they are easy to spot, and outside of those mistakes the translation is very good. However, in other cases where MT is used for real-time translation (and the content goes directly to end-users), clients prefer less risky systems that produce less spectacular results but make fewer mistakes. Even for the same language pair, you might decide to select two different MT vendors for different uses.

So far, we have talked about customization based on your content, but how do you start where there is no legacy data you can use to train your engine? 

It's the typical chicken-and-egg problem. First, let me mention that it's not only about training your engines but also about whether you can train your resources and adjust your workflows to adapt to MT. In any case, when you have no legacy data, you must use off-the-shelf stock models. There are still ways to get them closer to your domain by adding a glossary, which most systems support today.
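
As a minimal illustration of the glossary idea, the sketch below checks stock-engine output against a project glossary and flags missing required terms. The glossary entries and sentences are invented, and most providers apply glossaries inside the engine itself, so a post-check like this is only a complement:

```python
# Minimal sketch: checking stock-engine output against a project glossary.
# Glossary entries and example sentences are invented for illustration.

GLOSSARY = {
    "health potion": "poción de salud",   # source term -> required target term
    "guild": "clan",
}

def glossary_violations(source: str, target: str, glossary: dict) -> list:
    """Return glossary entries whose source term appears in the source text
    but whose required target term is missing from the MT output."""
    violations = []
    for src_term, tgt_term in glossary.items():
        if src_term.lower() in source.lower() and tgt_term.lower() not in target.lower():
            violations.append((src_term, tgt_term))
    return violations

source = "Use a health potion before you leave the guild hall."
mt_output = "Usa una poción de vida antes de salir del salón del clan."
print(glossary_violations(source, mt_output, GLOSSARY))
# [('health potion', 'poción de salud')] -> flag for the post-editor
```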

Another way to jump-start this process is to base your decision on the industry report.  

In our case, we worked together with TAUS to select high-quality data in several domains (for entertainment, we only have one language pair, Chinese to English, but there is still plenty of data for other domains) and we translated it with 29 off-the-shelf engines. Then we compared automatic scores – semantic similarity scores – to see which engines lead with statistical significance. Internally, we ran this type of evaluation on a wider scale for roughly 50 language pairs.
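
The exact similarity metric isn't specified in this conversation; as an illustration only, here is a minimal sketch of scoring engine output against reference translations with multilingual sentence embeddings (LaBSE via sentence-transformers is my own choice of model, not necessarily the one used in the report):

```python
# Minimal sketch: scoring MT output against reference translations with a
# semantic similarity metric. The embedding model is an illustrative choice.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/LaBSE")

def semantic_score(hypotheses: list[str], references: list[str]) -> float:
    """Average cosine similarity between each MT hypothesis and its reference."""
    hyp_emb = model.encode(hypotheses, convert_to_tensor=True)
    ref_emb = model.encode(references, convert_to_tensor=True)
    sims = util.cos_sim(hyp_emb, ref_emb).diagonal()  # one score per segment pair
    return float(sims.mean())

# Hypothetical engine names and outputs for one test segment.
engine_outputs = {
    "engine_a": ["La partida comienza al amanecer."],
    "engine_b": ["El juego empieza al amanecer."],
}
references = ["La partida empieza al amanecer."]
for name, hyps in engine_outputs.items():
    print(name, round(semantic_score(hyps, references), 3))
```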

We made the results of both evaluations available on our platform, where we have something called smart routing: when you send a request to translate text, it is routed to the best off-the-shelf model. This approach helps you get the best out of off-the-shelf models. As your content gets edited by post-editors, you start building your own translation memories, which you can use to further improve your models. At the end of the day, the right way to think about this is that every dollar you spend on editing the MT output should be a dollar you spend on improving your models.
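
As a rough illustration of the smart-routing idea (not Intento's actual implementation), a routing table keyed by language pair and domain might look like this, with hypothetical provider names and a placeholder client call:

```python
# Rough illustration of smart routing: pick the engine that scored best for a
# given (source language, target language, domain). Providers are hypothetical.

BEST_ENGINE = {
    ("en", "es", "games"):   "provider_a",
    ("en", "ru", "games"):   "provider_b",
    ("en", "es", "support"): "provider_c",
}
FALLBACK_ENGINE = "provider_a"

def route(source_lang: str, target_lang: str, domain: str) -> str:
    """Return the engine that performed best for this language pair and domain."""
    return BEST_ENGINE.get((source_lang, target_lang, domain), FALLBACK_ENGINE)

def call_engine(engine: str, text: str, source_lang: str, target_lang: str) -> str:
    """Placeholder for the per-provider client call."""
    return f"[{engine}:{source_lang}->{target_lang}] {text}"

def translate(text: str, source_lang: str, target_lang: str, domain: str) -> str:
    return call_engine(route(source_lang, target_lang, domain), text, source_lang, target_lang)

print(translate("Press X to jump.", "en", "ru", "games"))
```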

Tell us a bit more about the report that you just published. What are the most interesting insights you would like to share? 

We’ve run this report since 2017, so it’s been through many advancements and changes. We’ve seen a huge increase in language pair support, from 16,000 a year ago to about 100,000 now. That’s not just single providers adding language pairs; I am talking about unique language pairs across all providers. This dramatic increase mostly comes from two machine translation providers: a Chinese company called Niutrans with 88,000 language pairs, and Alibaba with 20,000.

For the first time, we evaluated open-source, pre-trained models which you can deploy and run yourself. M2M from Facebook supports lots of language pairs, and although it may not be commercially viable to fully pursue all languages, I generally believe that support for multiple languages is the key to accessibility. Of course, these open-source engines don’t always perform as well as commercial systems, but they still rank quite satisfactorily.

Out of the 29 systems we evaluated, 19 ended up being the best for at least one language pair and domain. To cover all 13 language pairs and 7 domains, you need 9 different MT engines. That still supports my case that there is no single best engine: you need at least nine different engines to get the best quality across all language pairs and domains.
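
The report's own tables aren't reproduced here; purely as an illustration of how such a minimum engine set can be estimated, a greedy approximation over a made-up best-engine table looks like this (engine names and cells are invented, and greedy selection only approximates the true minimum):

```python
# Minimal sketch: estimating how many engines are needed to cover every
# (language pair, domain) cell, given which engines are top-tier in each cell.
# The table below is invented; the report's real data is much larger.

TOP_TIER = {
    ("en-es", "games"):   {"engine_a", "engine_b"},
    ("en-ru", "games"):   {"engine_b"},
    ("en-zh", "games"):   {"engine_c"},
    ("en-es", "support"): {"engine_a", "engine_c"},
    ("en-ru", "support"): {"engine_d"},
}

def greedy_cover(top_tier: dict) -> set:
    """Greedily pick engines until every cell has at least one chosen engine."""
    uncovered = set(top_tier)
    chosen = set()
    while uncovered:
        # pick the engine that is top-tier in the most still-uncovered cells
        best = max(
            {e for cell in uncovered for e in top_tier[cell]},
            key=lambda e: sum(e in top_tier[c] for c in uncovered),
        )
        chosen.add(best)
        uncovered = {c for c in uncovered if best not in top_tier[c]}
    return chosen

print(greedy_cover(TOP_TIER))  # prints a small set of engines covering every cell
```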

When it comes to Spanish, Russian, or Chinese, several machine translation engines are in the top tier, meaning one of them may be the best by score, but the difference is not statistically significant and you cannot confidently say one is better. At the same time, domains such as legal, financial, and healthcare require a more careful choice of machine translation provider, as relatively few models perform at the top level. Out of all the domains evaluated, language education got relatively low scores, meaning that for this domain you will most likely need customized models.
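
The exact statistical test behind "not statistically significant" isn't named in this conversation; one common way to check whether a top engine's score edge is real rather than noise is a paired bootstrap over segment-level scores, sketched below with made-up numbers:

```python
# Minimal sketch: paired bootstrap test to check whether engine A's average
# segment score really beats engine B's, or whether the gap could be noise.
import random

def paired_bootstrap(scores_a: list[float], scores_b: list[float],
                     n_samples: int = 1000, seed: int = 0) -> float:
    """Return the fraction of resamples in which A's mean beats B's mean."""
    assert len(scores_a) == len(scores_b)
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / n_samples

# Segment-level similarity scores for two engines on the same test set (made up).
a = [0.82, 0.91, 0.78, 0.88, 0.85, 0.90, 0.79, 0.84]
b = [0.80, 0.92, 0.77, 0.86, 0.84, 0.89, 0.80, 0.83]
# A common convention: treat A's win as significant only if the fraction >= 0.95.
print(paired_bootstrap(a, b))
```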

Are there any other considerations for choosing MT engines besides translation quality?

Yes, depending on your approach, there will be many. You need to look at how the data is protected, especially if you translate anything that may contain personally identifiable information (communication, support, etc.). You also need to be mindful of data protection legislation such as the GDPR.

At Intento, we collect and update this information in a database so that we can pick the right model. We look at privacy: how the data is used and stored, and whether the machine translation company reserves the right to use your data to improve their models. You need to be careful, because some companies do that by default, and you must know how to opt out.
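
As a small sketch of that kind of database-driven filtering (the fields and provider names below are hypothetical, not Intento's actual schema):

```python
# Minimal sketch: filtering engines by data-handling attributes before any
# quality comparison. Provider names and fields are hypothetical.

PROVIDERS = [
    {"name": "provider_a", "stores_data": False, "trains_on_customer_data": False, "gdpr_dpa": True},
    {"name": "provider_b", "stores_data": True,  "trains_on_customer_data": True,  "gdpr_dpa": True},
    {"name": "provider_c", "stores_data": True,  "trains_on_customer_data": False, "gdpr_dpa": False},
]

def eligible_for_pii(providers: list[dict]) -> list[str]:
    """Keep only providers acceptable for content that may contain PII."""
    return [
        p["name"] for p in providers
        if not p["trains_on_customer_data"] and p["gdpr_dpa"]
    ]

print(eligible_for_pii(PROVIDERS))  # ['provider_a']
```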

Our goal is to focus on linguistic quality and security, avoiding the trade-off between selecting the best engine from a linguistic point of view and selecting the engine with the right features to integrate with your systems. To solve this issue, Intento offers a Machine Translation Hub that works as an integration platform, connecting any available engine to any system. All engines are enhanced with special features such as tone-of-voice control, gender control, profanity filtering, etc. We understand it is difficult to choose with so many factors, and we try to make it as simple as possible.

What are the next steps after choosing the best engine (or combination of engines) for our needs?

It depends on the use case. Will the content need post-editing, or will it be presented to end-users as is? The latter is the easiest case because we just enable it, and if the user experience is right, you benefit from day one. You just need to make sure you collect user feedback to improve the model.

On the other hand, if the content needs post-editing, you typically go to your language service provider and tell them to use the best model from your machine translation system. You'll need to find a way to share the benefits brought by MT, so you're on the same page regarding the kind of effort machine translation entails compared to translating from scratch. Your vendor will not only have to agree on a price but will also have to go through significant changes. Not every translator is a post-editor, so they will most likely have to either train their resources or onboard new people and adjust their own toolchain to work with MT; otherwise, you, as the company, will need to adjust to their workflows. It's important to understand that you need to work with them transparently on finding the right business model so that they can comfortably and confidently work for you.

With so many factors to navigate, how can we help a localization manager at a gaming company decide whether to use machine translation and then how to implement it? It sounds like a lot of work and resources are going to be needed, and the ROI might not be straightforward or easy to justify at a company level.

There are two ways to go about this. One is to outsource the whole project to a provider like Intento, which will take care of everything; the only thing you will have to do as a localization manager is provide feedback about MT quality. When you adopt new technology, it is crucial to involve humans. They will give feedback on how the technology works and on preferential requirements that are harder to capture automatically, and involving them also helps them overcome the psychological challenge and become aware of the decisions they are making. When you help humans with technology, you always want to take their opinion into account. So, with this model, the only requirement from the localization manager is to set up the workflow and quality control process; the technical side of things is taken care of by the vendor.

Another way to go about it, which some gaming companies have chosen, is to start developing their own expertise after an initial outsourcing step. This might be one of the reasons why localization is often one of the first use cases for machine translation in a company, eventually expanding to other departments such as customer service, legal, sales, marketing, etc. These other departments will have their own questions and requirements, and sometimes talking to them is easier if you have your own internal expertise or a company-wide MT program. In this case, we see that customers start to hire people. And to help them do all the things we do on our side, we have started offering what we call the MT Studio, a set of tools to clean data, customize models, and so on.

Regarding the ROI part of the question, it's useful to provide edit distance reports: data that helps compare MT post-editing effort to TM editing effort and establish the right price.
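
A minimal sketch of the kind of report meant here, using a normalized edit distance between raw MT output and the post-edited segment as a rough proxy for effort (the segments are invented):

```python
# Minimal sketch: a per-segment edit-distance report comparing raw MT output
# to the post-edited version, as a rough proxy for post-editing effort.
import difflib

def edit_ratio(mt_output: str, post_edited: str) -> float:
    """0.0 = post-editor changed nothing, 1.0 = everything was rewritten."""
    similarity = difflib.SequenceMatcher(None, mt_output, post_edited).ratio()
    return round(1.0 - similarity, 3)

segments = [
    ("Pulsa X para saltar.", "Pulsa X para saltar."),       # untouched
    ("Usa la poción de vida.", "Usa la poción de salud."),   # light edit
]
for mt, pe in segments:
    print(edit_ratio(mt, pe), "|", pe)

# Averaged over a project, these numbers help compare MT post-editing effort
# to fuzzy-match TM editing effort and set fair rates.
```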

Back to use cases: have you seen MT applied successfully to in-game content? Successfully, meaning everyone is happy with the workflow and quality while seeing savings. 

I think it will take some time until we have a public use case around that – it’s all a work in progress. We had a pilot project at Playrix where we ran a blind study with trained models (also presented at the Game Global Digital Summit in March 2021), and the translators – usually very skeptical in the gaming industry – were positively impressed. It is hard to make a production case even when MT works quite well for some language pairs. As already mentioned, it doesn’t only require machine translation; even when you have it, it takes time and effort to make it successful for in-game content in production. As in other machine translation or localization use cases, the way it’s going to happen in games is that large companies will have to bring it into fashion. They are the only ones who can do it because they currently have a tight grasp on the industry. Consequently, language service providers will have to learn how to serve them, and after that it will become more widespread.

A question that non-technical people in the industry might be interested in: we see that there is a big difference between engines, language pairs, and verticals. Do you see this changing anytime soon? Will all engines be able to handle all languages in all domains? 

We see it moving in the opposite direction. Machine translation engines need access to data in order to work well for all domains and language pairs. Of course, you can collect some data from the Internet, or you can buy data from companies like TAUS, but the most useful domain data belongs to specific companies that restrict access to outsiders. In fact, today fewer and fewer companies are willing to share their data to train models for other companies. Take the GDPR, for example: to use data for training, a provider needs to store it somewhere, so it turns from being a data processor into a data controller, which is a hard thing to do.

Also, if you’re a retail company and you provide your retail data to a machine translation provider to improve their engine for retail, you are basically helping your competitors in the retail industry! Therefore, we don’t see much interest in providers getting all the data in the world to train the best model for every language and domain.

Simply training on data is not enough – you also need to improve engines based on customer feedback. We provide tons of this feedback to different machine translation providers based on our experience. Every MT engine improves through feedback from the companies that use it, and different companies use different engines, so the engines move in different directions. A machine translation engine might be good for legal content but not as good for colloquial content, and vice versa, which widens the gap.