Translating text between languages poses many challenges, even for deep learning systems. Even so, machine translation has made great strides in recent years.
When you need or want information written in a language you don’t know, you have a few possible solutions: learn the language yourself, find and pay for a good human translator, or use a machine translation program like Google Translate. Luckily, the easiest and cheapest solution is also rapidly becoming one of the most accurate. The benefits of this for business or other urgent uses are obvious.
Additionally, advances in any one area of computational linguistics tend to advance other areas of the field. Chatbots for business use and mental health outreach programs, for example, both benefit from being able to reach people who don’t speak English in their own native languages.
There are few internal data sources for this use case.
A great deal of external data, however, is essential for automatic translation programs, particularly language dictionaries.
Other useful external resources include native speakers of the target language on staff and an open-source model that accepts user-submitted corrections. Additionally, since many translation programs translate through English instead of directly from one language to another, accurate English dictionaries and strong English translation capabilities in the program can be very helpful.
The challenges of automatic translation abound, especially between languages from vastly different language families. In addition, computational linguistics already has trouble determining tone within a single language; in translation, this becomes even harder. Translating creative writing is therefore much more difficult than translating technical writing.
Further, much of the text submitted for translation contains typos, grammatical errors, and emojis. Enabling a machine translation service to interpret such input correctly is an ongoing challenge.
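One common way to reduce this kind of noise is to normalize the text before it reaches the translation model. The sketch below is only an illustration under stated assumptions: the function name normalize_for_translation is hypothetical, it treats every character in the Unicode category "So" (Symbol, other) as an emoji, and it shortens any character repeated three or more times to two. A production system would need more careful handling, and it might prefer to keep emojis as hints about tone.

```python
import re
import unicodedata


def normalize_for_translation(text: str) -> str:
    """Hypothetical helper: clean noisy user text before translation."""
    # Assumption: characters in Unicode category "So" are emoji or pictographs.
    no_symbols = "".join(
        ch for ch in text if unicodedata.category(ch) != "So"
    )
    # Shorten runs of three or more identical characters to two,
    # e.g. "loooove" becomes "loove" and "!!!" becomes "!!".
    squeezed = re.sub(r"(.)\1{2,}", r"\1\1", no_symbols)
    # Collapse any remaining whitespace runs into single spaces.
    return re.sub(r"\s+", " ", squeezed).strip()


if __name__ == "__main__":
    print(normalize_for_translation("I loooove this \U0001F600  so much!!!"))
```

This step does not correct spelling; it only removes some of the irregularities that a model trained on clean text is unlikely to have seen.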
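Finally, automatic translation relies on specialized tools and techniques, such as deep learning libraries like Keras and neural network architectures like recurrent neural networks (RNNs) and long short-term memory (LSTM) networks.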
TranslateFX: What is Neural Machine Translation & How does it work?
[Facebook research assistant Angela] Fan noted that many machine translation models begin by translating from Chinese to English first, and then from English to French. This is done “because English training data is the most widely available,” she said. But such a method can lead to mistakes in translation.
“Our model directly trains on Chinese to French data to better preserve meaning,” Fan said. Facebook said the system outperformed English-centered systems in a widely used system that uses data to measure the quality of machine translations.
VOA Learning English: Facebook Develops Machine Translation System for 100 Languages
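The loss of meaning that Fan describes when translating through English can be illustrated with a toy example. The sketch below is not how a neural translation model works. It uses tiny hand-written lookup tables (all hypothetical) to show one way information can disappear in the pivot step: Chinese distinguishes informal 你 from polite 您, and French distinguishes "tu" from "vous", but English has only "you".

```python
# Toy lookup tables, invented for illustration only.
ZH_TO_EN = {"你": "you", "您": "you"}   # both forms collapse to "you"
EN_TO_FR = {"you": "vous"}              # English alone cannot choose tu or vous
ZH_TO_FR = {"你": "tu", "您": "vous"}   # a direct table keeps the distinction


def pivot_translate(word: str) -> str:
    """Translate Chinese to French by passing through English."""
    english = ZH_TO_EN[word]
    return EN_TO_FR[english]


def direct_translate(word: str) -> str:
    """Translate Chinese to French without an intermediate language."""
    return ZH_TO_FR[word]


if __name__ == "__main__":
    for word in ("你", "您"):
        print(word, "pivot:", pivot_translate(word),
              "direct:", direct_translate(word))
```

In this toy setup the pivot route renders both Chinese words as "vous", while the direct route keeps the informal "tu" for 你.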