AI Translation: Could AI Save Europe’s Rare and Endangered Languages from Extinction?

  • Meta’s NLLB project aims to enhance AI translation services for 200 lesser-spoken languages.
  • Consulting native speakers and language specialists is crucial for better translation quality.
  • AI translation can play a significant role in preserving and revitalizing endangered languages across Europe.

It will soon be easier to see Facebook and Instagram posts in lesser-spoken global languages, but an expert suggests that Meta should talk to native speakers to improve the tool.

Posts on Facebook and Instagram will soon be available in 200 lesser-spoken languages around the world: Meta’s No Language Left Behind (NLLB) project announced in a paper published this month that it has scaled up its original technology.

The project includes a dozen “low-resource” European languages, such as Scottish Gaelic, Galician, Irish, Ligurian, Bosnian, Icelandic, and Welsh.

According to Meta, a low-resource language is one with fewer than one million sentences of usable data. Experts say the tool still needs work and that, to improve it, Meta should consult native speakers and language specialists.

Meta trains its artificial intelligence (AI) with data from the Opus repository, an open-source collection of authentic written and spoken text in many languages that can be used to train machine-learning models.

Contributors to the dataset are experts in natural language processing (NLP), the subfield of AI research that enables computers to understand and translate human language. Meta said it also combines this with data mined from sources such as Wikipedia.

The data is used to create what Meta calls a multilingual language model (MLM), which can translate “between any pair… of languages without relying on English data,” according to its website.
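To make “without relying on English data” concrete, here is a toy Python sketch, not Meta’s code, contrasting a many-to-many model with an English-pivot system. The FLORES-200-style language codes are an assumption for illustration, and `translation_directions`, `all_directions` and `route` are hypothetical helpers. The arithmetic also shows the scale involved: covering 200 languages directly means 200 × 199 = 39,800 ordered source-to-target directions.

```python
from itertools import permutations

# Illustrative subset of languages; FLORES-200-style codes are an assumption here.
LANGS = ["eng_Latn", "gla_Latn", "gle_Latn", "cym_Latn", "glg_Latn"]


def translation_directions(n_languages: int) -> int:
    """Ordered source->target pairs a many-to-many model can translate directly."""
    return n_languages * (n_languages - 1)


def all_directions(langs: list[str]) -> list[tuple[str, str]]:
    """Enumerate every ordered (source, target) pair of distinct languages."""
    return list(permutations(langs, 2))


def route(src: str, tgt: str, direct: bool, pivot: str = "eng_Latn") -> list[str]:
    """Languages a sentence passes through on its way from src to tgt.

    A many-to-many model (direct=True) translates in one hop. An
    English-centric system (direct=False) needs two hops for any pair
    that does not already involve the pivot language.
    """
    if direct or pivot in (src, tgt):
        return [src, tgt]
    return [src, pivot, tgt]


if __name__ == "__main__":
    print(translation_directions(200))                  # 39800
    print(len(all_directions(LANGS)))                   # 20
    print(route("gla_Latn", "glg_Latn", direct=True))   # ['gla_Latn', 'glg_Latn']
    print(route("gla_Latn", "glg_Latn", direct=False))  # ['gla_Latn', 'eng_Latn', 'glg_Latn']
```

In the pivot case, a Scottish Gaelic sentence bound for Galician would first pass through English, where errors from the first hop can carry into the second; the direct route avoids that intermediate step.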

The NLLB team evaluates translation quality against an open-source benchmark of human-translated sentences that it created. This work also includes a list of “toxicity” words or phrases that humans can teach the software to filter out when translating text.
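The evaluation step described above can be sketched roughly as follows. This is a simplified illustration, not the NLLB team’s pipeline: it scores a machine translation against a human reference with a simplified chrF-style character n-gram F-score (chrF is a common machine-translation metric; the exact metrics Meta uses are not stated in this article) and flags any entries from a toxicity word list. The function names, the `max_n=6` and `beta=2.0` defaults, and the placeholder English sentences are assumptions chosen for illustration.

```python
import re
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring whitespace."""
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf_like(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF-style score in [0, 1]: average character n-gram
    precision and recall over orders 1..max_n, combined into an F-beta score."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one string is too short for this n-gram order
        overlap = sum((hyp & ref).values())  # & keeps the minimum count per n-gram
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0 and r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)


def _normalize(text: str) -> str:
    """Lowercase and keep only word tokens (Unicode-aware, so accented letters survive)."""
    return " ".join(re.findall(r"\w+", text.lower()))


def find_toxic_terms(translation: str, toxicity_list: set[str]) -> list[str]:
    """Return list entries (words or phrases) that appear as whole words in the translation."""
    padded = f" {_normalize(translation)} "
    return sorted(
        term for term in toxicity_list
        if _normalize(term) and f" {_normalize(term)} " in padded
    )


if __name__ == "__main__":
    reference = "the cat sat on the mat"  # stand-in for a human-translated benchmark sentence
    print(chrf_like("the cat sat on the mat", reference))    # 1.0
    print(find_toxic_terms("a bad phrase.", {"bad phrase"}))  # ['bad phrase']
```

Character-level scoring tends to suit inflected languages such as Scottish Gaelic, because a partly correct word form still earns some credit instead of counting as a complete miss.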

According to its latest paper, the NLLB team improved translation accuracy by 44 per cent over its first model, which was released in 2020. When the technology is fully implemented, Meta estimates there will be more than 25 billion daily translations on Facebook News Feed, Instagram, and other platforms.

NLLB Project Insights

William Lamb, professor of Gaelic ethnology and linguistics at the University of Edinburgh, is an expert in Scottish Gaelic, one of the low-resource languages identified by Meta in its NLLB project.

About 2.5 per cent of Scotland’s population, roughly 130,000 people, told the 2022 census that they have some skills in the 13th-century Celtic language. There are also approximately 2,000 Gaelic speakers in eastern Canada, where it is a minority language. UNESCO classifies the language as “threatened” by extinction because of how few people speak it regularly.

AI translation can play a significant role in preserving and revitalizing endangered languages across Europe. [Photo/Medium]

“What they should do … if they want to improve the translation is to talk to the people, the native Gaelic speakers that still live and breathe the language,” Lamb said. Lamb noted that Meta’s translations in Scottish Gaelic are “not very good yet” because of the crowdsourced data it is using, despite its “heart being in the right place.”

Most native speakers, however, are in their 70s and do not use computers, while the young speakers “use Gaelic habitually, unlike their grandparents.” A good alternative would be for Meta to strike a licensing agreement with the BBC, which works to preserve the language by creating high-quality online content.

Native Speaker Consultation

Alberto Bugarín-Diz, professor of AI at the University of Santiago de Compostela in Spain, believes linguists like Lamb should work with Big Tech companies to refine the data sets available to them.

“This needs to be done by specialists who can revise the texts, correct them, and update them with metadata that we could use,” Bugarín-Diz said. “People from humanities and a technical background like engineers need to work together; it’s a real need,” he added.

Bugarín-Diz added that Meta has an advantage in using Wikipedia because its data reflects “almost every aspect of human life,” meaning the translations could be much better than if Meta relied only on more formal texts.

However, Bugarín-Diz suggests that Meta and other AI companies take the time to look for quality data online and then meet the legal requirements needed to use it without breaking intellectual property laws.

Lamb, meanwhile, said he would not recommend the tool because of errors in its data, at least until Meta makes changes to its dataset.

“I wouldn’t say their translation abilities are at the point where the tools are useful,” Lamb said. “I wouldn’t encourage anybody [to use them] as reliable language tools yet; I think they would be upfront in saying that too.”

Bugarín-Diz takes a different stance. He believes that if no one uses Meta’s translations, the company “will not be willing” to invest time and resources in improving them. As with other AI tools, Bugarín-Diz believes it is a matter of knowing the technology’s weaknesses before using it.

Through its No Language Left Behind (NLLB) project, Meta aims to use AI translation to make lesser-spoken languages more accessible. However, experts agree that collaboration with native speakers and language specialists is crucial to refining and perfecting these AI models.

Even with impressive advances in natural language processing, continual improvement and community involvement remain key to preserving and promoting endangered languages.

Kevin Odero
Kevin is a web3 and crypto enthusiast who writes about various developments and advancements of web3 as a whole, and how it affects Africa. When not writing he likes following technological advancements and reading as a hobby.