Hi-tech tool prompts hope of virtual assistants fluent in Gaelic

Voice-activated digital assistants that speak Scottish Gaelic could be one step closer thanks to a hi-tech advance by University experts.

A team of linguists and artificial intelligence specialists, helped by funding from the Data-Driven Innovation initiative, has developed software that can listen to spoken Gaelic and print it out as written text.

Now researchers hope to upgrade the technology so it not only prints what it hears, but responds verbally too – just like voice assistants Siri, Alexa or Google.

The speech recognition system can already provide subtitles for online video content. It can also help those who are learning the language and support Gaelic-medium schoolchildren with dyslexia, the researchers say.

The team – led by the University of Edinburgh – collected millions of spoken and written Gaelic words and trained a computer system to recognise how they were related.

Researchers did so using a neural network – a form of AI loosely modelled on the way the human brain processes information.

The software was developed in tandem with University of Edinburgh spin-out company Quorate Technology Ltd.

Also involved were the University of the Highlands and Islands and the Tobar an Dualchais/Kist o Riches project – a unique online record of Scotland’s rich oral heritage.

The team has begun working with Tobar an Dualchais to transcribe interviews with Gaelic speakers that include precious elements of oral history and traditional storytelling.

Project leader Dr William Lamb, of the University of Edinburgh’s School of Literatures, Languages and Cultures, said: “Ensuring that Gaelic has a place in the modern technological landscape is key for its survival.

“By enlisting the support and expertise of the Gaelic community, and giving back to them in this way, we hope to demonstrate that any minority language can thrive in the digital age.”

Researchers say the challenges of setting up automated speech recognition for Gaelic are immense compared with a similar system for English.

Anyone creating an English ASR system has a wealth of publicly available data to draw from, as well as technologies tailor-made to assist with the process.

State-of-the-art English systems today are trained on one million hours of audio, which is more speech than any individual will hear in a lifetime.

Good quality, transcribed speech data is generally not so easy to come by in minority languages. The Gaelic initiative is more or less starting from scratch with just 65 hours of audio, but a deal with Gaelic broadcasting organisation MG Alba will boost this significantly.

There are other handicaps. Despite the relatively small area in which Scottish Gaelic is spoken today, there is dense dialectal diversity and no accepted standard.

Despite the obstacles, the team is hopeful that the emerging technology can help widen access to Gaelic for anyone who wants to learn it.

The project is backed by the Data-Driven Innovation (DDI) initiative, which is led by the University of Edinburgh and Heriot-Watt University and is a key part of the Edinburgh and South East Scotland City Region Deal.

DDI supports the Region in its bid to become the data capital of Europe, and its Building Back Better fund awards grants of up to £500,000 to research that supports economic and social recovery from Covid-19.

The speech technology project also received generous support from Soillse, the National Research Network for the Maintenance and Revitalisation of Gaelic Language and Culture.
