Breaking the sound barrier
Academics have used funding from the University of Edinburgh’s Data-Driven Innovation initiative to develop ground-breaking Gaelic speech recognition software. Article by Lucy Saddler
A Gaelic speech recognition system is something that University of Edinburgh academic Dr William Lamb has hoped to develop since he completed his PhD in Linguistics.
Fast-forward a number of years, and Dr Lamb has been able to fulfil this goal through data-driven innovation and interdisciplinary collaboration.
On a programming course that is part of the university’s MSc in Speech and Language Technology, Dr Lamb met Dr Mark Sinclair, a speech scientist who works for the university spinout company, Quorate Technology.
Dr Lamb asked him about developing a speech recognition system for Gaelic:
“At the time, Mark said it’s not so simple; we would need a lot of data. I went back to him a few years later, once we had worked together and he said maybe we can attempt it now.”
They were joined by others including Dr Beatrice Alex, a DDI Chancellor’s Fellow at the Edinburgh Futures Institute and an expert in natural language processing.
When the pandemic hit in March 2020, museums and archives were forced to shut. The only public exhibitions were online but with many Gaelic archives still not digitised and transcribed, they were rendered inaccessible.
With this in mind, the group applied for funding through the Data-Driven Innovation initiative’s ‘Building Back Better’ open call, which aimed to promote social, cultural and economic recovery from the pandemic.
“When we saw the call, we thought our aims would fit well with different aspects of that ethos,” says Dr Lamb.
The DDI-funded project is delivered by the Edinburgh Futures Institute (EFI), which is one of five innovation hubs delivering data-driven innovation as part of the Edinburgh South East Scotland City Region deal.
As Dr Alex explains, “the whole point of the EFI is to encourage interdisciplinary collaboration.”
The project was a great example of where such joint working could be valuable. As Dr Lamb observed: “There’s no one fluent in Gaelic who is also a natural language processing expert.”
The key was to bring together a team with a diverse skillset, from experts on Gaelic morphology to speech processing specialists. No one person has all of the skills needed to work in this area.
Building the system
Expectations were initially low. Before they could even begin to develop a speech recognition system, the team needed data in the right form – and that data had to be built from scratch.
“Data is everything. Data is the currency here. The reason why it hasn’t been done before is that the data just didn’t exist in the correct format,” explains Mark Sinclair.
“The really innovative thing that we did was repurposing data that was never intended for this purpose to be useful for speech recognition training,” he adds.
The team needed audio data segmented into smaller utterances and an exact verbatim transcript of that audio.
Using Gaelic archives from the School of Scottish Studies and Tobar an Dualchais (an online catalogue of Gaelic oral recordings), the team converted pre-existing Gaelic language data.
For example, some of the initial data came from handwritten manuscripts. The team first had to digitise the manuscript, recognise the handwriting and correct any errors.
The next stage was the use of an aligner, which matches up the audio data with its verbatim transcript.
However, the aligner itself required a speech recognition system; with none existing for Gaelic, the team were once again prompted to think outside the box.
“The big and innovative thing that we did was to use an English system to align Gaelic.”
Mark explained that the team managed to create a pseudo-Gaelic English language, by mapping the Gaelic sounds onto English sounds, which allowed the aligner to work.
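To make the idea concrete, here is a minimal sketch of that kind of phone mapping. The particular correspondences below are hypothetical illustrations (the article does not give the team’s actual table): each Gaelic sound is rewritten as its nearest English phone label so an English-trained aligner can process the audio.

```python
# Illustrative sketch: map Gaelic phones to nearby English (ARPAbet-style)
# phone labels so an English acoustic model can align Gaelic audio.
# These particular mappings are hypothetical examples, not the team's table.
GAELIC_TO_ENGLISH = {
    "x": "K",    # broad 'ch' as in 'loch' -> nearest English consonant
    "ɣ": "G",    # voiced velar fricative -> English /g/
    "ʎ": "L Y",  # palatal 'll' -> /l/ + /j/ cluster
    "ə": "AH",   # schwa maps directly
}

def to_pseudo_english(gaelic_phones):
    """Rewrite a Gaelic phone sequence using English phone labels."""
    out = []
    for phone in gaelic_phones:
        # Unmapped phones pass through unchanged; mapped ones may expand
        # into more than one English phone.
        out.extend(GAELIC_TO_ENGLISH.get(phone, phone).split())
    return out

print(to_pseudo_english(["x", "ə", "ʎ"]))  # ['K', 'AH', 'L', 'Y']
```

With a pronunciation lexicon rewritten this way, the English aligner never needs to know it is hearing Gaelic: it simply matches the audio against familiar phone labels.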
Once they had the first alignment of the initial Gaelic data, only then could they train a Gaelic speech recognition system.
Having repurposed the data and used the aligner to break it down into utterances, the next step was using that data to train the acoustic model.
Mark explains: “The acoustic model learns the association between the words and the audio examples of the words – the same way a baby learns to speak.”
At the same time, the team also trained the other component of the speech recogniser – the language model, which learns the patterns of a language, enabling it to predict the likely next word of a sequence.
For example, if you hear someone say ‘the United States of…’ the most likely next word is going to be America. The language model learns to predict just like that.
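A toy version of that idea can be written in a few lines. The sketch below is a minimal bigram model over a made-up scrap of text; a real recogniser would use far larger n-grams or a neural model trained on a large corpus.

```python
from collections import Counter, defaultdict

# Minimal bigram language model: count which word follows which,
# then predict the most frequent follower. Toy corpus for illustration.
corpus = ("the united states of america and the united states of america "
          "and the united kingdom").split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def most_likely_next(word):
    """Return the word that most often follows `word` in the corpus."""
    return bigrams[word].most_common(1)[0][0]

print(most_likely_next("united"))  # 'states'
```

Given ‘united’, the model predicts ‘states’ simply because that continuation occurs most often in the data it has seen, which is exactly the kind of statistical pattern the project’s language model learns from Gaelic text.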
The speech recogniser then uses a combination of both models: the acoustic model looks at sequences of individual sounds to work out the most likely next sound, while the language model looks at the resulting word sequences to work out the most likely next word.
Through these two models, the speech recogniser predicts what was spoken and transcribes it.
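The combination can be pictured as a simple scoring problem. In the hypothetical sketch below, each candidate transcription carries an acoustic score (how well it matches the audio) and a language-model score (how plausible the word sequence is), both as made-up log-probabilities; the recogniser keeps the candidate with the best combined score.

```python
# Toy illustration of combining acoustic and language-model scores.
# All numbers are invented log-probabilities, purely for demonstration.
candidates = {
    "tha i fuar": {"acoustic": -4.0, "lm": -2.0},  # plausible word sequence
    "tha a fuar": {"acoustic": -3.8, "lm": -6.0},  # sounds similar, poor grammar
}

def best_hypothesis(cands, lm_weight=1.0):
    """Pick the transcription maximising acoustic + weighted LM score."""
    return max(cands,
               key=lambda s: cands[s]["acoustic"] + lm_weight * cands[s]["lm"])

print(best_hypothesis(candidates))  # 'tha i fuar'
```

Here the second candidate matches the audio slightly better, but the language model vetoes it as an unlikely word sequence, so the combined score favours the first: the two models correct each other’s weaknesses.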
One step closer to a Gaelic-speaking voice assistant
The Gaelic speech recogniser currently has about 75% accuracy, but the more data that can be used for training the language and acoustic models, the more accurate the system will become.
In comparison to the 100 hours of training data that the team have used so far, an English speech recogniser that approaches human parity is typically trained on about 10,000 hours of data.
“But we don’t need human parity to make this machine useful. Once we cross the 85% mark, the system becomes suitable for automatic transcription,” explains Mark.
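Accuracy figures like these are typically derived from word error rate: the number of word-level edits (substitutions, insertions, deletions) needed to turn the system’s transcript into a human reference, divided by the reference length. The sketch below computes it with standard edit distance; the example sentences are made up for illustration.

```python
# Word error rate via word-level edit distance (Levenshtein).
def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j]: edits to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[-1][-1] / len(ref)

ref = "tha an latha brèagha an-diugh"   # hypothetical reference transcript
hyp = "tha latha brèagha an-diugh"      # system output missing one word
print(f"{word_error_rate(ref, hyp):.0%}")  # one deletion over five words -> 20%
```

On this scale, the current system’s roughly 75% accuracy corresponds to about one word in four needing correction, and the 85% threshold Mark mentions to roughly one in seven.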
The team are focused on launching a website that will allow users to record themselves speaking Gaelic or upload an audio file and then receive a transcript – all on their own mobile or laptop, making the system more efficient and less labour intensive.
The response to the team’s success has been universally positive within the Scottish Gaelic community.
“Everyone wants this now and they want it to be perfect. Everyone who works with Gaelic will have a reason to use this and for us, that’s incredibly exciting,” says Dr Lamb.
Although the speech recogniser alone won’t be enough to totally revitalise Gaelic, an endangered language, Dr Lamb believes that the opportunities the recogniser creates will be vital in encouraging more people to speak the language consistently.
The team are keen to adapt the technology towards conversational agents such as Alexa or Siri. Dr Lamb believes that this could make speaking the language consistently more enticing, especially for young people.
“We could bring in a device to schools that kids can play with. They can teach it to say a few things or teach it to respond to commands. That suddenly makes the language exciting and useful for kids, it improves the chances they’ll keep speaking it. And who knows – perhaps we’ll inspire a good batch of Gaelic-speaking NLP researchers.”
The project is funded by the Data-Driven Innovation initiative (DDI), delivered by the University of Edinburgh and Heriot-Watt University for the Edinburgh and South East Scotland City Region Deal. DDI is an innovation network helping organisations tackle challenges for industry and society by doing data right to support Edinburgh in its ambition to become the data capital of Europe. The project was delivered by the Edinburgh Futures Institute (EFI), one of five DDI innovation hubs which collaborates with industry, government and communities to build a challenge-led and data-rich portfolio of activity that has an enduring impact.