Shaping the future of speech recognition
As we live through the most connected time in history, it’s clear that data plays a huge role in many businesses. It’s what keeps businesses running smoothly, so naturally it needs to be of the highest quality. But what are the most reliable sources? Customer feedback sits at the top of most businesses’ lists: customers certainly won’t hesitate to tell you what they like or dislike about your product or service. But how can we use all this information? Can businesses harvest the full value of their conversations with call centres and online chatbots?
Cambridge-based, any-context speech recognition company Speechmatics’ story began back in the 1980s, when founder Dr Tony Robinson pioneered the application of neural networks to speech recognition, demonstrating that they could greatly outperform traditional systems. Today’s computing power, along with the rise of graphics processing and cloud computing, now makes the huge potential of this approach a reality, and the company believes it is nearing the point where computers truly understand us. Here, Startups Magazine talks to John Milliken, CEO of Speechmatics, who is playing an essential role in accelerating the company’s global growth.
Milliken comes from a tech background, particularly fintech. He has a track record in accelerating the growth of exciting, entrepreneurial businesses in both private and public spheres, including being a member of the founding team of a company that grew to a market capitalisation of over £1bn.
Let’s get technical
Automatic Speech Recognition (ASR) is the use of Artificial Intelligence (AI), or more specifically Machine Learning, to transcribe what people say. Speechmatics provides ASR technology in the form of an ASR engine for large enterprises so that they can harvest data and understand, at scale, what people are saying to one another.
“Humans are built to communicate through voice, that's how we are designed, so if you look at the interfaces between the human and the business, you’ll see that all of those points can have value added by using this technology, to either optimise the conversation or improve the workflow around it. Therefore, this unlocks a variety of different use cases,” explained Milliken.
Most consumers will have encountered speech recognition technology in some form, either through their bank or mobile phone. It’s often used by customer services, however, there are many other industries using it too. For example, the media industry uses it for captioning, editing, indexing, and locating data; and in the legal and medical industry, there’s transcription work to help support compliance needs. It’s clear to see that there is so much more to this market than the voice assistants and the support conversations you might have on the phone.
Speechmatics’ ASR technology has the flexibility to be deployed anywhere. It can be used on-premises to ensure data remains within your private environment, with your choice of cloud provider, or using Speechmatics’ cloud offering. Users can also transcribe pre-recorded files whenever they want and have the option to schedule a transcription at a time that is best suited to them.
Users also have the option to transcribe in real time to gather their data instantly. Speechmatics’ Low Latency Finals deliver accuracy through automatic word-correction re-scoring, and a configurable latency setting is designed for low-latency use cases such as captioning live broadcasts and media monitoring.
The technology can also identify a change of speaker within a transcript: a token is automatically added each time a speaker change is detected, making it easy to reformat the transcript and improve its readability.
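The speaker-change idea can be illustrated with a minimal sketch. Note this is a hypothetical example, not the Speechmatics API: the function name, the input format of speaker-labelled words, and the `[SPEAKER_CHANGE]` token string are all invented for illustration.

```python
# Hypothetical illustration of speaker-change tokens in a transcript.
# Not the Speechmatics API: all names and formats here are invented.

def add_speaker_change_tokens(words, token="[SPEAKER_CHANGE]"):
    """Insert a marker token each time the speaker ID changes.

    `words` is a sequence of (speaker_id, word) pairs, as might come
    out of a diarization stage.
    """
    out = []
    previous = None
    for speaker, word in words:
        if previous is not None and speaker != previous:
            out.append(token)
        out.append(word)
        previous = speaker
    return " ".join(out)

labelled = [("S1", "Hello,"), ("S1", "how"), ("S1", "are"), ("S1", "you?"),
            ("S2", "Fine,"), ("S2", "thanks.")]
print(add_speaker_change_tokens(labelled))
# → Hello, how are you? [SPEAKER_CHANGE] Fine, thanks.
```

A downstream editor or display layer can then split the transcript on the marker token to render each speaker’s turn on its own line.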
Last year, the company launched its Advanced Punctuation feature to deliver accurate placement of full stops, commas, question marks and exclamation marks, as well as enhanced capitalisation in the transcription.
A new approach to global English
Traditionally, to get the most accurate results from speech recognition technology, specialising was key. When confronted with accents, dialects and other regional variations in speech, specialist language packs were developed to ensure reliable results. But times have since changed, and speech recognition is constantly improving. Speechmatics’ technology has mastered 31 languages in a multitude of dialects and accents, regardless of context.
Since their launch, Virtual Personal Assistants (VPAs) such as Siri and Alexa have faced issues with certain accents for English language recognition. This has led to many users being forced to modify their speech patterns to make themselves understood, adapting their voices to the technology.
However, Speechmatics’ technology adapts to its users. By harnessing recent advances in algorithms and neural network architectures, Speechmatics can now deliver one English language pack supporting all major accents and dialects. Removing the need for multiple language packs for English dialects means customers benefit from simplified deployments and a reduced overall footprint. In turn, this reduces overhead costs for customers regardless of application or use case.
Teamwork makes the dreamwork
Speechmatics has a strong team of researchers, including 14 PhDs, working at the centre of AI, neural networks, machine learning and language models. Teams are encouraged to innovate and iterate, applying their knowledge and expertise quickly to the ASR technology. Continual development means the company’s language packs are constantly evolving to improve accuracy and performance.
“With a culture that does not separate research from development, Speechmatics is constantly looking for opportunities to prototype new capabilities, evolve its language offering and rapidly develop new algorithms to remain at the forefront of ASR development,” explained Milliken.
In the last six months, the company has opened engineering centres in Chennai, India and in the Czech Republic. Milliken explained why they chose these locations: “We know that these locations have been recognised as centres of speech technology and Machine Learning. In particular, the Czech Republic is known as the EU hub of AI and machine learning, and we want to tap into the talent pool to make sure we’ve got the broadest range of talent available to us as we grow.”
Future growth goals
In the past six months, Speechmatics has also opened an office in Denver, US. “We’ve had a clear route into the market as we already generate over 45% of our revenues in the US. We’re looking to establish a much stronger presence there as we move forward. Most of the organisations we deal with are large software companies, and it is a well-known fact that a number of those are based in the US.”
From a product to market perspective, the company has some very exciting things planned for the future. Not only about how they can make changes to the quality of their speech recognition, but also thinking about how they can link it to video recognition to improve their understanding at scale.
“If we can link what people say to how they’re saying it, and if we can include a view on facial emotion to supplement intonation, then those sorts of things will add a whole new layer of meaning around what we can provide downstream.”
This scale-up tech company is a testament to the innovation coming out of the UK and how it is leading the way in an evolving market.