A new method for ensuring the quality of automatic voice recordings is in development. It relies on a new algorithm that is resistant to background noise of 10 dB and higher.
The algorithm can operate in real time, making it possible to use the software to collect voice biometrics for a wide variety of purposes, according to researchers from HSE University and Nizhny Novgorod State Linguistic University (LUNN).
Speech recognition technologies have been the focus of much research and development over the past decade, and significant progress has been made. This is evidenced by the rising popularity of voice assistants, such as Siri.
Along those lines, almost 8 billion voice assistants will be in use by 2023, compared to 2.5 billion in 2018, according to a forecast from British analysts at Juniper Research.
These technologies seem to be attractive not only for mobile app creators, but also for companies that verify subscribers by phone, such as call centers and banks. However, numerous obstacles stand in the way of the widespread introduction of voice identification systems. One of them is the poor quality of voice reference templates. Every so often, a recognition algorithm may reject an authentic user because of noise in the voiceprint template.
The problem is that voice biometric data is collected in offices, where there is usually a lot of background noise. Since a mere pencil tap on a desk might prevent the algorithm from identifying a speaker’s voice, it is essential that recordings with ambient noise be identified while the voiceprint is being collected. The new method put forward by Professor Andrey Savchenko (HSE University) and Professor Vladimir Savchenko (LUNN) can bring this error rate down to two percent.
Companies are interested in having preventive tools at their disposal. For instance, this could be a system that automatically detects a bad recording before the client leaves the office. With this in mind, the goal is to develop an effective method capable of processing sound in real time on any device, from a cheap smartphone to a laptop or an office computer, Savchenko said.
The researchers proposed an algorithm that splits the recorded speech into short frames and measures the pitch frequency in each of them. Their software assesses the stability of pronunciation against its average level and displays the measured speech quality over time as a color chart.
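The framing and pitch-measurement step can be illustrated with a minimal Python sketch. The article does not say how the researchers estimate pitch, so the autocorrelation approach below is a common textbook stand-in, not their method. The function names, the 30 ms frame length, the 15 ms hop, and the 70 to 400 Hz search range are all assumptions chosen for illustration, and a pure tone stands in for recorded speech.

```python
import math

def split_frames(samples, frame_len, hop):
    """Cut a sample sequence into overlapping frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def estimate_pitch(frame, sample_rate, fmin=70.0, fmax=400.0):
    """Return the pitch of one frame in Hz, or None if no pitch is found.

    Assumed technique: pick the lag with the strongest autocorrelation
    inside the range of plausible pitch periods.
    """
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]            # remove any constant offset
    if sum(v * v for v in x) == 0.0:         # silent frame
        return None
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(x) - 1)
    best_lag, best_corr = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else None

# Synthetic stand-in for speech: one second of a 200 Hz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

frames = split_frames(tone, frame_len=240, hop=120)   # 30 ms frames, 15 ms hop
pitches = [estimate_pitch(f, SAMPLE_RATE) for f in frames]
```

On real speech, a production system would also need to handle unvoiced frames and octave errors, which this sketch ignores.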
Pitch frequency (PF) is a unique characteristic of human speech. The PF can rise or fall depending on the speaker’s emotional state, the researchers said.
The system treats the initial parts of a recording as a template and assigns them 100 percent quality. If the estimated pitch frequencies of the subsequent speech frames remain more or less stable, the recording is considered to be of good quality. If the values vary widely, the recording is considered faulty. Such faults may be caused by an interfering voice with a different pitch frequency.
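The template-based check can be sketched in a few lines of Python. This is an illustration of the idea only, not the researchers' scoring formula: the relative-deviation score, the five-frame template, and the thresholds `min_quality` and `max_bad_share` are all hypothetical values chosen for the example, as are the two pitch sequences.

```python
def frame_quality(pitches, template_frames=5):
    """Score each frame from 0 to 100 by closeness to the template pitch."""
    template = pitches[:template_frames]
    reference = sum(template) / len(template)   # average pitch of the template
    scores = []
    for i, pitch in enumerate(pitches):
        if i < template_frames:
            scores.append(100.0)                # template frames get full quality
        else:
            deviation = abs(pitch - reference) / reference
            scores.append(max(0.0, 100.0 * (1.0 - deviation)))
    return scores

def is_faulty(scores, min_quality=80.0, max_bad_share=0.1):
    """Flag a recording when too many frames fall below min_quality."""
    bad = sum(1 for s in scores if s < min_quality)
    return bad / len(scores) > max_bad_share

# Hypothetical per-frame pitch estimates in Hz.
stable = [120.0, 121.0, 119.0, 120.0, 120.0, 122.0, 118.0, 121.0, 120.0, 119.0]
# Same speaker, with an interfering voice near 210 Hz in three frames.
mixed = [120.0, 121.0, 119.0, 120.0, 120.0, 210.0, 215.0, 121.0, 205.0, 119.0]

stable_is_faulty = is_faulty(frame_quality(stable))
mixed_is_faulty = is_faulty(frame_quality(mixed))
```

With these assumed thresholds, the stable sequence passes and the sequence with the interfering voice is flagged, because three of its ten frames deviate strongly from the template pitch.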
A major Russian bank is interested in this development and has already provided 30 recordings from its database for initial testing. The software's findings appear to have matched the assessments of the people who check the quality of the recordings in 93.3 percent of cases.