Allion Labs | Greg Tsai
Test Environment and Measurement Technologies are the Pillars of Voice Recognition Testing
Voice recognition is gradually being incorporated into our daily lives in the form of smart speakers, phone voice assistants, and other everyday devices. For voice recognition to be helpful in real life, however, it must not only recognize speech in the presence of background noise but also interpret that speech correctly.
To assess the effectiveness of voice recognition, the acoustic conditions of the test environment should match users' everyday environments as closely as possible. For example, recognition of speech from different directions and distances must be considered. Thus, it is necessary to create a dedicated acoustic test environment for assessing voice recognition. The setup and tools should also be examined to ensure test reliability.
In our experience, the best indicator of a stable test environment setup is whether each measurement of delay time remains consistent from run to run. This is why it is necessary to measure the delay time of all three speakers (the smart speaker and the two additional speakers described below). Doing so lets us calculate the latency bias in each run.
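The article does not define "latency bias" precisely. One plausible reading, sketched below in Python, is the deviation of each run's delay from the mean delay of the same device. The function name and the sample readings are hypothetical and purely illustrative; they are not measured data.

```python
from statistics import mean

def latency_bias(delays_ms):
    """Return each run's deviation (ms) from the mean delay of all runs."""
    avg = mean(delays_ms)
    return [d - avg for d in delays_ms]

# Hypothetical delay readings (ms) for one device over four runs.
runs_ms = [7.30, 7.40, 7.35, 7.35]
bias_ms = latency_bias(runs_ms)
print([round(b, 3) for b in bias_ms])
```

Under this reading, a stable setup would show per-run values close to zero for every device.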
In the test environment of a smart speaker, two additional speakers are needed: one to simulate a speaking user and the other to simulate background noise. The two speakers and smart speaker will be placed according to different test scenarios, with a free-field microphone equidistant from the two speakers and smart speaker. Latency testing can begin once the set-up is complete.
Figure 1: Latency Test Set-up
Figure 2: Actual Latency Test Set-up
What Is Difficult about Smart Speaker Latency Testing?
Many people assume speaker latency testing is a simple task: all that needs to be done is to measure the time between the activation of the signal at the speaker and its reception at the microphone. This method works for analog speakers, but it does not carry over directly to smart speakers.
For the latency testing of smart speakers, test recordings must be streamed over the internet instead of being fed directly through analog input jacks. Precisely controlling the start time of test recordings is challenging, since it is impossible to manually pinpoint the exact beginning of a test signal. Thus, Allion’s engineering team created a test method that allows automatic and accurate control of the test recording audio signals, and developed digital signal processing technologies for determining the exact start times.
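The article does not describe Allion's signal processing in detail. As a generic illustration only, one common way to locate the start of a known test signal inside a recording is cross-correlation. The sketch below uses a synthetic sweep and a synthetic recording; the sample rate, signal shape, and function name are all assumptions made for the example.

```python
import math

SAMPLE_RATE = 48_000  # Hz; assumed capture rate

def find_onset(recording, reference):
    """Return the sample index where `reference` best aligns with `recording`.

    Brute-force cross-correlation: slide the reference over the recording
    and keep the lag with the largest dot product.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(len(recording) - len(reference) + 1):
        score = sum(ref * recording[lag + i] for i, ref in enumerate(reference))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Synthetic reference: a short rising-frequency sweep (hypothetical test signal).
reference = [
    math.sin(2 * math.pi * (500 + 20 * i) * i / SAMPLE_RATE) for i in range(256)
]

# Synthetic recording: silence, then the sweep, then silence again.
true_onset = 353
recording = [0.0] * true_onset + reference + [0.0] * 200

onset = find_onset(recording, reference)
delay_ms = onset / SAMPLE_RATE * 1000
print(f"onset at sample {onset}, delay = {delay_ms:.2f} ms")
```

A real recording would contain noise and room reflections, so a production system would likely need a more robust detector than this brute-force search.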
Latency Testing Demonstration and Results
We used the Audio Precision APx500 for the test setup. First, we placed the speakers and the microphone 2.5 m apart and measured a delay of 7.35 ms at 25°C. Using the formula C = 331 + 0.6T (C in m/s, T in °C), the speed of sound is 346 m/s.
The distance can be calculated as shown below:
Distance = Speed × Time = 346 m/s × 0.00735 s ≈ 2.54 m.
The calculated distance agrees with the actual 2.5 m spacing to within about 4 cm, which demonstrates that this measuring method is effective in determining audio latency.
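The validation arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch using the linear approximation C = 331 + 0.6T quoted in the article; the function names are illustrative.

```python
def speed_of_sound(temp_c: float) -> float:
    """Approximate speed of sound in air (m/s) from C = 331 + 0.6 * T, T in deg C."""
    return 331.0 + 0.6 * temp_c

def distance_from_delay(delay_s: float, temp_c: float) -> float:
    """Distance (m) travelled by sound in `delay_s` seconds at `temp_c` deg C."""
    return speed_of_sound(temp_c) * delay_s

measured_delay_s = 0.00735   # 7.35 ms reading from the article
actual_distance_m = 2.5      # physical speaker-to-microphone spacing

estimated_m = distance_from_delay(measured_delay_s, 25.0)
error_m = estimated_m - actual_distance_m
print(f"estimated distance: {estimated_m:.2f} m, error: {error_m * 100:.1f} cm")
```

The same check can be run in reverse: dividing the known spacing by the speed of sound gives the delay one would expect from acoustic propagation alone.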
Figure 3: Audio Latency Testing Validation: Speakers and Microphone are placed 2.5 meters apart
For comparison, we conducted both manual and automatic measurements. The data collected from manual measurements are as follows:
Table 1: Deviation of Manual Measurement
The line chart comparison is as follows:
Figure 4: Deviation of Manual Measurement Line Chart
Across several test trials, the measured delays of Speaker A, Speaker B, and the smart speaker deviate noticeably. These differences can result from timing errors introduced by manual operation or manual alignment. With manual methods, it is difficult to isolate the variables in the smart speaker test system, because the measurements are dominated by manual error.
Next, we used Allion’s exclusive automatic test system with the setup in Figure 2. The results are shown below:
Table 2: Deviation of Automatic Measurement
Figure 5: Deviation of Automatic Measurement Line Chart
From the results, one can see that the delay times of Speaker A (the speaking user) and Speaker B (the background noise simulator) remain stable even after 20 trials, varying within ranges of approximately 0.0024 seconds and 0.001 seconds, respectively.
The smart speaker exhibits jitter: its delay time varies from measurement to measurement within a range of about 0.15 seconds. This is mainly caused by inconsistencies in the internet connection.
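The ranges quoted above are peak-to-peak spreads, that is, the largest delay minus the smallest delay across trials. A minimal Python sketch of that calculation follows. The readings are hypothetical values chosen only to give spreads of the same size as those reported; they are not the data from Table 2.

```python
def peak_to_peak(delays_s):
    """Return the spread (max - min) of a list of delay readings, in seconds."""
    return max(delays_s) - min(delays_s)

# Hypothetical delay readings in seconds (NOT the article's measured data).
speaker_a = [0.0100, 0.0112, 0.0124, 0.0108]
smart_speaker = [1.20, 1.35, 1.28, 1.22]

spread_a = peak_to_peak(speaker_a)
spread_smart = peak_to_peak(smart_speaker)
print(f"Speaker A spread:     {spread_a:.4f} s")
print(f"Smart speaker spread: {spread_smart:.2f} s")
```

Comparing the spreads side by side makes the point of the paragraph concrete: the network-fed device varies by roughly two orders of magnitude more than the directly driven speakers.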
From these tests, we can see that Allion’s automation of audio latency testing removes the uncertainties of manual testing and reveals the actual delay behavior of the smart speaker, creating a solid basis for voice recognition testing.
We can conclude that Allion’s automated latency testing is capable of accurate measurement that manual testing cannot achieve. Aside from delivering the quality and quantity desired by smart speaker manufacturers, Allion can help vendors eliminate manual errors through automation, save time, and increase efficiency.