Although speech recognition technology has made great strides in recent years, it still has limitations that researchers
are working to overcome. A major one is environmental noise, which restricts this technology to close-talk or
headset-based applications.
Complementing speech recognition with microphone array technology can effectively remove this limitation. Microphone
array technology is based on the principle that, with multiple microphone elements, the speech input processing
software can mimic human hearing: it detects the direction of each sound source, classifies the sources, and outputs
only the sound of interest while suppressing all other unwanted sound. The end result of microphone array processing is
a clean speech signal with environmental and interference noise suppressed, which is well suited to speech recognition
and communication.
Sound waves emitted from a single source arrive at the elements of an array, spaced at fixed intervals, with different
time delays. As the sound source moves, the arrival delay at each element also changes (Figure 1).
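For a distant (far-field) source, these delays follow from simple geometry. As an illustration, assume element m sits at position x_m along a straight line, the sound arrives at angle θ measured from broadside (perpendicular to the array), and c is the speed of sound (about 343 m/s in room-temperature air). The symbols x_m and θ are introduced here for illustration only; the delay at element m relative to the element at x = 0 is then:

$$\tau_m = \frac{x_m \sin\theta}{c}$$

For example, with elements 10 cm apart and a source at 30°, adjacent elements see a delay of about 0.1 × 0.5 / 343 ≈ 0.146 ms, or roughly 2.3 samples at a 16 kHz sampling rate.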
By summing the signals received at each array element after correcting for these time delays, the signal quality (SNR)
of the sound source is improved by a factor proportional to the number of receiving elements, provided the noise is
uncorrelated between elements. This process is called “Beam Forming”.
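The delay-and-sum idea can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not BITwave's implementation: it assumes the delays are already known and rounded to whole samples, and it uses `np.roll` (a circular shift) for brevity, which is exact only for the periodic test tone used here; a practical beamformer would use fractional-delay filters. The function names and the demo's delays and noise level are hypothetical.

```python
import numpy as np

def delay_and_sum(signals: np.ndarray, delays_samples: np.ndarray) -> np.ndarray:
    """Align each microphone channel by its arrival delay and average them.

    signals: array of shape (num_mics, num_samples)
    delays_samples: arrival delay of the source at each mic, in whole samples
    """
    # Undo each channel's delay (np.roll is a circular shift, kept for brevity).
    aligned = [np.roll(channel, -int(d)) for channel, d in zip(signals, delays_samples)]
    # The source adds up in phase; independent noise partly averages out.
    return np.mean(aligned, axis=0)

def snr_db(estimate: np.ndarray, clean: np.ndarray) -> float:
    """SNR of an estimate relative to the known clean signal, in dB."""
    noise = estimate - clean
    return float(10 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2)))

# Demo: a 500 Hz tone reaching 8 mics with different delays, plus independent noise.
rng = np.random.default_rng(0)
fs, n, num_mics = 16000, 16000, 8
clean = np.sin(2 * np.pi * 500 * np.arange(n) / fs)
delays = np.arange(num_mics) * 2  # hypothetical delays, in samples
mics = np.stack([np.roll(clean, d) + rng.normal(0.0, 1.0, n) for d in delays])

single_mic = snr_db(mics[0], clean)  # mic 0 has zero delay
beamformed = snr_db(delay_and_sum(mics, delays), clean)
print(f"SNR gain from beam forming: {beamformed - single_mic:.1f} dB")
```

With noise that is independent across the 8 microphones, the printed gain comes out close to 10·log10(8) ≈ 9 dB, the factor-of-N improvement described above.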
Moreover, with DSP-based processing, a technique called “Adaptive Beam Forming” can be used to maximize this SNR
improvement. A related technique called “Adaptive Interference-Cancellation” can be used to minimize or cancel
interfering signals.
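The document does not describe BITwave's adaptive algorithms, so the sketch below swaps in the classic least-mean-squares (LMS) adaptive noise canceller as a generic illustration of adaptive interference cancellation. It assumes a reference channel dominated by the interferer is available (in an array, such a channel might come from a beam or blocking path aimed at the noise); the function name, step size `mu`, tap count, and the acoustic path in the demo are all hypothetical.

```python
import numpy as np

def lms_cancel(primary, reference, num_taps=16, mu=0.01):
    """Adaptive noise canceller using the LMS algorithm.

    primary:   wanted speech plus interference that reached the main channel
    reference: a channel dominated by the interferer alone (assumed available)
    Returns the error signal, which converges toward the wanted speech.
    """
    weights = np.zeros(num_taps)
    taps = np.zeros(num_taps)  # most recent reference samples, newest first
    output = np.empty(len(primary))
    for n in range(len(primary)):
        taps = np.roll(taps, 1)
        taps[0] = reference[n]
        interference_estimate = weights @ taps
        error = primary[n] - interference_estimate
        weights += 2 * mu * error * taps  # stochastic gradient step on error power
        output[n] = error
    return output

# Demo: a slow tone stands in for speech; white noise leaks into the primary
# channel through a short acoustic path with hypothetical coefficients.
rng = np.random.default_rng(0)
n = 20000
speech = 0.5 * np.sin(2 * np.pi * 0.01 * np.arange(n))
noise = rng.normal(0.0, 1.0, n)
path = np.array([0.6, -0.3, 0.1])
primary = speech + np.convolve(noise, path)[:n]
cleaned = lms_cancel(primary, noise)

half = n // 2  # skip the convergence period
before = np.mean((primary[half:] - speech[half:]) ** 2)
after = np.mean((cleaned[half:] - speech[half:]) ** 2)
print(f"interference reduced by {10 * np.log10(before / after):.1f} dB")
```

The step size trades convergence speed against residual error; for stability it must stay well below 1 / (num_taps × reference power).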
The time-delay information can also be used to determine where a sound source is located. The array can then enhance
sound sources that come from predefined locations and cancel those that come from elsewhere.
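A minimal sketch of this localization step for one microphone pair, assuming a far-field source: estimate the time delay from the peak of the cross-correlation, convert it to an angle with sin θ = c·τ / d, and compare the angle against an accepted zone. The function names, the 10 cm spacing, the 16 kHz rate, and the ±20° zone are hypothetical choices for illustration; BITwave's actual localization method is not described in this document. Searching only whole-sample lags limits angular resolution, which real systems improve with interpolation or more microphones.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C

def estimate_tdoa(x0, x1, fs, max_lag):
    """Delay of x1 relative to x0, in seconds, from the cross-correlation peak.

    Only whole-sample lags in [-max_lag, max_lag] are searched; np.roll makes
    the correlation circular, which keeps the sketch short.
    """
    lags = np.arange(-max_lag, max_lag + 1)
    scores = [np.dot(x0, np.roll(x1, -lag)) for lag in lags]
    return lags[int(np.argmax(scores))] / fs

def direction_deg(tdoa, spacing):
    """Far-field arrival angle from broadside, in degrees, for a two-mic pair."""
    s = np.clip(SPEED_OF_SOUND * tdoa / spacing, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))

def in_sweet_spot(angle_deg, half_width_deg=20.0):
    """Hypothetical sweet-spot test: accept sources within +/- half_width_deg."""
    return abs(angle_deg) <= half_width_deg

# Demo: broadband noise reaching the second mic 2 samples after the first.
rng = np.random.default_rng(0)
fs, spacing = 16000, 0.10  # hypothetical sampling rate (Hz) and mic spacing (m)
x0 = rng.normal(size=4096)
x1 = np.roll(x0, 2) + 0.1 * rng.normal(size=4096)
max_lag = int(np.ceil(spacing / SPEED_OF_SOUND * fs))  # largest physical delay
angle = direction_deg(estimate_tdoa(x0, x1, fs, max_lag), spacing)
print(f"source at {angle:.1f} degrees, inside sweet spot: {in_sweet_spot(angle)}")
```

Here the 2-sample delay maps to about 25.4°, which falls outside the hypothetical ±20° zone, so a source like this would be treated as a candidate for cancellation rather than enhancement.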
BITwave’s Digital Microphone Array eliminates the need for a headset or close-talk microphone through a multi-stage
approach to noise suppression. It separates wanted sounds from unwanted noise sources in several stages, each using a
different DSP algorithm to suppress a different type of noise. The technology can be used wherever a hands-free
environment is required.
In the first stage, an “Adaptive Beam Forming” algorithm optimizes the SNR of the wanted signal and continuously tracks
the user’s voice within the Sweet Spot Area. An LED lights up when the user is within the accepted zone. Compared with a
conventional beamformer, our adaptive solution:
· enhances the voice of the wanted sound source,
· compensates for manufacturing differences between individual microphones, and
· provides a moderate reduction in room reverberation.
In addition, our logarithmically spaced microphones work in conjunction with our DSP software to improve SNR over a
wider frequency range.
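The document does not give the array geometry, so the sketch below only illustrates why unequal spacing can widen the useful frequency range. It assumes a hypothetical layout in which each gap is a fixed ratio larger than the previous one. Such a layout yields more distinct microphone-pair spacings than a uniform one: short spacings stay free of spatial aliasing (spacing below half a wavelength) up to high frequencies, while long spacings produce larger, more usable phase differences at low frequencies. All names and values here are illustrative.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def log_spaced_positions(num_mics, min_gap, ratio):
    """Hypothetical layout: each gap is `ratio` times the previous one."""
    gaps = min_gap * ratio ** np.arange(num_mics - 1)
    return np.concatenate([[0.0], np.cumsum(gaps)])

def pair_spacings(positions):
    """All distinct microphone-pair spacings (baselines) in the layout."""
    diffs = {round(float(abs(a - b)), 9)
             for i, a in enumerate(positions) for b in positions[i + 1:]}
    return np.array(sorted(diffs))

def alias_free_limit_hz(spacing):
    """Highest frequency a pair handles without spatial aliasing (d < lambda / 2)."""
    return SPEED_OF_SOUND / (2.0 * spacing)

positions = log_spaced_positions(4, 0.02, 2.0)  # 0, 0.02, 0.06, 0.14 m
for d in pair_spacings(positions):
    print(f"spacing {d:.2f} m: alias-free below {alias_free_limit_hz(d):.0f} Hz")
```

With four microphones this layout gives six distinct spacings (0.02 to 0.14 m), whereas four equally spaced microphones give only three.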
The second stage is the “Adaptive Interference-Cancellation” stage. Its purpose is to suppress man-made noise sources
(those that carry directional information) that the array identifies as lying outside the ‘sweet zone’. As shown in
Figure 4, a ‘null’ or ‘dead spot’ is produced adaptively in the direction of each identified noise source to maximize
cancellation.
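The simplest form of a spatial null can be sketched with two microphones, assuming the interferer's inter-microphone delay is already known in whole samples: delay one channel so the interferer lines up in both, then subtract. This fixed null is only an illustration; an adaptive system, as described above, would keep re-estimating the null direction (for example with an LMS-style update). The function name and demo values are hypothetical, and `np.roll` is a circular shift used for brevity.

```python
import numpy as np

def null_steer(x0, x1, interferer_delay):
    """Place a spatial null on an interferer that reaches mic 1
    `interferer_delay` samples after mic 0.

    Delaying mic 0 by the same amount lines the interferer up in both
    channels, so subtracting cancels it; sound from other directions does
    not line up and partly survives.
    """
    return x1 - np.roll(x0, interferer_delay)  # circular shift for brevity

rng = np.random.default_rng(0)
fs, n = 16000, 16000
speech = np.sin(2 * np.pi * 500 * np.arange(n) / fs)  # broadside: same at both mics
interferer = rng.normal(size=n)
x0 = speech + interferer
x1 = speech + np.roll(interferer, 3)  # interferer arrives 3 samples later at mic 1
out = null_steer(x0, x1, 3)
# The interferer is gone; the speech survives as speech minus a delayed copy of itself.
print(np.allclose(out, speech - np.roll(speech, 3)))  # True
```

Note that the wanted signal is comb-filtered by the subtraction (here a 500 Hz tone keeps about a third of its power), which a practical design would need to equalize.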
The number of noise sources that can be suppressed by nulls is limited only by the bandwidth of the noise sources and
the processing power of the DSP.
Moreover, by combining time-domain and frequency-domain cancellation, we achieve a very high degree of cancellation.
At the end of this stage, the array has used directional cues and other information to separate wanted man-made
sounds from unwanted ones. Both of these signals are then passed to a third stage.
In the third stage, the array primarily deals with natural noise sources: those that are diffuse or otherwise lack
directional information. Because they carry no directional cues, these noises cannot be rejected by direction and are
therefore suppressed continuously. This stage also works adaptively: it continuously monitors the noise, identifies
components that can be further removed from the wanted channel, and suppresses them in the frequency domain.
Here we use a mathematically optimized algorithm to suppress natural noise in human speech. Unlike the simple spectral
subtraction used in other systems, it leaves the resulting voice with minimal distortion. As a result, it suppresses
noise more accurately and retains more intelligibility than other frequency-domain suppression algorithms.
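The document does not name its algorithm. As a point of comparison, the sketch below contrasts power spectral subtraction with a standard alternative from the speech-enhancement literature, a Wiener gain driven by a "decision-directed" a priori SNR estimate, which is known to reduce the fluctuating residual ("musical noise") that spectral subtraction leaves behind. It works on per-frame power spectra and assumes the noise power spectrum is already known; the function names, `alpha = 0.98`, and the demo data are illustrative assumptions, not BITwave's method.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero

def spectral_subtraction_gain(noisy_power, noise_power):
    """Power spectral subtraction written as a per-bin amplitude gain."""
    snr_post = np.maximum(noisy_power, EPS) / noise_power
    return np.sqrt(np.maximum(1.0 - 1.0 / snr_post, 0.0))

def dd_wiener_gains(noisy_power, noise_power, alpha=0.98):
    """Wiener gains with a decision-directed a priori SNR estimate.

    noisy_power: (num_frames, num_bins) power spectrum |Y|^2 of each frame
    noise_power: (num_bins,) noise power estimate, assumed known here
    """
    gains = np.empty_like(noisy_power, dtype=float)
    prev_clean_power = None
    for t, frame in enumerate(noisy_power):
        ml_snr = np.maximum(frame / noise_power - 1.0, 0.0)  # instantaneous estimate
        if prev_clean_power is None:
            snr_prior = ml_snr
        else:
            # Blend last frame's cleaned power with the instantaneous estimate.
            snr_prior = alpha * prev_clean_power / noise_power + (1.0 - alpha) * ml_snr
        gains[t] = snr_prior / (1.0 + snr_prior)
        prev_clean_power = gains[t] ** 2 * frame  # estimated clean power |S|^2
    return gains

# Demo: noise-only frames (per-bin power of Gaussian noise is exponential).
rng = np.random.default_rng(0)
noise_power = np.ones(64)
noise_only = rng.exponential(1.0, size=(200, 64))
ss = spectral_subtraction_gain(noise_only, noise_power)
dd = dd_wiener_gains(noise_only, noise_power)
print(f"gain spread on pure noise: subtraction {ss[20:].std():.3f}, "
      f"decision-directed Wiener {dd[20:].std():.3f}")
```

On pure noise, spectral subtraction lets random bins through with large gains, which is heard as musical noise; the smoothed Wiener gains stay small and steady, while bins with strong speech still receive gains close to 1.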
Finally, the third stage recovers some of the wanted signal information and outputs the final processed signal.
To decide when and how to enhance or suppress a sound source, every sound source, including noise, must be analyzed.
There are two basic types of noise:
Man-made noises: These include unwanted voices and specific environmental noise sources, such as fans and radios, that
lie outside the “sweet zone”. These noise sources carry directional information.
Natural noises: These include noise generated by the microphones themselves, circuit noise, and diffuse noise sources.
These noise sources carry no directional information.
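One simple way to tell these two types apart with a microphone pair is to check whether the microphones hear delayed copies of the same signal. The sketch below uses the peak normalized cross-correlation over small lags with a hypothetical threshold of 0.5; it is an illustration, not the document's classifier. It models natural noise as noise that is independent at each microphone (such as self-noise and circuit noise). A genuinely diffuse acoustic field is partly correlated between closely spaced microphones at low frequencies, so practical systems usually make this decision per frequency band.

```python
import numpy as np

def peak_normalized_xcorr(x0, x1, max_lag):
    """Largest |correlation coefficient| between two channels over small lags."""
    x0 = x0 - x0.mean()
    x1 = x1 - x1.mean()
    denom = np.linalg.norm(x0) * np.linalg.norm(x1)
    best = max(abs(np.dot(x0, np.roll(x1, -lag))) for lag in range(-max_lag, max_lag + 1))
    return best / denom

def has_directional_info(x0, x1, max_lag=5, threshold=0.5):
    """Hypothetical rule: a source is directional if both mics hear delayed copies of it."""
    return bool(peak_normalized_xcorr(x0, x1, max_lag) >= threshold)

rng = np.random.default_rng(0)
n = 8000
source = rng.normal(size=n)
# Man-made style noise: one source, heard at both mics with a 2-sample delay.
directional = (source, np.roll(source, 2) + 0.1 * rng.normal(size=n))
# Self-noise style: each mic's circuit noise is independent of the other's.
uncorrelated = (rng.normal(size=n), rng.normal(size=n))
print(has_directional_info(*directional), has_directional_info(*uncorrelated))  # True False
```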
In many situations it is very difficult to be certain whether a captured signal contains wanted information, unwanted
information, or both. All sound sources are therefore analyzed and tracked to determine their signal quality and
location. This information is then used to decide whether to enhance or suppress each source, and how to suppress it.
To do this, we use an innovative combination of multi-dimensional mathematical and fuzzy-logic modeling to identify
the sources accurately. This combination remains robust in the presence of multiple, non-stationary noise sources,
which often cripple other microphone arrays.
Together, these technologies achieve levels of noise suppression and interference cancellation not previously
attained. Processing each type of noise with its own adaptive process (as opposed to single-stage cancellation)
delivers the desired voice more cleanly and suppresses unwanted noise more strongly. In the Digital Microphone Array,
unwanted noise sources are suppressed by 24 dB on average and by more than 30 dB in the best cases.
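To put these figures in perspective, a decibel value converts to a power ratio as 10^(dB/10) and to an amplitude (sound pressure) ratio as 10^(dB/20). The small helper below, with hypothetical function names, shows that 24 dB of suppression reduces noise power by a factor of about 251 (amplitude by about 15.8), and 30 dB by a factor of 1000 (amplitude by about 31.6).

```python
def db_to_power_ratio(db):
    """Power ratio corresponding to a level difference in decibels."""
    return 10 ** (db / 10)

def db_to_amplitude_ratio(db):
    """Amplitude (sound pressure) ratio corresponding to a level difference in decibels."""
    return 10 ** (db / 20)

for level in (24, 30):
    print(f"{level} dB -> power x{db_to_power_ratio(level):.0f}, "
          f"amplitude x{db_to_amplitude_ratio(level):.1f}")
```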
For the first time, this advanced microphone array technology makes accurate dictation at the PC possible without a
headset microphone. The following are two examples of the substantial signal-to-noise ratio (SNR) improvement our
MicArray technology can provide in extreme environments.