Bandwidth Extension of Speech Using Perceptual Criteria (Synthesis Lectures on Algorithms and Software in Engineering)
Bandwidth extension of speech is used in the International Telecommunication Union (ITU-T) G.729.1 standard, in which the narrowband bitstream is combined with quantized high-band parameters. Although this system produces high-quality wideband speech, the number of additional bits used to represent the high band can be reduced further. In addition to the algorithm used in the G.729.1 standard, bandwidth extension methods based on spectrum prediction have also been proposed. Although these algorithms require no additional bits, they perform poorly when the correlation between the low and high bands is weak. In this book, two wideband speech coding algorithms that rely on bandwidth extension are developed. The algorithms operate as wrappers around existing narrowband compression schemes: the low band is encoded using an existing toll-quality narrowband system, whereas the high band is generated using the proposed extension techniques. The first method relies only on transmitted high-band information to generate the wideband speech. The second uses a constrained minimum mean square error estimator that combines transmitted high-band envelope information with a predictive scheme driven by narrowband features. Both algorithms use novel loudness-based perceptual models to determine optimum quantization strategies for wideband recovery and synthesis. Objective and subjective evaluations show that the proposed system operates at a lower average bit rate while improving speech quality compared with similar algorithms.
Visar Berisha is an Assistant Professor with a joint appointment in the Department of Speech and Hearing Science and the School of Electrical, Computer, and Energy Engineering at Arizona State University. His research interests fall mainly in the fields of speech and audio perception, signal processing, and machine learning. He obtained his Ph.D. in Electrical Engineering at Arizona State University. Following his degree, he worked at MIT Lincoln Laboratory and Raytheon Co. as a research engineer.
Steven Sandoval received his B.S. in Electrical Engineering in 2007 and his M.S. in Electrical Engineering in 2010 from the Klipsch School of Electrical and Computer Engineering, New Mexico State University, Las Cruces, NM. He previously worked for five years as a system analyst for a defense contractor. He is currently pursuing a Ph.D. in electrical engineering at the Ira A. Fulton Schools of Engineering, SenSIP Center, Arizona State University, Tempe, AZ. His research interests include signal processing (specifically audio and speech processing), time-frequency analysis, machine learning, and robotics.
Julie Liss is a Professor in the Department of Speech and Hearing Science. She is the Director of the Motor Speech Disorders Lab, where she conducts research on the perception of degraded speech. Her interest is in modeling the cognitive-perceptual strategies involved in deciphering degraded speech to further elucidate the construct of speech intelligibility. She and Dr. Berisha collaborate to apply signal processing methodology to issues of intelligibility in clinical populations.