Date of Award

12-2012

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Legacy Department

Computer Engineering

Advisor

Gowdy, John N.

Committee Member

Schalkoff, Robert

Committee Member

Birchfield, Stanley

Committee Member

Park, Chanseok

Abstract

This dissertation determines which vowel phonemes work best for speaker identification and speaker verification, and develops new algorithms to improve speaker identification accuracy. Results from the first part of our research indicate that the vowels /i/, /E/ and /u/ had the highest recognition scores for both the Gaussian mixture model (GMM) and vector quantization (VQ) methods (at most one classification error). For VQ, /i/, /I/, /e/, /E/ and /@/ had no classification errors. Speakers uttering /E/, /o/ and /u/ were verified well by both the GMM and VQ methods in our experiments. For VQ, the verification results are consistent with the identification results, since the same five phonemes performed best and had less than one verification error.
After determining several ideal vowel phonemes, we developed new algorithms for improved speaker identification accuracy. We used phoneme weighting methods (which perform classification based on the ideal phonemes found in the previous experiments) and other weighting methods based on energy. The energy weighting methods performed better than the phoneme weighting methods in our experiments. The first energy weighting method ignores speech frames that have relatively small magnitude. The second method instead emphasizes speech frames that have relatively large magnitude. The third method and the adjusted third method combine the previous two. Relative to a baseline system (which used Mel frequency cepstral coefficients (MFCCs) as features and VQ as the classifier), the error reduction rate was 7.9% after applying the first method and 28.9% after applying the second method and the adjusted third method.
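
The energy weighting idea can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the abstract does not give the exact weighting rules, so the relative threshold, the peak normalization, the weighted-average distortion score, and every function name below are assumptions chosen for illustration. The sketch weights each frame's VQ distortion by a function of the frame's energy, either dropping low-energy frames or emphasizing high-energy ones, and picks the speaker whose codebook gives the lowest weighted distortion.

```python
import numpy as np


def frame_energy(frames):
    """Mean squared amplitude of each frame (rows are frames of samples)."""
    return np.mean(np.asarray(frames, dtype=float) ** 2, axis=1)


def energy_weights(energy, mode="drop", rel_threshold=0.1):
    """Hypothetical per-frame weights; rules and threshold are assumptions.

    mode="drop":      weight 0 for frames below rel_threshold * peak energy,
                      weight 1 otherwise (ignore low-magnitude frames).
    mode="emphasize": weight proportional to energy relative to the peak
                      (emphasize high-magnitude frames).
    """
    energy = np.asarray(energy, dtype=float)
    peak = energy.max()
    if peak <= 0:
        return np.ones_like(energy)
    rel = energy / peak
    if mode == "drop":
        return (rel >= rel_threshold).astype(float)
    if mode == "emphasize":
        return rel
    raise ValueError(f"unknown mode: {mode}")


def weighted_vq_distortion(features, codebook, weights):
    """Weighted average of each frame's squared distance to its nearest codeword."""
    features = np.asarray(features, dtype=float)   # shape (T, D)
    codebook = np.asarray(codebook, dtype=float)   # shape (K, D)
    weights = np.asarray(weights, dtype=float)     # shape (T,)
    diffs = features[:, None, :] - codebook[None, :, :]
    nearest = np.min(np.sum(diffs ** 2, axis=2), axis=1)
    total = weights.sum()
    if total == 0:
        return float("inf")
    return float(np.sum(weights * nearest) / total)


def identify(features, weights, codebooks):
    """Return the speaker whose codebook yields the lowest weighted distortion."""
    return min(
        codebooks,
        key=lambda name: weighted_vq_distortion(features, codebooks[name], weights),
    )


def error_reduction_rate(baseline_errors, new_errors):
    """Relative error reduction with respect to a baseline error count."""
    return (baseline_errors - new_errors) / baseline_errors
```

In this sketch the feature rows would be per-frame vectors such as MFCCs, and the energies would be computed from the matching waveform frames. A combined scheme in the spirit of the third method could multiply the two weight vectors, although the abstract does not say how the dissertation combines them.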
