BE2M31ZRE seminar
GMM-based speaker identification
REQUIRED HOMEWORK: (5 points)
- Cepstral space of two speakers
- Create training data for a selected speaker SPK1 (ideally your own voice): for the signals S0-S9 compute the matrix of MFCC cepstral coefficients over all available short-time frames (e.g. "T2ExxxS0.CS0 - T2ExxxS9.CS0").
- Use signals recorded by you at the first seminar:
- Your signals resampled to 16 kHz (files *.CS0) are available in the FEL classroom in the directory "K:\VYUKA\ZRE\signaly\zreratdb".
- For work outside of the FEL network the database is available at http://noel.feld.cvut.cz/vyu/data/zreratdb. This location contains both the original signals sampled at 48 kHz (*.BIN files) and the 16 kHz resampled data (*.CS0 files).
- For work outside of the FEL network you can also download the archive zrerat_blocken_2024_cs0.zip.
- Use the function vmfcc.m (and the functions it calls: melbf.m, mel.m, melinv.m) with the following MFCC setup:
- apply preemphasis to the processed signal with the coefficient m = 0.97,
- frame length 25 ms, frame shift 5 ms, Hamming window,
- number of filter-bank bands M=30; frequency range fmin=100Hz, fmax=6500Hz,
- number of cepstral coefficients cp=12 (i.e. 12 + 1).
- apply VAD based on the frame power in dB with fixed thresholding, the level set at approx. 40-50% of the dynamic range (for the VAD use the functions speechpwr.m and thr_fixed.m known from previous seminars),
- Join the computed cepstra without c[0] into one matrix called cspk1_train.
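The steps above can be sketched in MATLAB as follows. The exact signatures of vmfcc, speechpwr and thr_fixed are assumptions here (they are the course's own helper functions); adapt the calls to the versions handed out at the seminar.

```matlab
% Sketch: build cspk1_train from sentences S0-S9 of speaker SPK1.
fs = 16000;                                   % sampling rate of *.CS0 files
cspk1_train = [];
for k = 0:9
    fname = sprintf('T2ExxxS%d.CS0', k);      % xxx = your personal code
    fid = fopen(fname, 'r');
    s = fread(fid, inf, 'int16');
    fclose(fid);
    s = filter([1 -0.97], 1, s);              % preemphasis, m = 0.97
    % MFCC: 25 ms frames, 5 ms shift, M=30 bands, 100-6500 Hz, cp=12
    C = vmfcc(s, fs, 0.025, 0.005, 30, 100, 6500, 12);  % assumed signature
    P = speechpwr(s, fs, 0.025, 0.005);       % frame power in dB (assumed signature)
    vad = thr_fixed(P, 0.45);                 % threshold at ~45 % of dynamics (assumed)
    C = C(:, vad > 0);                        % keep speech frames only
    cspk1_train = [cspk1_train, C(2:end, :)]; % drop c[0], append frames as columns
end
```

The same loop with the file pattern and output matrix changed produces cspk1_test and cspk2_test.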
- Repeat the procedure for the same speaker SPK1 but with the sentences Z0-Z9 (e.g. "T2ExxxZ0.CS0 - T2ExxxZ9.CS0") and save the computed MFCCs without c[0] into a matrix called cspk1_test.
- Finally, repeat it for another speaker SPK2 and the sentences Z0-Z9 (e.g. "T2EyyyZ0.CS0 - T2EyyyZ9.CS0") and save the computed MFCCs without c[0] into a matrix called cspk2_test.
- Results:
Observe the obtained results in the following 3 figures:
- Figure 1: dependencies c[1]-c[2], c[3]-c[4], c[5]-c[6], c[7]-c[8] for the matrix cspk1_train, i.e. the MFCC distribution for utterances S0 - S9 pronounced by speaker SPK1.
- Figure 2: dependencies c[1]-c[2], c[3]-c[4], c[5]-c[6], c[7]-c[8] for the matrix cspk1_test, i.e. the MFCC distribution for utterances Z0 - Z9 pronounced by speaker SPK1.
- Figure 3: dependencies c[1]-c[2], c[3]-c[4], c[5]-c[6], c[7]-c[8] for the matrix cspk2_test, i.e. the MFCC distribution for utterances Z0 - Z9 pronounced by speaker SPK2.
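One such figure can be produced with a small subplot loop, assuming the matrices are organized as in the training sketch above (one frame per column, rows c[1]..c[12]):

```matlab
% Sketch: Figure 1 -- scatter plots of cepstral-coefficient pairs
% for the training matrix cspk1_train (rows = c[1]..c[12]).
pairs = [1 2; 3 4; 5 6; 7 8];
figure(1);
for p = 1:4
    subplot(2, 2, p);
    plot(cspk1_train(pairs(p, 1), :), cspk1_train(pairs(p, 2), :), '.');
    xlabel(sprintf('c[%d]', pairs(p, 1)));
    ylabel(sprintf('c[%d]', pairs(p, 2)));
end
```

Repeating the loop with cspk1_test and cspk2_test gives Figures 2 and 3.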
- Deliver the solution via the WEB interface in Moodle FEL, see "GMM-based speaker recognition - home preparation". The delivery deadline is Tuesday 30 Apr 2024, 9:00 AM.
SEMINAR GUIDELINES:
- GMM-based speaker verification
- Compute GMM models of cepstrum variability for SPK1 using the function fitgmdist and the training data saved in cspk1_train. Use the number of mixtures m=4-6 and full covariance matrices (option 'CovarianceType','full').
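A minimal training call might look as follows. fitgmdist expects observations in rows, hence the transpose (assuming cspk1_train holds one frame per column); the regularization value is an optional safeguard, not part of the assignment.

```matlab
% Sketch: train a GMM with full covariance matrices on the SPK1 cepstra.
m = 5;                                        % number of mixtures (choose 4-6)
gm_spk1 = fitgmdist(cspk1_train', m, ...
    'CovarianceType', 'full', ...             % full covariance matrices
    'RegularizationValue', 1e-6);             % guards against singular covariances
```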
- Compute log-likelihoods for all available short-time frames of the 2 analyzed speakers, i.e. use the cepstra saved in the matrices cspk1_test and cspk2_test as observations in the MATLAB function pdf.
- Results: Display:
    - all short-time values of the log-likelihood for all test
      utterances of SPK1, i.e. the correct speaker,
    - in the same figure, all short-time values of the log-likelihood for the test
      utterances of SPK2, i.e. the incorrect speaker,
    - compute the score as the mean value of the log-likelihood for each of the speakers SPK1 and SPK2.
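The evaluation step can be sketched as below, assuming the test matrices hold one frame per column as in the training sketch:

```matlab
% Sketch: frame log-likelihoods of both test sets under the SPK1 model,
% plus the mean-log-likelihood scores used for the decision.
ll1 = log(pdf(gm_spk1, cspk1_test'));         % correct speaker (SPK1)
ll2 = log(pdf(gm_spk1, cspk2_test'));         % incorrect speaker (SPK2)
figure(4);
plot(ll1, 'b.'); hold on; plot(ll2, 'r.'); hold off;
legend('SPK1 (correct)', 'SPK2 (incorrect)');
xlabel('frame'); ylabel('log-likelihood');
score1 = mean(ll1);
score2 = mean(ll2);
fprintf('mean log-likelihood: SPK1 = %.2f, SPK2 = %.2f\n', score1, score2);
```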
- Try to empirically define a verification threshold that accepts the correct speaker and rejects the other one.
- Repeat the verification with other speakers.
- ON-LINE speaker verification based on GMM
- Perform verification of an on-line recorded utterance.
- Recommended outputs: ACCEPTANCE/REJECTION of the claimed identity + the computed MEAN score + the verification threshold.
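An on-line check can be sketched with MATLAB's audiorecorder. The threshold value and the vmfcc signature are assumptions; the feature extraction must match the training setup exactly (preemphasis, frame length/shift, filter bank, VAD).

```matlab
% Sketch: verify an on-line recorded utterance against the SPK1 model.
fs  = 16000;
rec = audiorecorder(fs, 16, 1);               % 16 kHz, 16 bit, mono
disp('Speak now...');
recordblocking(rec, 3);                       % record 3 seconds
s = getaudiodata(rec);
s = filter([1 -0.97], 1, s);                  % same preemphasis as in training
C = vmfcc(s, fs, 0.025, 0.005, 30, 100, 6500, 12);  % assumed signature
score = mean(log(pdf(gm_spk1, C(2:end, :)'))); % mean LL without c[0]
thr = -40;                                    % empirical threshold (assumption)
if score > thr
    fprintf('ACCEPT: mean LL = %.2f (threshold = %.2f)\n', score, thr);
else
    fprintf('REJECT: mean LL = %.2f (threshold = %.2f)\n', score, thr);
end
```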