BE2M31ZRE seminar
GMM-based speaker identification
REQUIRED HOMEWORK: (5 points)
- Cepstral space of two speakers
- Create training data for a selected speaker SPK1 (ideally your own voice): for the signals S0-S9 compute the matrix of MFCC cepstral coefficients over all available short-time frames (e.g. "T2ExxxS0.CS0 - T2ExxxS9.CS0").
- Use signals recorded by you at the first seminar:
- Your signals resampled to 16 kHz (files *.CS0) are available in the FEL classroom in the directory "K:\VYUKA\ZRE\signaly\zreratdb".
- For work outside of the FEL network the database is available at http://noel.feld.cvut.cz/vyu/data/zreratdb. This location contains both the original signals sampled at 48 kHz (*.BIN files) and the 16 kHz resampled data (*.CS0 files).
- For work outside of the FEL network you can also download the archive zrerat_blocken_2024_cs0.zip.
- Use the function vmfcc.m (and the functions it calls: melbf.m, mel.m, melinv.m) with the following MFCC setup:
- apply preemphasis to the processed signal with the coefficient m = 0.97,
- frame length 25 ms, frame shift 5 ms, Hamming window,
- number of filter-bank bands M=30; frequency range fmin=100Hz, fmax=6500Hz,
- number of cepstral coefficients cp=12 (i.e. 12 + 1).
- apply VAD based on the frame power in dB with fixed thresholding, the level set at approx. 40-50% of the dynamic range (for the VAD use the functions speechpwr.m and thr_fixed.m known from previous seminars),
- Join the computed cepstra without c[0] into one matrix called cspk1_train.
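The steps above can be sketched in MATLAB as follows. The exact signatures of vmfcc, speechpwr and thr_fixed are assumptions here (they are the course's own helper functions); adapt the calls to the versions handed out at the seminar.

```matlab
% Sketch: build cspk1_train from sentences S0-S9 of speaker SPK1.
fs = 16000;                                   % sampling rate of *.CS0 files
cspk1_train = [];
for k = 0:9
    fname = sprintf('T2ExxxS%d.CS0', k);      % xxx = your personal code
    fid = fopen(fname, 'r');
    s = fread(fid, inf, 'int16');
    fclose(fid);
    s = filter([1 -0.97], 1, s);              % preemphasis, m = 0.97
    % MFCC: 25 ms frames, 5 ms shift, M=30 bands, 100-6500 Hz, cp=12
    C = vmfcc(s, fs, 0.025, 0.005, 30, 100, 6500, 12);  % assumed signature
    P = speechpwr(s, fs, 0.025, 0.005);       % frame power in dB (assumed signature)
    vad = thr_fixed(P, 0.45);                 % threshold at ~45 % of dynamics (assumed)
    C = C(:, vad > 0);                        % keep speech frames only
    cspk1_train = [cspk1_train, C(2:end, :)]; % drop c[0], append frames as columns
end
```

The same loop with the file pattern and output matrix changed produces cspk1_test and cspk2_test.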
- Repeat the procedure for the same speaker SPK1 but with the sentences Z0-Z9 (e.g. "T2ExxxZ0.CS0 - T2ExxxZ9.CS0") and save the computed MFCCs without c[0] into a matrix called cspk1_test.
- Finally, repeat it for another speaker SPK2 and the sentences Z0-Z9 (e.g. "T2EyyyZ0.CS0 - T2EyyyZ9.CS0") and save the computed MFCCs without c[0] into a matrix called cspk2_test.
- Results:
Observe the obtained results in the following 3 figures:
- Figure 1: dependencies c[1]-c[2], c[3]-c[4], c[5]-c[6], c[7]-c[8] for the matrix cspk1_train, i.e. the MFCC distribution for utterances S0 - S9 pronounced by speaker SPK1.
- Figure 2: dependencies c[1]-c[2], c[3]-c[4], c[5]-c[6], c[7]-c[8] for the matrix cspk1_test, i.e. the MFCC distribution for utterances Z0 - Z9 pronounced by speaker SPK1.
- Figure 3: dependencies c[1]-c[2], c[3]-c[4], c[5]-c[6], c[7]-c[8] for the matrix cspk2_test, i.e. the MFCC distribution for utterances Z0 - Z9 pronounced by speaker SPK2.
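One such figure can be produced with a small subplot loop, assuming the matrices are organized as in the training sketch above (one frame per column, rows c[1]..c[12]):

```matlab
% Sketch: Figure 1 -- scatter plots of cepstral-coefficient pairs
% for the training matrix cspk1_train (rows = c[1]..c[12]).
pairs = [1 2; 3 4; 5 6; 7 8];
figure(1);
for p = 1:4
    subplot(2, 2, p);
    plot(cspk1_train(pairs(p, 1), :), cspk1_train(pairs(p, 2), :), '.');
    xlabel(sprintf('c[%d]', pairs(p, 1)));
    ylabel(sprintf('c[%d]', pairs(p, 2)));
end
```

Repeating the loop with cspk1_test and cspk2_test gives Figures 2 and 3.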
- Deliver the solution via the WEB interface in Moodle FEL, see "GMM-based speaker recognition - home preparation". The delivery deadline is Tuesday 30 Apr 2024, 9:00 AM.
SEMINAR GUIDELINES:
- GMM-based speaker verification
- Compute GMM models of cepstrum variability for SPK1 using the function fitgmdist and the training data saved in cspk1_train. Use the number of mixtures m=4-6 and full covariance matrices (option 'CovarianceType','full').
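A minimal training call might look as follows. fitgmdist expects observations in rows, hence the transpose (assuming cspk1_train holds one frame per column); the regularization value is an optional safeguard, not part of the assignment.

```matlab
% Sketch: train a GMM with full covariance matrices on the SPK1 cepstra.
m = 5;                                        % number of mixtures (choose 4-6)
gm_spk1 = fitgmdist(cspk1_train', m, ...
    'CovarianceType', 'full', ...             % full covariance matrices
    'RegularizationValue', 1e-6);             % guards against singular covariances
```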
- Compute log-likelihoods for all available short-time frames of the 2 analyzed speakers, i.e. use the cepstra saved in the matrices cspk1_test and cspk2_test as observations in the MATLAB function pdf.
- Results: Display:
    - all short-time values of the log-likelihood for all test
      utterances of SPK1, i.e. the correct speaker,
    - in the same figure, all short-time values of the log-likelihood for the test
      utterances of SPK2, i.e. the incorrect speaker,
    - compute the score as the mean value of the log-likelihood for each of the speakers SPK1 and SPK2.
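The evaluation step can be sketched as below, assuming the test matrices hold one frame per column as in the training sketch:

```matlab
% Sketch: frame log-likelihoods of both test sets under the SPK1 model,
% plus the mean-log-likelihood scores used for the decision.
ll1 = log(pdf(gm_spk1, cspk1_test'));         % correct speaker (SPK1)
ll2 = log(pdf(gm_spk1, cspk2_test'));         % incorrect speaker (SPK2)
figure(4);
plot(ll1, 'b.'); hold on; plot(ll2, 'r.'); hold off;
legend('SPK1 (correct)', 'SPK2 (incorrect)');
xlabel('frame'); ylabel('log-likelihood');
score1 = mean(ll1);
score2 = mean(ll2);
fprintf('mean log-likelihood: SPK1 = %.2f, SPK2 = %.2f\n', score1, score2);
```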
- Try to empirically define a verification threshold that accepts the correct speaker and rejects the other one.
- Repeat the verification with other speakers.
- ON-LINE speaker verification based on GMM
- Perform verification of an on-line recorded utterance.
- Recommended outputs: ACCEPTANCE/REJECTION of the claimed identity + the computed MEAN score + the verification threshold.
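An on-line check can be sketched with MATLAB's audiorecorder. The threshold value and the vmfcc signature are assumptions; the feature extraction must match the training setup exactly (preemphasis, frame length/shift, filter bank, VAD).

```matlab
% Sketch: verify an on-line recorded utterance against the SPK1 model.
fs  = 16000;
rec = audiorecorder(fs, 16, 1);               % 16 kHz, 16 bit, mono
disp('Speak now...');
recordblocking(rec, 3);                       % record 3 seconds
s = getaudiodata(rec);
s = filter([1 -0.97], 1, s);                  % same preemphasis as in training
C = vmfcc(s, fs, 0.025, 0.005, 30, 100, 6500, 12);  % assumed signature
score = mean(log(pdf(gm_spk1, C(2:end, :)'))); % mean LL without c[0]
thr = -40;                                    % empirical threshold (assumption)
if score > thr
    fprintf('ACCEPT: mean LL = %.2f (threshold = %.2f)\n', score, thr);
else
    fprintf('REJECT: mean LL = %.2f (threshold = %.2f)\n', score, thr);
end
```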