BE2M31ZRE - seminars - TASK No. 3
Fundamental frequency (pitch) and its estimation
Tasks to do:
Estimation of fundamental frequency (pitch) for one signal frame.
Estimate the fundamental frequency (pitch of the voice) for the voiced speech frame muz1-AA-frame.CS0
(raw data without header, fs = 16000 Hz; use the function loadbin.m to load it into MATLAB).
Carry out the estimation in the following steps:
- compute and inspect the autocorrelation of the frame (biased
estimate using xcorr),
- from the location of the autocorrelation maximum, compute the fundamental period in
samples (L_0), the fundamental period in seconds (T_0 = L_0/fs) and the
pitch, i.e. the fundamental frequency in Hz (f_0 = 1/T_0 = fs/L_0),
- the MATLAB function max returns the maximum and, as its second output
parameter, the position of the maximum,
- restrict the search for the maximum according to the typical
range of human voice pitch, which is 60-260 Hz (at fs = 16000 Hz this
corresponds to lags of about 62 to 266 samples).
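The steps above can be sketched as follows. This is a minimal illustration in Python rather than MATLAB, using only the standard library. The function names (biased_autocorr, estimate_pitch) and the synthetic test frame are assumptions made for the example; they are not part of the course materials, and the real frame would be loaded with loadbin.m.

```python
import math

def biased_autocorr(x, max_lag):
    """Biased autocorrelation estimate R[k] = (1/N) * sum_n x[n]*x[n+k], k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]

def estimate_pitch(frame, fs, f_min=60.0, f_max=260.0):
    """Return (L0, T0, f0) from the autocorrelation maximum inside the allowed lag range."""
    lag_min = math.ceil(fs / f_max)    # shortest period <-> highest pitch
    lag_max = math.floor(fs / f_min)   # longest period  <-> lowest pitch
    r = biased_autocorr(frame, lag_max)
    L0 = max(range(lag_min, lag_max + 1), key=lambda k: r[k])  # lag of the maximum
    T0 = L0 / fs                       # fundamental period in seconds
    f0 = fs / L0                       # fundamental frequency in Hz
    return L0, T0, f0

# Synthetic stand-in for a voiced frame (assumption): 32 ms of a 125 Hz tone
# plus its second harmonic, sampled at 16 kHz.
fs = 16000
frame = [math.sin(2 * math.pi * 125 * n / fs) + 0.5 * math.sin(2 * math.pi * 250 * n / fs)
         for n in range(512)]
L0, T0, f0 = estimate_pitch(frame, fs)
print(L0, T0, f0)
```

Searching only between lag_min and lag_max skips the zero-lag peak, which is always the global maximum of the autocorrelation.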
Result: for the voiced frame muz1-AA-frame.CS0, show:
- the time waveform of the signal together with the
computed autocorrelation function,
- the boundaries of the allowed maximum location marked in the
computed autocorrelation function,
- the values of L_0, T_0 and f_0 computed from the detected
maximum and its position.
Repeat the procedure for an unvoiced speech
frame (muz1-SS-frame.CS0,
zena3-SS-frame.CS0) and
observe mainly how the time waveform and the
estimated autocorrelation function differ for an unvoiced frame with
noise-like character.
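One way to put a number on this difference is to compare the largest autocorrelation value in the pitch lag range with the zero-lag value R[0]. The task does not ask for this; it is only an illustrative Python sketch. The name peak_ratio and both test signals (a synthetic periodic frame and seeded Gaussian noise) are assumptions standing in for the real voiced and unvoiced frames.

```python
import math
import random

def biased_autocorr(x, max_lag):
    """Biased autocorrelation estimate for lags 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]

def peak_ratio(frame, fs, f_min=60.0, f_max=260.0):
    """Largest autocorrelation value in the pitch lag range, relative to R[0]."""
    lag_min = math.ceil(fs / f_max)
    lag_max = math.floor(fs / f_min)
    r = biased_autocorr(frame, lag_max)
    return max(r[lag_min:lag_max + 1]) / r[0]

fs = 16000
n_samples = 512  # 32 ms at 16 kHz

# Periodic stand-in for a voiced frame (assumption).
periodic = [math.sin(2 * math.pi * 125 * n / fs) + 0.5 * math.sin(2 * math.pi * 250 * n / fs)
            for n in range(n_samples)]

# Noise stand-in for an unvoiced frame (assumption); seeded for repeatability.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

voiced_ratio = peak_ratio(periodic, fs)
noise_ratio = peak_ratio(noise, fs)
print(voiced_ratio, noise_ratio)
```

For the periodic frame the ratio stays high, while for the noise frame the autocorrelation has no pronounced peak away from zero lag and the ratio is small.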
Pitch estimation in a longer utterance
Implement the estimation of fundamental frequency in
short-time frames over the whole utterance
(similarly to the power computation implemented in Task
No. 2), i.e. compute the pitch for each short-time frame of the analyzed
signal.
The frame length should be 32 ms and the frame step
16 ms (i.e. work with 50% overlap of the analyzed short-time
frames); at fs = 16000 Hz this is 512 and 256 samples, respectively.
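The framing described above can be sketched as follows, again in Python with the standard library only. The helper names (split_into_frames, frame_pitch) and the synthetic 125 Hz test signal are assumptions for illustration; incomplete frames at the end of the signal are simply dropped here, which is one possible convention.

```python
import math

def split_into_frames(signal, fs, frame_ms=32.0, step_ms=16.0):
    """Cut the signal into overlapping frames; a trailing partial frame is dropped."""
    frame_len = round(fs * frame_ms / 1000.0)   # 512 samples at 16 kHz
    step = round(fs * step_ms / 1000.0)         # 256 samples at 16 kHz
    starts = range(0, len(signal) - frame_len + 1, step)
    return [signal[s:s + frame_len] for s in starts]

def frame_pitch(frame, fs, f_min=60.0, f_max=260.0):
    """Pitch in Hz from the biased autocorrelation maximum in the allowed lag range."""
    n = len(frame)
    lag_min, lag_max = math.ceil(fs / f_max), math.floor(fs / f_min)

    def r(k):
        return sum(frame[i] * frame[i + k] for i in range(n - k)) / n

    best_lag = max(range(lag_min, lag_max + 1), key=r)
    return fs / best_lag

# Synthetic stand-in for an utterance (assumption): 0.1 s of a 125 Hz tone.
fs = 16000
signal = [math.sin(2 * math.pi * 125 * n / fs) for n in range(1600)]

frames = split_into_frames(signal, fs)
f0_track = [frame_pitch(f, fs) for f in frames]
print(len(frames), f0_track)
```

The result is one f_0 value per frame, which can then be plotted against the frame index or frame time under the signal waveform.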
Result: the signal waveform and
the computed value of the pitch (f_0) for all available frames of
the whole utterance SA176S01.CS0 (raw data,
fs = 16000 Hz), as well as for your own on-line recorded signal.
The detection of voiced frames during the pitch estimation
Try to detect unvoiced frames on the basis of the zero-crossing rate (ZCR). The fundamental
frequency for an unvoiced frame should be set to f_0 = 10 Hz.
Also detect speech-pause frames on the basis of the energy computation
(frames with low energy). Set f_0 = 0 Hz for
non-speech frames.
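The labelling rule above can be sketched as follows in Python. The energy definition (mean square), the ZCR definition (fraction of adjacent sample pairs with a sign change) and both threshold values are assumptions chosen for the synthetic test frames; for real recordings the thresholds have to be tuned on the data.

```python
import math
import random

def frame_energy(frame):
    """Mean-square energy of the frame (assumed definition)."""
    return sum(v * v for v in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return crossings / (len(frame) - 1)

def label_f0(frame, f0_estimate, energy_thr=1e-3, zcr_thr=0.25):
    """Apply the task's markers: 0 Hz for pause, 10 Hz for unvoiced, else keep the estimate.

    energy_thr and zcr_thr are hypothetical values, not taken from the task.
    """
    if frame_energy(frame) < energy_thr:
        return 0.0       # speech pause (low energy)
    if zero_crossing_rate(frame) > zcr_thr:
        return 10.0      # unvoiced (noise-like, many zero crossings)
    return f0_estimate   # voiced

fs = 16000
rng = random.Random(1)
pause_frame = [1e-3 * rng.gauss(0.0, 1.0) for _ in range(512)]          # very low energy
unvoiced_frame = [rng.gauss(0.0, 1.0) for _ in range(512)]              # noise-like
voiced_frame = [math.sin(2 * math.pi * 125 * n / fs) for n in range(512)]

labels = [label_f0(pause_frame, 125.0),
          label_f0(unvoiced_frame, 125.0),
          label_f0(voiced_frame, 125.0)]
print(labels)
```

The energy test is done first so that low-level background noise, which also has a high ZCR, is marked as a pause rather than as unvoiced speech.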
Result: the signal waveform and
the computed f_0 for voiced frames only,
for your on-line recorded signal
(or for the analyzed utterance SA176S01.CS0).
POSSIBLE IMPROVEMENT - think about smoothing the pitch
estimate using median filtering (function
med.m) or other postprocessing.
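A median filter over the frame-wise f_0 track can be sketched as follows in Python. This is not the course function med.m, whose interface is not described here; the function name, the window width of 3 and the edge handling (repeating the first and last value) are assumptions.

```python
def median_filter(values, width=3):
    """Sliding median with an odd window; edges are padded by repeating the end values."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd number")
    if not values:
        return []
    half = width // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [sorted(padded[i:i + width])[half] for i in range(len(values))]

# Hypothetical f_0 track in Hz with one octave-up error (240) and one octave-down error (60).
f0_track = [120, 122, 240, 121, 119, 60, 120]
smoothed = median_filter(f0_track, width=3)
print(smoothed)  # [120, 122, 122, 121, 119, 119, 120]
```

Isolated outliers such as doubled or halved pitch values are removed, while the slowly varying pitch contour is kept. A wider window is needed if errors span several consecutive frames.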
Your own utterances from the database zreratdb are available as *.CS0 files (resampled to fs = 16000 Hz); for efficient download, they are provided in the archive zrerat_blocken_2025_cs0.zip
NOTE - Use the function loadbin.m to load raw data (without header) into MATLAB; to load data in WAV format, use the standard MATLAB function audioread.
Pitch of your voice.
Result: compute the average
value of your voice pitch (f_0) from voiced frames only.
Try to estimate the pitch also in Praat and WaveSurfer.
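The average over voiced frames can be sketched as follows in Python, assuming the marker convention from the task (f_0 = 0 Hz for pauses, f_0 = 10 Hz for unvoiced frames). The function name and the example track are hypothetical.

```python
def average_voiced_pitch(f0_track, unvoiced_marker=10.0):
    """Mean f_0 over voiced frames only; frames marked 0 Hz or 10 Hz are skipped."""
    voiced = [f for f in f0_track if f > unvoiced_marker]
    if not voiced:
        return None  # no voiced frame found
    return sum(voiced) / len(voiced)

# Hypothetical frame-wise track: pauses (0), unvoiced frames (10), voiced values.
f0_track = [0.0, 0.0, 10.0, 118.0, 122.0, 120.0, 10.0, 0.0]
average = average_voiced_pitch(f0_track)
print(average)  # 120.0
```

Including the 0 Hz and 10 Hz markers in the mean would pull the result far below the true pitch, which is why they are filtered out first.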