
Fbank feature extraction in Kaldi explained (an in-depth walk through the source code)

Tags: kaldi learning, speech recognition, artificial intelligence

The relevant Kaldi functions live in the src/feat directory.

1. feature-window

1.1 Default values in feature-window.h

struct FrameExtractionOptions {
	... ...
	FrameExtractionOptions():
	samp_freq(16000),
	frame_shift_ms(10.0),
	frame_length_ms(25.0),
	dither(1.0),
	preemph_coeff(0.97),
	remove_dc_offset(true),
	window_type("povey"),
	round_to_power_of_two(true),
	blackman_coeff(0.42),
	snip_edges(true),
	allow_downsample(false),
	allow_upsample(false),
	max_feature_vectors(-1)
	... ...
}

These parameters can be set in the conf file. Any parameter that is not set keeps its default value.

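| Parameter and default (in code) | Example conf setting | Meaning |
| --- | --- | --- |
| samp_freq(16000) | --sample-frequency=16000 | Sampling rate used for processing, in Hz |
| frame_shift_ms(10.0) | --frame-shift=10 | Frame shift, in ms |
| frame_length_ms(25.0) | --frame-length=25 | Frame (window) length, in ms |
| dither(1.0) | --dither=1.0 | Scale of the random Gaussian noise added to each frame. It is used in training as a perturbation, but it makes feature extraction slower. Set it to 0.0 to disable dithering; the default is 1.0 |
| preemph_coeff(0.97) | --preemphasis-coefficient=0.97 | Pre-emphasis coefficient |
| remove_dc_offset(true) | --remove-dc-offset=true | Shift each frame so that its mean is 0. Set it to false to keep the original signal |
| window_type("povey") | --window-type=povey | Window function: hamming, hanning, povey, rectangular, sine or blackman. The povey window was designed by Dan Povey himself |
| round_to_power_of_two(true) | --round-to-power-of-two=true | Zero-pad each frame to the next power of two before the FFT |
| blackman_coeff(0.42) | --blackman-coeff=0.42 | Coefficient of the blackman window |
| snip_edges(true) | --snip-edges=true | If true, end effects are handled by outputting only frames that fit completely in the file, so the number of frames depends on the frame length. If false, the number of frames depends only on the frame shift, and the data are reflected at the ends |
| allow_downsample(false) | --allow-downsample=false | If true, the input may have a higher sampling rate than --sample-frequency, and it is downsampled to that rate |
| allow_upsample(false) | --allow-upsample=false | If true, the input may have a lower sampling rate than --sample-frequency, and it is upsampled to that rate |
| max_feature_vectors(-1) | --max-feature-vectors=-1 | Memory optimization. If greater than 0, feature vectors are periodically removed so that only this many of the most recent ones are kept. The main purpose is to prevent out-of-memory errors that would abort feature extraction |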
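The sketch below makes the frame options concrete. It converts the frame length and shift into sample counts, and it shows how snip_edges changes the number of frames for a complete file. It restates, from memory, the logic of FrameExtractionOptions::WindowSize(), WindowShift(), PaddedWindowSize() and Kaldi's NumFrames(). It is not Kaldi's own code, and the struct name FrameGeometry is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the frame-size logic of FrameExtractionOptions.
struct FrameGeometry {
  float samp_freq = 16000.0f;
  float frame_shift_ms = 10.0f;
  float frame_length_ms = 25.0f;
  bool round_to_power_of_two = true;
  bool snip_edges = true;

  // Frame shift and frame length in samples (truncated, as in Kaldi).
  int32_t WindowShift() const {
    return static_cast<int32_t>(samp_freq * 0.001 * frame_shift_ms);
  }
  int32_t WindowSize() const {
    return static_cast<int32_t>(samp_freq * 0.001 * frame_length_ms);
  }
  // FFT size: the smallest power of two >= WindowSize() if requested.
  int32_t PaddedWindowSize() const {
    int32_t n = WindowSize();
    if (!round_to_power_of_two) return n;
    int32_t p = 1;
    while (p < n) p <<= 1;
    return p;
  }
  // Number of frames for a whole file (no streaming).
  int64_t NumFrames(int64_t num_samples) const {
    const int64_t shift = WindowShift(), length = WindowSize();
    if (snip_edges) {
      // Only frames that lie completely inside the signal.
      return num_samples < length ? 0 : 1 + (num_samples - length) / shift;
    }
    // Depends only on the shift; the edges are reflected later.
    return (num_samples + shift / 2) / shift;
  }
};
```

With the defaults, one second of 16 kHz audio gives 400-sample frames with a 160-sample shift, zero-padded to 512 for the FFT. With snip_edges=true this yields 98 frames; with snip_edges=false it yields 100.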

1.2 Related functions in feature-window.cc

1.2.1 ExtractWindow

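ExtractWindow converts the frame length and frame shift into samples using the sampling rate. It then locates the samples of the current frame in the waveform, reflecting at the signal edges when snip_edges=false, and zero-pads the frame to the padded FFT length. Finally it calls ProcessWindow on the frame. Resampling, when allowed by allow_downsample/allow_upsample, is done earlier, before frames are extracted.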
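The sketch below shows how frames are placed and how edges are reflected. Kaldi does have a FirstSampleOfFrame function, but this version is a simplified stand-in with a different signature. ExtractFrame is a hypothetical name. Both functions assume the whole waveform is in memory, so there is no streaming offset.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// First sample of frame f. With snip_edges == false, frames are centred on
// f * shift + shift / 2, so the first frame may start before sample 0.
int64_t FirstSampleOfFrame(int32_t f, int32_t shift, int32_t length, bool snip_edges) {
  if (snip_edges) return static_cast<int64_t>(f) * shift;
  const int64_t midpoint = static_cast<int64_t>(shift) * f + shift / 2;
  return midpoint - length / 2;
}

// Copies `length` samples starting at `start`, reflecting out-of-range indices
// (-1 -> 0, -2 -> 1, n -> n - 1, ...), then zero-pads to `padded_length`.
std::vector<float> ExtractFrame(const std::vector<float> &wave, int64_t start,
                                int32_t length, int32_t padded_length) {
  const int64_t n = static_cast<int64_t>(wave.size());
  assert(n > 0 && padded_length >= length);
  std::vector<float> frame(padded_length, 0.0f);  // the padding stays zero
  for (int32_t i = 0; i < length; ++i) {
    int64_t s = start + i;
    while (s < 0 || s >= n) s = (s < 0) ? -s - 1 : 2 * n - 1 - s;
    frame[i] = wave[s];
  }
  return frame;
}
```

For example, with the defaults and snip_edges=false, frame 0 starts at sample -120. The first 120 samples of that frame are therefore mirrored copies of the beginning of the signal.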

1.2.2 ProcessWindow

ProcessWindow applies the following operations to each frame, in this order:

1.2.2.1 dither

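The default is 1.0. Gaussian random noise, scaled by this value, is added to every sample of the frame. This can be seen as a data perturbation, but feature extraction becomes slower because Gaussian random numbers must be generated. If the data have already been augmented thoroughly, dither can be set to 0.0.
The formula is
$$x_i \leftarrow x_i + g_i \cdot \text{dither}, \qquad g_i \sim \mathcal{N}(0, 1)$$
The code is: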

void ProcessWindow(...){
	... ...
	if (opts.dither != 0.0)
		Dither(window, opts.dither);
	... ...
}
void Dither(VectorBase<BaseFloat> *waveform, BaseFloat dither_value) {
	if (dither_value == 0.0)
		return;
	int32 dim = waveform->Dim();
	BaseFloat *data = waveform->Data();
	RandomState rstate;
	for (int32 i = 0; i < dim; i++)
		data[i] += RandGauss(&rstate) * dither_value;
}
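Below is a standalone version of the dither step. It uses std::mt19937 and std::normal_distribution in place of Kaldi's RandomState/RandGauss, so the random numbers differ from Kaldi's. The function name DitherFrame is hypothetical. Kaldi keeps waveform samples on the 16-bit integer scale rather than in [-1, 1]. As a result, dither=1.0 adds noise of roughly one quantization step.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Adds N(0, 1) * dither_value to every sample; no-op when dither_value == 0.
void DitherFrame(std::vector<float> *frame, float dither_value, std::mt19937 *rng) {
  if (dither_value == 0.0f) return;  // same early exit as Kaldi's Dither()
  std::normal_distribution<float> gauss(0.0f, 1.0f);
  for (float &x : *frame) x += gauss(*rng) * dither_value;
}
```

Passing a seeded generator makes the output reproducible from run to run, which helps when comparing features with and without dithering.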

1.2.2.2 remove_dc_offset

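The default is true: every frame is shifted so that its mean is 0. If the voltage of the recording device is unstable, the recorded signal can drift (a DC offset). With a normal device, the mean over a window is already close to 0. Setting this option to true or false makes only a small difference. In the author's experience, the fbank values differ by less than ±1.
The formula is
$$x_i \leftarrow x_i - \bar{x}, \qquad \bar{x} = \frac{1}{N}\sum_{j=0}^{N-1} x_j$$
where N is the frame length in samples.

The code is: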

void ProcessWindow(...){
	... ...
	if (opts.remove_dc_offset)
		window->Add(-window->Sum() / frame_length);
	... ...
}
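DC-offset removal is just a mean subtraction. The following is a standalone equivalent on std::vector<float>; the name RemoveDcOffset is hypothetical.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Equivalent of window->Add(-window->Sum() / frame_length).
void RemoveDcOffset(std::vector<float> *frame) {
  if (frame->empty()) return;
  const float mean =
      std::accumulate(frame->begin(), frame->end(), 0.0f) / frame->size();
  for (float &x : *frame) x -= mean;
}
```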

1.2.2.3 log_energy_pre_window

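log_energy_pre_window is a BaseFloat* argument of ProcessWindow, not a configuration option, and its default is NULL. When the caller passes a non-NULL pointer, ProcessWindow stores the log energy of the frame, log(max(sum_i x_i^2, epsilon)). This value is computed after dithering and DC removal but before pre-emphasis and windowing. For fbank, the caller passes a pointer only when the raw log energy is needed, that is, with --use-energy=true together with --raw-energy=true.
The code is: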

void ProcessWindow(...){
	... ...
	if (log_energy_pre_window != NULL) {
		BaseFloat energy = std::max<BaseFloat>(VecVec(*window, *window),
					std::numeric_limits<float>::epsilon());
		*log_energy_pre_window = Log(energy);
	}
	... ...
}
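The log-energy computation can be sketched as follows; LogEnergy is a hypothetical name. Because of the epsilon floor, an all-zero frame yields log(FLT_EPSILON) ≈ -15.94 instead of -inf. Such frames can occur with --dither=0.0 on digital silence.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// log(sum_i x_i^2), with the energy floored at float epsilon as in ProcessWindow.
float LogEnergy(const std::vector<float> &frame) {
  float energy = 0.0f;
  for (float x : frame) energy += x * x;
  energy = std::max(energy, std::numeric_limits<float>::epsilon());
  return std::log(energy);
}
```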

1.2.2.4 preemph_coeff

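The default is 0.97, the pre-emphasis coefficient α.
Note that the first sample of the frame is handled specially. It has no predecessor, so it is simply scaled by 1 - α:
$$x_i \leftarrow \begin{cases} x_i - \alpha x_i, & i = 0 \\ x_i - \alpha x_{i-1}, & i \ge 1 \end{cases}$$
Here x_{i-1} is the original (not yet pre-emphasized) value. This is why the code loops from the last sample down to the first.

The code is: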

void ProcessWindow(...){
	... ...
	if (opts.preemph_coeff != 0.0)
		Preemphasize(window, opts.preemph_coeff);
	... ...
}
... ...
void Preemphasize(VectorBase<BaseFloat> *waveform, BaseFloat preemph_coeff) {
	if (preemph_coeff == 0.0) return;
	KALDI_ASSERT(preemph_coeff >= 0.0 && preemph_coeff <= 1.0);
	for (int32 i = waveform->Dim()-1; i > 0; i--)
		(*waveform)(i) -= preemph_coeff * (*waveform)(i-1);
	(*waveform)(0) -= preemph_coeff * (*waveform)(0);
 }
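The following is a standalone pre-emphasis sketch on std::vector<float>. It uses the same logic as the Kaldi function above, including the backward loop and the special first sample; the function name is reused here only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

void Preemphasize(std::vector<float> *x, float coeff) {
  if (coeff == 0.0f || x->empty()) return;
  assert(coeff >= 0.0f && coeff <= 1.0f);
  // Walk backwards so (*x)[i - 1] still holds the original value.
  for (std::size_t i = x->size() - 1; i > 0; --i) (*x)[i] -= coeff * (*x)[i - 1];
  (*x)[0] -= coeff * (*x)[0];  // first sample: x0 * (1 - coeff)
}
```

For {1, 2, 3} with α = 0.97, the result is approximately {0.03, 1.03, 1.06}.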

1.2.2.5 window->MulElements(window_function.window)

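MulElements multiplies the frame element-wise by the window function, window[i] *= w[i]. No FFT is involved at this point; the FFT is computed later, in FbankComputer::Compute. The element-wise product is a single loop, and mul_elements merely unrolls it by a factor of 4.
Here window is a vector, so the call resolves to VectorBase::MulElements in src/matrix/kaldi-vector.cc. The code quoted below is the MatrixBase version from src/matrix/kaldi-matrix.cc, together with the mul_elements helper from src/matrix/cblas-wrappers.h that it calls: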

//----------------------------------------
// in src/matrix/kaldi-matrix.cc
template<typename Real>
void MatrixBase<Real>::MulElements(const MatrixBase<Real> &a) {
	KALDI_ASSERT(a.NumRows() == num_rows_ && a.NumCols() == num_cols_);

	if (num_cols_ == stride_ && num_cols_ == a.stride_) {
		mul_elements(num_rows_ * num_cols_, a.data_, data_);
	} else {
		MatrixIndexT a_stride = a.stride_, stride = stride_;
		Real *data = data_, *a_data = a.data_;
		for (MatrixIndexT i = 0; i < num_rows_; i++) {
			mul_elements(num_cols_, a_data, data);
			a_data += a_stride;
			data += stride;
		}
	}
 }
//-----------------------------------------------
// in src/matrix/cblas-wrappers.h
inline void mul_elements(
			const MatrixIndexT dim,
			const float *a,
			float *b) { // does b *= a, elementwise.
	float c1, c2, c3, c4;
	MatrixIndexT i;
	for (i = 0; i + 4 <= dim; i += 4) {
		c1 = a[i] * b[i];
		c2 = a[i+1] * b[i+1];
		c3 = a[i+2] * b[i+2];
		c4 = a[i+3] * b[i+3];
		b[i] = c1;
		b[i+1] = c2;
		b[i+2] = c3;
		b[i+3] = c4;
	}
	for (; i < dim; i++)
		b[i] *= a[i];
}
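The sketch below checks that windowing is only an element-wise product. It compares a plain loop with a 4-way unrolled loop written in the style of mul_elements, and both give identical results. Both function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// b[i] *= a[i]: this is all that the windowing step does.
void MulElementsPlain(const std::vector<float> &a, std::vector<float> *b) {
  assert(a.size() == b->size());
  for (std::size_t i = 0; i < a.size(); ++i) (*b)[i] *= a[i];
}

// Same result, unrolled by 4 like mul_elements() in cblas-wrappers.h.
void MulElementsUnrolled(const std::vector<float> &a, std::vector<float> *b) {
  assert(a.size() == b->size());
  std::size_t i = 0;
  const std::size_t dim = a.size();
  for (; i + 4 <= dim; i += 4) {
    (*b)[i] *= a[i];
    (*b)[i + 1] *= a[i + 1];
    (*b)[i + 2] *= a[i + 2];
    (*b)[i + 3] *= a[i + 3];
  }
  for (; i < dim; ++i) (*b)[i] *= a[i];  // remaining 0 to 3 elements
}
```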

The window functions used by Kaldi and their formulas (N is the frame length in samples, 0 <= i < N):

| Window | Formula |
| --- | --- |
| hanning | $w_i = 0.5 - 0.5\cos\left(\frac{2\pi i}{N-1}\right)$ |
| sine | $w_i = \sin\left(\frac{\pi i}{N-1}\right)$ |
| hamming | $w_i = 0.54 - 0.46\cos\left(\frac{2\pi i}{N-1}\right)$ |
| povey | $w_i = \left(0.5 - 0.5\cos\left(\frac{2\pi i}{N-1}\right)\right)^{0.85}$ |
| rectangular | $w_i = 1$ |
| blackman | $w_i = \text{blackman\_coeff} - 0.5\cos\left(\frac{2\pi i}{N-1}\right) + (0.5 - \text{blackman\_coeff})\cos\left(\frac{4\pi i}{N-1}\right)$ |

The code is:

void ProcessWindow(...,
				   const FeatureWindowFunction &window_function,
				   ...){
	... ...
	window->MulElements(window_function.window);
}
... ...
FeatureWindowFunction::FeatureWindowFunction(const FrameExtractionOptions &opts) {
	int32 frame_length = opts.WindowSize();
	KALDI_ASSERT(frame_length > 0);
	window.Resize(frame_length);
	double a = M_2PI / (frame_length-1);
	for (int32 i = 0; i < frame_length; i++) {
		double i_fl = static_cast<double>(i);
		if (opts.window_type == "hanning") {
			window(i) = 0.5  - 0.5*cos(a * i_fl);
		} else if (opts.window_type == "sine") {
			// when you are checking ws wikipedia, please
			// note that 0.5 * a = M_PI/(frame_length-1)
			window(i) = sin(0.5 * a * i_fl);
		} else if (opts.window_type == "hamming") {
			window(i) = 0.54 - 0.46*cos(a * i_fl);
		} else if (opts.window_type == "povey") {  // like hamming but goes to zero at edges.
			window(i) = pow(0.5 - 0.5*cos(a * i_fl), 0.85);
		} else if (opts.window_type == "rectangular") {
			window(i) = 1.0;
		} else if (opts.window_type == "blackman") {
			window(i) = opts.blackman_coeff - 0.5*cos(a * i_fl) +
			(0.5 - opts.blackman_coeff) * cos(2 * a * i_fl);
		} else {
			KALDI_ERR << "Invalid window type " << opts.window_type;
		}
	}
}
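The window formulas can be tried outside Kaldi with the sketch below. It follows the FeatureWindowFunction constructor above but returns a std::vector<double>; MakeWindow is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <string>
#include <vector>

std::vector<double> MakeWindow(const std::string &type, int n,
                               double blackman_coeff = 0.42) {
  assert(n > 1);
  const double kPi = 3.14159265358979323846;
  const double a = 2.0 * kPi / (n - 1);  // same "a" as in Kaldi
  std::vector<double> w(n);
  for (int i = 0; i < n; ++i) {
    const double t = a * i;
    if (type == "hanning") w[i] = 0.5 - 0.5 * std::cos(t);
    else if (type == "sine") w[i] = std::sin(0.5 * t);
    else if (type == "hamming") w[i] = 0.54 - 0.46 * std::cos(t);
    else if (type == "povey") w[i] = std::pow(0.5 - 0.5 * std::cos(t), 0.85);
    else if (type == "rectangular") w[i] = 1.0;
    else if (type == "blackman")
      w[i] = blackman_coeff - 0.5 * std::cos(t) +
             (0.5 - blackman_coeff) * std::cos(2.0 * t);
    else throw std::invalid_argument("Invalid window type " + type);
  }
  return w;
}
```

For example, the povey window is 0 at both ends and 1 in the middle, whereas the hamming window stays at 0.08 at the ends.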

2. mel-computations

2.1 mel-computations.h

2.1.1 Default values

struct MelBanksOptions {
	... ...
	explicit MelBanksOptions(int num_bins = 25)
		: num_bins(num_bins), low_freq(20), high_freq(0), vtln_low(100),
		  vtln_high(-500), debug_mel(false), htk_mode(false) {}
	... ...
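| Parameter and default (in code) | Example conf setting | Meaning |
| --- | --- | --- |
| num_bins = 25 | --num-mel-bins=25 | Number of triangular mel filters |
| low_freq(20) | --low-freq=20 | Low cutoff frequency of the filterbank, in Hz |
| high_freq(0) | --high-freq=0 | High cutoff frequency of the filterbank, in Hz. If the value is <= 0, it is an offset from the Nyquist frequency: cutoff = samp_freq / 2 + high_freq |
| vtln_low(100) | --vtln-low=100 | Low inflection point of the piecewise-linear VTLN warping function |
| vtln_high(-500) | --vtln-high=-500 | High inflection point of the piecewise-linear VTLN warping function. If negative, it is an offset from the Nyquist frequency, like high_freq |
| debug_mel(false) | --debug-mel=false | Print debugging information for the mel bin computation |
| htk_mode(false) | (no command-line option) | Reproduces an HTK quirk, for testing only |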

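Note: 0 <= low_freq < vtln_low < vtln_high < high_freq <= samp_freq / 2 (the Nyquist frequency). The low_freq/high_freq constraints are always checked in the MelBanks constructor. The vtln constraints are checked only when a VTLN warp factor other than 1.0 is used.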
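The sketch below shows how the MelBanks constructor resolves the two cutoffs and validates them (compare the constructor code in 2.2.1). ResolveMelRange and MelRange are hypothetical names, and the sketch covers only the low/high cutoff checks, not the VTLN ones.

```cpp
#include <cassert>
#include <stdexcept>

struct MelRange { float low_freq, high_freq; };

// A non-positive high_freq is treated as an offset from the Nyquist frequency.
MelRange ResolveMelRange(float samp_freq, float low_freq, float high_freq) {
  const float nyquist = 0.5f * samp_freq;
  if (high_freq <= 0.0f) high_freq += nyquist;
  if (low_freq < 0.0f || low_freq >= nyquist || high_freq <= 0.0f ||
      high_freq > nyquist || high_freq <= low_freq)
    throw std::invalid_argument("bad low-freq/high-freq");
  return {low_freq, high_freq};
}
```

For 16 kHz audio, --high-freq=0 gives 8000 Hz and --high-freq=-400 gives 7600 Hz. A positive value above 8000 Hz is rejected.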

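2.1.2 Definition of the fbank-related class

InverseMelScale converts a mel frequency to Hz: f = 700 (e^(m / 1127) - 1).
MelScale converts a frequency in Hz to mel: m = 1127 ln(1 + f / 700).
MelBanks holds the mel filterbank and performs the actual fbank computation (Compute).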

class MelBanks {
	public:
		static inline BaseFloat InverseMelScale(BaseFloat mel_freq) {
			return 700.0f * (expf (mel_freq / 1127.0f) - 1.0f);
		}

		static inline BaseFloat MelScale(BaseFloat freq) {
			return 1127.0f * logf (1.0f + freq / 700.0f);
		}
		
		static BaseFloat VtlnWarpFreq(BaseFloat vtln_low_cutoff,
		              BaseFloat vtln_high_cutoff,  // discontinuities in warp func
		              BaseFloat low_freq,
		              BaseFloat high_freq,  // upper+lower frequency cutoffs in
		              // the mel computation
		              BaseFloat vtln_warp_factor,
		              BaseFloat freq);
		
		static BaseFloat VtlnWarpMelFreq(BaseFloat vtln_low_cutoff,
		                 BaseFloat vtln_high_cutoff,
		                 BaseFloat low_freq,
		                 BaseFloat high_freq,
		                 BaseFloat vtln_warp_factor,
		                 BaseFloat mel_freq);
		MelBanks(const MelBanksOptions &opts,
		const FrameExtractionOptions &frame_opts,
		BaseFloat vtln_warp_factor);
		
		/// Compute Mel energies (note: not log enerties).
		/// At input, "fft_energies" contains the FFT energies (not log).
		void Compute(const VectorBase<BaseFloat> &fft_energies,
				VectorBase<BaseFloat> *mel_energies_out) const;
		
		int32 NumBins() const { return bins_.size(); }
		
		// returns vector of central freq of each bin; needed by plp code.
		const Vector<BaseFloat> &GetCenterFreqs() const { return center_freqs_; }
		
		const std::vector<std::pair<int32, Vector<BaseFloat> > >& GetBins() const {
			return bins_;
		}
		// Copy constructor
		MelBanks(const MelBanks &other);
	private:
		// Disallow assignment
		MelBanks &operator = (const MelBanks &other);
		
		// center frequencies of bins, numbered from 0 ... num_bins-1.
		// Needed by GetCenterFreqs().
		Vector<BaseFloat> center_freqs_;
		
		// the "bins_" vector is a vector, one for each bin, of a pair:
		// (the first nonzero fft-bin), (the vector of weights).
		std::vector<std::pair<int32, Vector<BaseFloat> > > bins_;
		
		bool debug_;
		bool htk_mode_;
};
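The two conversions are easy to check numerically with a standalone version, sketched below. The 1127·ln form describes the same curve as the common 2595·log10 form, because 2595 / ln 10 ≈ 1127. As a consequence, 1000 Hz maps to about 1000 mel.

```cpp
#include <cassert>
#include <cmath>

// Same formulas as MelBanks::MelScale / MelBanks::InverseMelScale.
float MelScale(float freq) { return 1127.0f * std::log(1.0f + freq / 700.0f); }
float InverseMelScale(float mel) { return 700.0f * (std::exp(mel / 1127.0f) - 1.0f); }
```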

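2.2 Related functions in mel-computations.cc

2.2.1 Setting up the parameters

The MelBanks constructor first computes low_freq, high_freq, num_fft_bins, fft_bin_width, mel_low_freq, mel_high_freq, mel_freq_delta, vtln_low and vtln_high. These calculations are straightforward:
- num_fft_bins is half the padded window size.
- fft_bin_width is samp_freq divided by the padded window size (16000 / 512 = 31.25 Hz with the defaults).
- mel_freq_delta splits [mel_low_freq, mel_high_freq] into num_bins + 1 equal intervals.

The code is: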

MelBanks::MelBanks(const MelBanksOptions &opts,
                     const FrameExtractionOptions &frame_opts,
                     BaseFloat vtln_warp_factor):
  			htk_mode_(opts.htk_mode) {
 	int32 num_bins = opts.num_bins;
    if (num_bins < 3) KALDI_ERR << "Must have at least 3 mel bins";
    BaseFloat sample_freq = frame_opts.samp_freq;
    int32 window_length_padded = frame_opts.PaddedWindowSize();
    KALDI_ASSERT(window_length_padded % 2 == 0);
    int32 num_fft_bins = window_length_padded / 2;
    BaseFloat nyquist = 0.5 * sample_freq;

    BaseFloat low_freq = opts.low_freq, high_freq;
    if (opts.high_freq > 0.0)
      high_freq = opts.high_freq;
    else
      high_freq = nyquist + opts.high_freq;
 
    if (low_freq < 0.0 || low_freq >= nyquist
        || high_freq <= 0.0 || high_freq > nyquist
        || high_freq <= low_freq)
      KALDI_ERR << "Bad values in options: low-freq " << low_freq
                << " and high-freq " << high_freq << " vs. nyquist "
                << nyquist;
 
    BaseFloat fft_bin_width = sample_freq / window_length_padded;
 		// fft-bin width [think of it as Nyquist-freq / half-window-length]
 
    BaseFloat mel_low_freq = MelScale(low_freq);
    BaseFloat mel_high_freq = MelScale(high_freq);
 
    debug_ = opts.debug_mel;
 
    // divide by num_bins+1 in next line because of end-effects where the bins
    // spread out to the sides.
    BaseFloat mel_freq_delta = (mel_high_freq - mel_low_freq) / (num_bins+1);
 
    BaseFloat vtln_low = opts.vtln_low,
        vtln_high = opts.vtln_high;
    if (vtln_high < 0.0) {
      vtln_high += nyquist;
    }
    ... ...
}
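The bin spacing set up above can be sketched as follows. The mel range is split into num_bins + 1 equal intervals, and bin b is centred on edge b + 1. MelCenterFreqs is a hypothetical helper. It reproduces what center_freqs_ stores (in Hz) when no VTLN warping is used.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

float MelScale(float f) { return 1127.0f * std::log(1.0f + f / 700.0f); }
float InverseMelScale(float m) { return 700.0f * (std::exp(m / 1127.0f) - 1.0f); }

// Centre frequency (Hz) of each mel bin, without VTLN.
std::vector<float> MelCenterFreqs(int num_bins, float low_freq, float high_freq) {
  const float mel_low = MelScale(low_freq), mel_high = MelScale(high_freq);
  const float delta = (mel_high - mel_low) / (num_bins + 1);  // mel_freq_delta
  std::vector<float> centers(num_bins);
  for (int b = 0; b < num_bins; ++b)
    centers[b] = InverseMelScale(mel_low + (b + 1) * delta);  // center_mel
  return centers;
}
```

The centres are equally spaced in mel, so in Hz they crowd together at low frequencies and spread out at high frequencies.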

2.2.2 vtln_warp_factor

This value is passed in by the caller and is normally 1.0, which means no VTLN. Any other value warps the filter edges and therefore changes the filter weights. The warp is applied to the left, center and right edge of each bin. Each mel value is converted to Hz, warped with a piecewise-linear function, and converted back to mel (VtlnWarpMelFreq). The warping function is:
$$\begin{aligned} l &= \text{vtln\_low\_cutoff} \cdot \max(1, \text{vtln\_warp\_factor}) \\ h &= \text{vtln\_high\_cutoff} \cdot \min(1, \text{vtln\_warp\_factor}) \\ \text{scale} &= 1 / \text{vtln\_warp\_factor} \\ \text{scale\_left} &= (\text{scale} \cdot l - \text{low\_freq}) / (l - \text{low\_freq}) \\ \text{scale\_right} &= (\text{high\_freq} - \text{scale} \cdot h) / (\text{high\_freq} - h) \end{aligned}$$

$$\text{freq\_out} = \begin{cases} \text{low\_freq} + \text{scale\_left} \cdot (\text{freq} - \text{low\_freq}), & \text{freq} < l \\ \text{scale} \cdot \text{freq}, & l \le \text{freq} < h \\ \text{high\_freq} + \text{scale\_right} \cdot (\text{freq} - \text{high\_freq}), & \text{freq} \ge h \end{cases}$$
Frequencies outside [low_freq, high_freq] are returned unchanged. The code is:

MelBanks::MelBanks(...):
  			htk_mode_(opts.htk_mode) {
	... ...
	for (int32 bin = 0; bin < num_bins; bin++) {
		... ...
		if (vtln_warp_factor != 1.0) {
	   			left_mel = VtlnWarpMelFreq(vtln_low, vtln_high, low_freq, high_freq,
	                              vtln_warp_factor, left_mel);
	   			center_mel = VtlnWarpMelFreq(vtln_low, vtln_high, low_freq, high_freq,
	                              vtln_warp_factor, center_mel);
	   			right_mel = VtlnWarpMelFreq(vtln_low, vtln_high, low_freq, high_freq,
	                              vtln_warp_factor, right_mel);
		}
		... ...
	}
	... ...
}
... ...
BaseFloat MelBanks::VtlnWarpMelFreq(...) {
	return MelScale(VtlnWarpFreq(vtln_low_cutoff, vtln_high_cutoff,
								low_freq, high_freq,
								vtln_warp_factor, InverseMelScale(mel_freq)));
}
... ...
BaseFloat MelBanks::VtlnWarpFreq(...){
	if (freq < low_freq || freq > high_freq) return freq;  // in case this gets called
	// for out-of-range frequencies, just return the freq.
	KALDI_ASSERT(vtln_low_cutoff > low_freq &&
		"be sure to set the --vtln-low option higher than --low-freq");
	KALDI_ASSERT(vtln_high_cutoff < high_freq &&
		"be sure to set the --vtln-high option lower than --high-freq [or negative]");
	BaseFloat one = 1.0;
    BaseFloat l = vtln_low_cutoff * std::max(one, vtln_warp_factor);
    BaseFloat h = vtln_high_cutoff * std::min(one, vtln_warp_factor);
    BaseFloat scale = 1.0 / vtln_warp_factor;
    BaseFloat Fl = scale * l;  // F(l);
    BaseFloat Fh = scale * h;  // F(h);
    KALDI_ASSERT(l > low_freq && h < high_freq);
    // slope of left part of the 3-piece linear function
    BaseFloat scale_left = (Fl - low_freq) / (l - low_freq);
    // [slope of center part is just "scale"]

    // slope of right part of the 3-piece linear function
    BaseFloat scale_right = (high_freq - Fh) / (high_freq - h);
    
    if (freq < l) {
    	return low_freq + scale_left * (freq - low_freq);
	} else if (freq < h) {
		return scale * freq;
	} else {  // freq >= h
		return high_freq + scale_right * (freq - high_freq);
	}
}
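Below is a standalone restatement of VtlnWarpFreq using plain float. It is handy for checking the three-segment warp. With a warp factor of 1.0 it is the identity, and for any factor low_freq and high_freq map to themselves.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

float VtlnWarpFreq(float vtln_low_cutoff, float vtln_high_cutoff,
                   float low_freq, float high_freq, float warp, float freq) {
  if (freq < low_freq || freq > high_freq) return freq;  // out of range: unchanged
  const float l = vtln_low_cutoff * std::max(1.0f, warp);
  const float h = vtln_high_cutoff * std::min(1.0f, warp);
  const float scale = 1.0f / warp;
  const float Fl = scale * l, Fh = scale * h;  // F(l), F(h)
  assert(l > low_freq && h < high_freq);
  const float scale_left = (Fl - low_freq) / (l - low_freq);      // left slope
  const float scale_right = (high_freq - Fh) / (high_freq - h);   // right slope
  if (freq < l) return low_freq + scale_left * (freq - low_freq);
  if (freq < h) return scale * freq;
  return high_freq + scale_right * (freq - high_freq);
}
```

In the example values, vtln_high = 7500 corresponds to the default --vtln-high=-500 at 16 kHz.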

2.2.3 Regular (unwarped) mel bins

If vtln_warp_factor == 1.0, the bin edges are not warped, and the result is the usual set of triangular mel filters. The edges are equally spaced on the mel scale, f_j = mel_low_freq + j · mel_freq_delta, and bin b (counting from 0) uses m = b + 1. The mel value of FFT bin k is f_k = MelScale(k · fft_bin_width). The triangles are therefore linear in mel, not in Hz:
$$\text{Bin}_m[k] = \begin{cases} 0, & f_k \le f_{m-1} \\ (f_k - f_{m-1}) / (f_m - f_{m-1}), & f_{m-1} < f_k \le f_m \\ (f_{m+1} - f_k) / (f_{m+1} - f_m), & f_m < f_k < f_{m+1} \\ 0, & f_k \ge f_{m+1} \end{cases}$$
The code is:

MelBanks::MelBanks(...):
  			htk_mode_(opts.htk_mode) {
	... ...
	for (int32 bin = 0; bin < num_bins; bin++) {
		BaseFloat left_mel = mel_low_freq + bin * mel_freq_delta,
		center_mel = mel_low_freq + (bin + 1) * mel_freq_delta,
		right_mel = mel_low_freq + (bin + 2) * mel_freq_delta;
		if (vtln_warp_factor != 1.0) {
	   			... ...
		}
		center_freqs_(bin) = InverseMelScale(center_mel);
		// this_bin will be a vector of coefficients that is only
		// nonzero where this mel bin is active.
		Vector<BaseFloat> this_bin(num_fft_bins);
	     int32 first_index = -1, last_index = -1;
	     for (int32 i = 0; i < num_fft_bins; i++) {
	     	BaseFloat freq = (fft_bin_width * i); 
	     	// Center frequency of this fft bin.
			BaseFloat mel = MelScale(freq);
			if (mel > left_mel && mel < right_mel) {
				BaseFloat weight;
				if (mel <= center_mel)
					weight = (mel - left_mel) / (center_mel - left_mel);
				else
					weight = (right_mel-mel) / (right_mel-center_mel);
				this_bin(i) = weight;
				if (first_index == -1)
					first_index = i;
				last_index = i;
			}
		}
		KALDI_ASSERT(first_index != -1 && last_index >= first_index
				&& "You may have set --num-mel-bins too large.");
		
		bins_[bin].first = first_index;
		int32 size = last_index + 1 - first_index;
		bins_[bin].second.Resize(size);
		bins_[bin].second.CopyFromVec(this_bin.Range(first_index, size));

		// Replicate a bug in HTK, for testing purposes.
		if (opts.htk_mode && bin == 0 && mel_low_freq != 0.0)
			bins_[bin].second(0) = 0.0;
			
	}
	... ...
}
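The sketch below puts the pieces of this section together. It builds the (first_index, weights) pairs that bins_ stores in the unwarped case. MakeMelBins is a hypothetical name, and htk_mode and VTLN are omitted. Because the mel value increases with the FFT index, the nonzero weights of each bin are contiguous, which is what allows storing only a start index plus a dense weight vector.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

float MelScale(float f) { return 1127.0f * std::log(1.0f + f / 700.0f); }

std::vector<std::pair<int, std::vector<float>>> MakeMelBins(
    int num_bins, float samp_freq, int padded_window, float low_freq, float high_freq) {
  const int num_fft_bins = padded_window / 2;
  const float fft_bin_width = samp_freq / padded_window;
  const float mel_low = MelScale(low_freq), mel_high = MelScale(high_freq);
  const float delta = (mel_high - mel_low) / (num_bins + 1);
  std::vector<std::pair<int, std::vector<float>>> bins(num_bins);
  for (int b = 0; b < num_bins; ++b) {
    const float left = mel_low + b * delta;
    const float center = mel_low + (b + 1) * delta;
    const float right = mel_low + (b + 2) * delta;
    int first = -1;
    std::vector<float> weights;
    for (int i = 0; i < num_fft_bins; ++i) {
      const float mel = MelScale(fft_bin_width * i);  // centre of FFT bin i
      if (mel > left && mel < right) {
        if (first == -1) first = i;
        // Rising edge up to the centre, falling edge after it (linear in mel).
        weights.push_back(mel <= center ? (mel - left) / (center - left)
                                        : (right - mel) / (right - center));
      }
    }
    assert(first != -1 && "num_bins may be too large for this FFT size");
    bins[b] = std::make_pair(first, std::move(weights));
  }
  return bins;
}
```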

3. feature-fbank

3.1 Default values in feature-fbank.h

struct FbankOptions {
	... ...
	FbankOptions(): mel_opts(23),
                   use_energy(false),
                   energy_floor(0.0),
                   raw_energy(true),
                   htk_compat(false),
                   use_log_fbank(true),
                   use_power(true) {}
 	... ...
 }
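| Parameter and default (in code) | Example conf setting | Meaning |
| --- | --- | --- |
| mel_opts(23) | --num-mel-bins=23 | Number of triangular mel filters. FbankOptions overrides the MelBanksOptions default of 25 with 23 |
| use_energy(false) | --use-energy=false | Add an energy dimension to the fbank output (placed first unless --htk-compat=true) |
| energy_floor(0.0) | --energy-floor=0.0 | Floor on the (log) energy; only has an effect when --use-energy=true. It is mainly needed with --dither=0.0, because otherwise silent all-zero frames produce an extremely small log energy |
| raw_energy(true) | --raw-energy=true | Compute the energy before pre-emphasis and windowing |
| htk_compat(false) | --htk-compat=false | If true, put the energy last instead of first. On its own this is not enough to obtain HTK-compatible features; other options must be changed as well |
| use_log_fbank(true) | --use-log-fbank=true | Take the log of the fbank features |
| use_power(true) | --use-power=true | If true, use the power spectrum; if false, use the magnitude spectrum |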
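The options use_power and use_log_fbank act on each windowed frame after the FFT. The sketch below shows those steps under several assumptions:
- A naive O(N²) DFT replaces Kaldi's SplitRadixRealFft/RealFft.
- The mel weights use the same (first index, weights) layout as bins_.
- The energy dimension is omitted.

All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <utility>
#include <vector>

// (first FFT bin, weights) per mel bin, same layout as MelBanks::bins_.
using MelBins = std::vector<std::pair<int, std::vector<float>>>;

// Naive DFT power spectrum for bins 0..N/2 (stand-in for the real FFT).
std::vector<float> PowerSpectrum(const std::vector<float> &frame) {
  const std::size_t n = frame.size();
  const double kPi = 3.14159265358979323846;
  std::vector<float> power(n / 2 + 1);
  for (std::size_t k = 0; k <= n / 2; ++k) {
    double re = 0.0, im = 0.0;
    for (std::size_t t = 0; t < n; ++t) {
      const double angle = 2.0 * kPi * static_cast<double>(k * t) / static_cast<double>(n);
      re += frame[t] * std::cos(angle);
      im -= frame[t] * std::sin(angle);
    }
    power[k] = static_cast<float>(re * re + im * im);
  }
  return power;
}

// Mel energies for one windowed, zero-padded frame.
std::vector<float> FbankFromFrame(const std::vector<float> &frame, const MelBins &bins,
                                  bool use_power, bool use_log_fbank) {
  std::vector<float> spec = PowerSpectrum(frame);
  if (!use_power)
    for (float &p : spec) p = std::sqrt(p);  // magnitude instead of power
  std::vector<float> out(bins.size());
  for (std::size_t b = 0; b < bins.size(); ++b) {
    const int offset = bins[b].first;
    const std::vector<float> &w = bins[b].second;
    assert(offset >= 0 && offset + w.size() <= spec.size());
    float energy = 0.0f;
    for (std::size_t j = 0; j < w.size(); ++j) energy += w[j] * spec[offset + j];
    if (use_log_fbank)  // floor at float epsilon to avoid log(0)
      energy = std::log(std::max(energy, std::numeric_limits<float>::epsilon()));
    out[b] = energy;
  }
  return out;
}
```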

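3.2 Related functions in feature-fbank.cc

3.2.1 FbankComputer

The FbankComputer constructor does three things:
- It converts energy_floor to the log domain.
- It creates a split-radix FFT object when the padded window size is a power of two, using the test (n & (n - 1)) == 0. Otherwise, a general real FFT is used when features are computed.
- It builds the mel filterbank for warp factor 1.0 through GetMelBanks, which caches it.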

FbankComputer::FbankComputer(const FbankOptions &opts):
		opts_(opts), srfft_(NULL) {
	if (opts.energy_floor > 0.0)
		log_energy_floor_ = Log(opts.energy_floor);
		
	int32 padded_window_size = opts.frame_opts.PaddedWindowSize();
	if ((padded_window_size & (padded_window_size-1)) == 0)  // Is a power of two...
		srfft_ = new SplitRadixRealFft<BaseFloat>(padded_window_size);

	// We'll definitely need the filterbanks info for VTLN warping factor 1.0.
	// [note: this call caches it.]
	GetMelBanks(1.0);
}

3.2.2 GetMelBanks

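The constructor calls GetMelBanks(1.0) with vtln_warp = 1.0, which means no VTLN warping (see 2.2.2 vtln_warp_factor). GetMelBanks does not compute features. It returns the MelBanks filterbank for a given warp factor, and it constructs the filterbank with new MelBanks only on the first request for that factor. The result is cached in the mel_banks_ map, so each warp factor is built only once. The per-frame fbank values are then computed by MelBanks::Compute; the MelBanks class is declared in src/feat/mel-computations.h.
The code is: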

const MelBanks* FbankComputer::GetMelBanks(BaseFloat vtln_warp) {
	MelBanks *this_mel_banks = NULL;
	std::map<BaseFloat, MelBanks*>::iterator iter = mel_banks_.find(vtln_warp);
	if (iter == mel_banks_.end()) {
		this_mel_banks = new MelBanks(opts_.mel_opts,
		                              opts_.frame_opts,
		                              vtln_warp);
		mel_banks_[vtln_warp] = this_mel_banks;
	} else {
		this_mel_banks = iter->second;
	}
	return this_mel_banks;
}
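The lookup-or-create caching of GetMelBanks can be sketched with std::unique_ptr ownership instead of raw pointers. FakeMelBanks and MelBanksCache are hypothetical stand-ins. Each distinct warp factor is built once and then reused, which matters when, for example, VTLN warp factors differ between speakers.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical stand-in for MelBanks: only remembers its warp factor.
struct FakeMelBanks {
  explicit FakeMelBanks(float w) : warp(w) {}
  float warp;
};

class MelBanksCache {
 public:
  // Returns the cached object for vtln_warp, creating it on first use.
  const FakeMelBanks *Get(float vtln_warp) {
    auto it = cache_.find(vtln_warp);
    if (it == cache_.end())
      it = cache_.emplace(vtln_warp, std::make_unique<FakeMelBanks>(vtln_warp)).first;
    return it->second.get();
  }
  std::size_t Size() const { return cache_.size(); }

 private:
  std::map<float, std::unique_ptr<FakeMelBanks>> cache_;  // owns the objects
};
```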

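4. Summary

In Kaldi, the individual steps are usually run by shell scripts. These scripts call the compiled command-line programs and pass the parameters to them. General configuration files live under egs/*/s5/conf/. For fbank extraction, the options are typically placed in conf/fbank.conf and then passed with compute-fbank-feats --config=$fbank_config.
For example, a typical fbank.conf for extracting 40 fbank features from 16000 Hz audio contains: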

--num-mel-bins=40
--sample-frequency=16000

This is equivalent to the following full set of options:

--num-mel-bins=40
--sample-frequency=16000

--use-energy=false
--energy-floor=0.0
--raw-energy=true
--htk-compat=false
--use-log-fbank=true
--use-power=true
--low-freq=20
--high-freq=0
--debug-mel=false
--frame-shift=10
--frame-length=25
--dither=1.0
--preemphasis-coefficient=0.97
--remove-dc-offset=true
--window-type=povey
--round-to-power-of-two=true
--snip-edges=true
--allow-downsample=false
--allow-upsample=false
--max-feature-vectors=-1
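Copyright notice: this is an original article by c12345678999, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original article and this notice.
Article link: https://blog.csdn.net/c12345678999/article/details/121850138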
