In Kaldi, the relevant functions are under the src/feat directory.
struct FrameExtractionOptions {
... ...
FrameExtractionOptions():
samp_freq(16000),
frame_shift_ms(10.0),
frame_length_ms(25.0),
dither(1.0),
preemph_coeff(0.97),
remove_dc_offset(true),
window_type("povey"),
round_to_power_of_two(true),
blackman_coeff(0.42),
snip_edges(true),
allow_downsample(false),
allow_upsample(false),
max_feature_vectors(-1)
... ...
}
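These parameters can be set in a conf file; any parameter that is not set keeps its default value.
| Parameter and default in the code | Example setting in conf | Meaning |
|---|---|---|
| samp_freq(16000) | --sample-frequency=16000 | Sample rate used when processing the audio, in Hz |
| frame_shift_ms(10.0) | --frame-shift=10 | Frame shift, in ms |
| frame_length_ms(25.0) | --frame-length=25 | Frame (window) length, in ms |
| dither(1.0) | --dither=1.0 | Coefficient of the random noise added to each frame. Used in training as a form of perturbation, at the cost of longer feature extraction time; set to 0.0 to disable. Default is 1.0 |
| preemph_coeff(0.97) | --preemphasis-coefficient=0.97 | Pre-emphasis coefficient |
| remove_dc_offset(true) | --remove-dc-offset=true | Shift each frame so that its mean is 0; set to false to keep the raw signal unchanged |
| window_type("povey") | --window-type=povey | Window function, one of hamming, hanning, povey, rectangular, sine, blackman; povey is the window Dan Povey designed himself |
| round_to_power_of_two(true) | --round-to-power-of-two=true | Zero-pad each frame to a power-of-two length for the FFT |
| blackman_coeff(0.42) | --blackman-coeff=0.42 | Coefficient used when the window function is blackman |
| snip_edges(true) | --snip-edges=true | If true, only frames that fit entirely inside the file are output, so the number of frames depends on the frame length. If false, the number of frames depends only on the frame shift, and the data is reflected at the ends |
| allow_downsample(false) | --allow-downsample=false | Whether downsampling is allowed. If true, the input is downsampled to the rate given by sample-frequency |
| allow_upsample(false) | --allow-upsample=false | Whether upsampling is allowed |
| max_feature_vectors(-1) | --max-feature-vectors=-1 | Memory optimization. If greater than 0, feature vectors are periodically deleted so that only this many of the most recent ones are kept. Mainly guards against out-of-memory errors that would abort feature extraction |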
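Frame extraction mainly computes the window length, the window shift, and so on from the sample rate and the downsampling settings, and then calls ProcessWindow on each frame.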
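The window arithmetic described above can be sketched as follows. This is a minimal illustration, not Kaldi's code: the struct FrameOpts and the free functions are hypothetical stand-ins for the corresponding FrameExtractionOptions members, and the frame-counting rule is my reading of the snip_edges description in the table.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the FrameExtractionOptions fields used here.
struct FrameOpts {
  double samp_freq = 16000.0;
  double frame_shift_ms = 10.0;
  double frame_length_ms = 25.0;
  bool round_to_power_of_two = true;
  bool snip_edges = true;
};

// Number of samples between the starts of consecutive frames.
inline int32_t WindowShift(const FrameOpts &o) {
  return static_cast<int32_t>(o.samp_freq * 0.001 * o.frame_shift_ms);
}

// Number of samples in one frame.
inline int32_t WindowSize(const FrameOpts &o) {
  return static_cast<int32_t>(o.samp_freq * 0.001 * o.frame_length_ms);
}

// Frame length after zero-padding for the FFT.
inline int32_t PaddedWindowSize(const FrameOpts &o) {
  const int32_t n = WindowSize(o);
  assert(n > 0);
  if (!o.round_to_power_of_two) return n;
  int32_t p = 1;
  while (p < n) p <<= 1;  // smallest power of two >= n
  return p;
}

// Number of frames produced for a waveform of num_samples samples.
inline int64_t NumFrames(int64_t num_samples, const FrameOpts &o) {
  const int64_t shift = WindowShift(o), length = WindowSize(o);
  if (o.snip_edges) {
    // Only frames that fit completely inside the waveform.
    if (num_samples < length) return 0;
    return 1 + (num_samples - length) / shift;
  }
  // Frame count depends on the shift only; edges are handled by reflection.
  return (num_samples + shift / 2) / shift;
}
```

With the defaults, one second of 16 kHz audio has 160-sample shifts and 400-sample frames padded to 512, giving 98 frames with snip_edges=true and 100 with snip_edges=false.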
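ProcessWindow performs the per-frame operations described below.
dither defaults to 1.0 and is the coefficient of the Gaussian random noise added to every sample of a frame. It can be viewed as data perturbation, but feature extraction takes longer because the Gaussian random numbers have to be generated. If the data has already been augmented thoroughly, it can be set to 0.0.
The formula is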
$$x_{i} = x_{i} + \mathrm{Gauss} * \mathrm{dither}$$
The relevant code is:
void ProcessWindow(...){
... ...
if (opts.dither != 0.0)
Dither(window, opts.dither);
... ...
}
void Dither(VectorBase<BaseFloat> *waveform, BaseFloat dither_value) {
if (dither_value == 0.0)
return;
int32 dim = waveform->Dim();
BaseFloat *data = waveform->Data();
RandomState rstate;
for (int32 i = 0; i < dim; i++)
data[i] += RandGauss(&rstate) * dither_value;
}
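For experimenting outside Kaldi, the same idea can be sketched with the standard library. The function name DitherSketch, the explicit seed parameter, and the use of std::mt19937 are assumptions of this sketch; Kaldi uses its own RandGauss and RandomState as shown above.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Adds zero-mean, unit-variance Gaussian noise scaled by dither_value
// to every sample. A dither_value of 0.0 leaves the data untouched.
inline void DitherSketch(std::vector<float> *waveform, float dither_value,
                         unsigned seed) {
  assert(waveform != nullptr);
  if (dither_value == 0.0f) return;
  std::mt19937 rng(seed);  // assumption: any standard engine would do
  std::normal_distribution<float> gauss(0.0f, 1.0f);
  for (float &x : *waveform) x += gauss(rng) * dither_value;
}
```

Fixing the seed makes a run reproducible, which can be handy when comparing features with and without dithering.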
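remove_dc_offset defaults to true and controls whether the samples of each frame are shifted so that their mean is 0. If the voltage of the recording device is unstable, the recorded audio can show a DC drift. With a normal device, the mean of the samples within a window is already close to 0. The two settings give slightly different results, but the difference is small; in my experience it stays within ±1 for fbank features.
The formula is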
$$x_{i} = x_{i} - \overline{x}$$
The relevant code is:
void ProcessWindow(...){
... ...
if (opts.remove_dc_offset)
window->Add(-window->Sum() / frame_length);
... ...
}
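A small standalone version of the mean subtraction makes the effect easy to check by hand. RemoveDcOffset is a hypothetical helper written for this illustration, operating on a std::vector instead of Kaldi's vector type.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Subtracts the frame mean from every sample so the frame has zero mean.
inline void RemoveDcOffset(std::vector<float> *window) {
  assert(window != nullptr && !window->empty());
  const float sum = std::accumulate(window->begin(), window->end(), 0.0f);
  const float mean = sum / static_cast<float>(window->size());
  for (float &x : *window) x -= mean;
}
```

For the frame {1, 2, 3} the mean is 2, so the result is {-1, 0, 1}.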
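log_energy_pre_window is a pointer that is NULL by default. When it is non-NULL, the log energy of the samples in the window is computed at this point, before pre-emphasis and windowing. It has no command-line option of its own: with the default options (use_energy=false) it stays NULL and nothing is computed here; the caller passes a non-NULL pointer only when the raw log energy is needed, that is, when both --use-energy and --raw-energy are true.
The relevant code is: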
void ProcessWindow(...){
... ...
if (log_energy_pre_window != NULL) {
BaseFloat energy = std::max<BaseFloat>(VecVec(*window, *window),
std::numeric_limits<float>::epsilon());
*log_energy_pre_window = Log(energy);
}
... ...
}
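The flooring step matters for all-zero frames, where the log would otherwise be minus infinity. A minimal sketch, with LogEnergy as a hypothetical helper name:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Log of the sum of squares of the frame, floored at float epsilon
// so that a silent (all-zero) frame still yields a finite value.
inline float LogEnergy(const std::vector<float> &window) {
  assert(!window.empty());
  float energy = 0.0f;
  for (float x : window) energy += x * x;
  energy = std::max(energy, std::numeric_limits<float>::epsilon());
  return std::log(energy);
}
```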
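preemph_coeff defaults to 0.97 and is the pre-emphasis coefficient.
Note: the first sample of each frame (i = 0) is handled specially, because it has no predecessor inside the frame. The formula is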
$$x_{i}=\left\{ \begin{aligned} &x_{i} - \alpha * x_{i} & &i=0\\ &x_{i} - \alpha * x_{i-1} & &i \ge 1 \end{aligned} \right.$$
The relevant code is:
void ProcessWindow(...){
... ...
if (opts.preemph_coeff != 0.0)
Preemphasize(window, opts.preemph_coeff);
... ...
}
... ...
void Preemphasize(VectorBase<BaseFloat> *waveform, BaseFloat preemph_coeff) {
if (preemph_coeff == 0.0) return;
KALDI_ASSERT(preemph_coeff >= 0.0 && preemph_coeff <= 1.0);
for (int32 i = waveform->Dim()-1; i > 0; i--)
(*waveform)(i) -= preemph_coeff * (*waveform)(i-1);
(*waveform)(0) -= preemph_coeff * (*waveform)(0);
}
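The loop runs from the last sample down to the first so that each update still reads the original value of its predecessor. A standalone sketch with a small worked example (PreemphasizeSketch is a hypothetical name, and std::vector replaces Kaldi's vector type):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-place pre-emphasis: x[i] -= coeff * x[i-1], with x[0] -= coeff * x[0].
inline void PreemphasizeSketch(std::vector<float> *w, float coeff) {
  assert(w != nullptr && coeff >= 0.0f && coeff <= 1.0f);
  if (w->empty() || coeff == 0.0f) return;
  // Iterate backwards so (*w)[i - 1] has not been modified yet.
  for (std::size_t i = w->size() - 1; i > 0; --i)
    (*w)[i] -= coeff * (*w)[i - 1];
  (*w)[0] -= coeff * (*w)[0];  // first sample uses itself as predecessor
}
```

With the frame {1, 2, 3} and a coefficient of 0.5, the result is {0.5, 1.5, 2.0}: 3 - 0.5*2 = 2.0, 2 - 0.5*1 = 1.5, and 1 - 0.5*1 = 0.5.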
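MulElements multiplies the time-domain frame by the window function element by element; it does not perform the FFT. The mul_elements kernel is a single loop that is unrolled four elements at a time.
The code shown here is in src/matrix/kaldi-matrix.cc and src/matrix/cblas-wrappers.h. Since window is a vector, ProcessWindow calls the vector version of MulElements (in src/matrix/kaldi-vector.cc), which computes the same elementwise product.
The functions are as follows (mul_elements is in src/matrix/cblas-wrappers.h):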
//----------------------------------------
// in src/matrix/kaldi-matrix.cc
template<typename Real>
void MatrixBase<Real>::MulElements(const MatrixBase<Real> &a) {
KALDI_ASSERT(a.NumRows() == num_rows_ && a.NumCols() == num_cols_);
if (num_cols_ == stride_ && num_cols_ == a.stride_) {
mul_elements(num_rows_ * num_cols_, a.data_, data_);
} else {
MatrixIndexT a_stride = a.stride_, stride = stride_;
Real *data = data_, *a_data = a.data_;
for (MatrixIndexT i = 0; i < num_rows_; i++) {
mul_elements(num_cols_, a_data, data);
a_data += a_stride;
data += stride;
}
}
}
//-----------------------------------------------
// in src/matrix/cblas-wrappers.h
inline void mul_elements(
const MatrixIndexT dim,
const float *a,
float *b) { // does b *= a, elementwise.
float c1, c2, c3, c4;
MatrixIndexT i;
for (i = 0; i + 4 <= dim; i += 4) {
c1 = a[i] * b[i];
c2 = a[i+1] * b[i+1];
c3 = a[i+2] * b[i+2];
c4 = a[i+3] * b[i+3];
b[i] = c1;
b[i+1] = c2;
b[i+2] = c3;
b[i+3] = c4;
}
for (; i < dim; i++)
b[i] *= a[i];
}
The window functions used in Kaldi and their formulas are listed below, where N is the frame length and 0 <= i < N:
| Window | Formula |
|---|---|
| hanning | $w_i = 0.5 - 0.5\cos(2\pi i/(N-1))$ |
| sine | $w_i = \sin(\pi i/(N-1))$ |
| hamming | $w_i = 0.54 - 0.46\cos(2\pi i/(N-1))$ |
| povey | $w_i = (0.5 - 0.5\cos(2\pi i/(N-1)))^{0.85}$ |
| rectangular | $w_i = 1$ |
| blackman | $w_i = blackman\_coeff - 0.5\cos(2\pi i/(N-1)) + (0.5 - blackman\_coeff)\cos(4\pi i/(N-1))$ |
The relevant code is:
void ProcessWindow(...,
const FeatureWindowFunction &window_function,
...){
... ...
window->MulElements(window_function.window);
}
... ...
FeatureWindowFunction::FeatureWindowFunction(const FrameExtractionOptions &opts) {
int32 frame_length = opts.WindowSize();
KALDI_ASSERT(frame_length > 0);
window.Resize(frame_length);
double a = M_2PI / (frame_length-1);
for (int32 i = 0; i < frame_length; i++) {
double i_fl = static_cast<double>(i);
if (opts.window_type == "hanning") {
window(i) = 0.5 - 0.5*cos(a * i_fl);
} else if (opts.window_type == "sine") {
// when you are checking ws wikipedia, please
// note that 0.5 * a = M_PI/(frame_length-1)
window(i) = sin(0.5 * a * i_fl);
} else if (opts.window_type == "hamming") {
window(i) = 0.54 - 0.46*cos(a * i_fl);
} else if (opts.window_type == "povey") { // like hamming but goes to zero at edges.
window(i) = pow(0.5 - 0.5*cos(a * i_fl), 0.85);
} else if (opts.window_type == "rectangular") {
window(i) = 1.0;
} else if (opts.window_type == "blackman") {
window(i) = opts.blackman_coeff - 0.5*cos(a * i_fl) +
(0.5 - opts.blackman_coeff) * cos(2 * a * i_fl);
} else {
KALDI_ERR << "Invalid window type " << opts.window_type;
}
}
}
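To compare the windows numerically, the formulas from the table can be evaluated in a few lines. MakeWindow is a hypothetical helper written for this illustration; it follows the same formulas as the constructor above but returns a std::vector.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Evaluates one of the window formulas for a frame of n samples.
inline std::vector<double> MakeWindow(const std::string &type, int n,
                                      double blackman_coeff = 0.42) {
  assert(n > 1);
  const double kTwoPi = 6.283185307179586;
  const double a = kTwoPi / (n - 1);
  std::vector<double> w(n);
  for (int i = 0; i < n; ++i) {
    const double c = std::cos(a * i);
    if (type == "hanning") w[i] = 0.5 - 0.5 * c;
    else if (type == "sine") w[i] = std::sin(0.5 * a * i);
    else if (type == "hamming") w[i] = 0.54 - 0.46 * c;
    else if (type == "povey") w[i] = std::pow(0.5 - 0.5 * c, 0.85);
    else if (type == "rectangular") w[i] = 1.0;
    else if (type == "blackman")
      w[i] = blackman_coeff - 0.5 * c +
             (0.5 - blackman_coeff) * std::cos(2 * a * i);
    else assert(false && "unknown window type");
  }
  return w;
}
```

The first sample shows the difference noted in the code comment: the povey window starts at 0, while the hamming window starts at 0.54 - 0.46 = 0.08.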
struct MelBanksOptions {
... ...
explicit MelBanksOptions(int num_bins = 25)
: num_bins(num_bins), low_freq(20), high_freq(0), vtln_low(100),
vtln_high(-500), debug_mel(false), htk_mode(false) {}
... ...
};
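| Parameter and default in the code | Example setting in conf | Meaning |
|---|---|---|
| num_bins = 25 | --num-mel-bins=25 | Number of triangular mel filters |
| low_freq(20) | --low-freq=20 | Lower cutoff frequency of the filterbank |
| high_freq(0) | --high-freq=0 | Upper cutoff frequency of the filterbank. Note: if the value is <= 0, the cutoff is the Nyquist frequency (half the sample rate) plus high_freq |
| vtln_low(100) | --vtln-low=100 | Low inflection point of the piecewise-linear VTLN warping function |
| vtln_high(-500) | --vtln-high=-500 | High inflection point of the piecewise-linear VTLN warping function. Note: if the value is < 0, it is offset from the Nyquist frequency in the same way as high_freq |
| debug_mel(false) | --debug-mel=false | Print debugging information for the mel bin computation |
| htk_mode(false) | no command-line option | Replicates an HTK bug in the first bin, for testing purposes |
Note: 0 <= low_freq < vtln_low < vtln_high < high_freq <= samp_freq / 2
InverseMelScale converts a mel-scale frequency to Hz.
MelScale converts a frequency in Hz to the mel scale.
MelBanks builds the mel filterbank and applies it to the FFT energies.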
class MelBanks {
public:
static inline BaseFloat InverseMelScale(BaseFloat mel_freq) {
return 700.0f * (expf (mel_freq / 1127.0f) - 1.0f);
}
static inline BaseFloat MelScale(BaseFloat freq) {
return 1127.0f * logf (1.0f + freq / 700.0f);
}
static BaseFloat VtlnWarpFreq(BaseFloat vtln_low_cutoff,
BaseFloat vtln_high_cutoff, // discontinuities in warp func
BaseFloat low_freq,
BaseFloat high_freq, // upper+lower frequency cutoffs in
// the mel computation
BaseFloat vtln_warp_factor,
BaseFloat freq);
static BaseFloat VtlnWarpMelFreq(BaseFloat vtln_low_cutoff,
BaseFloat vtln_high_cutoff,
BaseFloat low_freq,
BaseFloat high_freq,
BaseFloat vtln_warp_factor,
BaseFloat mel_freq);
MelBanks(const MelBanksOptions &opts,
const FrameExtractionOptions &frame_opts,
BaseFloat vtln_warp_factor);
/// Compute Mel energies (note: not log enerties).
/// At input, "fft_energies" contains the FFT energies (not log).
void Compute(const VectorBase<BaseFloat> &fft_energies,
VectorBase<BaseFloat> *mel_energies_out) const;
int32 NumBins() const { return bins_.size(); }
// returns vector of central freq of each bin; needed by plp code.
const Vector<BaseFloat> &GetCenterFreqs() const { return center_freqs_; }
const std::vector<std::pair<int32, Vector<BaseFloat> > >& GetBins() const {
return bins_;
}
// Copy constructor
MelBanks(const MelBanks &other);
private:
// Disallow assignment
MelBanks &operator = (const MelBanks &other);
// center frequencies of bins, numbered from 0 ... num_bins-1.
// Needed by GetCenterFreqs().
Vector<BaseFloat> center_freqs_;
// the "bins_" vector is a vector, one for each bin, of a pair:
// (the first nonzero fft-bin), (the vector of weights).
std::vector<std::pair<int32, Vector<BaseFloat> > > bins_;
bool debug_;
bool htk_mode_;
};
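The two scale conversions are inverses of each other, which is easy to confirm numerically. The sketch below copies the two formulas from the class into free functions so they can be run on their own.

```cpp
#include <cassert>
#include <cmath>

// Hz -> mel, same formula as MelBanks::MelScale.
inline float MelScale(float freq) {
  assert(freq >= 0.0f);
  return 1127.0f * std::log(1.0f + freq / 700.0f);
}

// mel -> Hz, same formula as MelBanks::InverseMelScale.
inline float InverseMelScale(float mel_freq) {
  return 700.0f * (std::exp(mel_freq / 1127.0f) - 1.0f);
}
```

The constants are chosen so that 1000 Hz maps to roughly 1000 mel; below that the scale is close to linear, above it close to logarithmic.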
The parameters involved are low_freq, high_freq, num_fft_bins, fft_bin_width, mel_low_freq, mel_high_freq, mel_freq_delta, vtln_low, and vtln_high. The code is fairly simple, so it is not explained in detail.
The relevant code is:
MelBanks::MelBanks(const MelBanksOptions &opts,
const FrameExtractionOptions &frame_opts,
BaseFloat vtln_warp_factor):
htk_mode_(opts.htk_mode) {
int32 num_bins = opts.num_bins;
if (num_bins < 3) KALDI_ERR << "Must have at least 3 mel bins";
BaseFloat sample_freq = frame_opts.samp_freq;
int32 window_length_padded = frame_opts.PaddedWindowSize();
KALDI_ASSERT(window_length_padded % 2 == 0);
int32 num_fft_bins = window_length_padded / 2;
BaseFloat nyquist = 0.5 * sample_freq;
BaseFloat low_freq = opts.low_freq, high_freq;
if (opts.high_freq > 0.0)
high_freq = opts.high_freq;
else
high_freq = nyquist + opts.high_freq;
if (low_freq < 0.0 || low_freq >= nyquist
|| high_freq <= 0.0 || high_freq > nyquist
|| high_freq <= low_freq)
KALDI_ERR << "Bad values in options: low-freq " << low_freq
<< " and high-freq " << high_freq << " vs. nyquist "
<< nyquist;
BaseFloat fft_bin_width = sample_freq / window_length_padded;
// fft-bin width [think of it as Nyquist-freq / half-window-length]
BaseFloat mel_low_freq = MelScale(low_freq);
BaseFloat mel_high_freq = MelScale(high_freq);
debug_ = opts.debug_mel;
// divide by num_bins+1 in next line because of end-effects where the bins
// spread out to the sides.
BaseFloat mel_freq_delta = (mel_high_freq - mel_low_freq) / (num_bins+1);
BaseFloat vtln_low = opts.vtln_low,
vtln_high = opts.vtln_high;
if (vtln_high < 0.0) {
vtln_high += nyquist;
}
... ...
}
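Plugging in typical numbers helps to see what the constructor derives. The sketch below repeats the same arithmetic in a standalone function; MelLayout and ComputeMelLayout are hypothetical names introduced for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Derived quantities computed at the start of the MelBanks constructor.
struct MelLayout {
  float nyquist;
  float high_freq;       // resolved upper cutoff in Hz
  int num_fft_bins;
  float fft_bin_width;   // Hz per FFT bin
  float mel_freq_delta;  // spacing of filter edges on the mel scale
};

inline float MelScale(float freq) {
  return 1127.0f * std::log(1.0f + freq / 700.0f);
}

inline MelLayout ComputeMelLayout(float samp_freq, int padded_window_size,
                                  int num_bins, float low_freq,
                                  float high_freq_opt) {
  assert(padded_window_size % 2 == 0 && num_bins >= 3);
  MelLayout m;
  m.nyquist = 0.5f * samp_freq;
  // A non-positive high_freq option is an offset from the Nyquist frequency.
  m.high_freq = high_freq_opt > 0.0f ? high_freq_opt
                                     : m.nyquist + high_freq_opt;
  assert(low_freq >= 0.0f && low_freq < m.high_freq &&
         m.high_freq <= m.nyquist);
  m.num_fft_bins = padded_window_size / 2;
  m.fft_bin_width = samp_freq / padded_window_size;
  // num_bins + 1 intervals, because neighbouring triangles overlap by half.
  m.mel_freq_delta =
      (MelScale(m.high_freq) - MelScale(low_freq)) / (num_bins + 1);
  return m;
}
```

For 16 kHz audio with a 512-point padded window, 40 bins, low_freq=20 and high_freq=0, this gives a Nyquist frequency and upper cutoff of 8000 Hz, 256 FFT bins of 31.25 Hz each, and a filter spacing of about 68.5 mel.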
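vtln_warp_factor is passed in by the caller and is usually 1.0, which means no VTLN warping is applied. Any other value changes how the filter weights are distributed.
The formulas are: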
$$\begin{aligned} l & = vtln\_low\_cutoff * \max(1.0, vtln\_warp\_factor) \\ h & = vtln\_high\_cutoff * \min(1.0, vtln\_warp\_factor)\\ scale & = 1.0 / vtln\_warp\_factor\\ scale\_left & = (scale * l - low\_freq) / (l - low\_freq)\\ scale\_right & = (high\_freq - scale * h) / (high\_freq - h) \end{aligned}$$
$$freq\_out=\left\{ \begin{aligned} &low\_freq + scale\_left * (freq - low\_freq) & &freq < l \\ &scale * freq & &l \le freq < h\\ &high\_freq + scale\_right * (freq - high\_freq) & &freq \ge h \end{aligned} \right.$$
The relevant code is:
MelBanks::MelBanks(...):
htk_mode_(opts.htk_mode) {
... ...
for (int32 bin = 0; bin < num_bins; bin++) {
... ...
if (vtln_warp_factor != 1.0) {
left_mel = VtlnWarpMelFreq(vtln_low, vtln_high, low_freq, high_freq,
vtln_warp_factor, left_mel);
center_mel = VtlnWarpMelFreq(vtln_low, vtln_high, low_freq, high_freq,
vtln_warp_factor, center_mel);
right_mel = VtlnWarpMelFreq(vtln_low, vtln_high, low_freq, high_freq,
vtln_warp_factor, right_mel);
}
... ...
}
... ...
}
... ...
BaseFloat MelBanks::VtlnWarpMelFreq(...) {
return MelScale(VtlnWarpFreq(vtln_low_cutoff, vtln_high_cutoff,
low_freq, high_freq,
vtln_warp_factor, InverseMelScale(mel_freq)));
}
... ...
BaseFloat MelBanks::VtlnWarpFreq(...){
if (freq < low_freq || freq > high_freq) return freq; // in case this gets called
// for out-of-range frequencies, just return the freq.
KALDI_ASSERT(vtln_low_cutoff > low_freq &&
"be sure to set the --vtln-low option higher than --low-freq");
KALDI_ASSERT(vtln_high_cutoff < high_freq &&
"be sure to set the --vtln-high option lower than --high-freq [or negative]");
BaseFloat one = 1.0;
BaseFloat l = vtln_low_cutoff * std::max(one, vtln_warp_factor);
BaseFloat h = vtln_high_cutoff * std::min(one, vtln_warp_factor);
BaseFloat scale = 1.0 / vtln_warp_factor;
BaseFloat Fl = scale * l; // F(l);
BaseFloat Fh = scale * h; // F(h);
KALDI_ASSERT(l > low_freq && h < high_freq);
// slope of left part of the 3-piece linear function
BaseFloat scale_left = (Fl - low_freq) / (l - low_freq);
// [slope of center part is just "scale"]
// slope of right part of the 3-piece linear function
BaseFloat scale_right = (high_freq - Fh) / (high_freq - h);
if (freq < l) {
return low_freq + scale_left * (freq - low_freq);
} else if (freq < h) {
return scale * freq;
} else { // freq >= h
return high_freq + scale_right * (freq - high_freq);
}
}
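The piecewise-linear warp keeps both ends of the band fixed and is continuous at the two inflection points. The sketch below restates the function in standalone form so these properties can be checked; VtlnWarpFreqSketch is a hypothetical name, and plain assert replaces KALDI_ASSERT.

```cpp
#include <algorithm>
#include <cassert>

// Three-segment linear VTLN warp of a frequency in Hz.
inline float VtlnWarpFreqSketch(float vtln_low_cutoff, float vtln_high_cutoff,
                                float low_freq, float high_freq,
                                float vtln_warp_factor, float freq) {
  // Out-of-range frequencies are returned unchanged.
  if (freq < low_freq || freq > high_freq) return freq;
  const float l = vtln_low_cutoff * std::max(1.0f, vtln_warp_factor);
  const float h = vtln_high_cutoff * std::min(1.0f, vtln_warp_factor);
  assert(l > low_freq && h < high_freq);
  const float scale = 1.0f / vtln_warp_factor;  // slope of the middle part
  // Outer slopes are chosen so the segments meet at l and h.
  const float scale_left = (scale * l - low_freq) / (l - low_freq);
  const float scale_right = (high_freq - scale * h) / (high_freq - h);
  if (freq < l) return low_freq + scale_left * (freq - low_freq);
  if (freq < h) return scale * freq;
  return high_freq + scale_right * (freq - high_freq);
}
```

With low_freq=20, high_freq=8000, cutoffs of 100 and 7500, and a warp factor of 1.1, the band edges 20 Hz and 8000 Hz map to themselves, while 1000 Hz in the middle segment maps to 1000 / 1.1, about 909 Hz.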
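If vtln_warp_factor == 1.0, the mel weights are not warped and the filters are the standard triangular mel filters.
The formula is (in the code, the comparison is done on mel-scale values):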
$$Bin_m[k]=\left\{ \begin{aligned} &0 & &f_k \le f_{m-1}\\ &(f_k -f_{m-1})/(f_m-f_{m-1}) & &f_{m-1}<f_k \le f_m\\ &(f_{m+1}-f_k)/(f_{m+1}-f_m) & &f_m<f_k<f_{m+1}\\ &0 & &f_k \ge f_{m+1} \end{aligned} \right.$$
The relevant code is:
MelBanks::MelBanks(...):
htk_mode_(opts.htk_mode) {
... ...
for (int32 bin = 0; bin < num_bins; bin++) {
BaseFloat left_mel = mel_low_freq + bin * mel_freq_delta,
center_mel = mel_low_freq + (bin + 1) * mel_freq_delta,
right_mel = mel_low_freq + (bin + 2) * mel_freq_delta;
if (vtln_warp_factor != 1.0) {
... ...
}
center_freqs_(bin) = InverseMelScale(center_mel);
// this_bin will be a vector of coefficients that is only
// nonzero where this mel bin is active.
Vector<BaseFloat> this_bin(num_fft_bins);
int32 first_index = -1, last_index = -1;
for (int32 i = 0; i < num_fft_bins; i++) {
BaseFloat freq = (fft_bin_width * i);
// Center frequency of this fft bin.
BaseFloat mel = MelScale(freq);
if (mel > left_mel && mel < right_mel) {
BaseFloat weight;
if (mel <= center_mel)
weight = (mel - left_mel) / (center_mel - left_mel);
else
weight = (right_mel-mel) / (right_mel-center_mel);
this_bin(i) = weight;
if (first_index == -1)
first_index = i;
last_index = i;
}
}
KALDI_ASSERT(first_index != -1 && last_index >= first_index
&& "You may have set --num-mel-bins too large.");
bins_[bin].first = first_index;
int32 size = last_index + 1 - first_index;
bins_[bin].second.Resize(size);
bins_[bin].second.CopyFromVec(this_bin.Range(first_index, size));
// Replicate a bug in HTK, for testing purposes.
if (opts.htk_mode && bin == 0 && mel_low_freq != 0.0)
bins_[bin].second(0) = 0.0;
}
... ...
}
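The weight of a single FFT bin in a single mel filter can be isolated into two small functions. BinEdges, EdgesForBin, and TriangleWeight are hypothetical names for this sketch; the arithmetic follows the loop above, with all values on the mel scale.

```cpp
#include <cassert>

// Left, center, and right edge of one triangular filter (mel scale).
struct BinEdges {
  float left, center, right;
};

// Filter `bin` spans two consecutive intervals of width mel_freq_delta,
// so neighbouring filters overlap by half.
inline BinEdges EdgesForBin(int bin, float mel_low_freq,
                            float mel_freq_delta) {
  assert(bin >= 0 && mel_freq_delta > 0.0f);
  return {mel_low_freq + bin * mel_freq_delta,
          mel_low_freq + (bin + 1) * mel_freq_delta,
          mel_low_freq + (bin + 2) * mel_freq_delta};
}

// Triangular weight: rises from 0 at left to 1 at center, falls to 0 at right.
inline float TriangleWeight(float mel, const BinEdges &e) {
  if (mel <= e.left || mel >= e.right) return 0.0f;
  if (mel <= e.center) return (mel - e.left) / (e.center - e.left);
  return (e.right - mel) / (e.right - e.center);
}
```

For example, with mel_low_freq=0 and mel_freq_delta=100, filter 1 covers 100 to 300 mel with its peak at 200, and its left edge coincides with the center of filter 0.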
struct FbankOptions {
... ...
FbankOptions(): mel_opts(23),
use_energy(false),
energy_floor(0.0),
raw_energy(true),
htk_compat(false),
use_log_fbank(true),
use_power(true) {}
... ...
}
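| Parameter and default in the code | Example setting in conf | Meaning |
|---|---|---|
| mel_opts(23) | --num-mel-bins=25 | Number of triangular mel filters |
| use_energy(false) | --use-energy=false | Add an energy dimension to the fbank output |
| energy_floor(0.0) | --energy-floor=0.0 | Floor applied to the energy. Takes effect only when --use-energy=true, and is only needed when --dither=0.0, because taking the log of a zero energy would otherwise fail |
| raw_energy(true) | --raw-energy=true | Compute the energy before pre-emphasis and windowing |
| htk_compat(false) | --htk-compat=false | If true, the energy is placed last in the feature vector; this alone is not sufficient to obtain HTK-compatible features |
| use_log_fbank(true) | --use-log-fbank=true | Take the log of the fbank features |
| use_power(true) | --use-power=true | If true, use the power spectrum; if false, use the magnitude spectrum |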
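How use_power and use_log_fbank affect one output value can be sketched as below. The function MelBinOutput and the processing order (optional square root of the power spectrum, weighted sum over the filter, floor at float epsilon, log) are assumptions of this illustration based on the option descriptions; the excerpt does not show the corresponding Kaldi code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Output of one mel filter for one frame.
// power_spectrum: squared FFT magnitudes; weights: the filter's nonzero
// weights, starting at FFT bin first_index.
inline float MelBinOutput(const std::vector<float> &power_spectrum,
                          std::size_t first_index,
                          const std::vector<float> &weights,
                          bool use_power, bool use_log_fbank) {
  assert(first_index + weights.size() <= power_spectrum.size());
  float energy = 0.0f;
  for (std::size_t k = 0; k < weights.size(); ++k) {
    float s = power_spectrum[first_index + k];
    if (!use_power) s = std::sqrt(s);  // magnitude instead of power
    energy += weights[k] * s;
  }
  if (use_log_fbank)  // floor first so the log stays finite
    energy = std::log(std::max(energy, std::numeric_limits<float>::epsilon()));
  return energy;
}
```

With a flat power spectrum of 4.0 and weights {0.5, 1.0}, the filter output is 6.0 with use_power=true and 3.0 with use_power=false, before the optional log.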
FbankComputer obtains the mel filterbank through GetMelBanks.
FbankComputer::FbankComputer(const FbankOptions &opts):
opts_(opts), srfft_(NULL) {
if (opts.energy_floor > 0.0)
log_energy_floor_ = Log(opts.energy_floor);
int32 padded_window_size = opts.frame_opts.PaddedWindowSize();
if ((padded_window_size & (padded_window_size-1)) == 0) // Is a power of two...
srfft_ = new SplitRadixRealFft<BaseFloat>(padded_window_size);
// We'll definitely need the filterbanks info for VTLN warping factor 1.0.
// [note: this call caches it.]
GetMelBanks(1.0);
}
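In the FbankComputer constructor, GetMelBanks(1.0) is called to build and cache the mel filterbank with vtln_warp=1.0, that is, without any VTLN warping (see the discussion of vtln_warp_factor above).
The statement this_mel_banks = new MelBanks constructs the filterbank for a given warp factor the first time it is requested; later calls with the same factor reuse the cached object (the MelBanks class is declared in src/feat/mel-computations.h).
The relevant code is: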
const MelBanks* FbankComputer::GetMelBanks(BaseFloat vtln_warp) {
MelBanks *this_mel_banks = NULL;
std::map<BaseFloat, MelBanks*>::iterator iter = mel_banks_.find(vtln_warp);
if (iter == mel_banks_.end()) {
this_mel_banks = new MelBanks(opts_.mel_opts,
opts_.frame_opts,
vtln_warp);
mel_banks_[vtln_warp] = this_mel_banks;
} else {
this_mel_banks = iter->second;
}
return this_mel_banks;
}
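The lookup-or-create pattern in GetMelBanks can be shown in isolation. FilterBankSketch and FilterBankCache are hypothetical stand-ins, and std::unique_ptr is a choice made for this sketch; the Kaldi code above stores raw pointers in the map.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <utility>

// Hypothetical stand-in for MelBanks: only remembers its warp factor.
struct FilterBankSketch {
  float vtln_warp;
};

// Builds one filterbank per warp factor and reuses it afterwards.
class FilterBankCache {
 public:
  const FilterBankSketch *Get(float vtln_warp) {
    assert(vtln_warp > 0.0f);
    auto it = cache_.find(vtln_warp);
    if (it == cache_.end()) {
      // First request for this warp factor: construct and store it.
      std::unique_ptr<FilterBankSketch> fresh(new FilterBankSketch{vtln_warp});
      it = cache_.emplace(vtln_warp, std::move(fresh)).first;
    }
    return it->second.get();
  }
  std::size_t Size() const { return cache_.size(); }

 private:
  std::map<float, std::unique_ptr<FilterBankSketch>> cache_;
};
```

Requesting the same warp factor twice returns the same object, so the cost of building the filter weights is paid once per distinct factor rather than once per frame.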
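In Kaldi, the actual processing is usually done by compiled command-line programs invoked from shell scripts, with the parameters passed in by the scripts. General configuration files are under egs/*/s5/conf/. For fbank extraction, the options are usually placed in conf/fbank.conf and passed with compute-fbank-feats --config=$fbank_config.
For example, a typical fbank.conf for extracting 40-dimensional fbank features from 16000 Hz audio is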
--num-mel-bins=40
--sample-frequency=16000
which is equivalent to
--num-mel-bins=40
--sample-frequency=16000
--use-energy=false
--energy-floor=0.0
--raw-energy=true
--htk-compat=false
--use-log-fbank=true
--use-power=true
--low-freq=20
--high-freq=0
--debug-mel=false
--frame-shift=10
--frame-length=25
--dither=1.0
--preemphasis-coefficient=0.97
--remove-dc-offset=true
--window-type=povey
--round-to-power-of-two=true
--snip-edges=true
--allow-downsample=false
--allow-upsample=false
--max-feature-vectors=-1