graph

Introduction

Each node in the computational graph is called a node or layer. There are input nodes, output nodes, cost function nodes, etc. Each node need to have two functions, i.e. the forward pass and backward pass (back-propagation). Some nodes, such as short time fourier transform, only have forward pass function because we don't need the backward pass now. This folder contains all the forward (F_xxx) and backward (B_xxx) functions of the nodes, and also some supporting functions.

#List of nodes

Input nodes
- Input: no function associated. Allow the network to input
- Weight to activation: curr_layer = F_weight2activation(curr_layer). This is a specially designed node type. It's activation is the same as its weight. This node is used when we want to have the weights as inputs.
Cost nodes
- Consine distance: output = F_cosine(input_layers). Compute the cosine distance of the activations of the two input layers. Applied on each column vectors independently.
- Mean square error: cost = F_mean_square_error(input_layers, useMahaDist, CostLayer). MSE cost function for regression.
- Logistic: [cost, acc] = F_logistic(input_layers, CostLayer). The logistic cost function for two class classification problems.
- Cross entropy: [cost,acc] = F_cross_entropy(input_layers, CE_layer). The common cross entropy between posterior probabilities and label. Support scaling of each column vector.
- Multi cross entropy: [cost,acc] = F_multi_cross_entropy(input_layers, CE_layer). Multiple cross entropy cost function. Useful for simultaneously predicting multiple classification targets.
- GMM log likelihood: curr_layer = F_ll_gmm(input, curr_layer). Compute the log likelihood of the input vectors on a given GMM model.
- Log determinant: output = F_logdet(input). Compute the log determinant of input matrix, usually a covariance matrix.
- Joint cost: cost = F_jointCost(input_layer, curr_layer). Compute the joint cost between neighboring frames. This can be used to measure the smoothness of vector trajectories.
Common neural network nodes
- Affine transform: [output, validFrameMask] = F_affine_transform(input_layer, transform, bias). The most common node type of neural networks.
- Sigmoid: [output] = F_sigmoid(input_layer). Apply sigmoid activation elementwise.
- Tanh: [output] = F_tanh(input_layer). Apply tanh activation.
- Relu: output = F_relu(input_layer, curr_layer). Relu activation. May set a threshold.
- Temporal convolution: [output,X2] = F_tconv(input, curr_layer). Apply temporal convolution.
- Temporal max pooling: [output,idx,validFrameMask] = F_tmaxpool(input_layer, curr_layer). Apply temporal max pooling. Used together with temporal convolution.
- Word to vector: output = F_word2vec(input, W, singlePrecision). Apply transform W on sparse input (one hot).
- Convolution: output = F_convolution(input, weights, b). Not finished yet.
- LSTM: [LSTM_layer] = F_LSTM(input_layer, LSTM_layer). A single unidirectional LSTM layer.
- Softmax: [output] = F_softmax(input_layer). Apply softmax activation.
- Multi softmax: output = F_multi_softmax(input_layer, TaskVocabSizes). Useful for multiple classification tasks.
Signal processing nodes
- Enframe: output = F_enframe(input, frame_len, frame_shift). Put time domain signal into overlapping frames. Forward pass only.
- Short time Fourier transform: [fft_x, maskFFT] = F_stft(input_layer, curr_layer). Extract STFT from input signals. Forward only. Allow control over window type, frame length, frame shift, fft length, whether to use DC removal, etc.
- Compute GCC features: [gcc, maskGCC] = F_comp_gcc(input_layer, curr_layer). Compute generalized cross correlation between micriphones. Only forward pass available.
- Beamforming: [output] = F_beamforming(input_layers, curr_layer). Apply beamforming filter (frequency domain) to STFT coefficients.
- Filter: output = F_filter(input_layers). Apply filter (from the first input layer) to data (from the second input layer).
- Itakura Saito distance: output = F_Itakura_Saito(input_layers). Compute the IS distance between clean speech and noisy/enhanced speech vectors.
- Logarithm: [output,validFrameMask] = F_log(input_layer, const). Apply logarithm to each element. Option to add a small constant for stable gradient.
- MVDR beamforming with spatial covariance matrices: curr_layer = F_MVDR_spatialCov(input_layer, curr_layer). Estimate MVDR beamforming parameters from noise and speech spatial covariance matrices (SCM).
- Spatial covariance: output = F_SpatialCov(input_layer, curr_layer). Compute the spatial covariance of input.
- Spatial covariance with mask: output = F_SpatialCovMask(prev_layers, curr_layer). Compute the spatial covariance matrices (SCM) of input with the help of a mask. Produce both noise and speech SCMs.
- Spatial covariance with split mask: output = F_SpatialCovSplitMask(prev_layers, curr_layer). Same as SpatialCovMask, but now uses separate speech and noise masks.
- TDOA to beamforming weight: output = F_tdoa2weight(input, freq_bin). Convert TDOA to beamforming weight. need to supply freq_bin which tells us which frequency bins to use.
- Power spectrum: [output] = F_power_spectrum(input_layer). Compute power spectrum from complex Fourier coefficients.
- Power spectrum split: output = F_power_spectrum_split(input). Compute power spectrum from concatenated real and imaginary elements of Fourier coefficients.
- Real and imaginary parts to beamforming weights: [output,validFrameMask] = F_real_imag2BFweight(input_layer, freq_bin, online). Convert the real and imaginary parts (concatenated) to the corresponding complex numbers of beamforming filter. Obsolute, see realImag2complex.
- Real and imaginary parts to complex number: [output] = F_realImag2complex(input_layer). Convert the real and imaginary parts (concatenated) to the corresponding complex numbers.
- Covariance: output = F_cov(input). Compute the covariance matrix of the input trajectory.
- Dynamic features: [output,validFrameMask] = F_dynamic_feat(input_layer). Compute the first and second time derivatives of vector sequences. Implemented according to the delta and acceleration features of ASR.
Data manipulation nodes
- Concatenate: output = F_concatenate(prev_layers). Concatenate the activations of prev_layers, assuming that the first dimensions are the same.
- Transpose: [output] = F_transpose(input_layer, curr_layer). Transpose input matrix.
- Repeat a matrix: [output] = F_repmat(input_layer, curr_layer). Similar to repmat of Matlab.
- Reshape a matrix: [output] = F_reshape(input_layer, curr_layer). Similiar to reshape of Matlab.
- Permute: output = F_permute(input_layer, curr_layer). Change the order of the data dimensions.
- Splice: [output,validFrameMask] = F_splice(input_layer, context). Concatenate neighbouring frames to incorporate context information. For example, splice feature vectors before DNN based acoustic model.
- Extract dimensions: [output] = F_ExtractDims(input_layer, idx). Extract a subset of dimensions from the input, the subset is defined by "idx".
- Frame select: [output, validFrameMask] = F_frame_select(input_layer, curr_layer). Select certain column vectors (frames) of the input.
- Frame shift: [output, validFrameMask] = F_frame_shift(input_layer, curr_layer). Shift the column vectors (frames) of the input.
Mathematical nodes
- Elementwise add: output = F_add(prev_layers). Add the activation of all prev_layers, assuming that they all have exactly the same size.
- Max: output = F_max(input_layer, curr_layer). Elementwise max function.
- Mean: output = F_mean(input_layer, curr_layer). Elementwise mean pooling.
- Median: output = F_median(input_layer, curr_layer). Elementwise median pooling.
- Min: output = F_min(input_layer, curr_layer). Elementwise min pooling.
- Elementwise multiplication: [output, validFrameMask] = F_hadamard(input_layers). Multiply the activation of previous layers element-by-element.
- Trajectory mean subtraction: [output,validFrameMask] = F_cmn(input_layer). Subtract the mean of a trajectory from the trajectory itself. E.g. the cepstral mean normalization used in ASR systems.
- Index to vector: [output,validMask] = F_idx2vec(input_layer, curr_layer, single_precision). Convert class label into one-hot representation.
- Inner product: output = F_inner_product(input_layers). Compute inner product of two vectors. Applied to column vectors independently.
- Inner product normalized: output = F_inner_product_normalized(input_layers). Same as inner product, but normalized by the size of column vectors.
- Linear discriminant analysis: not finished.
- Sparse linear transform: output = F_sparse_affine_transform(input, transform, bias, singlePrecision). Specially implemented for sparse input for speedup. Usually used for word embedding in NLP tasks.
- Weighted average: [output, weights] = F_weighted_average(prev_layers). Obtain weighted average of the input. One previous layer provide the weight and the other provide the input data.
- Weighting: output = F_weighting(input, weight, bias). Not finished.
- Within Covariance: output = F_Within_Cov(input_layers). Not finished.

Name		Name	Last commit message	Last commit date
parent directory ..
obsolute		obsolute
AddSpMatMat.m		AddSpMatMat.m
AddSpMatMat_sparseonly.m		AddSpMatMat_sparseonly.m
AssignCostGradFrame.m		AssignCostGradFrame.m
B_LDA.m		B_LDA.m
B_LSTM.m		B_LSTM.m
B_LSTM_back.m		B_LSTM_back.m
B_MVDR_spatialCov.m		B_MVDR_spatialCov.m
B_MVDR_spatialCov_old.m		B_MVDR_spatialCov_old.m
B_SpatialCovMask.m		B_SpatialCovMask.m
B_SpatialCovSplitMask.m		B_SpatialCovSplitMask.m
B_Within_Cov.m		B_Within_Cov.m
B_add.m		B_add.m
B_affine_transform.m		B_affine_transform.m
B_beamforming.m		B_beamforming.m
B_beamforming_freeWeight.m		B_beamforming_freeWeight.m
B_beamforming_gainTDOA.m		B_beamforming_gainTDOA.m
B_beamforming_power.m		B_beamforming_power.m
B_cmn.m		B_cmn.m
B_concatenate.m		B_concatenate.m
B_cosine.m		B_cosine.m
B_cov.m		B_cov.m
B_cross_entropy.m		B_cross_entropy.m
B_dynamic_feat.m		B_dynamic_feat.m
B_filter.m		B_filter.m
B_frame_select.m		B_frame_select.m
B_hadamard.m		B_hadamard.m
B_inner_product_normalized.m		B_inner_product_normalized.m
B_jointCost.m		B_jointCost.m
B_ll_gmm.m		B_ll_gmm.m
B_log.m		B_log.m
B_logdet.m		B_logdet.m
B_logistic.m		B_logistic.m
B_max.m		B_max.m
B_mean.m		B_mean.m
B_mean_square_error.m		B_mean_square_error.m
B_multi_softmax_multi_cross_entropy.m		B_multi_softmax_multi_cross_entropy.m
B_power_spectrum.m		B_power_spectrum.m
B_power_spectrum_split.m		B_power_spectrum_split.m
B_realImag2complex.m		B_realImag2complex.m
B_real_imag2BFweight.m		B_real_imag2BFweight.m
B_real_imag2BFweight_beamforming_power.m		B_real_imag2BFweight_beamforming_power.m
B_relu.m		B_relu.m
B_repmat.m		B_repmat.m
B_reshape.m		B_reshape.m
B_sigmoid.m		B_sigmoid.m
B_softmax.m		B_softmax.m
B_softmax_cross_entropy.m		B_softmax_cross_entropy.m
B_splice.m		B_splice.m
B_splice_single_sentence.m		B_splice_single_sentence.m
B_splice_single_sentence2.m		B_splice_single_sentence2.m
B_tanh.m		B_tanh.m
B_tconv.m		B_tconv.m
B_tdoa2weight.m		B_tdoa2weight.m
B_tdoa2weight_beamforming_power.m		B_tdoa2weight_beamforming_power.m
B_tmaxpool.m		B_tmaxpool.m
B_transpose.m		B_transpose.m
B_weight2activation.m		B_weight2activation.m
B_weighted_average.m		B_weighted_average.m
B_weighting.m		B_weighting.m
B_word2vec.m		B_word2vec.m
CheckTrajectoryLength.m		CheckTrajectoryLength.m
DNN_Cost10.m		DNN_Cost10.m
DNN_cost_wrapper.m		DNN_cost_wrapper.m
DNN_update.m		DNN_update.m
ExpandContext.m		ExpandContext.m
ExpandContext_v2.m		ExpandContext_v2.m
ExtractVariableLengthTrajectory.m		ExtractVariableLengthTrajectory.m
F_ExtractDims.m		F_ExtractDims.m
F_Itakura_Saito.m		F_Itakura_Saito.m
F_LDA.m		F_LDA.m
F_LSTM.m		F_LSTM.m
F_LSTM_back.m		F_LSTM_back.m
F_MVDR_spatialCov.m		F_MVDR_spatialCov.m
F_SpatialCov.m		F_SpatialCov.m
F_SpatialCovMask.m		F_SpatialCovMask.m
F_SpatialCovSplitMask.m		F_SpatialCovSplitMask.m
F_Within_Cov.m		F_Within_Cov.m
F_add.m		F_add.m
F_affine_transform.m		F_affine_transform.m
F_beamforming.m		F_beamforming.m
F_cmn.m		F_cmn.m
F_comp_gcc.m		F_comp_gcc.m
F_concatenate.m		F_concatenate.m
F_convolution.m		F_convolution.m
F_cosine.m		F_cosine.m
F_cov.m		F_cov.m
F_cross_entropy.m		F_cross_entropy.m
F_dynamic_feat.m		F_dynamic_feat.m
F_enframe.m		F_enframe.m
F_filter.m		F_filter.m
F_frame_select.m		F_frame_select.m
F_frame_shift.m		F_frame_shift.m
F_hadamard.m		F_hadamard.m
F_idx2vec.m		F_idx2vec.m
F_inner_product.m		F_inner_product.m
F_inner_product_normalized.m		F_inner_product_normalized.m
F_jointCost.m		F_jointCost.m
F_largerThan.m		F_largerThan.m
F_ll_gmm.m		F_ll_gmm.m

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

readme.md

Introduction

FilesExpand file tree

graph

Directory actions

More options

Directory actions

More options

Latest commit

History

graph

Folders and files

parent directory

readme.md

Introduction