Skip to content

Latest commit

 

History

History
 
 

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Introduction

Each node in the computational graph is called a node or layer. There are input nodes, output nodes, cost function nodes, etc. Each node need to have two functions, i.e. the forward pass and backward pass (back-propagation). Some nodes, such as short time fourier transform, only have forward pass function because we don't need the backward pass now. This folder contains all the forward (F_xxx) and backward (B_xxx) functions of the nodes, and also some supporting functions.

#List of nodes

  • Input nodes

    • Input: no function associated. Allow the network to input
    • Weight to activation: curr_layer = F_weight2activation(curr_layer). This is a specially designed node type. It's activation is the same as its weight. This node is used when we want to have the weights as inputs.
  • Cost nodes

    • Consine distance: output = F_cosine(input_layers). Compute the cosine distance of the activations of the two input layers. Applied on each column vectors independently.
    • Mean square error: cost = F_mean_square_error(input_layers, useMahaDist, CostLayer). MSE cost function for regression.
    • Logistic: [cost, acc] = F_logistic(input_layers, CostLayer). The logistic cost function for two class classification problems.
    • Cross entropy: [cost,acc] = F_cross_entropy(input_layers, CE_layer). The common cross entropy between posterior probabilities and label. Support scaling of each column vector.
    • Multi cross entropy: [cost,acc] = F_multi_cross_entropy(input_layers, CE_layer). Multiple cross entropy cost function. Useful for simultaneously predicting multiple classification targets.
    • GMM log likelihood: curr_layer = F_ll_gmm(input, curr_layer). Compute the log likelihood of the input vectors on a given GMM model.
    • Log determinant: output = F_logdet(input). Compute the log determinant of input matrix, usually a covariance matrix.
    • Joint cost: cost = F_jointCost(input_layer, curr_layer). Compute the joint cost between neighboring frames. This can be used to measure the smoothness of vector trajectories.
  • Common neural network nodes

    • Affine transform: [output, validFrameMask] = F_affine_transform(input_layer, transform, bias). The most common node type of neural networks.
    • Sigmoid: [output] = F_sigmoid(input_layer). Apply sigmoid activation elementwise.
    • Tanh: [output] = F_tanh(input_layer). Apply tanh activation.
    • Relu: output = F_relu(input_layer, curr_layer). Relu activation. May set a threshold.
    • Temporal convolution: [output,X2] = F_tconv(input, curr_layer). Apply temporal convolution.
    • Temporal max pooling: [output,idx,validFrameMask] = F_tmaxpool(input_layer, curr_layer). Apply temporal max pooling. Used together with temporal convolution.
    • Word to vector: output = F_word2vec(input, W, singlePrecision). Apply transform W on sparse input (one hot).
    • Convolution: output = F_convolution(input, weights, b). Not finished yet.
    • LSTM: [LSTM_layer] = F_LSTM(input_layer, LSTM_layer). A single unidirectional LSTM layer.
    • Softmax: [output] = F_softmax(input_layer). Apply softmax activation.
    • Multi softmax: output = F_multi_softmax(input_layer, TaskVocabSizes). Useful for multiple classification tasks.
  • Signal processing nodes

    • Enframe: output = F_enframe(input, frame_len, frame_shift). Put time domain signal into overlapping frames. Forward pass only.
    • Short time Fourier transform: [fft_x, maskFFT] = F_stft(input_layer, curr_layer). Extract STFT from input signals. Forward only. Allow control over window type, frame length, frame shift, fft length, whether to use DC removal, etc.
    • Compute GCC features: [gcc, maskGCC] = F_comp_gcc(input_layer, curr_layer). Compute generalized cross correlation between micriphones. Only forward pass available.
    • Beamforming: [output] = F_beamforming(input_layers, curr_layer). Apply beamforming filter (frequency domain) to STFT coefficients.
    • Filter: output = F_filter(input_layers). Apply filter (from the first input layer) to data (from the second input layer).
    • Itakura Saito distance: output = F_Itakura_Saito(input_layers). Compute the IS distance between clean speech and noisy/enhanced speech vectors.
    • Logarithm: [output,validFrameMask] = F_log(input_layer, const). Apply logarithm to each element. Option to add a small constant for stable gradient.
    • MVDR beamforming with spatial covariance matrices: curr_layer = F_MVDR_spatialCov(input_layer, curr_layer). Estimate MVDR beamforming parameters from noise and speech spatial covariance matrices (SCM).
    • Spatial covariance: output = F_SpatialCov(input_layer, curr_layer). Compute the spatial covariance of input.
    • Spatial covariance with mask: output = F_SpatialCovMask(prev_layers, curr_layer). Compute the spatial covariance matrices (SCM) of input with the help of a mask. Produce both noise and speech SCMs.
    • Spatial covariance with split mask: output = F_SpatialCovSplitMask(prev_layers, curr_layer). Same as SpatialCovMask, but now uses separate speech and noise masks.
    • TDOA to beamforming weight: output = F_tdoa2weight(input, freq_bin). Convert TDOA to beamforming weight. need to supply freq_bin which tells us which frequency bins to use.
    • Power spectrum: [output] = F_power_spectrum(input_layer). Compute power spectrum from complex Fourier coefficients.
    • Power spectrum split: output = F_power_spectrum_split(input). Compute power spectrum from concatenated real and imaginary elements of Fourier coefficients.
    • Real and imaginary parts to beamforming weights: [output,validFrameMask] = F_real_imag2BFweight(input_layer, freq_bin, online). Convert the real and imaginary parts (concatenated) to the corresponding complex numbers of beamforming filter. Obsolute, see realImag2complex.
    • Real and imaginary parts to complex number: [output] = F_realImag2complex(input_layer). Convert the real and imaginary parts (concatenated) to the corresponding complex numbers.
    • Covariance: output = F_cov(input). Compute the covariance matrix of the input trajectory.
    • Dynamic features: [output,validFrameMask] = F_dynamic_feat(input_layer). Compute the first and second time derivatives of vector sequences. Implemented according to the delta and acceleration features of ASR.
  • Data manipulation nodes

    • Concatenate: output = F_concatenate(prev_layers). Concatenate the activations of prev_layers, assuming that the first dimensions are the same.
    • Transpose: [output] = F_transpose(input_layer, curr_layer). Transpose input matrix.
    • Repeat a matrix: [output] = F_repmat(input_layer, curr_layer). Similar to repmat of Matlab.
    • Reshape a matrix: [output] = F_reshape(input_layer, curr_layer). Similiar to reshape of Matlab.
    • Permute: output = F_permute(input_layer, curr_layer). Change the order of the data dimensions.
    • Splice: [output,validFrameMask] = F_splice(input_layer, context). Concatenate neighbouring frames to incorporate context information. For example, splice feature vectors before DNN based acoustic model.
    • Extract dimensions: [output] = F_ExtractDims(input_layer, idx). Extract a subset of dimensions from the input, the subset is defined by "idx".
    • Frame select: [output, validFrameMask] = F_frame_select(input_layer, curr_layer). Select certain column vectors (frames) of the input.
    • Frame shift: [output, validFrameMask] = F_frame_shift(input_layer, curr_layer). Shift the column vectors (frames) of the input.
  • Mathematical nodes

    • Elementwise add: output = F_add(prev_layers). Add the activation of all prev_layers, assuming that they all have exactly the same size.
    • Max: output = F_max(input_layer, curr_layer). Elementwise max function.
    • Mean: output = F_mean(input_layer, curr_layer). Elementwise mean pooling.
    • Median: output = F_median(input_layer, curr_layer). Elementwise median pooling.
    • Min: output = F_min(input_layer, curr_layer). Elementwise min pooling.
    • Elementwise multiplication: [output, validFrameMask] = F_hadamard(input_layers). Multiply the activation of previous layers element-by-element.
    • Trajectory mean subtraction: [output,validFrameMask] = F_cmn(input_layer). Subtract the mean of a trajectory from the trajectory itself. E.g. the cepstral mean normalization used in ASR systems.
    • Index to vector: [output,validMask] = F_idx2vec(input_layer, curr_layer, single_precision). Convert class label into one-hot representation.
    • Inner product: output = F_inner_product(input_layers). Compute inner product of two vectors. Applied to column vectors independently.
    • Inner product normalized: output = F_inner_product_normalized(input_layers). Same as inner product, but normalized by the size of column vectors.
    • Linear discriminant analysis: not finished.
    • Sparse linear transform: output = F_sparse_affine_transform(input, transform, bias, singlePrecision). Specially implemented for sparse input for speedup. Usually used for word embedding in NLP tasks.
    • Weighted average: [output, weights] = F_weighted_average(prev_layers). Obtain weighted average of the input. One previous layer provide the weight and the other provide the input data.
    • Weighting: output = F_weighting(input, weight, bias). Not finished.
    • Within Covariance: output = F_Within_Cov(input_layers). Not finished.