Wednesday, May 25, 2016

[ammai] Deep neural networks for acoustic modeling in speech recognition

Title: Deep neural networks for acoustic modeling in speech recognition

Authors: Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara Sainath, and Brian Kingsbury



Novelties:

This paper applies neural network architectures to acoustic modeling.

Contributions:

Most modern speech recognition systems use hidden Markov models (HMMs) to model temporal information.
They also use Gaussian mixture models (GMMs) to measure how well each HMM state fits a frame of acoustic input; this combination is known as a GMM-HMM system.
This paper presents a method that uses deep belief nets together with HMMs, and shows that the resulting DNN-HMM system is better than the GMM-HMM system in many respects.
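To make "fitness between a state and a frame" concrete, here is a minimal sketch of how a GMM could score one frame for one HMM state. It assumes diagonal covariances and uses NumPy; the function name `gmm_log_likelihood` and its parameters are illustrative, not taken from the paper.

```python
import numpy as np

def gmm_log_likelihood(frame, weights, means, variances):
    """Log p(frame | state) under a diagonal-covariance GMM (an assumption).

    frame:     (D,)   one acoustic feature vector
    weights:   (K,)   mixture weights, summing to 1
    means:     (K, D) component means
    variances: (K, D) component variances (diagonal covariance)
    """
    diff = frame - means                                   # (K, D)
    # Log of each Gaussian's normalizing constant.
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    # Log of each Gaussian's exponential term.
    log_exp = -0.5 * np.sum(diff ** 2 / variances, axis=1)
    log_components = np.log(weights) + log_norm + log_exp  # (K,)
    # Log-sum-exp over components for numerical stability.
    m = np.max(log_components)
    return m + np.log(np.sum(np.exp(log_components - m)))
```

In a GMM-HMM system every HMM state would have its own set of weights, means, and variances, and a score like this would be computed for each frame and state.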

Technical Summary:

The restricted Boltzmann machine (RBM) contains a visible layer and a stochastic binary hidden layer. The two layers are joined by undirected connections. After the current RBM has been trained, the activations of its hidden layer serve as the input data for training the next RBM.
The resulting stack of RBMs can then be treated as a deep belief net (DBN) by giving the undirected connections a direction.
The final step is to add a softmax layer on top of the stack.
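The stacking procedure above can be sketched roughly as follows. This is only an illustration under several assumptions: every layer is a binary-binary RBM, each RBM is trained with one-step contrastive divergence (CD-1) on the full batch, and the softmax weights `W_out` and `b_out` are supplied by the caller. All function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, lr=0.1, rng=None):
    """Train one binary-binary RBM with CD-1 (an assumed training rule)."""
    rng = np.random.default_rng(0) if rng is None else rng
    n_samples, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden probabilities and a stochastic binary sample.
        h_prob = sigmoid(data @ W + b_hid)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: one reconstruction step.
        v_recon = sigmoid(h_sample @ W.T + b_vis)
        h_recon = sigmoid(v_recon @ W + b_hid)
        # CD-1 updates: data statistics minus reconstruction statistics.
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / n_samples
        b_vis += lr * np.mean(data - v_recon, axis=0)
        b_hid += lr * np.mean(h_prob - h_recon, axis=0)
    return W, b_hid

def pretrain_stack(data, layer_sizes, rng=None):
    """Greedy layer-wise pretraining: each RBM's hidden output feeds the next."""
    rng = np.random.default_rng(0) if rng is None else rng
    layers = []
    x = data
    for n_hidden in layer_sizes:
        W, b = train_rbm(x, n_hidden, rng=rng)
        layers.append((W, b))
        x = sigmoid(x @ W + b)  # input data for the next RBM
    return layers

def forward(x, layers, W_out, b_out):
    """Run the pretrained stack, then a softmax layer on top."""
    for W, b in layers:
        x = sigmoid(x @ W + b)
    logits = x @ W_out + b_out
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)
```

For example, `pretrain_stack(data, [5, 4])` would train two RBMs in sequence, and `forward` would then map each input row to a probability distribution over output classes. The softmax weights would still need to be trained, which this sketch leaves out.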

Experiments:

The experiments are run on the TIMIT dataset, which is small enough to try out different methods and settings.
The results show that the DNN methods outperform the older method in most respects, although they are harder to parallelize.
