Wednesday, April 27, 2016

[ammai] DEEP COMPRESSION: COMPRESSING DEEP NEURAL NETWORKS WITH PRUNING, TRAINED QUANTIZATION AND HUFFMAN CODING

Date: April 21st, 2016

Title: DEEP COMPRESSION: COMPRESSING DEEP NEURAL NETWORKS WITH PRUNING, TRAINED QUANTIZATION AND HUFFMAN CODING

Authors: Song Han, Huizi Mao, William J. Dally



Novelties:

Compressing deep neural networks so they fit in embedded systems, without loss of accuracy.

Contributions:

They compress deep neural networks with three methods:
1. Pruning the unimportant weights.
2. Quantizing the weights so they can be represented with fewer bits.
3. Huffman-coding the quantized weights.

Technical Summary:


They compress the network in three steps: pruning, quantization, and Huffman coding.


The first step is network pruning: connections whose weights are smaller than a threshold are removed. Pruning leaves a sparse network that is 9x smaller for AlexNet and 13x smaller for VGG-16.
They then store the sparse weights in compressed sparse row/column (CSR/CSC) format to reduce the numbers that must be kept. Finally, they store each index as the difference from the previous nonzero position, which compresses the indices further.
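The pruning and relative-index idea can be sketched as follows (a minimal illustration on a random matrix; the weight values and the threshold of 0.8 are made up for the example, not from the paper):

```python
import numpy as np

# Hypothetical weight matrix; the threshold is chosen only for illustration.
np.random.seed(0)
weights = np.random.randn(8, 8).astype(np.float32)
threshold = 0.8

# Prune: zero out connections whose magnitude is below the threshold.
mask = np.abs(weights) >= threshold
pruned = weights * mask

# Store only the surviving weights plus relative indices, roughly as a
# flattened CSR-style layout would: positions become differences between
# neighboring nonzero entries.
flat = pruned.ravel()
positions = np.flatnonzero(flat)
values = flat[positions]
rel_index = np.diff(positions, prepend=0)  # difference of neighbor positions

sparsity = 1 - mask.mean()
print(f"kept {values.size}/{flat.size} weights, sparsity {sparsity:.0%}")
```

The absolute positions are recovered by a running sum over `rel_index`, so only the (small) gaps need to be stored; in the paper the gaps are further bounded so they fit in a few bits.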

The second step is quantization and weight sharing. The weights are grouped into bins, and all weights in the same bin share a single value; the bins are found by running k-means clustering on the weights of each layer. They tried three centroid initialization methods: Forgy (random), density-based, and linear.
The results showed that linear initialization performed best, because it spaces the centroids evenly over the [min, max] range of the original weights. This preserves the few large weights, which have the most influence on the network.
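A minimal sketch of weight sharing with linear centroid initialization, assuming a toy 1-D weight vector and a 16-entry (4-bit) codebook; the data and cluster count are illustrative choices, and the k-means loop is a plain Lloyd's-algorithm sketch rather than the authors' implementation:

```python
import numpy as np

# Toy layer weights; in the paper, clustering is done per layer.
np.random.seed(1)
weights = np.random.randn(256).astype(np.float32)
n_clusters = 16  # 4-bit codebook, an illustrative choice

# Linear initialization: centroids evenly spaced over [min, max],
# which preserves centroids near the few large weights.
centroids = np.linspace(weights.min(), weights.max(), n_clusters)

# Simple 1-D k-means (Lloyd's algorithm).
for _ in range(20):
    assign = np.abs(weights[:, None] - centroids[None, :]).argmin(axis=1)
    for k in range(n_clusters):
        members = weights[assign == k]
        if members.size:
            centroids[k] = members.mean()

# Each weight is replaced by its shared centroid; only the 4-bit
# indices plus the small codebook need to be stored.
quantized = centroids[assign]
print("max quantization error:", np.abs(weights - quantized).max())
```

After this step the network stores one small codebook per layer and a short index per weight, which is what the later Huffman stage compresses further.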

The last step is Huffman coding, which exploits the non-uniform distribution of the quantized weight indices. It saves a further 20%~30% of storage.
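Why this saves space can be seen in a small sketch: with a skewed symbol distribution (which is what quantization produces), Huffman codes give frequent indices short bit strings. The index counts below are made up for illustration:

```python
import heapq
from collections import Counter

# Hypothetical quantized-weight indices with a skewed distribution.
indices = [0] * 50 + [1] * 20 + [2] * 15 + [3] * 10 + [4] * 5

def huffman_codes(symbols):
    """Build a prefix-free Huffman code table from a list of symbols."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries carry a tie-breaking counter so dicts are never compared.
    heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(counts.items())]
    heapq.heapify(heap)
    n = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, n, merged))
        n += 1
    return heap[0][2]

codes = huffman_codes(indices)
bits = sum(len(codes[s]) for s in indices)
fixed = len(indices) * 3  # 3 bits/index for 5 symbols with fixed-length codes
print(f"Huffman: {bits} bits vs fixed-length: {fixed} bits")
```

Frequent indices get 1- or 2-bit codes while rare ones get longer codes, so the total is well below the fixed-length encoding, mirroring the 20%~30% saving reported in the paper.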

Experiments:


Overall, they save 35x to 49x of storage. They evaluated LeNet-300-100 and LeNet-5 on MNIST, AlexNet on ImageNet, and VGG-16 on ImageNet.
