Title: Sequence to Sequence – Video to Text
Author: Subhashini Venugopalan, et al.
Novelties:
This paper proposes an approach to video captioning based on a simple LSTM model.
Contributions:
The LSTM model handles input and output sequences of different lengths.
Its structure is very simple, but it performs well.
Technical Summary:
LSTMs are a well-known method for sequence-to-sequence tasks such as speech recognition and translation.
The authors apply an LSTM to video captioning, which can also be seen as a sequence-to-sequence task: a sequence of frames goes in and a sequence of words comes out.
Their architecture, S2VT, is as follows:
They use a two-layer LSTM with an encoding stage and a decoding stage:
The encoding stage takes CNN features of the RGB frames (from the fc7 layer) and optical-flow features (from the fc6 layer) as input. Its output is concatenated with padding and becomes the input to the decoding stage. The decoding stage starts with <BOS> and terminates with <EOS>.
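To make the two stages concrete, here is a minimal sketch of how the per-time-step inputs could be laid out. It contains no LSTM at all, only the schedule of what is fed in at each step. The function name build_schedule, the "<pad>" token, and the exact placement of the padding (frame slot padded while decoding, word slot padded while encoding) are my own assumptions for illustration and are not taken from the authors' code.

```python
PAD = "<pad>"   # hypothetical padding token, stands in for a zero vector
BOS = "<BOS>"   # begin-of-sentence marker
EOS = "<EOS>"   # end-of-sentence marker


def build_schedule(frame_feats, caption_words):
    """Return one (frame_input, word_input, target) tuple per time step.

    Assumed layout: during encoding the frame slot carries a feature and
    the word slot is padding; during decoding the frame slot is padding
    and the word slot carries the previous word.
    """
    steps = []

    # Encoding stage: read the frames one by one, nothing is predicted yet.
    for feat in frame_feats:
        steps.append((feat, PAD, None))

    # Decoding stage: feed the previous word, predict the next one.
    prev_words = [BOS] + list(caption_words)
    targets = list(caption_words) + [EOS]
    for prev, target in zip(prev_words, targets):
        steps.append((PAD, prev, target))

    return steps


if __name__ == "__main__":
    # Three frames and a three-word caption; the two lengths need not match.
    for step in build_schedule(["f1", "f2", "f3"], ["a", "man", "runs"]):
        print(step)
```

The video length and the caption length are independent here, which is the sense in which the model accepts different input and output sizes. In a real model each frame entry would be a feature vector and each word an embedding.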


