Hsuan-ting Chen AMMAI Note: [ammai] Sequence to Sequence

Title: Sequence to Sequence – Video to Text

Author: Subhashini Venugopalan, et al.

Novelties:

This paper proposes an approach to video captioning based on a simple LSTM model.

Contributions:

The LSTM model handles input and output sequences of different lengths: a variable number of video frames in, a variable number of words out.

Its structure is very simple, but it performs well.

Technical Summary:

LSTMs are a well-known method for sequence-to-sequence tasks such as speech recognition and machine translation.

The authors apply an LSTM to video captioning, which can also be seen as a sequence-to-sequence task: a sequence of frames goes in and a sequence of words comes out.

Their architecture, S2VT, is as follows:

They use a two-layer LSTM that handles both encoding and decoding:

In the encoding stage, the first LSTM layer reads one CNN feature vector per frame: fc7 features for RGB frames and fc6 features for optical flow images. Its output is concatenated with padding (in place of a word input) to form the input to the second layer. After the last frame, the decoding stage begins: the second layer is fed <BOS> and generates words until it produces <EOS>.
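The two stages can be sketched as a per-time-step schedule of what each layer receives. This is only an illustration of the layout described above, not the authors' code: the function name `s2vt_schedule`, the `<pad>` token (standing in for an all-zero input vector), and the string placeholders for frame features are all assumptions made for this example.

```python
PAD = "<pad>"   # hypothetical stand-in for an all-zero input vector
BOS = "<BOS>"
EOS = "<EOS>"


def s2vt_schedule(frame_feats, caption_words):
    """Lay out one training example over the unrolled two-layer LSTM.

    Returns a list of (layer1_input, layer2_word_input, target) tuples,
    one per time step. target is None where no word is predicted.
    """
    steps = []
    # Encoding: layer 1 reads one frame feature per step; layer 2's word
    # slot is padding and nothing is predicted.
    for feat in frame_feats:
        steps.append((feat, PAD, None))
    # Decoding: layer 1 reads padding; layer 2 reads the previous word
    # (starting from <BOS>) and should predict the next one, ending
    # with <EOS>.
    prev_words = [BOS] + list(caption_words)
    targets = list(caption_words) + [EOS]
    for prev, target in zip(prev_words, targets):
        steps.append((PAD, prev, target))
    return steps


if __name__ == "__main__":
    for step in s2vt_schedule(["f1", "f2", "f3"], ["a", "man", "runs"]):
        print(step)
```

With n frames and m caption words, the schedule has n + m + 1 steps, so the same unrolled network covers videos and sentences of different lengths.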

Experiments:

The experiments use MSVD (Microsoft Video Description Corpus), MPII-MD (MPII Movie Description Corpus), and M-VAD (Montreal Video Annotation Dataset), and are evaluated with METEOR (Metric for Evaluation of Translation with Explicit Ordering).
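To give a feel for what METEOR measures, here is a minimal sketch of the recall-weighted unigram F-mean that the metric builds on. It is not full METEOR: it uses exact word matches only and omits stem, synonym, and paraphrase matching as well as the fragmentation penalty. The function name `unigram_fmean` and the default `alpha=0.9` (the weighting from the original METEOR formulation) are assumptions for this example; the scoring tool used in the paper may use different parameters.

```python
from collections import Counter


def unigram_fmean(candidate, reference, alpha=0.9):
    """Recall-weighted harmonic mean of unigram precision and recall.

    Simplified illustration only: exact matches, single reference,
    no fragmentation penalty.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Count matched unigrams, clipping each word to its count in the reference.
    matches = sum((Counter(cand) & Counter(ref)).values())
    if matches == 0:
        return 0.0
    precision = matches / len(cand)
    recall = matches / len(ref)
    # With alpha = 0.9 this equals 10PR / (R + 9P), so recall dominates.
    return precision * recall / (alpha * precision + (1 - alpha) * recall)


if __name__ == "__main__":
    print(unigram_fmean("a man runs", "a man is running"))
```

Because recall is weighted much more heavily than precision, a caption that misses reference words is penalized more than one that adds extra words.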

Thursday, June 2, 2016