A beginner’s guide to attention-based models can be found here.

For learning on sequential inputs, attention-based methods allow us to handle inputs of variable length.

In short, the idea is that when generating the context vector \(c_i\), the decoder RNN pays special attention to some, but not all, of the encoder hidden states \(h_j\).
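Concretely, in the standard formulation (e.g. Bahdanau-style attention), the context is a weighted sum of the encoder hidden states, with weights given by a softmax over alignment scores. The symbols \(e_{ij}\), \(\alpha_{ij}\), and the previous decoder state \(s_{i-1}\) below follow the usual notation rather than anything defined earlier in this post:

\[
c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j,
\qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})},
\qquad
e_{ij} = a(s_{i-1}, h_j),
\]

where \(a(\cdot,\cdot)\) is a learned scoring (alignment) function and \(T_x\) is the input length.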

Attention mechanisms come in many variants. For example, different scoring functions can be used to assign the probabilities (attention weights) over the hidden states \(\textbf{h}\).
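As a rough sketch (not from the original post), the snippet below shows how two common scoring choices, dot-product and additive (MLP-based) scoring, assign different probability distributions over the same hidden states; all variable names and dimensions are illustrative only.

```python
import numpy as np

def softmax(x):
    x = x - x.max()          # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum()

rng = np.random.default_rng(0)
d = 8                        # hidden size (illustrative)
T = 5                        # number of encoder steps
h = rng.normal(size=(T, d))  # encoder hidden states h_1 .. h_T
s = rng.normal(size=d)       # current decoder state (the "query")

# Dot-product scoring: e_j = s . h_j
dot_scores = h @ s
alpha_dot = softmax(dot_scores)

# Additive (MLP) scoring: e_j = v . tanh(W_s s + W_h h_j)
W_s = rng.normal(size=(d, d))
W_h = rng.normal(size=(d, d))
v = rng.normal(size=d)
add_scores = np.tanh(h @ W_h.T + s @ W_s.T) @ v
alpha_add = softmax(add_scores)

# Both are valid probability distributions over h, but they differ,
# and each yields a different context vector c = sum_j alpha_j h_j.
print("dot-product weights:", np.round(alpha_dot, 3))
print("additive weights:   ", np.round(alpha_add, 3))
print("context (dot):", np.round(alpha_dot @ h, 3))
```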
