The attention layer computes a context vector for every word of the output sequence: a weighted sum of the encoder outputs, where the weights reflect how relevant each source position is to the word currently being generated.
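A minimal sketch of this computation, assuming simple dot-product scoring (other variants use a learned alignment function): for one decoder step, score each encoder output against the decoder state, normalize the scores with a softmax, and take the weighted sum. The function and variable names here are illustrative, not from any particular library.

```python
import numpy as np

def attention_context(decoder_state, encoder_outputs):
    """Compute a context vector for one decoder step via dot-product attention.

    decoder_state:   (hidden,)          current decoder hidden state
    encoder_outputs: (src_len, hidden)  one vector per source position
    """
    # Alignment scores: similarity of the decoder state to each encoder output.
    scores = encoder_outputs @ decoder_state            # shape: (src_len,)
    # Softmax turns the scores into attention weights that sum to 1.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Context vector: attention-weighted sum of the encoder outputs.
    context = weights @ encoder_outputs                 # shape: (hidden,)
    return context, weights

# Toy example: 5 source positions, hidden size 4.
rng = np.random.default_rng(0)
enc = rng.standard_normal((5, 4))
dec = rng.standard_normal(4)
ctx, w = attention_context(dec, enc)
```

In a full decoder this runs once per output word, so each generated word gets its own context vector focused on different parts of the source.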