Attention
1. Self-Attention

1.1 Vector perspective:
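The following shows how to calculate a single output vector, given the sequence of input vectors. The output vectors for the other positions are calculated in the same way.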
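As a minimal sketch of the per-vector calculation, the NumPy snippet below computes the output for one position. The names `A`, `Wq`, `Wk`, `Wv` and `attend_one` are illustrative and not taken from these notes, and the division by the square root of the key dimension is an assumption that follows the common scaled dot-product convention.

```python
import numpy as np

def softmax(x):
    # Subtract the maximum for numerical stability.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend_one(i, A, Wq, Wk, Wv):
    """Output vector for position i, computed one vector at a time."""
    d_k = Wk.shape[1]
    q = A[i] @ Wq                                  # query of position i
    # One score per input vector: dot product of the query with each key.
    scores = np.array([q @ (a @ Wk) for a in A]) / np.sqrt(d_k)  # scaling is an assumption
    weights = softmax(scores)                      # attention weights, sum to 1
    values = [a @ Wv for a in A]                   # value of each input vector
    # The output is the weighted sum of all values.
    return sum(w * v for w, v in zip(weights, values))

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 8))                        # 4 input vectors of dimension 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
b_first = attend_one(0, A, Wq, Wk, Wv)             # output for the first position
```

Calling `attend_one(1, A, Wq, Wk, Wv)` gives the output for the second position; only the query changes, while the keys and values are shared across positions.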



1.2 Matrix perspective: (parallel computing)
The following shows how to calculate all output vectors at once as a single matrix, given the matrix of input vectors.
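The same calculation can be sketched with matrix products, which is what makes it easy to run in parallel. This is a hedged illustration only: the inputs are stacked as rows of `A`, the names `Wq`, `Wk`, `Wv` and `self_attention` are hypothetical, and the scaling by the square root of the key dimension is again an assumed convention.

```python
import numpy as np

def self_attention(A, Wq, Wk, Wv):
    """Self-attention for all positions at once; rows of A are the input vectors."""
    Q, K, V = A @ Wq, A @ Wk, A @ Wv               # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[1])         # (n, n) score matrix; scaling is an assumption
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ V, weights                    # outputs are rows of weights @ V

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 8))                        # 4 input vectors of dimension 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
B, weights = self_attention(A, Wq, Wk, Wv)
```

Row `i` of `B` is the output for position `i`, and row `i` of `weights` holds the attention weights that position `i` assigns to every input.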




2. Multi-Head Self-Attention
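A rough sketch of one common way to build multi-head self-attention is shown below: the queries, keys and values are split into several heads, each head attends independently, and the results are concatenated and mixed by an output projection. The function name, the output matrix `Wo`, the equal-sized split into `num_heads` heads, and the scaling factor are all assumptions made for illustration, not details stated in these notes.

```python
import numpy as np

def softmax_rows(x):
    # Softmax over the last axis, with the maximum subtracted for stability.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_self_attention(A, Wq, Wk, Wv, Wo, num_heads):
    n, d_model = A.shape
    d_head = d_model // num_heads                  # assumes d_model is divisible by num_heads

    def split(X):
        # (n, d_model) -> (num_heads, n, d_head)
        return X.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(A @ Wq), split(A @ Wk), split(A @ Wv)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # (num_heads, n, n)
    heads = softmax_rows(scores) @ V               # (num_heads, n, d_head)
    # Concatenate the heads back into shape (n, d_model).
    concat = heads.transpose(1, 0, 2).reshape(n, d_model)
    return concat @ Wo                             # output projection (assumed)

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 8))                        # 4 input vectors of dimension 8
Wq, Wk, Wv, Wo = (rng.normal(size=(8, 8)) for _ in range(4))
out = multi_head_self_attention(A, Wq, Wk, Wv, Wo, num_heads=2)
```

With `num_heads=1` and `Wo` set to the identity matrix, this reduces to single-head self-attention.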


