
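Grouped-Query Attention (GQA) is a technique used in large language models to make attention cheaper at inference time, especially as model size increases. Instead of giving every query head its own key and value heads, as in standard multi-head attention, GQA lets each group of query heads share a single key/value head. This shrinks the key/value cache and the memory traffic needed to read it, while typically keeping output quality close to that of standard multi-head attention.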

Analogy: imagine a teacher answering questions:

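  • Standard Attention: The teacher keeps a separate set of reference notes for every question and consults each set individually.
  • Grouped-Query Attention: The teacher sorts the questions into groups and shares one set of reference notes per group. Each question still gets its own answer, but far fewer notes have to be kept and looked up. This saves time and energy.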
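
To make the grouping concrete, here is a minimal NumPy sketch of the idea, not a reference implementation. The function name `grouped_query_attention`, the tensor shapes, and the head counts (8 query heads sharing 2 key/value heads) are assumptions chosen for illustration. It also leaves out batching, causal masking, and the learned projection matrices that a real model would include.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def grouped_query_attention(q, k, v):
    """Illustrative GQA (hypothetical helper, not a library API).

    q:    (num_q_heads,  seq_len, head_dim)
    k, v: (num_kv_heads, seq_len, head_dim)
    """
    num_q_heads, _, head_dim = q.shape
    num_kv_heads = k.shape[0]
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    group_size = num_q_heads // num_kv_heads

    # Each key/value head is reused by `group_size` consecutive query heads.
    k_shared = np.repeat(k, group_size, axis=0)
    v_shared = np.repeat(v, group_size, axis=0)

    # Scaled dot-product attention, computed independently per query head.
    scores = q @ k_shared.transpose(0, 2, 1) / np.sqrt(head_dim)
    weights = softmax(scores, axis=-1)
    return weights @ v_shared


# Assumed sizes: 8 query heads, 2 key/value heads, 4 tokens, head_dim 16.
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 4, 16))
k = rng.standard_normal((2, 4, 16))
v = rng.standard_normal((2, 4, 16))

out = grouped_query_attention(q, k, v)
print(out.shape)  # (8, 4, 16)
```

In this sketch, query heads 0 to 3 read key/value head 0 and query heads 4 to 7 read key/value head 1, so only 2 key/value heads need to be stored instead of 8. Setting the number of key/value heads equal to the number of query heads recovers standard multi-head attention, and setting it to 1 gives multi-query attention.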