GQA


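Grouped-Query Attention (GQA) is a variant of the attention mechanism used in large language models. Standard multi-head attention gives every query head its own key and value heads. GQA instead lets a group of query heads share a single key/value head. This shrinks the key/value cache and the memory traffic during text generation, which matters more as models and context lengths grow, while keeping quality close to standard multi-head attention.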
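One place the efficiency gain shows up is the key/value cache a model keeps while generating text. Its size scales with the number of key/value heads, which is the quantity GQA reduces. The rough sketch below estimates that cache for one sequence. The function name and the layer, head, and sequence-length numbers are illustrative assumptions, not the configuration of any particular model.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Estimate the key/value cache size in bytes for a single sequence.

    Each layer stores two tensors (keys and values), each holding
    n_kv_heads * head_dim values per token. bytes_per_value=2 assumes
    16-bit floats.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Illustrative numbers only (assumed for this example).
# Multi-head attention: one key/value head per query head (64 of each).
mha = kv_cache_bytes(n_layers=80, n_kv_heads=64, head_dim=128, seq_len=8192)
# Grouped-query attention: 64 query heads share 8 key/value heads.
gqa = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128, seq_len=8192)

print(f"MHA cache: {mha / 2**30:.1f} GiB")
print(f"GQA cache: {gqa / 2**30:.1f} GiB")
print(f"Reduction: {mha / gqa:.0f}x")
```

With these assumed numbers the cache is 8 times smaller, matching the ratio of query heads to key/value heads.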

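Analogy: imagine a class of students looking up answers in reference notes:

Standard Attention: Every student (a query head) keeps a separate, full set of notes (keys and values), so there is a lot to store and carry around.
Grouped-Query Attention: The students are split into groups, and each group shares one set of notes. Every student still asks their own questions, but far fewer notes have to be kept. This saves memory and time.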
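In the mechanism itself, the query heads are divided into groups and each group reads from one shared key/value head. Below is a minimal sketch of that idea in plain Python, using only the standard library. It is a simplified illustration under stated assumptions, not how any production model implements it: there is no causal mask, no batching, and no learned projections, and the names grouped_query_attention and random_tensor are hypothetical helpers made up for this example.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def grouped_query_attention(q, k, v):
    """q is [n_q_heads][seq][dim]; k and v are [n_kv_heads][seq][dim].

    Query head h attends using key/value head h // group_size, so
    consecutive query heads share the same keys and values.
    """
    n_q_heads, n_kv_heads = len(q), len(k)
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    scale = 1.0 / math.sqrt(len(q[0][0]))

    out = []
    for h in range(n_q_heads):
        kv = h // group_size  # index of the shared key/value head
        value_dim = len(v[kv][0])
        head_out = []
        for q_vec in q[h]:
            # Attention weights of this query over every key position.
            weights = softmax([dot(q_vec, k_vec) * scale for k_vec in k[kv]])
            # Weighted sum of the value vectors.
            head_out.append([
                sum(w * v_vec[j] for w, v_vec in zip(weights, v[kv]))
                for j in range(value_dim)
            ])
        out.append(head_out)
    return out


def random_tensor(n_heads, seq_len, dim, rng):
    """Hypothetical helper: nested lists of random floats."""
    return [[[rng.uniform(-1, 1) for _ in range(dim)]
             for _ in range(seq_len)] for _ in range(n_heads)]


rng = random.Random(0)
q = random_tensor(4, 3, 2, rng)  # 4 query heads
k = random_tensor(2, 3, 2, rng)  # only 2 key heads
v = random_tensor(2, 3, 2, rng)  # only 2 value heads
out = grouped_query_attention(q, k, v)
print(len(out), len(out[0]), len(out[0][0]))  # heads, positions, dim
```

Setting the number of key/value heads equal to the number of query heads gives standard multi-head attention. Setting it to 1 gives multi-query attention. GQA sits between the two.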

Related Articles:

Meta Releases Open Source Llama 3.3 70B which outperforms GPT-4o


stevenbaert.ai


18/12/2024
