Low-Rank Adaptation (LoRA) is a method for efficiently fine-tuning large language models by reducing the number of trainable parameters and the memory footprint without sacrificing performance.
LoRA and QLoRA are two methods for fine-tuning LLMs on downstream tasks. Both use low-rank adapters: pairs of small matrices whose product approximates the weight update, and which can be added to or removed from the base model without changing its original weights.
QLoRA, however, additionally applies quantization, reducing the precision of the frozen base-model weights to save further memory and computation.
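
To make the adapter idea concrete, here is a minimal PyTorch sketch of a linear layer with a LoRA adapter attached. The class name `LoRALinear` and the hyperparameters `r` (rank) and `alpha` (scaling) are illustrative choices, not part of the text above; real implementations (e.g. the PEFT library) add more machinery, and QLoRA would additionally store the frozen base weights in quantized form.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer with a trainable low-rank adapter added on top (illustrative sketch)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # original weights stay untouched
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)

        in_features, out_features = base.in_features, base.out_features
        # Low-rank factors: their product B @ A approximates the weight update.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))  # zero init: no change at the start of training
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base output plus the scaled low-rank update; removing the adapter
        # (or zeroing B) restores the original model exactly.
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)


if __name__ == "__main__":
    layer = LoRALinear(nn.Linear(768, 768), r=8, alpha=16)
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    total = sum(p.numel() for p in layer.parameters())
    print(f"trainable params: {trainable} / {total}")  # roughly 12k trainable vs ~590k total
```

Only the two small adapter matrices receive gradients, which is where the savings in trainable parameters and optimizer memory come from.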
