Low-Rank Adaptation (LoRA) is a method for efficiently fine-tuning large language models by reducing the number of trainable parameters and the memory footprint without sacrificing performance.
LoRA and QLoRA are two methods for fine-tuning LLMs on downstream tasks. Both use low-rank adapters: pairs of small matrices whose product approximates the weight update, and which can be added to or removed from the base model without changing its original weights.
QLoRA, however, additionally applies quantization, reducing the precision of the frozen base-model weights to save further memory and computation.
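
To make the adapter idea concrete, here is a minimal PyTorch sketch of a linear layer with a LoRA adapter attached. The class name `LoRALinear` and the hyperparameters `r` (rank) and `alpha` (scaling) are illustrative choices, not part of the text above; real implementations (e.g. the PEFT library) add more machinery, and QLoRA would additionally store the frozen base weights in quantized form.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer with a trainable low-rank adapter added on top (illustrative sketch)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # original weights stay untouched
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)

        in_features, out_features = base.in_features, base.out_features
        # Low-rank factors: their product B @ A approximates the weight update.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))  # zero init: no change at the start of training
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base output plus the scaled low-rank update; removing the adapter
        # (or zeroing B) restores the original model exactly.
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)


if __name__ == "__main__":
    layer = LoRALinear(nn.Linear(768, 768), r=8, alpha=16)
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    total = sum(p.numel() for p in layer.parameters())
    print(f"trainable params: {trainable} / {total}")  # roughly 12k trainable vs ~590k total
```

Only the two small adapter matrices receive gradients, which is where the savings in trainable parameters and optimizer memory come from.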
