Courses

Articles

Economic Impact

Conferences

News

Language Models

Prompt Engineering

RL for LLMs

Explainability

Evaluation

Tech stack

Datasets

Derivatives of LLaMA

  • E. Hu et al.: LoRA: Low-Rank Adaptation of Large Language Models (2021), github
    • LoRA freezes the pre-trained model weights and injects trainable rank-decomposition matrices into each layer of the Transformer architecture.
    • This greatly reduces the number of trainable parameters for downstream tasks.
    • LoRA makes training more efficient and lowers the hardware barrier to entry by up to 3 times when using adaptive optimizers, since gradients and optimizer states are not needed for the frozen parameters.
    • The simple linear design allows the trainable matrices to be merged with the frozen weights at deployment, so by construction it adds no inference latency compared to a fully fine-tuned model.
    • The same principle should apply to any network with dense layers; the paper's experiments focus on Transformer attention weights.
    • Benefits
      • The most significant benefit comes from the reduction in memory and storage usage. For a large Transformer trained with Adam, we reduce that VRAM usage by up to 2/3 if r ≪ d_model, as we do not need to store the optimizer states for the frozen parameters.
      • On GPT-3 175B, we reduce the VRAM consumption during training from 1.2TB to 350GB.
      • With r = 4 and only the query and value projection matrices being adapted, the checkpoint size is reduced by roughly 10,000× (from 350GB to 35MB). This allows us to train with significantly fewer GPUs and avoid I/O bottlenecks.
      • Another benefit is that we can switch between tasks while deployed at a much lower cost by only swapping the LoRA weights as opposed to all the parameters. This allows for the creation of many customized models that can be swapped in and out on the fly on machines that store the pre-trained weights in VRAM.
      • We also observe a 25% speedup during training on GPT-3 175B compared to full fine-tuning, as we do not need to calculate the gradient for the vast majority of the parameters.
  • C. Wu et al.: PMC-LLaMA: Further Finetuning LLaMA on Medical Papers (2023)
  • aituts.com: Yubin: How to run Meta’s LLaMA on your computer (Windows, Linux tutorial) (Mar 2023)
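
The LoRA notes above (frozen weights, a trainable low-rank update, merging at deployment) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the class `LoRALinear`, the helper `lora_param_count`, the `alpha / r` scaling default, and the initialisation constants are illustrative assumptions. The GPT-3 shape used at the end (96 layers, d_model = 12288) comes from the GPT-3 paper rather than from these notes.

```python
import numpy as np


class LoRALinear:
    """Dense layer with frozen weights W plus a low-rank update B @ A (hypothetical sketch)."""

    def __init__(self, weight, r=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                             # frozen pre-trained weights
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))   # trainable, small Gaussian init
        self.B = np.zeros((d_out, r))                    # trainable, zero init: B @ A = 0 at start
        self.scale = alpha / r                           # assumed scaling factor

    def forward(self, x):
        # x has shape (batch, d_in); the low-rank path is added to the frozen path.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self):
        # Fold the update into a single matrix, so inference needs one matmul only.
        return self.weight + self.scale * self.B @ self.A

    def trainable_params(self):
        return self.A.size + self.B.size


def lora_param_count(n_layers, d_model, r, n_matrices):
    """Trainable parameters when n_matrices square projections per layer get rank-r adapters."""
    return n_layers * n_matrices * r * (d_model + d_model)


rng = np.random.default_rng(1)
W = rng.normal(size=(64, 64))
layer = LoRALinear(W, r=4)
x = rng.normal(size=(8, 64))

# Before any training, B is zero, so the layer reproduces the frozen model exactly.
starts_as_base = np.allclose(layer.forward(x), x @ W.T)

# Pretend training has moved B, then check that merging gives the same outputs.
layer.B = rng.normal(size=layer.B.shape)
merge_matches = np.allclose(layer.forward(x), x @ layer.merged_weight().T)

# Assumed GPT-3 175B shape: 96 layers, d_model = 12288, adapting W_q and W_v with r = 4.
gpt3_lora_params = lora_param_count(96, 12288, 4, 2)
gpt3_lora_megabytes_fp16 = gpt3_lora_params * 2 / 1e6
```

In this toy layer the adapter holds 512 trainable values against 4,096 frozen ones. Under the assumed GPT-3 shape, the count comes to about 18.9 million adapter parameters, or roughly 38MB at 2 bytes each, which is the same order of magnitude as the 35MB checkpoint quoted above.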

Text Classification

Named Entity Recognition (NER)

Code Generation

Courses

Posts

Talks

Tools

Models

Scale of Models

Fine Tuning

Open Source Movement

Companies

Medtech

Sales, Marketing apps

Law

Other