Language Models

Full Stack LLM Bootcamp (2023)
- Charles Frye: Learn to Spell: Prompt Engineering
  - D. Dohan et al: The Language Model Cascades (2022)
  - The primary goal of prompting is subtractive; it focuses the mass of predictions to hone in on a specific world by conditioning the probabilistic model.
- Josh Tobin:
  - Prompt Engineering
  - LLMOps
- Harrison Chase: Agents
Maxime Labonne: Large Language Model Course, blog
- Inspired by DevOps-Roadmap
- Based on gists from younesbelkada
NYU CSCI 2590
- Hyung Won Chung: Instruction finetuning and RLHF lecture (2023)

Cameron Wolfe: Q-Learning for LLMs, twitter post (2024)

E. Hu et al: LoRA: Low-Rank Adaptation of Large Language Models, 2021, github
- LoRA freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture
- Greatly reducing the number of trainable parameters for downstream tasks
- LoRA makes training more efficient and lowers the hardware barrier to entry by up to 3 times when using adaptive optimizers
- The simple linear design allows us to merge the trainable matrices with the frozen weights when deployed, introducing no inference latency compared to a fully fine-tuned model, by construction.
- Should work with dense layers
- Benefits
  - The most significant benefit comes from the reduction in memory and storage usage. For a large Transformer trained with Adam, we reduce that VRAM usage by up to 2/3 if r « dmodel as we do not need to store the optimizer states for the frozen parameters. On GPT-3 175B, we reduce the VRAM consumption during training from 1.2TB to 350GB. With r = 4 and only the query and value projection matrices being adapted, the checkpoint size is reduced by roughly 10,000× (from 350GB to 35MB)4. This allows us to train with significantly fewer GPUs and avoid I/O bottlenecks. Another benefit is that we can switch between tasks while deployed at a much lower cost by only swapping the LoRA weights as opposed to all the parameters. This allows for the creation of many customized models that can be swapped in and out on the fly on machines that store the pre-trained weights in VRAM. We also observe a 25% speedup during training on GPT-3 175B compared to full fine-tuning5 as we do not need to calculate the gradient for the vast majority of the parameters
C. Wu et al: PMC-LLaMA: Further Finetuning LLaMA on Medical Papers (2023)
aituts.com: Yubin: How to run Meta’s LLaMA on your computer (Windows, Linux tutorial) (Mar 2023)

Langchain, see Langchain
LLamaHub, github
Chromadb, see Chromadb
Pinecone
Replit
Langview demo
Mosaic - open source LLM infrastructure for enterprise market.
- BioMedLM: a Domain-Specific Large Language Model for Biomedical Text (2022)
  - Data stored in S3
  - Training on cheaper cloud infrastructure than EC2
  - Data loading with mosaicml/streaming
OpenAI, see OpenAI
Unstructured
- How Unstructured Unlocked 100k+ Pages of IRS Manuals (2023)
Summarization
- WordTune
llama2-chatbot github by Matt Bornstein et al
SciSpace Copilot chrome extension
Ollama
Prompt Engineering
- Langsmith
- Humanloop
- Trubrics
- Agenta, open source

huggingface/peft, supports LORA, Prefix-Tuning, P-Tuning, Prompt Tuning, AdaLoRA. Runs on consumer hardware.
- Mark Tenenholtz: tweet: Everyone can fine-tune LLMs on a single GPU.
Databricks:
- Sean Owen: Fine-Tuning Large Language Models with Hugging Face and DeepSpeed (2023)
Maxime Labonne: Fine-Tune Your Own Llama 2 Model in a Colab Notebook (2023)

Courses

Articles

Economic Impact:

Conferences

News

Language Models

Prompt Engineering

Compound LLMs

RL for LLMs

Explainability

Evaluation

Tech stack

Datasets

Derivatives of LLaMA

Text Classification

Named Entity Recognition (NER)

Code Generation

Courses

Posts

Talks

Tools

Models

Scale of Models

Fine Tuning

RLHF, DPO

Evaluation

Open Source Movement

Companies

Medtech

Sales, Marketing apps

Law

Other