Language Models
Courses
- Full Stack LLM Bootcamp (2023)
- Charles Frye: Learn to Spell: Prompt Engineering
- D. Dohan et al: The Language Model Cascades (2022)
- The primary goal of prompting is subtractive: it concentrates the probability mass of predictions to home in on a specific world by conditioning the probabilistic model.
- Josh Tobin:
- Harrison Chase: Agents
- Maxime Labonne: Large Language Model Course, blog
- Inspired by DevOps-Roadmap
- Based on gists from younesbelkada
- NYU CSCI 2590
- Hyung Won Chung: Instruction finetuning and RLHF lecture (2023)
Articles
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, J. Devlin et al (2019)
- The Annotated Transformer, A. Rush et al (2018)
- Attention Is All You Need, A. Vaswani et al (2017)
- Doing more with less: meta-reasoning and meta-learning in humans and machines (2023)
- R. Bommasani et al: On the Opportunities and Risks of Foundation Models (2022)
- Doug Lenat: Getting from Generative AI to Trustworthy AI: What LLMs might learn from Cyc (2023)
- X. Li et al: Self-Alignment with Instruction Backtranslation, Meta (2023)
- Y. Wang et al: SELF-INSTRUCT: Aligning Language Models with Self-Generated Instructions (2023)
- H. Touvron et al: Llama 2: Open Foundation and Fine-Tuned Chat Models (2023)
- A.N. Lee et al: Platypus: Quick, Cheap, and Powerful Refinement of LLMs (2023)
- Y. Perlitz et al: Efficient Benchmarking (of Language Models) (2023). Refines the HELM benchmark to rule out low-quality results early on.
- B. Roziere et al: Code Llama: Open Foundation Models for Code (2023)
- N. Houlsby et al: Parameter-Efficient Transfer Learning for NLP (2019): uses adapters for model tuning.
- X.L. Li, P. Liang: Prefix-Tuning: Optimizing Continuous Prompts for Generation (2021)
- B. Lester et al: The Power of Scale for Parameter-Efficient Prompt Tuning (2021), a simplification of prefix tuning (previous entry)
- DeepSeek-V3 Technical Report (2024)
- Cameron Wolfe: Scaling Laws for LLMs: From GPT-3 to o3 (2024)
- Nathan Lambert: OpenAI’s o3: The grand finale of AI in 2024 (2024)
- S. Welleck et al: From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models (2024)
Economic Impact:
- McKinsey
Conferences
News
- Belle Lin, WSJ: Companies Weigh Growing Power of Cloud Providers Amid AI Boom (2023)
Language Models
- Training Compute-Optimal Large Language Models, J. Hoffmann et al (2022)
- PaLM: Scaling Language Modeling with Pathways, A. Chowdhery et al (2022)
- LLaMA: Open and Efficient Foundation Language Models, H. Touvron et al, Meta (2023)
- Code Llama: Open Foundation Models for Code (2023)
- Eugene Yan:
- Rishabh Agarwal et al: Language Modeling Reading List (to Start Your Paper Club) (2024), Google DeepMind
- Cameron Wolfe: LLaMa-3 “kitchen-sink” approach, tweet (2024)
- Andrej Karpathy: Let’s reproduce GPT-2 (124M) (2024)
- Sebastian Raschka: Build a Large Language Model (From Scratch) (2024)
- Xu Owen He: Mixture of A Million Experts (2024)
Prompt Engineering
- Many-Shot In-Context Learning
- Cameron Wolfe: Modern Advances in Prompt Engineering (2024)
- promptingguide.ai
Compound LLMs
- DSPy
- DSPy docs
- Matt Yates, Sephora, Prompt Engineering is Dead - Build LLM Applications with DSPy Framework (2024)
- L. Chen, B. Hanin et al: Are More LM Calls All You Need? Towards the Scaling Properties of Compound AI Systems (2024)
RL for LLMs
- Cameron Wolfe: Q-Learning for LLMs, twitter post (2024)
Explainability
- OpenAI: S. Bills et al: Language models can explain neurons in language models (2023)
Evaluation
- Eugene Yan: Patterns for Building LLM-based Systems & Products (2023)
- Lianmin Zheng et al: Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena (2023)
- Cameron Wolfe:
- Y. Chang et al: A Survey on Evaluation of Large Language Models (2023)
Tech stack
- Matt Bornstein, Rajko Radovanovic: Emerging Architectures for LLM Applications (2023)
- Andrei’s tweet
- A16z starter kits
- A tool stack for building AI apps with JavaScript, github, [discord](https://discord.com/invite/PQUmTBTGmT)
- A tool stack for building AI companions
- A chatbot built on Meta’s Llama2 open source model
- OpenAI
- LLaMA-Factory
- Aishwarya Naresh Reganti LinkedIn post
- LiteLLM
Datasets
Derivatives of LLaMA
- E. Hu et al: LoRA: Low-Rank Adaptation of Large Language Models (2021), github
- LoRA freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture
- Greatly reducing the number of trainable parameters for downstream tasks
- LoRA makes training more efficient and lowers the hardware barrier to entry by up to 3 times when using adaptive optimizers
- The simple linear design allows us to merge the trainable matrices with the frozen weights when deployed, introducing no inference latency compared to a fully fine-tuned model, by construction.
- Should work with dense layers
- Benefits
- The most significant benefit comes from the reduction in memory and storage usage. For a large Transformer trained with Adam, we reduce that VRAM usage by up to 2/3 if r ≪ d_model, as we do not need to store the optimizer states for the frozen parameters. On GPT-3 175B, we reduce the VRAM consumption during training from 1.2TB to 350GB. With r = 4 and only the query and value projection matrices being adapted, the checkpoint size is reduced by roughly 10,000× (from 350GB to 35MB). This allows us to train with significantly fewer GPUs and avoid I/O bottlenecks. Another benefit is that we can switch between tasks while deployed at a much lower cost by only swapping the LoRA weights as opposed to all the parameters. This allows for the creation of many customized models that can be swapped in and out on the fly on machines that store the pre-trained weights in VRAM. We also observe a 25% speedup during training on GPT-3 175B compared to full fine-tuning, as we do not need to calculate the gradient for the vast majority of the parameters.
- C. Wu et al: PMC-LLaMA: Further Finetuning LLaMA on Medical Papers (2023)
- aituts.com: Yubin: How to run Meta’s LLaMA on your computer (Windows, Linux tutorial) (Mar 2023)
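The LoRA notes above (frozen weights, a trainable low-rank update, merging at deployment) can be made concrete with a small sketch. This is a minimal pure-Python illustration under stated assumptions, not the reference loralib or huggingface/peft implementation: it models a single dense layer with frozen weight W0 (d_out × d_in), a trainable update B·A of rank r, and the α/r scaling described in the LoRA paper. B starts at zero so the adapted layer initially matches the frozen one. The helper names and the 0.02 standard deviation used to initialize A are illustrative choices.

```python
import random


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x p)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


class LoRALinear:
    """Frozen dense weight W0 plus a trainable rank-r update (alpha / r) * B @ A."""

    def __init__(self, w0, r, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(w0), len(w0[0])
        self.w0 = w0                                   # frozen: never updated
        # A (r x d_in): random Gaussian init (std is an illustrative assumption)
        self.a = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(r)]
        # B (d_out x r): zero init, so B @ A = 0 and training starts from W0
        self.b = [[0.0] * r for _ in range(d_out)]
        self.scaling = alpha / r

    def forward(self, x):
        """h = W0 x + (alpha / r) * B (A x); only A and B would receive gradients."""
        base = matvec(self.w0, x)
        update = matvec(self.b, matvec(self.a, x))
        return [h + self.scaling * u for h, u in zip(base, update)]

    def merged_weight(self):
        """Fold the update into one matrix: W = W0 + (alpha / r) * B @ A."""
        delta = matmul(self.b, self.a)
        return [[w + self.scaling * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.w0, delta)]


def trainable_params(d_out, d_in, r):
    """LoRA trains r * (d_in + d_out) values instead of d_out * d_in."""
    return r * (d_in + d_out)
```

Because the update is linear, `merged_weight()` produces a single matrix whose output equals `forward()`, which is why a merged LoRA model adds no inference latency. For a 4096 × 4096 projection with r = 8, `trainable_params(4096, 4096, 8)` gives 65,536 trainable values versus 16,777,216 for the full matrix, a 256× reduction for that layer.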
Text Classification
- L. Tunstall, N. Reimers et al: Efficient Few-Shot Learning Without Prompts (2022), youtube
- Phil Schmid: Outperform OpenAI GPT-3 with SetFit for text-classification (2022)
- SetFit: Efficient Few-Shot Learning Without Prompts
- Atharva Ingle, Weights and Biases: SetFit: Efficient Few-Shot Learning Without Prompts
- Nils Reimers, Cohere: High quality text classification with few training examples with SetFit (2023)
Named Entity Recognition (NER)
- NER extraction using LangChain and LLMs codes explained (2023)
- Patrick Meyer: Entity Recognition with LLM: A Complete Evaluation (2023)
- Entity Extraction: LLMs Versus Classical Neural Model + Live-Updating Knowledge Graph (2023)
- Google Natural Language API demo
- Named Entity Recognition With HuggingFace Using PyTorch and W&B (2023)
- Tirendaz AI: Named Entity Recognition with Hugging Face 🤗 NLP Tutorial For Beginners (2023)
- spaCy
Code Generation
- Ziyin Zhang et al: Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code (2023), github
Courses
- A. Karpathy: Let’s build GPT: from scratch, in code, spelled out. (2023)
- GitHub: nanoGPT, Andrei’s fork
- Notebook: gpt_dev.ipynb
- Tokenizers: sentencepiece, tiktoken
Posts
- Veysel Kocaman: Introduction to Spark NLP: Foundations and Basic Components (2019)
- Lilian Weng
- Controllable Neural Text Generation (Jan 2021)
- Prompt Engineering (Mar 2023)
- K. Rink: Leverage LLMs Like GPT to Analyze Your Documents or Transcripts (2023)
- Jay Alammar: The Illustrated Transformer (2019)
- Sebastian Raschka:
- Understanding and Coding Self-Attention, Multi-Head Attention, Cross-Attention, and Causal-Attention in LLMs (2024)
- Understanding Large Language Models (2023). A Cross-Section of the Most Relevant Literature To Get Up to Speed.
- LLM Training: RLHF and Its Alternatives (2023)
- Databricks:
- S. Vivek: Build A Custom AI Based ChatBot Using Langchain, Weaviate, and Streamlit github
- Ravi Theja: LlamaIndex: Harnessing the Power of Text2SQL and RAG to Analyze Product Reviews (2023)
- Eugene Yan:
- LlamaIndex: Jerry Liu: 8 key considerations for building production-grade LLM apps over your data
- Ori Eldarov: McKinsey’s Lilli: A Wake-Up Call for AI Startups (2023)
- Rick Lamers:
- Ars Technica: B. Edwards: 10X coders beware: Meta’s new AI model boosts coding and debugging for free (2023)
- Cameron Wolfe
- Project Pro: BERT NLP Model Explained for Complete Beginners
- F. Gichere: Sentiment Analysis of App Reviews: A Comparison of BERT, spaCy, TextBlob, and NLTK (2023)
- B. Etienne: A Complete Guide to Write your own Transformers (2024)
Talks
- Advanced Natural Language Processing with Apache Spark NLP (2021)
- Yannic Kilcher
- GPT-3: Language Models are Few-Shot Learners (Paper Explained)
- PI School: Lukasz Kaiser: Attention is all you need; Attention neural network models (2018)
- John Schulman - Reinforcement Learning from Human Feedback: Progress and Challenges (2023)
- Mapping the future of truly Open Models and Training Dolly for $30 — with Mike Conover of Databricks (2023)
- Latent Space:
- Latent Space Live: Responding to the Leaked Google vs OpenAI strategy memo (2023)
- Training Mosaic’s “llongboi” MPT-7B in 9 days for $200k with an empty logbook, how to prep good data for your training, and the future of open models (2023)
- The Mathematics of Training LLMs — with Quentin Anthony of Eleuther AI
- James Briggs
- Llama2: AI Developer Handbook (2023)
- 4-bit quantization reduces the VRAM requirement 8x relative to 32-bit weights (4x relative to 16-bit)
- Llama2 70B with 4-bit quantization requires about 35GB VRAM for the weights. Use an A100 (AWS p4d), which has 40GB VRAM.
- Llama2 13B with 4-bit quantization requires about 7GB VRAM. Use a T4, which has 16GB VRAM.
- Llama2 7B with 4-bit quantization requires about 3.5GB VRAM.
- Langchain, Llama2 quantized, Pinecone: Better Llama 2 with Retrieval Augmented Generation (RAG) (2023)
- LangChain Talk (2023)
- Yann LeCun: Objective-Driven AI (2023)
- Ilya Sutskever: An observation on Generalization (2023)
- Kamalraj M M:
- Aleksa Gordic: Will LLMs kill Search? Nils Reimers (Director of ML at Cohere) (2023)
- Connor Shorten: MemGPT Explained! (2023), paper
- Connor Shorten: Charles Packer on MemGPT (2023)
- OpenAI: A Survey of Techniques for Maximizing LLM Performance (2023)
- Qingyun Wu et al: AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation (2023)
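The 4-bit VRAM figures in the James Briggs notes above (Llama2 AI Developer Handbook) follow from simple arithmetic: parameters × bits per parameter ÷ 8 gives bytes. A minimal sketch, assuming decimal gigabytes (1 GB = 10^9 bytes) and counting the weights only; activations, the KV cache, and framework overhead add to the real footprint, so treat the results as lower bounds. The function name is hypothetical.

```python
def weight_vram_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in decimal GB."""
    bytes_total = n_params * bits_per_param / 8  # 8 bits per byte
    return bytes_total / 1e9


if __name__ == "__main__":
    for name, n in [("Llama2 7B", 7e9), ("Llama2 13B", 13e9), ("Llama2 70B", 70e9)]:
        fp32 = weight_vram_gb(n, 32)
        int4 = weight_vram_gb(n, 4)
        print(f"{name}: fp32 {fp32:.1f} GB, 4-bit {int4:.1f} GB ({fp32 / int4:.0f}x smaller)")
```

The 13B result, 6.5 GB, is what the notes round up to 7GB, and the 8x ratio comes from comparing 4-bit against 32-bit storage.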
Tools
- Langchain, see Langchain
- LlamaHub, github
- Chromadb, see Chromadb
- Pinecone
- Replit
- Langview demo
- Mosaic - open source LLM infrastructure for the enterprise market.
- BioMedLM: a Domain-Specific Large Language Model for Biomedical Text (2022)
- Data stored in S3
- Training on cheaper cloud infrastructure than EC2
- Data loading with mosaicml/streaming
- OpenAI, see OpenAI
- Unstructured
- Summarization
- llama2-chatbot github by Matt Bornstein et al
- SciSpace Copilot chrome extension
- Ollama
- Prompt Engineering
Models
- OpenAlpaca
- alpaca-lora, huggingface
- Vicuna FastChat
- Chat with Open Large Language Models
- Run Vicuna-13B On Your Local Computer, tutorial (GPU)
- Nischal Harohalli Padmanabha: A step-by-step guide to running Vicuna-13B Large Language Model on your GPU / CPU machine (2023)
- g4dn.4xlarge EC2 instance
- Memory - 64GB
- GPU - Not mandatory, but advised. Tesla T4 - 16GB
- CPU - 16 Core
- Disk space - 200 GB
- Simon Willison’s Weblog:
- ggerganov/llama.cpp
- ggerganov: What is the meaning of hacked?
- gorilla
- Ben Wodecky: Meet Gorilla: The AI Model That Beats GPT-4 at API Calls (2023)
- toolformer
Scale of Models
- Harm de Vries: Go smol or go home, Apr 2023
- Andrej Karpathy tweet: The ‘Chinchilla trap’ and the de Vries post
- oobabooga/text-generation-webui
- Tim Dettmers tweet: Next week: bitsandbytes 4-bit closed beta that allows you to finetune 30B/65B LLaMA models on a single 24/48 GB GPU (no degradation vs full fine-tuning in 16-bit), May 2023
Fine Tuning
- huggingface/peft, supports LoRA, Prefix-Tuning, P-Tuning, Prompt Tuning, AdaLoRA. Runs on consumer hardware.
- Mark Tenenholtz: tweet: Everyone can fine-tune LLMs on a single GPU.
- Databricks:
- Sean Owen: Fine-Tuning Large Language Models with Hugging Face and DeepSpeed (2023)
- Maxime Labonne: Fine-Tune Your Own Llama 2 Model in a Colab Notebook (2023)
RLHF, DPO
- HuggingFace: Reinforcement Learning from Human Feedback: From Zero to ChatGPT (2023), blog post, slides
- HuggingFace: Aligning LLMs with Direct Preference Optimization (2024)
Evaluation
- Arthur AI: LLMs for Evaluating LLMs (2024)
Open Source Movement
- Andrej Karpathy tweet: Roughly speaking the story as of now, Apr 2023
Companies
- Hugging Face
- LlamaIndex: see LlamaIndex
- Weaviate: see Weaviate Software Stack
Medtech
Sales, Marketing apps
- AI for Salespeople
- Chrystalknows (Chrome extension)