Machine Learning
Books, Lecture Notes
- Michael Nielsen: Neural Networks and Deep Learning
- Artificial Intelligence: A Modern Approach, by Russell and Norvig (4th edition, 2020)
- Deep learning theory lecture notes, Matus Telgarsky
- Fundamentals of Machine Learning for Predictive Data Analytics, J.D. Kelleher et al (2020)
- Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, B. Schölkopf, A. Smola (2001)
- Bio-Inspired Artificial Intelligence: Theories, Methods, and Technologies, D. Floreano, C. Mattiussi (2008)
- Deep Learning, I. Goodfellow, Y. Bengio, A. Courville (2016)
- Deep Learning Systems: Algorithms, Compilers, and Processors for Large-Scale Production, Andres Rodriguez (2020)
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, by Aurelien Geron (3rd ed, 2022) (Jupyter notebooks)
- Python Machine Learning Cookbook, Chris Albon (2018)
- Learning Deep Learning, M. Ekman (2021), github
- fast.ai, Deep Learning for Coders with Fastai and PyTorch, J. Howard and S. Gugger (2020)
- Advanced Applied Deep Learning, CNNs and Object Detection, by U. Michelucci
- Python Deep Learning, Exploring deep learning techniques and neural network architectures with PyTorch, Keras, and TensorFlow, I. Vasiliev et al (2019), github
- M. Kochenderfer et al: Algorithms for Decision Making (2022)
- IJCAI keynote talk: Automated Decision Making for Safety Critical Applications (2021)
- A. Zheng, A. Casari: Feature Engineering (2018)
- S. Raschka et al: Machine Learning with PyTorch and Scikit-Learn, github, Andrei’s fork
- Jake VanderPlas: Python Data Science Handbook, colab
Books - ML from the Probabilistic Perspective
- Pattern Recognition and Machine Learning, C. Bishop, pdf
- Kevin P. Murphy
- Machine Learning: A Probabilistic Perspective, Kevin Murphy (2012)
- Probabilistic Machine Learning: An Introduction (2022)
- Probabilistic Machine Learning: Advanced Topics (2023)
- Statistical Learning Theory, Vladimir Vapnik (1998)
- The Nature of Statistical Learning Theory, V. Vapnik (1995)
- Bayesian Networks and Decision Graphs, T.D. Nielsen, F.V. Jensen (2007)
Courses
- P. Abbeel: Foundations of Deep RL in 6 Lectures (2021), slides
- University of Amsterdam: UVA Deep Learning Course, UVA Deep Learning Tutorials
- Yann LeCun
- Deep Learning course at College de France (2016)
- Deep Learning Course at CDS, Andrei’s notes
- CMU
- Advanced NLP 2022
- 10-704: Information Processing and Learning (Spring 2012)
- G. Hinton: Neural Networks for Machine Learning (2012)
- G. Hulten: Machine Learning Course (2021). Andrei’s notes.
- Berkeley
- CS287-FA19: Advanced Robotics (2020), youtube, P. Abbeel
- CS294-158-SP20: Deep Unsupervised Learning (Spring 2020), youtube (Spring 2019)
- Caltech
- Jeremy Bernstein: Neural Architecture Design (Spring 2021)
- Cornell
- Volodymyr Kuleshov: CS 5787: Applied Machine Learning, github (2020)
- DeepMind: David Silver: Introduction to Reinforcement Learning
- FastAI: Practical Deep Learning for Coders, Jeremy Howard et al.
- Andrej Karpathy: Neural Networks: Zero to Hero
- Hugo Larochelle: math-heavy Neural Networks class, Université de Sherbrooke
- MIT
- Oxford
- Deep Learning for Natural Language Processing (2016-2017)
- Sebastian Raschka:
- Introduction to Machine Learning - Tree-based Methods, Model Evaluation, and Feature Selection
- Introduction to Deep Learning, 170 Video Lectures from Adaptive Linear Neurons to Zero-shot Classification with Transformers
- Github: stat453-deep-learning-ss21, deeplearning-models, Andrei’s fork
- Stanford
- CS221: Artificial Intelligence: Principles and Techniques (Autumn 2019), syllabus, video
- CS229 - Machine Learning, Autumn 2018 video, slides, Summer 2021 video, slides, Andrei’s notes
- CS231n: Convolutional Neural Networks for Visual Recognition (Spring 2017), syllabus, 2016 video, 2017 videos, 2017 slides, 2021 slides
- Karpathy’s ConvNetJS CIFAR-10 demo
- CS236: Deep Generative Models (slides only)
- CS224n: Natural Language Processing with Deep Learning, Winter 2017 video, Winter 2019 video, Winter 2021 video
- CS224W: Machine Learning with Graphs Spring 2021 video
- C. Huyen: A survivor’s guide to AI courses at Stanford (2020)
Conferences
MLSys
Paper Archives
Discord Servers
Events
- In-Person AI Events in Greater Boston #BostonAIevents
- The list is curated by Dan Elton and Paul Baier.
Tools
- Jax
- PyTorch
- scikit-learn
- Spark.ml and Apache Ignite ML
- TensorFlow
- Others: Caffe (Berkeley), Caffe2 (Facebook), MXNet (Amazon), CNTK (Microsoft), Paddle (Baidu)
- FBLearner Flow
- Gradio, demos on Colab. Can be embedded in Huggingface.
- MosaicML
GPUs
See GPUs page
Tutorials
- Robbie Allen: Over 200 of the Best Machine Learning, NLP, and Python Tutorials (2018)
Videos
- 3Blue1Brown Series
- sentdex: Deep Learning and Neural Networks with Python and Pytorch
- Welch Labs
- MIT 6.S191 Introduction to Deep Learning (2020)
- Convolutional Neural Networks, Alexander Amini
- Deep Generative Modeling, Ava Soleimany
- Deep Reinforcement Learning, Alexander Amini
- J. Frankle, M. Carbin: The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks (2019)
- G. Hinton: What is wrong with convolutional neural nets? (2017)
- G. Hinton: Artificial Intelligence: Turning our understanding of the mind right side up (2017)
- Y. LeCun: The Epistemology of Deep Learning, IAS (2019)
- Normalized Nerd
Interviews
- Lex Fridman
- Whisper captions by Andrej Karpathy
- Ian Goodfellow: Generative Adversarial Networks (GANs), Lex Fridman Podcast #19 (2019)
- Yann LeCun: Deep Learning, ConvNets, and Self-Supervised Learning, Lex Fridman Podcast #36 (2019)
- Stephen Wolfram: Cellular Automata, Computation, and Physics, Lex Fridman Podcast #89 (2020)
- David Silver: AlphaGo, AlphaZero, and Deep Reinforcement Learning, Lex Fridman Podcast #86 (2020)
- Andrew Ng: Deep Learning, Education, and Real-World AI, Lex Fridman Podcast #73 (2020)
- Yann LeCun: Dark Matter of Intelligence and Self-Supervised Learning, Lex Fridman Podcast #258 (2022)
- Demis Hassabis: DeepMind - AI, Superintelligence & the Future of Humanity (2022)
- Chris Lattner: Future of Programming and AI (2023)
- Aravind Srinivas: Perplexity CEO on Future of AI, Search & the Internet (2024)
- swyx: Bringing ML to the data, and Minimum Viable DevRel — Montana Low, PostgresML (2023)
- Elad Gil:
- Fireside Chat: Emad Mostaque, CEO of Stability AI (2022)
- Andrej Karpathy:
- Stephanie Zhan, Sequoia Capital: Making AI accessible with Andrej Karpathy (2024)
Linear Discriminant Analysis (LDA)
- G. Chen: Math 253: Mathematical Methods for Data Visualization, Lec 11 (LDA), which explains the math behind LDA
- B. Ghojogh: Eigenvalue and Generalized Eigenvalue Problems: Tutorial (2022)
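The math these two references cover reduces, in the two-class case, to a closed form: the Fisher direction solving the generalized eigenvalue problem S_B w = λ S_W w is w ∝ S_W⁻¹(μ₁ − μ₀). A minimal NumPy sketch (toy data and all names here are illustrative, not from the sources above):

```python
import numpy as np

def fisher_lda_direction(X0, X1):
    """Two-class Fisher LDA direction: w ∝ Sw^{-1} (mu1 - mu0)."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)

# Toy data: two Gaussian blobs separated along the first axis
rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0, 0], scale=1.0, size=(200, 2))
X1 = rng.normal(loc=[4, 0], scale=1.0, size=(200, 2))
w = fisher_lda_direction(X0, X1)
# w points (mostly) along the axis that separates the classes
```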
Expectation Maximization
- V. Lavrenko: Mixture Models (2014)
- Andrew Ng, Stanford CS229: L14: Expectation-Maximization (2019)
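The E-step/M-step iteration these lectures walk through can be sketched for a two-component 1D Gaussian mixture (a minimal NumPy sketch; the initialization and iteration count are arbitrary illustrative choices):

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """EM for a two-component 1D Gaussian mixture."""
    # Crude initialization from the data extremes
    mu = np.array([x.min(), x.max()], dtype=float)
    var = np.array([x.var(), x.var()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = P(component k | x_i)
        lik = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = lik / lik.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from responsibility-weighted data
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return pi, mu, var

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
pi, mu, var = em_gmm_1d(x)
# mu should recover roughly (-3, 3), pi roughly (0.5, 0.5)
```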
Folding
Distributed Training
- Medium: Training Neural Nets on Larger Batches: Practical Tips for 1-GPU, Multi-GPU & Distributed setups, Thomas Wolf (2018)
- Towards Data Science: Distributed Neural Network Training In Pytorch, Nilesh Vijayrania (2020)
- Serge-Paul Carrasco: Distributed and Declarative Deep Learning Systems (2021)
- Ludwig: A type-based declarative deep learning toolbox
- Horovod
- Docs
- Horovod: Multi-GPU and multi-node data parallelism
- Deep Learning at Scale with Horovod feat. Travis Addair, Stanford MLSys Seminar Episode 10 (2021)
- determined.ai
- DRAGON: A Dynamic Scheduling and Scaling Controller for Managing Distributed Deep Learning Jobs in Kubernetes Cluster, C. Lin et al (2019)
- Analysis and Comparison of Distributed Training Techniques for Deep Neural Networks in a Dynamic Environment, E. Gebremeskel (2018)
- Bringing HPC Techniques to Deep Learning, Andrew Gibiansky (2017)
- Fast Multi-GPU collectives with NCCL, Nathan Luehr, NVidia (2016)
- Fully Sharded Data Parallel: faster AI training with fewer GPUs, M. Ott et al (2021)
Optimization
- CS231n Lecture 7 (2017)
- Practical Recommendations for Gradient-Based Training of Deep Architectures, Y. Bengio (2012)
- On the importance of initialization and momentum in deep learning, I. Sutskever et al (2013)
- Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, Y. Dauphin et al (2014)
- optim.Adam vs optim.SGD. Let’s dive in
- fast.ai: AdamW and Super-convergence is now the fastest way to train neural nets, by S. Gugger and J. Howard (2018)
- S. Raschka L12: Learning rates and advanced optimization algorithms (2020)
- P. Wirth: Which Optimizer should I use for my ML Project? (2020)
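The two update rules compared across several entries above (classical momentum in the Sutskever et al paper, and Adam) are short enough to write out directly. A minimal NumPy sketch on a toy quadratic; hyperparameters are chosen only for illustration:

```python
import numpy as np

def sgd_momentum_step(w, g, v, lr=0.1, beta=0.9):
    """Classical momentum: accumulate a velocity, then move along it."""
    v = beta * v - lr * g
    return w + v, v

def adam_step(w, g, m, s, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: bias-corrected first/second moment estimates scale the step."""
    m = b1 * m + (1 - b1) * g
    s = b2 * s + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    s_hat = s / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s

# Minimize f(w) = 0.5 * ||w||^2, whose gradient is simply w
w_sgd, v = np.array([5.0, -3.0]), np.zeros(2)
w_adam, m, s = np.array([5.0, -3.0]), np.zeros(2), np.zeros(2)
for t in range(1, 501):
    w_sgd, v = sgd_momentum_step(w_sgd, w_sgd, v)
    w_adam, m, s = adam_step(w_adam, w_adam, m, s, t)
# w_sgd converges to the origin; with a constant lr, Adam hovers
# near the minimum at roughly the learning-rate scale
```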
Network tuning
- How to Avoid Overfitting in Deep Learning Neural Networks, J. Brownlee (2018)
Autoencoders and Variational Autoencoders
- D.P. Kingma, M. Welling: Auto-Encoding Variational Bayes (2013)
- Sebastian Raschka: Introduction to Deep Learning (2021)
- Valerio Velardo: The Sound of AI
- A. Geron: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (Chap 17)
- CS229:
- L20 - Variational Autoencoders (Summer 2019)
- The EM algorithm
- M2L:
- D. Blei et al: Variational Inference: A Review for Statisticians
- London Machine Learning Meetup: Max Welling - Make VAEs Great Again: Unifying VAEs and Flows (2020)
GANs
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, by Aurelien Geron (3rd ed, 2022), Chap 17
- S. Reed et al: Generative Adversarial Text to Image Synthesis (2016)
- T. Karras et al: A Style-Based Generator Architecture for Generative Adversarial Networks (2018)
Diffusion
- Tutorial on Denoising Diffusion-based Generative Modeling: Foundations and Applications (3.5 hrs)
- MIT 6.S192 - Lecture 20: Generative art using diffusion, Prafulla Dhariwal, OpenAI
- Stable Diffusion: DALL-E 2 For Free, For Everyone! (2022)
- M2L:
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, by Aurelien Geron (3rd ed, 2022), Chap 17
- Huggingface: Stable Diffusion 2.1 Demo
- OpenAI’s A. Nichol et al: Point·E: A System for Generating 3D Point Clouds from Complex Prompts (2022)
- Lil’Log: What are Diffusion Models? (2021)
- M. Welling, Y.W. Teh: Bayesian Learning via Stochastic Gradient Langevin Dynamics (2011). Compared to standard SGD, stochastic gradient Langevin dynamics injects Gaussian noise into the parameter updates to avoid collapses into local minima.
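The Welling–Teh update in the last entry is easy to state: it is a gradient step plus Gaussian noise whose variance matches the step size, θ ← θ − (ε/2)∇U(θ) + N(0, ε), so for small ε the iterates sample from p(θ) ∝ exp(−U(θ)) rather than collapsing into a single minimum. A minimal NumPy sketch on a 1D quadratic potential (function and parameter names are illustrative):

```python
import numpy as np

def sgld(grad_u, theta0, eps=0.01, n_steps=20000, rng=None):
    """Stochastic Gradient Langevin Dynamics: gradient step + sqrt(eps) noise.

    For small eps, samples approximate p(theta) ∝ exp(-U(theta)).
    """
    rng = rng or np.random.default_rng(0)
    theta = theta0
    samples = []
    for _ in range(n_steps):
        theta = theta - 0.5 * eps * grad_u(theta) + np.sqrt(eps) * rng.normal()
        samples.append(theta)
    return np.array(samples)

# U(theta) = theta^2 / 2, so the target distribution is a standard normal
samples = sgld(grad_u=lambda th: th, theta0=5.0)
burned = samples[5000:]  # discard burn-in
# Empirical mean ≈ 0 and variance ≈ 1 over the retained samples
```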
Multimodal AI; Text to image
- Yannic Kilcher: OpenAI CLIP: Connecting Text and Images (Paper Explained) (2022)
- Meta: R. Girdhar et al: ImageBind: a new way to ‘link’ AI across the senses (2023)
- OpenAI: Heewoo Jun, Alex Nichol: Shap-E: Generating Conditional 3D Implicit Functions (2023), openai/shap-e
Convolutional Nets
- Rethinking the Inception Architecture for Computer Vision, C. Szegedy et al (2015)
- Blog: Speeding up Convolutional Neural Networks, Alex Burlacu (2018)
- M. Tan et al: EfficientNetV2: Smaller Models and Faster Training (2021), github
Image style transfer
- Texture Synthesis Using Convolutional Neural Networks, L.A. Gatys et al (2015), code
- Image Style Transfer Using Convolutional Neural Networks, L.A. Gatys et al (2016), torch models by J.C. Johnson
- Perceptual Losses for Real-Time Style Transfer and Super-Resolution, J. Johnson et al (2016)
Geometric Deep Learning
- Michael Bronstein: Geometric Deep Learning: the Erlangen Programme of ML (2021)
- Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges, M. Bronstein et al (2021)
- ML Street Talk #60: Geometric Deep Learning Blueprint (2021)
- Max Welling
- ML Street Talk #36: Max Welling: Quantum, Manifolds & Symmetries in ML (2021)
- IAS Seminar on Theoretical ML: Graph Nets: The Next Generation (2020)
- Machine Learning Street Talk #75: Emergence with Danielle Grattarola (2022)
Categorical Deep Learning
- Bruno Gavranovic et al: Categorical Deep Learning: An Algebraic Theory of Architectures (2024)
Reinforcement Learning
- R. Sutton, A. Barto: Reinforcement Learning, second edition: An Introduction (2018)
- OpenAI Spinning Up
Chat Agents
- L. Ouyang et al: Training language models to follow instructions with human feedback (2022)
- TalkRL: John Schulman interview (2022)
AGI
- AIXI
- Marcus Hutter: Universal Artificial Intelligence: Sequential Decisions Based On Algorithmic Probability (2005)
- Shane Legg: Machine Super Intelligence (2008)
- AI Channel: DeepMind’s Shane Legg - Machine Super Intelligence (2009)
- J. Bach: When Artificial Intelligence Becomes General Enough to Understand Itself. Commentary on Pei Wang’s paper “On Defining Artificial Intelligence” (2020)
- A. Franz et al: A theory of incremental compression (2020)
- Y. LeCun: A Path Towards Autonomous Machine Intelligence, draft (2022), tweet
- Silver, Singh, Sutton: Reward is enough (2021)
- D. Ha, J. Schmidhuber: World Models (2018)
Causality
- Causality for Machine Learning, Bernhard Schölkopf (2019)
- Y. Bengio talk: Deep Learning Cognition (2020)
Natural Language Processing (NLP)
- torchtext Release Notes, examples
- Tutorial: Migrate torchtext from the legacy API to the new API
- J. Geiping, T. Goldstein: Cramming: Training a Language Model on a Single GPU in One Day (2022), github, review by Lucas Beyer
RNN
- Towards Data Science: Animated RNN, LSTM and GRU, by R. Karim (2018)
- Towards Data Science: Counting No. of Parameters in Deep Learning Models by Hand, by R. Karim (2019)
Attention, Transformers
- MIT 6.S191: Recurrent Neural Networks and Transformers (2022)
- Leo Dirac: LSTM is dead. Long Live Transformers! (2019)
- Sebastian Raschka: L19.5.1 The Transformer Architecture
- Towards Data Science: How to code The Transformer in Pytorch, by S. Lynn-Evans (2018)
- Lucas Beyer: Transformers, Mediterranean ML Summer School 2022 seminar
- Lil’Log: Large Transformer Model Inference Optimization (2023)
- Papers
- J. von Oswald et al: Transformers learn in-context by gradient descent (2022)
- R. Pope et al: Efficiently scaling transformer inference (2022)
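As a complement to the lectures and papers above, the core operation they all build on, scaled dot-product attention, fits in a few lines. A minimal single-head NumPy sketch with no masking or learned projections (shapes and names are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n_queries, n_keys) similarity logits
    weights = softmax(scores, axis=-1)  # each query's weights sum to 1
    return weights @ V, weights

# 2 queries attending over 3 key/value pairs of dimension 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(2, 4)), rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
out, weights = scaled_dot_product_attention(Q, K, V)
# out has shape (2, 4): one value-space vector per query
```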
Language Models
Transformers and Lidars
Energy Based Models
- A Tutorial on Energy-Based Learning, Y. LeCun et al (2006)
- Y. LeCun: Energy-Based Self-Supervised Learning, IPAM (2019), slides
- The Physics of Energy-Based Models, P. Huembeli et al (2021)
- A. Dawid et al: Modern applications of machine learning in quantum sciences (2022)
- M.A. Carreira-Perpinan, G.E. Hinton: On Contrastive Divergence Learning (2005)
- B.A. Cipra: An Introduction to the Ising Model (1987), American Mathematical Monthly. Finally I can understand what the Ising Model is about.
- E. Aurell, M. Ekberg: Inverse Ising inference using all the data (2012)
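For the Ising entries above: the model assigns each spin configuration s ∈ {−1,+1}^N the energy E(s) = −J Σ s_i s_j over nearest-neighbor pairs, with configurations weighted by exp(−E/T), the same Boltzmann form that energy-based models generalize. A minimal 2D-lattice sketch with periodic boundaries and no external field (names are illustrative):

```python
import numpy as np

def ising_energy(spins, J=1.0):
    """E(s) = -J * sum over nearest-neighbor pairs s_i s_j, periodic 2D lattice."""
    # np.roll counts each horizontal and vertical neighbor pair exactly once
    right = spins * np.roll(spins, -1, axis=1)
    down = spins * np.roll(spins, -1, axis=0)
    return -J * (right.sum() + down.sum())

n = 4
# All-up configuration: every pair aligned, so energy is minimal (-2*J*n*n = -32)
all_up = np.ones((n, n), dtype=int)
e_min = ising_energy(all_up)
# Checkerboard: every pair anti-aligned, so energy is maximal (+32)
checker = np.indices((n, n)).sum(axis=0) % 2 * 2 - 1
e_max = ising_energy(checker)
```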
Dataset Pruning
- Surya Ganguli: Statistical mechanics of neural networks (2022), 2nd part
- Jonathan Frankle: Neural Network Pruning and Training (2023)
Contrastive Learning
- G. Hinton: The Forward-Forward Algorithm: Some Preliminary Investigations, talk (2022)
News Ranking
- X. Ni et al: Prioritizing Original News on Facebook (2021)
Identifying Harmful Content
- Facebook: Harmful content can evolve quickly. Our new AI system adapts to tackle it. (2021)
- S. Wang et al: Entailment as Few-Shot Learner (2021)
- Facebook: How AI is getting better at detecting hate speech (2020)
Large Model Training
- Lil’Log: How to Train Really Large Models on Many GPUs? (2021)
- Lilian Weng, Greg Brockman: Techniques for Training Large Neural Networks (2022)
- S. Li et al: PyTorch Distributed: Experiences on Accelerating Data Parallel Training (2020)
Articles
- How neural networks learn from experience, G. Hinton (1992)
- Neural networks and physical systems with emergent collective computational abilities, J. J. Hopfield (1982)
- Reducing the Dimensionality of Data with Neural Networks, G. E. Hinton and R. R. Salakhutdinov (2006), using Boltzmann machines to initialize weights close to a good solution
- ImageNet Classification with Deep Convolutional Neural Networks, Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton (2012), describes the AlexNet Conv network.
- Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, C. Finn, P. Abbeel, S. Levine (2017)
- S. Khodadadeh: Model Agnostic Meta Learning (2018)
- Deep Learning Explainer: Toward Efficient Learning: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (2020)
- Meta-Learning with Implicit Gradients, A. Rajeswaran et al (2019), video
- The Mechanics of n-Player Differentiable Games, D. Balduzzi et al (2018)
- Simple, Distributed, and Accelerated Probabilistic Programming, D. Tran et al (2018)
- Machine Theory of Mind, N.C. Rabinowitz et al (2018)
- Recent Advances in Deep Learning for Object Detection, X. Wu (2019)
- Online Bayesian Goal Inference for Boundedly-Rational Planning Agents, T. Zhi-Xuan et al (2020)
- Open Problems in Cooperative AI, A. Dafoe et al (2020)
- Rethinking the maturity of artificial intelligence in safety-critical settings, M.L. Cummings, 2019
- sentdex tutorials at https://pythonprogramming.net
- Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis, T. Ben-Nun, T. Hoefler (2018), video
- Medium: RegNet or How to methodologically design effective networks, Chris Ha (2020)
- Hands-on Bayesian Neural Networks - a Tutorial for Deep Learning Users, L. V. Jospin et al (2021)
- R. Ghugare et al: Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective (2022), talk, code
- J. Tenenbaum et al: 3DP3: 3D Scene Perception via Probabilistic Programming (2021)
- Why do tree-based models still outperform deep learning on tabular data?, Léo Grinsztajn et al (2022)
- M. Richardson, P. Domingos: Building Large Knowledge Bases by Mass Collaboration (2003)
- S. Raschka: Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning (2020)
Posts
- Colah’s blog
- distill.pub
- MuZero and The Evolution of AlphaGo to MuZero
- McGill COMP-424 Intro to AI Lecture Notes (Doina Precup, 2013), Lecture 16, Why does a finite MDP optimal policy exist?
- OpenAI: [Safety Gym](https://openai.com/blog/safety-gym/) (2019)
- OpenAI Baselines, a set of high-quality implementations of reinforcement learning algorithms
- ConvnetJS demo: Toy 2d classification with 2-layer neural network, A. Karpathy
- Adrian Rosebrock tutorials
- The Important Definitions section of Rafael Padilla’s repo is a very good introduction to the relationships among IoU, precision/recall, the PR curve, and average precision
- Ben Dickson: The challenges of applied machine learning (2021)
- Jonathan Hui: How to start a Deep Learning project? (2018)
- Georgii Evtushenko: Multi-GPU Programming
- Mihail Eric: MLOps Is a Mess But That’s to be Expected (2022)
- Matt Turck: Red Hot: The 2021 Machine Learning, AI and Data (MAD) Landscape
- A. Kumar, I. Kostrikov, S. Levine: Should I Use Offline RL or Imitation Learning? (2022)
- Giuliano Giacaglia: How Transformers Work (2019)
- Sebastian Raschka: Ahead of AI (2022)
- Simon Willison Weblog: Large language models are having their Stable Diffusion moment
- Google leak: “We have no moat, and neither does OpenAI” (2023)
- There’s an AI for that
- Derrick Harris, Matt Bornstein, Guido Appenzeller: The AI Canon (2023)
- NeurIPS 2023
- A Guide to NeurIPS 2023 — 7 Research Areas & 10 Spotlight Papers to See, blog post
- Tim Dettmers et al: QLoRA: Efficient Finetuning of Quantized LLMs
- R. Rafailov et al: Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- S. Malladi et al: Fine-Tuning Language Models with Just Forward Passes, ML in 2 summary
- Niklas Muennighoff et al: Scaling Data-Constrained Language Models
- Kingma and Gao: Understanding Diffusion Objectives as the ELBO with Simple Data Augmentation
Web sites
MLOps for NLP
- D. Sculley et al: Hidden Technical Debt in Machine Learning Systems (2015)
- Berkeley: Full Stack Deep Learning: Lecture 6: MLOps Infrastructure & Tooling
- P. Barham, A. Chowdhery, J. Dean et al: Pathways: Asynchronous Distributed Dataflow for ML (2022)
- std::bodun::blog: Pathways: Google’s New ML System (2022)
- NVIDIA NeMo Megatron
- Microsoft DeepSpeed
- Nathan Benaich: State of AI 2024
People
Other
- Artificial Intelligence
- Autonomous Agents
- Cloud Data Platform
- Cognitive Science
- Computation Theory
- Computer Vision
- Document Classification
- Finance
- GPUs
- Information Theory
- Language Models
- Machine Learning
- Meta Learning
- MLOps
- Probabilities and Statistics
- Robotics
- Self Driving Cars
- Computational Topology