Deep Networks have revolutionized computer vision, language technology, robotics, and control. They have a growing impact in many other areas of science and engineering and, increasingly, on commerce and society. They do not, however, fully follow any currently known compact set of theoretical principles. In Yann LeCun's words, they require "an interplay between intuitive insights, theoretical modeling, practical implementations, empirical studies, and scientific analyses." This is a fancy way of saying "we don't understand this stuff nearly well enough, but we have no choice but to muddle through anyway." This course attempts to cover that ground and show students how to muddle through even as we aspire to do more. That said, we will be leveraging the substantial, though still tentative, understanding that we have gained in the past few years. It isn't 2015 anymore... We know a lot more than we used to.
Lectures are webcast by the department, and recordings will be posted to a YouTube playlist. You must be logged into your @berkeley.edu account to access the videos. A substantial amount (if not all) of the lecture content may be covered on the whiteboard or by handwriting on a tablet rather than on presentation slides. Whether material is presented on slides or on the board, students are expected to take timely handwritten notes (before the next lecture or discussion occurs) and use those to study.
Because Deep Learning is a rapidly evolving field, the material covered in this course can change substantially from semester to semester. Our goal is to bring you as close to the current frontier as we can --- while staying within what we understand most stably, so that you have a solid foundation for the future. If you are interested in materials from previous iterations of this course, please see here: [Sp21], [Fa22], [Sp23], [Sp25].
Do not refer to material from Fall 2024 since that is completely unvetted.

| W | Date | Lecture Topic | Resources | Discussion Section | Homework |
|---|------|---------------|-----------|--------------------|----------|
| 0 | Aug 28 | Introduction/Administrivia | | No Discussion | HW0 - Basics (Written, Code) |
| 1 | Sep 2 | Basic Principles | | Dis 1: SGD and Visualizing an MLP | HW1 (Written, Code) |
| | Sep 4 | Optimization: implicit regularization, SGD, and momentum | | | |
| 2 | Sep 9 | Optimization: Adam, taking a locally linear perspective, and what is a feature anyway | | Dis 2: Backprop Review, Local Linearity, and Visualizing the impact of different basic optimizers. | HW2 (Written, Code) |
| | Sep 11 | Optimizers | | | |
| 3 | Sep 16 | Optimizers: insights from induced matrix norms | | Dis 3: RMS Norm, the locally linear view of optimizers, and how different optimizers converge to different solutions. | HW3 (Written, Code) |
| | Sep 18 | muP: maximal update parameterization | | | |
| 4 | Sep 23 | Optimizers: MuON | | Dis 4: muP, Newton-Schulz iterations | HW4 (Written, Code) |
| | Sep 25 | Conv-nets: basics | | | |
| 5 | Sep 30 | Data Augmentation, Dropout, and ResNets | | | HW5 (Written, Code) |
| | Oct 2 | Fully convolutional nets and U-nets | | | |
| 6 | Oct 7 | Graph Neural Nets | | | HW6 (Written, Code) |
| | Oct 9 | RNNs and self-supervision | | | |
| 7 | Oct 14 | State-space models | | | HW7 (Written, Code) |
| | Oct 16 | State-space models | | | |
| 8 | Oct 21 | Attention | | | HW8 (Written, Code) |
| | Oct 23 | Transformers | | | |
| 9 | Oct 28 | Transformers | | | HW9 (Written, Code) |
| | Oct 30 | Transformers and fine-tuning | | | |
| 10 | Nov 4 | Prompting and Embeddings | | | HW10 (Written, Code) |
| | Nov 6 | PEFT: Soft-prompting and LoRA | | | |
| 11 | Nov 11 | Veteran's Day Holiday: No class. | | | HW11 (Written, Code) |
| | Nov 13 | Meta-learning and Transfer Learning | | | |
| 12 | Nov 18 | Buffer (for slip) | | | HW12 (Written, Code) |
| | Nov 20 | Generative Models | | | |
| 13 | Nov 25 | Generative Models | | | HW13 (Written, Code) |
| | Nov 27 | Thanksgiving Holiday: No class | | | |
| 14 | Dec 2 | Generative Models | | | HW14 (Written, Code) |
| | Dec 4 | Generative Models | | | |
| 15 | Dec 9 | RRR Week | | | |
| | Dec 11 | RRR Week | | | |