TechDebtDevin 3 hours ago

Anyone who wants to demystify ML should read The StatQuest Illustrated Guide to Machine Learning [0] by Josh Starmer. To this day I haven't found a teacher who can express complex ideas as clearly and concisely as Starmer does. It's written in an almost children's-book-like format that is very easy to read and understand. He also just published a book on neural networks that is just as good. Highly recommended even if you are already an expert, as it will give you great ways to teach and communicate complex ideas in ML.

[0]: https://www.goodreads.com/book/show/75622146-the-statquest-i...

  • joshdavham an hour ago

    I haven't read that book, but I can personally attest that Josh Starmer's StatQuest YouTube channel [1] is awesome! I used his lessons as a supplement when I was studying statistics in uni.

    [1]: https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw

    • gavinray an hour ago

      This is the 2nd or 3rd time in the last few weeks I've seen this person recommended. Must be something to that.

      • j_bum 30 minutes ago

        He’s great. I learned a ton from him when I was starting my computational biology studies as a grad student.

  • kenjackson 38 minutes ago

    I would've thought that NN and ML would be taught together. Does he assume with the NN book that you already have a certain level of ML understanding?

    • m11a 23 minutes ago

      Most ML is disjoint from the current NN trends, IMO. Compare Bishop's PRML to his Deep Learning textbook: the first couple of chapters are copy-and-paste preliminaries (probability, statistics, Gaussians, other math background), and then they completely diverge. I'm not sure how useful classical ML is for understanding NNs.

  • grep_it 19 minutes ago

    Thanks for the recommendation. Purchased both of them!

  • RealityVoid an hour ago

    I have it on my bookshelf! I bought it used on a whim, along with other CS books, but didn't realize it was that good! I will try reading it. Thanks.

pajamasam 2 hours ago

I would recommend https://udlbook.github.io/udlbook/ instead if you're looking to learn about modern generative AI.

  • miltava an hour ago

    Thanks for the recommendation. Have you looked at Bishop's Deep Learning book (https://www.bishopbook.com/)? How would you compare the two? Thanks again.

    • m11a 18 minutes ago

      You'll be happy with either. Bishop's approach is historically more mathematical (cf his 2006 PRML text), and you see that in the preliminaries chapters of Deep Learning, but there's less of this as the book goes on.

      I've read chapters from both. Much of the material overlaps, but sometimes one book or the other explains a concept better or offers different perspectives or details.

  • smath 2 hours ago

    +1 for Simon Prince's UDL book. Very clearly written.

janis1234 2 hours ago

The book is 10 years old; isn't it outdated?

  • nyrikki an hour ago

    Even Russell and Norvig is still applicable for the fundamentals, and with the rise of agentic efforts it would be extremely helpful.

    The updates to even the Bias/Variance Dilemma (Geman 1992) are minor if you look at the original paper:

    https://www.dam.brown.edu/people/documents/bias-variance.pdf

    They were dealing with small datasets or infinite datasets, and double descent only really works when the patterns in your test set are similar enough to those in your training set.

    While you do need to be mindful of some of the older opinions, the fundamentals are the same.

    For fine-tuning or RL, which face the same problems with small (or effectively infinite) datasets, and where concept classes in the training data may be novel, that 1992 paper still applies and will bite you if you assume it is universally invalid.

    Most of the foundational concepts are from the mid 20th century.

    The availability of massive amounts of data, along with new discoveries, has modified the assumptions and tooling far more than it has invalidated previous research. Skim that paper and you will see they simply dismissed the mass of data and compute we have today as impractical at the time.

    Find the book that works best for you, learn the concepts and build tacit experience.

    Lots of efforts are trying to incorporate symbolic and other methods too.

    IMHO, building breadth and depth is what will save you time and help you find opportunities, and knowledge of the fundamentals is critical for that.

  • 0cf8612b2e1e 2 hours ago

    Have not read the book, but only deep learning has advanced so wildly that a decade would change anything. The fundamentals of ML training/testing, variance/bias, etc. are the same. The classical algorithms still have their place. The only modern advancement that might not be present would be XGBoost-style gradient boosting.
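    Those fundamentals are easy to demonstrate. A minimal sketch in plain numpy (hypothetical data; polynomial degree standing in for model capacity): pushing capacity up drives training error down while test error climbs, the classic bias/variance trade-off.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny dataset: y = sin(x) + noise, 10 training points
x_train = np.linspace(0.0, 3.0, 10)
y_train = np.sin(x_train) + rng.normal(0.0, 0.1, size=10)
x_test = np.linspace(0.1, 2.9, 50)
y_test = np.sin(x_test)

def train_test_mse(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

tr_low, te_low = train_test_mse(3)    # moderate capacity
tr_high, te_high = train_test_mse(9)  # enough capacity to interpolate all 10 points
```

    With 10 training points, the degree-9 fit interpolates them essentially exactly, so its training error is near zero while its test error between the points is typically much worse than the degree-3 fit's.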

    • TechDebtDevin an hour ago

      Machine learning concepts have been around forever; they just used to call them statistics ;)

  • janalsncm an hour ago

    Depends on what your goal is. If you're just curious about ML, probably none of the info will be wrong. But it's also really not engaging with the most interesting problems engineers are tackling today, unlike, say, an 11-year-old chemistry book, which I think would still be current. So as interview material, or as a way to break into the field, it's not going to be the most useful.

  • cubefox an hour ago

    I have read parts of it. It arguably was already "outdated" back then, as it mostly focused on abstract mathematical theory of questionable value instead of cutting-edge "deep learning".

  • antegamisou 2 hours ago

    Nope, and AIMA/PRML/ESL are still king!

    Apart from these 3 you literally need nothing else for the very fundamentals and even advanced topics.

joshdavham an hour ago

What other books do people recommend?

cubefox 3 hours ago

I read parts of it years ago. As far as I remember, it is very theoretical (lots of statistical learning theory, including some IMHO mistaken treatment of Vapnik's theory of structural risk minimization), with a strong focus on theory and basically zero focus on applications, which would be completely outdated by now anyway, as the book is from 2014, an eternity in AI.

I don't think many people will want to read it today. As far as I know, mathematical theories like SLT have been of little use for the invention of transformers or for explaining why neural networks don't overfit despite large VC dimension.

Edit: I think the title "From theory to machine learning" sums up what was wrong with this theory-first approach. Basically, people with an interest in math but no interest in software engineering got interested in ML and invented various abstract "learning theories", e.g. statistical learning theory (SLT), which had very little to do with what you can do in practice. Meanwhile, engineers ignored those theories and got their hands dirty on actual neural network implementations while trying to figure out how their performance could be improved, which led to things like CNNs and later transformers.

I remember Vapnik (the V in VC dimension) complaining in the preface to one of his books about the prevalent (alleged) extremism of focusing on practice only while ignoring all those beautiful math theories. As far as I know, those theories simply turned out to be far too weak to explain the actual complexity of approaches that work in practice. It has clearly turned out that machine learning is a branch of engineering, not a branch of mathematics or theoretical computer science.

The title of this book encapsulates the mistaken hope that people will first learn those abstract learning theories, get inspired, and promptly invent new algorithms. But that's not what happened. SLT is barely able to model supervised learning, let alone reinforcement learning or self-supervised learning. As I mentioned, it can't even explain why neural networks are robust to overfitting. Other learning theories (like computational/algorithmic learning theory, or fantasy stuff like Solomonoff induction / Kolmogorov complexity) are even more detached from reality.

  • lamename 2 hours ago

    I watched a discussion the other day on this "NNs don't overfit" point. I realize certain aspects are surprising, and in many cases, with the right size and diversity in a dataset, scaling laws prevail. But my experience and impression from training on real datasets from scratch (not fine-tuning pretrained models) has always been that NNs definitely can overfit if you don't have large quantities of data. My gut assumption is that the original claims were never demonstrated to hold under certain circumstances (i.e. certain dataset characteristics), but that's rarely mentioned in shorthand these days, when dataset size is often assumed to be huge.

    (Before anyone laughs this off: this is still an actual problem in the real world for non-FAANG companies that have niche problems or cannot use open-but-non-commercial datasets. Not everything can be solved with foundation/frontier models.)

    Please point me to these papers because I'm still learning.

    • cubefox 2 hours ago

      Yes, they can overfit. SLT assumed that overfitting is caused by large VC dimension, which apparently isn't true, because there exist various techniques/hacks that effectively combat overfitting while not actually reducing the very large VC dimension of those neural networks. Basically, the theory predicts they always overfit, while in reality they mostly work surprisingly well. That's often the case in ML engineering: people discover that some things work well and others don't, without being exactly sure why. The famous Chinchilla scaling law was an empirical discovery, not a theoretical prediction, because theories like SLT are far too weak to make interesting predictions like that. Engineering is basically decades ahead of those pure-theory learning theories.

      > Please point me to these papers because I'm still learning.

      Not sure which papers you have in mind. To be clear, I'm not an expert, just an interested layman. I just wanted to highlight the stark difference between the apparently failed pure-math approach I learned years ago in a college class and the actual ML papers released today, with major practical breakthroughs on a regular basis. Similarly practical papers were always available, just from very different people, e.g. LeCun or people at DeepMind, not from the theoretical computer science department people who wrote textbooks like this one. Back in the day it wasn't very clear (to me) that the practice guys were really onto something while the theory guys were a dead end.

  • kadushka an hour ago

    Theory is still needed if you want to understand things like variational inference (which is in turn needed to understand things like diffusion models). It's just like physics: you need mathematical theories to understand things like quantum mechanics, because otherwise it might not make sense.
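    As a concrete example of the kind of theory involved, the identity at the heart of variational inference is short enough to state (a standard result, sketched here for illustration). For any distribution q(z),

```latex
\log p(x)
  = \underbrace{\mathbb{E}_{q(z)}\!\left[\log \frac{p(x,z)}{q(z)}\right]}_{\text{ELBO}}
  + \mathrm{KL}\!\left(q(z)\,\middle\|\,p(z \mid x)\right)
  \;\ge\; \text{ELBO},
```

    since the KL divergence is nonnegative. Maximizing the ELBO over q therefore both tightens the bound on the evidence and pulls q toward the true posterior, which is exactly the move that diffusion-model derivations lean on.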