From 0253857ace6fb111cf12710bd7cb1d34aa6bf22b Mon Sep 17 00:00:00 2001
From: Charles Martin
Date: Sun, 25 May 2025 10:57:02 -0700
Subject: [PATCH 1/6] Update README.md

added videos, talks, paper links
---
 README.md | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 0470fa7..11460d1 100644
--- a/README.md
+++ b/README.md
@@ -545,7 +545,10 @@ WeightWatcher has been featured in top journals like JMLR and Nature:
 
 #### Latest papers and talks
 
-- [SETOL: A Semi-Empirical Theory of (Deep) Learning] (in progress)
+- [Grokking and Generalization Collapse: Insights from
+HTSR theory(available upon request]
+
+- [SETOL: A Semi-Empirical Theory of (Deep) Learning (draft)] (https://github.com/CalculatedContent/setol_paper/blob/main/setol_draft.pdf)
 
 - [Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics](https://arxiv.org/abs/2106.00734)
 
@@ -591,7 +594,11 @@ and has been presented at Stanford, UC Berkeley, KDD, etc:
 
 - [KDD 2019 Workshop: Statistical Mechanics Methods for Discovering Knowledge from Production-Scale Neural Networks](https://dl.acm.org/doi/abs/10.1145/3292500.3332294)
 
-- [KDD 2019 Workshop: Slides](https://www.stat.berkeley.edu/~mmahoney/talks/dnn_kdd19_fin.pdf)
+- [KDD 2019 Workshop: Slides](https://www.stat.berkeley.edu/~mmahoney/talks/dnn_kdd19_fin.pdf)
+
+#### NeurIPS 2023
+- [Heavy-Tailed Self-Regularization in Deep Neural Networks](https://neurips.cc/virtual/2023/83033)
+
 
 
 
@@ -600,7 +607,7 @@ and has been presented at Stanford, UC Berkeley, KDD, etc:
 
 WeightWatcher has also been featured at local meetups and many popular podcasts
 
-#### Popular Popdcasts and Blogs
+#### Popular Podcasts and Blogs
 
 - [This Week in ML](https://twimlai.com/meetups/implicit-self-regularization-in-deep-neural-networks/)
 
@@ -622,12 +629,18 @@ WeightWatcher has also been featured at local meetups and many popular podcasts
 
 - [Latest Results](https://www.youtube.com/watch?v=rojbXvK9mJg)
 
+
+
+
 #### 2021 Short Presentations
 
 - [MLC Research Jam March 2021](presentations/ww_5min_talk.pdf)
 
 - [PyTorch2021 Poster April 2021](presentations/pytorch2021_poster.pdf)
 
+#### TEDx Talk
+- [The Emergence of Signatures of Artificial General Intelligence ](https://www.youtube.com/watch?v=5dBEzqTlq-Y)
+
 #### Recent talk(s) by Mike Mahoney, UC Berekely
 
 - [IARAI, the Institute for Advanced Research in Artificial Intelligence](https://www.youtube.com/watch?v=Pirni67ZmRQ)

From 21338e519ba1a72d5e5ea20160aa4f9595cdd705 Mon Sep 17 00:00:00 2001
From: Charles Martin
Date: Sun, 25 May 2025 11:04:40 -0700
Subject: [PATCH 2/6] Update README.md

updated vido list
---
 README.md | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 11460d1..01e3a2a 100644
--- a/README.md
+++ b/README.md
@@ -543,10 +543,10 @@ This tool is based on state-of-the-art research done in collaboration with UC Be
 
 WeightWatcher has been featured in top journals like JMLR and Nature:
 
-#### Latest papers and talks
+
+### Latest papers and talks
 
-- [Grokking and Generalization Collapse: Insights from
-HTSR theory(available upon request]
+- [Grokking and Generalization Collapse: Insights from HTSR theory (available upon request)]
 
 - [SETOL: A Semi-Empirical Theory of (Deep) Learning (draft)] (https://github.com/CalculatedContent/setol_paper/blob/main/setol_draft.pdf)
 
@@ -625,9 +625,18 @@ WeightWatcher has also been featured at local meetups and many popular podcasts
 
 - [Applied AI Community](https://www.youtube.com/watch?v=xLZOf2IDLkc&feature=youtu.be)
 
+- [UCL Financial Computing (2022)](https://www.youtube.com/watch?v=sOXROWJ70Pg)
+
 - [Practical AI](https://changelog.com/practicalai/194)
 
-- [Latest Results](https://www.youtube.com/watch?v=rojbXvK9mJg)
+- [AI Nation 2023](https://www.youtube.com/watch?v=rojbXvK9mJg)
+
+- [ICCF 2024](https://youtu.be/_c0-_ru0sZc)
+
+- [Data Science at Home (2025)](https://www.youtube.com/watch?v=iv7Pv3StHms)
+
+- [Cohere for AI 2025](https://www.youtube.com/watch?v=NXqO4nDNIwo)
+
 
 
 

From 940ffea78c5a337d4e99ac7f1af4dd7541fee1f1 Mon Sep 17 00:00:00 2001
From: Charles Martin
Date: Sun, 25 May 2025 11:07:36 -0700
Subject: [PATCH 3/6] Update README.md

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 01e3a2a..c8a5c71 100644
--- a/README.md
+++ b/README.md
@@ -637,7 +637,9 @@ WeightWatcher has also been featured at local meetups and many popular podcasts
 
 - [Cohere for AI 2025](https://www.youtube.com/watch?v=NXqO4nDNIwo)
 
+- [The FreeStyle Podcast](https://www.youtube.com/watch?v=hb0YrwQ3K2Q)
+
 
 
 
 and many more

From 6944a413ea1a7e0d4a06423db1f20139d432ca8c Mon Sep 17 00:00:00 2001
From: Charles Martin
Date: Sun, 25 May 2025 11:09:13 -0700
Subject: [PATCH 4/6] Update README.md

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index c8a5c71..3244ee0 100644
--- a/README.md
+++ b/README.md
@@ -550,6 +550,8 @@ WeightWatcher has been featured in top journals like JMLR and Nature:
 
 - [SETOL: A Semi-Empirical Theory of (Deep) Learning (draft)] (https://github.com/CalculatedContent/setol_paper/blob/main/setol_draft.pdf)
 
+- [Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training (NeurIPS 2023 Spotlight Paper)(https://arxiv.org/abs/2312.00359)
+
 - [Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics](https://arxiv.org/abs/2106.00734)
 
 - [Evaluating natural language processing models with robust generalization metrics that do not need access to any training or testing data](https://arxiv.org/abs/2202.02842)

From d6fd015dc5ff00e95d5b7612333dfab15966ad0d Mon Sep 17 00:00:00 2001
From: Charles Martin
Date: Wed, 4 Jun 2025 21:06:26 -0700
Subject: [PATCH 5/6] Update README.md

---
 README.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 3244ee0..3e43be5 100644
--- a/README.md
+++ b/README.md
@@ -439,7 +439,7 @@ details = watcher.distances(initial_model, trained_model)
 
 ---
 
-#### compatability with version 0.2.x
+#### Compatibility with version 0.2.x
 
 The new 0.4.x version of WeightWatcher treats each layer as a single, unified set of eigenvalues.
 In contrast, the 0.2.x versions split the Conv2D layers into n slices, one for each receptive field.
@@ -641,6 +641,9 @@ WeightWatcher has also been featured at local meetups and many popular podcasts
 
 - [The FreeStyle Podcast](https://www.youtube.com/watch?v=hb0YrwQ3K2Q)
 
+- [This Week in ML AI Podcast](https://twimlai.com/podcast/twimlai/grokking-generalization-collapse-and-the-dynamics-of-training-deep-neural-networks/)
+
+
 
 
 and many more

From 8f4186e0670a54961318b546f0c31b4befd6a388 Mon Sep 17 00:00:00 2001
From: RichardScottOZ
Date: Thu, 6 Nov 2025 06:37:16 +1030
Subject: [PATCH 6/6] Fix typo in PEFT / LORA models section

Corrected a typo in the README regarding the BA low rank matrix.
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 3e43be5..457a947 100644
--- a/README.md
+++ b/README.md
@@ -108,7 +108,7 @@ watcher.distances(model_1, model_2)
 ## PEFT / LORA models (experimental)
 To analyze an PEFT / LORA fine-tuned model, specify the peft option.
 
- peft = True: Forms the BA low rank matric and analyzes the delta layers, with 'lora_BA" tag in name
+ - peft = True: Forms the BA low rank matrix and analyzes the delta layers, with 'lora_BA" tag in name
 
 ```details = watcher.analyze(peft='peft_only')```
 