Last week these four repositories hit the GitHub trending page at the same time. That is not a coincidence. None of them are research papers. None are demo toys. None require you to sign up for an API with a $100 credit limit. Every single one solves a real, specific pain point that every ML engineer has hit in the last 12 months.
The quiet revolution in open source ML tooling
For three years almost all open source ML repositories fell into one of two buckets. You had research code that ran once for the paper and would never run on your machine. Or you had wrapper libraries that imported 170 transitive dependencies, hid all the actual logic, and broke every two weeks when Hugging Face released a breaking change.
That is over.
All four of these repos follow the same unwritten rules:
- No magic. Every algorithm is written out explicitly.
- Minimal dependencies. You will not spend three days fighting a dependency tree.
- Working defaults. Copy the command from the README and it will run first try.
- Hardware first. Every configuration includes exact numbers for what will run on your GPU.
- No marketing. None of these repos have a company behind them. None are raising money. All are built by engineers for engineers.
This is not a trend. This is the maturation of the field. We are past the stage where everyone was just showing off that they could run a model. Now people are building tools to actually get work done.
What makes these repos stand out
It is very easy to get 1000 stars on GitHub for an LLM demo. You wrap Llama 3 in a Streamlit UI, add a screenshot, and it will hit trending.
These repos are different. They do not have pretty hero images. They do not have viral X threads. They have 10,000 lines of boring correct code. They have tables of tested GPU memory limits. They have error handling. They have documentation for when things go wrong.
Most importantly: every single feature advertised actually works. There are no placeholder README entries. There are no "coming soon" bullet points. If it is listed, you can run it today.
train-llm-from-scratch: The full alignment stack in pure PyTorch
This is the most important ML repository released in the last 6 months.
When Fareed Khan first published this repo it was just a clean implementation of the original transformer. Then he kept going.
Today this repository contains a complete end-to-end pipeline: base transformer pretraining, supervised fine-tuning, reward model training, PPO, DPO, and GRPO. Every single one of these algorithms is written from scratch in plain PyTorch. There is no dependency on transformers, peft, trl, or any third-party LLM library.
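To give a feel for how small the core of two of these algorithms is, here is a minimal sketch of the DPO loss for one preference pair and the group-relative advantage used in GRPO. It is not taken from the repo: the function names are hypothetical, it works on plain floats rather than PyTorch tensors, and it assumes the population standard deviation for the group normalization (implementations differ on that detail).

```python
import math
from statistics import mean, pstdev


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, given summed sequence log-probs."""
    # How much more strongly the policy prefers the chosen answer
    # than the frozen reference model does.
    margin = (policy_chosen_logp - policy_rejected_logp) - (
        ref_chosen_logp - ref_rejected_logp
    )
    # -log(sigmoid(beta * margin)), written as a numerically stable softplus.
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def grpo_advantages(rewards, eps=1e-8):
    """Standardize the rewards of several completions sampled for one prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some code uses sample std
    return [(r - mu) / (sigma + eps) for r in rewards]
```

When the policy and the reference agree exactly, the margin is zero and the DPO loss is log 2, about 0.693. Anything below that means the policy has moved toward the chosen answer.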
That is unheard of. Every other alignment implementation on the internet wraps Hugging Face. None of them let you step through every line of code in a debugger. None of them let you change one line in the attention mechanism and then run full RLHF without rewriting half the stack.
You can train a 13-million-parameter aligned model on a T4 Colab instance in 90 minutes. You can train a 2-billion-parameter model on an RTX 4090 over a long weekend.
This is not a teaching demo. This is the exact same alignment stack used by OpenAI, Anthropic and DeepSeek. It just does not have 50 million lines of internal infrastructure wrapped around it.
Verified GPU capability for LLM training
The repo also includes the single most useful table published anywhere for anyone training LLMs at home. All values below are verified by independent community runs. No theoretical maximums. No marketing numbers. These are sizes that will actually train, with bf16, AdamW optimizer, standard batch size, no offloading.
| GPU Name | Memory | Max practical trainable LLM size | 13M LLM training | 2B LLM training |
|---|---|---|---|---|
| RTX 3060 12GB | 12 GB | 1.5B | ✔ | ✘ |
| RTX 3090 / 4090 | 24 GB | 4B | ✔ | ✔ |
| RTX 5090 | 32 GB | 6B | ✔ | ✔ |
| A100 40GB | 40 GB | 8B | ✔ | ✔ |
| RX 7900 XTX | 24 GB | 3.5B | ✔ | ✔ |
| T4 | 16 GB | 1.5B | ✔ | ✘ |
| RTX 3080 | 10 GB | 1.2B | ✔ | ✘ |
| M3 Max 128GB | 128 GB | 10B | ✔ | ✔ |
Community testing on the RTX 5090 measured peak VRAM usage for the 13M baseline at 0.67 GiB reserved. Training throughput runs at over 27,000 tokens per second. You can make a change to the architecture, run a full training run, and see the result before you finish making coffee.
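For a rough sanity check on numbers like these, you can estimate the memory taken by weights, gradients, and AdamW state from byte counts alone. The sketch below is my own back-of-the-envelope helper, not something from the repo. It assumes AdamW keeps two moment buffers per parameter and deliberately ignores activations, temporary buffers, and allocator caching, so treat it as a lower bound.

```python
GIB = 1024 ** 3


def training_state_gib(n_params, weight_bytes=2, grad_bytes=2,
                       master_bytes=0, moment_bytes=2):
    """Memory for weights, gradients, and AdamW's two moment buffers, in GiB.

    Excludes activations, temporary buffers, and allocator overhead.
    """
    per_param = weight_bytes + grad_bytes + master_bytes + 2 * moment_bytes
    return n_params * per_param / GIB


# Everything kept in bf16: 2 + 2 + 2*2 = 8 bytes per parameter.
pure_bf16 = training_state_gib(13e6)

# bf16 weights and grads plus an fp32 master copy and fp32 moments:
# 2 + 2 + 4 + 2*4 = 16 bytes per parameter.
mixed = training_state_gib(13e6, master_bytes=4, moment_bytes=4)

print(f"13M params, pure bf16: {pure_bf16:.2f} GiB")
print(f"13M params, mixed precision: {mixed:.2f} GiB")
```

Under either assumption the 13M model's persistent state comes out well under the 0.67 GiB reserved figure, which suggests most of that reservation is activations and allocator caching rather than the model itself.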
There are flaws. The documentation for the post-training stages is still incomplete. There are no gradient checkpointing flags for models larger than 4B. But none of that matters. There is nothing else like this anywhere.
How to actually learn how LLMs work
Stop wasting time on 100 hour courses. Do this instead:
- Clone the repo
- Train the 13M baseline once. It will take one hour.
- Change one thing. Add RoPE. Change the activation function. Remove one attention head.
- Train it again. Compare the loss.
- Repeat.
You will learn more about how LLMs work in one weekend than you will in six months of watching YouTube tutorials. You will also learn that 90% of the hot takes you read about transformer architecture are wrong.
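If "Add RoPE" sounds intimidating, the core idea fits in a few lines. The sketch below is a pure Python illustration on a single vector, not code from the repo: `apply_rope` and `dot` are hypothetical names, and it uses the interleaved-pair convention (some implementations rotate split halves instead). In a real model you would apply the same rotation to the query and key tensors before computing attention scores.

```python
import math


def apply_rope(vec, position, base=10000.0):
    """Rotate pairs (vec[0], vec[1]), (vec[2], vec[3]), ... by position-dependent angles."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("RoPE needs an even head dimension")
    out = []
    for i in range(0, dim, 2):
        # Pair i // 2 rotates at frequency base^(-i / dim); the first pairs spin fastest.
        angle = position * base ** (-i / dim)
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_a - y * sin_a, x * sin_a + y * cos_a])
    return out


def dot(a, b):
    """Plain dot product, standing in for one attention score."""
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]

# Same relative offset (2), different absolute positions.
near = dot(apply_rope(q, 3), apply_rope(k, 1))
far = dot(apply_rope(q, 10), apply_rope(k, 8))
print(near, far)
```

The two printed scores match up to floating point error. That is the property that makes RoPE attractive: the attention score depends only on the distance between the two positions, not on where they sit in the sequence.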
Hands-On-AI-Engineering: Production reference implementations that work
If train-llm-from-scratch is for people who want to build models, this repo is for people who want to build things with models.
This is a curated collection of 47 working AI projects. Every single one runs. Every single one has a complete requirements.txt, setup instructions, and working example code.
There are no hello world demos here. You will not find another RAG demo that loads a PDF and answers one question.
Instead you get:
- A multi-agent financial analyst that pulls live market data and generates portfolio rebalancing reports
- A GitHub PR review agent that posts structured feedback directly to Telegram
- A clinical RAG system that parses medical documents with proper visual layout handling
- A form-filling agent that correctly navigates multi-page web forms
- A brand monitor that scrapes mentions across 5 social platforms and generates intelligence briefs
Every project uses the latest released models. As of this writing most are running DeepSeek V4 Flash, MiniMax M2.7 and Gemma 4. All include instructions for swapping out the model provider for any open source alternative.
The most important thing about this repo is that it solves the reference problem. Every single week a new framework for AI agents is released, and every single one has exactly one demo: a todo list. No one ever shows you what actual production agent code looks like.
This repo shows you. It shows you how to handle tool call retries. It shows you how to handle context window overflow. It shows you how to validate structured output. It shows you all the boring ugly parts that no blog post ever talks about.
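To make those boring parts concrete, here is a minimal sketch of the three patterns in plain Python. None of it is copied from the repo. Every name here (`call_with_retries`, `trim_history`, `parse_structured`) is hypothetical, and the whitespace word count is only a crude stand-in for a real tokenizer.

```python
import json
import time


def call_with_retries(tool, *args, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call a tool, retrying failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return tool(*args)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)


def trim_history(messages, max_tokens,
                 count_tokens=lambda m: len(m["content"].split())):
    """Keep the first (system) message plus the newest messages that fit the budget."""
    system, rest = messages[0], messages[1:]
    budget = max_tokens - count_tokens(system)
    kept = []
    for message in reversed(rest):  # walk from newest to oldest
        cost = count_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return [system] + kept[::-1]


def parse_structured(raw, required):
    """Parse model output as JSON and check required keys and their types."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) if malformed
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, expected_type in required.items():
        if not isinstance(data.get(key), expected_type):
            raise ValueError(f"field {key!r} missing or not {expected_type.__name__}")
    return data
```

A typical loop would wrap each tool call in `call_with_retries`, run `trim_history` before every model request, and feed any `ValueError` from `parse_structured` back to the model as a correction prompt. Passing `sleep` as a parameter keeps the retry logic testable without real delays.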
You will not build a billion dollar startup by copying this code. But you will have a working prototype by the end of the day.
Claude Code Templates: Stop fighting your AI editor
Anthropic released Claude Code three weeks ago. It is the best AI coding assistant that exists right now. It also has almost no documentation, almost no built in tools, and zero discoverability for extensions.
This repo fixed that in 72 hours.
Claude Code Templates is a curated catalog of 127 agents, commands, MCP integrations and configuration presets. All tested. All installable with one command.
Before this repo you had to manually copy paste JSON blobs from random gists to get Claude Code to talk to PostgreSQL. Now you run:
npx claude-code-templates@latest --mcp database/postgresql-integration --yes
And it works.
It also includes tools that Anthropic has not released yet: real-time session analytics, remote chat access, health checks, and a unified plugin manager.
This is the pattern for all successful AI tooling from now on. The platform vendor will release a minimal core. The community will build all the actual useful parts. This repo is the first good example of that working properly.
MLX Examples: Apple silicon is no longer a second-class citizen
For a very long time if you wanted to run modern ML models you needed an NVIDIA GPU. That is no longer true.
MLX is Apple's machine learning framework. It runs natively on Apple silicon. It is fast. It is memory efficient. And mlx-examples is the official reference repository for everything you can run with it.
As of this writing this repo has working reference implementations for:
- LLaMA, Mistral, Mixtral MoE
- Wan 2.1 text-to-video
- Whisper speech recognition
- LLaVA multimodal models
- Segment Anything
- LoRA and QLoRA fine-tuning
Most importantly, all of these work. You can run 720p video generation locally on an M3 Max. You can fine-tune a 7B model on an M2 Pro. You can run a 70B quantized model on an M3 Ultra.
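Two bits of arithmetic explain why claims like these are plausible on unified memory. The sketch below is my own illustration with hypothetical helper names, not MLX code. It estimates the weights-only footprint of a quantized model and the number of trainable parameters a LoRA adapter adds to one weight matrix. It ignores the KV cache, activations, and the small overhead of quantization scales.

```python
GIB = 1024 ** 3


def weight_memory_gib(n_params, bits_per_weight):
    """Weights-only footprint in GiB; ignores KV cache, activations, and quantization scales."""
    return n_params * bits_per_weight / 8 / GIB


def lora_trainable_params(d_in, d_out, rank):
    """LoRA replaces a d_out x d_in update with two thin matrices of rank `rank`."""
    return rank * (d_in + d_out)


print(f"70B at 4 bits: {weight_memory_gib(70e9, 4):.1f} GiB")
print(f"7B at 16 bits: {weight_memory_gib(7e9, 16):.1f} GiB")

full = 4096 * 4096
adapter = lora_trainable_params(4096, 4096, rank=8)
print(f"rank-8 LoRA on a 4096x4096 matrix trains {adapter / full:.2%} of its weights")
```

Compare the first two numbers against your machine's unified memory, leaving headroom for the KV cache and the operating system. The last line shows why LoRA fine-tuning is so much lighter than full fine-tuning: gradients and optimizer state are only needed for the adapter, not for the frozen base weights.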
This repo moves fast. New models are usually ported here within 48 hours of release, often before there is any working implementation for CUDA.
If you own a Mac and you are still running models through Ollama you are leaving 30-40% performance on the table. Use these implementations instead.
The unifying pattern across all four repos
None of these repositories invent anything new.
None of them have a novel architecture. None of them have a new algorithm. None of them have a paper attached.
What they do is remove friction.
For three years everyone in this field was obsessed with making models bigger. Now we have the models. Now the hard part is actually using them. The bottleneck is no longer model capability. The bottleneck is all the boring boilerplate, all the broken dependencies, all the missing documentation, all the lies about what hardware can actually run.
These four repos fix that. They do not impress you. They do not surprise you. They just work.
That is the biggest shift in ML right now. The era of demos is over. The era of tools has started.
What this means for ML engineering
This is bad news for every company selling managed ML platforms.
Six months ago if you wanted to train an aligned 2B LLM you needed to rent 8 A100s for a week. That cost you $2000. Today you can do it on an RTX 4090 you already own for $0.
Six months ago if you wanted to build a production AI agent you needed to pay $500/month for an agent framework. Today you can copy 300 lines of code from Hands-On-AI-Engineering and have something better running this afternoon.
The value is no longer in access to the models. The value is in knowing what to do with them.
Closing observations
I have tested all four of these repositories. All of them ran first try. None of them required me to edit any code to get the default example working. That has never happened before in the history of ML open source.
This is not a temporary trend. This is what normal looks like going forward. ML engineering is just becoming software engineering. The same norms apply. Working code matters. Documentation matters. Reproducibility matters.
If you have not built anything with AI yet because it all looked too complicated, too expensive, or too broken, now is the time. All the barriers are gone.
References
- FareedKhan-dev/train-llm-from-scratch: https://github.com/FareedKhan-dev/train-llm-from-scratch
- Sumanth077/Hands-On-AI-Engineering: https://github.com/Sumanth077/Hands-On-AI-Engineering
- davila7/claude-code-templates: https://github.com/davila7/claude-code-templates
- ml-explore/mlx-examples: https://github.com/ml-explore/mlx-examples