None of these are demo toys. Every repo listed here appeared on the GitHub global trending list in the last 30 days and has picked up verified production usage in ML engineering teams. None are empty landing-page repos. All have working code, documentation, and active issue trackers.
Nobody is building general-purpose agents anymore. The best new tools solve one very specific, annoying problem that engineers actually have every week. Every repo this month fits this pattern. No vague "AGI assistant" claims. Every one does one thing well and gets out of your way.
Pair programming: Aider
Aider is no longer the underdog. It is now the most widely adopted open-source AI coding assistant by a very wide margin.
This is not a VS Code extension that suggests individual lines. Aider maps your entire repository, understands cross-file dependencies, makes working changes, runs your linter and tests, and commits properly isolated changes. It will refactor 12 files across 3 directories without you hand-holding it through every step.
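To make "understands cross-file dependencies" concrete, here is a minimal sketch of the idea using only Python's standard library. This is not Aider's implementation, and the function names (imported_modules, dependency_map, affected_by) are hypothetical. It assumes a flat repo of Python files and only follows plain absolute imports.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Return the top-level module names imported by a Python source string."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found


def dependency_map(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each file path to the other files in the repo that it imports.

    `files` maps a path such as "billing.py" to its source text (an assumption
    of this sketch; a real tool would walk the working tree).
    """
    local = {path.rsplit("/", 1)[-1].removesuffix(".py"): path for path in files}
    return {
        path: {local[m] for m in imported_modules(src) if m in local and local[m] != path}
        for path, src in files.items()
    }


def affected_by(changed: str, deps: dict[str, set[str]]) -> set[str]:
    """Return every file that imports `changed`, directly or transitively."""
    hit, frontier = set(), {changed}
    while frontier:
        # Files that import anything in the current frontier and are not yet seen.
        frontier = {p for p, d in deps.items() if d & frontier} - hit - {changed}
        hit |= frontier
    return hit


# Hypothetical three-file repo used only for illustration.
repo = {
    "models.py": "import os\n",
    "billing.py": "from models import User\n",
    "api.py": "import billing\nimport json\n",
}
deps = dependency_map(repo)
print(sorted(affected_by("models.py", deps)))
```

A map like this is what lets a tool know that a change to models.py may need follow-up edits in billing.py and api.py, rather than treating each file in isolation.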
It works with every major LLM. At time of writing it performs best with Claude 3.7 Sonnet, but works perfectly well with local models run through Ollama. It runs entirely from your terminal, and can be attached to any editor via comment triggers.
Most teams I know that try Aider stop paying for GitHub Copilot within two weeks. That is not marketing; it is consistent feedback across every engineering circle I am part of. The only common complaint is that it will occasionally make overly bold changes. That is a feature, not a bug. You have git. Revert.
LLM application layer: Onyx
If you have ever tried to host your own internal ChatGPT clone, you have fought with 12 half-broken open-source UIs that miss half the features you actually need.
Onyx solves this. It is a drop-in, self-hosted LLM frontend that has every capability out of the box: agentic RAG, web search, sandboxed code execution, file uploads, voice chat and image generation. It supports every LLM provider, local or remote. You can deploy the full stack with one curl command. The lite mode runs on any machine with 1 GB of RAM.
This repo went from zero to 7k stars in 3 weeks. That is not an accident. Every team that wants to run internal LLM access without sending all their code to OpenAI was waiting for exactly this.
Role-specific agents: Anthropic knowledge work plugins
Anthropic open-sourced this collection two weeks ago, and it immediately changed how teams are building internal Claude integrations.
This is not a set of demo plugins. These are the exact role configurations Anthropic uses internally for sales, product, support, legal and engineering teams. Each one bundles pre-written system prompts, allowed tools, connector configuration and slash commands.
You do not have to spend 3 months iterating on system prompts to get Claude to write good product specs. Someone already did that work. You can take the base product manager plugin, swap in your internal tools, and have something usable the same day.
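As a rough picture of what "take the base plugin and swap in your internal tools" means, here is a small sketch. The RolePlugin class, its field names, the tool names and the example URL are all hypothetical and are not the format used in Anthropic's repo; the sketch only models the four pieces listed above (system prompt, allowed tools, connector configuration, slash commands).

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class RolePlugin:
    """Hypothetical in-memory model of a role plugin bundle."""
    name: str
    system_prompt: str
    allowed_tools: tuple[str, ...] = ()
    connectors: dict[str, str] = field(default_factory=dict)
    slash_commands: dict[str, str] = field(default_factory=dict)


def customize(base: RolePlugin, *, tools: list[str], connectors: dict[str, str]) -> RolePlugin:
    """Return a copy of `base` with your own tools and connectors swapped in.

    The system prompt and slash commands are reused unchanged, which is the point:
    the prompt work is inherited, only the integrations are yours.
    """
    return replace(base, allowed_tools=tuple(tools), connectors=dict(connectors))


# Illustrative values only.
base_pm = RolePlugin(
    name="product-manager",
    system_prompt="You write concise, testable product specs.",
    allowed_tools=("search_docs",),
    slash_commands={"/spec": "Draft a product spec for: {input}"},
)
ours = customize(
    base_pm,
    tools=["search_docs", "query_issue_tracker"],
    connectors={"issue_tracker": "https://tracker.example.internal"},
)
print(ours.allowed_tools)
```

The base object is left untouched, so you can keep pulling upstream prompt improvements and reapply your own tool and connector choices on top.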
This is the first major vendor admitting that 90% of the value of an enterprise LLM deployment is just good, boring prompt engineering that someone already wrote.
Web scraping: Scrapling
Every ML engineer spends at least one week a year fighting web scrapers. Scrapling is the first new scraping framework in 5 years that actually moves the state of the art forward.
It has two killer features. First, it bypasses Cloudflare Turnstile out of the box, with no extra configuration. Second, its parser learns element signatures. If you select an element by CSS selector once, Scrapling will automatically find that same element even after the website completely redesigns its DOM structure.
You can write a scraper once, and it will keep working for years. That alone makes this worth switching to. It also has native pause/resume, proxy rotation and concurrent crawling. All of it works in less than 10 lines of Python.
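The element-signature idea is easier to see in code. The sketch below is not Scrapling's API or its actual matching algorithm. It is a toy illustration, with hypothetical names (Element, similarity, relocate) and arbitrary weights, of how a saved signature can be matched against a redesigned page by similarity instead of by exact selector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical signature of a DOM element: what we remember about it."""
    tag: str
    classes: frozenset[str]
    text: str
    parent_tag: str


def _jaccard(a: set, b: set) -> float:
    """Overlap of two sets in [0, 1]; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity(saved: Element, candidate: Element) -> float:
    """Weighted score in [0, 1]. The weights are arbitrary choices for this sketch."""
    tag_score = 1.0 if saved.tag == candidate.tag else 0.0
    parent_score = 1.0 if saved.parent_tag == candidate.parent_tag else 0.0
    class_score = _jaccard(set(saved.classes), set(candidate.classes))
    text_score = _jaccard(set(saved.text.lower().split()), set(candidate.text.lower().split()))
    return 0.2 * tag_score + 0.3 * class_score + 0.4 * text_score + 0.1 * parent_score


def relocate(saved: Element, candidates: list[Element], threshold: float = 0.5):
    """Return the candidate most similar to `saved`, or None if nothing is close enough."""
    best = max(candidates, key=lambda c: similarity(saved, c), default=None)
    if best is None or similarity(saved, best) < threshold:
        return None
    return best


# Signature captured when the original CSS selector still worked.
saved = Element("span", frozenset({"price", "bold"}), "Price: $19.99", "div")

# The same page after a redesign: class names and parent containers have changed.
redesigned = [
    Element("span", frozenset({"title"}), "Wireless Mouse", "section"),
    Element("span", frozenset({"product-price"}), "Price: $19.99", "section"),
    Element("a", frozenset({"bold"}), "Add to cart", "div"),
]
match = relocate(saved, redesigned)
print(match.classes if match else "no match")
```

The original selector would return nothing on the redesigned page, but the price element still wins on tag and text overlap. The threshold matters: without it, the matcher would happily return the wrong element when the real one has been removed.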
Financial ML: Kronos
General-purpose time series models are garbage for financial data. Everyone working in quant knows this, but nobody had built a proper open-source foundation model for candlestick data until now.
Kronos is a decoder-only transformer pre-trained on 10 years of K-line data from 45 global exchanges. It was accepted to AAAI 2026 last month.
Models range from 4M to 500M parameters. The 102M base model will run on a laptop CPU and outperforms every general-purpose time series foundation model (TSFM) on financial forecasting tasks by between 18% and 32% across every public benchmark.
This is not a trading bot. It is a base model you fine-tune for your own signals. If you do any work in quantitative finance, you should pull this down this week.
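A decoder-only transformer predicts the next discrete token, so candlestick data has to be turned into tokens first. The sketch below shows one naive way to do that with uniform binning. It is an assumption made for illustration and is not Kronos's tokenizer; the function names, bin count and clipping ranges are all hypothetical.

```python
import math


def quantize(value: float, lo: float, hi: float, bins: int) -> int:
    """Clip `value` to [lo, hi] and map it to a bin index in [0, bins - 1]."""
    clipped = min(max(value, lo), hi)
    return min(int((clipped - lo) / (hi - lo) * bins), bins - 1)


def tokenize(candles: list[tuple[float, float, float, float]], bins: int = 16) -> list[int]:
    """Turn (open, high, low, close) candles into one integer token per candle.

    Each token combines two scale-free features:
      - the log return of the close versus the previous close, clipped to +/-5%
      - the log high/low range of the candle, clipped to 5%
    The first candle only supplies a previous close, so it produces no token.
    """
    tokens = []
    for (_, _, _, prev_close), (_, high, low, close) in zip(candles, candles[1:]):
        ret_bin = quantize(math.log(close / prev_close), lo=-0.05, hi=0.05, bins=bins)
        range_bin = quantize(math.log(high / low), lo=0.0, hi=0.05, bins=bins)
        tokens.append(ret_bin * bins + range_bin)  # vocabulary size is bins * bins
    return tokens


# Made-up prices, for illustration only.
candles = [
    (100.0, 101.0, 99.0, 100.0),
    (100.0, 102.0, 99.5, 101.0),
    (101.0, 101.5, 97.0, 97.5),
]
print(tokenize(candles))
```

Working in log returns rather than raw prices is what lets one vocabulary cover a $2 stock and a $60,000 coin. Fine-tuning for your own signals then means training on token sequences like these from your own instruments.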
Multi-agent simulation: MiroFish
MiroFish is the weird one on this list, and the one everyone is quietly experimenting with after work.
You feed it raw seed material: news articles, policy drafts, meeting transcripts, even book chapters. It spins up thousands of independent agents with distinct personalities and memory, then runs a full simulation of how that system will evolve. You can inject variables at any point and observe outcome trajectories.
Right now people are using this for everything from testing PR crisis response to simulating the ending of unfinished novels. There are no published benchmarks yet, but the demo results are unsettlingly good. This is the only repo on this list that feels like something genuinely new.
Utility repos
Two more repos that do not fit clean categories but every developer should have bookmarked.
First, free-llm-api-resources. This is a maintained list of legitimate, non-abused free LLM API endpoints. It currently lists 27 models including Llama 3.3 70B, DeepSeek V4 and Gemma 4, all available without a credit card. Respect the rate limits. These will go away if people abuse them.
Second, OBLITERATUS. This is a working implementation of abliteration, a technique for removing refusal behavior from LLMs. No fine-tuning required. It works on every modern open model, preserves 98% of base capability, and runs in 2 minutes on consumer hardware. It also aggregates anonymous run data to improve the technique. There are obvious ethical tradeoffs here. This is currently the subject of extremely active debate inside every major LLM lab.
Closing observation
All of these tools have one thing in common. None of them are trying to replace you. They all automate one specific, boring, repetitive part of your job that nobody liked doing anyway.
That is the pattern that is actually working right now. All the grand agent demos died. The tools that stuck are the ones that solve one specific problem well, and get out of your way.
Nobody wants an AI that thinks for them. Everyone wants an AI that handles the garbage work so they can do the actual thinking.