A fully curated, expanded reference list. Two important corrections to keep in mind.

| Resource | What It Is | Why It's Great |
|---|---|---|
| OSSU Computer Science | Full, free CS degree roadmap | Structured degree-equivalent path using free university courses |
| roadmap.sh | Curated skill maps for ML, Python, backend, frontend | Interactive, community-maintained; great for orientation |
| Developer Roadmap | Practical skill maps and study plans | Most-starred roadmap repo on GitHub |
| AI-ML Roadmap from Scratch | 0-to-100 roadmap covering ML, DL, GenAI, NLP, RL | One of the most comprehensive 2025–26 community roadmaps |
| Microsoft generative-ai-for-beginners | 21-lesson course by Microsoft | Covers prompt engineering, RAG, agents, deployment — free and hands-on |
| online-ml-university | Free ML/DS/CS courses from top universities | MIT, Stanford, CMU, Google; all free, organized by topic |

| Resource | What It Is | Notes |
|---|---|---|
| arXiv.org | Preprint server — fastest way to follow new research | Filter by cs.LG, cs.AI, cs.CL for ML/LLM papers |
| Papers with Code | Papers + reproducible code + SOTA leaderboards | Best place to find implementations of research you've read |
| Distill.pub | Interactive ML explainers — archive only | Last updated 2021 — still excellent for foundational ML concepts |
| Semantic Scholar | AI-powered academic search | Better than Google Scholar for finding related papers and citation graphs |
| Connected Papers | Visual paper graph explorer | Maps how papers relate — great for literature reviews |
| Hugging Face Blog | Latest model releases and research | Where Llama 4, Qwen3, DeepSeek V3 releases are first documented |
| Anthropic Research Blog | Claude safety and mechanistic interpretability research | Best source for alignment and interpretability papers |
| Google DeepMind Research | Gemini architecture, AlphaFold, Gemma papers | Official primary source for Google's AI research |
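The category filter mentioned for arXiv (cs.LG, cs.AI, cs.CL) can also be applied programmatically. The following is a minimal sketch that only builds a query URL for the public arXiv API; the function name `build_arxiv_query` and its defaults are illustrative assumptions, and fetching or parsing the returned Atom feed is left out.

```python
from urllib.parse import urlencode

# Public arXiv API endpoint; it returns an Atom feed.
ARXIV_API = "http://export.arxiv.org/api/query"


def build_arxiv_query(categories, max_results=10):
    """Build a URL listing the newest submissions in the given categories.

    `build_arxiv_query` is a hypothetical helper for this list, not part of
    any arXiv client library.
    """
    # "cat:cs.LG OR cat:cs.AI" matches papers in either category.
    search = " OR ".join(f"cat:{c}" for c in categories)
    params = {
        "search_query": search,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_arxiv_query(["cs.LG", "cs.AI", "cs.CL"], max_results=5)
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the five most recent submissions across the three categories.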

| Course | What It Covers | Link |
|---|---|---|
| Stanford CS231n | Convolutional networks — lecture notes and assignments | cs231n.stanford.edu |
| Stanford CS224N | NLP with Deep Learning — transformers, LLMs from scratch | cs224n.stanford.edu |
| MIT 6.S191 | Introduction to Deep Learning — updated annually | introtodeeplearning.com |
| fast.ai | Practical deep learning — top-down approach, real projects first | fast.ai — permanently free |
| CS50 AI (Harvard) | Search, knowledge, uncertainty, neural nets, NLP | cs50.harvard.edu/ai — free audit |
| CS50 Python (Harvard) | Python fundamentals from scratch — best beginner course | cs50.harvard.edu/python |
| online-ml-university | Aggregated MIT, Stanford, CMU, Google courses, all free | github.com/azminewasi/online-ml-university |

| Book | What It Covers |
|---|---|
| Deep Learning (Goodfellow et al.) | Canonical ML textbook — free at deeplearningbook.org |
| Dive into Deep Learning (d2l.ai) | Interactive textbook with runnable code — updated to include transformers and LLMs |
| Probabilistic ML (Kevin Murphy) | Advanced probabilistic framework for ML — free PDF at probml.github.io |
| The Little Book of Deep Learning | 200-page PDF by François Fleuret — excellent concise reference |
| Free Programming Books (EbookFoundation) | 500+ free textbooks and guides in every language and topic |
| ML Cheatsheet | Compact math and algorithm reference at ml-cheatsheet.readthedocs.io |

| Platform | What You Get |
|---|---|
| Hugging Face Datasets | 100,000+ datasets with one-line Python loading |
| Kaggle Datasets | 50,000+ datasets, competitions, and community notebooks |
| AWS Open Data Registry | Massive public datasets (satellite imagery, genomics, climate) free on S3 |
| Google Dataset Search | Meta-search across 25+ million datasets from any domain |
| Papers with Code Datasets | Datasets tied directly to benchmark tasks and SOTA results |
| Common Crawl | Petabyte-scale web crawl data — what most LLMs are trained on |
| OpenML | 20,000+ datasets with experiment tracking and benchmarking |

| Framework | Best For | 2026 Status |
|---|---|---|
| Hugging Face Transformers | Loading, fine-tuning, deploying any open model | De facto standard |
| LangChain | Rapid prototyping, broad ecosystem, agent workflows | Among the most-starred LLM frameworks on GitHub; official repo is langchain-ai/langchain |
| LlamaIndex | RAG-heavy apps, document indexing, knowledge bases | Reported 20–30% faster retrieval, though the comparison baseline is not stated |
| Haystack (deepset) | Production RAG, pipeline auditability, observability | Best for regulated/enterprise environments |
| DSPy | Optimizing prompts programmatically | New paradigm — LLM pipelines as trainable programs |
| CrewAI | Multi-agent orchestration | Rapidly growing for agent teams |
| AutoGen (Microsoft) | Multi-agent conversations and agentic workflows | Best for complex multi-step agent systems |
| FastMCP | Build MCP servers in Python | `@mcp.tool()` decorator; minimal code |

| Tool | Best For | Notes |
|---|---|---|
| Chroma | Local RAG for student projects | Easiest local vector DB; zero config; Python-native |
| Qdrant | Production-grade vector search | Free tier; written in Rust for performance |
| Pinecone | Managed vector DB | Free starter tier; most popular in production |
| Weaviate | Multimodal vector DB | Open-source; self-hostable |
| FAISS (Facebook) | CPU-efficient similarity search | The classic; no cloud needed |
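All of the tools in this table answer the same basic question: which stored vectors are closest to a query vector? As a rough illustration, here is a brute-force cosine-similarity search in plain Python. It is a toy sketch, not the API of any tool listed; the names `cosine_similarity` and `top_k` and the three-dimensional vectors are made up for the example, and real vector databases use optimized or approximate indexes instead of a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Return the k (key, score) pairs most similar to the query.

    Scans every stored vector, so cost grows linearly with collection size.
    """
    scored = [(cosine_similarity(query, v), key) for key, v in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(key, score) for score, key in scored[:k]]


# Hypothetical embeddings; a real embedding has hundreds of dimensions.
docs = {
    "cats": [1.0, 0.0, 0.0],
    "kittens": [0.9, 0.1, 0.0],
    "stocks": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.0, 0.0], docs, k=2)
print(results)
```

For a student project with a few thousand vectors, a scan like this is often fast enough; the tools above matter once the collection grows or needs persistence and filtering.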

| Paper | arXiv ID | Why It Matters |
|---|---|---|
| Attention Is All You Need (2017) | 1706.03762 | The original transformer paper; read this first |
| BERT (2018) | 1810.04805 | Bidirectional transformers for NLP |
| GPT-3 (2020) | 2005.14165 | Few-shot learners; showed that scaling up language models greatly improves few-shot performance |
| InstructGPT (2022) | 2203.02155 | RLHF; describes the training approach that ChatGPT built on |
| Chain-of-Thought Prompting (2022) | 2201.11903 | Shows that step-by-step reasoning examples in the prompt improve reasoning, an ability that emerges with scale |
| DeepSeek-R1 (2025) | 2501.12948 | How reasoning/thinking models are trained |
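The arXiv IDs above follow the `YYMM.NNNNN` scheme, so each one encodes its submission month and maps directly to an abstract page. The sketch below shows one way to turn an ID into a URL and a date; `parse_arxiv_id` and `abs_url` are hypothetical helper names, and the code assumes only new-style IDs (those issued from April 2007 onward).

```python
import re

# New-style arXiv IDs: two-digit year, two-digit month, 4 or 5 digit sequence.
_NEW_STYLE_ID = re.compile(r"^(\d{2})(\d{2})\.(\d{4,5})$")


def parse_arxiv_id(arxiv_id):
    """Return (year, month) encoded in a new-style arXiv ID."""
    match = _NEW_STYLE_ID.match(arxiv_id)
    if match is None:
        raise ValueError(f"not a new-style arXiv ID: {arxiv_id!r}")
    yy, mm, _sequence = match.groups()
    month = int(mm)
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in arXiv ID: {arxiv_id!r}")
    return 2000 + int(yy), month


def abs_url(arxiv_id):
    """Build the abstract-page URL after validating the ID format."""
    parse_arxiv_id(arxiv_id)
    return f"https://arxiv.org/abs/{arxiv_id}"


# IDs taken from the table above.
for paper_id in ["1706.03762", "1810.04805", "2501.12948"]:
    year, month = parse_arxiv_id(paper_id)
    print(f"{paper_id}: submitted {year}-{month:02d}, {abs_url(paper_id)}")
```

The decoded years agree with the publication years listed in the table, which is a quick sanity check when copying IDs by hand.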