Useful Sites & Repos — The Engineering Student's AI Toolkit

Guides & Roadmaps

Resource	What It Is	Why It's Great
OSSU Computer Science	Full, free CS degree roadmap	Structured degree-equivalent path using free university courses
roadmap.sh	Curated skill maps for ML, Python, backend, frontend	Interactive, community-maintained; great for orientation
Developer Roadmap	Practical skill maps and study plans	Most-starred roadmap repo on GitHub
AI-ML Roadmap from Scratch	0-to-100 roadmap covering ML, DL, GenAI, NLP, RL	One of the most comprehensive 2025–26 community roadmaps
Microsoft generative-ai-for-beginners	21-lesson course by Microsoft	Covers prompt engineering, RAG, agents, deployment — free and hands-on
online-ml-university	Free ML/DS/CS courses from top universities	MIT, Stanford, CMU, Google; all free, organized by topic

Research Resources

Resource	What It Is	Notes
arXiv.org	Preprint server — fastest way to follow new research	Filter by cs.LG, cs.AI, cs.CL for ML/LLM papers
Papers with Code	Papers + reproducible code + SOTA leaderboards	Best place to find implementations of research you've read
Distill.pub	Interactive ML explainers — archive only	Last updated 2021 — still excellent for foundational ML concepts
Semantic Scholar	AI-powered academic search	Better than Google Scholar for finding related papers and citation graphs
Connected Papers	Visual paper graph explorer	Maps how papers relate — great for literature reviews
Hugging Face Blog	Latest model releases and research	Releases such as Llama 4, Qwen3, and DeepSeek V3 are often documented here early
Anthropic Research Blog	Claude safety and mechanistic interpretability research	Best source for alignment and interpretability papers
Google DeepMind Research	Gemini architecture, AlphaFold, Gemma papers	Official primary source for Google's AI research
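
The arXiv category filters listed in the table (cs.LG, cs.AI, cs.CL) can also be used programmatically. The following is a minimal sketch, assuming arXiv's public export API and its `search_query`, `sortBy`, `sortOrder`, and `max_results` parameters. The helper name `arxiv_query_url` is hypothetical, and the code only builds the query URL; it does not fetch anything.

```python
from urllib.parse import urlencode

# Base endpoint of the arXiv export API (assumed reachable from your network).
ARXIV_API = "http://export.arxiv.org/api/query"


def arxiv_query_url(categories, max_results=10):
    """Build a query URL for the newest papers in the given arXiv categories."""
    # "cat:cs.LG OR cat:cs.CL" matches papers listed in any of the categories.
    search = " OR ".join(f"cat:{c}" for c in categories)
    params = {
        "search_query": search,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    # urlencode percent-encodes the colons and turns spaces into "+".
    return f"{ARXIV_API}?{urlencode(params)}"


if __name__ == "__main__":
    print(arxiv_query_url(["cs.LG", "cs.AI", "cs.CL"]))
```

Opening the printed URL in a browser or HTTP client should return an Atom feed of recent submissions in those categories.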

University Courses (Free)

Course	What It Covers	Link
Stanford CS231n	Convolutional networks — lecture notes and assignments	cs231n.stanford.edu
Stanford CS224N	NLP with Deep Learning — transformers, LLMs from scratch	cs224n.stanford.edu
MIT 6.S191	Introduction to Deep Learning — updated annually	introtodeeplearning.com
fast.ai	Practical deep learning — top-down approach, real projects first	fast.ai — permanently free
CS50 AI (Harvard)	Search, knowledge, uncertainty, neural nets, NLP	cs50.harvard.edu/ai — free audit
CS50 Python (Harvard)	Python fundamentals from scratch — best beginner course	cs50.harvard.edu/python
online-ml-university	Aggregated MIT, Stanford, CMU, Google courses — all free	github.com/azminewasi/online-ml-university

Tutorials & Video Learning

3Blue1Brown (YouTube)

Best visual explanations of linear algebra, calculus, neural networks, transformers

Andrej Karpathy (YouTube)

Build GPT from scratch, makemore series — the clearest LLM coding tutorials available

deeplearning.ai

Andrew Ng's structured specializations + The Batch newsletter for weekly AI news

Yannic Kilcher (YouTube)

Deep paper walkthroughs — best for understanding landmark ML papers

Sentdex (YouTube)

Python and ML tutorials — practical, no fluff

Google Colab

Share and run notebooks — zero setup; great for following tutorials

Books & Reading (Free)

Book	What It Covers
Deep Learning (Goodfellow et al.)	Canonical ML textbook — free at deeplearningbook.org
Dive into Deep Learning (d2l.ai)	Interactive textbook with runnable code — updated to include transformers and LLMs
Probabilistic ML (Kevin Murphy)	Advanced probabilistic framework for ML — free PDF at probml.github.io
The Little Book of Deep Learning	200-page PDF by François Fleuret — excellent concise reference
Free Programming Books (EbookFoundation)	500+ free textbooks and guides in every language and topic
ML Cheatsheet	Compact math and algorithm reference — ml-cheatsheet.readthedocs.io

Datasets

Platform	What You Get
Hugging Face Datasets	100,000+ datasets with one-line Python loading
Kaggle Datasets	50,000+ datasets, competitions, and community notebooks
AWS Open Data Registry	Massive public datasets (satellite imagery, genomics, climate) free on S3
Google Dataset Search	Meta-search across 25+ million datasets from any domain
Papers with Code Datasets	Datasets tied directly to benchmark tasks and SOTA results
Common Crawl	Petabyte-scale web crawl data; a major source of pretraining data for many LLMs
OpenML	20,000+ datasets with experiment tracking and benchmarking

LLM Frameworks

Framework	Best For	2026 Status
Hugging Face Transformers	Loading, fine-tuning, deploying any open model	De facto standard
LangChain	Rapid prototyping, broad ecosystem, agent workflows	Among the most-starred LLM app frameworks; the official repo is langchain-ai/langchain
LlamaIndex	RAG-heavy apps, document indexing, knowledge bases	Retrieval-focused; cited as 20–30% faster for retrieval (baseline not specified)
Haystack (deepset)	Production RAG, pipeline auditability, observability	Best for regulated/enterprise environments
DSPy	Optimizing prompts programmatically	New paradigm — LLM pipelines as trainable programs
CrewAI	Multi-agent orchestration	Rapidly growing for agent teams
AutoGen (Microsoft)	Multi-agent conversations and agentic workflows	Best for complex multi-step agent systems
FastMCP	Build MCP servers in Python	@mcp.tool() decorator — minimal code
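
The `@mcp.tool()` pattern mentioned in the FastMCP row is ordinary Python decorator registration. The sketch below illustrates that idea with the standard library only. `ToolRegistry`, `tool`, and `call` are hypothetical names invented for illustration; this is not FastMCP's API or implementation, just the general mechanism such a decorator relies on.

```python
from typing import Callable, Dict


class ToolRegistry:
    """Toy stand-in showing decorator-based registration (hypothetical, not FastMCP)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tools: Dict[str, Callable] = {}

    def tool(self) -> Callable[[Callable], Callable]:
        # Calling tool() returns the real decorator, which is why the
        # usage below is written with parentheses: @registry.tool()
        def decorator(func: Callable) -> Callable:
            self.tools[func.__name__] = func  # register under the function's name
            return func  # the function itself is left unchanged
        return decorator

    def call(self, tool_name: str, **kwargs):
        # Look the tool up by name and invoke it with keyword arguments.
        return self.tools[tool_name](**kwargs)


registry = ToolRegistry("demo")


@registry.tool()
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b


if __name__ == "__main__":
    print(sorted(registry.tools))
    print(registry.call("add", a=2, b=3))
```

A real MCP server library would additionally derive a schema from the type hints and docstring and expose the tool over the protocol; the sketch covers only the registration step.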

Inference & Deployment

Ollama

Pull and run local GGUF models with one command

llama.cpp

Low-level C++ engine under Ollama and LM Studio

vLLM

High-throughput GPU serving — best for production

TGI (Hugging Face)

Text Generation Inference — production-grade model server

OpenLLM (BentoML)

Deploy any open model with REST/gRPC API in one command

FastAPI

Serving models or wrappers with async Python APIs
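
Besides its command line, Ollama serves a local HTTP API, which is how most scripts talk to it. The following is a minimal sketch, assuming a default install listening on `localhost:11434` with the `/api/generate` endpoint. The model name `llama3.2` is a placeholder assumption (use whichever model you have pulled), and the helper names are hypothetical.

```python
import json
from urllib import request

# Default local endpoint of an Ollama install (assumption: unchanged port).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> request.Request:
    """Build a non-streaming generation request without sending it."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the generated text (needs Ollama running)."""
    with request.urlopen(build_generate_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # "llama3.2" is a placeholder; replace it with a model you have pulled.
    print(generate("llama3.2", "Explain GGUF in one sentence."))
```

Separating request construction from sending keeps the payload easy to inspect and test without a running server.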

Dev Utilities

Gradio

Build a UI for any ML model in 3 lines of Python

Streamlit

Build data apps and chatbot UIs quickly

OpenAI Cookbook

Practical RAG, fine-tuning, and prompt examples

Weights & Biases (wandb)

Experiment tracking and model versioning; free for academic use

MLflow

Open-source experiment tracking and model registry

Unsloth

Fine-tune LLMs 2×–5× faster on consumer GPUs with LoRA

Must-Bookmark GitHub Repos

Learning Roadmaps

microsoft/generative-ai-for-beginners — 21-lesson structured GenAI course; a strong starting point for beginners
microsoft/AI-For-Beginners — 12-week curriculum: symbolic AI, neural nets, CV, NLP, ethics
dair-ai/Prompt-Engineering-Guide — most comprehensive prompt engineering reference
mlabonne/llm-course — LLM course from fundamentals through fine-tuning
aishwaryanr/awesome-generative-ai-guide — paper summaries and code

Transformer Fundamentals

lucidrains/x-transformers — clean, research-grade transformer implementations
karpathy/nanoGPT — train a GPT from scratch; the training loop and the model definition are each about 300 lines
karpathy/llm.c — GPT-2 in pure C — no Python, no PyTorch
huggingface/transformers — de-facto Python library for LLMs
ggml-org/llama.cpp (formerly ggerganov/llama.cpp) — CPU-friendly GGUF inference

Specialized Learning Resources

Prompt Engineering & NLP

promptingguide.ai — CoT, few-shot, RAG, and more
learnprompting.org — interactive prompt engineering course with exercises
huggingface.co/learn — HF's official NLP, RL, CV courses — all free
d2l.ai — Dive into Deep Learning: interactive textbook

Cybersecurity × AI

OWASP Top 10 for LLM Applications — OWASP's list of the top security risks specific to LLM applications
LLM Security (llmsecurity.net) — curated attack vectors, jailbreaks, defenses
PortSwigger Web Security Academy — free interactive labs: SQL injection, XSS, SSRF
HackTricks — comprehensive pentesting playbook

Game Dev × AI

GDQuest (YouTube) — best Godot tutorials; open-source game demos
GameAIPro.com — free online book series on game AI techniques
Sebastian Lague (YouTube) — procedural terrain, chess AI, pathfinding
The Coding Train (YouTube) — creative coding, genetic algorithms in visual demos

Math Foundations

3Blue1Brown: Essence of Linear Algebra — 15-episode series, makes matrix math visual
3Blue1Brown: Neural Networks — starts with "But what is a neural network?"; a strong conceptual introduction
Khan Academy — free calculus, probability, statistics
betterexplained.com — intuition-first math; great for engineers

Vector Databases & RAG Infrastructure

Tool	Best For	Notes
Chroma	Local RAG for student projects	Easiest local vector DB; zero config; Python-native
Qdrant	Production-grade vector search	Free tier; written in Rust for performance
Pinecone	Managed vector DB	Free starter tier; widely used in production
Weaviate	Multimodal vector DB	Open-source; self-hostable
FAISS (Facebook)	CPU-efficient similarity search	The classic; no cloud needed
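
At their core, all of the tools in this table answer the same question: which stored vectors are closest to a query vector? The following is a minimal brute-force sketch of that operation using cosine similarity, with the standard library only. The function names and the tiny 2-dimensional vectors are illustrative assumptions; real vector databases add approximate indexes, persistence, and metadata filtering on top.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as having no similarity to anything.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, vectors, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


if __name__ == "__main__":
    # Toy "embeddings"; in a RAG app these would come from an embedding model.
    store = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
    print(top_k([1.0, 0.0], store, k=2))
```

Brute force is fine for a few thousand vectors in a student project; the libraries above matter once the collection grows or needs to be served.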

Important Landmark Papers to Read

Paper	arXiv ID	Why It Matters
Attention Is All You Need (2017)	`1706.03762`	The original transformer paper — read this first
BERT (2018)	`1810.04805`	Bidirectional transformers for NLP
GPT-3 (2020)	`2005.14165`	Few-shot learners — the paper that changed everything
InstructGPT (2022)	`2203.02155`	RLHF; describes the instruction-tuning method that ChatGPT built on
Chain-of-Thought Prompting (2022)	`2201.11903`	Shows that prompting with worked reasoning steps improves multi-step reasoning, an ability that emerges with model scale
DeepSeek-R1 (2025)	`2501.12948`	How reasoning/thinking models are trained
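
The arXiv IDs in the table map directly to URLs, and the first four digits encode the submission year and month (YYMM). The following small sketch turns the IDs above into abstract and PDF links and checks them against the years listed. The helper names are hypothetical, and the year parsing assumes the post-2007 `YYMM.NNNNN` ID scheme that all six papers use.

```python
# Paper titles, arXiv IDs, and years copied from the table above.
PAPERS = {
    "Attention Is All You Need": ("1706.03762", 2017),
    "BERT": ("1810.04805", 2018),
    "GPT-3": ("2005.14165", 2020),
    "InstructGPT": ("2203.02155", 2022),
    "Chain-of-Thought Prompting": ("2201.11903", 2022),
    "DeepSeek-R1": ("2501.12948", 2025),
}


def arxiv_urls(arxiv_id):
    """Return the abstract-page and PDF URLs for an arXiv ID."""
    return {
        "abs": f"https://arxiv.org/abs/{arxiv_id}",
        "pdf": f"https://arxiv.org/pdf/{arxiv_id}",
    }


def submission_year_month(arxiv_id):
    """Decode (year, month) from the YYMM prefix of a new-style arXiv ID."""
    yymm = arxiv_id.split(".")[0]
    return 2000 + int(yymm[:2]), int(yymm[2:])


if __name__ == "__main__":
    for title, (arxiv_id, year) in PAPERS.items():
        decoded_year, month = submission_year_month(arxiv_id)
        print(f"{title} ({decoded_year}-{month:02d}): {arxiv_urls(arxiv_id)['abs']}")
```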

Newsletters & Channels

The Batch (deeplearning.ai)

Andrew Ng's weekly AI newsletter — best signal-to-noise in the space

Import AI (Jack Clark)

Weekly research commentary on AI progress

AI Explained (YouTube)

Fast, accurate breakdowns of major AI papers and products

Lex Fridman Podcast

Long-form interviews with AI researchers (Hinton, LeCun, Bengio)

Practical AI Podcast

Applied AI discussions for developers

The Gradient

Long-form research commentary — between pop-science and academic
