Machine learning · Module 12
Reinforcement learning
Learn to act by trial and error, guided by reward.
Agent → takes action → environment responds → agent receives reward → agent updates policy.
The loop repeats, often millions of times. The policy is the learned strategy: the rule the agent uses to choose an action in each situation.
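The loop above can be sketched with a two-armed bandit, one of the simplest reinforcement learning settings. This is a minimal illustration under stated assumptions, not the module's own implementation: the names `makeRng` and `runBandit`, the payout probabilities, the epsilon-greedy exploration rule, and the seed are all hypothetical choices made for the example.

```javascript
// Small seeded random generator (mulberry32) so runs are repeatable.
function makeRng(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical environment: each action pays reward 1 with a fixed probability.
function runBandit({ payoutProbs, steps, epsilon, seed }) {
  const rng = makeRng(seed);
  const values = payoutProbs.map(() => 0); // estimated reward per action
  const counts = payoutProbs.map(() => 0); // how often each action was tried
  let totalReward = 0;

  for (let t = 0; t < steps; t++) {
    // 1. Agent takes an action: explore with probability epsilon, else exploit.
    const explore = rng() < epsilon;
    const action = explore
      ? Math.floor(rng() * payoutProbs.length)
      : values.indexOf(Math.max(...values));

    // 2-3. Environment responds and the agent receives a reward (1 or 0).
    const reward = rng() < payoutProbs[action] ? 1 : 0;
    totalReward += reward;

    // 4. Agent updates its policy: running mean of rewards for this action.
    counts[action] += 1;
    values[action] += (reward - values[action]) / counts[action];
  }
  return { values, counts, totalReward };
}

const result = runBandit({ payoutProbs: [0.2, 0.8], steps: 5000, epsilon: 0.1, seed: 42 });
console.log('estimated values:', result.values);
console.log('pulls per action:', result.counts);
```

Here the "policy" is simply "pick the action with the highest estimated value, with occasional random exploration". After enough iterations the agent should favour the second action, because trial and error reveals that it pays out more often.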
Game-playing
AlphaGo, chess engines, StarCraft AIs.
Robotics
Walking, grasping, dexterous manipulation.
Trading
Allocating capital across actions under uncertainty (Egbert’s domain).
LLM fine-tuning
RLHF (reinforcement learning from human feedback): the reward signal comes from human preference judgements.
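To make "reward from preference judgements" concrete, here is a small sketch of one common approach, a Bradley-Terry style pairwise model, in which each response gets a scalar score and the probability that one response is preferred depends on the score difference. This is an illustration only, not a description of any particular RLHF system: the function names `preferenceProb`, `preferenceLoss`, and `fitScores`, the learning rate, and the judgement data are all hypothetical.

```javascript
// Probability that response A is preferred over B, given scalar reward scores.
function preferenceProb(rewardA, rewardB) {
  return 1 / (1 + Math.exp(-(rewardA - rewardB)));
}

// Loss for one human judgement: small when the preferred response scores higher.
function preferenceLoss(rewardChosen, rewardRejected) {
  return -Math.log(preferenceProb(rewardChosen, rewardRejected));
}

// Fit one scalar score per response by gradient descent on the pairwise loss.
function fitScores(numResponses, judgements, { learningRate = 0.1, epochs = 200 } = {}) {
  const scores = new Array(numResponses).fill(0);
  for (let e = 0; e < epochs; e++) {
    for (const [chosen, rejected] of judgements) {
      // Gradient of the loss: -(1 - p) for the chosen score, +(1 - p) for the rejected one.
      const p = preferenceProb(scores[chosen], scores[rejected]);
      scores[chosen] += learningRate * (1 - p);
      scores[rejected] -= learningRate * (1 - p);
    }
  }
  return scores;
}

// Hypothetical judgements, each written as [preferred index, rejected index].
const judgements = [[0, 1], [0, 2], [1, 2], [0, 1], [2, 1]];
const scores = fitScores(3, judgements);
console.log('learned reward scores:', scores);
```

Response 0 is preferred in every comparison it appears in, so it ends up with the highest score. In a full RLHF pipeline a learned reward of this kind would then play the role of the reward in the agent loop.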
Reinforcement learning loop demo
A local toy loop that logs the four steps named in this module: action, environment response, reward, and policy update.
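// The four steps of the loop, in order.
const steps = ['action', 'environment response', 'reward', 'policy update'];
for (const step of steps) {
  // Log each step name; nothing is learned in this toy loop.
  console.log('reinforcement loop:', step);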
}
Source slide 13