Machine learning · Module 12

Reinforcement learning

Learn to act by trial and error, guided by reward.

Agent → takes action → environment responds → agent receives reward → agent updates policy.

Repeat the loop, often millions of times. The policy is the learned strategy: the agent's rule for choosing its next action.
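The loop above can be sketched with a two-armed bandit, one of the simplest reinforcement-learning settings. This is a minimal illustration, not a production method: the reward probabilities, the epsilon-greedy rule, the seeded generator, and the names `makeRng` and `runBandit` are all assumptions made for this example.

```javascript
// Hypothetical two-armed bandit: each action pays 1 with a fixed probability.
// These probabilities are illustrative assumptions, not data from the module.
const rewardProbabilities = [0.2, 0.8];

// Small seeded pseudo-random generator (mulberry32) so runs are repeatable.
function makeRng(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function runBandit(steps, epsilon, seed) {
  const rng = makeRng(seed);
  const estimates = rewardProbabilities.map(() => 0); // estimated value per action
  const counts = rewardProbabilities.map(() => 0); // times each action was tried

  for (let i = 0; i < steps; i++) {
    // Agent takes an action: explore with probability epsilon, otherwise exploit.
    const explore = rng() < epsilon;
    const action = explore
      ? Math.floor(rng() * estimates.length)
      : estimates.indexOf(Math.max(...estimates));

    // Environment responds with a reward.
    const reward = rng() < rewardProbabilities[action] ? 1 : 0;

    // Agent updates its policy: running average of the rewards seen per action.
    counts[action] += 1;
    estimates[action] += (reward - estimates[action]) / counts[action];
  }
  return { estimates, counts };
}

const result = runBandit(5000, 0.1, 42);
console.log('estimated values:', result.estimates);
console.log('times each action was chosen:', result.counts);
```

Here the "policy" is simply "pick the action with the highest estimated value, with occasional random exploration". After enough iterations the agent should favor the better-paying action.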

Game-playing

AlphaGo, chess engines, StarCraft AIs.

Robotics

Walking, grasping, dexterous manipulation.

Trading

Allocating capital across actions under uncertainty (Egbert’s domain).

LLM fine-tuning

RLHF (reinforcement learning from human feedback): the reward signal comes from human preference judgements.
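To make "reward from preference judgements" concrete, one common approach fits a scalar score per response so that preferred responses score higher, using the Bradley-Terry model. The sketch below is a toy under stated assumptions: the comparison data is made up, each response gets a single free score rather than the output of a neural reward model, and `preferenceProbability` and `fitScores` are hypothetical names.

```javascript
// Bradley-Terry model: probability that response A is preferred over B,
// given a scalar reward score for each.
function preferenceProbability(rewardA, rewardB) {
  return 1 / (1 + Math.exp(-(rewardA - rewardB)));
}

// Fit one score per response from pairwise choices by gradient ascent
// on the log-likelihood of the observed preferences.
function fitScores(numResponses, comparisons, learningRate, epochs) {
  const scores = new Array(numResponses).fill(0);
  for (let e = 0; e < epochs; e++) {
    for (const [winner, loser] of comparisons) {
      const p = preferenceProbability(scores[winner], scores[loser]);
      // Gradient of log p is (1 - p) for the winner and -(1 - p) for the loser.
      scores[winner] += learningRate * (1 - p);
      scores[loser] -= learningRate * (1 - p);
    }
  }
  return scores;
}

// Made-up judgements: each pair is [preferred response, rejected response].
const comparisons = [[0, 1], [0, 2], [1, 2], [0, 1], [2, 1]];
const scores = fitScores(3, comparisons, 0.1, 200);
console.log('learned reward scores:', scores);
```

The learned scores can then stand in for the reward in the loop described earlier; in this toy data, response 0 is never rejected, so it should end up with the highest score.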

Applied frame

Name the input, target, feedback signal, and evaluation split before choosing a model family.

Diagram cue

The diagram traces data into a learned function and back out through evaluation.

Figure: Reinforcement-learning loop from state to action, reward, and policy update. Reinforcement learning improves behavior through feedback from the agent's actions.


Reinforcement learning loop demo

A local toy loop that logs action, environment response, reward, and policy update steps named in this module.

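// Toy illustration only: logs the name of each loop step in order.
// No environment is simulated and no learning takes place.
const steps = ['action', 'environment response', 'reward', 'policy update'];
for (const step of steps) {
  console.log('reinforcement loop:', step);
}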

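The demo only logs the step names. A version where each step does real work might look like the following tabular Q-learning sketch. Everything here is an assumption chosen for illustration: the five-state corridor environment, the reward of 1 at the right end, the hyperparameter values, and the function names `step` and `train`.

```javascript
// Hypothetical corridor: states 0..4, start at 0, reward 1 on reaching state 4.
const NUM_STATES = 5;
const ACTIONS = [-1, 1]; // index 0 moves left, index 1 moves right

// Environment response: next state, reward, and whether the episode ended.
function step(state, actionIndex) {
  const moved = state + ACTIONS[actionIndex];
  const next = Math.min(NUM_STATES - 1, Math.max(0, moved));
  const done = next === NUM_STATES - 1;
  return { next, reward: done ? 1 : 0, done };
}

function train(episodes, alpha, gamma, epsilon, rng = Math.random) {
  // q[state][action] estimates the long-run reward of taking that action there.
  const q = Array.from({ length: NUM_STATES }, () => [0, 0]);

  for (let ep = 0; ep < episodes; ep++) {
    let state = 0;
    for (let t = 0; t < 100; t++) { // cap episode length
      // Action: epsilon-greedy, choosing randomly when exploring or on ties.
      let action;
      if (rng() < epsilon || q[state][0] === q[state][1]) {
        action = rng() < 0.5 ? 0 : 1;
      } else {
        action = q[state][1] > q[state][0] ? 1 : 0;
      }

      // Environment response and reward.
      const { next, reward, done } = step(state, action);

      // Policy update: move the estimate toward reward plus discounted future value.
      const target = reward + (done ? 0 : gamma * Math.max(...q[next]));
      q[state][action] += alpha * (target - q[state][action]);

      state = next;
      if (done) break;
    }
  }
  return q;
}

const q = train(500, 0.5, 0.9, 0.1);
const policy = q.slice(0, NUM_STATES - 1).map(([left, right]) => (right > left ? 'right' : 'left'));
console.log('greedy policy for states 0..3:', policy);
```

Unlike a bandit, the agent here has to account for state: the reward arrives only at the end of the corridor, and the discount factor `gamma` propagates its value back to earlier states.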

Reinforcement learning check


1. What guides a reinforcement-learning agent toward better behavior?

A reward signal from the environment
Only static input-output labels
Random actions without feedback


Source slide 13