What is reinforcement learning from human feedback (RLHF)?

Reinforcement learning from human feedback (RLHF) is a way for computers to learn by getting advice from people, just like how you might ask your mom what to do when you're not sure.

Imagine you’re playing a video game where you have to collect coins and avoid monsters. At first, you don’t know the best way to win. So every time you play, you ask your mom for tips, like “should I go left or right?” She gives you advice based on how well you did. Over time, you start to figure out the best path by using her feedback.

RLHF works the same way: a computer tries different actions and gets points (called rewards) for doing well. But instead of only getting points from the game, it also gets advice from people, who tell it which of its tries they liked better. That advice is the feedback. The more feedback it gets, the better it becomes at making smart choices on its own.
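For readers who want to see the idea as a program, here is a tiny sketch of the try, get feedback, improve loop described above. It is only a toy: a real person is replaced by a made-up rule, and the names simulated_human_feedback and train are invented for this example. Real RLHF systems are far more involved than this.

```python
import random


def simulated_human_feedback(action):
    """Stand-in for a person's advice (an assumption for this toy example).

    Pretend that going "right" leads to coins and "left" leads to a monster,
    so the helper says +1 for "right" and -1 for "left".
    """
    return 1.0 if action == "right" else -1.0


def train(rounds=200, learning_rate=0.1, explore=0.2, seed=0):
    """Learn which action the helper prefers by trying actions and listening."""
    rng = random.Random(seed)
    # The computer's current guess of how good each action is.
    scores = {"left": 0.0, "right": 0.0}
    for _ in range(rounds):
        if rng.random() < explore:
            # Sometimes try a random action, just to see what happens.
            action = rng.choice(["left", "right"])
        else:
            # Otherwise pick the action that looks best so far.
            action = max(scores, key=scores.get)
        feedback = simulated_human_feedback(action)
        # Nudge the guess for this action toward the feedback it received.
        scores[action] += learning_rate * (feedback - scores[action])
    return scores


scores = train()
best_action = max(scores, key=scores.get)
print(best_action, scores)
```

After enough rounds the score for "right" ends up higher than the score for "left", so the computer chooses "right" without having to ask again.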

How It Helps Computers Learn

Think of RLHF as having a teacher who gives you hints when you're solving a puzzle. At first, you need lots of hints to get it right. But with each hint, you start to understand the pattern and can solve the puzzle faster, just like how computers learn from people!

Examples

  1. A child learns to ride a bike by watching their parent and getting advice when they fall.
  2. A dog learns tricks by getting treats when it does what the owner wants.
  3. A robot learns to walk by trying different steps and being told by a person whether each try was good.
