Why can Q-learning sometimes overestimate values?

Q-learning is like learning to choose the best candy from a jar: sometimes you pick one that looks really good, but it turns out to be less sweet than you thought.

Q-learning helps a robot or computer learn which choices are best by trying different options and remembering what worked before. But just as you might think the biggest candy is the sweetest only to find out it isn't, Q-learning can sometimes overestimate values, meaning it thinks an option is better than it really is.
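To make "remembering what worked before" concrete, here is a tiny sketch of a tabular Q-learning update for a made-up "two doors" game (the door rewards, names, and numbers below are invented for illustration, not part of the story above):

```python
import random

ALPHA = 0.1   # learning rate: how much each new experience counts
TRUE_MEANS = [1.0, 0.5]  # hidden average treat value behind each door

def pull_door(door):
    """Open a door and get a noisy treat (reward)."""
    return random.gauss(TRUE_MEANS[door], 1.0)

def q_learning(steps=5000, epsilon=0.1, seed=0):
    random.seed(seed)
    q = [0.0, 0.0]  # the robot's current guess for each door's value
    for _ in range(steps):
        # Explore sometimes; otherwise pick the door that looks best.
        if random.random() < epsilon:
            door = random.randrange(2)
        else:
            door = max(range(2), key=lambda d: q[d])
        reward = pull_door(door)
        # Q-learning update: nudge the guess toward what actually happened.
        # (This game ends after one choice, so there is no next-state term.)
        q[door] += ALPHA * (reward - q[door])
    return q

q = q_learning()
```

After many tries the guesses settle near each door's true average, which is the "remembering" the article describes.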

Why does this happen?

Imagine you're playing a game where you choose between two doors. Behind one door is your favorite snack, say, chocolate, and behind the other might be something less exciting, like fruit loops. At first, you don't know which door has what. But as you play more, you learn which door usually gives you the best treat.

Sometimes, though, if you get a really good treat from one door once or twice, you might think that door is always going to give you the best treat, even if it doesn’t. That’s like overestimating its value.

This can confuse your robot friend and make it pick the wrong choice again later, just like how you might pick fruit loops thinking they're chocolate!

Examples

  1. Imagine a robot trying to find the best path in a maze. It might think one door leads directly to the exit, but it's actually blocked by a wall.
  2. A dog learns that a certain action gets it a treat, but sometimes it gets overexcited and thinks every similar action will get it a treat too.
  3. A student feels sure of an answer on a test but gets it wrong because they overestimated their own knowledge.
