What are bandit algorithms?

Imagine you're at a candy store where each machine gives out different kinds of candies, but you don't know which one is the best until you try it.

Bandit algorithms are like smart kids who figure out which candy machine to use by trying them one by one and learning what works best.

How It Works

At first, a kid might just pick a random machine. But as they get more candies, they start to notice: this machine gives more of their favorite kind! So they begin choosing it more often. They still try the other machines now and then, in case one of them turns out to be even better.

It’s like playing a game where you try different choices and keep track of what gives you the best reward, in this case, delicious candy.
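
The try-and-keep-track game above can be sketched in a few lines of Python. This is only one simple way to do it, often called "epsilon-greedy": most of the time pick the machine with the best average so far, and once in a while pick a random one. The names here (pick_machine, run_bandit, true_chances, explore_chance) and the numbers are made up for illustration; they don't come from any particular library.

```python
import random

def pick_machine(average_reward, explore_chance, rng):
    """Usually pick the best machine so far; sometimes pick a random one."""
    if rng.random() < explore_chance:
        # Explore: try any machine, even one that looked bad before.
        return rng.randrange(len(average_reward))
    # Exploit: go back to the machine with the highest average so far.
    return max(range(len(average_reward)), key=lambda i: average_reward[i])

def run_bandit(true_chances, rounds=2000, explore_chance=0.1, seed=0):
    """Play `rounds` turns. true_chances[i] is the (hidden, assumed) chance
    that machine i gives out a favorite candy."""
    rng = random.Random(seed)
    pulls = [0] * len(true_chances)             # how often each machine was tried
    average_reward = [0.0] * len(true_chances)  # what the kid has learned so far
    for _ in range(rounds):
        m = pick_machine(average_reward, explore_chance, rng)
        reward = 1.0 if rng.random() < true_chances[m] else 0.0
        pulls[m] += 1
        # Running average: nudge the old estimate toward the new result.
        average_reward[m] += (reward - average_reward[m]) / pulls[m]
    return pulls, average_reward

pulls, average_reward = run_bandit([0.2, 0.5, 0.8])
print(pulls)           # how many times each machine was chosen
print(average_reward)  # the learned guess for each machine
```

With settings like these, the machine with the highest hidden chance should usually end up with the most pulls, because the kid keeps coming back to whatever has worked best so far.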

Why It Matters

This idea is used all over the place, from video games to online ads. A website might use a bandit algorithm to decide which ad to show you, based on how often people have clicked on each ad so far.

So next time you're picking a candy machine or clicking an ad, remember: there’s a clever kid (or computer) learning what works best, just like you!

Examples

  1. A kid trying different ice cream flavors to find their favorite without eating all the options at once.
  2. A restaurant testing new dishes by serving them to random customers and cooking the popular ones more often.
  3. A student trying different ways to study and sticking with the one that has helped most on past quizzes.
