You choose between different options by trying them out and learning what works best, just like a kid picking candies from a jar without knowing which one tastes the best.
Imagine you're at a candy store with 5 jars of different candies. You don't know which jar has the best candy, so you start picking randomly, sometimes getting a sweet treat, sometimes getting something yucky. But as you go on, you begin to remember which jars gave you good candies before.
That’s how the epsilon greedy algorithm works in multi-armed bandit problems: it's like being that kid who tries new candies but also remembers their favorites.
How It Chooses
Most of the time, the kid picks from the jar they think has the best candy, just like the algorithm choosing the option with the highest reward so far. But sometimes, the kid decides to try a new jar, this is called exploration.
That "sometimes" is where epsilon comes in: it's like saying “1 out of 10 times, I'll try something new.” If epsilon is 0.1 (or 10%), then 10% of the time, the kid will pick a random jar instead of their favorite one.
This balance between exploration and exploitation helps find the best candy, or in real life, the best choice, without getting stuck on something that might not be the best forever.
Examples
- A student switching between study methods to get the best results
- A shopper trying out different stores for the best deals
Ask a question
See also
- How to Master the Instagram Algorithm?
- How TikTok's Algorithm Figures You Out | WSJ?
- What is an Algorithm?
- Who is Algorithm Design?
- What's an algorithm? - David J. Malan?