What is Visual question answering (VQA)?

Visual question answering, or VQA, is like having a friend who can look at a picture and answer your questions about what's going on in it.

Imagine you're playing with your favorite toy, say a colorful puzzle. You show the puzzle to your friend and ask, "What shape is this piece?" or "Is this red or blue?" Your friend looks at the puzzle and answers you. VQA is the same idea with a computer in place of your friend: the computer looks at a picture and answers questions about it.

How It Works

Think of it like this: The computer sees a picture, just like you see your toy. Then someone asks it a question, like "What color is the ball?" The computer uses what it sees and what it knows to figure out the answer, just like you would!

Sometimes the questions are simple, like "How many cars do you see?" Other times they are trickier, like "Is this person happy or sad?" However easy or hard the question is, the computer has to understand both the picture and the question to give a good answer.
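
For readers who like to tinker, here is a tiny Python sketch of the idea: take a picture and a question, then put the two together to get an answer. It is only a toy under big assumptions. The picture is already written out as a list of objects, and the names `picture` and `answer` are made up for this illustration. A real VQA system has to work out what is in the picture from the image itself, usually with a trained model.

```python
# Toy illustration only: the "picture" is already described as a list of
# objects, which a real VQA system would have to work out from the pixels.
picture = [
    {"name": "ball", "color": "red"},
    {"name": "car", "color": "blue"},
    {"name": "car", "color": "green"},
]


def answer(picture, question):
    """Answer two kinds of questions about the toy picture."""
    # Lowercase the question and drop the question mark so words are easy to match.
    words = question.lower().replace("?", "").split()

    if words[:3] == ["what", "color", "is"]:
        # "What color is the ball?" -> the last word names the object.
        target = words[-1]
        for obj in picture:
            if obj["name"] == target:
                return obj["color"]
        return "I don't see that"

    if words[:2] == ["how", "many"]:
        # "How many cars do you see?" -> the third word is the plural object.
        # Stripping a trailing "s" is a crude way to get the singular form.
        target = words[2].rstrip("s")
        count = sum(1 for obj in picture if obj["name"] == target)
        return str(count)

    # Anything else (like "Is this person happy or sad?") is beyond this toy.
    return "I don't know"


print(answer(picture, "What color is the ball?"))    # red
print(answer(picture, "How many cars do you see?"))  # 2
```

The toy can only handle the two question shapes it was written for, which is why real systems learn from many example pictures and questions instead of following hand-written rules.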

Examples

  1. A child sees a picture of a cat and is asked, 'What color is the cat?'
  2. A robot looks at a painting and answers, 'This is a sunset.'
  3. A person with glasses looks at an image and says, 'I see a car in the street.'
