Image + Audio Query

Mix-and-Match

Ask a question about an image using your voice.

Why On-Device?

Images and audio recordings are large files. Processing both on the device avoids uploading them to a server, so the answer is not delayed by upload time and the interaction stays responsive.

Interactive Demo

Full Multimodal AI Required

This demo requires an on-device language model that accepts both image and audio input in the same prompt. A model that supports only one of the two modalities is not sufficient.
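One way to check this requirement before enabling the demo is sketched below. The page does not name the model API, so the sketch assumes an interface shaped like Chrome's experimental Prompt API: an object with an `availability()` method that takes an `expectedInputs` option. The function name `checkMultimodalSupport` and the result shape are hypothetical, chosen for illustration only.

```javascript
// Sketch only: `languageModel` stands in for whatever on-device model API the
// browser exposes. Its `availability()` method and the `expectedInputs` option
// are assumptions modelled on Chrome's experimental Prompt API.
const REQUIRED_INPUTS = [{ type: "image" }, { type: "audio" }];

async function checkMultimodalSupport(languageModel) {
  // No API object at all: there is no on-device model to query.
  if (!languageModel || typeof languageModel.availability !== "function") {
    return { supported: false, reason: "On-device language model API not found." };
  }
  try {
    // Ask for both modalities in one call, since the demo needs them together.
    const status = await languageModel.availability({ expectedInputs: REQUIRED_INPUTS });
    if (status === "unavailable") {
      return { supported: false, reason: "Model does not accept image and audio input together." };
    }
    // Any other status is passed through so the caller can react to it.
    return { supported: true, status };
  } catch (err) {
    // Treat a rejected or throwing check as "not supported" rather than crashing.
    return { supported: false, reason: String(err && err.message ? err.message : err) };
  }
}
```

In a browser that exposes such an API as a global, the call might look like `checkMultimodalSupport(globalThis.LanguageModel)`, with the "Full Multimodal AI Required" notice shown whenever `supported` is `false`.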

Multimodal Assistant

Provide an image and an audio question to begin.

Implementation Code

demo.js
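The original listing is not reproduced here. As a rough sketch of what the core step could look like, the code below combines an image and a recorded audio question into a single prompt. It assumes a session object with a `prompt()` method and a message format of `role` plus typed `content` parts, both modelled on Chrome's experimental Prompt API. The helper names `buildImageAudioPrompt` and `askAboutImage`, and the instruction text, are hypothetical.

```javascript
// Hypothetical sketch, not the demo's actual source. The message shape
// (role + content parts with `type` and `value`) and `session.prompt()` are
// assumptions modelled on Chrome's experimental Prompt API.
function buildImageAudioPrompt(imageBlob, audioBlob) {
  // Both inputs are mandatory: the question lives in the audio,
  // the subject of the question lives in the image.
  if (!imageBlob || !audioBlob) {
    throw new Error("Provide an image and an audio question to begin.");
  }
  return [
    {
      role: "user",
      content: [
        { type: "text", value: "Answer the spoken question about this image." },
        { type: "image", value: imageBlob },
        { type: "audio", value: audioBlob },
      ],
    },
  ];
}

async function askAboutImage(session, imageBlob, audioBlob) {
  // The session is injected so the function works with any object that
  // exposes a compatible `prompt()` method.
  return session.prompt(buildImageAudioPrompt(imageBlob, audioBlob));
}
```

In a browser, `imageBlob` would typically come from a file input and `audioBlob` from a microphone recording. How the session is created depends on the model API in use; under the assumption above it would be created with both image and audio declared as expected inputs.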