Google Takes the Next Step in Multimodal Search

Among the evolutions in search – pre-dating the recent gen-AI convergence – voice and visual search have followed similar paths as alternative inputs. We all know what voice search is, but visual search – for those unfamiliar – means using your camera to identify and contextualize things, a la Google Lens.

The idea in both cases is to accommodate several modalities that can be situationally and contextually relevant. Think: voice search while driving, and visual search to identify a style item you encounter in the real world, using your camera instead of text. The latter is fueled by the camera-native Generation Z.

More recently, Google has begun to combine these two inputs for a potential peanut butter & chocolate moment. This was first seen in multimodal search, unveiled at Google I/O in 2022. In short, it lets you perform visual searches, then refine the results using text or voice (think: “the same jacket in blue”).

This concept took a step forward this week when Google made the multimodal query flow more natural. Specifically, a beta feature lets users long-press the shutter button while speaking, so Google processes an integrated – rather than sequential – mix of visuals and voice to compute the best result.

For a use-case example, point Google Lens at a landscaping shrub in your neighborhood while long-pressing and saying “What kind of plant is this, and what local nurseries carry it?” Similarly, point it at a new restaurant in your neighborhood and say “Does this place require a reservation?” And so on.
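For readers curious what the shift from sequential to integrated input means under the hood, here’s a minimal sketch. To be clear, the MultimodalQuery type and search() stub below are hypothetical illustrations – not a real Google API – meant only to show the shape of the two flows.

    # A minimal, hypothetical sketch of sequential vs. integrated
    # multimodal queries. MultimodalQuery and search() are illustrative
    # stand-ins, not a real Google API.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class MultimodalQuery:
        image: bytes                      # camera frame from the Lens viewfinder
        transcript: Optional[str] = None  # speech captured during the long-press

    def search(query: MultimodalQuery) -> list[str]:
        # Stand-in for a search backend; returns result titles.
        return []

    frame = b"<jpeg bytes>"  # placeholder camera capture

    # Sequential flow (2022-era multimodal search): a visual query first,
    # then a follow-up text refinement applied to those results.
    visual_results = search(MultimodalQuery(image=frame))
    refined = [r for r in visual_results if "blue" in r.lower()]

    # Integrated flow (the new beta): image and speech are captured together
    # and sent as one request, so ranking weighs both signals at once.
    integrated = search(
        MultimodalQuery(image=frame, transcript="what kind of plant is this?")
    )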

Follow the Money 

One question that flows from all the above – as we always ask in such situations – is: why? And the answer, as is often the case, is all about following the money. With more – and more varied – search inputs, Google is hoping to boost one of the key metrics at the heart of its revenue model: query volume.

There’s a counterpoint here: voice and visual searches don’t carry the ten-blue-links ad inventory of Google’s traditional SERPs. But as with its forays into AI (to which both voice and visual search are tightly related), Google has the opportunity to pursue quality over quantity in its monetization.

In other words, though there’s less ad inventory – one search result versus several – there’s an opportunity for sponsored results, when relevant, that carry higher premiums than a typical CPC. That premium could flow from the commercial intent of using Google Lens to identify a fashion item, as noted above.

The same goes for local storefronts, as in using Google Lens to get the restaurant details in the example above. We know from the mobile era that proximity correlates with higher intent in local searches. Consider the additional boost in value when a subject isn’t just in proximity but in view.

This all aligns with Yext CDO Christian Ward’s premise for Google’s monetization path in AI: though it cannibalizes the traditional search model, AI-driven dialogue with a user can infer deeper levels of intent and thus surface higher-value leads for businesses. That shift brings us from the construct of clicks to that of offers.

And that’s one way Google could get around the innovator’s dilemma it currently faces in AI. As always, this will be a moving target.
