Gemini co-lead Oriol Vinyals addresses criticism of Google DeepMind’s staged multimodal demo



Summary

  • Added video demo and statement from Gemini co-lead Oriol Vinyals

Update from December 9, 2023:

Gemini co-lead Oriol Vinyals addressed criticism of Google’s staged Gemini hands-on demo on X, stating that “all the user prompts and outputs in the video are real, shortened for brevity.”

According to Vinyals, the criticized video “illustrates what the multimodal user experiences built with Gemini could look like,” and it was made “to inspire developers.”

He even took the time to demo the developer environment, generating AI output with a combination of images and prompts similar to what Google showed in the video.

Video: Oriol Vinyals via X

This is not the real-time video analysis combined with speech that Google’s original video suggested. But it does show that the underlying capabilities needed for such a use case are part of Gemini Pro and Ultra, which isn’t surprising since similar capabilities are already known from GPT-4 with vision.

Original article from December 8, 2023:

Google took a fake-it-till-you-make-it approach to demonstrating Gemini’s multimodal capabilities

A staged demo video leaves developers and employees in doubt about the true capabilities of Google’s new Gemini language model.

In the video, titled “Hands-on with Gemini: Interacting with multimodal AI,” Google shows off the AI model’s impressive voice interaction and real-time visual response capabilities.

on Google’s developer blog.

Gemini fake demo faces internal criticism

According to reports from Bloomberg and The Information, Google employees have expressed concern and criticism internally about the demo video. One Google employee stated that the video painted an unrealistic picture of how easy it is to achieve impressive results with Gemini.

The staged demo also became the subject of memes and jokes within the company, with employees sharing images and comments poking fun at the discrepancies between the video and the actual AI system.

Despite the controversy surrounding the demo video, Google insists that all user input and output shown in the video is real, even if the video suggests a real-time implementation that does not yet exist.

Eli Collins, vice president of products at Google DeepMind, told Bloomberg that the duck-drawing demo is still in the research stage and not yet part of Google’s products.

“It’s a new era for us,” Collins told Bloomberg. “We’re breaking ground from a research perspective. This is V1. It’s just the beginning.”

Google also presented benchmark results in a misleading way. It compared Gemini Ultra’s top score on the well-known language understanding benchmark MMLU, achieved with a more complex prompting method (CoT@32), against the GPT-4 score that OpenAI reported with the standard 5-shot method. With the same 5-shot method, Gemini Ultra scores 2.7 percentage points lower than GPT-4 on MMLU.
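To make the difference between the two settings concrete, here is a minimal Python sketch. It assumes that CoT@32 means sampling 32 chain-of-thought answers per question and taking the majority answer when the samples agree often enough, falling back to a single greedy answer otherwise. The function names and the 0.6 threshold are hypothetical illustrations, not Google's implementation. The scores in the usage lines are the 5-shot MMLU figures as published by Google and OpenAI (83.7% and 86.4%), whose difference matches the 2.7-point gap mentioned above.

```python
from collections import Counter


def majority_vote(samples):
    """Return the most common sampled answer and its share of all samples."""
    answer, count = Counter(samples).most_common(1)[0]
    return answer, count / len(samples)


def routed_answer(samples, greedy_answer, threshold=0.6):
    """Use the majority answer if agreement reaches the threshold
    (a hypothetical value here), otherwise fall back to the greedy answer."""
    answer, share = majority_vote(samples)
    return answer if share >= threshold else greedy_answer


def point_gap(score_a, score_b):
    """Difference between two benchmark scores in percentage points."""
    return round(score_a - score_b, 1)


def relative_gap(score_a, score_b):
    """Difference between two scores relative to score_a, in percent."""
    return round((score_a - score_b) / score_a * 100, 2)


# 32 sampled answers to one multiple-choice question (made-up data).
confident = ["B"] * 20 + ["A"] * 12
uncertain = ["C"] * 13 + ["B"] * 10 + ["A"] * 9

print(routed_answer(confident, greedy_answer="A"))  # majority is used
print(routed_answer(uncertain, greedy_answer="A"))  # falls back to greedy

# Like-for-like 5-shot comparison: GPT-4 vs. Gemini Ultra on MMLU.
print(point_gap(86.4, 83.7), relative_gap(86.4, 83.7))
```

The point of the sketch is that a voting scheme over 32 samples gives the model many more attempts per question than a single 5-shot answer, so scores from the two settings are only comparable when both models are run the same way.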

Although Gemini achieved the best overall MMLU score with CoT@32, the way Google presents this result is questionable. It shows, as does the fake real-time video, that Google has tried at all costs to portray Gemini as superior to GPT-4 rather than roughly equal, which is probably closer to the truth.


