Google claims that its flagship generative AI models, Gemini 1.5 Pro and 1.5 Flash, can handle and analyze massive volumes of data. The tech giant has emphasized the models' "long context" capabilities in press conferences and demos, saying they can summarize hundreds of pages of documents or search through hours of video footage. Recent research, however, suggests these models may not live up to those claims.

Two studies assessed how well Google's Gemini models perform on datasets as long as "War and Peace." The results were disappointing: one study found the models answered questions on document-based tests correctly only 40% to 50% of the time, as reported by TechCrunch.

Researchers Put Gemini AI to the Test

UMass Amherst postdoctoral researcher Marzena Karpinska, a co-author of one of the studies, said that while models like Gemini 1.5 Pro can technically process long contexts, her team has observed "many cases indicating that the models don't actually 'understand' the content."

A model's "context window" is the amount of input data, such as text, it can consider before producing an output. The latest Gemini models can take in 2 million tokens, roughly 1.4 million words, two hours of video, or 22 hours of audio, the largest context window of any commercially available model.
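To make that scale concrete, here is a minimal back-of-the-envelope sketch in Python. It is not Google's tooling, and the tokens-per-word ratio is simply derived from the 2-million-token / 1.4-million-word figures above; Gemini's actual tokenizer will produce different counts.

# Rough estimate of whether a text fits in a long context window.
# The ratio below comes from the figures cited in this article
# (2 million tokens ~ 1.4 million words); real tokenizers, including
# Gemini's own, will count differently.

GEMINI_1_5_WINDOW_TOKENS = 2_000_000          # advertised context window
TOKENS_PER_WORD = 2_000_000 / 1_400_000       # ~1.43 tokens per word

def estimate_tokens(text: str) -> int:
    """Estimate token count from a simple whitespace word count."""
    return int(len(text.split()) * TOKENS_PER_WORD)

def fits_in_window(text: str, window: int = GEMINI_1_5_WINDOW_TOKENS) -> bool:
    """Return True if the estimated token count fits in the context window."""
    return estimate_tokens(text) <= window

if __name__ == "__main__":
    # "War and Peace" runs to roughly 580,000 words in English translation.
    sample = "word " * 580_000
    print(estimate_tokens(sample))   # ~830,000 tokens
    print(fits_in_window(sample))    # True: well under 2 million tokens

By this rough estimate, even a novel the length of "War and Peace" uses less than half of the advertised window, which is why the studies below could test the models on books of that size.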

According to the report, Google showcased Gemini's long-context skills earlier this year in a demo in which Gemini 1.5 Pro searched the transcript of the Apollo 11 moon landing telecast for jokes and matched scenes to a pencil sketch. Google DeepMind research VP Oriol Vinyals called the model "magical."

In one study, by researchers at the Allen Institute for AI and Princeton, the models were asked to judge whether claims about contemporary fiction books were true or false, a task that required verifying the claims against specific details and plot points. Gemini 1.5 Flash correctly answered only 20% of questions about a roughly 260,000-word book, while Gemini 1.5 Pro managed 46.7%.

Karpinska said the models had more trouble verifying claims that required "considering larger portions of the book," or even the whole book, "compared to claims that can be solved by retrieving sentence-level evidence."

A second study, from UC Santa Barbara, tested Gemini 1.5 Flash's ability to reason over video by feeding it sequences of images paired with questions about the objects shown. Flash correctly transcribed digits from a sequence of images only about 50% of the time, and accuracy dropped to around 30% as more images were added.

Michael Saxon, a UC Santa Barbara PhD student and co-author of the research, said, "On real question-answering tasks over images, it appears particularly hard for all the models we tested. A modest amount of reasoning, recognizing a number in a frame and reading it, may shatter the model."



AI Set To Transform Industries Amid Risks

Although the studies have not been peer-reviewed and tested earlier versions of the models with smaller context windows, the research challenges Google's marketing claims. None of the models assessed, including those from OpenAI and Anthropic, performed well, but Google's emphasis on its context window has drawn particular attention.

Generative AI is coming under closer scrutiny as businesses and investors grow disillusioned with its limitations. Boston Consulting Group surveys show that executives are wary of generative AI's promised productivity gains, citing concerns about mistakes and data security.

Earlier this month, current and former employees of top AI firms, including Microsoft-backed OpenAI and Alphabet's Google DeepMind, raised serious concerns about the technology's risks, per Reuters.

Eleven OpenAI workers and two Google DeepMind workers signed an open letter criticizing the financial incentives of AI companies, saying those incentives hinder effective oversight.

The letter warned that unregulated AI could spread misinformation, worsen inequalities, and lead to a loss of control over autonomous AI systems that could result in "human extinction." Researchers have also found that image generators from OpenAI and Microsoft produced election-related misinformation despite policies prohibiting it.

The group also stressed that governments cannot count on AI companies to voluntarily disclose the capabilities and limitations of their systems, given the companies' "weak obligations" to do so.

Moreover, they urged AI companies to allow current and former employees to raise risk-related concerns and to avoid confidentiality agreements that suppress criticism.

Despite these concerns, recent advances in AI are still expected to transform technology and industry, per a report from The Motley Fool. These systems can generate new content, streamline tedious tasks, and automate processes from simple instructions, boosting productivity and lowering costs.

Currently, investors are looking beyond hardware-focused AI adoption to the growing AI-enhanced software sector. Bloomberg Intelligence expects generative AI software sales to reach $280 billion by 2032, up 18,647%.


