Google’s claims about Gemini’s data-analyzing abilities may be overstated.




Google’s flagship generative AI models, Gemini 1.5 Pro and 1.5 Flash, have been touted as having the ability to process and analyze vast amounts of data. Google claims that with their “long context,” these models can accomplish tasks such as summarizing lengthy documents and searching for specific scenes in film footage. However, recent research suggests that these models may not actually be as proficient in these areas as Google claims.

Two separate studies examined how well Google's Gemini models, among others, make sense of enormous amounts of data. Both found that Gemini 1.5 Pro and 1.5 Flash struggled to answer questions about large datasets accurately: in a series of document-based tests, the models gave the correct answer only around 40% to 50% of the time.

According to Marzena Karpinska, a postdoc at UMass Amherst and co-author of one of the studies, the models can technically process long contexts, but that doesn't mean they genuinely understand the content. That gap raises doubts about how much useful insight the models can actually extract from extensive data.

A model's context window is the input data it considers before generating output, measured in tokens: small, subdivided bits of raw data, such as syllables or word fragments. The latest Gemini versions accept up to 2 million tokens, the largest context window of any commercially available model. Yet even with this capacity, the models still struggle to understand and accurately process the information they take in.
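As a rough illustration of what those numbers mean, the sketch below approximates token counts at about four characters per token, a common rule of thumb for English text rather than Gemini's actual (proprietary) tokenization, and checks whether a document fits in a 2-million-token window:

```python
# Rough illustration of a context-window check. Real tokenizers split text
# into subword units; the ~4-characters-per-token figure is only a common
# rule of thumb for English, not Gemini's actual tokenization.

CONTEXT_WINDOW_TOKENS = 2_000_000  # Gemini 1.5's advertised maximum
CHARS_PER_TOKEN = 4                # crude approximation

def approx_token_count(text: str) -> int:
    """Estimate how many tokens a piece of text occupies."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_context(text: str) -> bool:
    """Check whether the text fits within the advertised context window."""
    return approx_token_count(text) <= CONTEXT_WINDOW_TOKENS

# At ~4 chars/token, a 2M-token window corresponds to roughly 8 million
# characters, on the order of a few thousand pages of text.
document = "word " * 1_000_000          # ~5 million characters
print(approx_token_count(document))     # ~1.25 million estimated tokens
print(fits_in_context(document))        # True
```

Fitting a document into the window, as the researchers note, is not the same as understanding it.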

In one of the studies, the models were tested on their ability to evaluate true/false statements about works of fiction. The researchers chose recently published books so the models couldn't fall back on prior knowledge; each model had to judge whether a statement was true or false and explain its reasoning. Gemini 1.5 Pro answered correctly only 46.7% of the time, and 1.5 Flash just 20% of the time. Since random guessing on true/false questions yields 50% on average, a coin toss would have been more reliable than either model at answering questions about the books.
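To make the test setup concrete, here is a minimal sketch of how such a claim-verification test might be scored. The example claims, labels, and model stub are hypothetical stand-ins, not the researchers' actual data or code:

```python
import random
from typing import Callable

# Minimal sketch of scoring a true/false claim-verification test.
# The claims, gold labels, and model stub below are invented stand-ins;
# the actual studies used statements written about recently published novels.

def score_true_false(claims: list[tuple[str, bool]],
                     ask_model: Callable[[str], bool]) -> float:
    """Return the fraction of statements the model labels correctly."""
    correct = sum(ask_model(statement) == gold for statement, gold in claims)
    return correct / len(claims)

# Hypothetical examples in the spirit of the study: claims that can only be
# verified by actually reading the book supplied in the context window.
claims = [
    ("The story is told in the first person.", True),
    ("The narrator's sister dies in chapter three.", False),
    ("The final chapter is set twenty years later.", False),
    ("Two chapters are written as letters.", True),
]

def coin_toss(_statement: str) -> bool:
    """Chance baseline: ~50% expected accuracy on balanced true/false items."""
    return random.random() < 0.5

print(f"chance baseline: {score_true_false(claims, coin_toss):.0%}")
# Gemini 1.5 Pro's reported 46.7% falls below this chance baseline.
```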

The second study tested Gemini 1.5 Flash's ability to "reason over" videos, that is, to search through footage and answer questions about its content. The researchers created a dataset of images paired with questions, and Flash was tasked with answering those questions from the visual input. Flash struggled: on one task, transcribing six handwritten digits from a series of images, its accuracy was only around 50%.
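Assuming accuracy here means exact match on the full digit string (a single misread digit makes the whole item wrong), scoring might look like the sketch below; the digit sequences are invented for illustration:

```python
# Sketch of exact-match scoring for the handwritten-digit transcription task.
# Each test item is a sequence of frames, one handwritten digit per frame,
# and the model must reproduce the full six-digit string. Data is invented.

def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of items where the predicted digit string matches exactly.

    A single misread digit makes the whole item wrong, which is why
    per-item accuracy can sit near 50% even when most digits are right.
    """
    assert len(predictions) == len(references)
    hits = sum(pred == ref for pred, ref in zip(predictions, references))
    return hits / len(references)

references  = ["402913", "158736", "990214", "317645"]
predictions = ["402913", "158786", "990214", "317640"]  # two single-digit errors

print(f"exact match: {exact_match_accuracy(predictions, references):.0%}")  # 50%
```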

These findings challenge Google’s claims about the capabilities of its Gemini models. While the studies focused on the 1-million-token context releases rather than the 2-million-token releases, they still raise concerns about the models’ performance. Google has been criticized for overpromising with Gemini and failing to deliver on its claims.

Generative AI, as a field, is also facing increased scrutiny due to its limitations. Surveys have shown that many executives are skeptical of the productivity gains that generative AI can bring and express concerns about potential mistakes and data compromises. The decline in generative AI dealmaking further highlights the growing dissatisfaction with the technology.

To counter the hype around generative AI, the researchers argue that better benchmarks and more third-party critique are needed. Current benchmarks often focus on simplistic tasks, such as retrieving a single fact from a long document, and fail to assess whether models can answer complex questions about long-context information.

Overall, the research suggests that Google's Gemini models may not live up to the company's claims: they can ingest large amounts of data, but they struggle to understand and make sense of it effectively. Until the field adopts stronger benchmarks and independent evaluation, such long-context capabilities should not be taken at face value.



