Google’s Gemini surpasses human health coaches


Google Gemini, the large language model (LLM), has made significant advances in fields such as security, coding, and debugging in its short six-month lifespan. Now it is surpassing humans at providing sleep and fitness advice. Researchers at Google have introduced the Personal Health Large Language Model (PH-LLM), a version of Gemini fine-tuned to analyze and reason over personal health data from wearables such as smartwatches and heart rate monitors. In their experiments, the model outperformed health and fitness experts with years of experience at answering questions and making predictions about personal health.

Wearable technology has become a crucial tool for individuals to monitor and improve their health. These devices collect vast amounts of data on sleep, physical activity, cardiometabolic health, and stress. However, this data is rarely used in clinical settings, primarily because it lacks context and demands substantial computational resources for storage and analysis. Interpreting it is also challenging: while large language models have excelled at medical question-answering and at analyzing health records, they have struggled to reason over wearable data and turn it into recommendations.

The development of PH-LLM represents a breakthrough in training large language models to provide recommendations, answer professional examination questions, and predict self-reported sleep disruption and sleep impairment outcomes. The model surpassed the average scores of human experts, achieving 79% on sleep exams and 88% on fitness exams, compared with the experts' averages of 71% and 76%, respectively.

The researchers provided several examples to showcase the capabilities of PH-LLM. In one instance, they prompted the model to list the most important insights for a user who was having trouble falling asleep. The model correctly identified the importance of adequate deep sleep for physical recovery and recommended maintaining a cool and dark bedroom, avoiding naps, and having a consistent sleep schedule. In another example, when asked about the type of muscular contraction that occurs during the slow, controlled, downward phase of a bench press, the model correctly responded with “eccentric.”
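To make the first example concrete, here is a minimal sketch of how wearable aggregates might be serialized into a coaching prompt for such a model. The prompt format, metric names, and helper function are illustrative assumptions, not the actual PH-LLM input format.

```python
# Hypothetical helper: serialize wearable sleep aggregates into a
# natural-language prompt for a health-coaching LLM. The wording and
# metric names are illustrative assumptions only.
def build_sleep_prompt(metrics: dict) -> str:
    lines = [f"- {name}: {value}" for name, value in metrics.items()]
    return (
        "You are a sleep coach. Given the user's recent wearable data:\n"
        + "\n".join(lines)
        + "\nList the most important insights and recommendations."
    )

prompt = build_sleep_prompt({
    "average sleep duration": "6.1 h",
    "average deep sleep": "42 min",
    "sleep schedule variability": "high",
})
print(prompt)
```

In this sketch, the model would be expected to surface insights like the ones reported above (adequate deep sleep, a consistent schedule, a cool and dark bedroom) from the serialized metrics.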

To achieve these results, the researchers created and curated three datasets that tested personalized insights and recommendations based on physical activity, sleep patterns, and physiological responses. They collaborated with domain experts to create 857 real-world case studies related to sleep and fitness. These case studies incorporated wearable sensor data, demographic information, and expert analysis.
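To illustrate the shape of such a dataset, here is a minimal, hypothetical sketch of what one case-study record might look like. The field names and values are assumptions for illustration, not the actual schema of the researchers' 857 case studies.

```python
from dataclasses import dataclass

# Hypothetical record for one sleep/fitness case study, combining the three
# ingredients the study describes: wearable sensor data, demographic
# information, and expert analysis. Field names are illustrative assumptions.
@dataclass
class CaseStudy:
    domain: str            # "sleep" or "fitness"
    demographics: dict     # e.g. age, sex
    sensor_data: dict      # daily aggregates from a wearable
    expert_analysis: str   # free-text assessment by a domain expert

example = CaseStudy(
    domain="sleep",
    demographics={"age": 34, "sex": "F"},
    sensor_data={
        "avg_sleep_duration_h": 6.1,
        "avg_deep_sleep_min": 42,
        "avg_resting_hr_bpm": 58,
    },
    expert_analysis=(
        "Short total sleep time with reduced deep sleep; recommend a "
        "consistent schedule and a cool, dark bedroom."
    ),
)

print(example.domain, example.sensor_data["avg_sleep_duration_h"])
```

Pairing raw sensor aggregates with expert free-text analysis in each record is what lets a fine-tuned model learn to map wearable data to the kind of reasoning a human coach would produce.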

While PH-LLM shows promise, much work remains to ensure its reliability, safety, and equity in personal health applications. The model's responses were not always consistent, confabulations varied across case studies, and the model sometimes erred on the side of caution; in one instance, it failed to identify under-sleeping as a potential cause of harm. The researchers acknowledge the need for further refinements, including reducing confabulations, accounting for unique health circumstances not captured by sensor data, and ensuring that the training data reflects a diverse population.

Despite these challenges, the researchers believe the results represent an important step toward large language models that deliver personalized information and recommendations to support individuals in reaching their health goals. With further development and evaluation, PH-LLM and similar models could play a significant role in reshaping healthcare and empowering individuals to take control of their well-being.
