Researchers at Arizona State University conducted a study comparing Scarlett Johansson’s voice to “Sky,” a voice offered by OpenAI. The study aimed to determine how closely Johansson’s voice resembles the AI-generated one. The analysis used AI models designed to measure vocal similarity and compared Sky’s voice with the voices of approximately 600 other actresses.
According to NPR, which commissioned the study, the research team at Arizona State University found that Johansson’s voice was more similar to Sky’s voice than 98% of the other actresses examined. This finding points to a strong acoustic resemblance between the two voices.
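To make the 98% figure concrete, the short Python sketch below shows how such a ranking can be computed. It is an illustration only, not the researchers’ method: the function name `percentile_rank` and all similarity scores are hypothetical, chosen so that 588 of 600 comparison voices score below the target.

```python
def percentile_rank(target_score, other_scores):
    """Return the percentage of other_scores strictly below target_score."""
    below = sum(1 for score in other_scores if score < target_score)
    return 100.0 * below / len(other_scores)


# Hypothetical similarity-to-Sky scores (not data from the study):
# 588 comparison voices score lower than the target, 12 score higher.
other_actresses = [0.50] * 588 + [0.95] * 12
target = 0.90

print(percentile_rank(target, other_actresses))
```

A rank of 98% out of roughly 600 voices still leaves room for about a dozen voices that score higher, which is consistent with other actresses sometimes ranking above Johansson.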
However, the researchers also noted that the models sometimes rated the voices of actresses Anne Hathaway and Keri Russell as more similar to Sky’s voice than Johansson’s was. This observation suggests that while Johansson’s voice resembles Sky’s, the two are not identical. Visar Berisha, the professor leading the analysis, emphasized the similarity between Johansson’s voice and Sky’s voice but acknowledged that slight differences between the two are likely.
Visar Berisha is known for his work on OriginStory, a microphone that won an FTC challenge for watermarking voice recordings to distinguish them from AI-generated voices. This background in voice technology highlights Berisha’s expertise in the field and lends credibility to his analysis.
In their investigation, the researchers found several similarities between the vocal tracts of Sky and Johansson. The vocal tract comprises the anatomical structures involved in producing a person’s voice, namely the throat, mouth, and nasal passages. The analysis showed that the vocal tract lengths it attributed to Sky and to Johansson were identical. This finding suggests that the two voices share acoustic properties associated with very similar physical characteristics.
Despite the shared similarities, there were some differences between the voices. Sky’s voice was found to be slightly higher-pitched and more expressive than Johansson’s voice. On the other hand, Johansson’s voice was described as slightly more breathy compared to the model’s voice. These nuanced differences highlight the unique qualities of each voice and indicate that while they have commonalities, they are not indistinguishable.
To learn more about the study’s methodology and limitations, we reached out to Visar Berisha for additional information. We will update this article with any response. OpenAI, the organization behind the voice model, has not responded to a request for comment at this time.
OpenAI CEO Sam Altman and CTO Mira Murati have both denied that Sky’s voice is intended to sound like Scarlett Johansson. During the GPT-4o demonstration, Altman published a one-word post that read “her,” leading to speculation that the model was designed to mimic Johansson’s voice. Johansson confirmed that Altman had approached her to provide her voice for the model, but she declined the offer. Altman allegedly made another attempt to secure Johansson’s voice just two days before the demonstration, indicating a persistent desire to incorporate her voice into the AI model.
In conclusion, the study conducted by Arizona State University researchers suggests a significant resemblance between Scarlett Johansson’s voice and OpenAI’s voice model, Sky. While Johansson’s voice was found to be more similar to Sky’s than 98% of the other actresses examined, there were slight differences between the two voices. The analysis highlighted various commonalities, including the identical lengths of their vocal tracts. However, differences in pitch, expressiveness, and breathiness distinguished Johansson’s voice from the model’s voice. Taken together, the results indicate that the two voices are close but distinguishable.