YouTube Videos Were Secretly Used By OpenAI To Train ChatGPT?


Shreya Bose

Jun 20, 2023


According to reports, OpenAI scraped data from YouTube to train Whisper, its speech-to-text AI model.

Using YouTube 

Some of the training data derived from Whisper ultimately contributed to the development of GPT-4, which is the language model behind ChatGPT.

According to a report in The Information, OpenAI “has secretly used data from the site (YouTube) to train some of its artificial intelligence models”.

AI models need tons of data for training and YouTube is the single biggest and richest source of imagery, audio and text transcripts on the web. 

Google’s Gemini

Google researchers have also been using YouTube data to train and refine the company's own large language model, Gemini.

Sundar Pichai, the CEO of Google, noted, “Gemini was created from the ground up to be multimodal, highly efficient at tool and API integrations, and built to enable future innovations, like memory and planning.”

He further added that it offers “impressive capabilities not seen in prior models.”

The value of video content for AI training purposes has also been acknowledged by Meta.

Using video data

Yann LeCun, the AI chief at Meta Platforms, has emphasized the significance of video training data in his work.

LeCun stated that a hierarchical Joint Embedding Predictive Architecture could potentially learn about the world by watching videos and interacting with its environment.

His point highlights the importance of video in enabling AI models to “think” more like humans, as opposed to relying solely on text data for training.

Violates rules

YouTube does not permit use of its data for such purposes.

Its terms of service ban using content for anything other than “personal, non-commercial use.”

Hence, training a commercially oriented AI model using such content could potentially violate the site’s rules.

Controversy

It’s an open secret in the AI industry that everyone is scraping the web, and OpenAI reportedly “scraped” YouTube data to train its AI models, which are now all the rage.

This has provoked debates and disputes as major technology companies increasingly move to improve their AI capabilities or offer AI-powered services.

Despite the lawsuits filed against text-to-image generator firms for violating artists’ copyright, large language models continue to be developed in secrecy, with no transparency about what their training data contains.

