Are we running out of human-generated data?
I read through the #thestateofAI 2023 report over the weekend. I highly recommend it for the density of content it manages to pack into a deck.
One of the points it raises is the scalability of data: “It’s unclear how long human-generated data can sustain AI scaling trends (some estimate that data will be exhausted by LLMs by 2025) and what the effects of adding synthetic data are. Videos and data locked up in enterprises are likely up next.”
“Assuming current data consumption and production rates will hold, research from Epoch AI predicts that ‘we will have exhausted the stock of low-quality language data by 2030 to 2050, high-quality language data before 2026, and vision data by 2030 to 2060,’” the report states.
If you are a startup that has used synthetic data for training, I am extremely curious to hear about your results.
Report here
********************************************************
#reviewswithranjani
#Technology | #Books | #BeingBetter
#stateofaireport2023