The AI industry has hit another hurdle: the supply of data available for training is starting to shrink, The New York Times reports.
A study by the MIT-led Data Provenance Initiative found that many key web sources have begun restricting the use of their data, which hampers the training of powerful AI systems.
The researchers analyzed 14,000 web domains whose content is used in three major AI training datasets. The results revealed an “emerging crisis of consent.” Within a single year, about 5% of all the data and 25% of the highest-quality data were restricted through the Robots Exclusion Protocol, a tool that site owners use to block automated data harvesters.
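For readers unfamiliar with the Robots Exclusion Protocol, here is a minimal sketch of how a well-behaved crawler might check a site's robots.txt rules before collecting a page. It uses Python's standard `urllib.robotparser` module. The robots.txt content, the crawler name `ExampleAIBot`, and the `example.com` URL are all hypothetical and are not taken from the study.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one (made-up) AI crawler, allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "https://example.com/articles/1"  # illustrative URL
ai_allowed = is_allowed(ROBOTS_TXT, "ExampleAIBot", page)
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", page)
print(ai_allowed, other_allowed)
```

Note that the protocol is advisory: it only works if the crawler chooses to check the rules and honor them.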
In addition, nearly 45% of the data in the C4 dataset is now restricted by websites’ terms of service.
The new restrictions are expected to affect not only companies developing AI, but also researchers, scientists and non-profit organizations that use web data.
Author: Nikolai Tikhonov
Source: RB