Researchers analyzed 14,000 web domains used in three large AI training datasets and found an “emerging crisis in consent.” Over the past year, about 5% of all data and 25% of the highest-quality data was restricted through the Robots Exclusion Protocol (robots.txt), a mechanism site owners use to tell automated crawlers which pages they may not collect.
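
For readers unfamiliar with the protocol, here is a minimal sketch of how a well-behaved crawler might check a site's robots.txt rules before collecting a page, using Python's standard `urllib.robotparser` module. The robots.txt content, the crawler name `ExampleAIBot`, and the helper `is_allowed` are hypothetical and only for illustration; they are not taken from the study.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one (made-up) AI crawler from the whole
# site while leaving it open to every other user agent.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"
    print("ExampleAIBot:", is_allowed(ROBOTS_TXT, "ExampleAIBot", page))
    print("OtherCrawler:", is_allowed(ROBOTS_TXT, "OtherCrawler", page))
```

Note that the protocol is advisory: it states the site owner's wishes, and compliance depends on the crawler choosing to honor them.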

The study also found that up to 45% of data in the C4 dataset is now restricted by websites’ terms of service.

These restrictions will affect not only companies developing artificial intelligence, but also researchers, scientists, and nonprofits that use web data.

Source: Ferra
