Noopur Bhandiwad, Pune
Several leading news publications, including The New York Times, CNN and the Australian Broadcasting Corporation (ABC), have blocked OpenAI’s web crawlers from accessing their content.
The decision comes as OpenAI, the maker of the chatbot ChatGPT, uses web crawlers to visit webpages and collect data to train its AI models. Other news outlets, including the Chicago Tribune and Australian Community Media (ACM), have reportedly taken similar steps to stop OpenAI from accessing their web content.
The issue has strained relations between The New York Times and OpenAI to the point that the NYT is considering legal action to protect the intellectual property rights in its content. The tension arose from negotiations over a licensing deal under which OpenAI would pay The New York Times to use its stories in its artificial intelligence (AI) tools.
OpenAI uses web crawlers to improve ChatGPT. The datasets they gather may contain text from books, news articles, and other copyrighted material. However, the companies that build these models often do not disclose whether their datasets contain copyrighted material. This becomes a problem when AI models use copyrighted content without the required authorisation, prompting content owners to rush to protect their intellectual property rights.
“Giving GPTBot the permission to access your site can help AI models become more accurate and transform their potential and safety,” OpenAI said in a blog post.
A Reuters spokesperson told The Guardian that the company frequently reviews its robots.txt file and its site terms and conditions. “Since intellectual property is the backbone of our business, it is extremely important that we protect the copyright of our content,” the spokesperson said.
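For readers curious about the mechanics, a robots.txt file is a plain-text list of rules that tells named crawlers which parts of a site they may visit. The short Python sketch below shows how a rule aimed at GPTBot could be checked using the standard library's robots.txt parser. The robots.txt content, the example.com address and the helper name is_allowed are illustrative assumptions, not any publisher's actual configuration, and compliance with such rules depends on the crawler choosing to honour them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of the kind a publisher might serve:
# GPTBot is barred from the whole site, every other crawler is allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "https://example.com/news/story.html"  # placeholder address
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", story))
    print("Other crawler allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", story))
```

Under these assumed rules, the check reports that GPTBot may not fetch the story while a crawler with a different name may.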
Skepticism is growing worldwide, even as more and more news organisations consider using AI in news gathering. Google, which is developing its own chatbot, Bard, has suggested changing copyright law so that AI systems can scour the internet for content, with an option for publishers to opt out.