Some recent news from Penguin gives me a chance to revisit a question I was asked last week at the Net Caucus Academy briefing on AI and IP: can a publisher change the law by printing magic words on their work?
Have you ever actually read the copyright page in a commercially published book – something put out by one of the handful of big publishers like Hachette or Penguin? Go grab a book off a shelf in your home or office and take a look. Along with the “© Joe Smith 2022” and the Library of Congress cataloging stuff, you’ll likely see something like “No portion of this book may be reproduced in any form without permission from the publisher.” And now Penguin is adding “No part of this book may be used or reproduced in any manner for the purpose of training artificial intelligence technologies or systems.” Funny thing about these statements, though: they are both false. They misrepresent the law, and adding them to a book doesn’t change the law.
I’m sure Penguin wishes they were true. Movie studios probably wish that the FBI Warning we used to have to sit through at the beginning of DVDs was true. But they’re not. They’re not true in the US, where fair use and other user rights permit copying portions of books and movies in a wide variety of contexts, including for school – check out our new infographic below! They’re also not true in any country that has signed the Berne Convention (which is almost every country in the world), because that flagship international copyright treaty includes a mandatory right of quotation from in-copyright works. And don’t even get me started on copying for accessibility, which is an affirmative right granted to libraries and others by both US and international copyright laws regardless of publisher warnings to the contrary. And eventually, these books and movies will rise into the public domain, and the warnings printed on their pages will be utterly false – all reproductions for any purpose will be permitted. (That probably won’t stop Getty Images from trying to lock them up behind a paywall.) These rights are too important to let publishers veto them by simply scribbling some magic words on the first pages of all their books.
So when I was asked at the Net Caucus event last week whether the law gives force to things like the robots.txt file, which web publishers use to announce their preferences about web scraping, I said “No!” Moderator Tim Lordan’s suggestion of a “don’t scrape me, bro” tag for websites is already a thing, and AI companies like OpenAI have published documentation about how to tell their bots to exclude your site. Since no single website or video actually adds much value to AI training, a non-binding opt-out is a potential win-win for individual creators and AI developers. But to give legal force to these unilateral notices would let copyright holders rewrite copyright law by fiat, undermining the careful balance required by the Constitution and crafted by Congress.
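For readers who have never looked inside one, here is a rough sketch of what a robots.txt opt-out looks like and how a well-behaved crawler might consult it. It uses Python's standard `urllib.robotparser` module. The file contents, the `may_fetch` helper, and the example.com URL are made up for illustration. `GPTBot` is the user-agent name OpenAI documents for its crawler; treat the rest as assumptions rather than anyone's actual setup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site: it asks OpenAI's
# crawler (user agent "GPTBot") to stay out and lets everyone else in.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt states no objection to user_agent fetching url.

    This only reports the publisher's stated preference. Nothing here
    prevents a crawler from ignoring the answer.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/essay.html"  # placeholder URL
    for bot in ("GPTBot", "SomeOtherBot"):  # "SomeOtherBot" is invented
        print(bot, may_fetch(ROBOTS_TXT, bot, page))
```

Notice that the check runs on the crawler's side, and only if the crawler's operator chooses to run it. That is the sense in which robots.txt announces a preference rather than imposing a rule.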