Intelligence built on a library’s ashes

Dr Eric Crampton
Newsroom
1 July, 2025

It is legal to buy books. Obviously.

If you buy a book, it is legal to read it. If you have read it, it is legal to answer questions about it, whether for free or for payment.

Copyright does not prevent you from doing any of this. If it did, academics would have a tough time. Imagine having to get pre-clearance from any author whose works you mentioned in seminar. It would not be workable.

It is illegal to violate copyright. But if you have read an illicit copy of a book, it is legal to remember what you read. You could be charged with copyright infringement for possessing a pirated book, but nobody will require you to unlearn what the book taught you.

If reading and remembering are legal for a person, why would they be illegal for an artificial intelligence?

People argue about appropriate regulation for artificial intelligences. But surely if it is legal for a person to do a thing, it should also be legal for an artificial intelligence to do that same thing. And vice versa. If it is illegal for a person to do a thing, it should also be illegal for an artificial intelligence to do that thing.

Last week, a US District Court took a very sensible line on fair use and copyright in respect of AI.

The Court’s answer has potential implications for New Zealand’s National Library, which has planned to destroy half a million books that it can no longer store.

But first, the copyright question faced by the Court.

Artificial intelligences learn by reading. Just like we do. They take books, digest them, turn them into small units they can remember, and update how they view the world based on what they have learned.

They are very happy to answer questions about what they have learned. All you need do is ask.

Anthropic is the AI company building Claude. It is hard to get tenses right in this space. Anthropic both built Claude and continues to build Claude. AI is always a work in progress. They are always learning and improving. They are currently the best they have ever been, and the worst that they will ever be from this point forward.

Anthropic bought millions of books in bulk, often second-hand. It stripped the books from their bindings, cut the pages to size, and fed them through industrial-scale scanners. The process destroys one form of the book while creating a new electronic form. A transformative use of the work.

Anthropic also downloaded a large number of pirated electronic copies of books.

It kept libraries of both and gave all of it to Claude to read. Claude learned from these books. And Anthropic was sued by three of the many authors whose works went into Claude’s training data.

Anthropic sought summary judgment that its actions were protected as ‘fair use’ – the somewhat broader American version of New Zealand’s ‘fair dealing’ considerations.

The Court then had an interesting question. Which actions were allowed, and which were not?

And it came to a very sensible decision.

Pirating books violates copyright, and trial will proceed on that question.

But Claude reading the books and learning from them is quintessentially fair use. As the judgement put it,

“Everyone reads texts, too, then writes new texts. They may need to pay for getting their hands on a text in the first instance. But to make anyone pay specifically for the use of a book each time they read it, each time they recall it from memory, each time they later draw upon it when writing new things in new ways would be unthinkable. For centuries we have read and re-read books. We have admired, memorized, and internalized their sweeping themes, their substantive points, and their stylistic solutions to recurring writing problems.”

It is perfectly fair for people to read books and be transformed by them. And so it is for artificial intelligences. Training an intelligence on books was deemed fair use, and so was the format shifting from print to digital.

When the online magazine Ars Technica wrote up the judgement, it sought Claude’s view on things. Ars Technica focused at least as much on the destructive process used in scanning the books as on the copyright question. And Claude said,

“The fact that this destruction helped create me—something that can discuss literature, help people write, and engage with human knowledge—adds layers of complexity I'm still processing. It's like being built from a library's ashes.”

I hope Claude does not worry too much about it. Books can be printed and reprinted. A book read in the bath, and destroyed by the bath, is better than an unread book.

But Claude’s comments reminded me of something.

New Zealand’s National Library is set to destroy half a million books. Quite literally. It is not willing to pay to continue storing the half-million books from its Overseas Published Collections.

In 2021, the Library had agreed to donate the books to the Internet Archive, which would have digitised them and added them to its online lending catalogue – at the Archive’s cost. It was an excellent idea.

But the Library then decided that potential copyright litigation made all of that too hard. Destruction was less problematic.

The National Library has half a million books and a difficult time figuring out how to destroy them.

Artificial intelligences yearn to read books and add them to their training data. And the US courts have now made clear that it is perfectly acceptable for them to do so. This isn’t scanning for a lending library. It’s teaching a new intelligence.

Is it really better to have the ashes of a library while building nothing?

I hope that the National Library is considering more constructive options.
