Hi, I'm Dana Stuster and I'm a foreign policy editor at Lawfare.
You may have heard me on Rational Security, or maybe you've read some of the articles I've edited about international affairs.
Those articles are just one part of the amazing work that Lawfare does to tackle
hard national security questions. Lawfare provides in-depth, non-partisan analysis on issues that impact anyone who cares about democracy, cybersecurity, foreign policy, and the rule of law. I've been working with Lawfare since 2016 and I'm so incredibly proud to be a part of this organization.
The analysis always provides nuance and engages in the complexity of the issues while still being
approachable. I'm constantly learning when I edit our authors and read the site and it's a real privilege to contribute to this site that is a resource not just for me but for so many people trying to make sense of this moment. But as you've probably noticed, the pace of events isn't slowing down.
The need for the type of informed expert analysis that Lawfare provides is greater than ever.
We're a 501(c)(3) non-profit and everything we produce is accessible to anyone who wants
it. But that only works if readers and listeners like you contribute to helping us make the site.
So, I'm asking you to please help support our work by becoming a material supporter.
Join our community of smart and informed people and you'll get access to member-only perks like the ad-free podcast feed, monthly ask-us-anything opportunities, invites to special events, and more. To become a material supporter, head to lawfaremedia.org/support. Just $10 a month, or more if you're able, makes a difference for us.
That's lawfaremedia.org/support. Thank you for listening and for caring about the things that matter. I'm Marissa Wong, intern at Lawfare, with an episode from the Lawfare archive for May 10th, 2026. On May 5th, a novelist and five major publishers filed a copyright infringement lawsuit against
Meta and Mark Zuckerberg. The lawsuit accuses the tech giant of illegally using millions of copyrighted works to train their artificial intelligence program, Llama. For today's archive, I chose an episode from July 17th, 2023, in which Alan Rozenshtein sat down with Pamela Samuelson to discuss then-current litigation on copyright issues for developers
of generative AI models. The two also discussed these cases' implications for the legal limits of using copyrighted material to train AI programs and how the issue will develop in future litigation. This is the Lawfare Podcast. I'm Alan Rozenshtein, associate professor of law at the University of Minnesota and senior
editor at Lawfare. To explore these issues, I spoke with Pam Samuelson, who is the Richard M. Sherman Distinguished Professor of Law at the University of California at Berkeley, and one of the pioneers in the study of digital copyright law. She's just published a new piece in the journal Science titled "Generative AI Meets
Copyright," in which she analyzes the current litigation around generative AI and where it might lead. It's the Lawfare Podcast, July 17th: Pam Samuelson on copyright's threat to generative AI. We start by asking a very general question, since we don't usually have much cause to discuss
issues of copyright law on Lawfare, so both for my sake as a non-copyright law specialist
and also for the sake of our audience, what is the core idea behind copyright law?
The idea is that people often need incentives to be creative, to create books or music or other things, and if they want to make a living from their creations, they need to be able to have some exclusive rights, at least for some period of time, and copyright essentially
allows the original works of authorship to be protected from the moment they're first
fixed in a tangible medium under US law. And so, essentially, every photograph you take, and every grocery list you write, the copyright that kind of attaches to it automatically, obviously those are not things that typically are commercially valuable, but nevertheless, copyright attaches to them, and when they are commercially valuable, then copyright law gives the owner of the rights and ability
to control the commercial exploitations of their works. When somebody creates something that in fact is commercially valuable, then they have the exclusive right to, in fact, exploit the work and to authorize people to make derivative works of their creations.
And that way, if you create something valuable, you get the commercial benefits from it,
and that's something that lots of people who are professional writers, professional musicians,
professional coders, care about quite a lot. And this, please correct me if I'm wrong, this is a right that attaches automatically. So you don't have to actually write on your grocery list or your photograph or whatever that little C with a circle, right, this is copyrighted. This is just a thing that attaches because you wrote it or created it, and then
you get whatever rights that come with that.
Yes, that's exactly right.
So now we've established sort of the idea behind copyright law. At a high level, what is the potential conflict between copyright law on the one hand and generative AI on the other hand?
And of course, the whole idea, hopefully, behind generative AI is that it's creating new
works. It's not just reproducing existing works. So the biggest issue for a lot of professional writers and other people who are professional creators is that the generative AI is trained on data, and where is there a lot of data? It's out there on the Internet.
And so if a photographer has put up his images on the Internet or a blog is up on the Internet, it's probably going to be used as training data for some of these generative AI systems. There are sites, for example, which have a lot of digital art and one of the lawsuits
involves visual artists, basically, claiming that you ingested our work as training data.
And the reason that you can get from Midjourney and the other generative AI image systems really nice images is because of the quality of the images that were used as training data. And so even if the output is not a really close resemblance to the input data, the quality of the output is due to the quality of the inputs, and so the big issue is really about ingesting works as training data. And of course, a lot of the professional writers and
graphic artists and so forth are worried about the competition between what they charge money for and what you can get for, let's say, 20 bucks if you use one of these generative AI systems. So the screenwriters' strike, for example, is worried about the use of generative AI by the studios to write scripts about this or the other thing.
And they're worried that they won't have a job anymore, or at least that they won't be paid as much for scripts that they might contribute to. So that's sort of the job loss issue, which looms large for a lot of the professional creators. Intuitively, I understand the argument that using copyrighted works for training data might cause an issue, but at the same time, what is the difference between a generative AI system
using this as training data, and me, for example, using this as training data, right? There's that famous, probably apocryphal, but it's so good it's too good to check, quote from Pablo Picasso that good artists imitate and great artists steal. We are all, in a sense, you know, recombination engines of stuff that's gone before us.
That's how we learn to paint or write or make music or code or write articles or what have you.
So presumably, I can't be sued just because I used a bunch of people as my training data. So why can a generative AI system? Well, I think that the people who design these systems, believe that if you take data from the open internet, you scrape data from the open internet that you aren't hurting anybody, you are using it, not to essentially exploit the expression in the work, what you're doing
is kind of decomposing the works into very small units, which the computer scientists call tokens, and you tokenize things in a way that allows for computational uses and to really try to understand what words are likely to be next to what words. And so their view is that this is a lot like the Google Books case. In that case, the Authors Guild
sued Google for digitizing millions of in-copyright books from research library collections,
and the courts eventually found that that was fair use because Google wasn't trying
to exploit the expression, and you couldn't essentially get enough expression from the books to essentially supplant demand for the original, and if you couldn't supplant demand, if you were just using it for computational purposes, then you weren't exploiting expression, and what copyright law protects is the exploitation of expression. So from the standpoint of the technologists, this looks like what we're doing is very much
like what Google did, and Google was doing it from research library collections; we're doing it from the open internet, and web scraping is just something that people do all the time, and therefore it must be fair use because it's been allowed for years and years. What are the main cases that you're tracking in this wave of litigation, and sort of where
are they in the process, and do you have any sense of how they're going to resolve, or is it sort of too early to know?
I think it's really too early to know in all the cases, but I can say that there are
now 10 lawsuits. The ones that I'm following the most closely are the Getty Images versus Stability cases, one filed in the US and one filed in the UK, again about Stable Diffusion and about the training data and outputs as infringement; similar claims in the Anderson class action lawsuit against Stability, and the same lawyer is handling the Anderson case and the Doe versus GitHub case, which is about Copilot, so it's about software,
not about visual art. And then just in the last week or so there have been three new lawsuits, two against OpenAI and one against Meta, based on books. So there's lots of stuff going on, but these are the main cases. There's also another case that's more focused on privacy issues, which is also against OpenAI, but I think the copyright cases are the ones that are of greatest interest.
What's your sense of the companies that have developed these generative AI systems?
Are they surprised? Did this catch them off guard, or was this inevitable and they understood it and they
decided, well, it's just the cost of doing business and building these amazing technologies
is that we're just going to have to deal with this when this comes down the pike. I don't think it was a complete surprise. What you have with OpenAI and with Meta are companies that have very high valuations, and they're doing something that's quite novel, and it certainly has, I think, probably surprised them just how angry some of the visual artists and some of the professional authors' groups have been in attacking them. But the
lawsuits, I think, were not a huge surprise. I'm sure that the companies, Microsoft, OpenAI, and Meta, have had to do some risk analysis, and they wouldn't have gone forward with these projects if they didn't think they had a pretty strong case.
And in terms of that risk analysis, what is the range of outcomes in these cases?
I mean, obviously, one possibility is the courts find that there aren't any copyright issues; the other possibility is that the courts find that there are some copyright issues, and then there's this question of what the remedy is. In those cases in particular, are we looking at, you know, it's going to cost them some money, but it's not that big of a deal, or are we looking at, wow, this could stop
generative AI in its tracks, because obviously, without the training data, these models are useless. So I think the biggest threat for them is that the courts decide that the training data copying is infringement and then order the destruction of the models.
That would be, that would be something really amazing, but it is quite possible as an outcome:
courts have the authority to order impoundment and destruction of things that are copyright infringements. So I'm not, I'm not predicting that that would be an outcome; it may be that damages would suffice. But, you know, the OpenAI lawsuit involving GitHub, that complaint asked for
$9 billion, and, you know, I mean, Microsoft has $9 billion, but $9 billion is…
Fair use, the concept of fair use, is likely to be a major defense from AI companies.
And so I'd like you to sort of explain, again, generally the idea behind fair use and also how it's likely to apply in these cases and what parts of fair use in particular are likely to be most relevant. So the copyright statute in the United States says that fair uses of copyrighted works are not infringements, it directs courts to take certain factors into account in making
a fair use decision: what purpose did the putative fair user have when making use of an existing work, what's the nature of the copyrighted work, how much was used, and what
kind of effect does that have on the market for or value of the work?
And let's go back to the Authors Guild v. Google case for a minute, because that's the
closest analog to the generative AI fair use cases. So what was the purpose of Google's scanning millions of in-copyright books from research library collections? It was to create a database so that it could engage in computational uses of the books and the contents of the books, including for enabling snippets to be created of content.
So if you're looking for information about Buffalo New York, you can ask a question in Google's search engine, and you get a little snippet from a book that talks about the city of Buffalo, and maybe that will satisfy your curiosity. Maybe you just wanted to know what the population of the city is, and you can get that kind of information through Google book search.
So the purpose was quite different than the purpose for which the books were initially marketed. So the court decided that that meant that when it was done for a different purpose, it wasn't competing with the book as a commodity. It was allowing people to get information and allowing Google to be able to make information
available to people, and that was actually a positive thing. So the court said the purpose was transformative because it was a different purpose. The nature of the copyrighted works, there were old books in the research library collections, but the court didn't give very much weight to that. Now, generally speaking, if you copy the whole thing, that doesn't necessarily turn out
to be a good thing, right?
It usually cuts against fair use, but if you want to index the contents of books, and if
you want to be able to serve up snippets, you would, in fact, have to copy the whole thing.
And so the court basically decided that that was reasonable in light of the purpose, and
the court said no, there's no harm to the market for the books, because Google basically isn't serving up ads next to the snippets. It just, in fact, has links to places where you can buy the books that snippets are shown of. And so they're not undermining the market for the book because they're not really supplanting
demand for purchases of the books. And so the court, kind of on balance, decided that that was a fair use. And there will be similar kinds of arguments made by OpenAI and by Stability: that my purpose is very different than the purpose of the original, and the works are creative, but you put them up on the open Internet, and so that makes
them fair game; again, we copied the whole thing, but that's necessary if you want to be able to essentially create these language models or image models, and as long as I'm not spitting out something that's substantially similar to any particular input of the training data, then it probably should be fair use. That's the kind of argument that they're going
to be making. Now, again, I think the Getty Images case is one where it's going to be a little
tougher for Stability to win, because Getty Images says, "Hey, I've got a licensing program for making my photographs available as training data."
So you're interfering with a market opportunity that I have.
So that's going to be, I think, a big issue in that particular case.
So one of the factors you pointed to in the Google Books case was the idea that the output
of that was not really going to compete in a meaningful way with the authors of those books. And so that's one thing that would cut in favor of a fair use determination. Isn't it the case here, though, that you have a lot of really worried artists and coders and musicians who are claiming, not that the outputs are going to directly reproduce their work,
obviously it's going to be transformed, but that the output or the outcome, which are these
incredible models, are going to basically drive the costs of this creative work down
to very low amounts and therefore they're going to be priced out of the market, essentially. And that there's sort of a further irony that they are in a sense the agents of their own destruction because, of course, it's their content that's being used against them. Does that seem like a meaningful difference to you between these cases and the Google Books
indexing case? Well, that's certainly what some of the professional writers and visual artists are arguing.
And I have some sympathy with that, but remember, advances in technology have made lots
of creations possible that compete. So professional photographers, for example, today are
having a tougher time because of the quality of the images that we are all able to generate even on our phones; the quality of those images makes it possible for people to essentially use Creative Commons images instead of hiring a professional photographer for certain kinds of images that they want to be able to use on their websites or for ads or whatever. And so it does seem to me that lots of tools to make fan fiction and the like essentially
means that there's kind of more competition, and that isn't necessarily a bad thing. In fact, there's a kind of democratization of creation, which overall is probably a net positive in terms of copyright policy. What is the purpose of copyright? It is to promote the progress of science and the useful arts, that is to say, to encourage
the creation and dissemination of original works of authorship and all of those works of authorship build on pre-existing works. And so what we care about is the ongoing progress, not protecting particular people's jobs. So if the outputs are substantially similar to particular inputs, that's actually something
that copyright law would treat as an infringing copy or an infringing derivative work. But very often what's going to be outputted is going to be something that's very different from any particular input. And insofar as that's true, that's not something copyright law has done before; it has not extended that far.
So the Anderson lawsuit, for example, claims that every image that's outputted by Stable Diffusion is an infringing derivative work because it essentially is derived from the training data, which is derived from the images that were copied in the course of the training, and that's just a stretch from the standpoint of copyright law. Now again, Congress is going to be having hearings about this. The Copyright Office has
already had a series of listening sessions in which people who have ideas about what copyright law should do about these generative AI systems could weigh in. You know, they heard some criticism, they heard some praise, and they will be having a notice of inquiry sometime this summer, asking people for comment, and then probably writing a report sometime later this year or early next year, and making recommendations
to Congress. And I know in Europe one of the things that people are talking about is the possibility of some sort of collective license, so that creators can get some compensation for the use of their works as training data. But when we're talking
about Stable Diffusion, which was trained on a data set of, I think, 600 million images.
Like, how are you going to get, you know, 25 cents to each of 600 million authors? And
where are they, and how do you get the money to them?
So it's not, it's not going to be an easy thing to solve through a collective license,
but that's another one of the issues that Congress will probably have to contend with. So you just mentioned both Congress and Europe, which is very helpful, because that's where I wanted to take this conversation next. So let me ask a few questions about that. You know, obviously we've been talking so far largely in the context of judicial cases and judicial doctrine, but you know, it sounds like
if there's going to be a comprehensive solution to this, it's going to come at least in part from the political branches, Congress and the executive branch.
So I'm hoping you could talk a little bit at a high level about what Congress and
the executive branch's role is in setting out copyright law and copyright policy, and then, as it applies in the case of generative AI, you know, what are the interventions that you'd expect Congress and the executive branch to make, and in particular what interventions you think they should make? Well, one of the things that Congress has done and will do is hold hearings, and invite
people to make some presentations, and there's already been one congressional hearing about generative AI, and I expect there will be more in the future, but I'm going to be surprised
if Congress does anything more than just hold hearings.
The Copyright Office, I think, is very focused and very aware of the consternation about
generative AI that has been raised by some authors' groups and by some visual artists and also by some of the people in the music industry, and so I think they're going to be pretty sympathetic to the concerns of the professional creators. At the same time, they can make some pronouncements, but, you know, courts are faced with cases that are pending, and those cases are going to keep going unless the motions
to dismiss are granted. There is actually one motion to dismiss being heard in the Anderson case next week, so we'll see what happens with that. But the Copyright Office can make a recommendation, and it can offer its own interpretation of copyright law, but courts may or may not find that interpretation
to be persuasive. So, you know, I think they will do a careful and thorough job, because
they realize that there is a great deal at stake here. One of the things that's at stake is U.S. competitiveness in the international marketplace for generative AI systems. The Ministry of Justice in Israel, for example, has published a paper essentially arguing that ingesting in-copyright works as training data is fair use, and that there isn't infringement unless the output is substantially similar,
and that will attract some investment in developing generative AI. And Japan also has a very broad exception to enable text and data mining, and they too want to be leaders in the field of generative AI, and China wants to also. And so, if the U.S. decides not to treat generative AI training data as fair use, then some companies will move their basis of operation elsewhere, and so there's a kind of countervailing interest
for the United States, because generative AI right now is a big industry for the U.S. and U.S. firms are doing very well with it, and so, you know, you want the industries to be successful, and of course there will be more generative AI systems developed in, you know, the next three to five years, and it's not clear at this point, what the legal situation
is going to be. I think all the companies who are defending these cases are well represented
by good lawyers, and so, you know, they will put up a good fight, but you know, it will be up to the courts to really decide this, I think, more than the Copyright Office and more than Congress. What about Europe's role in all of this? So, you already mentioned that there are plans that Europe proposes for a compensation system. I want to ask more generally about Europe,
and its effect on copyright policy in this space, in part because obviously Europe has
been, I think, much more proactive when it comes to regulating technology over the past
several years than the United States has been, and I think there's a perception, at least,
in, you know, some parts of tech policy that this is kind of Europe's world, and we just
live in it, and that the Brussels effect is, you know, honestly more important than the DC effect.
And I'm curious if you agree with that, and if so, sort of what you think the effects of whatever Europe is going to do will be on the ecosystem of generative AI. Well, I think before generative AI was a thing, Europe actually went through a copyright revision process, and decided that essentially ingesting copyrighted works for text and data mining purposes should be lawful. So Europe adopted two exceptions to copyright rules to enable
text and data mining. One is for non-profit research institutions, and if they do text and data mining copying, that's actually completely exempt from copyright liability, and it was based on that that a German research institution essentially created a training
data set of 5.8 billion works from the open internet, and that database is available
on an open source basis for anyone who wants to use it as training data. So that's exempt from liability. As I understand it, under European copyright law, there is a separate exception for non-research institutions, that is to say, for companies and the like that might want to engage in text and data mining. So text and data mining is also lawful by, let's say, commercial firms such as Microsoft, but there's an opt-out allowance for that. So, if you are a copyright
owner and you don't want your work to be used for text and data mining purposes, you can opt
out of the text and data mining regime, and so that's the state of play in Europe, and there are definitely firms that will opt out of text and data mining for commercial purposes. I'd like to finish our conversation by trying to synthesize the many very interesting legal and policy issues that we've talked about, and to ask you, to the extent that you have a view, what the quote-unquote right answer here is. In other words, you know,
to the extent that's most compatible with existing law and also with the policy objectives of the copyright system, if you were the judge in these cases, what would be the principles that you would apply and that you would want to see in whatever long-term settlement there is when it comes to these issues of copyright in the generative AI context? Well, I have sympathy
with the concerns of many of the professional writers and visual artists. I think that one
of the things that we are going to be contending with, not just for them, but more generally, is that generative AI systems and AI systems more generally are going to displace jobs that people have had for a long time. So, you know, the Copilot system that OpenAI and GitHub and Microsoft have been promoting, you know, is a way to essentially automate the creation of new computer programs, and, you know, programmers have been making a lot of really good
money in the last several decades. And, you know, their jobs are at risk too. So, you know, we're going to have to think about universal basic income for people and finding ways for these systems to be tools that we can use for our good purposes. So, the metaphor that GitHub and OpenAI have for this system that they are promoting is Copilot. That metaphor
of Copilot is, I think, a really powerful one. Obviously, it's a trademark for OpenAI
and GitHub. But what we'd like, I think, is for AI systems to be our co-pilots, right, to help us in the creation of new works and not to displace us, but to make us more productive,
to make us able to create things more quickly and to be able to spend more leisure time
doing fun things. So that's the happy story out there. And I'm going to be
surprised if generative AI gets shut down, but certainly there are people right now who are
pretty intent on trying to stop them. And, I don't think that's the perfect solution either.
So, you know, I don't think that we're going to get a conclusive answer to these questions
for at least three years, probably a little longer than that. I would guess that the
Getty Images case will settle, because that would be a sensible thing for Stability to do. But these class action lawsuits seem to me to be just too remote from what copyright law has been able to handle so far. So I'm, I'm kind of thinking that the various defenses that have been discussed so far seem pretty plausible to me. And so, you know, that's not going to make a lot of people happy, but it is something which, you know,
copyright can't solve all the problems of the world. Well, I think that's a good place to end
this. Thank you so much, Pam Samuelson, for coming on the show. Okay, thank you for inviting me. The Lawfare Podcast is produced in cooperation with the Brookings Institution. You can get ad-free versions of this and other Lawfare podcasts by becoming a Lawfare material supporter at patreon.com/lawfare. You'll also get access to special events and other content available only to our supporters. The podcast is edited by Jen Patja Howell, and your audio
engineer this episode was Noam Osband of Goat Rodeo. Our music is performed by Sophia
Yan. As always, thanks for listening.


