Should I allow my book to be used to train AI?
-
Well, I figured this would come up eventually… I received a message from my publisher asking if I was ok with allowing my book to be used for training AI and LLMs.
Here’s the main part of the email:
START
“I am emailing today as we have begun to receive enquiries from tech companies wishing to purchase the rights to use our eBook material to train AI models. Since our contracts were drafted before this technology was quite so prevalent, we wanted to first ask whether you would be happy for your book to be used in this way. We have not yet accepted any offers but find it encouraging that companies are now requesting permission and recognising that content should be licensed and paid for when using it to train these models.
Some enquiries have been made under NDA restrictions so we may not always be able to share the details of sales. However, we would like to share with you the basic parameters we will be using as a guide for any and all agreements:
- The rights granted will not allow licensees to reproduce significant portions of the text verbatim or produce adaptations of the book.
- We will negotiate fees in line with the intended use, sector and rights duration.
- Authors will receive 50% of the portion of the fee for their book, bringing this in line with the Subsidiary Rights section of your contract.”
END
My inclination is to say no, but I guess I need to think about that more? I did email and ask how other authors are responding.
I’m trying to think about the potential downsides to saying no... For example, if my book is used for training, then those ideas and arguments could come up in searches, which is potentially a good thing?? Especially if the source is cited. Copilot (MS’s version of ChatGPT), for example, cites sources.
I haven’t used ChatGPT for a while now because I tend to only use CoPilot when I want to use GenAI for something. Does anyone know if ChatGPT has started citing sources?
I hope @Mary-Anna Anna will see this thread. Of course the money issue will be different for her (no one writes an academic book to make money, although I bet Mary Anna's Agatha Christie book sells much better than my sociolinguistics books!). But I'm curious how other authors think about this issue.
-
My first reaction is no, hell no. It seems like sleeping with the enemy.
My second thought is that access to books, if paired with citations and the limitations already commonly applied to scholarly works, could improve the accuracy and quality of AI-generated responses. It would work similarly to Wikipedia's requirement for adequate citations for the information being presented. It could also limit AI's tendency to hallucinate.
I don't know and can't predict how such training and subsequent use might affect human-centered academic research and authorship.
Big Al
-
Ask for a functional definition for what constitutes “significant portions of the text”
Ask for how actual uses or references to your work in the trained AI models and subsequent derivatives will be measured and periodically reported to you (akin to you getting a periodic report of how many copies of your book are sold), and a formula or fee schedule for how the data in that report will be used to calculate royalty due you (i.e., so you get all the necessary data and information that would enable you to independently audit your royalty).
Tell them you will make a decision on whether to allow your work to be used for training AI models after you have had reasonable time to consider the information requested above. In a world where big tech, publishers, and authors have roughly equal negotiating power, I imagine the above would be how things get done.
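Purely as an illustration of the report-plus-formula idea, here is a toy sketch in Python of how a periodic usage report and an agreed fee schedule could be turned into a royalty figure the author can audit. Every field name and dollar amount is invented for the example; nothing comes from the publisher's email except the 50% author share.

    # Hypothetical sketch only: the report fields and fees are invented for illustration.

    # A made-up periodic usage report the licensee might be required to send for one book.
    usage_report = {
        "period": "2024-Q3",
        "training_runs_using_book": 2,   # how many training runs ingested the text
        "models_deployed": 3,            # derivative models trained on data that included the book
    }

    # A made-up fee schedule agreed in the contract.
    fee_schedule = {
        "per_training_run": 500.00,      # flat fee each time the book is used in a training run
        "per_deployed_model": 200.00,    # ongoing fee per deployed model that used the data
    }

    author_share = 0.50  # the 50% split mentioned in the publisher's email

    def royalty_due(report, schedule, share):
        """Apply the agreed formula to the report so the author can recompute the result."""
        licensee_fee = (
            report["training_runs_using_book"] * schedule["per_training_run"]
            + report["models_deployed"] * schedule["per_deployed_model"]
        )
        return licensee_fee * share

    print(royalty_due(usage_report, fee_schedule, author_share))  # -> 800.0

The point is less the numbers than the structure: if both the report and the formula are spelled out in the contract, the author can recompute the royalty independently instead of taking the licensee's figure on faith.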
-
HAL 9000: "I'm sorry, Dave. I'm afraid I can't do that." That is what this brings to mind. There is no stopping AI; it is too far developed to stop, and greed rules.
Your book could be a foot in the door for you if/when AI gets out of hand.
The Cylons may be a problem, but no doubt you could handle them.
-
@Axtremus said in Should I allow my book to be used to train AI?:
Ask for a functional definition for what constitutes “significant portions of the text”
Good idea.
One problem is that AI is new enough that I don’t think I can actually give informed consent. I don’t know how things will evolve so I don’t really know what I’ll be agreeing to…
Ask for how actual uses or references to your work in the trained AI models and subsequent derivatives will be measured and periodically reported to you (akin to you getting a periodic report of how many copies of your book are sold), and a formula or fee schedule for how the data in that report will be used to calculate royalty due you (i.e., so you get all the necessary data and information that would enable you to independently audit your royalty).
This is a really good idea, thank you.
-
@ShiroKuro I agree that Ax has given very good advice. Pardon my flippancy.
-
Not at all @CHAS, flippancy is always welcome!!
-
AI is only going to be as good as the information used to train it. I would be inclined to say yes, both because it means more money for you and because it means an AI that has the knowledge you shared via your book.
LL#2 is training an AI sports model right now as part of his job. Actually, he's QA'ing the training it already has, meaning he asks it questions and sees if it gets them right. He is easily able to tell when the training data it was given isn't adequate and, in some cases, identify exactly what it was given to make it respond in a certain way.
He shared a funny example yesterday -- the question he asked was "Who were the best major league baseball pitchers who never pitched a no-hitter?" and its answer was something like "Although Nolan Ryan pitched 16 no-hitters in his career, he never threw one with the Mets" -- his question didn't ask about the Mets at all, and he was looking for pitchers who never threw a no-hitter at all, so that was a totally wrong response. He knows it was trained on a set of articles owned by the company he works for, so he looked through the article set and found one called something like "The best pitchers who never no-hitted with the Mets" -- that's where the AI got the response from, but it clearly didn't have the ability to answer the much more basic question he actually asked.
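If it helps to picture that QA loop, here's a toy Python sketch of the general idea: ask the model questions with known expectations and flag the misses. The ask_model stand-in and the pass/fail checks are invented for this example; it's not how LL#2's employer actually does it.

    # Toy sketch only: ask_model() is a stand-in for the real model being evaluated,
    # and the checks are deliberately crude. Everything here is invented for illustration.

    def ask_model(question: str) -> str:
        """Stand-in for calling the model under test; here it returns one canned bad answer."""
        canned = {
            "Who were the best MLB pitchers who never pitched a no-hitter?":
                "Although Nolan Ryan pitched many no-hitters, he never threw one with the Mets.",
        }
        return canned.get(question, "I don't know.")

    # Question/expectation pairs a human reviewer writes in advance.
    test_cases = [
        {
            "question": "Who were the best MLB pitchers who never pitched a no-hitter?",
            "must_not_mention": ["Mets"],  # drifting into a Mets-only answer means it missed the question
        },
    ]

    for case in test_cases:
        answer = ask_model(case["question"])
        drifted = any(phrase.lower() in answer.lower() for phrase in case["must_not_mention"])
        status = "FAIL" if drifted else "PASS"
        print(f"{status}: {case['question']}")
        print(f"  -> {answer}")

Running that prints a FAIL for the no-hitter question, which is essentially what LL#2 is doing by hand, plus the detective work of tracing the bad answer back to the article it came from.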
Ethical questions about whether AI will kill us all someday aside (if it's any consolation, LL#2 has decided AI is actually pretty dumb at this point and we are in no immediate danger, LOL!), I think we should use the best info we can find to train these AI models because at some point, using AI will be just like using Google... and if you want accurate information out of it, you will have to put the best info into it. Of course, that means that someday maybe no one will need to buy your book, but I guess the same could be said for Google now -- almost anything you'd want to know you can find on the internet somewhere -- and people still buy books, so I dunno. That's above my pay grade!
Just my 2 cents.....good luck with your decision!
-
"Authors will receive 50% of the portion of the fee for their book, bringing this in line with the Subsidiary Rights section of your contract.”
END"Big whoop. You'll get a 50% royalty on the sale of a single copy? Now THAT is a generous offer! Not.
-
@Piano-Dad, I don't really care about the monetary details... wait, let me rephrase that. I don't care if no one makes money off of my book. But if someone is making money off of my book, it should be me and my publisher.
BTW, now that I look at this again, "50% of the portion of the fee for their book" -- what does that even mean? What is the fee? The fee that the tech company pays the publisher? And how much is that fee?
These details have not been explained.
-
@Lisa your comments are much harder to respond to... But very, very much appreciated, because they're helping me think about this from different angles.
Re this
I think we should use the best info we can find to train these AI models
One reaction I have to this is, why should this responsibility fall to individuals like myself? Maybe that's too knee-jerk, but I'm thinking about who stands to benefit, and sure they're saying they're going to pay the authors (how much, how, these things aren't clear), but ultimately, AI is not going to be free to use (it already isn't free), so who benefits from better AI? The companies behind them. Why should I help train AI so that those companies can make more money in the future?
Another reaction I have is that there's training, and then there's use. I'm less worried about training than I am about use. AFAIK, ChatGPT still doesn't really cite its sources when it gives out information in the use context. I am more familiar with CoPilot, which does cite its sources, or at least tries to. But one of my concerns is that my work won't be cited.
Ethical questions about whether AI will kill us all someday aside (if it's any consolation, LL#2 has decided AI is actually pretty dumb at this point and we are in no immediate danger, LOL!),
The question isn't whether AI will kill us (I believe you were joking here but...) bu there are other, more important questions that are more troubling imo. One is how should AI be incorporated into educational and occupation activities that were once things only humans could do. Another might be how can AI make the world of work more equitable? How can AI be harnessed to free up humans to engage in more meaningful activities?
These are the questions I'm concerned with, but I don't trust the agenda of most tech companies to meaningfully include these questions. BTW, I am pretty familiar with how dumb AI is from the AI activities I have my students do to help them understand the problems with relying on AI...
So back to my question about how to respond to this request...
I think the "what's in it for me" attitude is generally selfish, so I want to stress that my concerns are not primarily the money questions... but there are too many unknowns and it's hard to be comfortable with just saying "sure, go ahead, use my book, do whatever you want with it..." -
I feel like a semi-Luddite, but I don’t want AI training off my work so that it can replace what I do. And my book is like my baby; I’m very protective of it.
-
Thinking about this more generally ...
-
"Training AI and LLMs" is the machine version of getting an entity to learn something. The flesh-and-blood analog is "getting a person (student) to learn something."
-
If a human-person reads your book, presumably you get a bit of royalty (roughly that of selling one copy of your book). The human-person then goes on to do whatever with whatever he learnt from reading your book -- short of that human-person making additional copies of your book or quoting substantial content from your book (to the point of violating copyrights), you derive no "residual" income from that human-person who has read your book.
-
So maybe the AI developer is thinking/arguing "ok, I buy one copy of [your] book from the publisher to train my LLM", akin to "I buy a copy of [your] book from the publisher/author to train my human-employee." You get no further share of whatever economic value is generated by the LLM, just like you get no further share of whatever value is generated by the human-employee.
-
Of course an LLM is not the same as a human-employee -- the LLM can be duplicated into a billion exact copies, will never quit its job, has perfect memory, and will effectively live forever. I.e., the long-term potential economic value that can be generated by the LLM may be "infinite" compared to a human-employee.
Will that argument work? Is the arrangement "fair"? I suppose that's the billion dollar question to be shaken out in the years to come.
-