As someone who’s been paying attention to AI since the 1970s, I’ve noticed the same pattern over and over: People will say, “It takes real intelligence to do X (win at chess, say), so doing that successfully will mean we’ve got AI.” Then someone will do that, and people will look at how it’s done and say, “Well, but it’s just using Y (deep lookup tables and lots of fast board evaluations, say). That’s not really AI.”

For the first time (somewhat later than I expected), I just heard someone doing the same thing with large language models. “It’s just predicting the next word based on frequencies in its training data. That’s not really AI.”

Happens every time.

There’s a lot of talk these days about the risks of AI, with many suggestions that it should be “regulated,” but with little specificity about what regulations would be appropriate. As usual, anybody who has an AI loves the idea of some sort of regulation, which would serve as a barrier to entry for competitors.

I have a suggestion that avoids that trap, minimizes the harm of regulation, and yet sharply constrains the opportunities for AIs to do bad stuff. It’s also easy to implement, because it requires little or no new legislation.

It’s very easy: enforce copyright laws.

Any firm that uses or makes available a large language model AI should be required to identify every copyrighted text used in training the model, and then share with all the copyright holders any revenues that the use or availability of the AI brings in.

This burdens existing AIs whose creators thought they were getting all their content for free by scraping the web for it, while giving a big leg up to any AIs that are simply trained on a corpus of text that the AI owner has the rights to. (I read about a physician who, after twenty years of answering patient questions by email, trained an AI on his numerous messages. He’d be fine.) That seems all to the good.

As to how much to pay the copyright holders, I think the publishing model of the past couple hundred years provides a good guide. Roughly speaking, book publishing contracts provide half the profits to the writer, but because it’s too easy to game the expenses side of the business to make the profits disappear, the contracts are written to provide something more like 10% to 15% of the gross revenues. That would probably be a reasonable place to start.
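
To make the arithmetic concrete, here is a minimal sketch in Python. The 10% and 15% rates come from the publishing model described above. Everything else is an assumption made up for illustration: the $1,000,000 gross revenue figure, the three authors, and especially the rule of splitting the pool in proportion to word count, which is only one of many possible ways to divide the money.

```python
from fractions import Fraction


def royalty_pool(gross_revenue, rate):
    """Total owed to all copyright holders: a flat share of gross revenue."""
    return gross_revenue * rate


def split_pro_rata(pool, words_by_holder):
    """Divide the pool in proportion to each holder's word count.

    Pro rata by word count is an assumption for this sketch, not a rule
    taken from any contract or statute.
    """
    total_words = sum(words_by_holder.values())
    if total_words == 0:
        raise ValueError("no copyrighted words in the corpus")
    return {
        holder: pool * words / total_words
        for holder, words in words_by_holder.items()
    }


# Hypothetical figures, for illustration only.
gross = 1_000_000  # dollars of gross revenue from the AI
corpus = {"Author A": 600_000, "Author B": 300_000, "Author C": 100_000}

# Fractions keep the shares exact, so they always sum back to the pool.
for rate in (Fraction(10, 100), Fraction(15, 100)):
    pool = royalty_pool(gross, rate)
    shares = split_pro_rata(pool, corpus)
    print(f"{float(rate):.0%} of gross: pool ${float(pool):,.0f}")
    for holder, amount in shares.items():
        print(f"  {holder}: ${float(amount):,.0f}")
```

The pool is computed from gross revenue rather than profit, for the same reason publishing contracts are written that way: there is no expenses line to game.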

The huge cost of actually identifying each copyrighted text used, and of finding the copyright owner, is very much part of the desired outcome here: We don’t want people pointing at that difficulty and then saying, “Well, obviously we should be able to just steal their work because it’s too much trouble to figure out who they are and divvy up the relatively small amount of money they’re due.” Making firms go through the process would provide a salutary lesson for others tempted to steal copyrighted material.