2025-04-20 09:32

Posted on 2025-04-20

Clearly the move here is to come up with a large language model that writes really good prose that advocates for the policies that you want.

The draft says the department must greatly expand its use of artificial intelligence to help draft documents, and to undertake “policy development and review” and “operational planning.”

—https://www.nytimes.com/live/2025/04/20/us/trump-news#trump-state-department-overhaul

No doubt other people thought of this before me. I wonder how far along they are?

Not really AI (happens every time)

Posted on 2024-08-21

Philip Brewer

As someone who’s been paying attention to AI since the 1970s, I’ve noticed the same pattern over and over: People will say, “It takes real intelligence to do X (win at chess, say), so doing that successfully will mean we’ve got AI.” Then someone will do that, and people will look at how it’s done and say, “Well, but it’s just using Y (deep lookup tables and lots of fast board evaluations, say). That’s not really AI.”

For the first time (somewhat later than I expected), I just heard someone doing the same thing with large language models. “It’s just predicting the next word based on frequencies in its training data. That’s not really AI.”

Happens every time.

An actually useful use-case for large language models

Posted on 2024-01-24

Philip Brewer

I just thought of a possibly actually useful use-case for large language models (what’s being called AI these days): Generating metadata for your photo library.

This is useful, because almost nobody is willing to generate their own metadata for photos. Most people have vast libraries with literally nothing but the date, time, and location captured by their phone or camera, the image itself, and details of the capture (exposure time, ISO, etc.).

Using the date, time, and location info, together with the image itself, AI could:

Write a brief description of the image.
Tell you where it was taken from (not just the latitude and longitude, but the name of the place where you were standing).
Look up whether an event was underway at that place and time and say what it was (county fair, protest march).
Tell you any number of arbitrary things, like if there was something going on with the weather at that time (blizzard, wind chill advisory)—but only if it was interesting.

I know Google Photos can already do some of this. I don’t think it writes metadata for you, but it will find all of your photos that were taken in St. Croix, for example. (I’d heard that it could locate all your photos of a particular sculpture, but it didn’t work for the sculpture I just tried to find.) In any case, an LLM running on your own computer, saving the data to your photo library, would have all kinds of advantages. There are the obvious privacy advantages, but also sharing advantages—the metadata (or a subset that you selected) would be available to be included when you shared the image with a friend.
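To make the idea concrete, here is a minimal sketch in Python of how the pieces could fit together. Everything in it is an assumption for illustration: the `Capture` record, the prompt wording, the JSON sidecar file, and especially `ask_model`, which is a hypothetical stand-in for whatever locally running model you have, not a real library call.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path
from typing import Callable, Dict, Iterable


@dataclass
class Capture:
    """What the phone or camera already records (hypothetical record)."""
    path: str         # image file
    taken_at: str     # ISO 8601 timestamp
    latitude: float
    longitude: float


def build_prompt(c: Capture) -> str:
    """Turn the capture data into the questions listed in the post."""
    return (
        f"This photo was taken at {c.taken_at}, "
        f"latitude {c.latitude}, longitude {c.longitude}.\n"
        "1. Write a brief description of the image.\n"
        "2. Name the place where the photographer was standing.\n"
        "3. If an event was underway there at that time, say what it was.\n"
        "4. Mention the weather only if it was interesting.\n"
    )


def annotate(c: Capture, ask_model: Callable[[str, str], str]) -> Dict:
    """ask_model(image_path, prompt) is a placeholder for a local model."""
    return {"capture": asdict(c), "notes": ask_model(c.path, build_prompt(c))}


def save_sidecar(c: Capture, record: Dict) -> Path:
    """Keep the metadata next to the image, on your own disk."""
    out = Path(c.path).with_suffix(".json")
    out.write_text(json.dumps(record, indent=2))
    return out


def shareable(record: Dict, keys: Iterable[str] = ("notes",)) -> Dict:
    """Pick only the subset of metadata you are willing to share."""
    return {k: record[k] for k in keys if k in record}
```

You would call `annotate` once per photo, `save_sidecar` to store the result in your library, and `shareable` when sending the image to a friend. A real version would more likely write into the image's own metadata fields than into a separate file, but that choice is outside what this sketch tries to show.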

Regulating Artificial Intelligence with copyright

Posted on 2023-05-26

Philip Brewer

There’s a lot of talk these days about the risks of AI, with many suggestions that it should be “regulated,” but with little specificity about what regulations would be appropriate. As usual, anybody who has an AI loves the idea of some sort of regulation, which would serve as a barrier to entry for competitors.

I have a suggestion that avoids that trap, minimizes the harm of regulation, and yet sharply constrains the opportunities for AIs to do bad stuff. It’s also easy to implement, because it requires little or no new legislation.

It’s very easy: enforce copyright laws.

Any firm that uses or makes available a large language model AI should be required to identify every copyrighted text used in training the model, and then share with all the copyright holders any revenues that the use or availability of the AI brings in.

This burdens existing AIs whose creators thought they were getting all their content for free by scraping the web for it, while giving a big leg up to any AIs that are simply trained on a corpus of text that the AI owner has the rights to. (I read about a physician who had been answering patient questions by email for twenty years training an AI on his numerous messages. He’d be fine.) That seems all to the good.

As to how much to pay the copyright holders, I think the publishing model of the past couple hundred years provides a good guide. Roughly speaking, book publishing contracts provided half the profits to the writer, but because it’s too easy to game the expenses side of the business to make the profits disappear, the contracts are written to provide something more like 10% to 15% of the gross revenues. That would probably be a reasonable place to start.
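The arithmetic is simple enough to sketch in a few lines of Python. The 10% to 15% range comes from the paragraph above; the way the pool is divided among copyright holders does not. Splitting it in proportion to how many words each holder contributed to the training corpus is purely my assumption here, and the function names and figures are made up for illustration.

```python
from typing import Dict


def royalty_pool(gross_revenue: float, rate: float) -> float:
    """Share of gross revenue set aside for copyright holders.

    rate is a fraction, e.g. 0.10 to 0.15 for the range in the post.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be a fraction between 0 and 1")
    return gross_revenue * rate


def split_pro_rata(pool: float, words_by_holder: Dict[str, int]) -> Dict[str, float]:
    """Divide the pool by each holder's share of training words.

    Pro rata by word count is an assumed rule, not one the post proposes.
    """
    total_words = sum(words_by_holder.values())
    if total_words <= 0:
        raise ValueError("need at least one word of training text")
    return {
        holder: pool * words / total_words
        for holder, words in words_by_holder.items()
    }


# Illustrative figures only: $1,000,000 gross at the low end of the range,
# with two hypothetical copyright holders.
pool = royalty_pool(1_000_000, 0.10)
payments = split_pro_rata(pool, {"holder_a": 3_000, "holder_b": 1_000})
```

Note that both functions work from gross revenue, never from profit, which is the whole point of the publishing-contract comparison: there is no expenses line to game.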

The huge cost of actually identifying each copyrighted text used, and of finding its copyright owner, is very much part of the desired outcome here: We don’t want people pointing at that difficulty and then saying, “Well obviously we should be able to just steal their work because it’s too much trouble to figure out who they are and divvy up the relatively small amount of money they’re due.” Making firms go through the process would provide a salutary lesson for others tempted to steal copyrighted material.