To detect or not to detect LLM output

Recently, the question of whether and how text generated by large language models (LLMs) can be distinguished from human-written text came up on the W3C Semantic Web mailing list. A solution that seems obvious at first glance is to use the same technology for the detection itself. There are already some tools that do exactly that; this blog post lists some of them. But that is not a long-term solution. Let me explain why:

Deep-learning-based classifiers

Current LLMs may exhibit characteristic patterns that can be used to distinguish their output from human text, but the next generation will improve and will be tested against the existing detection tools. This contest between generator and detector will not last very long. Detection is already difficult today, and as the two converge, the statistical signature of generated text becomes ever more subtle. A subtler signature requires more text to reach a usable signal-to-noise ratio, and most of the time you will not have that much content, unless the text was generated specifically to test the detector.
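
To make the idea of a deep-learning-based detector concrete, here is a minimal sketch of one common heuristic: score a text by the perplexity a reference language model assigns to it, on the assumption that generated text tends to be more predictable (lower perplexity) than human prose. The reference model ("gpt2"), the threshold, and the function names are illustrative assumptions, not a production detector:

```python
# Minimal sketch of a perplexity-based detector (illustrative only).
# Assumption: LLM output tends to have lower perplexity under a
# reference model than human-written text. The threshold is made up.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under the reference model."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels == input_ids the model returns the mean
        # cross-entropy loss over all predicted tokens.
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return torch.exp(loss).item()

THRESHOLD = 40.0  # hypothetical cut-off, would need calibration

def looks_generated(text: str) -> bool:
    return perplexity(text) < THRESHOLD
```

Note how fragile this is: the threshold has to be calibrated against a specific generator, and a short text yields a noisy perplexity estimate, which is exactly the signal-to-noise problem described above.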

Generative Adversarial Network

By the way, there is a deep-learning approach that builds exactly this contest into a single training setup: the generative adversarial network (GAN), in which a generative network tries to fool a discriminative network, and both improve in the process.
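
As a sketch of that idea, here is a toy GAN in PyTorch that learns to mimic a one-dimensional Gaussian distribution; the network sizes, learning rates, and the target distribution are arbitrary choices for illustration:

```python
# Toy GAN: a generator learns to mimic samples from N(4, 1.25)
# while a discriminator learns to tell real from generated samples.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 1.25 + 4.0  # the "human" data
    noise = torch.randn(64, 8)
    fake = G(noise)                          # the "generated" data

    # Train the discriminator: real -> 1, fake -> 0.
    opt_d.zero_grad()
    loss_d = (bce(D(real), torch.ones(64, 1))
              + bce(D(fake.detach()), torch.zeros(64, 1)))
    loss_d.backward()
    opt_d.step()

    # Train the generator: make the discriminator say 1 on fakes.
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(64, 1))
    loss_g.backward()
    opt_g.step()
```

The equilibrium of this game mirrors the point above: once the generator has converged, the discriminator's output carries almost no usable signal, which is why a detector trained against today's generators tells us little about tomorrow's.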

What can we do?

So what can we do? We have to fall back on the same approaches we use to determine whether a text was written by person A or person B: provenance rather than content analysis. Chain of trust and web of trust could be useful tools here. The known state of knowledge at the time a text was written can also help: if a text contains information that was not publicly available at the time of writing, and we can trace where that information came from, that is an indicator of its origin.
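
To make the chain-of-trust idea concrete, here is a minimal sketch using Ed25519 signatures from the pyca/cryptography library: an author signs a text, and anyone holding the author's public key can verify that exactly this text came from that key holder. Key distribution and the web of trust itself are out of scope here, and the variable names are illustrative:

```python
# Minimal sketch: an author signs a text; anyone holding the
# author's public key can verify that the text is unchanged and
# was signed by that key. How the key becomes trusted (web of
# trust, chain of trust) is out of scope.
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
)
from cryptography.exceptions import InvalidSignature

private_key = Ed25519PrivateKey.generate()  # stays with the author
public_key = private_key.public_key()       # published to readers

text = "This post was written by a specific, identifiable person."
signature = private_key.sign(text.encode("utf-8"))

try:
    public_key.verify(signature, text.encode("utf-8"))
    print("Signature valid: the named key holder signed this text.")
except InvalidSignature:
    print("Signature invalid: text altered or wrong author.")
```

Of course this only shifts the question from "was this text generated?" to "do I trust this key holder not to publish generated text?", which is precisely the question chain of trust and web of trust are designed to answer.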