BLOG: AI in public relations: how language models go wrong

By Jim Kelley

Artificial intelligence has been all over the news in recent months since the public launch of ChatGPT, which is built on OpenAI’s Generative Pre-trained Transformer (GPT) models, and of other large language models like it. While these tools have the potential to change the way public relations is practiced, there are many pitfalls to watch out for in the short term.

These models have generated some amazing output, producing answers accurate enough to pass the bar exam and the Medical College Admission Test (MCAT). When information is clear-cut and readily available on the internet, AI often does a great job of summarizing the available facts.

However, these models quickly run into problems when asked to exercise common sense. University of Washington computer scientist Yejin Choi gave a recent TED talk, “Why AI is incredibly smart and shockingly stupid,” in which she provided real-life examples of how GPT-4 gives surprisingly bad answers to questions humans would find very simple. One of those examples: “I have a 12-liter jug and a 6-liter jug. I want to measure 6 liters. How do I do it?”

Obviously, the easiest solution is to simply fill the 6-liter jug completely. No other steps are required. The AI model not only walks through an unnecessarily complicated process, but its final solution is also incorrect: after its series of steps, the 6-liter jug doesn’t contain 6 liters of water. It’s empty!

Another important challenge in using AI comes from “hallucinations,” where a model introduces new content that didn’t exist in the prompt. For example, say you have a press release that you want to convert into AP style. GPT-based models handle this type of rules-based task well. However, you may find that the new version of the release contains information that was not in the original: instead of simply adjusting formatting and grammar, the model “hallucinates” additional, unintended content. Even worse, those hallucinations may not be accurate. When a GPT model lacks enough training data on a topic, it sometimes simply makes up its own facts.

Responses might also be out of date. For example, ChatGPT is based on an underlying model (GPT-3.5) trained on data collected through 2021, so it is often unaware of recent developments. If you ask it about a more recent topic (for example, Russia’s war in Ukraine), the model won’t be able to provide you with useful results.

These models also tend to struggle with creative exercises, giving canned or non-specific responses. Here is a recent example from Microsoft’s Bing AI assistant (based on GPT-4):

[Image: Real response from Microsoft’s Bing AI]

The analogies it produces are… terrible. It simply works from an existing list of canned openers and pops in the topic like a Mad Lib. The model can pull from a wide pool of existing blog posts, but it has no concept of what makes a post “interesting.”

Beyond bad responses, it’s important to keep in mind that these models are often trained on your prompts and feedback. For this reason, you need to be very careful to avoid disclosing confidential information to the model. Samsung employees recently leaked proprietary information by accident when they submitted source code and internal meeting notes to ChatGPT. If you wouldn’t be comfortable publishing your prompt, don’t use it!

Given these potential pitfalls, how can your agency use these models in a constructive way? Our next post will focus on best practices to ensure you get the best possible results from working with generative AI models.
