Artificial intelligence is eating the world. The pace at which machines have recently achieved human or even superhuman performance on tasks that once required human intelligence has been breathtaking. AI has the potential to become a profoundly useful assistant and tool in almost every aspect of human life, but it’s an open question how the role of people will change within this emerging new world.
However, one area in which people will remain critical for the foreseeable future is determining what, exactly, people want. Traditionally, many AI models learn from training data in which the right answers have been provided by people. For example, if you want to teach an AI model to tell whether and where there is a cat in a picture, you can have people annotate tens of thousands of pictures with that information. People tend to be good at this, and there are agreed-upon right answers. Or, sometimes, a model can generate its own training data when there is a clearly defined objective, like winning a game of chess. But what about when there is no obvious right answer? If a large language model (LLM) is asked to write a poem, how do we tell if the output is any good? Beyond a certain foundational level of quality, there is no objective, agreed-upon right answer. It depends upon the user’s tastes and preferences.
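To make the contrast concrete, the labeled-data setup described above looks roughly like this in code. This is a minimal sketch, assuming PyTorch; the random tensors stand in for a real annotated photo dataset, and the tiny model is purely illustrative:

```python
import torch
import torch.nn as nn

# Each example pairs an image with a human-provided label:
# 1 if annotators marked "cat present", 0 otherwise.
images = torch.randn(64, 3, 224, 224)        # stand-in for annotated photos
labels = torch.randint(0, 2, (64,)).float()  # human yes/no annotations

model = nn.Sequential(                       # toy classifier, illustrative only
    nn.Conv2d(3, 16, kernel_size=3, stride=2),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(10):
    logits = model(images).squeeze(1)
    loss = loss_fn(logits, labels)  # penalize disagreement with the annotators
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The key point: the loss has a well-defined target because people agree on what counts as a cat. For a poem, no such target exists.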
When the answer depends on taste, feedback and data from real-world users become critical. Today, the industry cannot use computers to generate responses that accurately mimic real-world users, and response data from one demographic cannot easily stand in for another. AI models can only get better at satisfying the subjective desires of real-world users by learning from data generated by those same users. There is no substitute.
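How does that user-generated data actually reach a model? One common mechanism, sketched below under our own assumptions rather than anything prescribed here, is a reward model trained on pairwise choices made by real users (the approach popularized by RLHF). The embeddings, dimensions, and names are hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM = 128  # hypothetical size of a response embedding

# Each training example: embeddings of two candidate responses, where real
# users indicated they preferred the first over the second.
preferred = torch.randn(32, EMBED_DIM)
rejected = torch.randn(32, EMBED_DIM)

reward_model = nn.Linear(EMBED_DIM, 1)  # scores a response with a scalar
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

for step in range(100):
    r_pref = reward_model(preferred).squeeze(1)
    r_rej = reward_model(rejected).squeeze(1)
    # Bradley-Terry style loss: push the preferred response's score above the
    # rejected one's. The model's notion of "good" comes entirely from which
    # outputs real users actually chose.
    loss = -F.logsigmoid(r_pref - r_rej).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Notice that nothing in this setup defines quality in the abstract; the only signal is which response real users preferred.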
Critical areas in which this real-world feedback is essential include:
- Understanding prompts - “Prompt engineering” is one of the buzzwords associated with generative AI; it refers to crafting the prompt given to a model so that it produces the desired result. Ideally, the onus of understanding shouldn’t rest with the user, but with the model. The model should understand what the user wants based on the user’s own words.
- Aesthetic judgments - What is the style of a poet, an artist, or a speaker? Much of the output from generative AI attempts to mimic work produced by specific individuals or groups of people, and how close it gets is a matter of subjective judgment. Whether an AI-generated image looks like it’s in the style of, say, Michelangelo isn’t fundamentally about how closely it matches objective measures like opacity or proportion, but about whether the intended audience thinks it looks like Michelangelo.
- Humor - There are no objective ways to measure whether something is funny, and what’s hilarious to one crowd might be boring to the next. What’s true for stand-up comedy is true for AI models - ultimately, the only way to know whether humor works is to put it out there and see if the audience laughs. (A sketch of how signals like these might be recorded follows this list.)
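All three of these signals are subjective, but they can still be captured in a structured way for later training. Here is a minimal, hypothetical sketch of such a feedback record; the field names are our own illustration, not a real Pulse Labs schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FeedbackEvent:
    user_id: str             # which real-world user gave the signal
    prompt: str              # the user's own words (prompt understanding)
    output_a: str            # first candidate response
    output_b: Optional[str]  # second candidate, for A/B comparisons
    chose_a: Optional[bool]  # aesthetic/preference judgment, if A/B
    laughed: Optional[bool]  # did the humor land?

# Example record from a hypothetical study session:
event = FeedbackEvent(
    user_id="u-123",
    prompt="Write me a limerick about my cat",
    output_a="There once was a cat named Lee...",
    output_b="A feline of fortune and fame...",
    chose_a=True,
    laughed=True,
)
```

Records like these are exactly the kind of pairwise preference data that the reward-model sketch earlier consumes.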
There’s something else that’s as true for AI models as it is for people and products - you never get a second chance to make a first impression. You don’t want to launch your models without testing them on a sample of your target audience.
If you’re building an AI model, you need real-world feedback. The Pulse Labs InsightStudio is the top-of-the-line solution major tech companies use to acquire it. If you’d like to know more, please reach out to us. We’d love to find out what you’re working on and see how we can help.