OpenAI has a freemium model in which GPT-3.5 is free and GPT-4 (“Plus”) is not. Understandably, users ask whether GPT-4 is worth the cost. Is it better? How much better?
One simple answer is: GPT-4 is 8% better, in the sense that its responses were preferred 8 percentage points more often (54% vs. 46%).
Why measure it this way? Perceived response quality. Putting other features and benefits aside, responses are the main product of AI platforms, and how users perceive differences among them is a key factor in evaluating their value and price.
| | GPT-3.5 | GPT-4 |
|---|---|---|
| Preferred response in blind side-by-side test (N = 437) | 46% [202] | 54%* [235] |
Pulse IQ™ blind side-by-side (SBS) comparisons of the same prompts across several categories reveal:
- GPT-4 responses are preferred more often than GPT-3.5 responses, but by only 8 percentage points (54% vs. 46%)
- GPT-4 does well on ambiguous queries, whereas GPT-3.5 does better on logic-related queries
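The arithmetic behind the headline figure can be checked from the reported counts (202 and 235 preferences out of 437). The following is a minimal sketch rather than the Pulse IQ™ methodology: the function names are hypothetical, and the choice of an exact two-sided sign test against a 50/50 split is an assumption, since the source does not say which test, if any, the asterisk refers to.

```python
from math import comb

# Counts reported in the comparison table; everything else is illustrative.
PREFERRED_GPT35 = 202
PREFERRED_GPT4 = 235


def preference_summary(wins_a: int, wins_b: int) -> dict:
    """Return shares, percentage-point gap, and relative lift of B over A."""
    total = wins_a + wins_b
    share_a = wins_a / total
    share_b = wins_b / total
    return {
        "total": total,
        "share_a": share_a,
        "share_b": share_b,
        # Absolute gap between the two shares, in percentage points.
        "gap_points": (share_b - share_a) * 100,
        # How much more often B was picked, relative to A.
        "relative_lift": wins_b / wins_a - 1,
    }


def two_sided_sign_test(wins_a: int, wins_b: int) -> float:
    """Exact two-sided binomial p-value under the null of a 50/50 split.

    Assumption: each comparison is an independent coin flip under the null.
    """
    total = wins_a + wins_b
    larger = max(wins_a, wins_b)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(total, k) for k in range(larger, total + 1)) / 2 ** total
    # Double it for the two-sided test, capped at 1.
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    summary = preference_summary(PREFERRED_GPT35, PREFERRED_GPT4)
    print(f"GPT-3.5 share: {summary['share_a']:.1%}")
    print(f"GPT-4 share:   {summary['share_b']:.1%}")
    print(f"Gap:           {summary['gap_points']:.1f} percentage points")
    print(f"Relative lift: {summary['relative_lift']:.1%}")
    print(f"Sign-test p:   {two_sided_sign_test(PREFERRED_GPT35, PREFERRED_GPT4):.3f}")
```

Computed from the raw counts, the gap is about 7.6 percentage points, which becomes 8 when the rounded shares (54% and 46%) are subtracted. In relative terms, GPT-4 was chosen roughly 16% more often than GPT-3.5. Keeping the two measures separate avoids reading "8%" as a relative improvement.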
Implications
Consumers weigh many factors when deciding whether to pay for an AI platform, chief among them whether the quality of the responses justifies the cost. For a business, response quality is a critical component to evaluate, but it sits alongside other factors: threshold and throttle amounts, data training requirements, API capabilities, data ownership, privacy policies, and of course cost.
Key for businesses is the quality of the responses their consumers see, particularly in the first few moments. As we have seen in search, products or businesses NOT on the first page of results capture a much smaller audience. Businesses need to ensure the AI they choose strikes the right balance between cost and the information presented, its tone, and its length. That is not easy but, as we see here, it is testable with the right methodology.