GPT leads both in users' platform choice and in blind side-by-side comparison
As part of our Pulse IQ Research, we have users compare prompts and responses from different AI platforms in a blind side-by-side comparison. This time, we asked users to review work-related prompts. In the latest round of comparison, 59% of users preferred responses from GPT-4, edging past GPT-3.5 at 57% and Bing at 51%.
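The three figures above add up to more than 100%, so they cannot be shares of a single vote. One plausible reading, and it is only an assumption since the scoring method is not described here, is that each figure is the share of blind pairings a platform won out of the pairings it appeared in. The minimal sketch below illustrates that reading; the `comparisons` rows and the `preference_rates` helper are hypothetical and are not Pulse Labs data or tooling.

```python
from collections import Counter

# Hypothetical blind pairings: (platform_a, platform_b, preferred).
# These rows are invented for illustration; they are not Pulse Labs data.
comparisons = [
    ("GPT-4", "Bing", "GPT-4"),
    ("GPT-4", "GPT-3.5", "GPT-3.5"),
    ("GPT-3.5", "Bing", "Bing"),
    ("GPT-4", "Bard", "GPT-4"),
    ("Bard", "Bing", "Bing"),
    ("GPT-3.5", "Bard", "GPT-3.5"),
]


def preference_rates(rows):
    """Return, per platform, wins divided by the pairings it appeared in."""
    appearances = Counter()
    wins = Counter()
    for a, b, preferred in rows:
        appearances[a] += 1
        appearances[b] += 1
        wins[preferred] += 1
    # Counter returns 0 for platforms that never won a pairing.
    return {p: wins[p] / appearances[p] for p in appearances}


rates = preference_rates(comparisons)
for platform, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{platform}: {rate:.0%}")
```

Under this assumption, every pairing counts toward two platforms' denominators but only one platform's wins, which is why the per-platform rates need not sum to 100%.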
The chart below shows historical side-by-side rankings on general topics. Compared to its past performance on general topics, Bing performed much better on workplace-related prompts.
Historically, in the self-directed portion of our research, we've assigned users an AI platform. This time, however, we asked participants to use whichever platform they felt would best fit their workplace needs. We found that users gravitated towards GPT-4.
Side-by-Side Deep Dive
People find GPT-4 to be the most work-appropriate
In the blind, side-by-side comparison, users preferred GPT-4 for its well-balanced responses, which they felt combined a professional tone with simplicity. They found the formatting and structure of the output appealing and easy to digest. Additionally, users valued that the responses were nearly work-ready, requiring only minimal edits for their specific scenarios.
“This response is perfect for use at work and I would be able to easily incorporate it into email communication, SOP or presentation.”
—William, 49, GPT-4
“Because it could include objectives, steps, activities, homework, materials needed, etc, for a lesson plan.”
—Maria, 40, GPT-4
Each platform has different areas of strength
In an earlier Pulse IQ article, we saw that different platforms have different "DNA," and we began to see how preference varied across platforms depending on the type of prompt. For example, when it comes to generating ideas, GPT-4 takes the lead with a 74% advantage. However, when it comes to content generation, GPT-3.5 has a slight edge at 69%. For handling formulas and functions, Bard and Bing are evenly matched at 64%. In terms of design, GPT-3.5 stands out with a score of 68%, while in the domain of coding, Bing leads the pack with a substantial 87% advantage over the others.
GPT-3.5's strength lies in empowering users with comprehensive information and resources, fostering both confidence and room for creative thought and development.
“It feels complete and carries the appropriate response I want. it doesn't give me its opinion, it allows me to think.”
—Dave, 33
Bing's responses struck a balance: users found their length about right, and links supplied relevant additional information. A few users noted a slightly less formal tone, but this wasn't a concern in certain workplace settings.
“I would say so since it still answered the question even if it may be slightly informal, but the workplace isn't so strict on word usage.”
—Dustin, 27
Users appreciated Bard's responses for their thoroughness and found them appealing due to their more human-like and conversational tone.
“It is just right and appropriate, it sounds like I am interacting with a human.”
—Clark, 33
Self-Directed Highlights
In the self-directed phase of our AI at Work research, users had the freedom to select their preferred platform. GPT-3.5 emerged as the top choice, closely followed by GPT-4 and Bard. Notably, GPT-4 and Claude achieved a 100% satisfaction rate among their users, while Bard users showed high satisfaction at 92%. This suggests that these platforms are effective in meeting user expectations.
Implications
- The publicly available generative AI platforms are not one-size-fits-all. Employers can help employees choose the right platform for specific tasks to improve productivity, or look into purchasing an industry-specific generative AI tool. For example, law firms may look to something like Leah, while an ecommerce or finance company that relies on analyzing vast amounts of data may benefit from a tool like Tellius.
- Larger companies may want to explore building their own proprietary generative AI to both reduce their security risks and make sure their teams are getting results that are based on their business's data and relevant to their specific use cases. Tools like Snowflake and IBM's Watson allow developers to create custom AI tools without building complete LLMs from scratch.
- Study participants' comments about, and appreciation of, AI responses with less formal language highlight evolving workplace language standards, which employers can consider as part of their company's communication culture.
Pulse Labs can help you create custom research to determine which generative AI tools are right for you. Contact us to learn more.