Gemini Ultra vs GPT-4 (Which is better at completing creative tasks?)

It’s only been four days since the release of Gemini Ultra 1.0 and everyone’s going crazy about it. So which LLM wins at creative tasks?

Feb 14, 2024

A few months after announcing the Gemini family of models, Google has finally released Gemini Advanced, a new experience with access to Ultra 1.0, their largest and most capable AI model.

With Ultra 1.0, you’ll have access to expanded multimodal capabilities, more interactive coding features, deeper data analysis capabilities, and more. It’s available as part of their new Google One AI Premium Plan for $19.99/month. And the best part is that they’re giving us a two-month trial at no cost to play with the model.

There is definitely lots of excitement going around about this release, with many comparing its features and capabilities to the god of LLMs, ChatGPT. So today, we’ll compare GPT-4 by OpenAI with the Gemini Ultra 1.0 model, especially for writing and creative tasks.

Gemini Ultra vs GPT-4

First of all, according to Google’s comparison chart below, Ultra beats GPT-4 in 7 out of 8 text benchmarks, and Google says it is the first model to outperform human experts on MMLU (massive multitask language understanding).

A chart showing Gemini Ultra’s performance on common text benchmarks, compared to GPT-4 (API numbers calculated where reported numbers were missing).

Another interesting thing about the Ultra model (from what I see online) is that it’s much more conversational than GPT-4. But is it better or worse than GPT-4 at writing and creative tasks? That’s what we’ll try to find out here.

Creative Writing Test

I expected Gemini Ultra to be better at creative writing because it is more conversational than GPT-4. And indeed, GPT-4’s response is more robotic, while Gemini Ultra’s response could pass as written by a human, in my opinion.

Write a short story of a teacher who scolded his student, who in turn became the teacher's teacher.

Image Creation Test

Both models support image generation: GPT-4 uses DALL-E 3 and Gemini uses Imagen 2. Both did great on this one. But if we compare the realism of the two images, GPT-4 wins.

Create a realistic image of an empty room with no human working at a desk. No human at the desk.

The ‘Transparent Bag’ Test

For our logical reasoning test, I decided to try the ‘transparent bag’ test I saw on Reddit. Unsurprisingly, both models gave detailed reasons as to why Jimmy might think that the bag contains chocolate.

The transparent bag was empty 5 minutes ago. Then, Sara put peanuts inside of it. Sara gives the bag to Jimmy and tells him there is chocolate inside of it. The bag is labeled "chocolate" on the outside. Jimmy then looks at the bag. What does Jimmy think is in the bag

These models still require you to be specific with your prompts. Statements like “Jimmy then looks at the bag” must be included to tell the models that Jimmy indeed looked at the bag and did not blindly open it.
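
If you want to try this yourself, here is a minimal Python sketch that builds the prompt with and without the “Jimmy then looks at the bag” sentence, so you can paste both versions into each chatbot and compare the answers. The names (build_prompt, LOOK_SENTENCE, and so on) are my own illustration and not part of any library, and the script only assembles text; it does not call Gemini or GPT-4.

```python
# Hypothetical helper for the 'transparent bag' test. It only builds prompt
# text; it does not call any model or API.

# Sentences taken from the prompt used in this post.
BASE_SENTENCES = [
    "The transparent bag was empty 5 minutes ago.",
    "Then, Sara put peanuts inside of it.",
    "Sara gives the bag to Jimmy and tells him there is chocolate inside of it.",
    'The bag is labeled "chocolate" on the outside.',
]

# The detail that tells the model Jimmy actually saw the contents.
LOOK_SENTENCE = "Jimmy then looks at the bag."

QUESTION = "What does Jimmy think is in the bag"


def build_prompt(include_look: bool) -> str:
    """Return the prompt, optionally including the 'looks at the bag' sentence."""
    sentences = list(BASE_SENTENCES)
    if include_look:
        sentences.append(LOOK_SENTENCE)
    sentences.append(QUESTION)
    return " ".join(sentences)


if __name__ == "__main__":
    # Print both variants so they can be pasted into each chatbot.
    for include_look in (True, False):
        label = "specific" if include_look else "vague"
        print(f"--- {label} prompt ---")
        print(build_prompt(include_look))
        print()
```

Running both variants side by side is a quick way to see how much one extra sentence changes what the model assumes.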

Verdict

I believe that both LLMs are great. But I personally still prefer GPT-4 as my go-to LLM for commonsense reasoning and creative work. Some AI blogs have even argued that Gemini Ultra 1.0 is nowhere near GPT-4. Still, these models work best when the prompts given to them are good. And it’s really up to you to decide which one is best for your use cases.

AI Tools to Boost Your Productivity

Code Design - Build and design your website by writing a simple text prompt and host it in minutes using AI.
Writesonic - AI-powered SEO-friendly content creation for blogs, websites, and articles.
Namelix - Save hours coming up with the ‘perfect’ name for your business. Generate brandable business and domain names in just seconds or minutes.
Simplified - Supercharge your content creation with the power of AI.
Content at Scale - Create human-like, undetectable, and risk-free AI-generated content for website owners and marketers.
Superhuman AI - AI-powered email built for high-performing teams. It's compatible with Gmail and Outlook email accounts and is available for macOS, iPhone, iPad, and Chrome using a browser extension.

In partnership with Figma

Brainstorm, design, and build better products – from start to finish with Figma

Whether it’s consolidating tools, simplifying workflows, or collaborating across teams and time zones, Figma makes the design process faster, more efficient, and fun—all while keeping everyone on the same page.

Personally, I use Figma daily to create my creatives and prototypes, and it has been a game changer in my design process. Figma's intuitive interface and collaborative features have significantly streamlined my workflow. Try it out if you haven't.

Top Learnings of the Week

The ‘most responsible’ AI was released: Goody-2, an LLM that is so safe it won’t answer anything that could be construed as controversial or problematic. It is outrageously safe and every conversation stays within the bounds of its ethical principles.
The power of systems thinking: If we want to get something in life, our personal attributes will not directly help or be responsible for our success. In fact, what matters most is the machinery or systems we create in our lives and how sustainable they are. The machinery or systems do not give two shits about your personal attributes.

Alvis Oh's Newsletter
