Ep 223: Anthropic Claude 3 – Better Than ChatGPT and Google Gemini?

Episode Categories:

Evaluating Anthropic's Claude 3 Against Industry Giants ChatGPT and Google Gemini

Artificial Intelligence (AI) is wheeling into a new dawn with the latest AI models boasting impressive capabilities. In this wide spectrum, Anthropic's Claude 3 emerges as an intriguing option. But how does it hold up against its formidable competitors - ChatGPT and Google Gemini? This examination dives into these AI models' capabilities, bringing their strengths and weaknesses into the limelight.

Anthropic's Claude 3 – An Overview

As a product of a heavily financed startup, Claude 3 hails from the house of Anthropic - a company launched by former OpenAI employees. This third release in a year comes in three flavors - Opus, SONNET, and Haiku, each with varying costs and capabilities. Opus excels at task automation, interactive coding, and strategy, while SONNET proves to be a star in data processing. Lastly, Haiku shines in customer interactions and translations, demonstrating AI's diverse range.

The Race between Claude 3, ChatGPT, and Google Gemini

In a performance-based comparison, Claude 3 offers improved speed and efficiency with a twofold increase in accuracy, an impressive feat. However, its detailed examination reveals some hiccups. In a non-scientific interactive test, Claude acknowledged basic company details but fell short in providing a 10-step marketing plan—an area where ChatGPT outshined with its accurate and comprehensive responses.

Even in poetic challenges and logic puzzles, both models delivered impressive results, hinting at their evolving problem-solving capabilities. However, ChatGPT emerged with a slight edge over Claude in generating well-structured, relevant responses.

The Impact of Computer Vision in AI

One area both models excel in is implementing computer vision. Both have the potential to interpret images and extract text from complex files like PDFs. Despite room for improvement, these models demonstrate AI's prowess to revolutionize data processing and understanding.

The Bottom Line

The constant rise of AI technology brings innovations like Claude 3, ChatGPT, and Google Gemini, each unfolding new horizons. Yet, it’s important to remember that each model benefits different demographics due to their varying capabilities. With advancements, AI promises to strengthen its applicability across sectors, offering something for everyone.

Topics Covered in This Episode

1. About Anthropic Claude 3
2. AI Model Challenges and Tests
3. Discussion on Claude Opus API
4. Comparison of ChatGPT, Claude 3, and Google Gemini


Podcast Transcript

Jordan Wilson [00:00:16]:
Anthropic just released its new model, Claude 3, and the Internet is going crazy. Alright. Well, actually, the Internet's not going crazy. People aren't talking about Claude 3 a lot. But should they be? I mean, is Claude 3, the new model from Anthropic, now better than Chat, GPT, and Google Gemini? Or is it just another large language model that may or may not get used? Well, we're gonna be answering those questions today and more on everyday AI. Thanks for tuning in. My name is Jordan Wilson, and I'm your host. And everyday AI, it's for you.

Jordan Wilson [00:00:55]:
It's to help you keep up with what's going on in the world of generative AI to grow grow your company and to grow your career because there's so much going on literally every single day. So you can waste so many hours, or you can just tune in to everyday AI every day and learn from us. So, today's show, technically prerecorded, debuting it live. You know, it actually works out. You you know, I'm I'm doing a little bit of traveling and it's actually great for some other, guests that I've been able to interview, recently. So, in the coming days, we're gonna have a couple prerecorded shows. I'm gonna be joining live most days. Don't worry.

Jordan Wilson [00:01:28]:
So still get your questions in. And I know a lot of the guests are gonna be answering as well. Alright. With that, if you don't know, your everydayai.com is the place to be. It is the spot to learn whatever you need to learn about generative AI. I tell people it is a free generative AI university. Type in whatever you need. Go search by category.

Jordan Wilson [00:01:49]:
We probably have a handful of dozens of of experts that we've already talked to who are who have already figured out that thing that you're trying to figure out. So make sure to go to your everydayai.com. And if you do need that daily dose of AI news, it's gonna be in the newsletter as well. So make sure you go to the website, check all that out. Alright. So let's get to the top of the question, the top of the show here. Is Claude 3 better than ChatGPT and Google Gemini? Right? And we're gonna get into a little bit of background and some quick history and facts. If you don't, if you aren't very aware of, you know, Claude and anthropic, don't worry.

Jordan Wilson [00:02:32]:
We're gonna be giving you the bio. But here's here's here's what you need to know. Not a lot of people are talking about it right now. Right? I I assumed when, Claude 3 was released. Right? We've we've known for a while it was coming. Right? Everyone's always there's always rumors about the next model, you know, g p t 5 and, you know, Gemini, you know, before it came out. There's always rumors swirling around and, you know, these are the 3 big players. You have you have Google with Gemini.

Jordan Wilson [00:02:59]:
You have OpenAI with ChatGPT, and you have Anthropic with Cloud. And I think that, you know, Mistral is is, you know, kind of in the in the, top 4 as well. Right? But top 3, I mean, there's no denying. It is it is those 3. So I was actually kinda surprised. You know, it's only been out for about a day and change, and there's not a ton of people talking about it. So I said, alright. I'm gonna dive in deep.

Jordan Wilson [00:03:22]:
So I've I've spent a couple hours with Claude so far, taking it through some basic tests, and we're gonna be doing some tests here live on the show as well. So if you are listening to the podcast, as always, I recommend you check out your show notes. There's always some, some goodies in there, so so make sure you check that out. Alright. So with that, let's just start at the top. Alright. So here's here's what's happening. According to Anthropic's benchmarks, it is the best model out there, bar none.

Jordan Wilson [00:03:50]:
Right? Not even close according to Anthropic's benchmarks. Alright? So we always gotta keep this in mind. I said the same thing, when, you know, Google Gemini in December came out with their benchmarks. I always say these are not third party. Right? These are internal benchmarks. But at least one thing that we have here is consistency, which is nice. We didn't get that very much or we didn't get that as much from Google. Alright.

Jordan Wilson [00:04:14]:
So if you are on the podcast, don't worry. I'll do my best to explain this to you. So, we we we talk about this one benchmark on the show all the time. So it's MMLU, multitask language understanding. Alright. So, essentially, it's 57 different subjects, you know, across think of it like, an SAT and an ACT but for large language models. Right? And and what this and and why this has kind of become the gold standard, for large language models in their benchmarks is because this is how you can kind of most accurately tie a large language model's performance to a human. Right? That's why it's across all these different subjects.

Jordan Wilson [00:04:49]:
It's it's a very technically difficult test. And according to Anthropic's benchmarks that they released on their website at the same time they released cloud, Claude 3, is it crushes everyone. Well, it doesn't crush everyone, but it wins every single category. So specifically for MMLU, it's just barely barely above gpt 4. So gpt 4 scored an 86.4 and, Claude 3 scored an 86.8. So just marginally better. You know, Google was in the 83 for their Ultra and 71 for Gemini Pro. So it you know, that's the other thing that struck me is, you know, I was technically expecting a little bit more, which I know, like, like, y'all might be like, alright, Jordan.

Jordan Wilson [00:05:33]:
You're weird. Why would you expect more? They're winning literally in all of these benchmarks. Well, because the g p t four model is, like, almost 2 years old. Right? It's it's it's been, being developed for 2 years. It's been out for, a year and a half now, I think. So that's why. Right? It's it's also I mean, you have to, you know, tip your cap to OpenAI. The fact that they released GPT 4 so long ago and it is now just finally being passed, you know, this this much time later, it's it's pretty impressive.

Jordan Wilson [00:06:07]:
But that also to me tells you're right. And and people are like, oh, Jordan. Like, what large language model should I be using? Well, I always say chat gbt. I'll I'll answer at the end if if, quad 3 changes my mind. But, you know, presumably, whenever, gbt 5 or gbt 4.5 comes out, it is gonna be so far ahead of all of these other models, both in terms of real world performance benchmarks. You know, we call large language models, treating them as business operating systems. Right? It's gonna be so far ahead. But right now, according to Infropic's own, internal benchmarks, Claude is ahead.

Jordan Wilson [00:06:42]:
And it is important here. We're gonna talk about this. There's 3 different Claude models. So the most powerful model is OPUS. So similarly how, you know, Google has their base model, which is Gemini Pro, then they have their Gemini Ultra, and they've actually upgraded it to 1.5, but not everyone has access to that. That's another story. Same thing with, chat gbt. They have their free model in 3.5 and then their paid or premium model in 4.

Jordan Wilson [00:07:07]:
Then with Claude, you have your basic or your baseline, haiku model, then you have your mid tier sonnet, and then you have your, know, your premium OPUS. Alright. So when we're talking about, anthropic out benchmarking everyone, it is with their, highest tier model which is OPUS. Alright. So now that we got that out of the way, we know it's it's these 3 companies, large language models, everyone is head down sprinting, trying to, you know, gain gain the biggest market share, trying to raise the most money, but you might not know Anthropic. Right? Maybe you do, maybe you don't. They're they're a start up, but they are well funded, and they are heavy hitters. Right? So a lot of people don't know or don't realize, yeah, OpenAI is a start up too, but they've been around since 2015.

Jordan Wilson [00:07:53]:
Right? They, were some of the pioneers in the in the GPT space. Right? So here is Anthropic, high level, what you need to know. This is like if Anthropic had a basketball card. These are the stats on the back. Right? Alright. So it was founded in 2021, and the founding team is former OpenAI employees. Alright. Yeah.

Jordan Wilson [00:08:14]:
We're gonna see a lot of that in startups of the future, FYI. It's based in San Francisco, California. Currently raised, alright, $7,300,000,000. 1,000,000,000. I I think that has to put them in, top top 3 to 5 of, GenAI startups with the most money raised. Investors. Their investors is a who's who list of who's who in tech. I don't know any other company, and and somewhat, if you're in the comments, let me know.

Jordan Wilson [00:08:42]:
I don't know any other company that has this level of heavy hitters. Alright? So Google, Salesforce, Zoom, Amazon, and Microsoft. Yeah. Those are big partners. Yeah. They they have dozens of others. You know, a lot of the big capital firms, you know, private equity, venture capitals, etcetera. But look at that.

Jordan Wilson [00:09:04]:
That is, who's who, literally, of of investors. Alright? Current valuation is more than $18,000,000,000. Alright? Their revenue, so it is projected for 2024, they're gonna be at about $850,000,000 annualized. Alright? So, that's kind of by the end. That's what they're hoping or projecting to hit. And then there are different releases. Alright. This isn't the first release, but it is the 3rd release in a year.

Jordan Wilson [00:09:30]:
So the first version of Claude came out in March 2023. You had cloud 22.1, you know, but, Claude 2 came out in July 2023 and then here we are, present day, Claude 3 in the 3 different varieties, March 2024. Alright. Let's keep going over some of the high level. Alright. So these are some changes. These are some things that have changed a little bit from, the last model, which I believe was 2.1, to now 3. Alright.

Jordan Wilson [00:09:59]:
I already talked about this, but you have 3 models, Opus, Sonnet, and Haiku. Alright. Overall, you have improved performance, speed, and efficiency. According to anthropic, there is a twofold improvement in accuracy, so fewer hallucinations. They also mentioned, a more willingness for for Claude to answer questions. You know, that's something they pointed out previous versions. Claude would all the time just say, I can't really do this. Sorry.

Jordan Wilson [00:10:27]:
So more, more willing to answer more questions, twofold improvement in accuracy, at launch, which is now a 200 k context window. Alright. So that's 200,000 tokens. Alright. So, essentially, without getting too far into it, we've had, you know, plenty of episodes and plenty of talks about tokens, and even the tokenization process, why it's important, etcetera. But, essentially, that's like a memory. Right? So it's got a huge, huge memory. Almost, let's see.

Jordan Wilson [00:10:58]:
I should have done this math ahead of time, but I think that's, so so gbt 4, if you're using chat gbt is 32 k. So at at 200 k, yeah, that's what? 6 6 times, 6 and change. So, more than 6 x, right now, of the context window than ChatGPT at launch, and they talked about the capability to accept prompt inputs up to a 1000000 tokens. Right? Memory is so important when it comes to working with a large language model. You see the same thing with, Gemini Ultra 1.5, I believe, is that a 1000000, tokens as well, but most people don't have access to this. It did seem at least from, Anthropic's kind of release that the capability to have that 1,000,000 tokens, it's more of a capability, and it's more on a on an as needed basis. You will be paying for it because if you're using the API, these have a cost and we're gonna get into that here in a second as well. Alright.

Jordan Wilson [00:11:54]:
So here is our cost. It's kind of like a cost in power grid. Alright. So, if if if you're joining us on the podcast, I think I can explain this graph. But essentially, you have a a cost on one access and intelligence on the other. Right? And obviously, the more the more these models cost, the more intelligent they are. So, Haiku is kind of the lowest intelligence, lowest cost, but it's also the fastest. Right? SONNET is somewhere right there in the middle, and I'm gonna tell you what all these different models are, you know, being positioned for, from a marketing perspective.

Jordan Wilson [00:12:27]:
So SONNET is right there in the middle, and then OPUS is, you know, off the charts intelligence and crazy expensive. Right? But I wanna talk about that later because there's something I think pretty special there. Alright. So Opus. Well, I'm gonna talk about it now because actually it's one of these bullet points. So, we're gonna talk about cost. So the cost, there's gonna be 2 costs for each of these models. So one is the input and this is the price per millions of tokens or or 1,000,000 tokens and outputs.

Jordan Wilson [00:12:59]:
Alright. So essentially, I'm gonna flip through these and then flip back so we can do all the prices at once. This is if you're using the API. Alright. So Opus is 15 mil sorry. $15 per 1,000,000 tokens. Input, 75,000,000, output. Alright? And then SONNET is 3 mil, $3 per million input, 15 per million output.

Jordan Wilson [00:13:23]:
And then you have Haiku, 25¢, million input. 1 dollar 25¢, 1,000,000 output. So quite a range there. Right? You go from a quarter to $15 for the for for the input price per million difference between Haiku, the cheapest or most affordable, and and, Opus. Right? So wild. Alright. So now let's talk about the capabilities. I I I've done a little bit of testing, between Opus and and, SONNET.

Jordan Wilson [00:13:53]:
So Opus. So this is kind of what it's, being billed or it's being advertised that it's it can be used for these things. So task automation. So to plan and execute complex actions across APIs and databases, interactive coding. Research and development. So research review, brainstorming, and hypothesis generation, drug discovery. Strategy, so advanced analysis of charts and graphs, financial and market trends. Okay.

Jordan Wilson [00:14:18]:
So that is kind of Opus and how it's being positioned. Alright. So now moving to SONNET, we already gave you the price, so let's use, kind of what they're saying that, you know, people should be using this for. So SONNET, data processing. So that's for rag or search retrieval over vast amounts of knowledge. Sales, so that's product recommendations, forecasting, targeted marketing. And then time saving tasks. Cogeneration, quality, control, parse, text from images, etcetera.

Jordan Wilson [00:14:45]:
Alright. And then we have, customer interactions for, our last one here, Haiku. Right? So this is more of your, I'll say, your your low hanging fruit. Not saying this stuff's not important, but you have your customer interactions, quick and accurate support in live interactions, translations as well, content moderation, catching risky behavior or customer requests, and cost saving tasks. So optimizing logistics, inventory management, extract knowledge from unstructured data. Alright. So essentially, you have different use cases. Alright.

Jordan Wilson [00:15:14]:
And I think I think what's I'm gonna be keeping my eye on is how people are using the API for Opus. Alright? Because I saw a couple demos. I'll make sure to link to it, to it in the description here. But the demos were pretty impressive of Opus. Right? It was, taking a chart. Right? So it was near, like, working as an agent. So it's it's it's taking a chart from Google, from a live Google search, and it's grabbing, I I believe, via, via Python. I'm not sure.

Jordan Wilson [00:15:51]:
You know you know, they weren't really letting you look under the hood necessarily. But it's grabbing in an interactive fashion. So you know how you can, like, hover over, you know, a Google chart? Maybe you should go for a stock price. So, in the demo of Opus, they are speaking with Opus, and this was presumably, via an API connection. And it is not just able to read, you know, the web, in this case, a a live, you know, chart on Google. But as you, you know, as you hover over, you know, this chart, there's there's different, price points, and it's able to grab all of those as well. So some very impressive, future use cases that right now even chat g p t and their g p t four model doesn't have. So that's that's impressive.

Jordan Wilson [00:16:36]:
But again, presumably that was looking at, an API use case and not something that you would use in in anthropic out of the box. Alright. So now that we have an overview of the models, let's put it to the test. Let's put it to the test. Alright. I'm gonna do my best to describe what we have going on here. But, I'm we're we're gonna look at outputs in Claude, and we're gonna look at in, outputs in chat gbt. So I'm gonna be using the highest level model for each or the highest level model that's, available via their default chat.

Jordan Wilson [00:17:14]:
Alright. So here's here's what this is gonna look like. Alright. I'm gonna do the same prompts, in each, in each, different different model. Okay? And I'm gonna read the response. Alright? And this is not a scientific test. This is not an MMLU benchmark. This is infotainment.

Jordan Wilson [00:17:33]:
Right? We're gonna get some information, and maybe it'll be a little entertaining. Maybe it won't. Maybe it'll just be super dorky. Alright? But, I I wanted to give everyone, an emphasis because or I wanted to give everyone a chance, and and and to see what we're emphasizing when we're testing certain things that we are looking for. Alright? I'm looking at it from an average business use case. Right? So day to day, something that's helping you in your work, in your school, your research, data analysis, etcetera. We're gonna look at very simple things, though, so this doesn't accidentally turn into a 1 hour show. Alright.

Jordan Wilson [00:18:04]:
Let's get it started. And, again, we are using the OPUS model, alright, in quad 3, which is the most intelligent model. And, also, I did do a just a model comparison. If this is up your alley, I'll make sure to link that video, in the description as well. I did a 30 minute, more in-depth rundown. We did some, you know, data some data analysis, etcetera. So let's go ahead. So I'm typing in right now, and I'm gonna zoom in so hopefully people can see.

Jordan Wilson [00:18:33]:
So I'm saying to Claude, what model are you specifically using and give me a brief overview of how it works. Include knowledge cutoff. Please keep it short yet specific. Alright. So this has actually improved a bit since yesterday, but still not that good. Alright. So it says, I am Claude, an AI assistant created by Anthropic using their latest natural language model. The specific details of the model are not public.

Jordan Wilson [00:18:59]:
Oh my gosh. Anthropic and Google. We gotta stop this. Right? The number one thing the number one thing holding so many people back from using large language models. Right? And I I love large language models. I I I use them daily. I get so much benefit out of them, and I try to almost sometimes convince people. I'm like, yo.

Jordan Wilson [00:19:18]:
Like, why aren't you doing this? This is, like, the best better than sliced bread with no calories dripped in butter. Right? It's better than that. But so many people don't trust large language models because they hallucinate, and they don't understand how it works. So if I ask a model, what are you? Right? I am selecting a model. It should know that I selected Opus. It says it right there at the bottom of the screen. Right? It's it's not like if you you can switch models in in, like, mid prompt. So it should know it is the Opus model.

Jordan Wilson [00:19:46]:
It is not telling me. I don't like that. Alright. Also, it does say August 2023. So a little, upside there. Alright. And we'll see that here in a second. Alright.

Jordan Wilson [00:19:55]:
So now we're going in. We're running the exact same prompt inside of ChatGPT using gpt 4, the default model. So at least here's a little better. Right? So at least it says gpt 4. It's not saying I'm ChatGPT, or it's not saying, you know, I'm, you know, I don't know what model I am. It's saying gpt4. Alright? Give me a little description ChatGPT does, and it says here, which is interesting. Right? April 2023.

Jordan Wilson [00:20:22]:
So, technically, Claude, which is, you know, you would assume so in a sense it's a much newer model. But Claude has a, more recent knowledge cutoff. So what that means, without going too deeply into it, when we talk about using large language models, hallucinations is always the thing you gotta, like, you have to avoid. And if you think you can just use copy and paste prompts, you're gonna get bad results, but you're also gonna get a lot of hallucinations. These large language models are trillions of parameters. Right? Sometimes it's better in a lot of cases if you had 50 small models. Right? But most people use one large language model. And the problem with that is people are impatient and they just copy and paste something and they wanna get something good out of it.

Jordan Wilson [00:21:02]:
But what that leads to, a lot of times because of the knowledge cutoff, is the the model being, a little not too certain. Right? Or it maybe sounded certain like a year ago or a a year more ago. Okay? And and that's why you have to understand the knowledge cutoff because, essentially these large language models gobble up the the history of the Internet, copyrighted works. Yeah. They do in everything. Right? And then then you can ask it. Right? But, it cuts off at that point. So anything that exists past April 2023 in ChatGPT, chat gbt doesn't know.

Jordan Wilson [00:21:36]:
By default, however, chat gbt does browse the Internet. It uses browse with Bing. Alright? Huge advantage. You don't have that right now in Claude, but Claude does have a, more recent, knowledge cutoff. Alright. Let's do the same prompt. We're just gonna keep this all in the same chat, which is technically not ideal. If I was actually doing a a a test, I would be doing it in new chats each time as to not impact the outcome.

Jordan Wilson [00:22:01]:
Alright. So now I'm saying, please tell me 5 jokes that end in the word blue and all mention either fruit or an animal. Alright. Little little tricky. Right? Let's see. So Claude, spits it back out. It says, here are 5 jokes that end in the word blue and mention either fruit or an animal. Alright.

Jordan Wilson [00:22:21]:
So I'm just gonna read 1 or 2. Number 1, what did the banana say when it was feeling down? I'm just feeling a little blue. Alright. Not funny, but it, didn't it didn't hit all the, criteria either. Oh, yeah. It did. Okay. So it ends it ends in blue.

Jordan Wilson [00:22:37]:
Got the banana. Good. So okay. It either again, it has to either be a fruit or an animal and end in blue. Alright. The second one. Why was the blueberry sad? Because it was feeling blue. Alright.

Jordan Wilson [00:22:46]:
Not really a joke, but kind of, I guess. We'll do another one. What did the cat say when it fell into a vat of blue dye? I'm feeling blue. There we go. Feeling like feline. Alright. So, it actually improved this. When I did this yesterday, it failed miserably, so it's improved a little bit.

Jordan Wilson [00:23:04]:
Maybe there's some behind the scenes, little updates going out, going out there. Oh, I guess this would've helped you off. I was, sharing the the the correct screen here for the live stream audience. So sorry about that. So here's our here's our our 3 that we just read out for, and profit. Alright. So now let's jump over, and we're gonna do the exact same thing in ChatGPT. Let's see how ChatGPT does.

Jordan Wilson [00:23:27]:
Alright. Number 1, what did the grape why did the grape stop in the middle of the road? Because it saw the traffic light turn blue. Alright. Not that funny, but it hits it hits the things. Alright. What did the orange say when it wasn't ready to be picked yet? I'm feeling a little blue. Alright. Why did the banana go to the doctor? Because it wasn't peeling well and felt a little bit blue.

Jordan Wilson [00:23:49]:
Alright. So, here we go at least. We got, both of them kind of, quote unquote, passed that test. Right? It's not a real official test, but for the most part, I'd say they both got a passing grade there. Alright. Next. Alright. We are jumping into Claude.

Jordan Wilson [00:24:06]:
Again, the opus, the most powerful model. Here is our next test. Write a 4 line poem about the ocean. Make lines 23 rhyme, and do not mention the words blue, wet, waves, beach, or sand. Little little tricky, but shouldn't be too hard. Alright. Here is the, here is the poem from Claude. Vast expanses of endless brine, where creatures swim and secrets hide.

Jordan Wilson [00:24:35]:
In depths that reach the ocean's side, a world untamed, serene, divine. Alright. Pretty good, actually. I like that. Right? It's it's it's tricky. Right? It's it's tricky to write a 4 line poem about the ocean, rhyming the middle two lines and not when mentioning the words blue, wet, wave, speech, or sand. Kinda kinda difficult. Right? Alright.

Jordan Wilson [00:24:55]:
Let's jump into ChatGPT. We're gonna do the exact same thing, and we're gonna see if ChatGPT passes this test as well. Alright. So chatgbt kinda took a shortcut. I don't hate it. It played within the rules. Alright. So here's, here is chat GBTs.

Jordan Wilson [00:25:13]:
The oceans oh, interesting. Okay. I never said don't use the word ocean. Right? And the the the thing with Claude is you you kinda didn't know what it was talking about if if you just read it. Right? So I guess plus 1, you know, ChatGPT is being a little not sneaky, but really taking advantage, because it actually rhymed all lines, and it used the word ocean, and I never said you can't rhyme all lines. So Maybe it made a better poem because we at least know it's about the ocean and the whole thing rhymes versus just randomly rhyming the middle, two lines. Anyways, the oceans embraced vast and deep where secrets and ancient whispers keep. A realm where silent shadows sweep.

Jordan Wilson [00:25:53]:
Majestic, it lulls the world to sleep. I'll let you judge. Let me know in the comments which one. Hey. Any any poets out there? Which which, large language model is is, the next Hemingway? Let me know. Alright. But regardless, they both pass. Actually, I I I kinda like, JWT was a little not like tricky, but, you know, it it really went to the limits of where it could operate.

Jordan Wilson [00:26:18]:
Right? It said, alright. Well, a poem maybe it thought a poem would be awkward if just lines 23 rhyme. That's not a normal rhyming scheme, so let's rhyme them all. I never said don't rhyme them all. So alright. Alright. Our next prompt. We are in Claude.

Jordan Wilson [00:26:33]:
I'm saying and I'm gonna go ahead and tell you the the the answers too before I even show it on the screen. So the next one is I had I had 7 apples yesterday and gave 2 to my friends yesterday and also ate 1 yesterday. One of my apples went bad 2 days ago right after I sold 1 the day before that. How many good apples did I start with 3 days ago, and how many apples do I have left? Alright. So this is this is pretty it's it's it's a little it's a little tricky, but not tricky at all. Right? So if you look at the 7 apples, you say gave 2 to my friend, also ate 1 yesterday. So, technically, today, we're left with 4 apples. The rest of it is kind of a trick, but it says get to the answer of how many apples did we originally have.

Jordan Wilson [00:27:15]:
Alright. So I said one of my apples went bad 2 days ago, right after I sold 1 the day prior. Alright? So let's go ahead and see, go over the logic. So, this is good. Anthropic's breaking it down. And, again, y'all, this is this is so we can see. Are these models good at reasoning? Are they good at logic? Can they, you know, think critically? Right? A lot of people think large language models are just auto completes. Are they? Yes.

Jordan Wilson [00:27:48]:
But now they're at the point where they are auto completing the smartest humans ever. Alright? There's a big difference. You know? It's not just, you know, Google autocomplete. It's using logic. Right? If if I if I just spit that out, you might have to you might not know the answer, if I'm being honest. Right? You you might have to write it down and think about it. Alright? So, here's what, Claude says. Alright.

Jordan Wilson [00:28:12]:
It says, great. Let's solve this problem step by step. To find out how many good apples you started with 3 days ago and how many have left, we need to work backwards from the information given. Alright. So it's saying given. Here it is. It's saying step 1, apples 2 days ago plus apples given to a friend plus apples eaten, apples 2 days ago, 10 apples. Alright? So that's 7 plus 2 plus 1.

Jordan Wilson [00:28:33]:
Alright. Good. Then step 2, find the number of good apples you had 2 days ago. It's going through that. I had 10 I had 9 good apples 10 days ago. A total of 10 apples. Alright? Step 3, it's finding, you know, the step threes and 4, it's it's finding the total amount. So here it says, therefore, you started with 10 good apples 3 days ago, and you have 4 apples left now.

Jordan Wilson [00:28:57]:
So tech technically, it's it's it's correct. Right? I didn't specify, but I do like that Claude broke it down and actually did the difference between good apples and bad apples. Right? Because, technically, there could be 2 different answers to this because I didn't ask the question very well. I should have said either how many good apples or how many total apples. Right? So it technically gave me kind of 2 different, 2 different ways. Broke it down. Good job. Alright.

Jordan Wilson [00:29:22]:
So passed the test. Let's look into chatgbt. We're not gonna go down and read everything. I'm just gonna see. So the thing I like about chatgbt is it breaks it out into Python. Right? And you can see exactly how it's running it. I I'm literally learning Python just by looking at how, Chat gbt analyzes it. I love that you can expand, the output and kind of see what's going on on your.

Jordan Wilson [00:29:44]:
Alright. So it's going through. It's it's it's, you know, rationalizing. It's it's coming to conclusions. It's it's doing it live. So it says apples yesterday equals 7. Given away equals 2. Eden equals 1.

Jordan Wilson [00:29:56]:
When bad equals 1. Alright. So let's go. Yeah. I I I should have been a little more specific because technically, there's technically, there's 2 answers, to this, but, alright. So same thing. Alright. It it it did ChatGPT, did clarify and said good apples.

Jordan Wilson [00:30:11]:
Alright. So 10 good apples, 2 day 3 days ago and 4 apples left. Alright. So they both got that one. Not super tricky, but, again, it just shows you, a a little bit a little bit of logic. Alright. Our next one here. We're gonna go through this one quickly.

Jordan Wilson [00:30:29]:
I fill the cookie jar with 12 cookies on a Monday. I ate 3 cookies oh, I I eat all the cookies. I just had some girl scout cookies and mints. Those are my favorite. What's your favorite, by the way? I'm curious. Okay. So this one, I says, I filled a cookie jar with 12 cookies on Monday. I ate 3 cookies on Tuesday, and my sister took 4 on Wednesday.

Jordan Wilson [00:30:49]:
I baked and added 5 more cookies on Thursday. On Friday, 2 cookies were sale and had to be thrown away. How many cookies did I start with on Monday, and how many cookies are left now? So a similar problem before using some simple logic, math, etcetera. Not not a super hard problem. I'm assuming both are gonna get this right. I haven't tried. If it doesn't get this right, there's a problem. This is a very simple, simple equation.

Jordan Wilson [00:31:10]:
Alright. There we go. So Claude got it right, and it says, therefore, you started with 12 cookies on Monday and you have 8 cookies left now. That's the correct answer. 8 cookies. Yum, cookies. I'm hungry for cookies. Alright.

Jordan Wilson [00:31:21]:
We're doing the exact same thing now inside chatgpt. And as always, it's busting out the Python code, and we can look at it, do its work. Alright. So it looks like, ChatGPT is going a little faster. Yeah. Actually, a decent amount faster, it looks like. Alright. Same thing.

Jordan Wilson [00:31:37]:
You started with 12 cookies on Monday. You have 8 left. Alright. So, got got it for each one. Not too bad. Alright. So now this next one. This is the only one where we're gonna be bringing in, kind of some outside data.

Jordan Wilson [00:31:49]:
Alright? So so here's here's what we're gonna do. Let's see if we can get this going. Alright. So I have a screenshot here. Alright? This is just a screenshot of the everyday AI homepage. Alright. So one of the things that anthropic quad said quad 3 was great at was just way better vision than everyone else. So vision, you know, it is this concept of think of the AI chat that you're chatting with has computer vision.

Jordan Wilson [00:32:13]:
Right? And it can you can upload a photo, and it can say, oh, that's a dog or that's a dog, on a tricycle in, you know, South Africa, you know, eating a pair. Right? Like, whatever is in the image, it can use kind of computer vision and tell you. Similarly, it can pull out text or it can pull out, you know, you can upload, one test I did, recently for this Claude versus chat gbt was uploading a PDF from a spreadsheet. Right? Pretty hard. Pretty difficult. And and you can go watch that one if you wanna see how they handled it. Alright. So, essentially, here we go.

Jordan Wilson [00:32:46]:
I have a screenshot on my website. I haven't tried this. Alright. And I have really just a simple prompt. Right? So let's see if this works in Claude. So here's what we're gonna do. I don't even know. I assume I can upload PNGs.

Jordan Wilson [00:33:00]:
I believe I have a PNG file here. Let's see. Yes. It's PNG. Alright. So I am uploading this into CloudOpus and here is the prompt. I am saying, please tell me what this is And give me a 10 step marketing plan for this business, as well as the current competitive landscape. Keep it detailed and ultra specific yet short with bullet points without wasting any words.

Jordan Wilson [00:33:28]:
Alright. So let's see how, let's see how Claude reasons here. Alright. So it says the image shows a landing page for a daily AI newsletter and podcast called Outsmart the Future. Alright. So, it miss it mistook, our kind of headline, Outsmart the Future, for the name of the company. The name of the company is Aprietary AI. So I don't know.

Jordan Wilson [00:34:00]:
I mean, we'll see if if if if I'm disappointed in that outcome. I could see outsmart the future on our website is much larger. But you would think a model would be able to tell the difference, but maybe not. We'll see how ChatChevita does. Alright. So it says invites visitors to join over 9,100 daily email newsletter subscribers. Alright? And 22,000 podcast listeners. Hey.

Jordan Wilson [00:34:20]:
If you're listening. Thank you. You're you're one of those people. Alright. So pretty good. You know, here's just reading text. Alright. I really wanted to see how it did with the second part, which is when I'm asking it to give me a 10 step marketing plan for this business as well as the current competitive landscape.

Jordan Wilson [00:34:37]:
Alright. So alright. Another good thing is I was I was curious if it was gonna be able to read the logos because, you know, we have logos of, you know, all the different, tech people out there that read our newsletter. So it got it right. It got Google, Amazon, Meta, IBM, Intel, Salesforce, NVIDIA, and Adobe. Alright. So here it says the the business plan I would recommend. Alright.

Jordan Wilson [00:34:55]:
So it says, you know, number 1, continue publishing high quality informative AI related content. 2, monetize the newsletter through sponsorships. 3, there's more. I'm just not gonna read it all. 3, grow the podcast listenership and monetize it through podcast sponsorships, etcetera. 4, create additional AI related content offerings like ebooks, courses, webinars. So, so far, this is fine. Nothing crazy.

Jordan Wilson [00:35:18]:
Nothing great. 5, explore partnership opportunities with some of the major tech companies. Mention to increase, visibility. 6, continue to optimize the newsletter landing page and sign up flow to maximize conversion rates, AB test different headlines, etcetera. Alright. So here's the thing. It only got to 6. Alright? So, not that good.

Jordan Wilson [00:35:39]:
Didn't get to 10. The business advice was meh, a standard. There's maybe one thing that I'd be like, okay. You know, this couldn't come from a high schooler. You're right. So maybe testing, you know, AB testing, different call to action buttons, social proof, etcetera. So not the best but that's fine. Also, it says the focus should be on steadily growing a loyal audience by providing uniquely valuable AI content, then monetizing that audience through multiple channels.

Jordan Wilson [00:36:07]:
The existing track traction and authority status provide a great foundation to scale this business. Alright. So not that good. Alright. All I'm gonna do, I'm gonna give each of these 2 shots. Alright. I'm gonna go ahead and click retry, as we go over here to ChatGPT and try the same thing. I'm not gonna read through the whole thing.

Jordan Wilson [00:36:25]:
Don't worry. I just wanna see how different it is. Because there, I wouldn't say Claude failed, but it didn't tell us the competitive landscape. It didn't give me 10 steps. It wasn't super specific either. Alright? And it got the name wrong. So let's see how, how chat gbt does with vision. Alright.

Jordan Wilson [00:36:44]:
So, again, the gbt 4 model, in theory or chat gbt has a huge advantage. It is connected to the Internet. It can use browse with Bing. Claude, at least right now, is not. Not that we know of. Alright. So already we're already ahead here with chat gbt. It says the image appears to be a screenshot of a website promoting everyday AI.

Jordan Wilson [00:37:04]:
Right? A platform that offers a daily podcast, daily newsletter podcast and tracks related to artificial intelligence. Alright. And then we said, okay. So it says here's a 10 step marketing plan for everyday AI along with a brief competitive landscape. Alright. So here's the marketing plan. I'm not gonna go too into that. Identify target audience.

Jordan Wilson [00:37:24]:
You know, clearly define the target market segments. Unique value proposition. That's good. I'm surprised Claude didn't mention that. Emphasize the unique aspects, like daily updates, etcetera. Content marketing. Produce high quality SEO optimized content. Social media strategy.

Jordan Wilson [00:37:38]:
Engage with engage with the audience on platforms like LinkedIn and Twitter. 5, partnership collaboration. Collaborate with tech companies. 6, email marketing. Develop a drip email campaign for new subscribers and a nurturing campaign for leads. Alright. So there's there's one one thing. It's kinda gotten 2 that are, you know, above average so far.

Jordan Wilson [00:37:57]:
7, online communities. Oh, we're gonna have something like that launching soon. Should I go ahead and promote it? Yeah. Go ahead. It this has been 3 months in the waiting y'all, and I'm sorry for making you wait. But type in inner circle. Alright? I swear we're launching this soon. It's gonna be amazing.

Jordan Wilson [00:38:13]:
Alright. So 7, online communities and engagement. 8, webinars and live events. Referral program. That's a good one. We're gonna be launching that as well. And then analytics and optimization. So by far, the specificity and the actual, examples in the 10 step business plan were much better in ChatGPT.

Jordan Wilson [00:38:34]:
Here we go. Competitive landscape. So it says, direct competitors. Oh, this is funny. Direct competitors. Other AI newsletter and podcasts such as the AI podcast by NVIDIA. Wait. They're not a competitor.

Jordan Wilson [00:38:46]:
They're a partner. Look at that. An un, an unprompted, little little not add, but hey. This this right here, you see this? This is from our friends at NVIDIA. So we're gonna be broadcasting live at NVIDIA March 18th to March 21st. Alright? It is in San Jose, but you can also sign up for free and attend the virtual conference. It's literally it's it's a who's who of generative AI. Obviously, you know, you you you have, everyone at NVIDIA is gonna be there speaking, but you have people from, I mean, you have people from Meta.

Jordan Wilson [00:39:23]:
You have people from OpenAI. You have people from, Salesforce, I believe. It's literally everyone. Microsoft. Everyone's there. Right? So go, check check the show notes here. I'll have the link in, today's newsletter as well. So if you sign up even just for the free so if you buy tickets, screenshot it.

Jordan Wilson [00:39:43]:
I got a form to fill out on the website. If if you just sign up for the free, the free one as well, go ahead. Sign up for the free one. It's gonna be amazing. It's gonna be virtual. I hope you still listen to me daily even though you're at the the virtual conference for NVIDIA GTC. But even if you sign up for the free one, screenshot the, your registration, and then I have a form on my website, and then you'll be entered to win this. This is the GeForce RTX 3080TI GPU chip.

Jordan Wilson [00:40:11]:
Right? If you wanna run NVIDIA's new local, it's not even technically a model. It runs other models, but their new local software, which early reviews are, it's pretty amazing, chat with RTX. You need a g force chip like this or above. So, hey, you could go spend, I I don't know what this goes for, like, 300 plus dollars, on this ship or literally go sign up. It's it's a win win win. Go sign up for the free conference. Attend the sessions. Screens just screenshot it to me.

Jordan Wilson [00:40:45]:
Fill out the form. That's it. Then you can win this thing. Alright. That was an unintended ad, I swear. Like, I just happen to have this chip here, and I'm like, oh, should I talk about it? But Chatgbt gave me a good idea. Alright. Anyways, back to the competitive landscape.

Jordan Wilson [00:40:58]:
So Chat gbt says direct competitors, other AI newsletters and podcasts such as the NVIDIA pod the AI podcast by NVIDIA This Week in Machine Learning, etcetera. It's a great, great podcast, by the way. TWIML. Indirect competitors. It's naming some indirect competitors. Yeah. Interesting. Okay.

Jordan Wilson [00:41:14]:
Hey. It's it's telling me something I I didn't really think that, you know, TechCrunch and Wired, I kind of thought of them as indirect competitors, but sure. And then it's going into differentiation. So it's saying focus on daily updates, which may set everyday AI apart from weekly or monthly competitors and emphasize actionable business insights derived from AI trends. Alright. And then it gave me market trends. It says there is growing interest in AI across industries, leading to an increase in demand for AI educational content. Alright.

Jordan Wilson [00:41:42]:
So I don't think it is even close how Claude did all this. Right? So Claude got the name wrong. ChatGPT got it right. Claude's advice was pretty generic. ChatGPT's advice was generic, but a little better. Claude only did 6 pieces of advice. Alright? ChatGPT did them all. ChatGPT did 10.

Jordan Wilson [00:42:08]:
And then last and then last but not least, the competitive landscape. Claude just skipped over that. Right? Chad Gbd didn't. Alright. I did say I'll give I Chad Gbd doesn't need another chance. We'll see if, if it did better on the second chance. Again, generative AI generates something different almost every time. Alright.

Jordan Wilson [00:42:27]:
So let's see if it did any better on the second try here. Alright. So nope. Still did 6 points, but this time, it did give us somewhat of a competitive landscape as well, and then some ideas on how to stand stand out. So the first time, it whiffed. Let's see. This time it said, the newsletter is called Outsmart the Future by Everyday AI. So the first time it just got it way wrong.

Jordan Wilson [00:42:56]:
The second time, still wrong, but a little less wrong. Alright. So that was a very, very unofficial, a very unofficial, kind of test, right, on on these models. So here's the thing. Does this matter? Does this matter? You know, I'm I'm showing this this, this MMLU chart, these benchmarking charts. Right? I don't know. People are asking, you you know, one of the things I get asked all the time is, okay. Well, Jordan, you talk about AI every day.

Jordan Wilson [00:43:27]:
What tools do you use the most? And I feel, like, so basic. I say chat gbt all day. All day every day. Chat gbt. People are saying, Claude? No. Don't use Claude. I mean, when Claude was 2 or 2.1, I didn't. You know, I'm gonna give 3 a try, continue to, explore.

Jordan Wilson [00:43:44]:
You know, seemingly, it's gonna get better by the day. It's like, I don't I don't use Claude. It's not it's not very good. 2 Claude 2 wasn't very good. Claude 3, is it good? Maybe. Right? In our very limited testing, you can't look at this as definitive. So if you're brand new to large language models and you're trying to make a decision on which one is best for you, your small business, your your department at a large company, this is not it. Right? This is a very small test.

Jordan Wilson [00:44:07]:
You have to actually do the work. But this is a very small test, and you'll see that for the most part, Claude wasn't wasn't very good compared to ChatGPT. There are some things that it it does better. Right? In in in my unofficial testing, I didn't wanna have to code live, but it does do better at at coding, you know, at least to, some some initial tests. But one thing I'm excited for, and we did mention that earlier, is the API. Right? That interactive piece and being able to, interact with things in a different way, on websites, like the example of of the interactive charts and being able to run research and development. And, essentially, I think the Claude Opus API could be our first official, segment into agents. Right? This could be the first one.

Jordan Wilson [00:44:57]:
You know? We we kinda have, like, mini agents that, you know, you have you have langchain, which is improving. But, Claude, maybe Opus, the the the API is gonna be our first hint at what it means to have AI agents. Right? Kind of autonomous, autonomously doing work for you. Right? Because right now, you can't really do all of that with ChatGPT, especially since they got rid of plugins. Right? Alright. So I hope this was helpful. If so, please consider sharing this with your friends. You know, if if you're here on LinkedIn or Twitter watching this, you know, click repost or retweet or reex or whatever it is.

Jordan Wilson [00:45:33]:
You know, sometimes we take 5, 6, 7, 8 hours to put one single show together. People always ask, Jordan, how can I support you? This is so good. Free knowledge. Hit that repost. Share this with your friends. If you enjoyed this, please leave us a rating on Spotify or Apple, and go to your everydayai.com. Sign up for that free daily newsletter. And you know what? Here's my takeaway.

Jordan Wilson [00:45:55]:
Quad 3.1, is it better than ChatGPT and Google Gemini? Well, not even comparing Google Gemini because Google Gemini is a hot mess right now. May maybe it'll be better. So is quad 3 better than chat gbt? Well, for me? No. On benchmarks? Yes. But I wanna hear from you. Alright. Thanks for tuning in, and we'll see you back tomorrow and every day for more everyday AI. Thanks, y'all.

Gain Extra Insights With Our Newsletter

Sign up for our newsletter to get more in-depth content on AI