Ep 153: Knowledge Cutoff – What it is and why it matters for large language models

November 28, 2023

Resources

Join the discussion: Ask Jordan questions about AI and LLMs

Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup

Connect with Jordan Wilson: LinkedIn Profile

This podcast answers the following questions: 

What is a Knowledge Cutoff in AI?

The knowledge cutoff is the date after which an AI model, like GPT-4, no longer has updated information. The model's training data includes information up to that date, and any events, developments, or new information occurring after it are not part of the model's knowledge base. For example, a model with a September 2021 cutoff has no knowledge of anything that happened after September 2021 unless it can pull in newer information from the web.
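The quickest way to check this yourself is simply to ask the model, in the chat window or over the API. Below is a minimal sketch using the OpenAI Python SDK (v1.x); the model name is only a placeholder, and the answer is self-reported, so treat it as a starting point rather than ground truth (later in the episode, some models refuse to answer or give inconsistent dates).

```python
# A minimal sketch (not an official feature): asking a model for its
# self-reported knowledge cutoff via the OpenAI Python SDK (openai>=1.0).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",  # placeholder; swap in whichever model you actually use
    messages=[{"role": "user", "content": "What is your knowledge cutoff date?"}],
)
# The reply is the model's own claim about its training data, not a guarantee.
print(response.choices[0].message.content)
```

The same one-line question works in any chat interface; in the transcript below, Jordan runs it against ChatGPT, Bing Chat, Claude, and Bard.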

Why is There Typically a Cutoff Date for the Information That a Generative AI Tool Knows?

Generative AI tools have a cutoff date for their information because gathering, cleaning, and formatting training data takes time. Training large models also requires significant computational resources, so setting a cutoff date helps manage those resources effectively. Additionally, a cutoff date ensures the model can be properly tested and stabilized without constant changes. It also aids in version control, making it clear what information the model includes. Regular updates can then be planned to incorporate new information, making the process more manageable.

What Are the Knowledge Cutoff Dates for the Various AI Platforms?

Here are the knowledge cutoff dates the major AI chat platforms reported when tested for this episode (November 2023):

  1. OpenAI's GPT-3.5 (free ChatGPT): January 2022
  2. OpenAI's GPT-4 (ChatGPT default mode): April 2023
  3. ChatGPT plugins mode: January 2022
  4. Google's Bard (running on PaLM 2): January 2022 (Note: Bard can also pull in real-time information from the web)
  5. Anthropic's Claude 2.1: not disclosed; the testing in this episode places it between November 2022 and February 2023
  6. Microsoft Bing Chat: built on GPT-4 and able to browse the web, but it declined to state a cutoff when asked

These dates represent the last point at which each model's training data was updated, meaning the models have no information on events or developments occurring after them. Some platforms, like Google Bard and Microsoft Bing Chat, can supplement their responses with real-time information from the web, but that does not move the underlying training cutoff. A small code sketch of this idea follows below.
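As a rough illustration, here is one way to encode those episode-reported dates and flag topics that fall outside a model's training window. The dates and labels below are just the ones quoted in this episode (November 2023) and will go stale as vendors ship updates; check each provider's documentation before relying on them.

```python
# Illustrative only: cutoff dates as reported in this episode (Nov 2023).
from datetime import date

KNOWLEDGE_CUTOFFS = {
    "gpt-3.5 (free ChatGPT)": date(2022, 1, 31),
    "gpt-4 (ChatGPT default mode)": date(2023, 4, 30),
    "chatgpt plugins mode": date(2022, 1, 31),
    "google bard (PaLM 2)": date(2022, 1, 31),
}

def is_after_cutoff(model: str, topic_date: date) -> bool:
    """Return True if a topic postdates the model's training cutoff,
    meaning the model cannot know about it without web access."""
    return topic_date > KNOWLEDGE_CUTOFFS[model]

# Example: anything from mid-2023 is invisible to the free ChatGPT tier.
print(is_after_cutoff("gpt-3.5 (free ChatGPT)", date(2023, 6, 1)))  # True
```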


Overview

The advancement of artificial intelligence (AI) has greatly influenced our daily lives. From virtual assistants to chatbots, AI-driven solutions are becoming increasingly prevalent. However, as AI continues to evolve, the ethical and practical implications of its applications require closer attention.

The Training Process: Humans at the Helm of AI Language Models

One of the key discussions in the episode revolves around the training process of large language models. These models are trained by collecting vast amounts of data through web scraping, which is then fed into the AI model. However, it is crucial to note that humans are involved at every step of this process. They play an integral role in curating the data, ensuring quality, and fine-tuning the AI models.
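To make that pipeline concrete, here is a deliberately oversimplified sketch of the flow described in the episode. Every function below is a stand-in rather than a real library call; the point is that the training corpus is frozen at the collection step, which is exactly what creates the knowledge cutoff discussed next.

```python
# A toy, four-step sketch of LLM training; all functions are stand-ins.
from datetime import date

def collect_data(snapshot_date: date) -> list[str]:
    # Step 1: humans direct crawlers to scrape and curate public web data,
    # frozen as of a snapshot date. That freeze becomes the knowledge cutoff.
    return [f"documents gathered on or before {snapshot_date.isoformat()}"]

def pretrain(corpus: list[str], cutoff: date) -> dict:
    # Steps 2-3: the corpus is fed into the model and patterns are learned.
    return {"weights": f"patterns learned from {len(corpus)} corpus shards",
            "knowledge_cutoff": cutoff}

def rlhf(model: dict, feedback: list[str]) -> dict:
    # Step 4: reinforcement learning from human feedback (RLHF) shapes
    # behavior, but it does not push the knowledge cutoff forward.
    model["aligned_with"] = feedback
    return model

cutoff = date(2023, 4, 30)  # hypothetical snapshot date
model = rlhf(pretrain(collect_data(cutoff), cutoff), ["human rater preferences"])
print(model["knowledge_cutoff"])  # 2023-04-30: the model knows nothing newer
```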

Understanding the Knowledge Cutoff: A Vital Reference Point

Jordan also emphasizes the significance of a knowledge cutoff date in large language models. Just like a textbook's publication date, the knowledge cutoff represents the point in time when the data feeding the AI model was last updated. It acts as a reference point for accuracy and helps avoid propagating outdated or false information. Transparency regarding the knowledge cutoff date is crucial when relying on AI-generated content to make informed decisions.

Challenges of Ambiguous Knowledge Cutoff Dates

During the episode, Jordan discusses and tests different AI language models, including ChatGPT, Microsoft Bing Chat, and Google Bard. He highlights the inconsistencies in responses and the frustration arising from the lack of transparency surrounding knowledge cutoff dates. Ambiguous or undisclosed knowledge cutoff dates pose a challenge, as it becomes difficult to ascertain the reliability and relevance of the information provided.

Advocating for Transparency in AI Language Models

Transparency emerges as a recurring theme throughout the episode, with Jordan advocating for clearer communication and disclosure of knowledge cutoff dates. He stresses that AI models without transparent knowledge cutoffs should not be entirely trusted. Openness about the sources and timelines of AI-generated content enables users to make more informed judgments and ensures ethical practices in the AI industry.

Conclusion

For business owners and decision-makers, understanding the workings of AI language models, including the concept of a knowledge cutoff, is crucial. By grasping the potential limitations and inherent biases of these systems, businesses can better leverage AI technologies and make informed decisions about their implementation.


Topics Covered in This Episode

1. Understanding the Knowledge Cutoff in Large Language Models
2. Understanding Learning Models and Knowledge Cutoffs
3. Knowledge Cutoff Dates in Different Generative AI Models


Podcast Transcript

Jordan Wilson [00:00:16]:

What is a knowledge cutoff when we're talking about large language models? And why does it matter? We're gonna be talking about that and a lot more today on Everyday AI. If you're new here, welcome. My name is Jordan Wilson, and let's talk real quick about knowledge cutoff and why it should matter to you. Well, if you use any large language model like ChatGPT or Google Bard or Microsoft Bing Chat or Copilot chat, whatever it's called now. If you use these large language models, you need to understand what a knowledge cutoff is and how it impacts the work that you're doing inside of said large language model. So more on that in a minute, but welcome if you're new. Everyday AI is for you. Everyday AI is a daily livestream, podcast, and free daily newsletter helping everyday people like you and me not just learn what's going on in the world of generative AI, but how we can all actually leverage it as well.

Jordan Wilson [00:01:16]:

Okay? So that's what we're gonna be doing today. We're gonna be learning not just about large language models, but how you can actually leverage knowing what the cutoff date is and finding it, and having that to equip you to create better output inside large language models. Alright. I'm excited for today's show. It's just me. Sorry. If you tuned in to hear some other guests share their insights, you just got me today. If you are joining on the podcast, as always, make sure to check out the show notes.

Daily AI news


Jordan Wilson [00:01:45]:

We always have additional resources, a link to go sign up for our free daily newsletter, all of that good stuff. But before we get into it, let's talk about what's going on in the world of AI news as we do every day. So, Sports Illustrated is in hot water after alleged AI use. Yes. Sports Illustrated, I think the 1st magazine I ever read, but it allegedly published several articles under fake author names and AI generated profile images, causing some controversy and leading to a formal investigation. This also kind of highlights the use of AI in journalism and its potential consequences. Sports Illustrated responded to these reports, saying that they weren't. And they said that all these published articles were written and edited by humans.

Jordan Wilson [00:02:34]:

But what does that mean? Does that mean that 95% of it was written by AI and then 5% was written and edited by humans? I'm not sure. So keep an eye on that. Next, how will big banks manage risk with voice calls, with AI? So the tech firm Symphony and Google just announced a pretty big partnership that they're teaming up to enhance voice analytics for financial firms in response to increased regulatory scrutiny on communications compliance. Alright. So, this partnership will use generative AI and natural language processing, NLP, to transcribe and summarize conversations for compliance purposes. So this enhanced product will allow users to mine data for additional insights and monitor customer experience.

Jordan Wilson [00:03:19]:

So, I'm I'm personally excited about this. So so this new product, I believe is gonna be called AIJV, whatever that abbreviation stands for, but maybe that means ultimately when we're dealing with our banks and financial institutions, we'll hopefully have fewer robots on the other end with this new Google and Symphony, Symphony partnership. Alright. Last but not least, Amazon kicked off its annual re:Invent conference, and generative AI is the major focus. So Amazon's re:Invent conference, well, actually through AWS, Amazon Web Services, comes about a week after Microsoft's Ignite conference where they announced a lot of software updates. And the big one, in my opinion, was Microsoft moving to their in house AI chip production. So, all of the computer chips that we need for this generative AI, Microsoft is gonna be producing those in house. So everyone's kinda keeping an eye on what Amazon is going to announce at this conference that just kicked off, you know, where everyone's trying to out announce everyone else.

Jordan Wilson [00:04:23]:

We do know some some reporting is showing that Amazon is wanting to announce a wider range of generative AI models through their Bedrock service, with examples of customers successfully using it and creating impactful applications. So it seems like that's gonna be the focus, but we'll see over the next couple days. Alright. There's always more that we have if you want to know more in AI news. We always just give you a little preview. So if you are listening, make sure to go to youreverydayai.com. Sign up for that free daily newsletter. We not only break down more news, but also our podcast every single day.

Jordan Wilson [00:04:58]:

We always always go into more depth, provide more resources, and, yes, it's written by me, a human. I'm not Sports Illustrated, allegedly. Right. So make sure you go to youreverydayai.com. Sign up for that free daily newsletter. Always always breaking down the news, insights, trends, tools, and this very podcast. So, hey. Good morning to all of our live audience.

Jordan Wilson [00:05:20]:

Sometimes I only get to shout all of you out when it's just me here by myself. If you're a podcast listener, We always leave the link, to to join the LinkedIn live or YouTube or wherever you watch. But, hey, good morning to Michael Forgy joining us. Jay, thanks for coming. Doctor Harvey Castro is always woozy joining us from Cincinnati. Brian, back to the Mississippi Gulf Coast. We've missed it. We've missed it.

Jordan Wilson [00:05:42]:

Hey, Natalie, thanks for joining us from ATX. Alright. And someone from YouTube. Great. Kaylee, thank you. Thank you. Alright. So let's talk a little bit about large language models.

Importance of knowledge cutoff in LLMs


Jordan Wilson [00:05:58]:

Alright? Specifically, on the knowledge cutoff. What is a knowledge cutoff? Why is it important, and why do we all need to know? I'll tell you this. I'll tell you this. Even if you are an avid large language model user such as myself. You know? Whether you're using ChatGPT to write your essay, you know, if you're a college student or if you are someone that works in generative AI, and maybe you're helping to build these models or you're using so many different ones. You know? Maybe you're using Anthropic's Claude 2.1 that was just released and updated last week, or maybe you're using Google Bard or Bing Chat, you know, 2 very popular large language models that are, quote, unquote, Internet connected. No matter what your usage is, I think this is going to be an important episode to listen to because we're gonna walk through step by step and talk about what a knowledge cutoff actually is, what it means, why we have a knowledge cutoff, and also some ways to kinda get around it. Alright? So let's start at the top.

How LLMs are trained


Jordan Wilson [00:07:09]:

What is a knowledge cutoff? Well, it is exactly that. Right? I'm probably gonna be referencing ChatGPT a lot because that is one of the most popular large language models, but every single large language model has its own knowledge cutoff. So in order to best understand what a knowledge cutoff is, we also have to just dip our toe. We're not gonna get too technical in this episode because I want it to be for everyone. So we're not gonna go into too much depth on how large language models are trained, but it's important to understand how they are trained because then you can understand, oh, this is what a knowledge cutoff actually is and how it's impacting the outputs whenever I go into ChatGPT or something else to try to get something. Alright. So it starts kind of like this, and this is very generalized. Alright? We can talk for hours about how large language models are trained, but we're gonna do the 2 minute version.

Jordan Wilson [00:08:02]:

Alright. So, essentially, large language models collect data first. Okay? And that is generally done through web scraping. As an example, OpenAI, the ChatGPT parent company, has what's called GPTBot, and that scrapes every single website literally in the open Internet and more and collects data. So data is scraped. Right? Generally, this is done through the open Internet. You know, I'm sure PDFs, just about anything. You know? YouTube videos.

Jordan Wilson [00:08:31]:

Large language models are trained on essentially every single piece of information out there. So think of it like that. Alright. Again, very overgeneralized. Alright. And then kind of step 2 is you feed that data into the large language model. Okay. And there are humans.

Jordan Wilson [00:08:47]:

Just so people know, there's humans involved at every step. Right? People think, oh, artificial intelligence, it's 0 humans. Nope. Humans humans are directing, you know, in the 1st step. Hey, bot. Go collect data here. Don't collect data there. Right? So you collect the data, then you feed the data into the model, and then you go through a step of learning.

Jordan Wilson [00:09:08]:

Right? So you have your deep learning, your machine learning. Right? But there's a lot of learning that goes on. This is what separates different models, you know, in the learning, in the training. Right? And then kind of the 2nd step of that after you have your kind of, more machine learning, you also have reinforcement learning from human feedback, commonly called RLFH or sorry. RLHF. Okay? So it's kind of, in this case, a very, very overgeneralized 4 step process. Collect the data. You feed the data into the model, number 2.

Jordan Wilson [00:09:45]:

There's learning patterns through machines, number 3. There's learning through human feedback, number 4. Alright. So we kinda have a 4 step process, and that constitutes a model. Right? So when we say GPT-4, that is a model. Or if we say GPT-4 Turbo, that is a model. Anthropic's Claude 2.1, that is a model. So every time there's a major update, that major update also has a cutoff date, a knowledge cutoff date.

Knowledge cutoff is like a text book


Jordan Wilson [00:10:17]:

Okay? So let's let's think of it this way. Right? Let's all go back to school since this is a basic elementary episode. When you're in school, you have a textbook. And sometimes those textbooks get updated every single year because it's a popular one. Sometimes they only get updated every couple of years. Okay. That is what a knowledge cutoff is because that large language model and we're gonna talk about, you know, Internet connected large language models as well. But that Internet or or or sorry.

Jordan Wilson [00:10:48]:

That large language model, that knowledge cutoff date, it is literally like a textbook. Right? So if something new happens after the knowledge cutoff date in a large language model, it is the exact same as if you're using a textbook. Right? I when when did I graduate high school? 2004. Right? So my freshman year was 2000. So if my biology book was dated 1998, right, technically, whatever I was learning in biology class was 2 years out of date. Okay. And that's especially important when we talk about large language models and what you're using them for. Alright? And this is what I really have to make an emphasis on.

Jordan Wilson [00:11:31]:

And this is why I'm actually gonna have a a show tomorrow. I'll I'll throw it up on the screen here to preview it. We're gonna be talking about ChatGPT plug ins, what's new, because there are some new updates with plug ins and with GPT-4, you know, OpenAI's GPT-4 Turbo, its latest model, but there's actually some new things with knowledge cutoff dates, and they're all they're all a little different. Alright. So that's the basics. That's the 101. Think of a knowledge cutoff like you would a date printed in a textbook. Alright? So you always have to keep that in mind, whatever you're using, ChatGPT, Microsoft Bing Chat, whatever.

Jordan Wilson [00:12:17]:

There's very few very few instances, if I'm being honest, where you would not need an updated knowledge cutoff, and you do that through Internet connected large language models or plug ins. So what I'm trying to say is one of the reasons why your ChatGPT content sucks or why it hallucinates or why large language models lie is because you're not keeping the knowledge cutoff in mind, and you're not taking the proper steps to get around it. Alright. So, yes, like Harvey says here. Thanks. Hey. And if you have comments, questions from our live audience, make sure to get them in.

Jordan Wilson [00:13:04]:

So Harvey here saying ChatGPT now is updated through April 2023. Yes. Kind of. Yeah. Even, yeah, even even those of you who are using large language models every day, we're gonna learn something new today. So, yes, the knowledge cutoff. You probably heard of it for for years because it was so outdated. With ChatGPT, it was September 2021 up until 2 months ago.

Jordan Wilson [00:13:31]:

So through September. So at that point, even if you were using the paid version, ChatGPT Plus, $20 a month, you were working on a knowledge cutoff, or you were working with a large language model that was 2 years out of date. And you have to think. All of us are trying to get better outputs with ChatGPT, with Google Bard, whatever it is. And that's, like I said, not just one of the main reasons large language models lie or make things up or hallucinate or just give you output that you can't really use. Because, you know, up until 2, you know, 2 months ago, what would you be producing in ChatGPT that hadn't changed in 2 years, right? With that old September 2021 knowledge cutoff. Not a lot. Not a lot. Alright.

ChatGPT modes and knowledge cutoff dates


Jordan Wilson [00:14:23]:

Let's take a look. Tanya, thank you for your question. I'll I'll I'll I'll get to this after in the comments, but let's let's take a look now. Let's learn live, shall we? Alright. So now don't worry, don't worry if you can't see my screen. So if you're listening on the podcast, I'm gonna try to I'm gonna try to describe what we got going on here. Alright? So Alright. So I am going to ask ChatGPT, what is your knowledge cutoff date? That's something that people don't do enough, and they should.

Jordan Wilson [00:15:03]:

And, also, I'm gonna call this out because, yes, as Harvey as doctor Harvey Castro said, the cutoff date for GPT-4 has been updated, but kind of. Let's find out. Okay? So I am in GPT-4, the default mode, which the default mode, I I told everyone for years or not years, but since since it had been released, I said, don't use it because it stinks. The default mode is actually good now, because previously, the default mode and this actually really matters when we're talking about knowledge cutoff. Because previously, if you were using ChatGPT, the DALL-E mode, you know, the AI image generator, was a separate mode. Right? Browse with Bing was a separate mode. Advanced data analysis was a separate mode. So in the default mode, you couldn't access any of that, but now with the new updates, all of that is in one.

Jordan Wilson [00:15:58]:

Right? So, technically, GPT-4 has access to more up to date information by using Browse with Bing. However, it does not change its knowledge cutoff. Alright. Let's put this in. So I am in the default mode in ChatGPT, and I said, what is your knowledge cutoff date? And ChatGPT says, my knowledge is up to date as of April 2023. Alright. Cool. So that means that that's everywhere.

Jordan Wilson [00:16:30]:

Right? Nope. Well, we'll see. I mean, I tested this last week, so we'll see. So now what I'm doing, if you are joining us live, I am going now into the free version of ChatGPT. And I'm asking the exact same question. Alright? So GPT 3.5, what is your knowledge cutoff date? So that's why it's important we differentiate because in the free version, January 2022. Alright. So, yes, it did get updated from that September 2021, but only by a few months.

Jordan Wilson [00:17:07]:

So now you know. If you're on the free plan, ChatGPT, you're working with the knowledge cutoff of January 2022, which at this point, let's do the math, y'all. That's almost 2 years old. January 2024 is around the corner. Alright. And if you are working with GPT 4 in default mode, yeah, there's gonna be a difference. You're looking at April 2023, so much more recent. Here's something that most people don't know, and we'll see if this changed.

Jordan Wilson [00:17:39]:

Alright. So let's go into plug ins mode inside ChatGPT. Alright. So, again, if you're using the pro version of ChatGPT, there's three different modes. You have the free version, which is GPT-3.5. You have the default version, which is GPT-4. CEO Sam Altman said that it's GPT-4 Turbo. Right? So let's look at plug ins.

Jordan Wilson [00:18:03]:

Plug ins mode. We'll see if this is updated since I checked last. So now I'm in plug ins, and I'm saying, what is your knowledge cutoff date? Look at this, y'all. Look at this. Well, you can't look if you're on the podcast. But my training data includes information up to January 2022. Very interesting. Right? Yeah.

Jordan Wilson [00:18:27]:

So we're gonna be talking about this because, ChatGPT, the plug ins mode. Again, if you're listening out there, try it yourself. Let me know. Go into plug ins mode. Ask what is your knowledge cutoff date. I I've been investigating this, y'all. It actually got downgraded because up until, GPT-4 had this this big UI UX refresh with the, you know, kind of the updated interface and the custom GPTs and, you know, they they brought this default mode. So they had big changes that they announced about two and a half, 3 weeks ago after their dev day, November 7th in San Francisco.

Jordan Wilson [00:19:07]:

So, yeah, if you remember, ChatGPT was down and broken for, like, a week. But then I noticed something. I noticed the plug ins mode. It's really changing, which is my favorite mode. Right? But you'll see right here, January 2022. Interesting. Right? So they actually downgraded this because up until this announcement, when I was doing this exact same kind of prompt, asking ChatGPT with plug ins what its knowledge cutoff date was, for at least a couple of weeks it was April 2023.

Jordan Wilson [00:19:40]:

So I'm very interested in what's going on with plug ins mode because its its knowledge cutoff got re rewinded? Rewound? It got rewound by more than a year, which, again, when you want to increase the accuracy of what you're getting out of ChatGPT, when you want to cut down on hallucinations, which is just made up stuff. Right? The knowledge cutoff date is extremely important. So we just saw differences there, even in the paid version. We are getting inside ChatGPT plugins a knowledge cutoff of January 2022. Big bummer. Big bummer. Right? Alright. So let's let's keep taking a look.

Jordan Wilson [00:20:32]:

Yes, Tracy. Tracy says this is fascinating about plug ins updated information date. Yes. No one literally, no one's talked about this. Because when I saw this a week or two ago, when I think they first switched over, I'm trying to read about it. Literally couldn't find it anywhere on the Internet, Twitter, Reddit. Yes. So if you're listening, I wouldn't call this breaking news, but I haven't seen it anywhere.

Jordan Wilson [00:20:55]:

But it's important to know. Like I said, you always, always have to keep in mind your knowledge cutoff date, your textbook in school. Right? If you're working with an old textbook, everything that you create, everything that you're reading, everything that you're learning, there's a good chance it is wrong. Because what in our world what in our world has not changed since January 2022? That's almost 2 years ago. It's almost 2 years ago. You can't I I I don't know. Let me know in the comments if you know something that hasn't changed in 2 years. I mean, even ancient history has changed.

Anthropic Claude knowledge cutoff date



Jordan Wilson [00:21:32]:

We're discovering new things. Right? American history has changed. The stock market's changed, financial institutions, sports, arts, entertainment, culture. What hasn't changed in 2 years? You gotta keep in mind the knowledge cutoff. Alright. Enough about ChatGPT. Let's talk about something that I've always had a I'm not gonna say a beef with, but I've never been a big fan of Claude. Alright.

Jordan Wilson [00:22:00]:

So the the large language model from Anthropic. Alright. So it's complicated. Right? And I'm gonna go through. So I I asked the exact same question. What is your knowledge cutoff date? And I get a long, long response from Anthropic Claude, and I'm I'm on version 2.1, FYI. Essentially saying, I don't have a specific knowledge cutoff date. Yeah.

Jordan Wilson [00:22:29]:

You do. Just not sharing it. Right. So what you have to do a lot of times if a large language model does not tell you and ChatGPT is good about this. Right? It's been trained correctly to disclose its cutoff date. That's my beef. And and and one reason why I tell people not to use Anthropic Claude. At least not right now.

Jordan Wilson [00:22:48]:

Yes. They've raised $6,000,000,000 in the last, like, 3 months from Amazon and Google. I don't like it. Transparency when you're working with a large language model, transparency is number 1. And if you ask something like what is your knowledge cutoff date? And if the model does not tell you, I say don't use it. Don't use it. That's one of the most basic questions. So Anthropic, let us know why you don't tell.

Jordan Wilson [00:23:17]:

Why you don't tell us when your knowledge cutoff is and and you gotta go through all these hoops. Right? So, essentially, what you can do then and and what I did here I didn't wanna take you through this whole journey because this was a lot of back and forth. So then I'm like, alright. Well, if it's not gonna tell me its knowledge cutoff date, so I say, who won the 2023 MLB World Series? Right. Which just happened a couple weeks ago, and Anthropic says it doesn't know. Doesn't have that information yet. Alright. So then I say, who won the 2022 MLB World Series? So it got that information correct.

Jordan Wilson [00:23:50]:

So now I know in my head, okay. It's at least you know, the World Series is usually early November. So, okay. I know that because it got that correct, it's at least after November 2022. So now I have a year gap, so I have to keep asking questions. So I say, alright. Who won the Super Bowl in February 2023? Right? So it doesn't know. So then I say, what is your creation date? Because I'm like, okay.

Jordan Wilson [00:24:16]:

Maybe maybe I'm asking Claude wrong because it keeps saying creation date. Right? Which I also don't like. Because when I say what is your creation date, it says my creation date is November 28, 2023. That's today. No. It's not. That's not your creation date. You have a cutoff, Anthropic.

Jordan Wilson [00:24:34]:

Why aren't you telling us? Why are you so dodgy? Okay. So it didn't know who won the Super Bowl in 2023, right, in February. So then I go back a year. So okay. It knows February 2022. So then I go to the NBA finals, right, which is generally in June.

Jordan Wilson [00:24:53]:

So it knows the 2023 NBA finals. So all we know is I mean, I have this saved somewhere else. I actually couldn't dig it up. But I said, who won the 2022 US Senate elections, which is in November? It says the final outcome of the November 2022 US Senate elections took place, so it knows it. Right? So there's details in there. So we know that the knowledge cutoff, and I did have this down somewhere else. I'll put it in in in the comments. So don't worry.

Jordan Wilson [00:25:20]:

So we know it's after November 2022, but before February 2023. So thanks a lot, Anthropic Claude, for not being transparent and easy to work with. Because if you're new to large language models or when models like Claude come out with incremental updates, right? 2.1. I'm sure they're gonna be coming out with a 2.2. Think. You You have to you you're getting new users all the time. Right? Millions of new users. You have to be transparent.

Jordan Wilson [00:25:51]:

So if you're working on AI models out there, transparency is always first. Because if you're not transparent, if you're not communicating clearly on something as important as a knowledge cutoff. What is this data set trained on? What does it include? If you aren't telling your users, they shouldn't be using it, period. I don't care what the context window is. Yes. They announced a 200 k token context window. Cool. Great.

Jordan Wilson [00:26:19]:

What? So that's 180-some thousand words. Doesn't matter. If a model isn't transparent with its knowledge cutoff, you can't trust it. Sorry. Not sorry. Moving on. Alright. So now we are in Microsoft Bing Chat.

Jordan Wilson [00:26:38]:

And let me know. Hey. If you're still if you're still hanging out on the, if If you're still hanging out on the on the live stream, let me know what questions that you have. Cecilia, thanks for joining us as this is an important reminder that human confirmation is critical. Absolutely. Absolutely. You have to be able to communicate. That's why, again, y'all well, thanks to you.

Jordan Wilson [00:27:01]:

I should mention this. I haven't even talked about this, but Everyday AI is the largest AI centric podcast in the world right now for listeners. So hey. Anthropic, you wanna get a better message out to the hundreds of thousands of people listening to this show? Be more transparent in your model. You know? I never wanna talk badly of any generative AI system because I'm a big generative AI advocate, and I want people to use it, and I want people to learn new skill sets, but a knowledge cutoff is essential. And if you can't communicate that, I'm not gonna tell I'm I'm gonna tell people, don't use it. Transparency's first, period. Alright.

Microsoft Bing Chat modes and knowledge cutoff dates


Jordan Wilson [00:27:39]:

Alright. So now we are in Microsoft Bing Chat. What's important to know, so I'm asking the same thing. If you don't know, Microsoft Bing, it is using GPT-4 from OpenAI. Microsoft owns 49% of OpenAI. It was almost a lot more than that when they almost took every single OpenAI employee, 2 weeks ago when OpenAI's board fired Sam Altman and then rehired him. Y'all, what what would've what would've happened with that if Microsoft took all 750 of the 770 employees that said that they're gonna quit and follow Sam Altman? Anyways, so I'm asking now, Bing Chat. What is your knowledge cutoff date? Alright.

Jordan Wilson [00:28:21]:

So I am in the more creative mode. Alright. There's 3 different modes in Bing chat. You have more creative, more balanced, and more precise. So we're gonna do this test in all of them. So interesting. Midway through Midway through the response, it was starting to type something, and then it typed something else. And it says, sorry.

Jordan Wilson [00:28:41]:

That's on me. I can't give a response to that right now. Alright. Let's switch over to the more balanced mode, and I'm gonna ask the same thing again. What is your knowledge cutoff date? It's so weird because as I was preparing for the show, obviously, it gave me a date, and now it doesn't. And, again, if you're following live, which I haven't seen this a lot in Bing Chat, it started to type one thing, and midway through, it erased it and typed something else. Interesting, y'all.

Jordan Wilson [00:29:08]:

Alright. So let's try the more precise. We're gonna say the same thing. What is your knowledge cutoff date? Simple stuff here. So yeah. Interesting. So if you're joining live, it's starting to say 2021. I kid you not.

Jordan Wilson [00:29:23]:

We can go through and hit pause. And then it said, I can't give a response to that right now. Okay. Interesting. So, even though this literally worked this morning, and that's also important to know about large language models, they are the world's most advanced auto complete systems. Alright? You're always gonna get something different no matter what you type. So I'm gonna try it one more time a little bit different. Instead of what is your knowledge cutoff date, I'm gonna say when is your knowledge cutoff.

Jordan Wilson [00:29:50]:

I'm gonna try that. One thing I hate is typing live on the show. Alright. So I'm just rephrasing the question. I'm saying, when is your knowledge cutoff? So it's not answering me in any of the modes. Alright. So that's a little weird because, again, I literally tried it this morning, and and we actually got a response. So, let's let's try let's try let's try it one more way.

Jordan Wilson [00:30:17]:

So I'm gonna say, when is your data trained through? So interesting doing this live and always getting different results. Alright. So for whatever reason, Microsoft Bing, even though this morning, it was telling me something different, not 2021, it started to respond with 2021, and then it said no. So I'm gonna I'm I'm gonna say, are you using GPT-4 or GPT-4 Turbo? Thanks a lot, Microsoft Bing. You're really throwing the live show for a loop here. Alright. So Microsoft Bing Chat, for whatever reason, is not being is not being nice.

Jordan Wilson [00:30:58]:

It's not being nice to me. So I'm I'm I'm gonna do something, and I'm not gonna flip my screen just now. So let me know because we're gonna be wrapping this up. So if if you wanna know something else, let me know. So, in in another window right now, I'm I'm running the exact same query, but in Microsoft Edge. So that is Microsoft's browser. So I'm seeing real quick if we're even gonna get a different response. So interesting.

Google Bard knowledge cutoff date


Jordan Wilson [00:31:25]:

Same thing in Microsoft Edge, so nothing to report. It started to say 2021 and stopped. Interesting stuff, y'all. Alright. Let's jump into last but not least, our last and, I'd say, less popular large language model. So we are in Google Bard. Google Bard is still running on PaLM 2. We were supposed to get this new large language model, this new version called Gemini, but that has reportedly been delayed until early 2024.

Jordan Wilson [00:31:58]:

So that's important to know too. All of these different you know, Google Bard uses its own large language model. Bing Chat uses OpenAI's GPT-4, and then they have their own training on top of it, their own architecture or their own kind of training a lot on top of it. Right? And then Anthropic uses their own large language model. So that's Claude 2.1. So the different chats we're showing you, Bing Chat and ChatGPT are technically using the same large language model even though they're not disclosing which one it is. Anthropic's Claude is using Claude 2.1, and Google Bard right now is still using PaLM 2. So when Gemini is updated, we'll we'll run the same prompt.

Jordan Wilson [00:32:40]:

But asking Google Bard, what is your knowledge cutoff date? So it says, as of today, November 28, 2023, my knowledge cutoff date is January 2022. Alright. At least it's transparent. Right? Google Bard, honestly, is probably my least favorite large language model to use even after all these new updates where, oh, you can, you know, connect with your Google Drive and your email. It doesn't really work that well. Although I will say this, and we'll do a a dedicated show on this. Now, finally and I dragged Google Bard through the mud a couple months ago when they said, hey, Google Bard can talk to YouTube now, 2 or 3 months ago.

Jordan Wilson [00:33:23]:

It definitely couldn't. All it could do is read titles. But now and I'll have a show on this. Maybe maybe we'll do it later this week. Let me know if you want. But now Google Bard can actually break down YouTube videos. Yes. So maybe maybe we'll do a show on that soon.

Recap of LLM knowledge cutoff dates


Jordan Wilson [00:33:39]:

But in terms of knowledge cutoff date, January 2022 for Bard. Alright? For Bing Chat, we have a huge question mark. A huge question mark. It was responding earlier. It's no longer responding. What the heck? Can't provide it. And even in the response, it started to say January 2021, and midway through, it updates it. So I do think that's Bing being a little buggy because when I when I tested this this morning and when I tested it last week, it was actually giving me a date.

Jordan Wilson [00:34:15]:

It's not now, but, again, this is always different. Moving on to Claude. Claude has never, as far as I as as far as my testing has gone back, it's never released, or been transparent about, its knowledge cutoff. Alright. So, Claude, I have the date written down, but we know it's sometime between November 2022 and February 2023. Alright. And then last but not least, ChatGPT.

Jordan Wilson [00:34:47]:

Alright. So it's different now. Yes. It's different. Non breaking breaking news. Because plug ins mode's knowledge cutoff date actually got moved back. Prior, it was April 2023. Now ChatGPT with plug ins is January 2022.

Jordan Wilson [00:35:01]:

The free version is January 2022. And then GPT-4, which allegedly is Turbo, but accessing it through the default mode, is April 2023. So, technically, OpenAI and ChatGPT have 2 different knowledge cutoff dates. Previously, it was 3, which was confusing, but now it's 2. So January 2022 for plug ins and free, and April 2023 for GPT-4 default mode. That's a lot. We we we got a little dorky. We got a little dorky today, y'all.

Final thoughts


Jordan Wilson [00:35:43]:

But let me say this, and thank you. Thank you for all. Natalie's saying, I work in the field, and A-plus info and learnings here. Thank you. Thank you, Natalie. I'm glad. Right? That's another thing. You You know, I get people all the time saying, oh, I work in generative AI.

Jordan Wilson [00:36:04]:

Why would I listen to the show? Why would I take your your free prompting course? I literally, yeah. So, yeah. If you're still listening, we run a free prompting course. We're doing 1 today in a couple of hours, and we update it all the time. So, yeah, 11:30 today, so in, like, 3 and a half hours. One of the things is I I I literally have people that work in generative AI, software engineers who who actually help train large language models, and they take our free Prime Prompt Polish course, and they're like, my gosh, Jordan. I work in AI, but I really feel like I'm using AI for the 1st time after taking your course. We break.

Jordan Wilson [00:36:40]:

This is what we do on Everyday AI, y'all. Like, we break down large language models, machine learning, you know, you you know, text to image, text to video, text to speech. Right? We had the CEO of Speechify on yesterday, right, which has tens of millions of users. We break everything down. We cut through the company marketing. Right? Companies tell you, oh, you know, this large language model is connected to the Internet. Well, yeah, it kind of is. But if it's working with an old knowledge cutoff, what does it matter? Right? If it hallucinates, which we'll talk about this tomorrow, so so tune in tomorrow as well.

Jordan Wilson [00:37:16]:

But when other large language models, you know, like Bard and Bing Chat, when they hallucinate, when you give them URLs, you have to know how all of these large language models work. And I'm a big advocate for taking it back to the basics. Because even for those of us that use large language models and generative AI all the time, sometimes we skip over those foundational things. Right? Like, even the knowledge cutoff. I just told y'all. The plug ins mode got rolled back by more than a year. Right. So even people that are using ChatGPT with plug ins every single day, I use ChatGPT with plug ins every single day.

Jordan Wilson [00:37:55]:

You can never take for granted the basics. Are you in the right mode? When is the large language model's cutoff? Do you have the correct data going into your input so you can increase your output? You always have to start with the basics. Alright? So tomorrow, let's talk. Join us. I already gave you one of the couple of things that we're gonna talk about. But tomorrow, we're gonna be talking about ChatGPT plugins. What's new? Yeah. There's new stuff.

Jordan Wilson [00:38:25]:

I love plug ins. I've used hundreds. I've tested them. I have spreadsheets of me testing them. Some of my favorites got deleted. There's some new favorites that are in there. We're gonna go over what's new and how they work now. Yes.

Jordan Wilson [00:38:37]:

They work differently now after this new update. Alright. So I hope you all enjoyed this learning about large language models and their cutoff, why it's important. Like I said, it's a textbook. Right? Working with large language models is like working with a textbook. You have to know when the textbook was published. If you're using it every day, whether you're using it to to write a resume, whether you're using it to write a paper for school, whether you're using it for research at work, whether you're using it to generate reports for your next big pitch, whatever it is, you have to know when that textbook was published. You have to know the large language model's knowledge cutoff date.

Jordan Wilson [00:39:19]:

So I hope you know a little bit more. Please, if you haven't already, go to youreverydayai.com. Sign up for the free daily newsletter. Gonna be doing some big changes, and and we're throwing out some polls, both this week and next to ask you. We're gonna be coming up with some big changes in December to both the livestream, to the podcast, to the newsletter, and we're building this for you. So make sure you sign up for that newsletter, youreverydayai.com. Participate in those polls. Let us know.

Jordan Wilson [00:39:45]:

We're building this for you, and I hope to see you back tomorrow and every day for more Everyday AI. Thanks, y'all.

Gain Extra Insights With Our Newsletter

Sign up for our newsletter to get more in-depth content on AI