Ep 208: Small Language Models – What they are and do we need them?

Resources

Join the discussion: Ask Jordan questions on small language models

Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup

Connect with Jordan Wilson: LinkedIn Profile


Small Language Models - A Game-Changer in AI Integration?

Artificial intelligence continues to change how businesses operate. Large Language Models have dominated the scene in recent years, but attention is shifting toward Small Language Models such as Gemini Nano, which runs on Samsung's Galaxy S24 phones. The shift is driven by growing concerns about privacy and the misuse of large language models. Companies such as Apple and Meta foresee these small models becoming an integral part of their future offerings.

The Superiority of Small Language Models

Small Language Models come with a slew of benefits. Their size and simpler structure make them easier to maintain, enabling them to be effortlessly integrated into software and web applications without requiring extensive infrastructure.

In contrast to large models, small language models balance performance against resource usage, which makes them a good fit for practical applications like chatbots, voice assistants, and search engines. They can run through cloud-based services or be downloaded and run locally, which gives them an edge over their larger counterparts.

High-profile Small Language Models

There are several high-profile examples, such as NVIDIA's Chat with RTX and Meta's Llama models, along with the pairing of RAG (Retrieval-Augmented Generation) with small language models. These examples showcase the potential of Small Language Models to tackle daily tasks effectively.

Bringing your own database of information to a small language model that runs locally can sidestep many security and privacy concerns, which makes the future of small language models promising. Their overall success will depend heavily on the first large-scale commercial rollouts, such as Google and Samsung's collaboration to bring Gemini Nano to mobile devices.
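To make the "bring your own database" idea more concrete, here is a minimal sketch of the retrieval step in RAG, written in plain Python. It is an illustration under stated assumptions, not how Chat with RTX or any specific product works: the documents are made up, and the word-overlap ranking stands in for the embedding search a real system would more likely use. The resulting prompt would then be handed to whichever local model you run.

```python
import re

# Hypothetical in-house documents; in practice these would come from your own database.
DOCUMENTS = [
    "Refunds are processed within 5 business days of receiving the returned item.",
    "Our support line is open Monday to Friday, 9am to 5pm Central time.",
    "Premium subscribers can export reports as PDF or CSV files.",
]

def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question, documents):
    """Place the retrieved text ahead of the question so the model can use it."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("How long do refunds take?", DOCUMENTS)
print(prompt)
```

Because the lookup and the prompt assembly both happen on your own machine, nothing from the document store has to leave the device when the model is local too.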

The Evolving Definition of Language Models

The definition of what constitutes a small language model keeps shifting as Large Language Models grow in parameter count. Large models carry billions to trillions of parameters, allowing them to manage complex tasks, while small models like Phi-2 and Llama operate with millions to billions of parameters.

Comparing Small and Large Language Models

While large models are versatile and capable of handling various tasks, small models are tailored for specific uses, positioning them as an excellent choice for customer service, creative writing, and other specialized tasks. Small models require less computational power, making them more accessible to users with limited resources. Moreover, their compact size allows them to be deployed on local devices and embedded systems, making them energy-efficient and suitable for real-time applications.

A Paradigm Shift towards Small Language Models

In conclusion, Small Language Models provide an alternative to their large counterparts by balancing efficient resource usage with excellent performance. They offer a cost-effective solution for both developers and business owners with limited resources. The future of Small Language Models shines bright, as seen from the interest of industry giants like Apple, Meta, and Google, who show promising commitments towards their incorporation.

If you're a decision-maker interested in staying on top of the latest AI trends, take a strong look at Small Language Models, as they might just be the next game-changer your business needs.

Topics Covered in This Episode

1. Introduction to Language Models
2. Advantages and Usage of Small Language Models
3. Comparison of Small and Large Language Models
4. Future of Small Language Models


Podcast Transcript

Jordan Wilson [00:00:17]:
It seems like we just got used to large language models. But now we're hearing more and more about small language models. Like, what the heck? What are small language models? What's the difference between them and the big large language models, and when should we be using small models? We're gonna answer those questions today and more on Everyday AI. Welcome. What's going on y'all? My name is Jordan Wilson, and I am the host of Everyday AI, and this is for you. We are a daily livestream podcast and free daily newsletter helping everyday people like you and me not just learn what's going on with generative AI, but how we can all actually leverage all of this information to grow our companies and to grow our careers. And I think, you know, the more and more prevalent generative AI becomes in our day to day lives, you know, using it in the workplace, we have to become more comfortable with all of these, you know, buzzwords and all of these acronyms.

Jordan Wilson [00:01:17]:
Right? So if you're not already using large language models or small language models in your day to day at work and you live and work here in the US, you probably will be soon, so I think it's important to understand the difference between the two, and what small models are good for. So we're gonna be diving into that here in a second. But before we do, as a reminder, please go to youreverydayai.com. Today's episode, there's gonna be a lot of depth and detail. So if you're listening, you know, whether you're walking your dog right now or, you know, you're on the treadmill or whatever it is, maybe you can't type down notes. So we always do that for you. I'm a human.

Jordan Wilson [00:01:51]:
I'm a former journalist. I write the newsletter right when I'm done with this podcast and the livestream. Also, as a reminder, y'all, this is live. A lot of people don't know this is essentially live, unedited. It's the realest thing in artificial intelligence. Also on the website, if you didn't know, we have more than, I think, 210 episodes or something like that now. You can go listen to every single episode. Go watch every single episode. Go read the newsletter that came along with every single episode.

Jordan Wilson [00:02:19]:
I'd argue we are probably the number one source for free generative AI information in the world. I don't know of a single other source that has all this information, so make sure you go check it out. Alright. So before we jump into what small language models are, let's first, as we do every day, go over the AI news. So Salesforce is bringing a new Gen AI option in Slack AI. So Salesforce has just rolled out new native generative AI capabilities within Slack, including features such as channel summaries, thread recaps, and AI search. So these features aim to help users access and make sense of the collective knowledge within Slack, improving productivity and saving time.

Jordan Wilson [00:03:01]:
So the new AI features in Slack can save users time and improve productivity with an estimated average savings of 97 minutes per week. So these AI capabilities right now are rolling out just in the US and the UK and are, right now, for subscribers of Slack enterprise plans. That's the important part. But the company is working to expand to additional plans and languages. Hey. Hot take. I don't like Slack, if I'm being honest. I think, you know, we use ClickUp.

Jordan Wilson [00:03:28]:
I think it's a better option for us

Jordan Wilson [00:03:30]:
at least, but I know so many people use Slack, and, man, the noise of just Slack right now is making me nervous. Alright. Our next piece of AI news: Apple's latest alleged AI tool is bringing new types of creativity to everyday people. So Apple has developed a prototype AI animation tool called Keyframer that uses large language models to add motion to 2D images. Alright. So it is a description-based tool that converts text prompts into CSS code to animate images and has potential applications in web-based animation. So Keyframer is a promising new tool for creating these web-based animations, and just the simplicity of turning 2D images into animations. You know? As someone that's, you know, built and developed websites on and off for, like, 20 years.

Jordan Wilson [00:04:18]:
This is pretty exciting for me because it's not easy. Right? And the capability to turn a still 2D photo into a moving animation with a text prompt is kind of awesome. Right? So keep your eyes out on that. Alright. Last but not least in AI news, Google has rolled out a new coding tool for its employees. So according to reports, Google has introduced an internal AI model called Goose to assist its employees in writing code more efficiently. So this is part of Google's broader efficiency drive, which includes job cuts and team reorganizations. So this new Goose model reportedly is part of a plan to bring AI to every stage of the product development process and is part of Google's efforts to enhance AI skills and streamline operations.

Jordan Wilson [00:05:04]:
You know what? All these names are getting confusing because I thought maybe Meta was the one that was gonna go down the animal route, you know, or the exotic bird route with their Emu video, but now Google is getting in with Goose. Right? Can we just have separations? Can all tech companies maybe pick one genre of animal that they wanna call their models so I can stop getting confused? Alright. So as a reminder, if you want more news, we do it every day as well as, yes, we recap this show. We go over more news than this and fresh finds from all across the Internet, a lot of other great information in our newsletter. So make sure you go to that, youreverydayai.com. It's in the show notes if you're listening to the podcast as well. So make sure to check out the episode description for those links, as well as

Jordan Wilson [00:05:51]:
you can email me, you know, reach out on LinkedIn, all that good stuff. Alright. So let's get into it,

Jordan Wilson [00:05:56]:
and I would love to hear from our audience. Hey. Tara is always first here. Good morning from Nashville, Dr. Harvey Castro. Denise, thanks for joining us. Brian and Wuzzi and Mauricio, we got a crowd today. Y'all actually care about small language models. Alright.

Jordan Wilson [00:06:12]:
I'm not the only one. Good morning to Svetlana and Raul. Thank you all for joining us. So let me know right now. I would love to know from our livestream audience, or just let me know also if you're listening on the podcast: do you use small language models? Do you know what they are? What questions do you have? I may not have every answer. Right. Some of them I'll be able to answer live here on the show. So if you are joining us, please get your question in about what small language models are or what questions you have about them.

Jordan Wilson [00:06:40]:
But, hey, if I can't get to them live, I'll make sure to answer them in the newsletter so we can all learn together. Alright. So we're gonna talk now about what a small language model is, and I'm gonna give you 14 facts that

Jordan Wilson [00:06:51]:
you need to know. Alright?

Jordan Wilson [00:06:53]:
No clickbait here. Just straight facts. You know we bring receipts. Alright. So let's start here with small language models. So it's important to know that the definition is always changing. Right? As large language models actually get bigger, what we consider a small language model is always changing. So keep that in mind.

Jordan Wilson [00:07:13]:
Alright? The goalposts are always moving in terms of these definitions, and there is no one overarching definition that everyone adheres to. So, you know, you could even say, oh, something that we might have considered a small language model, you know, 2 or, you know, a large language model 2 years ago. People might be calling it a small model now, so keep that in mind. So previously, a small language model was considered a model with fewer than hundreds of millions of parameters. But now, like I said, that definition is changing a bit. So as large language models like GPT-4 get bigger and bigger, right, when we jumped up from, you know, GPT-3.5 to GPT-4, right, as the large language models get bigger, we're starting to call other models, oh, yeah, these are small now comparatively.

Jordan Wilson [00:07:57]:
Right? So it's important to know, and it's all kind of judged by parameters. Alright? We're gonna get into, here in a second, what those parameters are and what they mean. So, essentially, large language models have billions to trillions of parameters, allowing complex text tasks of all varieties. Right? So your GPT-4, your Gemini Ultra, think of those as the ultra ultra big boys, right, in large language models, and they can perform any task, and they essentially know everything. Alright? So that's not how small language models work. Okay. Small language models have fewer parameters, making them more efficient, and they are more for specific tasks or to be used locally on devices with more limited resources. Right? We're gonna get into all the intricacies, but, you know, if we wanna talk big picture, that's the best option.

Jordan Wilson [00:08:47]:
Right? Or that's the best way to think of them. Large language models are the behemoths that can literally do anything and everything, and probably most of us actually use large language models like ChatGPT and Google Gemini and Anthropic's Claude. We probably use large models like that more than we do small models. Alright. But small models are more for specific tasks. So it's not something where you can really get the best results sitting down and asking, you know, 100 different questions from 100 different, you know, walks of life. That's not really what small models are for. They're fine-tuned for specific tasks.

Jordan Wilson [00:09:21]:
Alright. Let's keep diving in. Let's look at some examples. So, again, even when we talk about parameters, because I'm gonna define what a parameter is here in a second, but first, I'm giving some examples of large language models and small language models and their parameters. Right? So a lot of times, companies don't even say how many parameters their models have. So, you know, even as I throw these numbers out there, you know, some of them are unconfirmed, but, you know, kind of reported to be true. So keep that in mind. So let's just look at the difference here.

Jordan Wilson [00:09:50]:
So as an example, large language models, probably the 2 most popular ones, are GPT-4 from OpenAI, which is reportedly about 1.8 trillion parameters. Okay. Gemini Ultra from Google, which is a reported, you know, 1.5, 1.6 trillion parameters. Alright. Then let's look at some popular small language models that maybe would have been considered big, you know, 3 years ago, but they're not. They're small now. So Phi-2 from Microsoft is about 2.7 billion parameters. Then you have Llama as an example, a very popular open source model from Meta, 7 billion parameters.

Jordan Wilson [00:10:27]:
Alright? You know, we can pick hairs on the parameters all day. Again, those are estimates, those are reports, etcetera, but those are some big names out there. Right? So we're talking OpenAI, Google, Meta, Microsoft. So, you know, there's these smaller, more flexible models like Phi-2 and Llama. Then you have your big ones, you know, OpenAI's GPT-4 and Gemini Ultra. Alright. So now let's talk parameters. Right? Because that is essentially the difference.

Jordan Wilson [00:10:58]:
That's the difference. There's a lot of intricacies in how they play out differently, but kind of the first thing that we look at to separate large models from small models is parameters. So I just gave you the examples, you know, 1.7 trillion versus a couple billion. Alright. So here's what a parameter is in very simple terms. Alright? So, again, I'm sure if you're a machine learning expert with, you know, a decade of experience, you know, you might have qualms with my definition here, but we're talking to the everyday person. We're trying to simplify this. So a parameter in very simple terms.

Jordan Wilson [00:11:32]:
So in simple terms, a parameter in a large, or in a language model, refers to the variables that the model uses to make predictions. Alright. Each parameter represents a concrete part of the model that can change or adapt based on the data it's trained on. Alright. That's an important piece. Right? Because all of these models and all of the parameters that they contain are trained. Right. So which is why, obviously, these large language models are much more expensive to create.
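As a rough illustration of that definition, the sketch below builds a toy two-layer network in plain Python and counts its parameters. The layer sizes and the `make_layer` and `count_parameters` helpers are made up for this example; real language models are organized differently, but the counting idea is the same: every weight and every bias is one adjustable number.

```python
import random

def make_layer(n_inputs, n_outputs):
    """One fully connected layer: a weight per input-output pair, plus a bias per output."""
    weights = [[random.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_outputs)]
    biases = [0.0 for _ in range(n_outputs)]
    return {"weights": weights, "biases": biases}

def count_parameters(layers):
    """Add up every trainable number (weights and biases) across all layers."""
    total = 0
    for layer in layers:
        total += sum(len(row) for row in layer["weights"])
        total += len(layer["biases"])
    return total

# A toy model: 4 inputs -> 8 hidden units -> 2 outputs.
toy_model = [make_layer(4, 8), make_layer(8, 2)]
print(count_parameters(toy_model))  # (4*8 + 8) + (8*2 + 2) = 58
```

Training nudges each of those 58 numbers. A model with billions or trillions of them needs that much more data and compute to adjust them all.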

Jordan Wilson [00:12:03]:
They're much more expensive to upkeep. They're more difficult to train, right, because they are so much more complex. Right? The way I like to think of it is, if you know how to work with a large language model, you can essentially ask it anything, right, in the history of existence. And if you know how to work a large language model, you can probably get a pretty decent answer. Not necessarily the case with small models. Right? Small models are generally trained in different, you know, categories of work or different, you know, types of outcomes you may want. Right? So as an example, a small language model may be trained or built specifically for a type of customer service, right? To handle just, you know, inquiries from customers. Right? And it might be fine-tuned for a specific use case like that.

Jordan Wilson [00:12:50]:
Right? So if you have a small model that's maybe for customer service, right, maybe it's an open source model that you can, you know, tweak yourself, but let's just say that. Right? That small language model that's built specifically and tailored and fine-tuned for customer service to respond to customer inquiries, you aren't gonna be able to code on that. Right? You're not gonna be able to develop a website. Right? It's not gonna be able to spit out images for you. Right? That's what large language models do. Right? It's the multimodality of large language models. Right? Being able to input photos as prompts, being able to output, you know, different types of code and multiple languages, all of these things. A lot of times, small language models are not like that.

Jordan Wilson [00:13:36]:
Right? They're built for one very specific purpose, or They are just made for smaller purposes. Like, okay. This small language model excels at creative writing. Right? So can it, create an outline of, you know, how the stock market has changed over the last 30 years? Maybe, but that's not really what it's for. Right? So, again, different use cases, different types of training, different parameters, it it changes. Right? Complete difference in what a large language model and a small language model should be used for. Alright. And hey.

Jordan Wilson [00:14:15]:
So great question here from Woosie. So have any specific products you like that are being run entirely with small models? I do have some examples, Woosie, but I'll try to share those in the newsletter, so I don't go off track here. Alright? So now let's talk about 14 things that you should know about small language models. Alright? And, again, I think most of us, myself included, are more familiar with large language models. So as we go over some of these facts and things to know, I'm gonna be comparing small models to large models. Right? Because that's what most of us know. You know? So think the large models, the GPT-4, the Gemini Ultra, the Claude Anthropic,

Jordan Wilson [00:14:59]:
et cetera, or Anthropic Claude. So some of

Jordan Wilson [00:15:02]:
the most important differences. Alright. So small language models require less computational power, making them more accessible for users with limited hardware resources. Alright? That's one of the most important things. Right. These small models can live locally on devices. Right? The new Samsung phones have Gemini Nano, right, which is technically a small model, but it lives on the hardware, right, which technically requires less compute. Because when you're using large language models, people don't like to talk about this, but they are extremely resource heavy.

Jordan Wilson [00:15:39]:
Right? We've talked about it on the show before. Right? Every couple hundred prompts, you know, it kind of can tell you, you know, what is the environmental toll, right, on all of these prompts because large language models require a lot of compute power. You know? But small language models don't because they live locally. Right? So they're not having to, you know, essentially send your query and compute it in the cloud, which can be very expensive and very resource heavy. Small models are not like that. So even with that, in the same vein, small language models are faster at training and inference due to their smaller size compared to large language models. Right? They're faster. Right? It's faster when there's way fewer parameters, especially when you're, you know, obviously, using a small language model for what it's good at, it's faster.

Jordan Wilson [00:16:31]:
It's on device. It has fewer parameters to look through. Right? So think of it like this. You know, what's technically faster? You know, if you had to read through a 500 page book to learn about the history of everything or a 5 page book, right, that gives you the history of things that maybe you care about. It's kind of like that. Right. It's just faster. Small language models are faster, but obviously way less robust.
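The book analogy can be put into rough numbers. A common rule of thumb, used here as an assumption rather than an exact figure, is that a dense transformer spends about 2 floating-point operations per parameter for each token it generates. The sketch below applies that rule to the parameter counts mentioned earlier in the episode. The GPT-4 figure is the unconfirmed report cited above, and the rule would overstate the cost of any model that only activates part of its parameters per token.

```python
def flops_per_token(n_parameters):
    """Rule-of-thumb compute for one generated token in a dense model: about 2 * N."""
    return 2 * n_parameters

# Parameter counts as quoted in the episode; the GPT-4 number is reported, not confirmed.
models = {
    "Phi-2 (2.7B)": 2.7e9,
    "Llama (7B)": 7e9,
    "GPT-4 (reported 1.8T)": 1.8e12,
}

baseline = flops_per_token(models["Phi-2 (2.7B)"])
for name, n_parameters in models.items():
    cost = flops_per_token(n_parameters)
    print(f"{name}: {cost:.1e} FLOPs per token, about {cost / baseline:.0f}x Phi-2")
```

Under these assumptions the largest model does several hundred times more arithmetic per token than the smallest, which is the gap the 500-page versus 5-page comparison is pointing at.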

Jordan Wilson [00:16:59]:
Alright. So more facts to know. Small language models are more energy efficient, which we just talked about. So they're reducing the carbon footprint associated with the training and the running of AI models. That's another part. It's not just the running, you know, large language models that are expensive. It's it's the ongoing training. There's so much compute.

Jordan Wilson [00:17:18]:
Right? That's why you have Sam Altman out trying to raise 7 trillion, yes, trillion with a t, $7 trillion for more GPUs, right, to build a new class of chips. Right? Because these GPU chips power all of these generative AI models, not just your large language models, but, you know, all generative AI models are run off these very hard to get, very expensive GPU chips. So right now, you know, I wouldn't say it's a compute crisis, but, you know, all of this computing power is scarce. It's expensive. It's resource heavy. In the long run, it's taking a toll on the environment. So small models, I think, in that regard, are important to keep an eye on. Also, small language models can be deployed on mobile devices and embedded systems unlike most large language models.

Jordan Wilson [00:18:06]:
Yeah. So, again, that's your edge AI, right, or your edge computing. So that's bringing these language models locally to small devices. Right? So you're seeing it on actual, you know, phones now and, you know, just talked about that with the new Samsung S24 having Gemini Nano. Also, reportedly, Apple, you know, should be announcing their generative AI offering in June at the Worldwide Developers Conference. So Tim Cook just announced that, I think it was earlier this month. So we are presumably going to be seeing a small language model in an upcoming iPhone, right, or maybe in upcoming, you know, MacBooks or iMacs. Right? So that's another important thing to keep an eye on, and we are seeing that as well with NVIDIA's Chat with RTX.

Jordan Wilson [00:18:51]:
Right? They just announced that this, or it was actually announced a couple of months ago, but it was just released, you know, in the last, like, 36 hours, NVIDIA's Chat with RTX. Right? So we don't know how many parameters it is, but it looks like a pretty solid small language model that can run locally. Right? You have to have a certain NVIDIA GPU running on your computer to run Chat with RTX, but the same thing. You know? So this is a big shift that we're gonna see, because it's better privacy as well. Right? That's one of the biggest things that people are concerned about with large language models, and it makes sense. Right? So not just data sharing, but training data. Right? How are these companies using any data that we upload into their systems to train their models? Right? So when you think of smaller models that run locally, they are not sending information back and forth.

Jordan Wilson [00:19:43]:
Right? So it is the concept of running a language model and generative AI locally on a device. It's much more private, much more secure. Right? Just like if you're opening, you know, Microsoft Office, let's just say, and you're not connected to the Internet, that's kind of, you know, what it's gonna look like in the future if you're working with a small language model on your phone or on your PC or Mac once it comes out. Right?

Jordan Wilson [00:20:08]:
We're waiting on Apple. Alright. Some more facts. Here we go.

Jordan Wilson [00:20:13]:
So small language models are suited for real time applications, such as on-device language processing where quick responses are crucial. Alright. Next fact. Small language models have a lower capacity for understanding complex language nuances compared to large language models. Right? That's the other thing. Large language models, if you know prompt engineering 101, there's really not much in the world that you can't do with a large language model. Right? You can translate languages.

Jordan Wilson [00:20:44]:
You can build advanced web applications by just asking, you know, a large language model to code something for you, and you can technically build generative AI with generative AI. Right? So the large language models are extremely complex, and they're only going to get more powerful and more robust as we see new models. Right? When we see GPT-5, you know, and Sam Altman at OpenAI has been saying, oh, it's gonna be much better at reasoning. It's gonna be able to rationale, you know, more multimodal capabilities. Right? So these large language models are gonna get even more powerful and even more robust. Right. And even right now, large language models across, like, MMLU benchmarks. We talk about that on the show a lot.

Jordan Wilson [00:21:26]:
But, you know, the current version of GPT-4 and presumably Gemini Ultra, you know, are about 3 to 4 times better on these benchmarks, like MMLU, than the average human. Right? So right now, large language models, for the most part, are much smarter, much, much smarter than any one human. Alright? So that's important to keep in mind, so it's

Jordan Wilson [00:21:49]:
a big difference here. Alright. Let's keep it rolling. So another thing you need

Jordan Wilson [00:21:54]:
to know about small language models, they are often used in applications where speed and efficiency are more critical than deep language understanding. Again, tailored applications. Small language models can also be fine-tuned more quickly and cheaply for specific tasks versus large language models. Right? Yeah. Y'all, like, even though GPT-4, I think it's still more powerful, at least right now, than Gemini Ultra. You know, we might see that shift, you know, as Gemini Ultra from Google, you know, starts to get a little more stability and just a little bit more improved. But, y'all, GPT-4 is, like, almost 2 years old now, right, which is crazy to think about. Right? But it's also important to know that and to see the difference.

Jordan Wilson [00:22:42]:
Right? Presumably, OpenAI has been working on, you know, GPT-5 for years. So these large language models are extremely expensive to create, to train, and to maintain. Right? It's like a Titanic ship in the ocean versus a jet ski. Right? You can't use a jet ski for everything, but for a specific task, a lot of times, a jet ski is much better than a huge, you know, cruise ship maybe that 10,000 people can go on. Different vessels for different applications.

Jordan Wilson [00:23:21]:
Alright. Couple more things you need to know. You need

Jordan Wilson [00:23:24]:
to know about small language models. And, yes, if you do have questions, I'm gonna try to get to them at the end, so keep them coming if you do have them. So, small language models, they're much easier to maintain, obviously, and update due to their simpler architecture. Small language models can also be more easily integrated into software and web applications without needing extensive infrastructure. That's an important one. I think, you know, all these different, you know, web applications and software early on, you know, just jumped on, you know, OpenAI's models because their API was good. You know? They've been making it cheaper and cheaper and faster to work with. But I think we're gonna see a shift here in 2024, maybe to a lot of these, you know, pieces of software, these different web applications,

Jordan Wilson [00:24:14]:
instead, you know, using small language models. Because, again, as an example, let's just say, if you're building, you know, if you're a large company and you want to, you know, have your own version of

Jordan Wilson [00:24:25]:
a model for customer support. Do you need a

Jordan Wilson [00:24:28]:
model as big as, you know, Google's Gemini Ultra or as big as OpenAI's, you know, GPT-4? I don't know. You know, you might be better off as an example with a model like Mistral or a model like Llama. Right? Something that is maybe, you know, more limited, more fine-tuned. You know, another thing to keep in mind, which I think is important, is large language models. People struggle with them. Right? Because let me tell you this.

Jordan Wilson [00:25:00]:
If you're using, and I know I always, you know, old man Wilson getting on his porch and shaking his fist at kids who don't know what a large language model is or how to use it. Right? So much of what you see on the Internet and social media is, you know, oh, use my prompts. Use my prompts. You know? Here's 15 prompts that'll make you rich tomorrow. Those prompts don't work. They literally don't work. That's not how large language models work. They're too big.

Jordan Wilson [00:25:23]:
They're too big. Right? If you tell, you know, GPT-4 as an example, you know, you're a copywriter with 20 years of experience, that means nothing. It means nothing, you know, for a large language model. Because guess what? It has gobbled up all of the information on the open web and closed web, works of art, things that we don't even know about. It has essentially the history of humankind in its dataset, in its, you know, 1.8 or 1.5, depending on what model you're talking about, you know, those trillions of parameters. So guess what? It's also gobbled up all this information that's bad information. Right.

Jordan Wilson [00:26:00]:
People that say, oh, I'm an expert copywriter with 20 years of experience. Guess what? There's a lot of people that say that on the Internet that are garbage, and they're not good copywriters. Right? So when you're working with a large language model with trillions of parameters and you think you can use these copy and paste prompts and get great outputs, no. Would you get better outputs if you were using a small language model that is specifically trained for copywriting or creative writing? Absolutely. Right? That's why I think so many individuals, so many businesses, especially early on, wrote off technologies, you know, these very powerful and robust technologies such as, you know, ChatGPT, GPT-4, even, you know, Google Gemini Ultra, because they're like, oh, well, I can put one big prompt in here, and it's not fantastic. It's because it's a large language model with trillions of parameters. You can't just put one prompt in and expect something right out because its brain, its big neural network, is too big. It's too big.

Jordan Wilson [00:26:57]:
It's not fine-tuned for a very specific task. So this is just a small mini rant brought to you by old man Wilson. If you're working with large language models, you need to understand the basics of prompt engineering. Right? You need to essentially train the chat that you're working with. Right? You have to. Like what Tara is saying here. This is what we teach in our free Prime, Prompt, Polish course that has been taken by thousands of people, thousands of, you know, business leaders across the world. We teach them the basics.

Jordan Wilson [00:27:26]:
Most people are using large language models incorrectly. Using it like it's a small language model. It's not how it works. Sorry. Rant over. Let's keep going. Small language facts you gotta know. So small language models offer a balance between performance and resource usage, and it's ideal for many practical applications.

Jordan Wilson [00:27:45]:
Alright. So good examples here. Small language models can power chatbots. They can power search engines. They can power voice assistants, whereas large language models are advanced and used for every single task. Alright. Last couple facts you gotta know here. You gotta know.

Jordan Wilson [00:28:06]:
Small language models can be used in both cloud-based services or by downloading them. Yeah. So that's the thing. You can't download a large language model. It's not how it works. I don't know if there's a single, you know, computer, GPU, you know, any one physical device that you can download the entirety of, you know, like a GPT-4 on. There's been people that have, you know, forked it, and they've created smaller versions of these large language models. But, you know, for the most part, with small language models, the big, big thing to keep in mind is, yes, they can be downloaded.
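
To make the download point concrete, here is a rough back-of-the-envelope sketch: the memory needed just to hold a model's weights is roughly the parameter count times the bytes per parameter. The 1.8 trillion figure is the one mentioned earlier in the episode, not an official number, and the 3-billion-parameter small model and the 16-bit and 4-bit precisions are illustrative assumptions.

```python
def weight_memory_gb(num_params: float, bits_per_param: int = 16) -> float:
    """Approximate memory needed just to store the weights, in GB (10^9 bytes).

    Ignores activations, caches, and runtime overhead, so real usage is higher.
    """
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9


# Assumed sizes, for illustration only.
large_fp16 = weight_memory_gb(1.8e12, bits_per_param=16)  # trillions of parameters
small_fp16 = weight_memory_gb(3e9, bits_per_param=16)     # hypothetical 3B small model
small_4bit = weight_memory_gb(3e9, bits_per_param=4)      # same model, 4-bit weights

print(f"large model, 16-bit: ~{large_fp16:,.0f} GB")
print(f"small model, 16-bit: ~{small_fp16:,.1f} GB")
print(f"small model,  4-bit: ~{small_4bit:,.1f} GB")
```

Under these assumptions the large model needs thousands of gigabytes for weights alone, while the small one fits in the memory of a recent laptop or phone.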

Jordan Wilson [00:28:36]:
They can be cloud-based as well. Right? So there's great resources out there. We'll mention them in the newsletter. You know, Hugging Face is probably one of the leading resources for, you know, working with and downloading small language models, and you can run them locally on your machine. Right? You don't have to be a tech expert to experiment and to download and to install small language models because that's what they're for. They're for, you know,

Jordan Wilson [00:28:58]:
on-device use for very specific use cases. Alright. So I'm gonna get to

Jordan Wilson [00:29:05]:
your questions here, but we're gonna wrap up, and I'm gonna ponder with you, and let me know, if you're joining me live, what do you think the future is for small language models?

Jordan Wilson [00:29:16]:
I'm not 100% sure. Right? I talk about large language models every day. I read about small language models. I use all kinds of models. So I'm very curious about what the future of small language models is.

Jordan Wilson [00:29:33]:
I think what we're seeing, as an example, with Samsung and, you know, Google teaming up to bring Gemini Nano to the S24, to a mobile device, that's huge. Right? So I think the future of small language models actually is going to kind of rely heavily on the successes or failures of these first couple large-scale commercial rollouts. Right? So even if we could count on our hand a handful of, you know, we could call them highly visible small language models. So you have your, you know, your models from Meta. Right? Your Llama models, very popular right now. Very popular, you know, for people to run these models locally. You know, we just talked about Gemini Nano. I think we also have to talk about NVIDIA's Chat with RTX.

Jordan Wilson [00:30:25]:
Right? And you can use other models on Chat with RTX, which is great. Same thing. You can upload your own documents. You know, we haven't even talked about, you know, the power of RAG. Right? The power of RAG, which is something that is kind of tied in with these small language models. So it's, you know, retrieval augmented generation, combining with small language models. So essentially bringing in your own database of information and combining it with small language models, you know, then you can, I think, bypass so many of these security and privacy concerns. Right? When you can work with a small model that works on a device, it's faster, it's more efficient, it's cheaper. And then if you can bring your own data in and work with it in a secure fashion, I think the future of small language models is extremely promising.
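
The retrieval-augmented generation idea described here can be sketched in a few lines: pick the most relevant passages from your own documents, then put them in the prompt that a local model would receive. This is a toy illustration only. The word-overlap scoring stands in for the embedding search that real systems typically use, the sample documents and function names are hypothetical, and the model call itself is left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, doc):
    """Count how many query words also appear in the document."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    return sum(min(q[t], d[t]) for t in q)


def retrieve(query, docs, k=2):
    """Return the k documents with the highest word overlap with the query."""
    return sorted(docs, key=lambda doc: score(query, doc), reverse=True)[:k]


def build_prompt(query, docs, k=2):
    """Combine the retrieved passages and the question into one prompt."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical private documents that never leave the device.
DOCS = [
    "Refunds are processed within 14 days of the return request.",
    "Our office is closed on public holidays.",
    "Support is available by email from 9am to 5pm on weekdays.",
]

prompt = build_prompt("How long do refunds take?", DOCS, k=1)
print(prompt)  # this text would then be passed to a locally running small model
```

Because both the documents and the model stay on the device in this setup, nothing has to be sent to an outside service.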

Jordan Wilson [00:31:15]:
Right. It's almost like, I think, kind of, you know, the large language models are kind of like the Trojan horse. You know, it infiltrates all our daily lives, and we see how powerful they are, and we all start using them. Hundreds of millions of people are using large language models on a daily basis now. But then we are also then concerned. Right? And we didn't know. I guess, like, some of the best marketing maybe for small language models is large language models.

Jordan Wilson [00:31:42]:
Right? And people using them incorrectly. Because we see this robustness and how powerful models in general, you know, language models are. Right? To turn unstructured data into something that we can use and we can create with, extremely powerful, but we are also now, over the last 18 months, we've been exposed to the downside. Right? And now we're becoming more cognizant of privacy and trust. So I think small language models, it's kind of in a wait and see, but I do see them gaining popularity, you know, with the, you know, Gemini Nano, with the new S24, with whatever Apple is going to be announcing, with Meta's open source local models, and also with Apple. Right? Presumably, we're gonna see some sort of small language model with Apple.

Jordan Wilson [00:32:31]:
You know? It could be

Jordan Wilson [00:32:32]:
a large language model as well. It could be a combination, but, presumably, we're gonna have some edge AI, some on-device large language model in a future Apple offering. So I do think that working more with small language models is going to be the future. Alright. Let's see if we can get to a question or two. Maybe a couple comments. Let's see here. So, Juan says, what's your large language model of choice for everyday use, emails, productivity, etcetera.

Jordan Wilson [00:33:02]:
Great question, Juan. I am still team ChatGPT through and through.

Jordan Wilson [00:33:09]:
Right? At least for most of, you know, our team's needs, ChatGPT is great. Again, but we're not working as much with many other people. We're a small business. So we're not working as much with confidential documents. Right? I think if we were, we might be looking at some of these small language models right now, but at least right now, I still think while plugins are there, right, yes, OpenAI will likely be phasing plugins out soon. But right now, the ability to, you know, use what we call plugin packs, which are essentially mini agents. Right? You know, when you enable any 3 plugins at once, put a prompt in, and those 3 plugins can work with each other autonomously. People don't know this.

Jordan Wilson [00:33:48]:
People don't understand how powerful plugins are. And then you can also, with the new feature from OpenAI with the GPT mentions, then you can, only one at a time, you can have your 3 plugins almost working like mini agents in a chat and then also mention any GPT, whether ones you create or from the GPT store. At least for me, I don't see any other large language model right now that offers that sort of flexibility. Not even Gemini Ultra right now. Not all workspace accounts have access to all of the features that Gemini Ultra has. So, at least my take right now, that's the best large language model. Let's see. Tanya, can you give an example of a prompt that gives the results you want? So, Tanya, I'm not sure.

Jordan Wilson [00:34:34]:
You'll have to ask me a follow-up question. Maybe that's something we can answer in the newsletter. So, Tara asking what is the best PC or Mac for tinkering with a local model. My 2018 MacBook Pro wants to retire on me. Yes. That's a good question. So you're gonna wanna look at probably, you know, laptops that were honestly introduced in the last, like, 3 to 6 months. So we'll have a list in the newsletter today.

Jordan Wilson [00:35:10]:
I don't have a list off the top of my head, but I do know, as an example, Microsoft did just release a new version of their Surface laptop that can run models locally. You know, we've already mentioned a couple of, you know, phones that can run small models locally. So, yeah, we'll have a complete list of, kind of, different PCs right now or Macs that can run these models because, yes, you need newer devices with new GPUs, very powerful, you know, processing. So, yeah, you're gonna need probably something that's come out in the last 3 to 6 months in order to, you know, really leverage this. Alright. That's it, y'all. I hope you enjoyed a somewhat, you know, random look. Right? We kinda went all over the place on this one, but I gave you 14 facts you need to know about small language models.

Jordan Wilson [00:36:03]:
We talked about the big differences. We talked about what they are, from parameters to the future. So if you want more on this, we're gonna break it all down in our daily newsletter. So go to youreverydayai.com. Sign up for that free daily newsletter. We're gonna get to some of the questions we couldn't get to live and more, as well as more AI news, more fresh finds from across the web, our daily tutorial. Check it all out at youreverydayai.com, and join us tomorrow.

Jordan Wilson [00:36:27]:
Join us tomorrow. We're gonna be talking about how AI is a creativity enhancer and not a creativity replacement. So make sure to join us tomorrow and every day for more Everyday AI.
