Ep 256: Microsoft’s VASA-1 AI Deepfake: So good it’s dangerous?

Potentials and Risks of Microsoft's VASA-1 AI Deepfake Technology

In the rapidly progressing world of AI, Microsoft's VASA-1 deepfake model has been causing a stir due to its advanced capabilities. The model can generate highly realistic talking-head videos from a single image and an audio clip, potentially opening many doors on both technological and ethical fronts.

The Impressive Technicalities of VASA-1

Built with impressive accuracy and detail, VASA-1 is capable of creating lifelike talking faces that match different voices in near real time. Demonstrations have shown the technology convincingly manipulating and controlling facial features and expressions to match various tones. Paired with detailed post-processing controls, the prototype requires minimal processing power, providing near real-time functionality.
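Based on the published demos, the model's interface boils down to a simple contract: one still image plus one audio clip, optionally steered by control signals, yields a talking-head video. Here is a minimal Python sketch of that contract; every name in it is hypothetical, since Microsoft has published no API for VASA-1:

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class ControlSignals:
    """Post-processing controls demonstrated in the VASA-1 paper's videos.
    Field names are illustrative guesses, not a real Microsoft API."""
    gaze: tuple = (0.0, 0.0)             # eye-gaze offset: (horizontal, vertical)
    head_distance: float = 1.0           # framing: <1.0 close-up, >1.0 wide shot
    head_pose: tuple = (0.0, 0.0, 0.0)   # pitch, yaw, roll in degrees

def describe_request(image_path: str, audio_path: str,
                     controls: Optional[ControlSignals] = None) -> dict:
    """Summarize a hypothetical generation call: a single still image and a
    single audio clip go in; a controllable talking-head video comes out."""
    controls = controls or ControlSignals()
    return {
        "inputs": {"image": image_path, "audio": audio_path},
        "controls": asdict(controls),
        "output": "512x512 talking-head video, near real time",
    }
```

The point of the sketch is how small the required input is: unlike digital-twin platforms that need multi-angle recordings, everything beyond one photo and one voice clip is an optional steering knob.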

Commercial Applications of Deepfake Technology

Deepfake technology has made significant strides in recent years. Platforms such as Synthesia, HeyGen, Hour One, and D-ID offer the capability of creating digital twins or AI avatars. However, the degree of realism offered by VASA-1 surpasses currently available models. Its commercial use cases could range from personalized training to breaking down communication barriers, enhancing individualized learning experiences and enabling new modes of communication.

Weighing Benefits against Dangers

While there is potential for constructive application, this technology does not come without its risks. With the world being increasingly connected and reliant on digital platforms for information, the potential for misuse becomes a realistic concern.

The advanced nature of VASA-1 raises ethical questions about the risk of it being used to spread disinformation. The capacity to falsify video content with this degree of accuracy can blur the line between reality and falsehood, making it difficult, if not impossible, for the masses to discern the difference.

The Future of AI and Deepfake Models

The conversation around AI and deepfake models is undeniably becoming crucial. Despite uncertainties about its public release, it's important to explore these technologies and their implications for society as a whole.

The pace of progress in AI is groundbreaking, but it's equally important that society keeps up. As we witness the creation of technology that could have a significant impact on our reality, understanding, managing, and utilizing it responsibly is paramount.

Deepfake technologies such as Microsoft's VASA-1 exemplify this challenge: embracing the promise of AI-driven innovation while mitigating its potential misuse. As this technology continues to evolve, these discussions are not only important but imperative for the future of AI and digital society.

Topics Covered in This Episode

1. Explanation of VASA-1's capabilities
2. Concerns around Microsoft's VASA-1
3. Discussion on deepfakes
4. Potential Risks and Ethical Considerations
5. Potential Benefits and Positive Use Cases

Podcast Transcript

Jordan Wilson [00:00:16]:
Is Microsoft's VASA-1 deepfake model so good that it's dangerous? Do we need this technology? Should we be avoiding it, or should we actually be rushing toward its development and adoption? We're gonna be talking about that today and more on Everyday AI. What's going on, y'all? My name is Jordan Wilson, and I'm the host of Everyday AI, and we are your daily guide to learning and leveraging gen AI to grow your company and to grow your career. So, yes, we do this every day. It is live. It is unscripted, and we are a daily livestream podcast and free daily newsletter helping us all. So, we're gonna get to that in just a second, but whether you're driving in the car or joining us live, make sure, if you haven't already, to go to youreverydayai.com and sign up for our free daily newsletter. And on our website, it's like a free generative AI university. So, literally hundreds of back episodes that you can go and watch and learn anything that you want.

Jordan Wilson [00:01:19]:
Alright. So before we talk about Microsoft's new VASA-1 deepfake AI technology, and I've got some takes on this, y'all, but before we get into it, let's start as we do every single day with the AI news. Some big stories today. So, Adobe has introduced some gen AI upgrades to Photoshop with its new Firefly Image 3 update. So Adobe has unveiled some new generative AI upgrades to Photoshop, including the new Generate Image feature powered by the new Firefly Image 3 AI model. So users can generate images directly within Photoshop by typing a text prompt or selecting from preset options. So the addition of Generate Image aims to help new users overcome that empty-page feeling and unleash their creativity. Adobe also enhanced the Generative Fill feature, allowing users to add a reference image, super helpful, for guided image generation.

Jordan Wilson [00:02:15]:
So a lot of these kind of, quote unquote, newer features that we were seeing in Midjourney are now making their way to Adobe Firefly. And all of these new features are available in the latest Photoshop beta app for desktop at $22.99 a month. Very specific price there, Adobe. So our next piece of AI news, we got all heavy hitters today. So we go from Adobe to Google. Google has consolidated its AI teams under DeepMind to strengthen its AI portfolio. So Google is consolidating its teams focusing on AI model development across its Research and DeepMind divisions. So responsible AI teams, emphasizing safe AI development, are being relocated from Google Research to Google DeepMind to enhance proximity to AI model building and scaling.

Jordan Wilson [00:03:05]:
So this move from Google, pretty big actually, but it comes amidst increasing global concerns about AI safety and calls for technology regulation. So not just that, but we also just saw, about a month ago, Microsoft kind of create its new division, Microsoft AI, and kind of took a lot of the AI development and put it under that arm. So we're seeing a similar approach from Google here, which I think is smart. Right? DeepMind is, I think, one of the leaders in the world when it comes to AI capability. So I think it's great that Google is moving its entire AI efforts under DeepMind. Last but not least, we go from Adobe to Google to Microsoft. So Microsoft has introduced its Phi-3 Mini model, a new small model that can run locally on phones. So apparently, Llama's reign didn't last too long, but Microsoft has launched the first of 3 small AI models, Phi. I think it's Phi.

Jordan Wilson [00:04:02]:
Is it "fie"? Is it "pie"? Does anyone know? But I believe it's Phi-3 Mini, and these models are meant to be more affordable and efficient for personal devices. So it performs similarly to larger models, but with fewer parameters, and it's trained using a curriculum strategy. So Microsoft did emphasize the quality of its training input versus quantity. So just much higher quality, which is, you know, allowing it to be a much smaller model with still very impressive benchmarks. So the company also plans to release Phi-3 Small and Phi-3 Medium, with 7 billion parameters and 14 billion parameters, respectively. So the Mini model is 3.8 billion parameters, and it is reportedly already outperforming Meta's Llama 3 8B. So that's important to know there, a 3.8 billion parameter model outperforming Meta's shiny new 8 billion parameter model, as well as reportedly out-benchmarking Google's Gemma 7B and Mistral 7B models on the MMLU benchmark. So, wow, a lot of AI news for today.

Jordan Wilson [00:05:10]:
We're gonna be breaking down those stories in a lot more depth in our newsletter, so make sure you go to youreverydayai.com. So let's get to the topic of today. Today's Hot Take Tuesday. Is Microsoft's VASA-1 deepfake AI tech so good that it's dangerous? So I'm gonna start and just give you the answer, because maybe you don't have time. Yes. It is too good for public consumption right now. Like, I need to tell people this. One thing I do here at Everyday AI is I look at literally every single, almost every single, piece of major new AI technology.

Jordan Wilson [00:05:44]:
I've used hundreds of pieces of AI software over the years, and this, the VASA-1, is by far one of the most impressive that I've ever seen. Right? I'd compare it up there with probably OpenAI's Sora, and, you know, we'll just put everything under one roof, but I would say the emergence of internet-connected large language models. Right? So those are the three things that I've been most impressed with. So the internet connectivity of large language models, OpenAI's Sora, and this new Microsoft VASA. So it is not available yet. Alright? So we're gonna take a close look. We're gonna do a breakdown, and I'm also gonna tell you 7 things that you need to know about this model. But, just as a reminder, this is for you all.

Jordan Wilson [00:06:32]:
Right? So in our newsletter yesterday, I asked what you guys wanted for today's Hot Take Tuesday. You all said this. But I'm also curious, you know, for our livestream audience, are you worried about AI deepfakes? Let me know. Or also, if you're listening on the podcast, I always love reading emails. I keep, you know, our email in there, my LinkedIn, so connect with me. But I wanna know from you, are you not very worried? Are you more excited? Are you kinda worried, kind of excited? Are you more worried than excited, or are you just pretty worried? Right? So I'm curious where our livestream audience stands on this. Right? If I had to choose, I would say I'm probably C. I'm probably more worried than excited overall.

Jordan Wilson [00:07:19]:
Don't worry. We're gonna break this down, but I'm really curious where our livestream audience stands, and it seems like a lot of people are kind of worried but kind of excited, you know, which is interesting. And, hey, Douglas. I love this. Douglas is currently in the air. Douglas, you might be the first person joining Everyday AI from tens of thousands of feet in the air. So thanks for that. So you know what Michael said here, and we're gonna get into this, he said it's already difficult to tell what is real and what's not, even with this knowledge of the tools.

Jordan Wilson [00:07:53]:
Absolutely. Michael, I think that's a great point, and that's something that we're gonna be talking about here on today's show. So let's just go into it. Let me just tell you exactly what this new VASA model is. And again, maybe some of my friends from Microsoft can tell me, is it VASA or VASA? Right? That's the only thing with all these models that get released. Everyone's, you know, pronouncing them differently until, you know, someone from Microsoft goes on an interview and tells us all, but we'll just call it VASA for now. So here is what VASA is, and then we're gonna give you all a live look and a live listen for our podcast audience as well. So, essentially, it is a deepfake technology.

Jordan Wilson [00:08:37]:
Right? Whether you wanna call it that or not, this is Hot Take Tuesday. I'm saying it like it is. This is a deepfake technology. There's a big difference between a digital twin or a digital clone or a, you know, AI avatar versus deepfake technology. Right? So essentially, what this boils down to is this new VASA-1 research paper from Microsoft. Again, it's not public. It's not out. Right? But you can go look at the results, and it allows you to take any image and a voice and create a talking head, more or less.

Jordan Wilson [00:09:12]:
Right? There's obviously great applications, which we're gonna be talking about, but I'd say right now, the danger of this far outweighs the potential benefits. Right? This is one of those AI tools where I'm like, okay, is this a solution, quote unquote, looking for a problem? Maybe it is, maybe it's not. And again, I think there's great positives, there's great plus sides to this, but the biggest difference between deepfake technology and, you know, digital twin or AI avatar technology is its use cases. Right? And for the most part, and we're gonna talk about this, there's great kind of digital twin or, you know, AI avatar platforms out there. But they don't wield this much power. Right? Where you can literally take any face and make it say anything you want in a very realistic voice with eerily similar movements to a human being. Right? So instead of me trying to spend, you know, another 10 minutes describing this, let's just go ahead. Let's take a look at this technology in action.

Jordan Wilson [00:10:25]:
Okay? So, if you're on the podcast now, what we're doing here is we're going through the research paper. So we shared this yesterday actually in our AI News That Matters Monday recap, as well as I think last week. So if you haven't taken a look at this yet, I encourage you to do so. Alright. So I'm just gonna go ahead. I'm gonna play 1 or 2 samples, and again, for our podcast audience, I'm gonna try to do my best to explain what's going on here. But again, as a reminder, all this is is a single image, a single audio clip, and then the user, again this isn't public, but then the user has control over a lot of different things. What Microsoft is calling control signals.

Jordan Wilson [00:11:09]:
Alright? So let's just go ahead. We're gonna watch maybe 10-second clips of a couple of these. So here is the first one. So let's just go ahead, take a watch, take a listen.

AI [00:11:19]:
If you plan to go for a run and you don't have enough time to do a full run, do part of a run. If you plan to go to the gym today, but you don't have the full hour that you normally work out, do some push ups.

Jordan Wilson [00:11:33]:
The crazy thing about that one is it looks like it is maybe intentionally adding this, like, static-y background noise, which makes it seem even more human and realistic, because not all of these voices have that. Alright. Let's go ahead and watch and listen to a couple more. Again, you know, if you are joining on the podcast, it shows you the single image that this was created with, and then you can click play and watch and listen, and all of this is generated through this VASA-1 model.

AI [00:12:00]:
Surprises me still. I ran it on someone just last night. It was fascinating. You know, she had complained of she had complained of shoulder, like, pain in her arm. It was excruciating.

Jordan Wilson [00:12:14]:
Do you hear that? The tone changes, the hesitation, right, almost like sometimes I know I stutter on the show or I, you know, kind of mutter and, you know, go off on these little side tangents, like they're doing this as well, which is crazy. Just the varied inflection in the voice, the cadence changes. So number 1, even if this was a text-to-speech software, I'd be like, woah. Okay. That's pretty impressive. Right? Like, we're talking like ElevenLabs quality already, maybe even better, off the bat, which is pretty impressive. So now let's just listen to one more, and then I wanna know from you all, what are your thoughts on this? Right? And if you do have questions, please get them in, but let's just go ahead and watch and listen to one more example, and then I'm gonna show you some of these advanced capabilities that I think actually make this a little scary. Alright.

Jordan Wilson [00:13:15]:
Let's go ahead and listen to this one.

AI [00:13:17]:
But you can imagine I have

AI [00:13:18]:
a lot of questions. So, I'd love to begin with you firstly just because I I read that you started out in advertising, and now you run a wellness business.

Jordan Wilson [00:13:30]:
Alright. So, yeah, you're kinda seeing where this is going, right, with that example there. Some more on that in a second. Okay. So now I want to talk a little bit about some of these additional features. So this one here, and for our podcast audience, essentially, you can set the eye gaze in different directions. So there's gonna be 4 kind of videos playing at once with only one piece of audio, with the eyes going in different ways. Ready?

AI [00:13:58]:
I would say that we as readers are not meant to look at him in any other way but with

Jordan Wilson [00:14:05]:
Okay. That's extremely impressive. One voice. And this, I cannot emphasize enough for our podcast audience. Check your show notes. You can come back and watch this on LinkedIn or YouTube or, you know, wherever. You have to see this. If you haven't seen it yet, you need to understand the quality.

Jordan Wilson [00:14:25]:
I cannot emphasize enough. This looks extremely human. Alright. So now we have one more example, and this is showing kind of different sizes of the head. Right? So you can have kind of a super close-up or something that's a little more wide. So here we go with this one.

AI [00:14:43]:
But you can imagine I have

AI [00:14:45]:
a lot of questions. So, I'd love to begin with you.

Jordan Wilson [00:14:49]:
And then you can see obviously different voices for the same image. Right? Different languages. Right? So now let's go, and this is probably the one that's, you know, kind of taken off and gone viral on social media. But this is also important to note about the training data and the training set. So I kinda wish Microsoft told us a little bit more about what it trained on. However, it did say that a lot of these capabilities are obviously generative. Right? They're generative. So it's not like, oh, it's been trained on, you know, a million fake faces.

Jordan Wilson [00:15:29]:
No. This is generative. So you can create, obviously, faces and videos that did not exist in the training set. So I'd say that's a worrying piece of it, but so impressive. Right? I keep personally being torn between "this is extremely troublesome" versus "this opens up so many doors and is extremely exciting." But let's just go ahead. So this is probably the one that's kinda taken over the Internet, because it also works on works of art. So, yes, you know, the examples that I played so far look extremely realistic, extremely human, and we're gonna break this down at a more detailed level.

Jordan Wilson [00:16:11]:
But here we have the Mona Lisa singing and rapping. Let's take a listen, take a watch.

AI [00:16:18]:
Yo. I'm a paparazzi. I don't play no Yahtzee. I go pop, pop, pop, pop, pop, pop my cameras up your crotch. See, I tell the truth from what I see and sell it to Perez Hilton. Don't call me Skuzzy, making money. That's my job. Celeb photography.

AI [00:16:34]:
What? Hell no. I'm not needy. I'm legit, not staccarazzi.

Jordan Wilson [00:16:38]:
Wild. It's wild. Right? I can't explain the level of detail in the faces. Right? I've watched some of these clips over and over and over. As someone that's, you know, on camera every single day, I see this. I see the power of this, and this is something that, you know, and we're gonna draw the line here between deepfakes and, you know, digital twins or AI avatars. And the level of detail on the face, right, eyes going in and out, eyebrows rising, you know, wrinkles. Right? Wrinkles in your crow's feet area, right, in the corner of your eyes.

Jordan Wilson [00:17:19]:
You know, all of these traits that we feel are uniquely human and have never been touched by current AI technology, no longer. Right? Which is what makes this both so scary and so incredibly promising if used correctly. Alright. We're gonna look at just 2 more quick examples here. So this one goes to show, so we have the same voice in 3 completely different faces. Right? So this is where, maybe you saw the first ones where it's just one person talking at a time, and you're like, oh, this is great. You know, I don't really see a problem with this, until you see, okay, here's 3 people, you know, 3 unique faces and one piece of audio.

AI [00:18:04]:
Will prevent those cavities from getting worse and prevent new cavities. Just because you treat cavities, it doesn't mean they can't get cavities in any other tooth or

Jordan Wilson [00:18:16]:
So that's wild. So in this one, you have, you know, 3 different women, 3 different ethnicities, 3 different ages, you know, talking about cavities. Maybe I should listen to them, and then I'd be able to spend more time on Everyday AI and less time at the dentist. But still, this is one of those where I think most people will see that and get worried. This is one where the marketing and the advertising part of my brain is just going off. Right? Because you're like, oh, wow. Now you can quickly A/B test. Right? You can quickly, you know, put a bunch of these videos on your website and see which one's resonating the most with your target audience.

Jordan Wilson [00:18:54]:
Right? And to be able to do that quickly and at scale is wildly exciting about the type of possibilities that this opens up, but, obviously, it's troubling. It's troubling. Right? So the last one that we're gonna look at here, and then I'm gonna tell you 7 things that you need to know and get to your questions. So if you do have questions, please drop them in now. So the last one that I'm gonna show here, for our podcast audience, this essentially just gives you different controls. So in this video, it's going to show the different controls that you have. So it's things that, if you are a video editor, you're probably familiar with these. So things like pitch and roll and x-axis and y-axis, gaze.

Jordan Wilson [00:19:39]:
Right? So it's not just random. Right? You have so much fine-tuned control. And again, all it takes is a single image, a single piece of audio, and you have all of these controls. So when I hit play here, you're gonna see, you know, presumably this is a researcher at Microsoft who's, you know, kind of recording their screen as they're doing this. But you can have such fine-tuned control over this, you know, deepfake clone here, changing every aspect of it. So let's take a watch and a listen, and I might narrate on this one as well, or hit pause.

AI [00:20:18]:
So I decided to focus all my attention, all my time on listening. So instead of doing something else, I just listen.

Jordan Wilson [00:20:29]:
Alright. So right there, we switched. We went from a presumably female speaker to now a presumably male speaker, even though it sounds like a female. Right? So just with the click of a button, didn't have to regenerate, didn't have to rebuffer. That's another thing we're gonna talk about: the latency, reportedly, on VASA-1 is amazing, near real time.

AI [00:20:49]:
Listened and listened. Because I'm a true believer that if you're really bad at something like listening, for example, it only shows you that, hey, you have to practice

Jordan Wilson [00:21:00]:
Okay. So now what's happening is someone just put in a new kind of script. Right? And I believe that they're gonna show this generating in real time, and then this is what I want to kind of talk to everyone about and show everyone.

AI [00:21:14]:
Listening as much as you can.

AI [00:21:17]:
We introduce VASA, a framework for generating lifelike talking faces with appealing

Jordan Wilson [00:21:23]:
There we go. Right there. So now we are seeing something that makes this, again, both awesome and frightening. So what the user, presumably a Microsoft researcher, because according to Microsoft, that's all who really has access to it right now, in real time, they are generating new scripts, swiping between the different faces. Right? So, again, they're going from, as an example, male, female, young, old, same voice in real time, but also they are dragging and dropping on the face that is speaking, and then what they're doing is they're adjusting kind of the angle of the face. And I cannot explain how much computing power this would generally take. Right? And this is, again, happening in real time, no buffer, and then they're gonna be moving this face around. I'm gonna play this for just a couple more seconds.

AI [00:22:16]:
Visual affective skills, given a single static image and a speech audio clip.

Jordan Wilson [00:22:23]:
So I'll tell you this. As someone who, in a previous life, did a lot of videography and put together a lot of stories, so many times you would have to have maybe 2 or 3 cameras because you wanna catch these different angles. Right? Wow. All of a sudden, you don't need that anymore. You have infinite angles, infinite people, and they can say whatever you want, and it looks pretty realistic. So we're gonna get to why that's a problem, but also extremely powerful. Right? Alright.

Jordan Wilson [00:23:01]:
So, I told everyone that I'd give you 7 things that you need to know. Great question here from Tanya. So, Tanya, thanks for the question. Saying, if it's something we don't want to listen to, does it matter if it's real or fake? That's a great question. And also, it's like, okay, when or if the world gets access to this? Alright? But I don't think that that's going to actually be an issue, because regardless of what Microsoft does, I think that we're gonna be seeing this technology whether we want the public to have it or not. Cecilia with a great, great observation here, saying the facial expressions in the last two were scary real. Yes.

Jordan Wilson [00:23:46]:
They were. Alright. So let's go ahead. Let's go over the 7 things that you need to know about VASA-1. Okay. So, again, a real quick recap of what it is and what it does. So it is deepfake. That's what it is.

Jordan Wilson [00:24:01]:
Right? It is deepfake, real time. So that's another thing that they talk about in the paper: how little processing power it actually requires. Right? You don't need a $30,000 GPU. From what Microsoft is saying, it's a commercial-grade graphics card. So it is near real time. So what that means is, in theory, you might be talking to this person on a Zoom sales call, and it looks extremely realistic. Right? It only needs one photo and a source of audio.

Jordan Wilson [00:24:35]:
You have these kind of post-processing controls. It has very realistic head movements, controllable motion. And, again, the human-like tendencies that this captures is amazing. You know, right now, it's only, I believe, 512 by 512 pixels, so it's not, you know, HD quality or 2K or 4K quality yet, but I think that's probably where it's heading. Another thing that you need to know about this: I can almost guarantee that with this VASA model, they're actually on 2 now or on a next iteration. Right? It takes time to get your paper approved. It takes time, you know, to kind of go up the corporate ladder and say, are we going to release this? Is the world ready for this? So in theory, this technology could be 6 months old. It could be 18 months old. We don't know.

Jordan Wilson [00:25:30]:
Right? They could already be on VASA 2.5 that does 4K even faster. We don't know. All we know is that this is not today's technology. It has presumably already been greatly improved. Alright. So that's number 1. This is a quick overview of what it is. Number 2, VASA-1 is not released yet.

Jordan Wilson [00:25:51]:
Alright? So Microsoft did say that they do not want to release this to the public until they feel it is safe to use. But hey, Hot Take Tuesday coming out. When would this be safe? That's an honest question. Like, I don't know. Livestream audience, what do you think? When would this ever be safe? Again, I understand the upsides. I understand the overwhelmingly positive impact this could have on society.

Jordan Wilson [00:26:22]:
But is there a scenario where this is actually safe to put, especially, in the hands of the public? Again, I don't know. Are they gonna have certain safeguards? You know, as an example, maybe you can't upload your own photo? Maybe you can only use their preset, you know, photos, or, you know, AI-generated ones? I don't know. Right? There's so many unknowns. There's not a lot of knowns. The research paper is not very long, and, again, this isn't available. Right? So if this ever does become available, it could, in theory, be even more robustly powerful, or it could be stripped down from, you know, what we just saw. So we don't know, but it is not released yet. But they're also not the only one.

Jordan Wilson [00:27:09]:
Right? So China's Baidu. Baidu, BeiDou. Gosh. I always forget, guys. I say so many names that I forget pronunciations, but China's Baidu also had a similar model called EMO, E-M-O. I don't think it was nearly as impressive. It was about 6 months ago, so they probably have, or maybe not 6. Maybe it was about 3 months ago.

Jordan Wilson [00:27:31]:
But I'm sure they already have a newer version. But regardless, even though this is not released yet, and Microsoft didn't necessarily detail plans on when or if it would be released, they said that they need to ensure that this is safe. I don't see a scenario where this is safe. Sorry. I don't. I think there's way more potential for misuse than there is for positive use cases. Again, I think this is a very cool solution looking for a problem, but I don't think it matters. Right? It's not like Microsoft is the decision maker on this.

Jordan Wilson [00:28:11]:
Like we said, there's already been a pretty impressive variation of this with Baidu's EMO model. So I think whether Microsoft releases this to the public or not, there is going to be another company, whether it's Google DeepMind, whether it's, you know, someone we've never heard of, creating this. Alright. Number 3 thing you need to know. Similar technology is already public. Right? And, again, this is where we kind of have to differentiate between what is a deepfake versus what's a digital twin. There's already great platforms. So as an example, Synthesia, HeyGen, Hour One, D-ID, etcetera, where you can already create a digital twin.

Jordan Wilson [00:28:53]:
For the most part, there's 2 sides of this. So one is, yes, you can create one of yourself. Right? You can, you know, do a green-screen setup, you know, record yourself, different angles, all that. But that is a much more detailed process. Right? I actually have a secret project that I'm working on that I'll probably tell you guys about soon. I love that part of the technology. Right? Where, if you wanted to, right, so let's just say you're head of HR, you're head of learning and development, and you can only do so much. Right? Okay.

Jordan Wilson [00:29:26]:
Well, for control purposes, you know, generally you have to sign a lot of documents, and, you know, there's a process to go through in creating a digital avatar, but I like that aspect of it. Right? Like, imagine if you're in charge of learning and development at a huge company and you don't have enough time to train people. Maybe you wanna train people on AI, but you don't have the time. Okay? There's a cool use case. So similar technology is already available in these kind of digital twin or AI avatar spaces: Synthesia, HeyGen, Hour One, D-ID. There's a couple others. Right? But it is not nearly as powerful, and maybe these companies made it that way. Right? Maybe they said, we don't want it to be, you know, where you can go choose literally any photo and make it say anything, although you have a little bit of that, but it doesn't look as realistic.

Jordan Wilson [00:30:16]:
So in this new Microsoft research paper, VASA-1, the examples look so human it is scary. As someone that's watched, you know, a countless number of these new products with digital avatars, for the most part, you can tell that they're digital. You can tell that they're AI, which I think is actually a good thing when it doesn't look too realistic, because otherwise you start to blur this line between what's real and what's fake. Alright. Number 4: VASA-1's quality is outstanding. So like I said, the realism is uncanny, and the control for deepfakes right now is something that is not publicly available. So even as an example: the different zooming in and zooming out of where you want the head, the eyes looking in different directions, being able to click on something in real time and essentially generate all these different angles of someone talking live.

Jordan Wilson [00:31:15]:
The quality and the realism are outstanding. Again, I'm sure there's already a 1.5 or a v2 that researchers are building that's way better, that's HD, that's even faster, that requires even less compute. Alright. Thing number 5 you need to know: there are so many positive use cases for VASA-1.

Jordan Wilson [00:31:35]:
Yes, we talked about the negatives, but training, personalized learning and development, helping to break down communication barriers. That's a great one. Right? You know, people that have certain learning disabilities or cognitive impairments. Right? This technology could be extremely helpful. Maybe there are people out there that don't learn well or can't understand things without really looking at a person speaking to them. Right? Or maybe they just learn better that way.

Jordan Wilson [00:32:07]:
So I think there are so many positive use cases that can change the world in a good way, that can help society. Right? So, obviously, I can't overlook that. But number 6 is: it's so good, it's bad. Sorry. This technology is so, so good, I just think it's bad. You know? And I am the type of person that loves deploying AI out in the wild. Right? If I had a client, and they said, hey, Jordan.

Jordan Wilson [00:32:41]:
We wanna start doing all this with this VASA model. Again, it's not openly available. It's not public. I would say, are you sure? Are you sure you want to? Right? Again, spending time doing this every single day, I think that I'm personally, quote, unquote, ahead of the AI curve, and maybe this is just the old school in me, you know, old man Wilson shaking his fist on the porch saying, ah, this new technology. Right? I don't know. I don't know about this one. Right? Especially when we talk about misinformation and disinformation. Right? I said it yesterday, but, you know, imagine this before the US election.

Jordan Wilson [00:33:28]:
Imagine if you could, right? And, again, the technology's there, but it's not good anywhere else, and it takes a long time. Imagine if this was available now. Imagine if someone could take your photo and make you say anything. Again, we don't know when and if this is released, or what the capabilities are. You know, I'm not saying that that is going to be a feature or an option in this model or others, but that's where this technology is heading. And I guess it's lucky it's a company like Microsoft, who I think is doing things in a pretty ethical and responsible manner, but I can guarantee you that this technology very soon will be replicated by bad actors. Right? Not just big tech companies. It could be other countries using this for disinformation, and maybe we just don't know.

Jordan Wilson [00:34:20]:
So I do think it's important to talk about this, to have a conversation. And yes, as excited as we could be to say, oh, I wanna use this for this project and this project, we have to say, okay, wait. Do we really need that? Because I do think, at least early on, the potential for it being bad far outweighs the potential for it doing good. Just because I think the overwhelming majority of people, maybe not you, right? Like, if you're tuning in here every single day, if you are someone that's using generative AI every single day, I think you would probably have a little bit better of an understanding of how this could be properly used in society and how maybe the good could outweigh the bad. But you are in the 1%. You're in the 0.1%. Right? If you're tuning in here every day, if you're pushing generative AI use at your company, you are still in the minority.

Jordan Wilson [00:35:14]:
You know? Because 99.9% of people that would see something like this would not know, and that just poses so many implications. And then, last but not least, I think if nothing else, VASA-1 is going to prepare us for a new normal. Whether we want this or not, it's coming. You know? I know it's weird to say, but whether you like it or not, this is coming. Again, maybe not from Microsoft, maybe not from these other companies that I named, but there are going to be maybe unnamed companies.

Jordan Wilson [00:35:57]:
This is going to happen. We are going to see deepfakes that are so incredibly realistic of people like you and me, whether we give our consent or not. This is where the technology is heading. No, I'm not a doomsday AI prepper. You know, I'm not freaking out here on the Everyday AI Show. I'm being realistic. The average person does not get a say in where this technology heads, and the capabilities and the power of what we've seen in this VASA model, I think, will unfortunately inspire bad actors.

Jordan Wilson [00:36:39]:
So we have to understand that whether we want this technology to exist or not, it probably is going to. Right? And I think that's an important conversation to have. So let me know. What are your thoughts? Is this good? Is it bad? Again, we don't know if we're gonna see VASA-1 in public, but what are your thoughts? And a quick recap. Right? Here are the seven things you need to know. Number 1, what VASA-1 is: it is a deepfake technology. One photo, one audio clip, tons of control. Number 2, it is not released yet.

Jordan Wilson [00:37:18]:
Microsoft said it wants it to be safe. We don't actually know if they'll even release it. Number 3, similar technology is already public. So we have the digital twins and AI avatars from companies like Synthesia, HeyGen, Hour 1, DID, etcetera. But then we also have very similar technology, like the EMO model from China's Baidu. Number 4, the quality is outstanding. You cannot tell. I mean, you literally cannot tell by looking at it.

Jordan Wilson [00:37:44]:
I'm staring at it on an HD screen, and you can't tell. It is uncanny. Number 5, there are so many positive use cases for this technology: training, learning and development, helping break down communication barriers. Number 6, I think it's so good it's bad. The potential for misinformation and disinformation, political campaigns. You know, personally, I think there's maybe more bad than good. And number 7, I think this prepares us for a new normal. Again, maybe VASA-1 is just putting this all on our radars. Maybe this will never become public.

Jordan Wilson [00:38:17]:
Maybe this is where it stops, but this is not where the story of AI deepfakes ends. Alright, y'all. I hope this was helpful. Hey, like what Moses says: it's good, but all things can eventually get used for evil. I agree. This is an important conversation to have. So, hey, if you're listening on the podcast, make sure to check out your show notes.

Jordan Wilson [00:38:44]:
Let's keep the conversation going. You know, if you're joining on LinkedIn, YouTube, wherever, I think there's a ton of very smart people listening in, so talk with each other. I think this is an important conversation to have. So thank you for tuning in. Make sure to join us tomorrow. We're gonna talk about generative AI and how you can turn trash into treasures with James Daniel, the VP of AI from LanzaTech, as well as Thursday. Oh my gosh. Can you guys believe we've been doing this for a year? We're still here.

Jordan Wilson [00:39:12]:
We're still around. Thank you for that. So we are celebrating our one-year anniversary of the Everyday AI Show by going back and redoing our very first show. Our very first show posed the question: will AI take our jobs? So one year later, we're going to ask that same question. Just like today's conversation, I think it's important. You know, we don't like to talk about the downside of AI. We like to talk about, oh, I used ChatGPT to do this, and I gained back all this time.

Jordan Wilson [00:39:46]:
That's great, but we have to ask the hard questions. That's what we do on Everyday AI. Thanks for tuning in. If this was helpful, let me know. Please consider leaving us a rating and sharing this. You know, if you're listening on social media, hit that repost. It takes us sometimes 10 to 15 hours to put together one single episode. It takes you 10 to 15 seconds to repost this, share it with your network, and tag your friends.

Jordan Wilson [00:40:11]:
Thank you for tuning in. Go to youreverydayai.com for more. Hope to see you back tomorrow and every day for more Everyday AI. Thanks, y'all.
