Ep 218: Winning the Probability Game in AI Visuals

Navigating the Unpredictability of Generative AI

In the realm of artificial intelligence, the unpredictability of generative AI technology presents unique challenges. However, it also opens up unprecedented opportunities to innovate and specialize in the image and video categories where these models already perform well. A results-oriented approach, such as focusing on proven image styles or marketing videos, boosts the probability of success.

Harnessing Creativity in AI Technology

The era of AI-influenced business puts creativity front and center. Examples abound of AI technology being effectively deployed for both professional and personal projects. The process, however, often means starting with small steps and progressively refining the strategy.

The Magic of Pika Labs in Image Animations

Imagine the power to animate your favorite images and turn them into captivating short videos. Platforms like Pika Labs have made this possible, allowing the generation of 3-second videos from static images. The key lies in a trial-and-error methodology, playing with elements such as the direction of the camera and the strength of motion to perfect the output.
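As a rough sketch of that trial-and-error loop, the snippet below queues one low-motion clip per camera direction so the candidates can be compared side by side. The `generate_clip` helper, its parameters, and the direction names are illustrative placeholders, since Pika's controls live in its web interface rather than a public Python API.

```python
# Hypothetical helper standing in for whichever image-to-video tool you use;
# the function name and parameters are illustrative, not a real Pika API.
def generate_clip(image_path: str, camera: str, motion: int) -> str:
    """Submit one generation job and return a path or ID for the resulting clip."""
    raise NotImplementedError("plug in your video tool of choice here")

CAMERA_MOVES = ["pan_left", "pan_right", "tilt_up", "tilt_down"]

def roll_the_dice(image_path: str) -> list[tuple[str, str]]:
    """Queue one clip per camera direction, changing a single variable at a time."""
    results = []
    for camera in CAMERA_MOVES:
        # Keep the strength of motion low (1 on Pika's 0-4 scale); higher
        # values tend to distort the subject, as discussed in the episode.
        clip = generate_clip(image_path, camera=camera, motion=1)
        results.append((camera, clip))
    return results
```

Reviewing the candidates by eye and then re-rolling only the winning direction mirrors the workflow demonstrated later in the episode.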

Rolling the Dice in Generative AI

In this unpredictable sphere, patience is a virtue. While the end results may be enticing, creating these visual feats does not happen in the blink of an eye. It's a matter of rolling the dice, setting parameters, and waiting as multiple videos generate in the background. The payoff, however, lies in the understanding gained from continuous testing and optimizing.

The Art of Crafting Consistency in Movement

When it comes to creating immersive videos, maintaining consistency in movement becomes vital. Experimentation with different strength levels of motion aids in deepening one's understanding and control over the final output. A common challenge is to prevent excessive motion from lowering the quality of the videos.

Avoid the Pitfalls: Lessons from AI Giants

While the landscape of generative AI is full of promise, lessons from industry titans such as Meta and Google show the potential pitfalls of AI technology, like biased and offensive outputs. They underscore the importance of thorough testing and transparency in handling AI applications.

Generative AI: A Process of Constant Evolution

Generative AI unfolds as an ongoing game of probabilities. A single solution does not fit all: the number of generations needed to achieve a desirable outcome varies with the model and the subject at hand. But it is through this iterative process that challenges such as the inconsistency of text-to-video generation can be overcome.
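To make the probability framing concrete, here is a back-of-the-envelope calculation; the per-generation success rate is an assumed figure for illustration, not something measured in the episode.

```python
# Illustrative arithmetic only: p is an assumed per-generation success rate.
def chance_of_at_least_one_hit(p: float, n: int) -> float:
    """Probability of getting at least one acceptable output in n independent tries."""
    return 1 - (1 - p) ** n

print(round(chance_of_at_least_one_hit(0.10, 10), 2))  # 0.65
print(round(chance_of_at_least_one_hit(0.10, 30), 2))  # 0.96
```

Even a low hit rate becomes a near-certainty with enough rolls of the dice, which is exactly why iterating matters more than any single prompt.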

Achieving Mastery over Media Outputs

Turning the challenges of AI visuals into rewarding successes demands organization and a deep understanding of the process. Keeping track of successful prompts for future use is a valuable practice. Attention to usage and creative rights also plays a crucial role, as these are governed by the terms of the particular platform used.
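One lightweight way to keep those successful prompts reusable is to log every generation next to its output. The sketch below is just one possible convention, assuming a per-project folder layout; nothing in the episode prescribes this exact format.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def log_generation(project_dir: str, prompt: str, tool: str,
                   settings: dict, output_file: str, keeper: bool) -> None:
    """Append one generation record to a per-project JSONL log."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool,              # e.g. "dall-e-3", "pika", "runway"
        "prompt": prompt,
        "settings": settings,      # camera move, strength of motion, etc.
        "output_file": output_file,
        "keeper": keeper,          # mark the generations worth reusing
    }
    log_path = Path(project_dir) / "generations.jsonl"
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

Records flagged as keepers can then be pulled back out of `generations.jsonl` the next time a similar clip is needed.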

Inherent Limitations of Image Models

While AI image models are advancing at a rapid pace, limitations do exist. Creating images embedded with text is one such challenge requiring repeated attempts to perfect. Even more complex is the task of animating these images into coherent videos.

The Investment of Time and Effort in AI Projects

Creating quality AI visual content necessitates a significant investment of time and effort. A single project might involve overcoming several obstacles to reach the desired output. Nevertheless, the end result is likely to be an innovative and thoroughly engaging visual treat.

The world of generative AI visuals is an enticing and increasingly important domain. Finding success in this field is akin to playing a complex game where patience, creativity, and constant evolution hold the keys to winning.

Topics Covered in This Episode

1. The unpredictability of generative AI visuals
2. The process of generating AI images and videos
3. Organizing media outputs and the importance of creative rights
4. Limitations and challenges in AI images and videos


Podcast Transcript

Jordan Wilson [00:00:16]:
If you want better AI images and better AI videos, sometimes you gotta roll the dice and play the probability game. But don't worry. You don't have to do it blindly. We have an expert today that's gonna share his secrets and hopefully allow us all to get better AI images and video. So what's going on y'all? My name is Jordan Wilson, and I'm the host of Everyday AI. So we're a daily livestream podcast and free daily newsletter helping everyday people learn and leverage generative AI. So if you are joining us on the podcast, we appreciate that. Make sure to check your show notes.

Jordan Wilson [00:00:56]:
Today is gonna be one of those shows that's a little more visual. So you might wanna, check out the show notes and watch the video. I think we're all gonna learn a lot. If you're joining us on the livestream, thank you. We appreciate that. M David joining us and Tara joining us from Nashville. Thank you. So before we get in and we're gonna walk everyone through it, I think today is gonna be a very fun visual, and interactive episode.

Jordan Wilson [00:01:20]:
But before we get there, let's start as we do with the AI news. And as a reminder, you can always go to your everydayai.com and sign up for that free daily newsletter. Unlike the other guys, we're actually written by humans. I'm a human. I write the newsletter. So I'm a human now, and I'm gonna read the news. Alright. So here's what you need to know for AI news.

Jordan Wilson [00:01:39]:
So Meta's Llama 3 is reportedly in the works. So Meta is set to release an open source language model, Llama 3. And Llama 3 is designed to be more responsive to users and provide context for difficult topics rather than blocking queries. It'll also have the ability to differentiate between words with sensitive or harmless meanings depending on context. And then Llama 3 is estimated to have more than double the parameters of its predecessor, Llama 2. Alright. Our next piece of AI news, speaking of AI images. So Google CEO Sundar Pichai has publicly apologized to employees for the Gemini AI image debacle.

Jordan Wilson [00:02:21]:
So Google CEO Sundar Pichai has issued the apology for the release of the company's artificial intelligence tool, Gemini, after facing backlash for its biased and sometimes offensive results. So the tool was meant to create diverse images with the built-in Imagen 2 image model, but instead produced images of, as an example, America's founding fathers as Black, the pope as a woman, and other historically inaccurate and biased images. Google has also announced that it's planning to relaunch the AI image generator in a few weeks. Alright. Our last piece of AI news, at least for the podcast here, did The New York Times hack OpenAI? Well, that's what OpenAI is alleging. So in a new filing, OpenAI has accused The New York Times of, quote, unquote, hacking their products by using deceptive prompts to generate copies of New York Times articles in violation of their terms of service. The New York Times had filed a lawsuit against OpenAI for copyright infringement, showcasing how OpenAI's GPT models can produce verbatim copies of their content. But OpenAI did claim that the hacking was a rare error in their systems' learning process, which could be addressed.

Jordan Wilson [00:03:37]:
The New York Times obviously has been arguing that their actions were merely a search for evidence of copyrighted content within the AI models. And there is an ongoing debate between OpenAI and The New York Times regarding the use of manipulated prompts to reproduce copyrighted copy. So, super interesting. If you're interested in that, I actually did a full, like, 1-hour very deep dive. I don't know anyone else that went in as deep as I did. So I'll make sure to leave that in the show notes so you can go take a look. But there's always more, so make sure to go to your everydayai.com and sign up for that free daily newsletter. Alright.

Jordan Wilson [00:04:17]:
I'm excited. We're not gonna talk about AI news all day. We're gonna talk now about how you can improve your AI visuals. Right? So if you've ever used an AI image tool like Midjourney or an AI video tool like Runway or Pika Labs, you know, maybe you've got great results, maybe you didn't. But our guest today is going to help us hopefully get much better results. Alright. So let's go ahead and bring him to the show. There we go.

Jordan Wilson [00:04:43]:
We got him live here. So, Tianyu, a visual artist and founder of TYAI. Tianyu, thank you for joining the show.

Tianyu Xu [00:04:51]:
Hi, Jordan. Hi, everyone. Thank you very much. I'm based in Singapore.

Jordan Wilson [00:04:55]:
Yeah. Yeah. Well, it's yeah. Very good morning for me. It's a good evening in Singapore there. So thanks for joining us halfway across the globe, Tianyu. But maybe tell us a little bit about what you do as a visual artist and founder of TYAI.

Tianyu Xu [00:05:10]:
Yeah. Actually, my background is quite far away from an artist. I won't call myself an artist even now. So I have a background in market research, social media analytics, and advertising sales. So I ventured into generative AI almost at this time last year. It's very addictive to me personally, especially when you're creating the visuals and videos and chatting with ChatGPT. Last year, I consulted in market research, data analytics, and content production, of course, with the help of AI, and only at the end of last year did I discover that, well, education should be my focus, because eventually generative AI will reach everyone, and it's an essential skill for anyone to be successful in the future workplace.

Jordan Wilson [00:06:01]:
That's that's so important because I think, Tianyu, there's there's a misconception. Right? Because, a lot of people think, oh, generative AI is is not for me. But I don't think you know, my personal take is you you don't really have an option. Right? So whether it's in, you know, 2 months or 2 years, I think the average person, even if you're not a creator or marketer or, you know, a visual artist, you're gonna be using, you know, some sort of generative AI tool. Is that kind of, like, your your thought on it? Is that kind of where you see things going?

Tianyu Xu [00:06:34]:
Yeah. That's my view as well. And I know it can sound very complex, especially when you hear the term prompt engineering. There's prompt and there's engineering. It doesn't sound like something that everyone would do, but actually, it's just communication. It's just a way to communicate with AI models. You used to communicate with people, with humans, and now we are communicating with AI models. In that way, I think eventually everyone actually has the potential to be a good prompt engineer, and everyone should be ready to learn these skills.

Jordan Wilson [00:07:14]:
Absolutely. So I'm curious, how did you transition from market research to now putting out, you know, very fascinating and great visual content, and we'll be sharing that in the newsletter as well so you can go look at a lot of his work. But how did you make that transition from market research to, you know, now all of a sudden you're putting out amazing and educational content on AI visuals?

Tianyu Xu [00:07:40]:
Yeah. This is a great question. And, you know, even if I'm doing the AI visuals, my methodology is still like market research. So I study the statistics. I study how the models are built, and then I look for the areas where the models are particularly good, and then I take an analytical approach when I design a prompt and, eventually, optimize the prompt. So whatever I've been doing in the past 12 months actually has a strong connection to my past experiences.

Jordan Wilson [00:08:19]:
And, you know, I'm curious, how did you get to the point where you are breaking it down at such an analytical level? Because I feel like a lot of people with AI visuals, you know, they'll just go in there, play around, try it once or twice, and then they kind of give up and they're like, alright. Well, you know, it's okay, but maybe I'll just wait for the technology to get better. So how did you really push yourself to go in and really just break this down, you know, like a market researcher? Like, how did it get to that point?

Tianyu Xu [00:08:52]:
It's very exciting as a market researcher. It's like a completely new field for you to venture into. And there's no manual for that. Even if you look at the official documentation from OpenAI, from Midjourney, from the other models, there's no manual that teaches you to talk to the model step by step. So the only way to learn is to experiment. And I start with the very fundamentals. I start with a very basic prompt and eventually build up all the variables and all the complex prompts and structures. So, yeah, I just start from the very basics.

Tianyu Xu [00:09:36]:
I think that's the way to talk to almost every large language model or image model.

Jordan Wilson [00:09:44]:
Yeah. That's hitting me right there. Right? Like, I had a whole episode yesterday, you know, kind of recapping everything about ChatGPT. And I think that's a perfect explanation: you should always start very simple with natural language and kind of see your results and go from there. So I'm curious, Tianyu, like, how many? Right? Like, when we talk about generations, and we'll talk through that process of what that means, but, you know, on average, did you start by doing, you know, 2 generations and now are you up to, like, 10, 20, 50? I mean, in general, when you're trying to get that perfect, you know, maybe AI video out of, you know, Pika Labs or something like that, how many generations are you sometimes doing?

Tianyu Xu [00:10:30]:
It depends on the model, depends on the subject. So, I post a lot of things about cats because it's really easy to do, and I just need to generate a few images of a cat in a certain style, and then it is good enough for me to showcase different art styles or different things you can do with DALL-E or with other models. But there are more complex topics. So, for example, if the model is not well trained on a subject, for example, Midjourney before V6 was unable to create mermaids or scorpions, so maybe you need to try 100 times to get a scorpion. Yeah, in that case, you will need more iterations. For the video models, it also depends on the subject. Some of the subjects are quite easy. For example, Runway and Pika are both very good at nature things.

Tianyu Xu [00:11:35]:
For example, moving water, waterfalls. Anything with water, you just need to create 1 or 2 videos, and then they will be almost perfect. But if you want to create something different, if you have an image, for example, I have a cat. I always use the cat. So I have a cat kayaking on white water. That could be difficult. That could take more tries. Maybe it'll take 20 or 30 tries to get one video right.

Jordan Wilson [00:12:08]:
You know, it's funny because, you know, if you're listening on the podcast, I'm kinda laughing. Same thing for me, because even with the original AI image generators, I did cats too. Like, I was trying to, you know, hey, shout out my cat, Rocky. Right? I was trying to generate pictures of him. Yeah. Something about cats is just, like, you know, fun. So a question here from Monica. So thanks for this question.

Jordan Wilson [00:12:31]:
So asking, can you explain a little bit more in-depth on what generations are? Yeah. We should probably explain some of this terminology. But, yeah, what's a a generation and, you know, when you're talking about doing them over and over, does that just mean that you're running the same prompt over and over, or are you tweaking things each time? So talk a little bit more about what it means to, you know, go through another generation.

Tianyu Xu [00:12:52]:
Yeah. Sure. This is a great question. So in generative AI, a generation simply means the generation of an image or the generation of a video. We don't tend to say create a video, because you don't actually create the video. You generate the video, generate the image, with the model. Yeah. And then, sorry, what's your question again?

Jordan Wilson [00:13:21]:
Oh, no. That was perfect. Just explaining those, because I think a lot of people on our show are, you know, using large language models a lot more. Some people might be using Midjourney or something like that. So I think it's important that we just kind of talk about what these models even are and what they do. Right? Because, you know, maybe it's just good to explain to the audience. So you have your AI image generators, where you can put in either text or a photo and get a photo on the back end. So tools like Midjourney and DALL-E. And kind of what we're talking about here a little bit more is these AI video companies, like Runway and Pika Labs, where you can either put in text and get video or put in photos and get video. So I'm curious, Tianyu, what's been your best process? Do you use photos to get videos? Do you use text to get videos? Do you sometimes test both and see which one's better? What's been your best approach for generating videos in terms of what you start with?

Tianyu Xu [00:14:24]:
So there are only 2 ways to generate a video. One is text-to-video, which is very similar to how you interact with ChatGPT or Midjourney. So basically, you put in a description, the prompt, and then a video, typically about 3 or 4 seconds, will come out based on the prompt. That's how Stable Video Diffusion, Pika, or Runway work. So these were the leading models before Sora. So when Sora comes out, everything changes. Right? So I'm still talking about some common practices I use in models like Runway or Pika. So, I discovered that text-to-video is not that robust. It's okay if you just create a short video or a few short videos, and you can combine them and make a short, a very extremely short film, out of the text-to-video models.

Tianyu Xu [00:15:28]:
But the consistency is always a challenge. When I say consistency, I mean the consistency of the characters. For example, you can use a text prompt to create a cat kayaking on white water, and then you need to create another one, because it's only 4 seconds. So in each generation, you only have a 4-second clip. Then to make a longer video, you need to generate another one. And then you're putting in the prompt, a white cat kayaking on the white water in front of the waterfall, and you might get a completely different white cat from the first video.

Tianyu Xu [00:16:06]:
So in that way, it's almost impossible to make anything consistent. That's why I do not use the text-to-video methodology. What I always do is, it's like building a storyboard: I create all the images with DALL-E or with other models, with a somewhat consistent character, first, and then I upload the images to Pika or to Runway and animate every single image, making each one a 4-second or 8-second video, and then combine them so that they look more like a consistent mini film.
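In code, that storyboard-first workflow might look roughly like the sketch below. The DALL-E call uses the real OpenAI Python SDK, but the `animate` step is a placeholder for work that, per the episode, happens in the Pika or Runway web interface, and the clip stitching uses moviepy's 1.x import path; treat the whole thing as an outline rather than a turnkey script.

```python
from openai import OpenAI                    # pip install openai
from moviepy.editor import (                 # pip install moviepy (1.x import path)
    VideoFileClip, concatenate_videoclips)

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def generate_still(prompt: str) -> str:
    """Generate one storyboard frame with DALL-E 3 and return its URL."""
    resp = client.images.generate(model="dall-e-3", prompt=prompt,
                                  size="1024x1024", n=1)
    return resp.data[0].url

def animate(image_url: str) -> str:
    """Placeholder for the image-to-video step done in Pika or Runway;
    download the finished clip and return its local path."""
    raise NotImplementedError("animate the still in your video tool of choice")

def assemble_film(scene_prompts: list[str], out_path: str = "mini_film.mp4") -> None:
    """Storyboard prompts -> stills -> animated clips -> one stitched video."""
    clip_paths = [animate(generate_still(p)) for p in scene_prompts]
    clips = [VideoFileClip(path) for path in clip_paths]
    concatenate_videoclips(clips).write_videofile(out_path)
```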

Jordan Wilson [00:16:48]:
Yeah. No. That's a good point. It's something to point out there. Like, you know, we talked about OpenAI's Sora, where you can, you know, put in a text prompt and get up to a minute of video. Right? And then you don't even have to do these multiple generations, because it generates multiple scenes together.

Jordan Wilson [00:17:06]:
But, you know, the majority of people out there do not have access to that yet. So, you know, we are all having to go through a similar process to the one you're laying out here. So maybe what we can do is kinda walk our livestream audience through this and maybe show them a little bit of what you're talking about. And, again, if you're listening on the podcast, we'll do our best to talk you through this a little bit, but you might wanna check out the show notes and come watch this. So let's go ahead and, Tianyu, maybe you can walk us through here. You know, so we have Pika Labs open. Right? And we're gonna be generating a 4-second video. So let's yeah.

Jordan Wilson [00:17:45]:
Maybe just go ahead and and walk us through what you're doing here, and then we can talk a little bit about your methodology.

Tianyu Xu [00:17:50]:
Yeah. Sure. So right now I'm in Pika. So within Pika, you can do text-to-video or image-to-video. And here, I have uploaded an image of a white cat kayaking the white water. Here, we're going to animate this image and make it into a 3-second video. But we have no clue which direction this image, this video, can go.

Tianyu Xu [00:18:23]:
So at the beginning, the process is very similar to rolling the dice. It basically means that you need to try every direction. You need to try every parameter to see which one actually works well. Pika has a very good camera control feature, meaning there's a virtual camera, and you can control the direction of the camera so that the image will move as a video. The image will move according to the camera movement. For example, right now, I'm at pan left and pan right. Each time I roll the dice, I try to choose just one direction, just one variable, because it's easier to optimize in the future. So let's try pan right, and then you can also control the strength of motion.

Tianyu Xu [00:19:25]:
Typically, the lowest motion is 0 and the highest is 4. But, normally, based on my experience, the higher strengths of motion never work. So let's choose a strength of motion of one, and that's it. Then we can generate the video. But that is only one video, you never know, right? You never know whether it's gonna work. So then we should try the next parameter. How about pan left? We can do the same. Everything remains the same, but the only variable is pan left.

Tianyu Xu [00:20:04]:
And then we create another video. So the same thing works for tilt, tilt up and tilt down.

Jordan Wilson [00:20:15]:
Alright. So I'll give a little bit of background here. So if you're used to maybe using something like ChatGPT, you know, sometimes you have to wait. Right? But with, you know, Pika Labs and a lot of the other AI video generators, what Tianyu is doing is, you know, he's able to generate maybe 4 or 5, 10 at a time, and you kinda have to wait for them in the background. Right? So these different generations, kind of his rolling of the dice, they're all kind of slowly loading 1 by 1, and then we'll be able to see, you know, which of these directions that he kind of put in there is gonna work best for this specific photo. So one thing that, you know, I'm curious about is, you said that you always start with, you know, like, a one motion, because if you go much higher, you know, it might not work very well, and you also kind of first test, you know, different directions. I guess, is that the best way for people to go, or maybe should they, you know, as an example, oh, let's test, you know, camera right at 1, camera right at 2, and camera right at 3. Right? Like, I guess, how did you come to the conclusion that this is the best way to start the process?

Tianyu Xu [00:21:30]:
Okay. Since we are doing a live demo, I can show you how the different strengths of motion look. So we do pan right with a motion. Let's do the extreme one. Let's do pan right with a strength of motion of 4, which is the fastest. Let's do the fast and furious. Yeah. And see how it works.

Jordan Wilson [00:21:54]:
Alright. Great. So the original ones that you did with the different directions of motion, all with the one, it looks like they're done. So, yeah, walk us through and kind of show us how you can decide what's good and what's not.

Tianyu Xu [00:22:08]:
Okay. So we already have pan right. So the first one is pan right. It seems okay, right? It seems okay because the kayak is moving from the left to the right, and if your camera moves towards the right, it seems like a natural movement. And, yeah, I think this one is alright. How about pan left? Okay. So pan left. Do you see the problem? If you do the pan left, the water also moves to the left.

Tianyu Xu [00:22:40]:
Yeah.

Jordan Wilson [00:22:41]:
So that's

Tianyu Xu [00:22:41]:
the that's the, that's that that goes the opposite direction. Yeah.

Jordan Wilson [00:22:44]:
Looks very yeah. That one looks very, unnatural. Looks a little wild. Right?

Tianyu Xu [00:22:48]:
Yeah. This one doesn't work. How about this one? This one is tilt, I think this is great. This one is tilt up, and it's showing the kayak going up and down. Yeah. I think this direction also works quite well. And the next one is tilt down.

Jordan Wilson [00:23:12]:
Yeah. That one looks a little weird. It looks like something ominous is about to happen to the poor cat kayaking.

Tianyu Xu [00:23:17]:
Yeah. The cat is gonna talk the cat is talking. Maybe we can do a lip sync to the cat.

Jordan Wilson [00:23:22]:
Oh, yeah. I see that. Interesting. Yeah. So that's important to talk about. Right? So with some of these, you know, video generators, you don't always have a ton of control over the actual movements. Right? So in that instance, the cat was actually just moving its mouth, where in the other instances it wasn't. Right? So Yeah.

Jordan Wilson [00:23:46]:
You know, I guess, have you found a way, aside from, and we can talk about, you know, maybe Runway in there, you know, the Multi Motion Brush. Right? But is there any other way to get that consistency, or is it just, man, you just gotta keep doing generations until you get it exactly right?

Tianyu Xu [00:24:05]:
Yeah. Sorry. Before I answer your question, I want to show you this. This is a speed of 4. The strength of motion is 4, and you can see that it's totally off.

Jordan Wilson [00:24:17]:
Yeah. Yeah. There's a wave. Yeah. A wave crashing into the white cat on the kayak, and the mouth is going crazy. Right? Yeah. Probably something you would never use there.

Tianyu Xu [00:24:28]:
Yeah. And the cat can turn into a completely different animal.

Jordan Wilson [00:24:33]:
Yeah. Okay. So essentially, the higher we crank the motion there, in general, at least how the technology is now, probably the less usable something becomes.

Tianyu Xu [00:24:43]:
Yeah. So based on our 7 or 8 experiments here, we can see that this one, tilt up, seems to be the best movement, the best direction, for this image. So what I would do next, if I'm a perfectionist, is continue generating videos with the same settings multiple times, maybe 10 times, just to get a perfect video. Or I can add 4 seconds. For example, I can extend the video to make it 8 seconds, or I can edit. I can change the region, change different parts of the video, to make it more interesting. Yeah. So in that sense, rolling the dice is not just rolling the dice once. After rolling the dice, you find the right direction, then you double down on that direction, then optimize your prompt and movement, and then you continue to roll the dice.

Tianyu Xu [00:25:49]:
So it's a chain of activities.
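That "find the direction, then double down" step could be sketched as a loop like the one below; as before, `generate_clip` is a hypothetical stand-in for whichever video tool is used, not a real Pika or Runway API.

```python
# Hypothetical helper standing in for the image-to-video tool; illustrative only.
def generate_clip(image_path: str, camera: str, motion: int) -> str:
    raise NotImplementedError("submit the job in your video tool and return the clip path")

def double_down(image_path: str, best_camera: str, motion: int = 1,
                tries: int = 10) -> list[str]:
    """Re-roll the winning camera direction several times and keep every candidate;
    picking the final clip (or extending and editing it) is still a manual step."""
    return [generate_clip(image_path, camera=best_camera, motion=motion)
            for _ in range(tries)]
```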

Jordan Wilson [00:25:54]:
And, I love this. And, you know, Juan, thanks for this comment saying the same thing. Love the hands-on and visual with these real-life examples. A couple of questions here, you know, from Tara. Great question here. And let's just go ahead and we'll go back to here. There we go. So, Tara, asking, could you please share your strategies for organizing your media outputs? Specifically, I'm interested in how you utilize tagging or memory aids to streamline your process for future projects.

Jordan Wilson [00:26:24]:
That's a great question, because, yeah, if you're doing, you know, dozens of generations maybe for the same clip, how do you keep that organized?

Tianyu Xu [00:26:35]:
That's a great question. I'm terrible at doing this. I organize them based on projects, so I have different folders for different visual projects. And I also collect all the important prompts and all the custom instructions for ChatGPT. I collect all of them so that I can reuse them in the future.

Jordan Wilson [00:26:58]:
Another good question here from Cecilia. And, you know, I'm not even sure of this one, so maybe I'll be learning something. So, Cecilia asking, can you ask, you know, these different AI videos for a storyline after you create the video, and do you retain your creative rights? Yeah. So, like, after you get that short one, can you, you know, go in there and then use a text prompt, or do you just kind of keep doing what you did and, you know, keep adding on four seconds?

Tianyu Xu [00:27:29]:
So I think there are 2 parts to the question. The first one is, if you want to continue the story, you can continue adding 4 seconds, but you can only add on a few times, a few instances. Maybe up to 16 seconds. So, eventually, you still need to create multiple clips from different images or different text prompts and make them into a full film. Then in terms of the creative rights, it depends on the platforms. So for example, for Midjourney, there are certain tiers where you own full rights to the images. For DALL-E as well, I think the users who create DALL-E images own the rights to the images. And then, for the other image creation tools or video tools, it all depends on their terms of use.

Jordan Wilson [00:28:31]:
Yeah. And it's important to read those terms, because they're always changing. And sometimes, and I've talked about this on the show before, sometimes they're a bit confusing or perplexing. Right? You know, the point being to always read those terms. Alright. So another great question here from Yogesh, former guest on the show. How you doing, Yogesh? So, asking, have you found a way to create images with text embedded? Yes.

Jordan Wilson [00:29:04]:
What's the best way to do this, Tianyu, or is there not a good way?

Tianyu Xu [00:29:08]:
Yeah. This is a great question for the rolling-the-dice approach, because none of the image models are very good at embedding text into the image. None of them, because they are not trained to do this. But if you are lucky, on DALL-E, on Midjourney, even on Google's Gemini, you can generate images with simple text, like less than 5 words. If you generate less than 5 words, you can get everything right within like 3 to 4 tries. Yeah. So if you try multiple times, like if you try 10 times, for sure you will get one satisfactory image with the text, less than 5 words, properly displayed.
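A small sketch of that retry pattern, using the OpenAI Python SDK's images endpoint; the try count and the roughly five-word limit are the guest's rules of thumb, and judging whether the text actually rendered cleanly is left as a manual review step.

```python
from openai import OpenAI  # pip install openai; expects OPENAI_API_KEY

client = OpenAI()

def roll_for_embedded_text(prompt: str, max_tries: int = 10) -> list[str]:
    """Generate several candidates for a short embedded-text image and
    return their URLs; pick the one where the wording reads correctly."""
    urls = []
    for _ in range(max_tries):
        resp = client.images.generate(model="dall-e-3", prompt=prompt,
                                      size="1024x1024", n=1)
        urls.append(resp.data[0].url)
    return urls

# Example (keeping the text under about five words):
# candidates = roll_for_embedded_text('The word "WELCOME" spelled out by clouds')
```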

Jordan Wilson [00:30:01]:
Yeah. Yeah. It's almost like a painful process. Right? Especially with the text, because, you know, you get it so close, and then you're like, oh, just gotta keep, you know, generating and regenerating until you get it. I'm curious. I haven't tried this. Like, have you tried doing, you know, something like Yogesh was asking there, something with text, but then creating a video afterwards? So if you get a good, you know, image that has text on it and then try to create a video, I haven't tried that.

Jordan Wilson [00:30:31]:
I'm curious if you've tried it and if it works at all.

Tianyu Xu [00:30:34]:
Yeah. It works. It works. It also depends. So it's easier to create text on the images than to animate the image with a video tool, because some of the video models tend to have very limited fonts. So for example, I have something like the word welcome written in a cloud. This image is generated with DALL-E, and the word looks like the shape of a cloud. And then I upload the image to Runway. Then, easily, the word welcome can turn into the same word welcome, but in the font of Arial, or the normal font that you can find in a Word document.

Tianyu Xu [00:31:36]:
So that's the limitation there.

Jordan Wilson [00:31:38]:
Yeah. And so, you know, I'm curious, like, and we'll make sure to share maybe, you know, your favorite project in the newsletter. So everyone listening, make sure to check the newsletter out. Sign up at your everydayai.com. But I'm curious, Tianyu, like, what is the most time or the most, like, generations that you've spent on a single project, and then maybe, you know, talk a little bit about that project.

Tianyu Xu [00:32:09]:
I spent a lot of time at the beginning when I first used, yeah, when I first used Midjourney. I spent a lot of time trying different directions, and I was on my phone all the time on Discord. But then, after a few months, I figured out the methodology, and I began to be more efficient. So I rarely spend too much time on any single project. Yeah. So I don't really see anything that would be too time consuming. Yeah. So maybe yeah.

Tianyu Xu [00:32:51]:
So maybe I can tell you a bit about a video that I was making at the beginning of this year. So I wanted to make a 5-minute mini film based on a CGI cat character, with a story. And that actually took me some time, because to make a proper film with a proper story, you need a lot of scenes and a lot of different characters. And the limitation of the AI model is that it can be trained with a lot of data, a lot of image data from everywhere, but it may not cover everything. So some of the scenes will be almost impossible to create. Then you have to change the story.

Tianyu Xu [00:33:44]:
You have to change the story based on what images you can create. That's the time-consuming part.

Jordan Wilson [00:33:52]:
That's a good yeah. Yeah. That's the thing with generative AI. Right? Like, sometimes you think you're gonna get something consistent and, you know, another video that can work really well with the other videos, and then it's gonna spit out something completely random that you can't even work with. Right? So, Tianyu, we've talked a lot. We've, you know, talked about some of your processes. We've showed the audience live, you know, how to kind of roll the dice and work with these different generations. But, you know, as we wrap up here, what is your best piece of advice for people out there? Maybe they're new to these AI video programs, or maybe they're just really struggling to get good results.

Jordan Wilson [00:34:31]:
What is, you know, kind of, like, especially based on your methodology, right, of kind of rolling the dice, what's your best piece of, you know, advice for everyone to improve their outputs?

Tianyu Xu [00:34:46]:
I think there's one shortcut that we can all follow. Basically, for any image model or any video model, you just look at the successful images that other people create, or you look at the marketing videos. And then, based on the marketing videos, you will know what kind of categories, what image categories, what video categories, are easier to create. Then you just double down your efforts on those categories. You have a much higher chance of success if you focus on what other people have already proved successful.

Jordan Wilson [00:35:22]:
That's great. And, you know, I think this is important because this allows, and I guess one more question, right? So I think this allows everyone to be more creative. Right? Because I think a lot of times when people hear about these tools, Midjourney, Runway, etcetera, they're like, okay. Well, that's not my background. And I think, you know, you just gave a great example, going from someone in market research to now, you know, you're a visual artist. So maybe one last question is, what ways might you recommend, you know, to other people if they find this technology fascinating? What are just some good ways that people can use this, you know, across their businesses or, you know, personal projects?

Tianyu Xu [00:36:04]:
I think the best way is just to get started. Just start with baby steps. Start talking to ChatGPT, start creating images, start with very simple prompts, and then build up your process little by little.

Jordan Wilson [00:36:19]:
That's great. Yeah. I mean, you have to start small and you have to go step by step. Well, Tianyu, thank you so much for joining the Everyday AI Show and teaching us all how to roll the dice for better AI visuals. We very much appreciate your time.

Tianyu Xu [00:36:36]:
You're welcome, and thank you so much for your time.

Jordan Wilson [00:36:38]:
Alright. And, hey, as a reminder, y'all, we're gonna be sharing some of his best examples. If you wanna watch the video, maybe if you're listening on the podcast, make sure to go to your everydayai.com. Sign up for the free daily newsletter. If this episode was valuable, please consider sharing it with a friend. Leave us a rating, but we hope to see you back tomorrow and every day for more Everyday AI. Thanks, y'all.

Gain Extra Insights With Our Newsletter

Sign up for our newsletter to get more in-depth content on AI