{ "hq": [ { "speaker": 0, "text": "Gaussian-splatted 3D environments.", "start": 0.16, "end": 2.42 }, { "speaker": 1, "text": "Yeah.", "start": 3.6799998, "end": 4.18 }, { "speaker": 0, "text": "Is that right?", "start": 5.6, "end": 6.42 }, { "speaker": 1, "text": "Yeah.", "start": 6.72, "end": 7.22 }, { "speaker": 0, "text": "Okay. Because that's what I've been working on.", "start": 8.0, "end": 10.24 }, { "speaker": 1, "text": "Gotcha. Yeah. That's one of the features. That's right now. That? That is the kind of mixing between", "start": 10.24, "end": 17.785 }, { "speaker": 2, "text": "It's a mix between 2D and 3D.", "start": 19.465, "end": 21.385 }, { "speaker": 1, "text": "Oh, wait. Your two is a works through a space. The two w two is just a straight view.", "start": 21.385, "end": 27.145 }, { "speaker": 2, "text": "Yeah. It's a 3 has people.", "start": 27.145, "end": 28.9 }, { "speaker": 0, "text": "Okay. So I've been working", "start": 30.4, "end": 32.56 }, { "speaker": 3, "text": "on making a model of, you know, you you capture yourself using a", "start": 32.56, "end": 33.288002 }, { "speaker": 0, "text": "selfie-type app or something. You take a picture. And turn that into a model where you can put yourself through the 3D view of yourself and do 3D environments. That's what I've been working on. Got you. Got you. Can I generate", "start": 33.288002, "end": 53.735 }, { "speaker": 1, "text": "OVIT you out from that?", "start": 57.66, "end": 59.12 }, { "speaker": 0, "text": "I'm just working on the person model right now, not anything else.", "start": 60.62, "end": 65.28 }, { "speaker": 1, "text": "Got you. Got you. So the the goals that come up like this, so I can put myself. I can put other people. It's all into, like, a video or something. So basically, imagine I", "start": 65.74, "end": 76.965004 }, { "speaker": 0, "text": "Talk about other people. Last night, we were just talking about the user.
So when you say other people, what do you mean?", "start": 77.125, "end": 86.47 }, { "speaker": 1, "text": "So for that one so the let me see. Maybe it's really that yvonne. So it's not slack. Compose a photo of yourself with other people into an environment. You have seen some check that. So I could put myself there if I forgot to take a photo, and I can also potentially put other people with me to make a group photo, and I'm generating a selfie with me, a couple other people, walk through all the space.", "start": 86.770004, "end": 136.285 }, { "speaker": 3, "text": "May I have", "start": 136.285, "end": 136.605 }, { "speaker": 0, "text": "those people know or these generic", "start": 136.605, "end": 139.57 }, { "speaker": 3, "text": "strangers? Those could be so those are so people", "start": 139.79001, "end": 140.046 }, { "speaker": 0, "text": "you'll meet, like, people you", "start": 140.046, "end": 140.93001 }, { "speaker": 1, "text": "have multiple photos of, at least.", "start": 150.545, "end": 152.565 }, { "speaker": 0, "text": "So you're gonna have", "start": 154.385, "end": 154.95763 }, { "speaker": 3, "text": "these people given consent to generate a photo of them? So so, basically, you don't need consent for", "start": 154.95763, "end": 156.64499 }, { "speaker": 1, "text": "is not to quit a selfie for just for that owner, like, off the device. So, basically, it's like, they they know their own photo, but, like, they don't know your they are being generated, regenerated by your computer self.
You is, like, it's not people you're on the internet, but you have able to compose so if I meet 3 people for dinner, but I forgot to take a selfie with them, I'm able to compose a selfie with me, without a 2 with with other, 2 three persons.", "start": 165.81, "end": 192.61499 }, { "speaker": 0, "text": "How are you how are you gonna create a photo of them?", "start": 192.61499, "end": 196.35 }, { "speaker": 1, "text": "Oh, because you because you", "start": 197.21, "end": 198.57 }, { "speaker": 0, "text": "haven't taken a photo.", "start": 198.57, "end": 199.77 }, { "speaker": 1, "text": "Oh, because you'll have to go photo because you have the wearable. You're you took a photo. You you took a you took a video of that. You'll actually have a sequence of more than 1 image, at least multiple photos. I mean, like, if you if you're lucky, you'll have a video of them, like, over 1 hour or something. So, yeah, like, I don't otherwise, you at least have several photos of them.", "start": 199.77, "end": 217.745 }, { "speaker": 0, "text": "K. That. That sounds like", "start": 219.72499, "end": 221.44286 }, { "speaker": 3, "text": "a gray area if you", "start": 221.44286, "end": 222.12857 }, { "speaker": 0, "text": "have a photo of somebody and you're gonna generate something new using their likeness. Like, legally, you're not in the clear there.", "start": 222.12857, "end": 240.975 }, { "speaker": 1, "text": "You just don't want to laugh about it. We're we're trying to make a little dumb it's for it.", "start": 241.275, "end": 245.73236 }, { "speaker": 3, "text": "Yeah. Ignoring the legal area, the the technical area, like, if you don't have", "start": 245.73236, "end": 246.3253 }, { "speaker": 1, "text": "good photos or any", "start": 246.3253, "end": 247.06999 }, { "speaker": 0, "text": "it's gonna be hard to generate.", "start": 260.295, "end": 261.755 }, { "speaker": 1, "text": "Yeah.
So this one does 4K for so so so so so this one does 4K videos right now, or a 4K, like, camera-based thing. And, I kind of want to see how far we can pull with the data, like, maybe, like,", "start": 263.015, "end": 280.67 }, { "speaker": 0, "text": "Just going off of that, you, the user, have taken, you know, a video of your face and body and then are able to regenerate that, make an AI model of you and be", "start": 281.85, "end": 294.95898 }, { "speaker": 4, "text": "able to put you in new environments. Yeah. So that that's that's just how the, basically, the logistics work.", "start": 294.95898, "end": 296.01498 }, { "speaker": 1, "text": "But, like, so it's more like, I would imagine this feature is mostly only between friends, like, like, especially, like, college students or something like that. And, they go to class together, but they don't have good and, like, for people who are with their children, they they they do record the whole, like, soccer game, but they don't have, a good photo. They they forgot to take a selfie with their wife or something like that. They can regenerate it. Is the goal to have it be, like, social media content generation or, like, between friends, something like that? Yeah. With this, some customer testing about those types of features, like, like, the I was about to study at least earlier, but it took me a while because we were asking a bunch of people would do that I can see in the interaction with that.
And, it seems like those are the few features that people really cannot stick with, and, like, a basic creator, put themselves, put their friends, and put, like, people they meet in, sync with a self, if they haven't taken one, like, like, basically re reattached the lighting of the photo it took and making the photo it took, like, a creative work if it if they're wearing this, go into international travel in Japan or, like, China or, like, some they can create an infinite walkthrough of the street or something like that. They like that. And they also really like to mix. So they kind of, like, get the sky from Switzerland, Pudi, Berlingen, and regenerate the videos. Those are basically the couple of features where it seems like people feel it's not just fun. It's actually useful. It's not just like, hey. I can cut a person into a cake or something like that because that's not really useful.", "start": 305.17, "end": 411.405 }, { "speaker": 0, "text": "Do you or the last time we talked, I asked what kind of poses or animations you wanted? People.", "start": 413.065, "end": 422.68 }, { "speaker": 1, "text": "Oh, yeah. Yeah.", "start": 423.13998, "end": 424.81998 }, { "speaker": 3, "text": "Did you? Yeah.", "start": 424.81998, "end": 425.3 }, { "speaker": 1, "text": "Let me put something there.", "start": 425.3, "end": 426.25998 }, { "speaker": 3, "text": "Did you", "start": 426.25998, "end": 426.58 }, { "speaker": 0, "text": "come up with a list?", "start": 426.58, "end": 427.63998 }, { "speaker": 1, "text": "Yeah. Yeah. Let me see. One moment. I'll just add this to be another contact.", "start": 427.665, "end": 431.545 }, { "speaker": 3, "text": "I kinda agree. Animate people", "start": 431.545, "end": 431.785 }, { "speaker": 1, "text": "being on scene. I'm Thank you.
Let's see.", "start": 431.785, "end": 458.415 }, { "speaker": 3, "text": "One.", "start": 469.59, "end": 470.09 }, { "speaker": 1, "text": "Something out there.", "start": 487.63498, "end": 488.199 }, { "speaker": 5, "text": "It's like, like, the a couple, which is a janitor of DV polls, and they're like, oh, wait. Wait. Wait.", "start": 488.199, "end": 488.85498 }, { "speaker": 1, "text": "The I just like to, like, say hi to this thing. Yeah. Something like that. Yeah. So", "start": 497.36, "end": 513.125 }, { "speaker": 6, "text": "so are you training a model for that right now, or how's the I'm building a dataset first Uh-huh.", "start": 515.905, "end": 516.2977 }, { "speaker": 1, "text": "Before I train", "start": 516.30133, "end": 516.885 }, { "speaker": 3, "text": "a model.", "start": 525.96, "end": 526.44 }, { "speaker": 0, "text": "So I'm building a dataset of synthetic people because I don't have access to a real dataset. Gotcha. Gotcha. Train using 3D people and see how that works. And if that works, then it", "start": 526.44, "end": 545.125 }, { "speaker": 3, "text": "can try it with real people. Got you. Got you. Cool. Is there any", "start": 545.125, "end": 545.865 }, { "speaker": 1, "text": "existing tool that I can potentially use to do some posing or is there, like, an Alibaba library, maybe?", "start": 553.86, "end": 560.12 }, { "speaker": 2, "text": "The Alibaba stuff is all for videos. I mean, the the standard way to pose a person is to use ControlNet, which is trained to pose people.", "start": 561.3, "end": 569.665 }, { "speaker": 1, "text": "How good is that?", "start": 570.685, "end": 571.96497 }, { "speaker": 2, "text": "Pretty pretty decent, like, decent enough to be a legal problem.", "start": 571.96497, "end": 575.425 }, { "speaker": 1, "text": "Show me an example. You don't have to just show something else. So", "start": 577.165, "end": 581.005 }, { "speaker": 2, "text": "Let let me find an example.
Justin, do you have any good ControlNet demos?", "start": 582.845, "end": 586.76 }, { "speaker": 0, "text": "It wasn't acceptable because you wanted 3D camera rotation, things like that.", "start": 599.865, "end": 606.685 }, { "speaker": 1, "text": "We want a camera pan. Like, is the camera pan good? Is it good enough to do a camera pan through space?", "start": 607.225, "end": 615.07 }, { "speaker": 2, "text": "Yeah. But I I thought you were generating photos. I thought you were generating photos.", "start": 615.07, "end": 620.85004 }, { "speaker": 0, "text": "No. You said you want a video. So that's why we were talking about Okay.", "start": 621.95, "end": 625.52496 }, { "speaker": 3, "text": "Now now", "start": 625.52496, "end": 625.925 }, { "speaker": 2, "text": "I see we've been upgraded to analytics.", "start": 625.925, "end": 627.365 }, { "speaker": 0, "text": "ControlNet and all that, but you said it wasn't adequate for videos. And so", "start": 627.685, "end": 633.365 }, { "speaker": 1, "text": "It's not at all adequate for video.", "start": 633.52496, "end": 635.205 }, { "speaker": 0, "text": "Yeah. Yes.", "start": 635.205, "end": 636.58496 }, { "speaker": 1, "text": "Not for people.", "start": 637.31995, "end": 638.04 }, { "speaker": 0, "text": "Making my own model to", "start": 638.12, "end": 639.56 }, { "speaker": 1, "text": "do videos.", "start": 639.56, "end": 640.69995 }, { "speaker": 3, "text": "You could start with", "start": 641.24, "end": 642.12 }, { "speaker": 2, "text": "one of the Alibaba lab ones, like, watch them. And I don't remember what the model's called.", "start": 642.12, "end": 647.48 }, { "speaker": 0, "text": "I try list. Play different, available models, and their codebases are all shit.", "start": 647.48, "end": 655.41504 }, { "speaker": 2, "text": "Wow. That doesn't mean the model is bad.", "start": 655.47504, "end": 657.575 }, { "speaker": 0, "text": "Right.
But I couldn't. You try to get the libraries all installed, and they require libraries that are three years old.", "start": 658.11505, "end": 669.91003 }, { "speaker": 2, "text": "Yeah. You", "start": 669.91003, "end": 670.39 }, { "speaker": 0, "text": "download a specific one, and then it doesn't work. Finally get it to", "start": 670.39, "end": 673.6023 }, { "speaker": 2, "text": "work and the output doesn't look very good. You got I use for the for the Chinese models? I'm trying to find the do do do we remember? I don't remember what it was called. What was that model last year that could do, like, dancing people?", "start": 673.6023, "end": 694.38 }, { "speaker": 1, "text": "Alibaba, dancing people model. It don't show drafting. The the I'm Joseph Chorus, like, little say, the field.", "start": 695.48004, "end": 705.295 }, { "speaker": 2, "text": "Field. Yeah. Okay. Mhmm. Let me see. Ding", "start": 705.755, "end": 720.4 }, { "speaker": 3, "text": "ding ding. Here we go.", "start": 720.82, "end": 724.56 }, { "speaker": 2, "text": "This is some modestly well venture-funded company that clearly is using open-source models to generate memes.", "start": 729.735, "end": 743.3879 }, { "speaker": 0, "text": "So Yeah. Mhmm.", "start": 744.3493, "end": 746.22003 }, { "speaker": 2, "text": "Yeah. If you're fighting the Chinese models, you have to use conda or a VM for them. The Alibaba models in particular. Yeah. Alibaba models in particular use unreasonably out-of-date packages? Yeah.", "start": 746.76, "end": 761.495 }, { "speaker": 0, "text": "The packages, it's hard to get. They use because it's not you can download one library. There's its major version, but then there's also the CUDA toolkit version. Can they mismatch?", "start": 761.495, "end": 774.16003 }, { "speaker": 2, "text": "I I've seen it. I've seen it be reasonably CUDA-agnostic before.
Have you actually had failures due to, like, the wrong CUDA version?", "start": 774.54004, "end": 781.52 }, { "speaker": 0, "text": "It was I'm trying to think of what the name of the library was. Maybe it was PyTorch3D. And they're compiled with different CUDA toolkit versions. But, yeah, the GitHub says install it this way, and you follow the instructions and it breaks. It just does. They're like, well, not our problem.", "start": 784.105, "end": 814.275 }, { "speaker": 2, "text": "You also have to start with, like, the right version of Linux and stuff.", "start": 814.275, "end": 818.29504 }, { "speaker": 0, "text": "Yeah. That's I've got Ubuntu too, you know, the latest version.", "start": 819.47504, "end": 823.73505 }, { "speaker": 2, "text": "Probably the problem. You have to use, like, 22.04.", "start": 824.09, "end": 826.27 }, { "speaker": 0, "text": "I had 22. I think I had 20. I see.", "start": 829.85004, "end": 832.635 }, { "speaker": 3, "text": "And it I actually got it to work with an output. Okay. Which do", "start": 832.635, "end": 835.31 }, { "speaker": 2, "text": "remember which model you were using?", "start": 839.985, "end": 841.605 }, { "speaker": 0, "text": "What?", "start": 842.385, "end": 842.885 }, { "speaker": 2, "text": "Do you remember which model you were looking at?", "start": 843.90497, "end": 846.165 }, { "speaker": 0, "text": "These were, head-based models.", "start": 847.02496, "end": 849.785 }, { "speaker": 2, "text": "Head-based models. Alright. I'm gonna stop sharing this. I'm gonna stop sharing this.", "start": 850.14496, "end": 856.12 }, { "speaker": 0, "text": "Yeah. They're they're, you know, like, recreate a person's head from images and then recreate their body.", "start": 856.12, "end": 864.22003 }, { "speaker": 3, "text": "And", "start": 866.92, "end": 867.17 } ] }