{ "hq": [ { "speaker": 0, "text": "Here, you can check out my quick aesthetic score selection at work. It picked the best frame. Like, Kelly actually looks good in this frame.", "start": 0.48, "end": 9.84 }, { "speaker": 1, "text": "I I barely should also.", "start": 9.84, "end": 11.12 }, { "speaker": 0, "text": "Yeah. I can't", "start": 11.12, "end": 11.92 }, { "speaker": 1, "text": "I can't take a photo of that. This is Michael.", "start": 11.92, "end": 14.24 }, { "speaker": 0, "text": "Hey, this feature right here, taking photos of people who don't smile well.", "start": 14.24, "end": 17.865 }, { "speaker": 1, "text": "I'll make that smile even if you don't.", "start": 18.005, "end": 19.945 }, { "speaker": 0, "text": "Yeah. Sorry. I wanted to show a demo of this thing, but I ran into some weird flake. So I had to go restart the demo. Let me take one last look at the video upload. It's just", "start": 20.085, "end": 29.359999 }, { "speaker": 2, "text": "like, the video just upload. Yeah.", "start": 29.359999, "end": 32.079998 }, { "speaker": 0, "text": "Yeah. So there's like a generic, like, post process, which could include other activities beyond encoding. Right?", "start": 32.079998, "end": 38.079998 }, { "speaker": 2, "text": "Like, when it's encoded, then you wanna send it off to what, like, another message does all the trimming and stuff like that.", "start": 38.079998, "end": 44.024998 }, { "speaker": 0, "text": "It's like", "start": 46.004997, "end": 46.405 }, { "speaker": 1, "text": "That's this is a so this is our server side.", "start": 46.965, "end": 50.085 }, { "speaker": 2, "text": "Yeah. That's likely to Amazon EC2. And", "start": 50.085, "end": 52.684998 }, { "speaker": 1, "text": "Yeah. 
Okay.", "start": 52.805, "end": 53.445 }, { "speaker": 2, "text": "And so is what the front end sends to.", "start": 53.445, "end": 55.64 }, { "speaker": 1, "text": "I think there will be some sense in between here, either probably before this or somewhere. We should do, like, deduplication, frame optimization. There's a small video process.", "start": 55.699997, "end": 68.02 }, { "speaker": 2, "text": "Did we wanna do that after encoding or before encoding? I'm not a pro at that part.", "start": 68.02, "end": 73.675 }, { "speaker": 1, "text": "I", "start": 73.675, "end": 74.175 }, { "speaker": 0, "text": "I think it comes after. Right? So, okay. I can now give, like, a painstakingly detailed description of what actually happens because I just implemented it. The hardware encoder on this device outputs a stream of bytes, which form like a, whatever, MPEG standard H.265 block. And the block doesn't have a strong understanding of what frames are anymore because it's, like, variable length encoded based on how much compression is applied to each frame. So when you have like a finished video, that's like the video. It gets sent to the back end. And so we're at this point. Right? Yeah. So at object store raw video, you receive a compressed set of frames. Then at encoding, you would first decode all of the frames, then do all of your media processing, like key frame selection. Like, you first decode it, then you can do key frame selection, like, best frame identification, like, moment fraction, whatever, then at the very end, you can optionally, like, reencode the video with this extra information if you care. So then", "start": 74.395, "end": 139.74 }, { "speaker": 1, "text": "why we encode that the decode?", "start": 139.74, "end": 142.24 }, { "speaker": 0, "text": "The encoding makes the video 500 times smaller for uploading. Otherwise, you can't upload that. Yeah. 
It's done in hardware on device. The, like, soul of this device is a hardware H.265 encoder.", "start": 142.3, "end": 153.035 }, { "speaker": 1, "text": "Like, so our reach, is this in total, more standard encoder, or for a different video or different query? The multiple in total.", "start": 153.175, "end": 158.775 }, { "speaker": 0, "text": "100% standards, just H.265.", "start": 158.775, "end": 160.715 }, { "speaker": 1, "text": "Okay. But, like, when you will we all do some video processing, like, raw some diffusion on top of some frames or some part of the videos. I don't think those can operate.", "start": 161.015, "end": 170.28 }, { "speaker": 0, "text": "You know, you decode in the back end.", "start": 170.28, "end": 171.64 }, { "speaker": 2, "text": "Yeah. Yeah. Yeah. So that so that that's my, I might have okay. Is this is typically like a user buzz on YouTube and uploads a video of raw", "start": 171.64, "end": 180.062 } ] }