{ "hq": [ { "speaker": 0, "text": "This meeting. Okay. So", "start": 0.0, "end": 1.36 }, { "speaker": 1, "text": "The letters are poor.", "start": 1.68, "end": 2.3999999 }, { "speaker": 0, "text": "Moving forward on, this is basically key frame processing systems. That's what I'm thinking about. They are. Yeah. Why does audio?", "start": 2.3999999, "end": 10.66 }, { "speaker": 1, "text": "Audio matched to people's names as floating? I don't I don't I'm putting it down here. This is one of the things we need to do because, sorry. I don't mean, okay. I speaking of which, did did I put I put no. That was in the trans but I didn't put it yet. Video based diarize.", "start": 11.679999, "end": 32.52 }, { "speaker": 0, "text": "What the hell does that even mean?", "start": 33.62, "end": 35.239998 }, { "speaker": 1, "text": "You know, how like", "start": 35.54, "end": 36.26 }, { "speaker": 0, "text": "context has, like, an example.", "start": 36.42, "end": 38.2 }, { "speaker": 1, "text": "Okay. Fine. Speed one speaker to base Yeah. I'm sorry. Video footage.", "start": 39.059998, "end": 45.464996 }, { "speaker": 0, "text": "I would never take such a", "start": 46.085, "end": 47.204998 }, { "speaker": 1, "text": "Holy crap.", "start": 47.285, "end": 48.004997 }, { "speaker": 0, "text": "I have 2 kids. I can't be in the bathroom so long.", "start": 48.004997, "end": 50.664997 }, { "speaker": 1, "text": "You're you you can share some with Brendan.", "start": 52.004997, "end": 54.120003 }, { "speaker": 0, "text": "Brandon also. I'll save it for", "start": 54.420002, "end": 57.620003 }, { "speaker": 1, "text": "later, I guess.", "start": 57.620003, "end": 57.940002 }, { "speaker": 0, "text": "I know it sounds weird for me. I'm usually, searching, but I just ordered a pizza. Okay. Oh, sweet. Shop. It's a Delfina.", "start": 57.940002, "end": 67.575 }, { "speaker": 1, "text": "Okay. Do do some explanation of the 3 things at the bottom. Refrigerate. Whole you you do refrigerate wire explain. Whole video short description is to help the RAG. It it gives you, like, a additional context that you can RAG on.", "start": 68.354996, "end": 80.87 }, { "speaker": 0, "text": "Which is what I'm talking about is contextual people.", "start": 80.87, "end": 83.43 }, { "speaker": 1, "text": "Yeah. Yeah. That audio matched to people names means that as you interact with people, they might speak their names. You want to go form that pairing in order to name the people you've met so that later you can search them up in, like, database of the people you know, which is a very useful feature.", "start": 83.43, "end": 97.534996 }, { "speaker": 0, "text": "Okay. So you said audio, lots of people Yeah. Go back to the whole frame for instruction. This is basically the index your videos.", "start": 97.935, "end": 104.895 }, { "speaker": 1, "text": "Yeah. Yeah. It's it's it's additional on top of the, like, frames.", "start": 104.895, "end": 108.41 }, { "speaker": 0, "text": "That is, content based on on the frames?", "start": 108.41, "end": 111.79 }, { "speaker": 1, "text": "On the whole video. On the whole video.", "start": 112.17, "end": 113.53 }, { "speaker": 0, "text": "Let me try to Yeah. Yeah. Yeah. On entire video Oh. A video description, video context. Yeah. Yeah. That's a the caption or description. Yep. It's, maybe you don't caption. It's captioning the entire video. But that's a description for you.", "start": 113.53, "end": 130.305 }, { "speaker": 1, "text": "I I think", "start": 130.305, "end": 130.865 }, { "speaker": 0, "text": "captioning or description.", "start": 131.105, "end": 132.225 }, { "speaker": 1, "text": "I think they call it sparse captioning as the technical first term I've heard.", "start": 132.225, "end": 135.54744 }, { "speaker": 0, "text": "But the result is a it's not a summary of everything. I don't know. So I'm just It's not the potential for that. Some", "start": 136.02744, "end": 143.33743 } ] }