{ "hq": [ { "speaker": 0, "text": "2, 3, 4, 5.", "start": 0.4, "end": 3.06 }, { "speaker": 1, "text": "Whoa. Beautiful.", "start": 3.28, "end": 4.98 }, { "speaker": 0, "text": "Whoa. Very good selection.", "start": 5.6, "end": 7.78 }, { "speaker": 1, "text": "Because there's no selection. Well, actually, there is. There's an aesthetic score for that photo. Wait. You... I", "start": 7.92, "end": 13.68 }, { "speaker": 0, "text": "thought you just put in, like, 30th frame?", "start": 13.68, "end": 16.025 }, { "speaker": 1, "text": "No. Actually, there is a very small aesthetic score. Oh. Out of the 5 seconds, it will go try to find the best frame.", "start": 16.725, "end": 22.565 }, { "speaker": 0, "text": "Okay. So we just make sure that when we tap, we stand in front of something beautiful. Oh, like, brilliant.", "start": 22.565, "end": 30.63 }, { "speaker": 1, "text": "Alright. So now if, you can't stop the tap. Oh.", "start": 33.49, "end": 37.75 }, { "speaker": 0, "text": "Oh, it's fixed to be 1 minute.", "start": 37.89, "end": 40.39 }, { "speaker": 1, "text": "So so you probably Why do you show me So so you probably the restaurant. Why do you be extremely careful? Why does it say request for transcription summary? Wait. What what's this what's this at? Where is that? Why does it call", "start": 43.595, "end": 58.070187 } ] }