{ "hq": [ { "speaker": 0, "text": "Okay. Hello. Hello. Hello.", "start": 3.1999998, "end": 8.5 }, { "speaker": 1, "text": "Hello.", "start": 9.12, "end": 9.62 }, { "speaker": 2, "text": "English right now. They do Chinese right now.", "start": 14.095, "end": 16.755001 }, { "speaker": 1, "text": "It does Chinese or English. Chinese and English results in very dubious performance.", "start": 17.855, "end": 24.595001 }, { "speaker": 0, "text": "I had", "start": 28.74, "end": 29.06 }, { "speaker": 1, "text": "to switch everything to, William, it was a tragedy. So you know how Deepgram says they're better than Whisper? It's because Deepgram has nonzero performance in foreign languages and in code-switched audio, whereas Whisper just returns empty stream if there's code switching. Okay.", "start": 29.06, "end": 48.595 }, { "speaker": 2, "text": "I see.", "start": 48.595, "end": 49.155 }, { "speaker": 1, "text": "Whisper can't do code switching. It returns empty stream.", "start": 49.155, "end": 51.975 }, { "speaker": 0, "text": "We're using Deepgram. Right?", "start": 52.515, "end": 54.135002 }, { "speaker": 1, "text": "I switch. Whisper is actually better for English. So we're", "start": 54.390003, "end": 57.83 }, { "speaker": 2, "text": "Actually, Bailey. Bailey. Bailey. Bailey. You should do a default portfolio, and, that, like, if a person choose the person's main language is Chinese or something, can switch to a different model. And, also, in the future, it can be automatically switched because you can detect the conversation scene and switch it kind of routing in between.", "start": 57.91, "end": 76.495 }, { "speaker": 1, "text": "You can't detect the language until you transcribe it.", "start": 77.195, "end": 80.335 }, { "speaker": 2, "text": "Mhmm. Yeah.", "start": 80.475, "end": 81.295 }, { "speaker": 0, "text": "I think we can default to Deepgram and then save some metadata. 
Like, save the metadata of Yeah.", "start": 81.354996, "end": 87.52 }, { "speaker": 1, "text": "I guess you can if you the techno type, then you can retranscribe it with for to get better accuracy or something. Right?", "start": 87.52, "end": 92.96 }, { "speaker": 0, "text": "Like, if you keep getting, like, you know, like, for the past 7 days, this person only used Deepgram to transcribe English, then we can probably Yeah. Yeah. That would be the switch to Whisper.", "start": 92.96, "end": 104.265 }, { "speaker": 1, "text": "Okay. Now you can stop the recording unless whether we", "start": 105.21638, "end": 107.69637 }, { "speaker": 0, "text": "can Okay.", "start": 107.69637, "end": 108.096375 }, { "speaker": 2, "text": "Let me say something because, otherwise, it's all you. So it's clearly you and I have different voice. Right? Okay. Stop.", "start": 108.096375, "end": 114.651436 } ] }