SegmentDiarization

This is a speaker diarization model trained on top of the Whisper tiny encoder. It is a hobby project and is not suitable for production use.

Datasets

- AMI (best validation loss: 1.129)
- VoxConverse (best validation loss: 0.112)
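
This README does not describe the model's output format, so the following is only a rough sketch of a common diarization post-processing step. It assumes the model yields a per-frame speaker activity matrix and that each frame covers 20 ms (the Whisper encoder emits 1500 frames per 30-second window). The helper `frames_to_segments` and the constant `FRAME_MS` are hypothetical names chosen for illustration and are not part of this project.

```python
from typing import List, Sequence, Tuple

# Assumption: one encoder frame spans 20 ms (1500 frames per 30 s window).
FRAME_MS = 20

Segment = Tuple[int, int, int]  # (speaker index, start in ms, end in ms)


def frames_to_segments(
    activity: Sequence[Sequence[int]], frame_ms: int = FRAME_MS
) -> List[Segment]:
    """Merge consecutive active frames into segments, per speaker.

    activity[t][s] is truthy when speaker s is active in frame t.
    """
    segments: List[Segment] = []
    if not activity:
        return segments

    num_speakers = len(activity[0])
    for speaker in range(num_speakers):
        start = None  # frame index where the current run began
        for t, frame in enumerate(activity):
            if frame[speaker] and start is None:
                start = t
            elif not frame[speaker] and start is not None:
                segments.append((speaker, start * frame_ms, t * frame_ms))
                start = None
        # Close a run that is still open at the end of the input.
        if start is not None:
            segments.append((speaker, start * frame_ms, len(activity) * frame_ms))

    # Order by start time, then by speaker index.
    return sorted(segments, key=lambda seg: (seg[1], seg[0]))


# Two speakers over five frames; they overlap in frame 2.
example = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1]]
print(frames_to_segments(example))  # [(0, 0, 60), (1, 40, 100)]
```

Times are kept as integer milliseconds to avoid floating-point rounding. Overlapping speech falls out naturally because each speaker is scanned independently.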