I'd like to find out how to transcribe an MP3 file that contains two speakers. I can already transcribe the audio, but the output merges both speakers into a single paragraph. I see that Google has some tools to help with this; however, I do not want to link this to a Google API service, because I need to test the accuracy of the speech recognition against a large volume of audio files before billing can occur. Any help with this, as well as with improving the dictionary, would be highly appreciated :)

The code I have so far is:

    import speech_recognition as sr
    from pydub import AudioSegment

    sound = AudioSegment.from_mp3("transcript.mp3")
    sound.export("transcript.wav", format="wav")

    AUDIO_FILE = "transcript.wav"

    r = sr.Recognizer()
    with sr.AudioFile(AUDIO_FILE) as source:
        audio = r.record(source)

    try:
        print(r.recognize_google(audio, language='en-ZA'))
    except Exception as e:
        print(e)

Best Answer


You can check Google's docs for speaker diarization, along with this library as a code reference. Specifically, check out the code in google_speech_wrapper.py: https://github.com/saharmor/fullstack-transcribe

Disclaimer: I'm the author of this library.
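
Once a diarization-capable recognizer gives you words labelled with a speaker, splitting the single paragraph into per-speaker turns is a small post-processing step. Below is a minimal sketch of that step only, assuming the recognizer output has already been reduced to a list of (word, speaker_tag) pairs. The function names group_by_speaker and format_transcript and the sample data are hypothetical and made up for illustration; they are not taken from the linked library or from any Google API.

```python
from itertools import groupby


def group_by_speaker(tagged_words):
    """Collapse (word, speaker_tag) pairs into consecutive speaker turns.

    Assumption: tagged_words is in spoken order and each item is a
    (word, speaker_tag) tuple extracted from a diarized result.
    """
    turns = []
    # groupby only merges adjacent items, so a speaker who talks again
    # later gets a new turn instead of being merged with the earlier one.
    for tag, group in groupby(tagged_words, key=lambda pair: pair[1]):
        text = " ".join(word for word, _ in group)
        turns.append((tag, text))
    return turns


def format_transcript(turns):
    """Render each turn on its own line, prefixed with the speaker tag."""
    return "\n".join(f"Speaker {tag}: {text}" for tag, text in turns)


# Hypothetical sample input, standing in for real diarized output.
sample = [
    ("hello", 1), ("there", 1),
    ("hi", 2), ("how", 2), ("are", 2), ("you", 2),
    ("fine", 1),
]

print(format_transcript(group_by_speaker(sample)))
```

With real data you would build the list of pairs from whatever word-level fields your recognizer returns, then pass it through the same two functions.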