https://www.thepythoncode.com/article/using-speech-recognition-to-convert-speech-to-text-python
How to Convert Speech to Text in Python
Learn how to use the SpeechRecognition Python library to convert audio speech to text in Python.
bdou Rockikz · 7 min read · Updated Oct 2020 · Machine Learning · Application Programming Interfaces
Speech recognition is the ability of computer software to identify words and phrases in spoken language and convert them to human-readable text. In this tutorial, you will learn how to convert speech to text in Python using the SpeechRecognition library.
We do not need to build any machine learning model from scratch: this library provides convenient wrappers for various well-known public speech recognition APIs (such as Google Cloud Speech API, IBM Speech To Text, etc.).
Learn also: How to Translate Text in Python.
Alright, let's get started by installing the library with pip: pip install SpeechRecognition
Okay, open up a new Python file and import it: import speech_recognition as sr
The nice thing about this library is it supports several recognition engines:
- CMU Sphinx (offline)
- Google Speech Recognition
- Google Cloud Speech API
- Wit.ai
- Microsoft Bing Voice Recognition
- Houndify API
- IBM Speech To Text
- Snowboy Hotword Detection (offline)
We are going to use Google Speech Recognition here, as it's straightforward and doesn't require you to supply an API key.
Reading from a File
Make sure you have an audio file containing English speech in the current directory (if you want to follow along with me, get the audio file here).
This file was taken from the LibriSpeech dataset, but you can use any WAV audio file you want; just change the name of the file. Next, let's initialize our speech recognizer by creating an instance of the library's Recognizer class.
The code below loads the audio file and converts the speech into text using Google Speech Recognition:
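A minimal sketch of this step might look like the following. It assumes the SpeechRecognition package is installed; the helper name transcribe_file() and the file name speech.wav are hypothetical placeholders, not part of the library or of the original listing.

```python
def transcribe_file(path):
    """Transcribe a short WAV file with Google Speech Recognition.

    transcribe_file() is a hypothetical helper name used for illustration.
    """
    # Imported inside the function so this sketch can be loaded even when
    # the SpeechRecognition package has not been installed yet.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    # AudioFile is a context manager that opens the file as an audio source.
    with sr.AudioFile(path) as source:
        # record() with no duration reads the whole file into memory.
        audio_data = recognizer.record(source)
    try:
        # Sends the audio to Google's web speech API and returns the text.
        return recognizer.recognize_google(audio_data)
    except sr.UnknownValueError:
        # Raised when no intelligible speech could be recognized.
        return ""
    except sr.RequestError as error:
        # Raised when the API cannot be reached or rejects the request.
        raise RuntimeError(f"Speech API request failed: {error}") from error
```

Calling print(transcribe_file("speech.wav")) would print the recognized text, or an empty line if nothing intelligible was heard.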
This will take a few seconds to finish, as it uploads the file to Google and fetches the output. Here is my result:
The above code works well for small and medium-sized audio files. In the next section, we are going to write code for large files.
Reading Large Audio Files
If you want to perform speech recognition on a long audio file, the function below handles that quite well:
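A sketch of such a function, under stated assumptions, is shown here. It assumes both SpeechRecognition and Pydub are installed. The names transcribe_large_file() and format_sentence(), the audio-chunks folder name, and the 500 ms values for min_silence_len and keep_silence are illustrative choices, not values confirmed by this article.

```python
import os


def format_sentence(text):
    """Capitalize a recognized phrase and end it with a period and a space."""
    return f"{text.capitalize()}. "


def transcribe_large_file(path, folder_name="audio-chunks"):
    """Split a long WAV file on silence and transcribe it chunk by chunk.

    Hypothetical helper: the name and default values are assumptions.
    """
    # Third-party imports live here so format_sentence() stays usable alone.
    import speech_recognition as sr
    from pydub import AudioSegment
    from pydub.silence import split_on_silence

    recognizer = sr.Recognizer()
    sound = AudioSegment.from_wav(path)
    chunks = split_on_silence(
        sound,
        min_silence_len=500,             # assumed value, in milliseconds
        silence_thresh=sound.dBFS - 14,  # average loudness minus 14 dB
        keep_silence=500,                # assumed value, in milliseconds
    )
    # Create the folder that will hold the exported chunks.
    os.makedirs(folder_name, exist_ok=True)
    whole_text = ""
    for i, chunk in enumerate(chunks, start=1):
        chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
        chunk.export(chunk_filename, format="wav")
        with sr.AudioFile(chunk_filename) as source:
            audio = recognizer.record(source)
        try:
            text = recognizer.recognize_google(audio)
        except sr.UnknownValueError:
            # Skip chunks that contain no intelligible speech.
            continue
        whole_text += format_sentence(text)
    return whole_text
```

The function returns the concatenated transcription as one string, so it can be printed directly, for example print(transcribe_large_file(path)) with path pointing at a long WAV file.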
Note: You need to install Pydub with pip (pip install pydub) for the above code to work.
The above function uses the split_on_silence() function from the pydub.silence module to split the audio data into chunks on silence. The min_silence_len parameter is the minimum length of silence, in milliseconds, to be used for a split. silence_thresh is the threshold below which anything quieter is considered silence; I have set it to the average dBFS minus 14. The keep_silence argument is the amount of silence, in milliseconds, to leave at the beginning and end of each detected chunk.
These parameters won't be perfect for all sound files, so experiment with them to suit your own large audio files.
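To make the silence threshold more concrete, here is a rough standard-library sketch of how an average level in dBFS can be computed for 16-bit samples, and how the "average minus 14" threshold follows from it. The function names are hypothetical, and this only approximates what Pydub computes internally.

```python
import math


def average_dbfs(samples, sample_width_bytes=2):
    """Return the RMS level of signed PCM samples in dB relative to full scale."""
    if not samples:
        return float("-inf")
    # Full scale for signed 16-bit audio is 2**15 = 32768.
    full_scale = 2 ** (8 * sample_width_bytes - 1)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")  # digital silence
    return 20 * math.log10(rms / full_scale)


def silence_threshold(samples, margin_db=14):
    """Anything quieter than the average level minus margin_db counts as silence."""
    return average_dbfs(samples) - margin_db


# A constant signal at half of full scale sits about 6 dB below full scale.
half_scale = [16384] * 1000
level = average_dbfs(half_scale)
threshold = silence_threshold(half_scale)
print(round(level, 2), round(threshold, 2))  # -6.02 -20.02
```

A louder recording therefore gets a higher threshold and a quieter one a lower threshold, which is why tying the threshold to the file's own average level tends to adapt better than a fixed number.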
After that, we iterate over all chunks, convert the speech in each one into text, and join the results together. Here is an example run:
Note: You can get the 7601-291468-0006.wav file here.
Output:
So, this function automatically creates a folder for us, puts the chunks of the original audio file we specified into it, and then runs speech recognition on all of them.
Reading from the Microphone
This requires PyAudio to be installed on your machine. Here is the installation process depending on your operating system:
Windows
You can just pip install it: pip install pyaudio
Linux
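You need to first install the dependencies (on Debian or Ubuntu this typically means the PortAudio development package, portaudio19-dev), then pip install pyaudio.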
MacOS
You need to first install portaudio (for example, with Homebrew: brew install portaudio), then you can just pip install it: pip install pyaudio
Now let's use our microphone to convert our speech:
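A minimal sketch of the microphone version, assuming SpeechRecognition and PyAudio are installed and a default input device is available. The helper name transcribe_microphone() is hypothetical.

```python
def transcribe_microphone(duration=5):
    """Record from the default microphone and transcribe the result.

    transcribe_microphone() is a hypothetical helper name used for illustration.
    """
    # Imported inside the function so this sketch can be loaded even when
    # SpeechRecognition or PyAudio is not installed yet.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    # Microphone() with no arguments selects the default input device.
    with sr.Microphone() as source:
        # Stop reading from the microphone after `duration` seconds.
        audio_data = recognizer.record(source, duration=duration)
    # Upload the recorded audio to Google and return the recognized text.
    return recognizer.recognize_google(audio_data)
```

Calling print(transcribe_microphone()) records with the default duration and prints whatever was recognized.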
This will listen to your microphone for 5 seconds and then try to convert that speech into text!
It is pretty similar to the previous code, but here we use a Microphone() object to read audio from the default microphone, and we pass the duration parameter to the record() function to stop reading after 5 seconds. The audio data is then uploaded to Google to get the output text.
You can also use the offset parameter of the record() function to start recording after offset seconds.
Also, you can recognize different languages by passing the language parameter to the recognize_google() function. For instance, if you want to recognize Spanish speech, you would use:
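As a sketch, the language can be passed as a language tag string. The tag "es-ES" (Spanish as spoken in Spain) and the helper name transcribe_file_in_language() are assumptions made for this illustration.

```python
def transcribe_file_in_language(path, language="es-ES"):
    """Transcribe a WAV file, telling Google which language to expect.

    Hypothetical helper: the name and the default tag are assumptions.
    """
    # Imported inside the function so this sketch can be loaded even when
    # the SpeechRecognition package has not been installed yet.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio_data = recognizer.record(source)
    # The language keyword selects the recognition language.
    return recognizer.recognize_google(audio_data, language=language)
```

The same language keyword works with audio recorded from the microphone, since recognize_google() only sees the recorded audio data.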
Check out the supported languages in this Stack Overflow answer.
Conclusion
As you can see, it is pretty easy to use this library for converting speech to text. The library is widely used; check its official documentation for more details.
If you don't want to use Python and would rather have a service that does this automatically for you, I recommend audext, which converts your audio into text online quickly and cost-effectively. Check it out!
If you want to convert text to speech in Python as well, check this tutorial.
Read Also: How to Recognize Optical Characters in Images in Python.