Programming a microchip to speak text
I am starting a project where I want at least 15 minutes of recording time for spoken words. Should I use an ISD single-chip voice record/playback device, or should I use a PIC with a program to do this task? I want to program sentences at different intervals. Don
Fifteen minutes? ISD gives around 15 seconds or 2 minutes last I looked.
How about using an old flash card, like a 16MB one, and programming a PIC to record on it?
Here is a PIC project that emulates an obsolete Text To Speech Synthesizer.
http://home.alphalink.com.au/~derekw/pictalker/main.htm
It can say anything but it sounds very robotic.
If you want to use a micro...
Say the highest speech frequency you care about is 2.5 kHz.
The sampled analog voice signal is digitized by the integrated ADC peripheral of the micro.
Then there's the Nyquist theorem...
http://zone.ni.com/reference/en-XX/help ... t_theorem/
The Nyquist theorem states that a signal must be sampled at a rate greater than twice the highest frequency component of the signal to accurately reconstruct the waveform.
A sampling frequency of 5.5 kHz could work, BUT this is the trade-off: sample faster for better quality, or sample slower for maximum duration of voice that can be stored in the flash memory.
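To see what goes wrong when the sample rate is too low, here is a small sketch of frequency folding (aliasing). The function name `alias_frequency` is made up for illustration, and it assumes a pure tone and an ideal sampler, so treat it as a back-of-envelope check rather than a model of a real ADC front end.

```python
def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a pure tone at f_hz after sampling at fs_hz."""
    # Fold the tone back into the 0 .. fs/2 band around the nearest multiple of fs.
    return abs(f_hz - fs_hz * round(f_hz / fs_hz))

# A 2 kHz tone is below fs/2 = 2.75 kHz, so it comes through unchanged.
print(alias_frequency(2000, 5500))  # 2000
# A 3 kHz tone is above fs/2 and shows up as a false 2.5 kHz tone.
print(alias_frequency(3000, 5500))  # 2500
```

This is why anything above half the sample rate generally has to be filtered out in analog before the ADC sees it.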
You would then set up an ISR (Interrupt Service Routine) to sample and store data during a RECORD.
Say you have 60K of flash free and a 12-bit data stream; you may get about 6 sec of record time...
The formula for determining the size (in bytes):
(# seconds) * (# channels) * (sampling rate) * (bit resolution) / 8 = file size
(6 sec * 1 * 5.5k * 12 / 8) = 49,500 bytes, leaving some room for code...
15 min = 900 sec
12-bit resolution:
(5.5k * 900 sec * 12 / 8) = 7.42 MB
8-bit resolution:
(5.5k * 900 sec * 8 / 8) = 4.95 MB
That's just for one channel...
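The same formula as a quick calculator, in case you want to try other sample rates or bit depths. The function name `pcm_size_bytes` is just a label chosen here; it covers raw uncompressed samples only and ignores any file system or header overhead.

```python
def pcm_size_bytes(seconds, sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed sample storage in bytes: seconds * channels * rate * bits / 8."""
    return seconds * channels * sample_rate_hz * bits_per_sample / 8

print(pcm_size_bytes(6, 5500, 12))    # 49500.0   (6 s, 12-bit)
print(pcm_size_bytes(900, 5500, 12))  # 7425000.0 (15 min, 12-bit)
print(pcm_size_bytes(900, 5500, 8))   # 4950000.0 (15 min, 8-bit)
```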
You can try compressing the voice data with mu-law or A-law companding:
http://musik.ringofsaturn.com/compress.php
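As a rough illustration of what mu-law companding does, the sketch below squeezes a signed 12-bit sample into 8 bits using the continuous mu-law curve with mu = 255. This is an assumption-laden toy: telephone codecs use a segmented table version of the curve, and the names `mulaw_encode` and `mulaw_decode` are invented here, not taken from the linked page.

```python
import math

MU = 255  # companding constant used for the 8-bit mu-law curve

def mulaw_encode(sample, bits_in=12):
    """Compress one signed sample (e.g. -2048..2047) to an unsigned 8-bit code."""
    full_scale = 2 ** (bits_in - 1)
    x = max(-1.0, min(1.0, sample / full_scale))
    # Logarithmic curve: fine steps near zero, coarse steps near full scale.
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1) / 2 * 255))

def mulaw_decode(code, bits_out=12):
    """Expand an 8-bit code back to an approximate signed sample."""
    y = code / 255 * 2 - 1
    x = math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
    return int(round(x * 2 ** (bits_out - 1)))

for s in (0, 10, 1000, -2048):
    code = mulaw_encode(s)
    print(s, code, mulaw_decode(code))
```

Going from 12 bits to 8 bits per sample this way cuts storage by a third, with the error concentrated in the loud parts where it is least audible.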
hope that helps!
Having done a lot of speech synthesis, I'd suggest 8-bit at 10 ksamples/s is a good trade-off, with 6.5 seconds per 64KB.
I think I tried A/Mu-law on 4 bits to get 11 sec per 64KB, but got better results by normalizing 8 bits and then dumping the lower 4.
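A minimal sketch of the "normalize, then dump the lower 4 bits" idea, assuming unsigned 8-bit samples centred on 128 and two 4-bit samples packed per byte. The helper names and the packing order are assumptions made for the example, not a description of the actual code used.

```python
def normalize_u8(samples):
    """Scale unsigned 8-bit samples about the 128 midpoint so the peak hits full scale."""
    peak = max(abs(s - 128) for s in samples)
    if peak == 0:
        return list(samples)
    gain = 127 / peak
    return [max(0, min(255, round(128 + (s - 128) * gain))) for s in samples]

def pack_high_nibbles(samples):
    """Keep the top 4 bits of each sample and store two samples per byte."""
    samples = list(samples)
    if len(samples) % 2:
        samples.append(128)  # pad with the midpoint (silence)
    return bytes((a & 0xF0) | (b >> 4) for a, b in zip(samples[0::2], samples[1::2]))

def unpack_nibbles(packed):
    """Expand back to 8 bits, adding half a step to centre the rounding error."""
    out = []
    for byte in packed:
        out.append((byte & 0xF0) | 0x08)
        out.append(((byte & 0x0F) << 4) | 0x08)
    return out

quiet = [128, 138, 118, 128]
loud = normalize_u8(quiet)
packed = pack_high_nibbles(loud)
print(loud, packed.hex(), unpack_nibbles(packed))
```

Normalizing first matters because a quiet recording would otherwise use only a few of the 16 available levels.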
Using the Roman Black method you get surprisingly good results, with about a minute per 64KB (at around 10k samples per second, compressed).
http://www.romanblack.com/picsound.htm
Didn't Engineer1138 build the Roman Black BTc device?
http://www.romanblack.com/picsound.htm
That should be able to hold quite a bit more than a typical encoding solution, shouldn't it? He lists 17 sec = 32KB, which is a little under 2KB/sec. You could fit 18 min in 2MB. Don't know where to find 2MB of flash though.
You may want to consider MP3 player chips.
Have a look at the following:
http://194.201.138.187/epages/Store.sto ... cts/SPE020
It is an MP3 module that uses a CompactFlash card to hold the sound files and can be controlled using a PIC.
Hi,
Some years ago I was involved in the design of an alarm announcement system with a number of stored messages.
It was built around a speech co-processor from DSP Group, Inc., the CT8016, and a codec from OKI, the MSM7578. This setup could record and play back speech with telephone quality.
It could also play back .wav type files from a PC, if in the correct format.
The great thing with this setup was the compression rate, resulting in a data rate of 1 kbyte/second. With 1 MB of memory you would get 17 minutes of speech. With this slow data rate, almost any micro could do the job, together with some analogue interface op-amps.
TOK
Gorgon the Caretaker - Character in a children's TV show from 1968.
The Roman Black method will need a rather sharp filter to make the sound good at 1kB/s, since the 'silence' or pause sequences in the sound will have a ripple of 4kHz. This low max frequency will also increase the noise level. I suppose that is why 15625Hz is used to sample the sound, making a data rate of nearly 2kB/s.
I remember I did some tests with a similar system (made in pure hardware) many years ago. The theory is good, but I found that the sample frequency needed was higher than was usable for me at that time. (I tried using 8 ksamples/s.)
With a micro you can of course make decisions based on the previous and next samples, and tweak the sound in that way.
One of the 'big' problems I found was the startup of the sequence. To make a valid midpoint or neutral level, you need to ramp up the output to this level and keep it there to have full positive and negative dynamic range. This problem is not so big when you have a micro to work with, but it was very big when working with logic.
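A toy model of both effects, the startup ramp and the ripple during silence. It assumes the output filter acts like a single RC stage that moves a fixed fraction `alpha` toward 0 or 1 on every bit, and that the encoder greedily picks whichever bit lands closer to the target. This is not Roman Black's actual BTc encoder; `encode_1bit`, `ripple_hz` and the value of `alpha` are hypothetical and only there to show the behaviour.

```python
def encode_1bit(targets, alpha=0.25, start=0.0):
    """Greedy 1-bit encoder. Targets are levels in 0.0..1.0 (0.5 = midpoint)."""
    level = start  # modelled filter output, starting fully discharged
    bits = []
    for t in targets:
        up = level + alpha * (1.0 - level)  # output if we send a 1
        down = level - alpha * level        # output if we send a 0
        bit = 1 if abs(up - t) <= abs(down - t) else 0
        level = up if bit else down
        bits.append(bit)
    return bits

def ripple_hz(bit_rate):
    """An alternating 1,0,1,0 pattern repeats every two bits."""
    return bit_rate / 2

# "Silence" at the midpoint: a run of 1s ramps the output up, then the
# stream settles into 0,1,0,1... which the filter has to smooth out.
print(encode_1bit([0.5] * 16))
print(ripple_hz(8000))  # 1 kB/s is 8000 bit/s, so the ripple sits at 4000.0 Hz
```

In this model a higher bit rate pushes the ripple further above the speech band, which makes it easier to filter.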
VIRAND
I'm sure your comment about obsolete parts is correct, at least today in the RoHS world, but I think it may be possible to get some, if you really want to try it.
TOK
Yep, 15 minutes of voice recording will take some RAM.
Using my 1-bit system, for good speech results I'd suggest 19531 bits/sec (20MHz PIC), which is going to be 19531 x 60 x 15 / 8, which is 2.2 Mbyte of storage. That should be pretty cheap these days.
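For comparison, here is that calculation next to the other data rates quoted in this thread, all for the same 15 minutes. The labels and the `storage_bytes` helper are just for this example, and "1 kbyte/s" is taken as 1024 bytes/s, which is an assumption.

```python
def storage_bytes(bits_per_second, seconds):
    """Storage needed for a constant-bit-rate stream."""
    return bits_per_second * seconds / 8

FIFTEEN_MIN = 15 * 60  # seconds

rates_bps = {
    "1-bit at 19531 bit/s": 19531,
    "1-bit at 15625 bit/s": 15625,
    "8-bit PCM at 10 ksamples/s": 8 * 10_000,
    "speech co-processor at ~1 kbyte/s": 8 * 1024,
}

for name, bps in rates_bps.items():
    megabytes = storage_bytes(bps, FIFTEEN_MIN) / 1e6
    print(f"{name}: {megabytes:.2f} MB")
```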
Part of the problem with recording speech on the PIC is not the encoding to 1-bit sound but getting decent microphonics and some good analogue dynamic compression. Easy in a studio, difficult for a cheap product in the field.
PS. My Picsound 1-bit sound encoder has recently been updated to a Windows version and there is some more stuff on my page:
http://www.romanblack.com