Programming a microchip to speak text

Posted: Mon Jul 31, 2006 6:27 pm
by Don
I am starting a project where I want at least 15 minutes of recording time for spoken words. Should I use an ISD single-chip voice record/playback device, or should I use a PIC with a program to do this task? I want to program sentences at different intervals. Don

Posted: Mon Jul 31, 2006 7:12 pm
by jwax
ISD chips are plug and play, but with limited editing features.
More versatility with a PIC, but then there's programming. If you know or want to know programming, the PIC is the way to go.

Posted: Tue Aug 01, 2006 2:53 am
by VIRAND
Fifteen minutes? ISD gives around 15 seconds or 2 minutes last I looked.

How about using an old flash card, like a 16MB, and program a PIC to record on it?

Here is a PIC project that emulates an obsolete Text To Speech Synthesizer.
http://home.alphalink.com.au/~derekw/pictalker/main.htm
It can say anything but it sounds very robotic.

Posted: Tue Aug 01, 2006 3:44 am
by jwax

Posted: Tue Aug 01, 2006 7:54 am
by Sambuchi
If you want to use a micro...

Say the highest speech frequency of interest is 2.5 kHz.


The sampled analog voice signal is digitized by the integrated ADC peripheral of the micro.


Then there's the Nyquist theorem...
http://zone.ni.com/reference/en-XX/help ... t_theorem/

The Nyquist theorem states that a signal must be sampled at least twice as fast as the highest frequency component of the signal to accurately reconstruct the waveform.


A sampling frequency of 5.5 kHz could work, BUT this is the trade-off: sample faster for better quality, or sample slower to maximize the duration of voice that can be stored in the flash memory.
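
To see why the factor of two matters, here is a small Python sketch of where a pure tone ends up after sampling. The function name is made up for illustration. Anything above half the sample rate folds back into the band as an alias, which is why the microphone signal needs low-pass filtering before the ADC.

```python
def alias_frequency(f_signal_hz, f_sample_hz):
    """Apparent frequency of a pure tone after sampling at f_sample_hz."""
    nearest_multiple = round(f_signal_hz / f_sample_hz) * f_sample_hz
    return abs(f_signal_hz - nearest_multiple)


# A 2.5 kHz tone is below half of 5.5 kHz, so it is preserved.
print(alias_frequency(2500, 5500))  # 2500
# A 3 kHz tone is above half of 5.5 kHz and folds back to 2.5 kHz.
print(alias_frequency(3000, 5500))  # 2500
```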

You would then set up an ISR (Interrupt Service Routine) to sample and store data during a RECORD.

Say you have 60K of flash and a 12-bit data stream; you may get about 6 sec of record time...


The formula for determining the size (in bytes):

(# seconds) * (# channels) * (sampling rate) * (bit resolution) / 8 = file size

(6sec*1*5.5k*12/8)=49,500 leaving some room for code...

15 min = 900 sec

12 bit res
(5.5k*900 sec*12/8)=7.42Meg
8 bit res
(5.5k*900 sec*8/8)=4.95Meg

that's just for one speaker...

You can try compressing the voice data with Mu-law or A-law
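
The size formula can be checked with a few lines of Python. This is only a sketch; the function names are hypothetical and the 60K figure is taken as 60 * 1024 bytes, which is an assumption.

```python
def pcm_bytes(seconds, channels, sample_rate_hz, bits_per_sample):
    """Uncompressed PCM storage in bytes."""
    return seconds * channels * sample_rate_hz * bits_per_sample / 8


def seconds_that_fit(capacity_bytes, channels, sample_rate_hz, bits_per_sample):
    """Recording time available in a given amount of memory."""
    return capacity_bytes * 8 / (channels * sample_rate_hz * bits_per_sample)


# 15 minutes, mono, 5.5 kHz
print(pcm_bytes(900, 1, 5500, 12))  # 7425000.0
print(pcm_bytes(900, 1, 5500, 8))   # 4950000.0

# Upper bound on record time if all of 60K were free for samples
print(seconds_that_fit(60 * 1024, 1, 5500, 12))
```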
http://musik.ringofsaturn.com/compress.php
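
As a rough illustration of what Mu-law companding buys you, here is a sketch using the continuous Mu-law formula with mu = 255. Real telephone codecs (G.711) use a piecewise-linear approximation of this curve, so treat this as a demonstration of the idea and not as a codec. All function names are made up.

```python
import math

MU = 255.0  # value used for 8-bit mu-law telephony


def mulaw_compress(x, mu=MU):
    """Map a sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mulaw_expand(y, mu=MU):
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def quantize(v, bits):
    """Round a value in [-1, 1] to a signed grid with the given bit depth."""
    levels = 2 ** (bits - 1) - 1
    return round(v * levels) / levels


quiet = 0.01  # a quiet sample, 1% of full scale
linear_error = abs(quantize(quiet, 8) - quiet)
mulaw_error = abs(mulaw_expand(quantize(mulaw_compress(quiet), 8)) - quiet)
print(linear_error, mulaw_error)  # the mu-law error is much smaller
```

The point is that quiet passages keep more resolution for the same number of bits, at the cost of coarser steps near full scale.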


hope that helps!

Posted: Tue Aug 01, 2006 10:14 am
by VIRAND
Having done a lot of speech synthesis, I'd suggest 8-bit at 10 kS/s is a good trade-off,
with 6.5 seconds per 64KB.
I think I tried A/Mu-law on 4 bits to get 11 sec per 64KB, but got better results by normalizing
8 bits and then dumping the lower 4.
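
A sketch of the "normalize, then dump the lower 4 bits" approach, assuming unsigned 8-bit samples with the midpoint at 128 and two 4-bit samples packed per byte. The packing layout and function names are assumptions for illustration only.

```python
def normalize_u8(samples):
    """Scale unsigned 8-bit samples (midpoint 128) so the peak uses the full range."""
    peak = max(abs(s - 128) for s in samples)
    if peak == 0:
        return list(samples)
    gain = 127 / peak
    return [int(round(128 + (s - 128) * gain)) for s in samples]


def pack_4bit(samples):
    """Keep the top 4 bits of each sample and pack two samples per byte."""
    nibbles = [s >> 4 for s in samples]
    if len(nibbles) % 2:
        nibbles.append(8)  # pad with mid-scale
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def unpack_4bit(data):
    """Rebuild 8-bit samples, placing each nibble at the centre of its step."""
    out = []
    for b in data:
        out.append(((b >> 4) << 4) | 8)
        out.append(((b & 0x0F) << 4) | 8)
    return out


quiet_recording = [128, 138, 118, 128]
packed = pack_4bit(normalize_u8(quiet_recording))
print(packed.hex())  # 8f08
```

Without the normalizing step, this quiet recording would collapse to only two or three of the sixteen available levels.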

Using the Roman Black method you get surprisingly good results with a minute per 64KB.
(at around 10k samples per second, compressed)
http://www.romanblack.com/picsound.htm
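
For anyone wondering how one bit per sample can work at all, here is a generic 1-bit encoder sketch. It is NOT Roman Black's actual algorithm (see his page for that); it only assumes the playback side is a simple RC-style low-pass filter, and it picks each bit so the modelled filter output lands closest to the input. The `step` value and function names are made up.

```python
def encode_1bit(samples, step=0.1):
    """Encode samples in [0, 1] as bits by tracking a modelled RC filter."""
    level = 0.5
    bits = []
    for target in samples:
        up = level + (1.0 - level) * step   # filter output if we send a 1
        down = level - level * step         # filter output if we send a 0
        if abs(up - target) <= abs(down - target):
            bits.append(1)
            level = up
        else:
            bits.append(0)
            level = down
    return bits


def decode_1bit(bits, step=0.1):
    """Model of the playback filter: move a fraction of the way to each bit."""
    level = 0.5
    out = []
    for b in bits:
        level += ((1.0 if b else 0.0) - level) * step
        out.append(level)
    return out


silence = [0.5] * 16
bits = encode_1bit(silence)
print(bits)  # alternates between 0 and 1 to hold the midpoint
print(max(abs(v - 0.5) for v in decode_1bit(bits)))
```

At 10k bits per second, 64KB holds 65536 * 8 / 10000, or roughly 52 seconds.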

Posted: Tue Aug 01, 2006 11:04 am
by Newz2000
Didn't Engineer1138 build the Roman Black BTc device?
http://www.romanblack.com/picsound.htm

That should be able to hold quite a bit more than a typical encoding solution, shouldn't it? He lists 17sec = 32KB, which is a little under 2KB/sec. You could fit 18min in 2MB. Don't know where to find 2MB of flash though.

Posted: Tue Aug 01, 2006 12:41 pm
by VIRAND
Well, the cheapest PIC-compatible digital camera flash memory cards are a lot more
than 2MB, but there's no requirement to fill it all up.

2MB is probably the biggest size of DIP-packaged flash memory I've seen (as a BIOS),
and everything after that is probably only made too small to solder by hand.

Posted: Tue Aug 01, 2006 2:15 pm
by philba
If size is an issue, I'd look at the serial EEPROMs in an 8-pin package. An 8-pin PIC could be used to drive one. Add in an amplifier (LM386?) and you've got a very small package.

Posted: Wed Aug 02, 2006 5:36 am
by Colinr
You may want to consider MP3 player chips.

Have a look at the following:

http://194.201.138.187/epages/Store.sto ... cts/SPE020

It is an MP3 module that uses a CompactFlash card to hold the sound files and can be controlled using a PIC.

Posted: Thu Aug 03, 2006 1:00 pm
by VIRAND
The problem with MP3 chips is: how do you record sound in MP3 format with a PIC?
PCM and PWM encodings are very simple. The MP3 encoding process is very complicated.

A playback-only system (PCM/PWM) would need only a binary counter and an EPROM, no PIC.
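
In software terms, the counter-plus-EPROM player behaves like the sketch below. It assumes a free-running binary counter clocked at the sample rate, with the EPROM data lines feeding a DAC or PWM stage; the names are hypothetical.

```python
def playback(eprom, counter_bits, clocks):
    """Yield the byte on the EPROM data bus for each sample clock."""
    mask = (1 << counter_bits) - 1
    address = 0
    for _ in range(clocks):
        yield eprom[address]
        address = (address + 1) & mask  # counter wraps, so the message loops


tiny_eprom = bytes(range(8))  # stand-in for recorded samples
print(list(playback(tiny_eprom, 3, 10)))  # [0, 1, 2, 3, 4, 5, 6, 7, 0, 1]
```

A 64KB EPROM would need a 16-bit counter, and at 10k samples per second it would loop about every 6.5 seconds.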

Posted: Thu Aug 10, 2006 3:58 am
by Gorgon
Hi,
Some years ago I was involved in a design of an alarm announcement system with a number of stored messages.

It was built around a speech co-processor from DSP Group, Inc, CT8016 and a codec from OKI, MSM7578. This setup could record and play back speech with telephone quality.

It could also play back .wav type files from a PC, if in the correct format.

The great thing with this setup was the compression rate, resulting in a data rate of 1 kbyte/second. With a 1MB memory you would get 17 minutes of speech. With this slow data rate, almost any micro could do the job, together with some analogue interface op-amps.

TOK ;)

Posted: Sat Aug 12, 2006 2:46 am
by VIRAND
I assume those chips are obsolete now.
Roman Black is easy and sounds good at 1KByte/sec.

Posted: Sat Aug 12, 2006 10:46 am
by Gorgon
The Roman Black method will need a rather sharp filter to make the sound good at 1kB/s, since the 'silence' or pause sequences in the sound will have a ripple of 4kHz. This low max frequency will also increase the noise level. I suppose that is why 15625Hz is used to sample the sound, making nearly a 2kB/s data rate.
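
The ripple and data-rate figures follow from the bit rate, assuming silence is sent as an alternating 1/0 pattern so that one full ripple cycle takes two bits. A quick sketch (function names are made up):

```python
def idle_ripple_hz(bit_rate):
    """Ripple frequency of an alternating 1/0 pattern at the given bit rate."""
    return bit_rate / 2


def bytes_per_second(bit_rate):
    return bit_rate / 8


print(idle_ripple_hz(8000))     # 4000.0  (1 kB/s stream)
print(idle_ripple_hz(15625))    # 7812.5
print(bytes_per_second(15625))  # 1953.125
```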

I remember I did some tests with a similar system (made in pure hardware) many years ago. The theory is good, but I found that the sample frequency needed was higher than was usable for me at that time. (I tried using 8 ksamples/s.)
With a micro you can of course make decisions based on the previous and next samples, and tweak the sound in that way.

One of the 'big' problems I found was the startup of the sequence. To make a valid midpoint or neutral level, you need to ramp up the output to this level and keep it there to have full positive and negative dynamic range. This problem is not so big when you have a micro to work with, but it was very big when working with logic.

VIRAND
I'm sure your comment about obsolete parts is correct, at least today in the RoHS world, but I think it may be possible to get some, if you really want to try it. ;)

TOK ;)

Posted: Fri Aug 25, 2006 8:06 am
by Blackman
Yep, 15 minutes of voice recording will take some RAM.

Using my 1-bit system, for good speech results I'd suggest 19531 bits/sec (20MHz PIC), which is going to be 19531 x 60 x 15 / 8, which is 2.2Mbyte of storage. That should be pretty cheap these days.
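
The arithmetic above as a short sketch. The function name is hypothetical, and the divide-by-1024 relationship between the 20MHz clock and the bit rate is an assumption that happens to match the figure, not something stated in the post.

```python
def onebit_storage_bytes(bits_per_second, minutes):
    """Storage needed for a 1-bit-per-sample stream."""
    return bits_per_second * 60 * minutes / 8


print(onebit_storage_bytes(19531, 15))  # 2197237.5

# Assumed derivation of the bit rate from a 20 MHz clock
print(20_000_000 / 1024)  # 19531.25
```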

Part of the problem for recording speech with the PIC is not the encoding to 1-bit sound but getting decent microphonics and some good analogue dynamic compression. Easy in a studio, difficult for a cheap product in the field.

PS. My Picsound 1-bit sound encoder has been updated recently to a Windows version, and there is some more stuff on my page:
http://www.romanblack.com