Programming a microchip to speak text

Don
Posts: 4
Joined: Sun Jul 30, 2006 6:28 pm
Contact:


Post by Don »

I am starting a project where I want at least 15 minutes of recording time for spoken words. Should I use an ISD single-chip voice record/playback device, or should I use a PIC with a program that does this task? I want to program sentences at different intervals. Don
jwax
Posts: 2234
Joined: Mon Feb 09, 2004 1:01 am
Location: NY
Contact:

Post by jwax »

ISD chips are plug and play, but with limited editing features.
More versatility with a PIC, but then there's programming. If you know or want to know programming, the PIC is the way to go.
VIRAND
Posts: 88
Joined: Tue Aug 23, 2005 1:01 am
Location: New York
Contact:

Post by VIRAND »

Fifteen minutes? ISD gives around 15 seconds or 2 minutes last I looked.

How about using an old flash card, say a 16 MB one, and programming a PIC to record to it?

Here is a PIC project that emulates an obsolete Text To Speech Synthesizer.
http://home.alphalink.com.au/~derekw/pictalker/main.htm
It can say anything but it sounds very robotic.
jwax
Posts: 2234
Joined: Mon Feb 09, 2004 1:01 am
Location: NY
Contact:

Post by jwax »

Sambuchi
Posts: 366
Joined: Tue Jan 18, 2005 1:01 am
Location: Orlando FL
Contact:

Post by Sambuchi »

If you want to use a micro...

Say the highest speech frequency you want to capture is 2.5 kHz.


The sampled analog voice signal is digitized by the integrated ADC peripheral of the micro.


Then there's the Nyquist theorem...
http://zone.ni.com/reference/en-XX/help ... t_theorem/

The Nyquist theorem states that a signal must be sampled at least twice as fast as the highest frequency component of the signal to accurately reconstruct the waveform.


A sampling frequency of 5.5 kHz could work, BUT this is the trade-off: sample faster for better quality, or sample slower to maximize the duration of voice that can be stored in the flash memory.

You would then set up an ISR (Interrupt Service Routine) to sample and store data during a RECORD.

Say you have 60K of flash and a 12-bit data stream; you may get about 6 sec of record time...


The formula for determining the size (in bytes)

(# seconds) * (# channels) * (sampling rate) * (bit resolution) / 8 = file size

(6 sec * 1 * 5.5k * 12 / 8) = 49,500 bytes, leaving some room for code...

15 min = 900 sec

12 bit res
(5.5k*900 sec*12/8)=7.42Meg
8 bit res
(5.5k*900 sec*8/8)=4.95Meg

That's just for one channel (mono)...
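
A quick way to check these figures is to put the size formula into a few lines of Python. This is only a desk-side sketch, not PIC code; the function name `pcm_bytes` is made up for illustration, and it assumes the samples are bit-packed (a 12-bit sample occupies exactly 1.5 bytes).

```python
def pcm_bytes(seconds, channels, sample_rate_hz, bits_per_sample):
    """Uncompressed PCM size in bytes, assuming bit-packed samples."""
    return seconds * channels * sample_rate_hz * bits_per_sample / 8

# The three cases discussed above: mono, 5.5 kHz sampling
cases = [("6 s, 12-bit", 6, 12), ("15 min, 12-bit", 900, 12), ("15 min, 8-bit", 900, 8)]
for label, seconds, bits in cases:
    print(f"{label}: {pcm_bytes(seconds, 1, 5500, bits):,.0f} bytes")
```

If 12-bit samples are stored in two whole bytes each instead of being packed, pass 16 as the bit resolution to see the real storage cost.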

You can try compressing the voice data with mu-law or A-law companding:
http://musik.ringofsaturn.com/compress.php
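
To give a feel for what mu-law does, here is a rough Python sketch of the textbook continuous mu-law curve with mu = 255. It is not the bit-exact segment encoding used by telephone codecs, and the helper names (`mulaw_compress`, `encode8`, and so on) are invented for this example. Samples are assumed to be scaled to the range -1.0 to 1.0.

```python
import math

MU = 255  # usual mu value for 8-bit companding

def mulaw_compress(x):
    """Map a sample in [-1.0, 1.0] onto the logarithmic mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y):
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def encode8(x):
    """Compand a sample, then quantize it to an unsigned 8-bit code (0..255)."""
    return round((mulaw_compress(x) + 1.0) / 2.0 * 255)

def decode8(code):
    """Turn an 8-bit code back into an approximate sample in [-1.0, 1.0]."""
    return mulaw_expand(code / 255 * 2.0 - 1.0)

quiet = 0.01
print(encode8(quiet), decode8(encode8(quiet)))
```

The point of the curve is that quiet samples get finer steps than loud ones, so 8 bits per sample can cover a dynamic range that would otherwise need more bits in plain linear PCM.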


hope that helps!
VIRAND
Posts: 88
Joined: Tue Aug 23, 2005 1:01 am
Location: New York
Contact:

Post by VIRAND »

Having done a lot of speech synthesis, I'd suggest 8-bit at 10 ksamples/s as a good trade-off,
with 6.5 seconds per 64KB.
I think I tried A/Mu-law on 4 bits to get 11 sec per 64KB, but got better results by normalizing
8 bits and then dumping the lower 4.
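
One possible reading of "normalize, then dump the lower 4 bits" is sketched below in Python. It assumes unsigned 8-bit samples centred on 128 and packs two 4-bit samples into each byte; the function names and the packing order are my own assumptions, not details given in the post.

```python
def normalize_u8(samples):
    """Scale unsigned 8-bit samples (centre 128) so the loudest peak hits full scale."""
    peak = max(abs(s - 128) for s in samples)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    gain = 127 / peak
    return [max(0, min(255, round(128 + (s - 128) * gain))) for s in samples]

def pack_high_nibbles(samples):
    """Keep only the top 4 bits of each sample and pack two samples per byte."""
    nibbles = [s >> 4 for s in samples]
    if len(nibbles) % 2:
        nibbles.append(8)  # pad with the mid-scale nibble
    return bytes((nibbles[i] << 4) | nibbles[i + 1] for i in range(0, len(nibbles), 2))

def unpack_nibbles(packed):
    """Expand 4-bit samples back to 8 bits, centred in each quantization step."""
    out = []
    for byte in packed:
        out.append(((byte >> 4) << 4) | 0x08)
        out.append(((byte & 0x0F) << 4) | 0x08)
    return out

loud = normalize_u8([128, 138, 118, 132])
print(loud, pack_high_nibbles(loud).hex())
```

Normalizing first matters because a quiet recording might only exercise the lower bits, which are exactly the ones being thrown away.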

Using the Roman Black method you get surprisingly good results with a minute per 64KB.
(at around 10k samples per second, compressed)
http://www.romanblack.com/picsound.htm
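
For anyone curious how a 1-bit stream can carry speech at all, here is a simplified Python sketch of a generic 1-bit tracking encoder: it simulates an RC filter on the output pin and, for each sample, picks whichever bit moves the filter output closer to the target. This is only in the spirit of the method linked above and is not claimed to be Roman Black's actual algorithm; the `alpha` filter constant and all function names are assumptions made for illustration.

```python
import math

def encode_1bit(samples, alpha=0.125):
    """Encode samples in [0.0, 1.0] as one bit each by tracking a simulated RC filter."""
    level = 0.5  # filter assumed to start at mid-scale
    bits = []
    for target in samples:
        up = level + (1.0 - level) * alpha   # filter output if we send a 1
        down = level - level * alpha         # filter output if we send a 0
        if abs(up - target) <= abs(down - target):
            bits.append(1)
            level = up
        else:
            bits.append(0)
            level = down
    return bits

def decode_1bit(bits, alpha=0.125):
    """Replay the bit stream through the same simulated RC filter."""
    level = 0.5
    out = []
    for bit in bits:
        level += ((1.0 if bit else 0.0) - level) * alpha
        out.append(level)
    return out

def pack_bits(bits):
    """Pack 8 bits per byte, MSB first, zero-padding the last byte."""
    padded = list(bits) + [0] * (-len(bits) % 8)
    return bytes(
        sum(bit << (7 - j) for j, bit in enumerate(padded[i:i + 8]))
        for i in range(0, len(padded), 8)
    )

def sine_wave(count, period, amplitude=0.3):
    """Slow test tone centred on mid-scale."""
    return [0.5 + amplitude * math.sin(2 * math.pi * i / period) for i in range(count)]

wave = sine_wave(400, 200)
bits = encode_1bit(wave)
print(len(pack_bits(bits)), "bytes for", len(wave), "samples")
```

At one bit per sample, 10k samples per second is 1250 bytes per second, which is where the "about a minute per 64KB" figure comes from.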
Newz2000
Posts: 507
Joined: Wed May 18, 2005 1:01 am
Location: Des Moines, Iowa, USA
Contact:

Post by Newz2000 »

Didn't Engineer1138 build the Roman Black BTc device?
http://www.romanblack.com/picsound.htm

That should be able to hold quite a bit more than a typical encoding solution, shouldn't it? He lists 17 sec = 32KB, which is a little under 2KB/sec. You could fit 18 min in 2MB. Don't know where to find 2MB of flash, though.
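
The 18-minute estimate can be checked with a couple of lines of Python. This sketch assumes binary units (1KB = 1024 bytes, 1MB = 1024KB); the helper names are made up for the example.

```python
KB = 1024
MB = 1024 * KB

def bytes_per_second(total_bytes, seconds):
    """Average data rate of a clip of known size and length."""
    return total_bytes / seconds

def minutes_of_audio(capacity_bytes, rate_bytes_per_sec):
    """How long a memory of the given size lasts at the given data rate."""
    return capacity_bytes / rate_bytes_per_sec / 60

rate = bytes_per_second(32 * KB, 17)   # 17 sec of sound in 32KB
print(f"{rate:.0f} bytes/sec, {minutes_of_audio(2 * MB, rate):.1f} min in 2MB")
```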
VIRAND
Posts: 88
Joined: Tue Aug 23, 2005 1:01 am
Location: New York
Contact:

Post by VIRAND »

Well, the cheapest PIC-compatible digital camera flash memory cards are a lot more
than 2MB, but there's no requirement to fill it all up.

2MB is probably the biggest size of DIP-packaged flash memory I've seen (as a BIOS),
and everything after that is probably only made too small to solder by hand.
philba
Posts: 2050
Joined: Tue Nov 30, 2004 1:01 am
Location: Seattle
Contact:

Post by philba »

If size is an issue, I'd look at serial EEPROMs in an 8-pin package. An 8-pin PIC could be used to drive one. Add an amplifier (LM386?) and you've got a very small package.
Colinr
Posts: 33
Joined: Wed Oct 23, 2002 1:01 am
Location: UK
Contact:

Post by Colinr »

You may want to consider MP3 player chips.

Have a look at the following:

http://194.201.138.187/epages/Store.sto ... cts/SPE020

It is an MP3 module that uses a CompactFlash card to hold the sound files and can be controlled using a PIC.
VIRAND
Posts: 88
Joined: Tue Aug 23, 2005 1:01 am
Location: New York
Contact:

Post by VIRAND »

The problem with MP3 chips is: how do you record sound in MP3 format with a PIC?
PCM and PWM encodings are very simple. The MP3 encoding process is very complicated.

A playback-only system (PCM/PWM) would need only a binary counter and an EPROM, no PIC.
Gorgon
Posts: 325
Joined: Wed May 04, 2005 1:01 am
Location: Norway
Contact:

Post by Gorgon »

Hi,
Some years ago I was involved in a design of an alarm announcement system with a number of stored messages.

It was built around a speech co-processor from DSP Group, Inc. (CT8016) and a codec from OKI (MSM7578). This setup could record and play back speech with telephone quality.

It could also play back .wav type files from a PC, if in the correct format.

The great thing about this setup was the compression rate, resulting in a data rate of 1 kbyte/second. With 1 MB of memory you would get 17 minutes of speech. With this slow data rate, almost any micro could do the job, together with some analogue interface op-amps.

TOK ;)
Gorgon the Caretaker - Character in a children's TV-show from 1968. ;)
VIRAND
Posts: 88
Joined: Tue Aug 23, 2005 1:01 am
Location: New York
Contact:

Post by VIRAND »

I assume those chips are obsolete now.
Roman Black is easy and sounds good at 1KByte/sec.
Gorgon
Posts: 325
Joined: Wed May 04, 2005 1:01 am
Location: Norway
Contact:

Post by Gorgon »

The Roman Black method will need a rather sharp filter to make the sound good at 1kB/s, since the 'silence' or pause sequences in the sound will have a ripple of 4kHz. This low max frequency will also increase the noise level. I suppose that is why 15625Hz is used to sample the sound, making nearly a 2kB/s data rate.
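
The ripple figure follows from the bit rate: to hold a steady mid-level, a 1-bit stream has to alternate 0,1,0,1, and one full cycle of that pattern takes two bits. A small Python sketch of the arithmetic, taking 1kB/s as 8000 bits/s (the function names are made up for illustration):

```python
def idle_ripple_hz(bit_rate):
    """Ripple frequency of an alternating 0,1,0,1 idle pattern: one cycle per two bits."""
    return bit_rate / 2

def data_rate_bytes(bit_rate):
    """Storage consumed per second by a 1-bit stream."""
    return bit_rate / 8

for bit_rate in (8000, 15625):
    print(bit_rate, "bits/s ->", idle_ripple_hz(bit_rate), "Hz ripple,",
          data_rate_bytes(bit_rate), "bytes/s")
```

At 15625 bits/s the idle ripple moves up to about 7.8 kHz, which is easier to filter away from the speech band.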

I remember I did some tests with a similar system (made in pure hardware) many years ago. The theory is good, but I found that the sample frequency needed was higher than was usable for me at that time. (I tried using 8 ksamples/s.)
With a micro you can of course make decisions based on previous, and the next samples, and tweak the sound in that way.

One of the 'big' problems I found, was the startup of the sequence. To make a valid midpoint or neutral level, you need to ramp up the output to this level, and keep it there to have full positive and negative dynamic range. This problem is not so big, when you have a micro to work with, but it was very big when working with logic.

VIRAND
I'm sure your comment about obsolete parts is correct, at least today in the RoHS world, but I think it may be possible to get some, if you really want to try it. ;)

TOK ;)
Blackman
Posts: 1
Joined: Fri Aug 25, 2006 7:54 am
Contact:

Post by Blackman »

Yep, 15 minutes of voice recording will take some memory.

Using my 1-bit system, for good speech results I'd suggest 19531 bits/sec (20MHz PIC), which is going to be 19531 x 60 x 15 / 8, or about 2.2Mbyte of storage. That should be pretty cheap these days.
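
The storage figure can be reproduced in a few lines of Python. The second helper is a guess at where 19531 comes from: a 20MHz PIC runs 5 million instruction cycles per second, and dividing that by 256 (for example, an 8-bit timer rolling over) gives 19531.25. That derivation is an assumption on my part, not something stated in the post, and both function names are invented for the example.

```python
def storage_bytes(bits_per_second, minutes):
    """Bytes needed for a 1-bit stream of the given rate and length."""
    return bits_per_second * 60 * minutes / 8

def timer_rate_hz(fosc_hz, cycles_per_bit=256):
    """Assumed bit rate: instruction clock (Fosc / 4) divided by cycles per bit."""
    return fosc_hz / 4 / cycles_per_bit

print(storage_bytes(19531, 15))      # 2197237.5 bytes, about 2.2 Mbyte
print(timer_rate_hz(20_000_000))     # 19531.25
```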

Part of the problem for recording speech with the PIC is not the encoding to 1-bit sound but getting decent microphonics and some good analogue dynamic compression. Easy in a studio, difficult for a cheap product in the field.

PS. My Picsound 1-bit sound encoder has recently been updated to a Windows version, and there is some more stuff on my page:
http://www.romanblack.com