
Text-to-Speech Synthesis Research @ MILE

A. G. Ramakrishnan, Professor, Medical Intelligence & Language Engineering (MILE) Lab, Department of Electrical Engineering, IISc, Bangalore.

Celebrating the Centenary of the Department!


MILE Lab, IISc, Bangalore, 1/29/2012

Philosophy of MILE
Research relevant to people and life around us.
Do not download research topics, data or code!
Commitment to deliver something useful by itself throws up meaningful research issues.
Having chosen to work on an applied area, we deal with everything that is required to reach the goal.
All the data we use have been collected by us: India has a huge population and so, there is no dearth for creation of standard databases.
Research is learning; and learning is fun and its own reward!


What we created: Vision 2010
Indic Language Reading Machines for People with Visual Disability (PWVD).
Automated Book Reader (ABR) for Indic scripts.
Any printed material in Indian languages becomes accessible: document analysis & recognition.
Text-to-speech (TTS) conversion.
Need to deal with bilingual & trilingual text: script recognition at the word level.
Posters, road signs, menu cards, notice boards: camera-based document analysis & recognition.
Coloured text printed on complex background.
Online Handwriting Recognition (OHWR).

Where are we, today?
Using our Tamil OCR, Worth Trust, Chennai has already digitized 200 Tamil books (>30,000 pages), and the Braille books are already being used by around 100 PWVD.
Invention Labs, Chennai will bring out Tamil and Kannada versions of Avaz using our TTS by July 2012. (Used by children with cerebral palsy; award from the President.)
TTS with SAPI to be used by the National Association for the Blind.
Clinical trials planned with St. John's Hospital, Bangalore: OHWR, Speech.

Acknowledgment for our TTS
JV Rama Partha R Muralishankar Ranjani HG Shiva Kumar HR
Lakshmish K Prathibha
Abhinava S Arun Sriraman
Vikram LR

Ajit Narayanan, CEO of Invention Labs, Chennai: "Your TTS is the best Indian language TTS I have seen so far."

Web Demo of Tirukkural & Vak (MILE TTS): http://mile.ee.iisc.ernet.in/tts



Writing-to-Speech Device
Laryngectomy

Working with St. John's Hospital & Medical College to test on persons with vocal disability.
Patent pending.

Other Research at MILE
Brain Maps at Medicaid Systems
Heart Rate Variability (Biological Cybernetics)
Fetal lung maturity from ultrasound & 3D MRI Comp.
Field Extraction from Document Images
Vitiligo Quantification
CURRENT
Machine Listening
Multilingual Recognition
Assessment of Diabetic Retinopathy
FUTURE
Early detection of retinal diseases
Tamil-Kannada Machine Translation

Applications of TTS
Natural language interface for computers
Digital personal assistant with translation
E-mail reader in local language
Interaction tool for physicians
Automatic telephone-based enquiry system
Virtual teacher
Automatic document reading machines
Internet news channels
Accessibility and reading aids for the blind
Communication aid for children with cerebral palsy
Aid for persons with laryngectomy

Method Employed

Waveform concatenation based.
Good database of over 1100 separately spoken, phonetically rich sentences, segmented as phones.
Phonetic equivalent of text (the way it is pronounced).
Carefully selected segments from recorded speech.
Signal processing for smooth concatenation.
Signal processing for naturalness.
Special provisions.

Issues to be addressed
Phonetically rich text selection
Recording from a good speaker
Segmentation & annotation
Text normalization
G2P conversion and exceptions
Prosody prediction
Unit selection
Pitch modification
Duration modification
Determining the point of concatenation
Spoken language vs. written language
Indian language, along with English words


Rich Text Selection
Optimal coverage: all speech units of the language should be covered in the database.
Huge text corpus -> phonetic corpus.
Generate the superset of all phones (phonemes), in all the contexts.
Search the corpus for the minimum number of sentences => greedy algorithm.
Add words or sentences to complete.
Bilingual, if necessary.

Text Selection Algorithm
Greedy algorithm is used.
The sentence covering the highest number of units is first selected and taken out of the corpus.
The count (required minimum number) of covered units is reduced accordingly.
The next best covering sentence is selected and removed from the corpus.
Word beginning/ending & sentence beginning/ending contexts are used.
A linguist created words to cover the rest.
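
The greedy selection described above can be sketched roughly as follows. This is only an illustration, not the lab's implementation: the helper `units_of` is hypothetical and uses character bigrams with a `#` word-boundary marker as a stand-in for phones in context, and `min_count` plays the role of the required minimum number of occurrences per unit.

```python
from collections import Counter


def units_of(sentence):
    """Hypothetical unit extractor: character bigrams stand in for phones in context."""
    units = []
    for word in sentence.lower().split():
        padded = "#" + word + "#"  # '#' marks the word beginning/ending context
        units.extend(padded[i:i + 2] for i in range(len(padded) - 1))
    return units


def greedy_select(corpus, min_count=1):
    """Repeatedly pick the sentence that covers the most still-needed units."""
    sentence_units = [Counter(units_of(s)) for s in corpus]
    remaining = Counter()
    for counts in sentence_units:
        for unit in counts:
            remaining[unit] = min_count  # every unit is needed min_count times

    def gain(i):
        # Number of still-needed unit occurrences that sentence i would supply.
        return sum(min(c, remaining[u]) for u, c in sentence_units[i].items())

    selected = []
    available = set(range(len(corpus)))
    while available:
        best = max(sorted(available), key=gain)
        if gain(best) == 0:
            break  # nothing left to gain; a linguist would fill any gaps by hand
        selected.append(corpus[best])
        available.remove(best)
        for unit, c in sentence_units[best].items():
            remaining[unit] = max(0, remaining[unit] - c)
    return selected


if __name__ == "__main__":
    print(greedy_select(["ab", "ab cd", "cd", "ef"]))
```

With the toy corpus in the usage lines, the longest sentence is chosen first because it covers the most units, and sentences that add nothing new are never selected.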

Recording a Good Voice

Selection of the speaker is half the job!
Normal speed, declarative style.
Familiarity with the text.
Re-recording when necessary.
Fatigue of the speaker to be handled.
Availability of the speaker for future additions (what we added: isolated characters, etc.).
Mispronunciations: text correction.
Missing units.
Spoken language issues.

Speech Database Recording

Jayam Kondan, ex-AIR News Reader: Tamil; also, a complete Tamil drama, with and without emotions.
Later, English and isolated alphabets recorded.
Professional studio recording.
Manual listening and re-recording.

Segmentation & Annotation

Labour intensive, needs trained people.
No completely automated technique yet.
Levels of segmentation: phone, diphone, polyphone, demisyllable, syllable.
Annotation: silence, pause, phrase break, matching text to spoken phones.
Automating it: still a research issue.
Database organisation for unit selection.

Motivation for subspace-based segmentation
[Figure: segmentation using the energy-based method. (a) Speech signal /aka/ with segments /a/, /k/, /a/. (b) Speech signal /eyo/ with segments /e/, /y/, /o/ and the actual consonant (/y/) position marked. Horizontal axis: time (sec).]

Text Processing
Tokenization: grouping text into words, sentences, utterances; handle abbreviations, initials, etc.
Text normalization: converting non-standard words (numerals, abbreviations, acronyms, punctuation marks, etc.) into standard words.
Identify proper names and foreign words.
Tags for upper-case letters, etc.

Text Normalization
The "text normalization" component converts any non-text input into a series of appropriate spoken words.
Example: "Price of land is Rs. 2,66,60,000. For more information, call 26660000."
The first one is /irandu kodiye arubattu aaru latchattu arubadaayiram/.
The second one is /irandu aaru aaru aaru suzhi suzhi suzhi suzhi/.
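
The contrast in this example (the same digits read as an amount in one place and digit by digit in another) could be sketched as below. This is a simplified illustration under stated assumptions: the function names are hypothetical, the context test (a preceding "Rs.") is a deliberately naive rule, and the output uses English group names (crore, lakh, thousand) rather than Tamil words, which would need a proper number lexicon.

```python
def indian_groups(n):
    """Split a non-negative integer into (crore, lakh, thousand, rest)."""
    crore, n = divmod(n, 10_000_000)
    lakh, n = divmod(n, 100_000)
    thousand, rest = divmod(n, 1_000)
    return crore, lakh, thousand, rest


def read_as_amount(token):
    """Read a numeral as a quantity, using Indian digit grouping."""
    n = int(token.replace(",", ""))
    names = ("crore", "lakh", "thousand", "")
    parts = [f"{v} {name}".strip() for v, name in zip(indian_groups(n), names) if v]
    return " ".join(parts) if parts else "0"


def read_as_digits(token):
    """Read a numeral digit by digit, as for a telephone number."""
    return " ".join(ch for ch in token if ch.isdigit())


def normalize_number(token, previous_word):
    """Naive context rule (an assumption): amounts follow 'Rs.', everything else is spelt out."""
    if previous_word == "Rs.":
        return read_as_amount(token)
    return read_as_digits(token)


if __name__ == "__main__":
    print(normalize_number("2,66,60,000", "Rs."))  # 2 crore 66 lakh 60 thousand
    print(normalize_number("26660000", "call"))    # 2 6 6 6 0 0 0 0
```

A real normalizer would map each group and digit to the corresponding words of the target language and apply its sandhi rules, as in the Tamil readings quoted above.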


Text Normalization contd..
1) Isolates words in the text.
2) Integers, floating point numbers, ranges, ratios, alphanumeric strings, times, dates and other symbolic representations are converted into words. We need to code the rules for the conversion of these symbols into words, since they differ depending upon the language and context.
3) Abbreviations are converted into words. The normalizer uses a database of abbreviations and what they are expanded to.
4) Acronyms that have sufficient vowels are pronounced as words, e.g. /manumozhi/.

TN (contd..)
5) The normalizer will have rules dictating if the punctuation causes a word to be spoken or if it is silent. E.g.: periods at the end of sentences are not normally spoken, but a period in an Internet address is spoken as "dot". In S/W for the visually challenged, periods are also voiced.
Once the text has been normalized and simplified into a series of words, it is passed on to the next module, namely the grapheme-to-phoneme converter.


Grapheme-to-Phoneme Conversion

Letter-to-sound and contextual rules.
Uses a look-up table (lexicon) for foreign words, foreign names, etc.
Intervocalic k, T, t, p become g, D, d and b.
k, ch, T, t, p after homorganic nasals become g, j, D, d, b.
E.g.: pattam = pattam; patam = padam; pantam = pandam; manchaL = manjaL. => rule-based G2P.
English: e.g. put, but, use, utter => lexicon for the whole vocabulary.
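
As a rough sketch, the two voicing rules above can be applied to romanized Tamil words as follows. The code is illustrative only: it assumes the romanization used in the examples (lower-case vowels, capital letters for retroflex consonants, `ch` as a single symbol), and it treats any of `n`, `m`, `N` as a nasal without actually checking that the nasal is homorganic with the following stop.

```python
VOWELS = set("aeiou")
NASALS = {"n", "m", "N"}  # assumption: homorganicity is not verified here
INTERVOCALIC = {"k": "g", "T": "D", "t": "d", "p": "b"}
POST_NASAL = {"k": "g", "ch": "j", "T": "D", "t": "d", "p": "b"}


def tokenize(word):
    """Split a romanized word into symbols, keeping 'ch' together."""
    tokens, i = [], 0
    while i < len(word):
        if word.startswith("ch", i):
            tokens.append("ch")
            i += 2
        else:
            tokens.append(word[i])
            i += 1
    return tokens


def g2p(word):
    """Apply the post-nasal and intervocalic voicing rules to one word."""
    src = tokenize(word)
    out = []
    for i, tok in enumerate(src):
        prev = src[i - 1] if i > 0 else None
        nxt = src[i + 1] if i + 1 < len(src) else None
        if prev in NASALS and tok in POST_NASAL:
            out.append(POST_NASAL[tok])
        elif prev in VOWELS and nxt in VOWELS and tok in INTERVOCALIC:
            out.append(INTERVOCALIC[tok])
        else:
            out.append(tok)  # geminates and word-initial stops stay unvoiced
    return "".join(out)


if __name__ == "__main__":
    for w in ("pattam", "patam", "pantam", "manchaL"):
        print(w, "->", g2p(w))
```

Words that do not follow these rules (foreign words, names) would be handled by the lexicon look-up rather than by adding exceptions to the rule set.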

Prosody Prediction
The term prosody refers to certain properties of the speech signal, which are related to audible changes in pitch, loudness, syllable length and intonation.
E.g. different /m/s have very different meanings.
Prosodic features create a segmentation of the speech chain into groups of syllables (one syllable is stressed, etc.).
They give rise to the grouping of words into larger chunks: syntactic and phonological phrases.

Pitch contour of a Yes-No question

/Vidiyai madiyaal vella mudiyumaa/?

Pitch contour of an Affirmative sentence

/Akash nalla paiyan./

Pitch contour of an Exclamatory sentence

/avan ingu vandaanaa/?

/avan nejammaa varaan/

Prosody Modeling

Precise duration of each phoneme/syllable and of silences, as well as the intonation to apply on them, need to be obtained from the model.
The above step requires formalizing a lot of phonetic or phonological knowledge, automatically acquired from data with statistical (machine learning) methods.

Measurable Prosody parameters
Stress (syllable measure): F0 peak; positive and negative F0 slope values; positive and negative RMS energy peak.
Duration: segment durations.
Tone: sentence F0 contour (individual segments).

Prosody Models for Indian languages
Present scenario:
No computational linguistic knowledge or model is available for developing any Indian language TTS.
Good quality TTS are not available for South Indian languages.
Commercial Hindi TTS is also of poor quality.
Little or no research has been conducted on prosody modeling for any Indian language.

Prosody: interrogation
[Figure: plain vs. intonated interrogative sentences; pitch (Hz) against time (sec). Pitch contours of a plainly uttered and an intonated interrogative sentence (note the y-axis start value).]

Prosody: exclamation
[Figure: pitch (Hz) against time (sec). Pitch contours of a plainly uttered and an intonated exclamatory sentence.]

Increase in energy & duration of intonated over plain uttered sentences

Type of sentence   Range of increase in   Range of increase in   Average increase in   Average increase in
                   mean energy (%)        basal energy (%)       mean energy (%)       basal energy (%)
Interrogative      10 - 28                5 - 13                 17.5                  13.6
Exclamatory        5 - 35                 5 - 35                 16.3                  14.6

Type of sentence   Average increase in total duration (%)
Interrogative      19.7
Exclamatory        16.4

High speaking rate: reduced vowels.

Unit Selection
Target unit: a unit with desired (predicted) acoustic, spectral and contextual features. Selecting the best unit of the required type from the database.
Target cost: between the target unit and database units.
Join/concatenation cost: between the candidate unit and its predecessor and successor units in concatenation.
Requires efficient organisation of the database and precomputing of features.

Unit Selection algorithm: Total cost
Total cost is calculated for each segment unit in each candidate cluster for the speech to be synthesized.

Unit Selection Technique

[Figure: Viterbi search path through the best sequence of candidate units (thick line).]
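
A Viterbi search over candidate units, minimizing the sum of target and join costs, might look like the sketch below. It is a generic dynamic-programming illustration, not the system's actual code: the unit representation (a name and a pitch value) and both cost functions in the usage example are hypothetical placeholders for the acoustic, spectral and contextual features mentioned earlier.

```python
def select_units(targets, candidates, target_cost, join_cost):
    """Return (best unit sequence, total cost); candidates[t] lists the units for targets[t]."""
    # best[t][j] = (cost of the cheapest path ending in candidates[t][j], back-pointer)
    best = [[(target_cost(targets[0], c), None) for c in candidates[0]]]
    for t in range(1, len(targets)):
        row = []
        for c in candidates[t]:
            prev_costs = [best[t - 1][i][0] + join_cost(p, c)
                          for i, p in enumerate(candidates[t - 1])]
            i_best = min(range(len(prev_costs)), key=prev_costs.__getitem__)
            row.append((prev_costs[i_best] + target_cost(targets[t], c), i_best))
        best.append(row)

    # Backtrack from the cheapest final unit.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for t in range(len(targets) - 1, -1, -1):
        path.append(candidates[t][j])
        j = best[t][j][1]
    path.reverse()
    return path, total


if __name__ == "__main__":
    # Hypothetical units: (name, pitch in Hz); targets are desired pitch values.
    targets = [100, 100]
    candidates = [[("a1", 100), ("a2", 110)], [("b1", 130), ("b2", 115)]]
    target_cost = lambda target, unit: abs(target - unit[1])
    join_cost = lambda prev, cur: 2 * abs(prev[1] - cur[1])
    print(select_units(targets, candidates, target_cost, join_cost))
```

In the usage example the unit `a2` is chosen over `a1` even though `a1` matches its target exactly, because `a2` joins more smoothly with the best following unit; this trade-off is the point of combining the two costs.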

DCT-based pitch-synchronous pitch modification in the source domain

Pitch is modified in the source domain.
Linear prediction analysis of the pitch frame.
Inverse filter to get the residual (vocal cord signal, containing the pitch information).
Obtain the DCT of the LP residual vector.
Pad zeros / truncate to modify the length of the vector (and therefore the pitch period).
Energy normalization.
IDCT gives the pitch-modified residual.
Forward filter, using the same LP coefficients.
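
The steps listed above can be sketched with NumPy as follows. This is a minimal illustration under several assumptions, not the published algorithm in full: LP analysis uses the plain autocorrelation method with a small diagonal loading, the DCT is the orthonormal DCT-II built as a matrix, the gain G is taken as 1, and the energy normalization is taken to be a scale factor of sqrt(N2/N1) so that the energy per sample of the residual is roughly preserved. All function names are hypothetical.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix; its transpose is the inverse transform."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def lp_coefficients(frame, order):
    """Autocorrelation-method LP analysis; returns A(z) = [1, -a1, ..., -ap]."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + 1e-9 * r[0] * np.eye(order)  # light diagonal loading for safety
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))


def resize_residual(residual, n2):
    """DCT, pad zeros or truncate to n2 coefficients, normalize, inverse DCT."""
    n1 = len(residual)
    coeffs = dct_matrix(n1) @ residual
    resized = np.zeros(n2)
    keep = min(n1, n2)
    resized[:keep] = coeffs[:keep]
    resized *= np.sqrt(n2 / n1)  # assumed form of the energy normalization
    return dct_matrix(n2).T @ resized


def all_pole_filter(excitation, A):
    """Forward (synthesis) filter 1/A(z) with zero initial state."""
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, len(A)):
            if n - k >= 0:
                acc -= A[k] * out[n - k]
        out[n] = acc
    return out


def modify_pitch_period(frame, n2, order=4):
    """Change the length of one pitch-synchronous frame from len(frame) to n2 samples."""
    frame = np.asarray(frame, dtype=float)
    A = lp_coefficients(frame, order)
    residual = np.convolve(frame, A)[:len(frame)]  # inverse filter A(z)
    return all_pole_filter(resize_residual(residual, n2), A)
```

Calling `modify_pitch_period(frame, n2)` with `n2` smaller than the frame length shortens the pitch period (raising the pitch), and a larger `n2` lengthens it; with `n2` equal to the frame length the frame is returned unchanged up to rounding error.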

Pitch Modification in the Source Domain using DCT
[Block schematic of the pitch-synchronous pitch modification system: pitch-synchronous speech frame -> A(z) -> LP residue -> N1-point DCT -> pad zeros / truncate -> normalization N1/N2 -> N2-point IDCT -> modified residue -> G/A(z) -> pitch-modified frame.]

Pitch Modification Results
[Figure: the original speech and pitch-modified versions of the original speech with modification factors 0.6, 0.8, 1.1, 1.3 and 1.5.]

Pitch Modification Results (sample sounds)
[Table of sound samples: pitch modification factors 0.5, 0.7, 0.8, 1.2, 1.5, 1.8 and 2.0, each in Direct LPC, Modified LPC and Modified WLPC versions, together with the original.]

Simple emotion synthesis
[Block schematic of emotional speech synthesis. Blocks: emotional signal, pitch marking, max pitch period N1, reference signal, LP analysis, DCT, instantaneous pitch, correction factor N1/N2, IDCT, pitch-synchronous LP coefficients, forward filtering G/A(z), emotional speech.]

Demo of emotional synthesis and pitch modification

[Sound samples: original, synthesized.]

Time-varying pitch modification

[Figure: pitch contour against time (s). Synthesized interrogative sentence using time-varying pitch modification.]

TD-PSOLA

Analysis: the waveform is decomposed into a sequence of overlapping fragments of speech.
Synthesis: fragments of speech are recombined as desired, with pitch and time-scale modification.
Necessitates preliminary pitch marking.

Pitch modification factors required to convert:
male to female (1.4 - 1.6)
male to child (1.7 - 2.0)
male to old man's voice (0.5 - 0.7)
Making the fundamental frequency higher by 30% and shifting the formants up by 25% converts a male voice to a female voice.

Duration Modification

Change of pitch implies change of duration.
Change of speaking rate necessitates it too.
A stressed syllable has increased duration.
A syllable embedded in a long word has less duration.
Done by adding or removing whole pitch periods.
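
Adding or removing whole pitch periods can be sketched as below. The function is a hypothetical illustration: it assumes the pitch marks are sample indices delimiting consecutive pitch periods of a voiced stretch, and it simply repeats or skips periods without any smoothing at the joins.

```python
def change_duration(signal, pitch_marks, factor):
    """Repeat or drop whole pitch periods so the marked stretch lasts about factor times as long."""
    periods = [signal[a:b] for a, b in zip(pitch_marks[:-1], pitch_marks[1:])]
    n_out = max(1, round(len(periods) * factor))
    out = list(signal[:pitch_marks[0]])      # samples before the first mark are kept
    for j in range(n_out):
        # Map output period j back to an input period.
        i = min(len(periods) - 1, int(j / factor))
        out.extend(periods[i])
    out.extend(signal[pitch_marks[-1]:])     # samples after the last mark are kept
    return out


if __name__ == "__main__":
    signal = list(range(16))
    marks = [0, 4, 8, 12, 16]
    print(change_duration(signal, marks, 0.5))  # keeps periods 0 and 2
    print(len(change_duration(signal, marks, 2.0)))
```

Because only whole periods are copied or dropped, the pitch of the voiced stretch is left unchanged while its length changes; this is why duration modification is restricted to the vowel parts of the units.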

Importance of the Durational Information

[Figure: variation in the duration of the word /SENDRAAN/ in different circumstances.]

Optimal Point of Concatenation
Concatenate at points of minimal energy.
Matching the spectral or acoustic characteristics.
Interpolation of linear prediction coefficients.
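
Finding a point of minimal energy inside a search region can be sketched as follows. This is an assumption-laden illustration: the short-time energy is computed over a centred rectangular window of `window` samples, and the function name and the search interval are hypothetical. Spectral matching and LP coefficient interpolation are not shown.

```python
import numpy as np


def min_energy_point(signal, start, stop, window=5):
    """Index in [start, stop) where the short-time energy of the signal is smallest."""
    squared = np.asarray(signal, dtype=float) ** 2
    # Energy in a centred window around every sample.
    energy = np.convolve(squared, np.ones(window), mode="same")
    return start + int(np.argmin(energy[start:stop]))


if __name__ == "__main__":
    x = np.ones(50)
    x[20:27] = 0.0  # a low-energy gap, e.g. a stop closure
    print(min_energy_point(x, 10, 40))
```

Joining two units at such points keeps any discontinuity where the waveform amplitude is smallest, so it is least audible.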

Lexicons

Proper names: places, people, roads, etc.
Compound words.
Function words.
Verb roots and common nouns.

Spoken Language Issues

Multiple languages mixed.
Distortions in spoken words necessitate units beyond the phonotactic constraints.
Foreign words require new phones.

Research in Computational Linguistics
Systematically studying spoken Tamil and documenting all the additional phones used.
Identifying and parsing lexical and phonological phrases, for inserting pauses: place & duration.
Predicting the emotional content from analyzing the text, for emotional speech synthesis.
Duration modification of different classes of phones with changes in speaking rate, for the hearing impaired.
Translation of technical terms in different fields to Tamil, in collaboration with field experts.
Studying prosody in Tamil: pitch, duration and amplitude contours in different types of sentences.

Research in Technology
We need precision automated segmentation.
Machine translation using machine learning methods.
Analyzing phonotactic exceptions from a real-life spoken Tamil database.


Corpus Building
Text normalization: foreign words, spoken Tamil.
Studying and modeling natural variations in duration of syllables, pauses, intonation, energy contour, etc.
G2P for spoken Tamizh: different dialects.
Parallel corpus for machine translation.
Error-free huge text corpus covering all fields.
Segmented and annotated speech corpus for automated speech recognition and limited-domain applications.


Need for Segmentation in Concatenative Speech Synthesis

It is a knowledge-driven speech synthesis system.
Basic units, when concatenated, need to match the predicted duration of the word.
Basic units: V, VC, CV, VCV, VCCV and VCCCV.
Duration modification can only be performed on the vowel parts of the basic units.
We need automated segmentation.
Manual segmentation: inconsistent & tedious.

[Figure: synthesized word /kamala/ and the basic units /ka/, /ama/ and /ala/ from which it is concatenated; waveforms against time (sec).]

Subspace-based segmentation of consonants & vowels

Plosives, nasals, affricates and fricatives have a common property of low energy compared to vowels, whereas glides have comparable energy.
Hence, energy-based segmentation is ineffective.
Test feature vectors are projected on the vowel and consonant subspaces.
V & C subspaces are represented by generalized eigenvectors obtained from the feature vectors of the training set.

Fisher's Discriminant

Projection Plane 1:
Proper projection. Leads to perfect classification.

Projection Plane 2:
Projection involves overlap. Leads to improper classification.

Motivation for subspace-based segmentation
[Figure: segmentation using the energy-based method. (a) Speech signal /aka/ with segments /a/, /k/, /a/. (b) Speech signal /eyo/ with segments /e/, /y/, /o/ and the actual consonant (/y/) position marked. Horizontal axis: time (sec).]

Accurate CV segmentation is obtained using the energy-based method for non-co-articulated basic units, but not for co-articulated phones.
In our approach:
We collect an ensemble of feature vectors of length N corresponding to different vowels and obtain the vowel covariance matrix Cv.
Similarly for consonants, Cc.
Generalized eigenvectors corresponding to Cv & Cc are used.
Effective for co-articulated basic units.

Feature Transformation

Features like LPC, LPCC and MFCC model statistical properties of vowels and consonants.
From the point of view of basic unit segmentation:
VI: vowel information in feature vectors (signal).
CI: consonant information (noise).
Linear feature transformation: aims at finding a subspace of the feature space with maximum SNR.

Feature Transformation (contd..)
Representing VI and CI by training vectors obtained using manual segmentation.
The direction in the feature space with maximum SNR is obtained using GEV decomposition of Cv and Cc, the covariance matrices of feature vectors of V & C.
Linear transformation matrix W:

$$\hat{x} = W^{T} x, \qquad \dim(x) = n, \quad \dim(\hat{x}) = m, \quad m < n$$

Feature Transformation (contd..)

Let $d_v$ be training vectors containing VI and $d_c$ be training vectors containing CI.

$$C_v = E\{(d_v - \bar{d}_v)(d_v - \bar{d}_v)^{T}\}, \qquad C_c = E\{(d_c - \bar{d}_c)(d_c - \bar{d}_c)^{T}\}$$

Find W such that the ratio of the variance caused by VI to that caused by CI is maximized after transformation.

Feature Transformation (contd..)
Density functions of $d_v$ and $d_c$ are assumed to be normal.
Covariance matrices after transformation:

$$\hat{C}_v = W^{T} C_v W, \qquad \hat{C}_c = W^{T} C_c W$$

The measure of the variance or the scatter is the determinant of the covariance matrix.
The determinant is equal to the product of the eigenvalues and hence the product of the variances in the principal directions.

Feature Transformation (contd..)

Criterion function to be maximized:

$$J(W) = \frac{|\hat{C}_v|}{|\hat{C}_c|} = \frac{|W^{T} C_v W|}{|W^{T} C_c W|}$$

Columns of the optimal W are obtained as GEVV (generalized eigenvectors for vowels) corresponding to the value-ordered (largest to smallest) eigenvalues in

$$C_v \phi_i^{(v)} = \lambda_i C_c \phi_i^{(v)}$$

Feature Transformation (contd..)
Similarly for consonants, GEVC:

$$C_c \phi_i^{(c)} = \lambda_i C_v \phi_i^{(c)}$$

Thus, the transformation W diagonalizes both Cv & Cc.
The variance of VI along $\phi_i^{(v)}$ is $\lambda_i$, while CI has unit variance in all directions.
Using the SNR measure introduced in Malayath et al., for GEVV:

$$\mathrm{SNR} = \frac{\mathrm{trace}(W^{T} C_v W)}{\mathrm{trace}(W^{T} C_c W)} = \frac{\sum_{i=1}^{m} \lambda_i}{m}$$
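
The generalized eigenvector computation and the resulting SNR can be checked numerically with a short NumPy sketch. It is illustrative only: the function name is hypothetical, the covariance matrices in the check are estimated from random synthetic data rather than speech features, and the generalized problem is solved by Cholesky whitening of Cc, which is one standard route among several.

```python
import numpy as np


def generalized_eigenvectors(Cv, Cc):
    """Solve Cv phi = lam Cc phi; eigenvalues are returned largest first, eigenvectors as columns."""
    L = np.linalg.cholesky(Cc)             # Cc = L L^T
    L_inv = np.linalg.inv(L)
    M = L_inv @ Cv @ L_inv.T               # symmetric matrix with the same eigenvalues
    lam, U = np.linalg.eigh(M)
    order = np.argsort(lam)[::-1]
    lam, U = lam[order], U[:, order]
    W = L_inv.T @ U                        # scaled so that W^T Cc W = I
    return lam, W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for vowel and consonant feature vectors (assumption).
    d_v = rng.normal(size=(500, 4)) * np.array([3.0, 1.0, 1.0, 1.0])
    d_c = rng.normal(size=(500, 4)) * np.array([1.0, 1.0, 1.0, 2.0])
    Cv = np.cov(d_v, rowvar=False)
    Cc = np.cov(d_c, rowvar=False)

    lam, W = generalized_eigenvectors(Cv, Cc)
    m = 2
    Wm = W[:, :m]                          # keep the m directions with the largest eigenvalues
    snr = np.trace(Wm.T @ Cv @ Wm) / np.trace(Wm.T @ Cc @ Wm)
    print(snr, lam[:m].sum() / m)          # the two values agree
```

With this scaling the transformed consonant covariance is the identity and the transformed vowel covariance is diagonal with the eigenvalues on its diagonal, which is why the trace ratio reduces to the mean of the retained eigenvalues.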

Vowel-Consonant Segmentation

GEVVs and GEVCs are obtained.
Evaluating norm contours:

$$NC_v(k) = \sum_{i=1}^{L} \left\| (\phi_i^{(v)})^{T} x_k \right\|, \qquad NC_c(k) = \sum_{i=1}^{L} \left\| (\phi_i^{(c)})^{T} x_k \right\|$$

$NC_v$ and $NC_c$ are norm contours from the V and C subspaces.
These norm contours represent VI and CI.

Vowel-Consonant Segmentation

Norm contours cross each other: $NC_v(k) = NC_c(k)$ gives the segmentation points.
We found optimum results for L = 3.
Relative importance of the different frequency bands for vowels and consonants is conveyed by the first three principal filters.
VI: mid-frequency region of the speech spectrum.
CI: low & high frequency regions.
Speech & speaker information [Vijayakrishna].
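
Locating the crossings of the two norm contours can be sketched as below. The code rests on assumptions that should be checked against the original work: each contour value is taken as the sum of absolute projections of the frame's feature vector on the first L eigenvectors, and a segmentation point is reported at every frame where the larger of the two contours changes. The function names are hypothetical.

```python
import numpy as np


def norm_contour(frames, basis):
    """frames: (K, n) feature vectors x_k; basis: (n, L) eigenvectors. Sum over i of |phi_i^T x_k|."""
    return np.abs(np.asarray(frames) @ np.asarray(basis)).sum(axis=1)


def crossing_points(nc_v, nc_c):
    """Frame indices at which the vowel and consonant contours cross."""
    vowel_dominant = np.asarray(nc_v) > np.asarray(nc_c)
    return (np.flatnonzero(vowel_dominant[1:] != vowel_dominant[:-1]) + 1).tolist()


if __name__ == "__main__":
    nc_v = np.array([5.0, 5.0, 1.0, 1.0, 5.0, 5.0])
    nc_c = np.array([1.0, 1.0, 4.0, 4.0, 1.0, 1.0])
    print(crossing_points(nc_v, nc_c))  # [2, 4]
```

In the toy contours of the usage lines, the stretch between frames 2 and 4 is consonant-dominated, so a vowel-consonant-vowel unit would be split at those two frames.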

Performance of Vowel-Consonant Segmentation

GEVVs & GEVCs are obtained from a Tamil speech database spoken by a male volunteer.
For obtaining MFCC, the Mel scale was simulated using a set of 24 triangular filters.
For LPCC, a 12th-order LPC analysis was performed after pre-emphasis with a factor of 0.95.
Segmentation tests were carried out on the basic units of a Kannada speech database spoken by a female volunteer.

Performance of Vowel-Consonant Segmentation
[Figure: (a) speech signal /eyo/; (b) norm contours; spectrogram, frequency (Hz) against time (sec).]

Performance of Vowel-Consonant Segmentation
[Figure: (a) speech signal /iyo/ with segments /i/, /y/, /o/; (b) norm contours; spectrogram, frequency against time.]

Performance of Vowel-Consonant Segmentation
[Figure: (a) speech signal /aul1o/ with segments /au/, /l1/, /o/; (b) norm contours; spectrogram, frequency (Hz) against time (sec).]

Performance of Vowel-Consonant Segmentation
[Figure: (a) speech signal /auyi/ with segments /au/, /y/, /i/; (b) norm contours; spectrogram, frequency against time (sec).]

Publications
R. Muralishankar and A. G. Ramakrishnan, "Modification of Pitch using DCT in the Source Domain", Speech Communication, 2005.
R. Muralishankar and A. G. Ramakrishnan, "Discrete Cosine Transformed Cepstrum", International Journal of Speech Technology, 2002.
R. Muralishankar and A. G. Ramakrishnan, "DCT based pseudo complex cepstrum", Proc. ICASSP 2002, Orlando, Florida, May 13-17, 2002.
R. Murali Shankar and AGR, "DCT based Pitch Modification", Proc. SPCOM 01, IISc, Bangalore, July 15-18, 2001.
Jayavardhana Rama and AGR, "Thirukkural: a text-to-speech synthesis system", Tamil Internet 2001, Kuala Lumpur, Aug 26-28, 2001.
R. Muralishankar and AGR, "Naturalising the Tamil synthesizer", Tamil Internet 2001, Kuala Lumpur, August 26-28, 2001.

Publications contd.

K G Aparna, G L Jayavardhana Rama and A. G. Ramakrishnan, "Machine reading of Tamil books: an aid for the blind", Proc. International Conf. on Biomedical Engg., Bangalore, Dec. 21-24, 2001.
K. Suresh and AGR, "A DCT based Estimation of Pitch", Proc. Intern. Conf. Multimedia Proc. Systems, Chennai, Aug. 13-15, 2000.
R. Murali Shankar and AGR, "Robust Pitch detection using DCT based Spectral Autocorrelation", Proc. Intern. Conf. Multimedia Processing and Systems, Chennai, Aug. 13-15, 2000.
R. Murali Shankar and AGR, "Synthesis of Speech with Emotions", Proc. Intern. Conf. Commn. Comp. Devices, KGP, Dec. 14-16, 2000.

Acknowledgement

Ministry of Social Justice & Empowerment
Department of Information Technology, Government of India
Karnataka State Council for Science and Technology
Tamil Software Development Fund
LDCIL and CIIL
THE DISTINGUISHED AUDIENCE

Block diagram of TTS

Input/Output of Thirukkural/Vaachaka

Input
Text input through multiple keyboards.
Printed text that has undergone optical character recognition.
Existing Unicode files or text from websites.
Output
Intelligible, natural Tamil/Kannada speech.

Text analysis
Offline
Recording basic units.
Observation of duration in natural speech.
CIIL book on phone duration.
Online
Parsing.
Grapheme-to-phoneme conversion.
Applying duration rules.

Speech Signal Processing

Consonant-vowel segmentation.
Pitch detection and marking.
Concatenation (pitch, amplitude and duration modification).

Consonant-Vowel Segmentation

Energy-based segmentation:
Fails for co-articulated CV such as /yi/.

LP cepstrum based segmentation.

Pitch Detection and Marking

Pitch detection:
DCT-based spectral autocorrelation.

Pitch marking:
Marked at zero crossings.

Features of the software

Acceptable quality male voice.
Text input using any keyboard interface or an existing Unicode file.
Displays text in Tamil/Kannada.
Acceptably natural and intelligible.

Scope for further work

Comprehensive: not missing consonant clusters.
Natural prosody.
Simulating different characteristics of the speaker.
Emotions could be added.
Provision for alien words, English.
