Philosophy of MILE
Research relevant to people and life around us.
Do not download research topics, data or code!
Commitment to deliver something useful by itself throws up meaningful research issues.
Having chosen to work on an applied area, we deal with everything that is required to reach the goal.
All the data we use have been collected by us: India has a huge population and so there is no dearth for creation of standard databases.
Research is learning; and learning is fun & its own reward!
MILE Lab, IISc, Bangalore
1/29/2012
What we created: Vision 2010
Indic Language Reading Machines for People with Visual Disability (PWVD).
Automated Book Reader (ABR) for Indic scripts.
Any printed material in Indian languages becomes accessible: document analysis & recognition.
Text to speech (TTS) conversion.
Need to deal with bilingual & trilingual text: script recognition at the word level.
Posters, road signs, menu cards, notice boards: camera based document analysis & recognition.
Coloured text printed on complex background.
Online Handwriting Recognition (OHWR)
Where are we, today?
Using our Tamil OCR, Worth Trust, Chennai has already digitized 200 Tamil books (>30,000 pages) and the Braille books are already being used by around 100 PWVD.
Invention Labs, Chennai will bring out Tamil and Kannada versions of Avaz using our TTS by July 2012. (Used by children with cerebral palsy; award from President.)
TTS with SAPI to be used by National Association for Blind.
Clinical trials planned with St. John's Hospital, Bangalore:
OHWR Speech
Acknowledgment for our TTS
J V Rama, Partha, R Muralishankar, Ranjani H G, Shiva Kumar H R, Vikram L R
Writing to Speech Device
Laryngectomy
Working with St. John's Hospital & Medical College to test on Persons with Vocal Disability
Patent Pending
Other Research at MILE
Brain Maps at Medicaid Systems
Heart Rate Variability (Biological Cybernetics)
Fetal lung maturity from ultrasound & 3D MRI Comp.
Field Extraction from Document Images
Vitiligo Quantification
CURRENT
Machine Listening
Multilingual Recognition.
Assessment of Diabetic Retinopathy
FUTURE
Early detection of retinal diseases
Tamil Kannada Machine Translation
Applications of TTS
Natural language interface for computers
Digital personal assistant with translation
E-mail reader in local language
Interaction tool for physicians
Automatic telephone based enquiry system
Virtual teacher
Automatic document reading machines
Internet News Channels
Accessibility and reading aids for the blind
Communication aid for cerebral palsy children
Aid for persons with laryngectomy
Method Employed
Waveform Concatenation Based
Good database of over 1100 separately spoken, phonetically rich sentences, segmented as phones.
Phonetic equivalent of text (way it is pronounced)
Carefully selected segments from recorded speech
Signal processing for smooth concatenation.
Signal processing for naturalness.
Special provisions.

Issues to be addressed
Phonetically Rich Text Selection
Recording from a good speaker
Segmentation & Annotation
Text Normalization
G2P Conversion and exceptions
Prosody Prediction
Unit Selection
Pitch Modification
Duration Modification
Determining the Point of Concatenation
Spoken Language vs. Written Language
Indian language, along with English words
Rich Text Selection
Optimal coverage: all speech units of the language should be covered in the database.
Huge text corpus; phonetic corpus.
Generate the superset of all phones (phonemes), in all the contexts.
Search the corpus for the minimum number of sentences => greedy algorithm.
Add words or sentences to complete.
Bilingual, if necessary.

Text Selection Algorithm
Greedy algorithm is used.
Sentence covering the highest number of units is first selected and taken out of the corpus.
Count (required minimum number) of covered units reduced accordingly.
Next best covering sentence is selected and removed from the corpus.
Word beginning/ending & sentence beginning/ending contexts are used.
A linguist created words to cover the rest.
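
The greedy selection described above can be sketched as follows. This is only an illustration: `units_of` is a hypothetical stand-in that uses character pairs with `#` as a word-boundary marker instead of real phones in context, and `greedy_select`, `inventory` and `required` are names chosen for this sketch, not taken from the slides.

```python
from collections import Counter

def units_of(sentence):
    """Hypothetical unit extractor: character pairs per word, '#' marks word begin/end."""
    units = set()
    for word in sentence.lower().split():
        padded = "#" + word + "#"
        for i in range(len(padded) - 1):
            units.add(padded[i:i + 2])
    return units

def greedy_select(corpus, inventory=None, required=1):
    """Pick sentences greedily until every unit is covered `required` times.

    Returns the chosen sentences and the units the corpus could not cover
    (those would need words created by hand).
    """
    unit_sets = {s: units_of(s) for s in corpus}
    if inventory is None:
        inventory = set().union(*unit_sets.values())
    need = Counter({u: required for u in inventory})  # remaining count per unit
    remaining = list(unit_sets)
    chosen = []

    def gain(sentence):
        # Number of still-needed units this sentence would cover.
        return sum(1 for u in unit_sets[sentence] if need[u] > 0)

    while remaining:
        best = max(remaining, key=gain)
        if gain(best) == 0:
            break  # nothing left in the corpus helps
        chosen.append(best)
        remaining.remove(best)  # take the sentence out of the corpus
        for u in unit_sets[best]:
            if need[u] > 0:
                need[u] -= 1  # reduce the required count of covered units
    uncovered = sorted(u for u, c in need.items() if c > 0)
    return chosen, uncovered
```

The `uncovered` list corresponds to the units for which a linguist would have to create additional words.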
Recording a Good Voice
Selection of speaker is half the job!
Normal speed, declarative style.
Familiarity with the text.
Re-recording when necessary.
Fatigue of the speaker to be handled.
Availability of speaker, for future additions (what we added: isolated characters, etc.)
Mispronunciations: text correction.
Missing units.
Spoken language issues.

Speech Database Recording
Jayam Kondan, ex-AIR News Reader (Tamil); also, a complete Tamil drama, with and without emotions.
Later English and isolated alphabets recorded.
Professional studio recording.
Manual listening and re-recording.

Segmentation & Annotation
Labour intensive, needs trained people.
No completely automated technique yet.
Levels of segmentation: phone, diphone, polyphone, demisyllable, syllable.
Annotation: silence, pause, phrase break, matching text to spoken phones.
Automating it: still a research issue.
Database organisation for unit selection.
Motivation for subspace based segmentation
[Figure: segmentation using the energy based method. (a) Speech signal /aka/, segments /a/, /k/, /a/. (b) Speech signal /eyo/, segments /e/, /y/, /o/, with the actual consonant (/y/) position marked. x-axis: Time (sec).]
Text Processing
Tokenization: grouping text into words, sentences, utterances; handle abbreviations, initials, etc.
Text Normalization: converting non-standard words (numerals, abbreviations, acronyms, punctuation marks, etc.) into standard words.
Identify proper names and foreign words.
Tags for uppercase letters, etc.

Text Normalization
The "text normalization" component converts any non-text input into a series of appropriate spoken words.
Price of land is Rs. 2,66,60,000. For more information, call 26660000.
First one is /irandu kodiye arubattu aaru latchattu arubadaayiram/
Second one is /irandu aaru aaru aaru suzhi suzhi suzhi suzhi/
Text Normalization (contd.)
1) Isolates words in the text.
2) Integers, floating point numbers, range, ratio, alphanumeric strings, times, dates, and other symbolic representations are converted into words. We need to code the rules for the conversion of these symbols into words, since they differ depending upon the language and context.
3) Abbreviations are converted into words. The normalizer uses a database of abbreviations and what they are expanded to.
4) Acronyms have sufficient vowels to be pronounced. E.g. /manumozhi/
TN (contd.)
5) The normalizer will have rules dictating if the punctuation causes a word to be spoken or if it is silent. E.g.: periods at the end of sentences are not normally spoken, but a period in an Internet address is spoken as "dot". In S/W for the visually challenged, periods are also voiced.
Once the text has been normalized and simplified into a series of words, it is passed on to the next module, namely the grapheme to phoneme converter.
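
A minimal sketch of the numeral rules, using the land price and telephone number from the earlier example. The output is in English words rather than Tamil. The rule used to tell an amount from a telephone number (an `Rs.` prefix or comma grouping means an amount, a bare digit string is read digit by digit) is an assumption made for this illustration, as are all function names.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def two_digits(n):
    """Words for 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])

def below_thousand(n):
    """Words for 1..999."""
    hundreds, rest = divmod(n, 100)
    parts = []
    if hundreds:
        parts.append(ONES[hundreds] + " hundred")
    if rest or not parts:
        parts.append(two_digits(rest))
    return " ".join(parts)

def amount_in_words(n):
    """Indian grouping: crore (10^7), lakh (10^5), thousand, then the rest."""
    if n == 0:
        return "zero"
    crore, n = divmod(n, 10_000_000)
    lakh, n = divmod(n, 100_000)
    thousand, n = divmod(n, 1000)
    parts = []
    if crore:
        parts.append(amount_in_words(crore) + " crore")
    if lakh:
        parts.append(two_digits(lakh) + " lakh")
    if thousand:
        parts.append(two_digits(thousand) + " thousand")
    if n:
        parts.append(below_thousand(n))
    return " ".join(parts)

# Optional "Rs." prefix, then digits that may contain grouping commas.
TOKEN = re.compile(r"(Rs\.\s*)?(\d(?:[\d,]*\d)?)")

def normalize_numbers(text):
    def replace(match):
        prefix, number = match.group(1), match.group(2)
        value = int(number.replace(",", ""))
        if prefix:
            return "rupees " + amount_in_words(value)
        if "," in number:
            return amount_in_words(value)
        # Assumed rule: a bare digit string is read digit by digit.
        return " ".join(ONES[int(d)] for d in number)
    return TOKEN.sub(replace, text)
```

A real normalizer would need many more rules (dates, times, ratios, decimals) and would emit words of the target language.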
Prosody Prediction
The term prosody refers to certain properties of the speech signal, which are related to audible changes in pitch, loudness, syllable length, intonation.
E.g. different /m/s have very different meanings.
Prosodic features create a segmentation of the speech chain into groups of syllables (one syllable is stressed, etc.)
They give rise to the grouping of words into larger chunks: syntactic and phonological phrases.
Pitch contour of a Yes-No question
/Vidiyai madiyaal vella mudiyumaa/?
Pitch contour of an Affirmative sentence
Pitch contour of an Exclamatory sentence
/avan ingu vandaanaa/?
Prosody Modeling
Precise duration of each phoneme/syllable and of silences, as well as the intonation to apply on them, need to be obtained from the model.
The above step requires formalizing a lot of phonetic or phonological knowledge, automatically acquired from data with statistical (machine learning) methods.
Measurable Prosody parameters
Stress (syllable measure)
F0 peak
Positive and negative F0 slope values
Positive and negative RMS energy peak
Duration
Segment durations
Tone
Sentence F0 contour (individual segments)
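
As a rough illustration of how some of these parameters could be measured, the sketch below takes a per-frame F0 contour (0 meaning unvoiced) and the waveform samples of one segment. The function name, the dictionary keys and the choice of slope units (Hz per frame) are assumptions made for this example.

```python
import math

def prosody_parameters(f0, samples, frame_s):
    """Measure a few prosody parameters of one segment.

    f0      : F0 value per frame in Hz, 0 for unvoiced frames
    samples : waveform samples of the segment
    frame_s : frame step in seconds
    """
    voiced = [f for f in f0 if f > 0]
    # Frame-to-frame F0 change, only between two voiced frames.
    slopes = [b - a for a, b in zip(f0, f0[1:]) if a > 0 and b > 0]
    return {
        "f0_peak": max(voiced) if voiced else 0.0,
        "max_positive_slope": max((s for s in slopes if s > 0), default=0.0),
        "max_negative_slope": min((s for s in slopes if s < 0), default=0.0),
        "rms_energy": math.sqrt(sum(s * s for s in samples) / len(samples)),
        "duration_s": len(f0) * frame_s,
    }
```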
Prosody Models for Indian languages
Present scenario
No computational linguistic knowledge or model is available for developing any Indian language TTS.
Good quality TTS are not available for South Indian languages.
Commercial Hindi TTS is also of poor quality.
Little or no research has been conducted on prosody modeling for any Indian language.
Prosody: interrogation
[Figure: plain vs. intonated interrogative sentences; y-axis: pitch in Hz. Caption: pitch contour of plain uttered and intonated interrogative sentence (note y-axis start value).]
Prosody: exclamation
[Figure, panel title as on the slide: "plain vs. intonated interrogative sentences"; y-axis: pitch in Hz.]
Increase in energy & duration of intonated over plain uttered sentences
Type of sentence | Range of increase in mean energy (%) | Range of increase in basal energy (%) | Average increase in mean energy (%) | Average increase in basal energy (%)
Interrogative | 10 to 28 | 5 to 13 | 17.5 | 13.6
Exclamatory | 5 to 35 | 5 to 35 | 16.3 | 14.6
Unit Selection
Target unit: a unit with desired (predicted) acoustic, spectral and contextual features.
Selecting the best unit of the required type from the database.
Target cost: between the target unit and database units.
Join/concatenation cost: between the candidate unit and its predecessor and successor units in concatenation.
Requires efficient organisation of the database and precomputing of features.

Unit Selection algorithm: Total cost
Total cost is calculated for each segment unit in each candidate cluster for the speech to be synthesized.

Unit Selection Technique
Viterbi search: path through the best sequence of candidate units (thick line)
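
The Viterbi search over candidate units can be sketched as a small dynamic program. The unit representation (a tuple of name, start pitch, end pitch), the two cost functions and all names below are hypothetical. The slides do not say which features or weights the actual system uses.

```python
def select_units(targets, candidates, target_cost, join_cost):
    """Return the candidate sequence with the lowest total cost.

    targets    : one target specification per position
    candidates : for each position, a list of candidate units
    """
    # best[i][j] = (cheapest cumulative cost ending in candidate j, back-pointer)
    best = [[(target_cost(targets[0], c), None) for c in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for cand in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, cand), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(targets[i], cand), back))
        best.append(row)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total

def pitch_target_cost(target_pitch, unit):
    """Hypothetical target cost: distance of the unit's mean pitch from the target."""
    return abs((unit[1] + unit[2]) / 2 - target_pitch)

def pitch_join_cost(prev_unit, unit):
    """Hypothetical join cost: pitch jump at the concatenation point."""
    return abs(prev_unit[2] - unit[1])
```

In the toy case below, the units with zero target cost are not chosen, because joining them would cause a large pitch jump:

```python
targets = [100, 120]
candidates = [
    [("a1", 100, 100), ("a2", 104, 118)],
    [("b1", 120, 120), ("b2", 118, 124)],
]
path, total = select_units(targets, candidates, pitch_target_cost, pitch_join_cost)
print([u[0] for u in path], total)
```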
DCT based pitch synchronous pitch modification in the source domain
Pitch is modified in the source domain.
Linear prediction analysis of the pitch frame.
Inverse filter to get the residual (vocal cord signal, containing the pitch information).
Obtain DCT of the LP residual vector.
Pad zeros / truncate to modify the length of the vector (therefore, the pitch period).
Energy normalization.
IDCT gives pitch modified residual.
Forward filter, using the same LP coefficients.
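
The sketch below illustrates only the middle steps: DCT of one residual period, zero padding or truncation, and inverse DCT. LP analysis, inverse filtering and the forward filter are left out. The gain `sqrt(N2/N1)`, which keeps the sample amplitude unchanged, is an assumption. The slides only say "energy normalization". The plain O(N^2) transforms are written out to keep the example self-contained.

```python
import math

def dct(x):
    """Orthonormal DCT-II."""
    n = len(x)
    return [
        math.sqrt((1 if k == 0 else 2) / n)
        * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]

def idct(c):
    """Inverse of the orthonormal DCT-II above."""
    n = len(c)
    return [
        sum(
            math.sqrt((1 if k == 0 else 2) / n)
            * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(n)
        )
        for i in range(n)
    ]

def change_pitch_period(residual, n2):
    """Resize one pitch period of the LP residual from N1 = len(residual) to N2 samples."""
    n1 = len(residual)
    coeffs = dct(residual)
    if n2 >= n1:
        coeffs = coeffs + [0.0] * (n2 - n1)  # zero padding: longer period, lower pitch
    else:
        coeffs = coeffs[:n2]                 # truncation: shorter period, higher pitch
    gain = math.sqrt(n2 / n1)                # assumed amplitude-preserving normalization
    return [gain * v for v in idct(coeffs)]
```

For a pitch modification factor `f`, the new length would be roughly `round(N1 / f)`.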
Pitch Modification in the Source Domain using DCT
[Block diagram: pitch synchronous speech frame -> A(z) -> LP residue -> N1-point DCT -> normalization N1/N2 -> N2-point IDCT -> modified residue -> G/A(z). Caption: Block schematic of pitch synchronous pitch modification system.]
Pitch Modification Results
[Figure: six panels (a) to (f) showing the original speech and pitch modified versions of the original speech with different factors: 0.6, 0.8, 1.1, 1.3, 1.5.]

Pitch Modification Results (sample sounds)
[Table of sound samples; columns: Pitch Modification Factor, Direct LPC, Modified LPC, Modified WLPC.]
Simple emotion synthesis
[Block diagram: Emotional Speech Synthesis. Blocks: emotional signal, pitch marking, max pitch period N1, reference signal, LP analysis, DCT, instantaneous pitch, correction factor N1/N2, emotional speech.]
Demo of emotional synthesis and pitch modification
[Audio samples: original, synthesized]
Time varying pitch modification
[Figure: synthesized interrogative sentence using time varying pitch modification.]
TD-PSOLA
Pitch modification factors required to convert:
male to female (1.4 to 1.6)
male to child (1.7 to 2.0)
male to old man's voice (0.5 to 0.7)
Raising the fundamental frequency by 30% and shifting the formants up by 25% converts a male voice to a female voice.
Duration Modification
Change of pitch implies change of duration.
Change of speaking rate necessitates it too.
Stressed syllable has increased duration.
Syllable embedded in a long word has less duration.
Adding or removing whole pitch periods.
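
Adding or removing whole pitch periods can be sketched as below, assuming the pitch marks (sample indices where each period starts) are already known. The function name and the simple index mapping are illustrative choices, not the method used in the actual system.

```python
def change_duration(signal, pitch_marks, factor):
    """Repeat or drop whole pitch periods so the voiced part is about `factor` times as long.

    signal      : list of samples
    pitch_marks : increasing sample indices, one per pitch period boundary
    factor      : > 1 lengthens, < 1 shortens
    """
    periods = [signal[a:b] for a, b in zip(pitch_marks, pitch_marks[1:])]
    n_out = max(1, round(len(periods) * factor))
    out = list(signal[:pitch_marks[0]])          # samples before the first mark
    for j in range(n_out):
        # Output period j is copied from the input period at the same relative position.
        src = min(len(periods) - 1, int(j / factor))
        out.extend(periods[src])
    out.extend(signal[pitch_marks[-1]:])         # samples after the last mark
    return out
```

Because whole periods are copied, the local pitch is unchanged. In practice this would be applied to the vowel part of a unit only.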
Importance of the Durational Information

Optimal Point of Concatenation
Concatenate at points of minimal energy.
Matching the spectral or acoustic characteristics.
Interpolation of Linear Prediction coefficients.
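
Only the first of these ideas, joining at points of minimal energy, is sketched here. The window length, the search range and the function names are assumptions. Spectral matching and LP coefficient interpolation are not shown.

```python
def min_energy_point(signal, start, end, window=3):
    """Index in [start, end) whose surrounding window has the lowest mean energy."""
    half = window // 2
    best_idx, best_energy = start, float("inf")
    for i in range(start, end):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        energy = sum(v * v for v in signal[lo:hi]) / (hi - lo)
        if energy < best_energy:
            best_idx, best_energy = i, energy
    return best_idx

def concatenate(left, right, search=8, window=3):
    """Cut `left` near its end and `right` near its start at low-energy points, then join."""
    cut_left = min_energy_point(left, max(0, len(left) - search), len(left), window)
    cut_right = min_energy_point(right, 0, min(search, len(right)), window)
    return left[:cut_left] + right[cut_right:]
```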
Lexicons
Proper names: places, people, roads, etc.
Compound words
Function words
Verb roots and common nouns.

Spoken Language Issues

Research in Computational Linguistics
Systematically studying spoken Tamil and documenting all the additional phones used.
Identifying and parsing lexical and phonological phrases, for inserting pauses: place & duration.
Predicting the emotional content from analyzing the text, for emotional speech synthesis.
Duration modification of different classes of phones with changes in speaking rate, for the hearing impaired.
Translation of technical terms in different fields to Tamil, in collaboration with field experts.
Studying prosody in Tamil: pitch, duration and amplitude contours in different types of sentences.
Research in Technology
We need precise automated segmentation.
Machine translation using machine learning methods.
Analyzing phonotactic exceptions from a real life spoken Tamil database.
Corpus Building
Text normalization: foreign words, spoken Tamil.
Studying and modeling natural variations in duration of syllables, pauses, intonation, energy contour, etc.
G2P for spoken Tamizh: different dialects.
Parallel corpus for machine translation.
Error free huge text corpus covering all fields.
Segmented and annotated speech corpus for automated speech recognition and limited domain applications.
It is a knowledge driven speech synthesis system.
Basic units, when concatenated, need to match the predicted duration of the word.
Basic units: V, VC, CV, VCV, VCCV and VCCCV
Duration modification: can only be performed on the vowel parts of the basic units.
We need automated segmentation.
Manual segmentation: inconsistent & tedious.
[Figure: synthesized word /kamala/ and the basic units /ka/, /ama/ and /ala/; x-axis: Time (sec).]
Plosives, nasals, affricates and fricatives have a common property of low energy compared to vowels, whereas glides have comparable energy.
Hence, energy based segmentation is ineffective.
Test feature vectors projected on the Vowel and Consonant subspaces.
V & C subspaces are represented by generalized eigenvectors obtained from the feature vectors from the training set.

Fisher's Discriminant
Projection Plane 1: proper projection leads to perfect classification.
Projection Plane 2: projection involves overlap and leads to improper classification.
Motivation for subspace based segmentation
[Figure: segmentation using the energy based method. (a) Speech signal /aka/, segments /a/, /k/, /a/. (b) Speech signal /eyo/, segments /e/, /y/, /o/, with the actual consonant (/y/) position marked. x-axis: Time (sec).]
Accurate CV segmentation is obtained using the energy based method for non-coarticulated basic units, but not for coarticulated phones.
In our approach: we collect an ensemble of feature vectors of length N corresponding to different vowels and obtain the vowel covariance matrix Cv.
Similarly for consonants, Cc.
Generalized eigenvectors corresponding to Cv & Cc are used.
Effective for coarticulated basic units.
Feature Transformation
Representing VI and CI by training vectors obtained using manual segmentation.
Direction in the feature space with max SNR obtained using GEV decomposition of Cv and Cc, the covariance matrices of feature vectors of V & C.
Linear transformation matrix W.
Feature Transformation (contd.)
Density functions of $d_v$ and $d_c$ are assumed to be normally distributed.
Covariance matrices after transformation: $\hat{C}_v = W^T C_v W$, $\hat{C}_c = W^T C_c W$
Measure of the variance or the scatter is the determinant of the covariance matrix.
Determinant is equal to the product of the eigenvalues & hence the product of the variances in the principal directions.
$J(W) = \frac{|\hat{C}_v|}{|\hat{C}_c|} = \frac{|W^T C_v W|}{|W^T C_c W|}$
$C_v \phi_i^{(v)} = \lambda_i C_c \phi_i^{(v)}$
Similarly for consonants, GEVC: $C_c \phi_i^{(c)} = \lambda_i C_v \phi_i^{(c)}$
Thus, the transformation W diagonalizes both Cv & Cc.
The variance of VI along $\phi_i^{(v)}$ is $\lambda_i$ while CI has unit variance in all directions.
Using SNR measure introduced in Malayath et al. for GEVV.
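
The statement that W diagonalizes both covariance matrices can be checked in a few lines. Here $\phi_i$ and $\lambda_i$ are labels chosen for the generalized eigenvectors and eigenvalues of the pair $(C_v, C_c)$, $W$ is taken to hold all $N$ eigenvectors as columns, and the scaling $\phi_i^T C_c \phi_j = \delta_{ij}$ is an assumed normalization that the slide does not state.

```latex
\begin{aligned}
C_v \phi_i &= \lambda_i C_c \phi_i, \qquad W = [\phi_1, \dots, \phi_N] \\
\phi_i^T C_v \phi_j &= \lambda_j \, \phi_i^T C_c \phi_j = \lambda_j \delta_{ij} \\
W^T C_c W &= I, \qquad W^T C_v W = \Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_N) \\
J(W) &= \frac{|W^T C_v W|}{|W^T C_c W|} = \prod_{i=1}^{N} \lambda_i
\end{aligned}
```

Under this normalization, the consonant class has unit variance along every $\phi_i$ while the vowel class has variance $\lambda_i$, so the directions with the largest $\lambda_i$ separate the two classes best.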
Vowel Consonant Segmentation
$NC_v(k) = \sum_{i=1}^{L} \left\| (\phi_i^{(v)})^T x_k \right\|$, $NC_c(k) = \sum_{i=1}^{L} \left\| (\phi_i^{(c)})^T x_k \right\|$
NCv and NCc are norm contours from V and C subspaces.
These norm contours represent VI and CI.
Vowel Consonant Segmentation
Norm contours cross each other: $NC_v(k) = NC_c(k)$ gives the segmentation points.
We found optimum results for L = 3.
Relative importance of the different frequency bands for vowels and consonants is conveyed by the first three principal filters.
VI: mid frequency region of the speech spectrum.
CI: low & high frequency regions.
Speech & speaker information [Vijayakrishna]
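
A minimal sketch of segmentation by norm contour crossings. The eigenvectors are passed in as plain lists and are hypothetical. Taking the norm of each projection as its absolute value, summed over the L eigenvectors, is an assumption made here, as are the function names.

```python
def norm_contour(frames, eigvecs):
    """Norm contour: for each frame x_k, sum of |phi_i^T x_k| over the given eigenvectors."""
    return [
        sum(abs(sum(p * x for p, x in zip(phi, frame))) for phi in eigvecs)
        for frame in frames
    ]

def segmentation_points(nc_v, nc_c):
    """Frame indices where the vowel and consonant norm contours cross each other."""
    points = []
    for k in range(1, len(nc_v)):
        vowel_before = nc_v[k - 1] > nc_c[k - 1]
        vowel_now = nc_v[k] > nc_c[k]
        if vowel_before != vowel_now:
            points.append(k)
    return points
```

With toy two-dimensional features and one eigenvector per subspace:

```python
frames = [[3, 1], [2, 1], [1, 2], [0.5, 3], [2, 1]]
nc_v = norm_contour(frames, [[1, 0]])   # hypothetical vowel eigenvector
nc_c = norm_contour(frames, [[0, 1]])   # hypothetical consonant eigenvector
print(segmentation_points(nc_v, nc_c))
```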
Performance of Vowel Consonant Segmentation
GEVVs & GEVCs are obtained from a Tamil speech database spoken by a male volunteer.
For obtaining MFCC, the Mel scale was simulated using a set of 24 triangular filters.
For LPCC, a 12th order LPC analysis was performed after pre-emphasis with a factor of 0.95.
Segmentation tests were carried out on the basic units of a Kannada speech database spoken by a female volunteer.
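
Pre-emphasis is commonly implemented as a first order difference filter. The sketch below assumes that form, y[n] = x[n] - 0.95 x[n-1], since the slide gives only the factor.

```python
def pre_emphasize(x, coeff=0.95):
    """First order pre-emphasis: y[n] = x[n] - coeff * x[n-1], with y[0] = x[0]."""
    if not x:
        return []
    return [x[0]] + [x[n] - coeff * x[n - 1] for n in range(1, len(x))]
```

A constant input is almost removed while a rapidly alternating input is boosted, which is the intended high frequency emphasis before LPC analysis.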
Performance of Vowel Consonant Segmentation
[Figure: (a) speech signal /eyo/; (b) norm; frequency (Hz) vs. time (sec).]
Performance of Vowel Consonant Segmentation
[Figure: (a) speech signal with segments /i/, /y/, /o/; (b) norm; frequency vs. time.]
Performance of Vowel Consonant Segmentation
[Figure: (b) norm vs. time (sec); frequency (Hz) vs. time (sec).]
Performance of Vowel Consonant Segmentation
[Figure: (a) speech signal /auyi/ with segments /au/, /y/, /i/, vs. time (sec); norm; frequency vs. time.]
Publications
R. Muralishankar and A. G. Ramakrishnan, "Modification of Pitch using DCT in the Source Domain", Speech Communication, 2005.
R. Muralishankar and A. G. Ramakrishnan, "Discrete Cosine Transformed Cepstrum", International Journal of Speech Technology, 2002.
R. Muralishankar and A. G. Ramakrishnan, "DCT based pseudo complex cepstrum", Proc. ICASSP 2002, Orlando, Florida, May 13-17, 2002.
R. Muralishankar and AGR, "DCT based Pitch Modification", Proc. SPCOM 01, IISc, Bangalore, July 15-18, 2001.
Jayavardhana Rama and AGR, "Thirukkural: a text to speech synthesis system", Tamil Internet 2001, Kuala Lumpur, Aug 26-28, 2001.
R. Muralishankar and AGR, "Naturalising the Tamil synthesizer", Tamil Internet 2001, Kuala Lumpur, August 26-28, 2001.

Publications (contd.)
K. G. Aparna, G. L. Jayavardhana Rama and A. G. Ramakrishnan, "Machine reading of Tamil books: an aid for the blind", Proc. International Conf. on Biomedical Engg., Bangalore, Dec. 21-24, 2001.
K. Suresh and AGR, "A DCT based Estimation of Pitch", Proc. Intern. Conf. Multimedia Proc. Systems, Chennai, Aug. 13-15, 2000.
R. Muralishankar and AGR, "Robust Pitch detection using DCT based Spectral Autocorrelation", Proc. Intern. Conf. Multimedia Processing and Systems, Chennai, Aug. 13-15, 2000.
R. Muralishankar and AGR, "Synthesis of Speech with Emotions", Proc. Intern. Conf. Commn. Comp. Devices, KGP, Dec. 14-16, 2000.
Acknowledgement
Ministry of Social Justice & Empowerment
Department of Information Technology, Government of India.
Karnataka State Council for Science and Technology.
Tamil Software Development Fund
LDCIL and CIIL
THE DISTINGUISHED AUDIENCE
Block diagram of TTS

Input/Output of Thirukkural/Vaachaka
Input:
Text input through multiple keyboards
Printed text that has undergone optical character recognition
Existing Unicode files or from websites
Output:
Intelligible, natural Tamil/Kannada speech

Text analysis
Offline: recording basic units; observation of duration in natural speech; CIIL book on phone duration.
Online: parsing; grapheme to phoneme conversion; applying duration rules.
Speech Signal Processing
Consonant Vowel Segmentation
LP cepstrum based segmentation
Pitch Detection and Marking
Pitch detection
DCT based spectral autocorrelation
Pitch marking
Marked at zero crossings
Features of the software
Acceptable quality male voice
Text input using any keyboard interface or existing Unicode file
Displays text in Tamil/Kannada
Acceptably natural and intelligible.

Scope for further work
Comprehensive: not missing consonant clusters
Natural prosody
Simulating different characteristics of the speaker
Emotions could be added
Provision for alien words, English