
Speech recognition is an important field of study: mastering the technology would have a huge impact on the way humans live.

Today, human interaction with machines relies on devices such as the mouse and keyboard, which depend heavily on hand movements. Speech technology can change this norm by allowing interaction via speech, which is faster, easier, and more comfortable.

A speech recognition system able to recognize speech as well as humans do has yet to be achieved. Nevertheless, the technology has reached a level at which it can be applied to specific industrial applications such as flight information queries, car manufacturing, and postal package delivery requests [13]; software for simple applications such as word dictation is also available commercially.

Speech recognition by machine is a difficult task because it requires the machine to have deep knowledge of everyday human speech and conversational experience, as well as linguistic knowledge.

Artificial Intelligence (AI) offers one such approach. It combines the study of pattern recognition with the machine's ability to see, analyse, learn, and make decisions, imitating human abilities.

Human speech perception starts with receiving the speech waveform through the ears. The speech reaches the basilar membrane, situated in the inner ear, where the waveform is analysed and a spectral signal is produced. The spectral signal then enters a neural transducer, which converts it into neural activity on the auditory nerve. This neural activity is translated into a language code, and the message is sent to the brain for perception.

Figure 1 shows a schematic diagram of this process.

Figure 1. Human perception model

The speech recognition system built in this paper imitates the human perception model. Recognition by machine requires three levels of processing: acoustic processing, feature extraction, and recognition. The analogue speech is sampled, digitised, and filtered; these processes convert the signal into discrete form. The start and end points of the signal are then determined, so that only the portion carrying information is analysed. The next step is to extract features from the signal between the start and end points.
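Endpoint detection as described above is often done by thresholding short-time frame energy. The following sketch illustrates the idea; the frame length and threshold ratio are illustrative values, not the paper's actual parameters.

```python
import numpy as np

def detect_endpoints(signal, frame_len=160, threshold_ratio=0.1):
    """Find the start and end sample of the informative region by
    locating the first and last frames whose short-time energy
    exceeds a fraction of the peak frame energy (values illustrative)."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).sum(axis=1)        # short-time energy per frame
    threshold = threshold_ratio * energy.max()
    active = np.where(energy > threshold)[0]  # frames carrying information
    start = active[0] * frame_len
    end = (active[-1] + 1) * frame_len
    return start, end

# Synthetic example: silence, a burst of "speech", then silence.
rng = np.random.default_rng(0)
sig = np.concatenate([np.zeros(800),
                      rng.normal(0.0, 1.0, 1600),
                      np.zeros(800)])
start, end = detect_endpoints(sig)
```

Only the samples between `start` and `end` would then be passed on to feature extraction, trimming the leading and trailing silence.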


1. Introduction

Speech is a natural mode of communication for people. We learn all the relevant skills during early childhood, without instruction, and we continue to rely on speech communication throughout our lives. It comes so naturally to us that we don't realize how complex a phenomenon speech is. The human vocal tract and articulators are biological organs with nonlinear properties, whose operation is not just under conscious control but also affected by factors ranging from gender to upbringing to emotional state. As a result, vocalizations can vary widely in terms of their accent, pronunciation, articulation, roughness, nasality, pitch, volume, and speed; moreover, during transmission, our irregular speech patterns can be further distorted by background noise and echoes, as well as electrical characteristics (if telephones or other electronic equipment are used).

What makes people so good at recognizing speech? Intriguingly, the human brain is known to be wired differently than a conventional computer; in fact it operates under a radically different computational paradigm. While conventional computers use a very fast and complex central processor with explicit program instructions and locally addressable memory, the human brain uses a massively parallel collection of slow and simple processing elements (neurons), densely connected by weights (synapses) whose strengths are modified with experience, directly supporting the integration of multiple constraints and providing a distributed form of associative memory.

1.1. Speech Recognition

What is the current state of the art in speech recognition? This is a complex question, because a system's accuracy depends on the conditions under which it is evaluated: under sufficiently narrow conditions almost any system can attain human-like accuracy, but it is much harder to achieve good accuracy under general conditions. The conditions of evaluation, and hence the accuracy of any system, can vary along the following dimensions:

- Vocabulary size and confusability.
- Speaker dependence vs. independence.
- Isolated, discontinuous, or continuous speech.
- Task and language constraints.
- Read vs. spontaneous speech.
- Adverse conditions.

The central issue in speech recognition is dealing with variability. Currently, speech recognition systems distinguish between two kinds of variability: acoustic and temporal. Acoustic variability covers different accents, pronunciations, pitches, volumes, and so on, while temporal variability covers different speaking rates. These two dimensions are not completely independent (when a person speaks quickly, his acoustical patterns become distorted as well), but it is a useful simplification to treat them independently. Of these two dimensions, temporal variability is easier to handle, although secondary issues such as duration constraints and speaker-dependent speaking rates remain unresolved.
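A classical way to handle temporal variability (not named in the text above, but standard in the field) is dynamic time warping, which aligns two utterances that differ only in speaking rate. A minimal sketch over 1-D feature sequences:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature
    sequences, allowing them to differ in speaking rate."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # stretch sequence a
                                 D[i, j - 1],      # stretch sequence b
                                 D[i - 1, j - 1])  # match one-to-one
    return D[n, m]

# The same "word" spoken slowly and quickly aligns with zero cost.
slow = [1, 1, 2, 2, 3, 3]
fast = [1, 2, 3]
```

Here `dtw_distance(slow, fast)` is 0: the warping path stretches `fast` to cover the repeated frames of `slow`, so the rate difference costs nothing.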

1.2. Neural Networks

Connectionism, or the study of artificial neural networks, was initially inspired by neurobiology, but it has since become a very interdisciplinary field, spanning computer science, electrical engineering, mathematics, physics, psychology, and linguistics as well. Some researchers are still studying the neurophysiology of the human brain, but much attention is now being focused on the general properties of neural computation, using simplified neural models. These properties include:

- Trainability. Networks can be taught to form associations between any input and output patterns. This can be used, for example, to teach the network to classify speech patterns into phoneme categories.
- Generalization. Networks don't just memorize the training data; rather, they learn the underlying patterns, so they can generalize from the training data to new examples. This is essential in speech recognition, because acoustical patterns are never exactly the same.
- Nonlinearity. Networks can compute nonlinear, nonparametric functions of their input, enabling them to perform arbitrarily complex transformations of data. This is useful since speech is a highly nonlinear process.
- Robustness. Networks are tolerant of both physical damage and noisy data; in fact noisy data can help the networks to form better generalizations. This is a valuable feature, because speech patterns are notoriously noisy.
- Uniformity. Networks offer a uniform computational paradigm which can easily integrate constraints from different types of inputs. This makes it easy to use both basic and differential speech inputs, for example, or to combine acoustic and visual cues in a multimodal system.
- Parallelism. Networks are highly parallel in nature, so they are well-suited to implementations on massively parallel computers. This will ultimately permit very fast processing of speech or other data.
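The trainability property above can be illustrated with the simplest possible network, a single perceptron whose weight updates mimic synapse strengths being modified with experience. The task and all parameter values here are invented for illustration.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=0.1):
    """Train a single artificial neuron (perceptron) to associate
    input patterns with binary labels -- 'trainability' in its
    simplest form. Epochs and learning rate are illustrative."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            err = yi - pred          # zero when already classified correctly
            w += lr * err * xi       # weight update: synapse strengths
            b += lr * err            #   modified with experience
    return w, b

# Toy linearly separable task: is the mean of the two inputs above 0.5?
X = np.array([[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.7, 0.9]])
y = np.array([0, 1, 0, 1])
w, b = train_perceptron(X, y)
preds = [1 if x @ w + b > 0 else 0 for x in X]
```

After training, `preds` matches `y`: the network has formed the input-output association purely from examples, with no explicit program for the task.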

The concept of artificial neural networks is rooted in the recognition that although the human brain performs its functions about a million times more slowly than a digital computer, it is more efficient when performing complex tasks such as speech synthesis, visual information processing, and handwriting analysis. This is partially attributed to the fact that the human brain is a massively parallel structure of biological neurons. ANNs are physical cellular systems which can acquire, store, and utilize experiential knowledge. They have been applied to an increasing number of real-world problems of considerable complexity. Their most important advantage is in solving problems that are too complex for conventional technologies: problems that do not have an algorithmic solution, or for which an algorithmic solution is too complex to be found. In general, because of their abstraction from the biological brain, ANNs are well suited to problems that people are good at solving but computers are not, such as pattern recognition and forecasting.

Neural nets offer two potential advantages over existing approaches.

First, their use of many processors operating in parallel may provide the computational power required for continuous-speech recognition. Second, new neural net algorithms, which could self-organize and build an internal speech model that maximizes performance, would perform even better than existing algorithms. These new algorithms could mimic the type of learning used by a child who is mastering new words and phrases.

For example, speaker-dependent (SD) systems, which accept speech from specific speakers, are usually applied in security systems. On the other hand, speaker-independent (SI) recognisers are designed to recognise speech from different speakers, such as the speech-to-text engines in word processing programs that serve as a substitute for a keyboard. Broadly speaking, speech recognition systems are usually built upon three common approaches, namely the acoustic-phonetic approach, the pattern recognition approach, and the artificial intelligence approach.

Speech Signal Representation

A speech signal is usually classified into three states. The first state is silence, where no speech is produced. The second state is unvoiced, in which the vocal cords are not vibrating and the resulting signal is random in nature. The last state is voiced, in which the vocal cords vibrate and produce a quasi-periodic signal. The silence state is usually unwanted and has to be removed, both to save the processing time of the speech recognition system and to improve its accuracy. In the time domain, the amplitude of the speech signal at each sampling instant is plotted over time. This representation gives a picture of how the speech varies over time, but requires large storage.
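The three states above are commonly separated per frame using short-time energy (low for silence) together with zero-crossing rate (high for noise-like unvoiced sound, low for quasi-periodic voiced sound). A sketch with illustrative thresholds that would normally be tuned per recording:

```python
import numpy as np

def classify_frame(frame, energy_thresh=0.01, zcr_thresh=0.25):
    """Label one frame as 'silence', 'unvoiced', or 'voiced' using
    short-time energy and zero-crossing rate (ZCR).
    Both thresholds are illustrative assumptions."""
    energy = np.mean(frame ** 2)
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2  # crossings per sample
    if energy < energy_thresh:
        return "silence"            # dropped to save processing time
    return "unvoiced" if zcr > zcr_thresh else "voiced"

# Synthetic 50 ms frames at an assumed 8 kHz sampling rate.
t = np.arange(400) / 8000.0
voiced = 0.5 * np.sin(2 * np.pi * 120 * t)                # quasi-periodic, low ZCR
unvoiced = np.random.default_rng(1).normal(0.0, 0.3, 400)  # noise-like, high ZCR
silence = np.zeros(400)
```

The energy test fires first, so quiet frames are discarded before the voiced/unvoiced decision is ever made, matching the order of operations described above.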
