
The Simon Handbook

Contents
1 Introduction
2 Overview
2.1 Architecture
2.2 Required Resources for a Working Simon Setup
2.2.1 Scenarios
2.2.2 Acoustic model
2.2.2.1 Backends
2.2.2.2 Types of base models
2.2.2.2.1 Static base model
2.2.2.2.2 Adapted base model
2.2.2.2.3 User-generated model
2.2.2.3 Requirements
2.2.2.4 Where to get base models
2.2.2.5 Phoneme set issues
3 Using Simon: Typical user
3.1 First run wizard
3.1.1 Scenarios
3.1.2 Base models
3.1.3 Server
3.1.4 Sound configuration
3.1.5 Volume calibration
3.2 The Simon Main Window
3.2.1 Main window: Scenarios
3.2.2 Main window: Training
3.2.3 Main window: Acoustic model
3.2.4 Main window: Recognition
3.3 Scenarios
3.3.1 Import Scenario
3.3.2 Delete Scenario
3.4 Recordings
3.4.1 Volume
3.4.1.1 Simon Calibration
3.4.1.2 Audacity Calibration
3.4.2 Silence
3.4.3 Content
3.4.4 Microphone
3.4.5 Sample Quality Assurance
3.5 Contribute Samples
3.6 Manage training data
3.6.1 Modifying samples
3.6.2 Clear training data
3.6.3 Importing Training Samples
3.7 Configuration
3.7.1 General Configuration
3.7.2 Recordings
3.7.2.1 Device Configuration
3.7.2.2 Voice Activity Detection
3.7.2.3 Training settings
3.7.2.4 Postprocessing
3.7.2.5 Context
3.7.3 Speech Model
3.7.3.1 Base model
3.7.3.2 Training data
3.7.3.3 Language Profile
3.7.4 Model Extensions
3.7.5 Recognition
3.7.5.1 Server
3.7.5.1.1 General
3.7.5.1.2 Network
3.7.5.2 Synchronization and Model Backup
3.7.6 Actions
3.7.6.1 Recognition
3.7.6.2 Dialog font
3.7.6.3 Lists
3.7.7 Text-to-speech
3.7.7.1 Backends
3.7.7.2 Recordings
3.7.7.3 Webservice
3.7.8 Social desktop
3.7.9 Webcam configuration
3.7.10 Advanced: Adjusting the recognition parameters manually
3.7.10.1 Julius
4 Advanced: Creating new scenarios with Simon
4.1 Introduction
4.2 Speech recognition: background
4.2.1 Language Model
4.2.1.1 Vocabulary
4.2.1.1.1 Active Dictionary
4.2.1.1.2 Shadow Dictionary
4.2.1.1.3 Language profile
4.2.1.2 Grammar
4.2.2 Acoustic Model
4.3 Scenarios
4.3.1 Scenario hierarchies
4.3.2 Adding a new Scenario
4.3.3 Edit Scenario
4.3.4 Export Scenario
4.4 Vocabulary
4.4.1 Adding Words
4.4.1.1 Defining the Word
4.4.1.1.1 Manually Selecting a Category
4.4.1.1.2 Manually Providing the Phonetic Transcription
4.4.1.2 Training the Word
4.4.2 Editing a word
4.4.3 Removing a word
4.4.4 Special Training
4.4.5 Importing a Dictionary
4.4.5.1 HADIFIX Dictionary
4.4.5.2 HTK Dictionary
4.4.5.3 PLS Dictionary
4.4.5.4 SPHINX Dictionary
4.4.5.5 Julius Dictionary
4.4.6 Create language profile
4.5 Grammar
4.5.1 Import a Grammar
4.5.2 Renaming Categories
4.5.3 Merging Categories
4.6 Training
4.6.1 Storage Directories
4.6.2 Adding Texts
4.6.2.1 Add training texts
4.6.2.2 Local text files
4.6.3 On-The-Fly Training
4.7 Context
4.7.1 Scenario selection
4.7.2 Sample groups
4.7.3 Context conditions
4.7.3.1 Active window
4.7.3.2 D-Bus
4.7.3.3 Face detection
4.7.3.4 File content
4.7.3.5 Lip detection
4.7.3.6 Or condition association
4.7.3.7 Process opened
4.8 Commands
4.8.1 Executable Commands
4.8.1.1 Importing Programs
4.8.2 Place Commands
4.8.2.1 Importing Places
4.8.3 Shortcut Commands
4.8.4 Text-Macro Commands
4.8.5 List Commands
4.8.5.1 List Command Display
4.8.5.2 Configuring list elements
4.8.6 Composite Commands
4.8.7 Desktop grid
4.8.8 Input Number
4.8.9 Dictation
4.8.10 Artificial Intelligence
4.8.11 Calculator
4.8.12 Filter
4.8.13 Pronunciation Training
4.8.14 Keyboard
4.8.15 Dialog
4.8.15.1 Dialog design
4.8.15.2 Dialog: Bound values
4.8.15.3 Template options
4.8.15.4 Avatars
4.8.15.5 Output
4.8.16 Akonadi
4.8.17 D-Bus
4.8.18 JSON
4.8.19 VRPN
5 Questions and Answers
6 Credits and License
A Installation
List of Tables
2.1 Ways to an acoustic model
2.2 Base model requirements
3.1 Julius Configuration Files
4.1 Sample Vocabulary
4.2 Sample Vocabulary
4.3 Improved Sample Vocabulary
4.4 Sample Vocabulary
4.5 Improved Sample Vocabulary
Abstract
Simon is an open source speech recognition solution.
Chapter 1
Introduction
Simon is the main front end for the Simon open source speech recognition solution.
It is a Simond
client and provides a graphical user interface for managing the speech model and
the commands.
Moreover, Simon can execute all sorts of commands based on the input it receives
from the Simond server.
In contrast to existing commercial offerings, Simon provides a unique do-it-yourself approach to
speech recognition. Rather than relying on predefined, pre-trained speech models, Simon does
not ship with any model whatsoever. Instead, it provides an easy-to-use end-user interface to
create language and acoustic models from scratch.
Additionally, end users can easily download use cases created by other users and share
their own.
The current release can be used to set up command-and-control solutions especially
suitable
for disabled people. However, because of the amount of training necessary,
continuous, free
dictation is neither supported nor reasonable with current versions of Simon.
Because of its architecture, the same version of Simon can be used with all
languages and dialects.
One can even mix languages within one model if necessary.
Chapter 2
Overview
2.1 Architecture
The main recognition architecture of Simon consists of three applications.
• Simon
This is the main graphical interface.
It acts as a client to the Simond server.
• Simond
The recognition server.
• KSimond
A graphical front-end for Simond.
These three components form a real client / server solution for the recognition.
That means that
there is one server (Simond) for one or more clients (Simon; this application).
KSimond is just a
front-end for Simond which means it adds no functionality to the system but rather
provides a
way to interact with Simond graphically.
In addition to Simon, Simond and KSimond, other, more specialized applications are also
part of this integrated Simon distribution.
• Sam
Provides more in-depth control over your speech model and allows you to test the acoustic
model.
• SSC / SSCd
These two applications can be used to collect large amounts of speech samples from different
people more easily.
• Afaras
This simple utility allows users to quickly check large corpora of speech data for
erroneous
samples.
Please refer to the individual handbooks of those applications for more details.
Simon is used to create and maintain a representation of your pronunciation and
language. This
representation is then sent to the server Simond which compiles it into a usable
speech model.
Simon then records sound from the microphone and transmits it to the server which
runs the
recognition on the received input stream. Simond sends the recognition result back
to the client
(Simon).
Simon then uses this recognition result to execute commands like opening programs,
following
links, etc.
Simond identifies its connections with a user / password combination which is completely
independent of the underlying operating system and its users. By default, a standard
user is set
up in both Simon and Simond so the typical use case of one Simond server per Simon
client will
work ‘out of the box’.
Every Simon client logs onto the server with a user / password combination which
identifies a
unique user and thus a unique speech model. Every user maintains their own speech model but
may use it from different computers (different physical Simon instances) simply by accessing the
same Simond server. One Simond instance can of course also serve multiple users.
If you want to open up the server to the Internet or use multiple users on one
server, you will
have to configure Simond. Please see the Simond manual for details.
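The round trip described in this section can be illustrated with a small in-process sketch. It is only a toy model under loud assumptions: the class names ToySimond and ToySimon, all method names, and the user name and password are hypothetical, plain text stands in for microphone audio, and nothing here reflects Simon's actual network protocol or API.

```python
# Toy, in-process model of the Simon / Simond round trip.
# Every class and method name here is hypothetical; this is not
# Simon's real protocol, and plain text stands in for audio.

class ToySimond:
    """Stands in for the recognition server."""

    def __init__(self):
        self._passwords = {}   # user name -> password
        self._models = {}      # user name -> set of known phrases

    def add_user(self, user, password):
        self._passwords[user] = password
        self._models[user] = set()

    def login(self, user, password):
        # Users are independent of operating system accounts.
        return user in self._passwords and self._passwords[user] == password

    def compile_model(self, user, phrases):
        # The real server compiles a speech model; here we only store phrases.
        self._models[user] = {p.lower() for p in phrases}

    def recognize(self, user, audio):
        heard = audio.lower()
        return heard if heard in self._models[user] else None


class ToySimon:
    """Stands in for the client, which owns and executes the commands."""

    def __init__(self, server, user, password):
        if not server.login(user, password):
            raise PermissionError("login rejected by server")
        self.server = server
        self.user = user
        self.commands = {}     # phrase -> callable

    def add_command(self, phrase, action):
        self.commands[phrase.lower()] = action
        # Send the updated representation to the server for compilation.
        self.server.compile_model(self.user, self.commands.keys())

    def hear(self, audio):
        # Transmit input, receive the recognition result, run the command.
        result = self.server.recognize(self.user, audio)
        if result is None:
            return None
        return self.commands[result]()


server = ToySimond()
server.add_user("default", "secret")
client = ToySimon(server, "default", "secret")
client.add_command("Open browser", lambda: "launching browser")
outcome = client.hear("open browser")
```

The point of the sketch is the division of labour: the server holds one model per user and only returns recognition results, while the client decides what a recognized phrase actually does.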
2.2 Required Resources for a Working Simon Setup
NOTE
For background information about speech models, please refer to the Speech Recognition: Background
section.
To get Simon to recognize speech and react to it you need to set up a speech model.
Speech models describe how your voice sounds, what words exist, how they sound and which
word combinations (‘sentences’ or ‘structures’) exist.
A speech model basically consists of two parts:
• Language model: Describes all existing words and what sentences are grammatically
correct
• Acoustic model: Describes how words sound
You need both these components to get Simon to recognize your voice.
In Simon, the language model will be created from your active scenarios and the
acoustic model
will be either built solely through your voice recordings (training) or with the
help of a base model.
2.2.1 Scenarios
One scenario makes up one complete use case of Simon. To control Firefox, for
example, the user
just installs the Firefox scenario.
In other words, scenarios tell Simon what words and phrases to listen for and what
to do when
they are recognized.
Because scenarios do not contain information about how these words and phrases actually sound,
they can be shared and exchanged between different Simon users without problems. To accommodate
this community-based repository, a category for Simon scenarios has been created
on the KDE Store where the scenarios, which are just simple text files (XML format), can be
exchanged easily.
In most cases scenarios are tailored to work best with a specific base model to
avoid issues with
the phoneme set.
For information on how to use scenarios in Simon, please refer to the Scenario
section in the Use
Simon chapter.
2.2.2 Acoustic model
As mentioned above, you need an acoustic model to activate Simon.
You can either create your own or use and even adapt a base model. Base models are
pre-built, usually speaker-independent acoustic models that can be used with Simon.
The following table shows what is required, depending on your Simon configuration:
                       Training    Base model   Model creation
                       required    required     backend required
Static base model      No          Yes          No
Adapted base model     Yes         Yes          Yes
User-generated model   Yes         No           Yes
Table 2.1: Ways to an acoustic model
2.2.2.1 Backends
Simon uses external software to build acoustic models and to recognize speech.
Usually, these backends can be split into two distinct components: the "model compiler" or
"model generation" backend used to create or adapt acoustic models and the "recognizer" used
to recognize speech with the help of these models.
Not all operation modes of Simon require a model compiler backend. Please refer to the next
section for details on when this is the case.
Two different backends are supported:
• Julius / HTK
Models will be created with the HTK. Julius will be used as recognizer.
To use this backend, please make sure that you have an up-to-date version of both
these tools
installed.
• CMU SPHINX
This backend, often simply referred to as the "SPHINX backend", uses the PocketSphinx
recognizer and the SphinxTrain model generation backend. Please refer to the CMU SPHINX
website for more details.
The CMU SPHINX backend requires that Simon is built with the optional SPHINX support. If
you have not compiled Simon from source, please refer to your distribution for more information.
If you are using base models, Simon will automatically select the appropriate
backend for you.
However, if you want to build your own models from scratch (user-generated model,
see below)
and have a certain preference, please refer to the Simond configuration for more
information.
Base models created for one backend are not compatible with any other backend.
Please refer to
the compatibility matrix for details.
2.2.2.2 Types of base models
There are three types of base models:
• Static base model
• Adapted base model
• User-generated model
For information on how to use base models in Simon, please refer to the Base Models
section in
the Use Simon chapter.
2.2.2.2.1 Static base model
Static base models simply use a pre-compiled acoustic model without modifying it.
Any training data collected through Simon will not be used to improve the
recognition accuracy.
This type of model does not require the model creation backend to be installed.
2.2.2.2.2 Adapted base model
By adapting a pre-compiled acoustic model to your voice, you can improve accuracy.
Collected training data will be compiled into an adaptation matrix which will then be applied to the
selected base model.
This type of model does require the model creation backend to be installed.
2.2.2.2.3 User-generated model
When using user-generated models, the user is responsible for training their own model. No base
model will be used.
The training data will be used to compile your own acoustic model allowing you to
create a
system which directly reflects your voice.
This type of model does require the model creation backend to be installed.
2.2.2.3 Requirements
To build, adapt or use acoustic models of different types, certain software needs
to be installed.
                       CMU SPHINX                  Julius / HTK
Static base model      PocketSphinx                Julius
Adapted base model     SphinxTrain, PocketSphinx   HTK, Julius
User-generated model   SphinxTrain, PocketSphinx   HTK, Julius
Table 2.2: Base model requirements
All four tools, HTK, Julius, PocketSphinx and SphinxTrain, can safely be installed
at the same
time.
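As a quick reference, the requirements in Table 2.1 and Table 2.2 can be written down as a small lookup. This is only an illustrative sketch: the dictionary keys ("static", "adapted", "user-generated", "sphinx", "julius") and the function names are made up for this example and are not identifiers used by Simon.

```python
# Requirements transcribed from Table 2.2. The keys and function names
# are illustrative choices for this sketch, not Simon identifiers.
REQUIRED_TOOLS = {
    "static": {
        "sphinx": ["PocketSphinx"],
        "julius": ["Julius"],
    },
    "adapted": {
        "sphinx": ["SphinxTrain", "PocketSphinx"],
        "julius": ["HTK", "Julius"],
    },
    "user-generated": {
        "sphinx": ["SphinxTrain", "PocketSphinx"],
        "julius": ["HTK", "Julius"],
    },
}


def required_tools(model_type, backend):
    """Return the tools that must be installed for this combination."""
    try:
        return list(REQUIRED_TOOLS[model_type][backend])
    except KeyError:
        raise ValueError(
            f"unknown combination: {model_type!r}, {backend!r}"
        ) from None


def needs_model_creation_backend(model_type):
    """Per Table 2.1, only static base models work without one."""
    if model_type not in REQUIRED_TOOLS:
        raise ValueError(f"unknown model type: {model_type!r}")
    return model_type != "static"


static_julius = required_tools("static", "julius")
adapted_sphinx = required_tools("adapted", "sphinx")
```

For example, a static base model on the Julius / HTK backend only needs the recognizer, whereas adapting a base model on the CMU SPHINX backend needs both SphinxTrain and PocketSphinx.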
SPHINX support in Simon must be enabled during compile time and might not be
available on
your platform. Please refer to your distribution.
NOTE
The Simon Windows installer includes Julius, PocketSphinx and SphinxTrain but not
the HTK. Please
refer to the installation section for information on how to install it should you
find the need for it.
2.2.2.4 Where to get base models
Simon base models are packaged as .sbm files. If you happen to have raw model files
for your
backend, you can package them into a compatible SBM container within Simon. Please
refer to
the speech model configuration for details.
Not all SBM models may work for you. Please refer to the model backends section for
details.
For an up-to-date list of available base models, please refer to our online wiki.
2.2.2.5 Phoneme set issues
In order for base models to work, both your scenarios and your base model need to
use the same
set of phonemes.
In practice, this often just means that you need to match scenarios to your base
model. The name
of Simon base models will most likely start with a tag like "[EN/VF/JHTK]". Try to download
to download
scenarios that start with the same tag.
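The tag matching suggested above can be sketched in a few lines. The helper names and the example base model and scenario names below are hypothetical; only the "[EN/VF/JHTK]" tag itself comes from this handbook, and the sketch assumes the tag is the first bracketed group at the start of the name.

```python
import re

# Matches a leading bracketed tag such as "[EN/VF/JHTK]".
TAG_PATTERN = re.compile(r"^\s*\[([^\]]+)\]")


def leading_tag(name):
    """Return the tag at the start of a name, or None if there is none."""
    match = TAG_PATTERN.match(name)
    return match.group(1).upper() if match else None


def matching_scenarios(base_model_name, scenario_names):
    """Keep only the scenarios whose tag equals the base model's tag."""
    tag = leading_tag(base_model_name)
    if tag is None:
        return []
    return [name for name in scenario_names if leading_tag(name) == tag]


# Hypothetical names, used only for illustration.
base_model = "[EN/VF/JHTK] Example base model"
scenarios = [
    "[EN/VF/JHTK] Firefox",
    "[DE/VF/JHTK] Firefox",
    "Calculator",
]
compatible = matching_scenarios(base_model, scenarios)
```

A matching tag is only a hint that the phoneme sets agree; scenarios without a tag are skipped here because nothing can be inferred from their name.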
You cannot use scenarios designed for a different phoneme set (different base model). If Simon
model). If Simon
recognizes this error, it will try to disable affected words by removing them from
the created
speech model. These words will be marked with a red background in the vocabulary of
the
scenario. To re-enable them, transcribe them with the proper phoneme set or use a
user-generated
model.
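The check described above can be pictured as a simple set comparison: a word stays usable only if every phoneme in its transcription is known to the base model. The sketch below is an assumption about the general idea, not Simon's implementation; the function name, the space-separated transcription format and the phoneme symbols are all invented for the example.

```python
def split_by_phoneme_set(vocabulary, model_phonemes):
    """Split a vocabulary into usable and disabled words.

    vocabulary maps each word to a transcription given as
    space-separated phonemes (a format assumed for this sketch).
    """
    usable, disabled = {}, {}
    for word, transcription in vocabulary.items():
        phonemes = transcription.split()
        if phonemes and all(p in model_phonemes for p in phonemes):
            usable[word] = transcription
        else:
            # Corresponds to the words Simon would mark in red.
            disabled[word] = transcription
    return usable, disabled


# Invented phoneme symbols, for illustration only.
model_phonemes = {"f", "ay", "er", "aa", "k", "s"}
vocabulary = {
    "fire": "f ay er",
    "fox": "f aa k s",
    "feuer": "f OY 6",   # uses symbols the base model does not know
}
usable, disabled = split_by_phoneme_set(vocabulary, model_phonemes)
```

Re-transcribing a disabled word with phonemes from the base model's set moves it back into the usable group.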
HINT
If you design a new scenario, it is therefore a good idea to use the dictionary that was used to
create the base model as the shadow dictionary. This way, Simon will automatically suggest the
‘correct’ phonemes when you add words.
Chapter 3
Using Simon: Typical user
The following sections will describe how to use Simon.
3.1 First run wizard
On the first start of Simon, this assistant will guide you through the initial
configuration of Simon.
The configuration consists of five easy steps which are outlined below. You can
skip each step
and even the whole wizard if you want to - in that case, the system will be set up
with default
values.
However, please note that without any configuration, there won’t be any
recognition.
3.1.1 Scenarios
In this step you can add or download scenarios.
To download scenarios from the online repository, select Open → Download to open the
download dialog pictured below.
Especially for new users it is recommended to try some scenarios first to see how
the system
works before diving into configuring it exactly for your use case.
After completing the assistant, you can change the scenario configuration with the
use of the
scenario management dialog.
If you are planning to use a base model, make sure that you download matching
scenarios.
3.1.2 Base models
In this step you can set up Simon to use base models.
Again, you can download base models from an online repository through Open model →
Download.
To use a user-generated model, select Do not use a base model.
After completing or aborting the first run wizard you can change configuration
options defined
here in the Simon configuration.
3.1.3 Server
Internally, Simon is a server / client application. If you want to take advantage
of a network
based installation, you can provide the server address here.
The default configuration is sufficient for a ‘normal’ installation and will assume
that you use a
local Simond server that will be started automatically and stopped with Simon.
After completing or aborting the first run wizard you can change configuration
options defined
here in the server configuration.
3.1.4 Sound configuration
Because Simon recognizes sound from one or more microphones, you have to tell Simon
which
devices you want to use for recognition and training.
Simon can use one or more input and output devices for different tasks. You can
find more
information about Simon’s multiple device capabilities in the Simon sound
configuration section.
If you don’t have at least one working input device for recognition, you will not
be able to activate
Simon.
After completing or aborting the first run wizard you can change configuration
options defined
here in the sound configuration.
3.1.5 Volume calibration
For Simon to work correctly, you need to configure your microphone's volume to a
sensible level.
For more details on this, please see the general section about Volume Calibration.
3.2 The Simon Main Window
The Simon main window is split into four logical sections. On the top left, you can
see the
scenario section, to its right you find the training section, on the bottom left is
the acoustic model
and finally, on the right of that, the recognition section.
The Simon main window can be hidden at any time by clicking on the Simon logo in
the system
tray (usually next to the system clock in the task bar) which will minimize Simon
to the tray.
Click it again to show the main window again.
3.2.1 Main window: Scenarios
A list of scenarios shows the currently loaded scenarios. You can manage this
selection by clicking
Manage scenarios which will open the scenario management dialog.
To modify a scenario, select it from the list and open it by pressing Open "<name>".
3.2.2 Main window: Training
This section shows all training texts from all currently active scenarios.
Selecting a training text will highlight the parent scenario in the scenario
section.
You can start to train the recognition by selecting a text and clicking on Start
training. Please note
that, depending on your selected model type, training may or may not improve your recognition
accuracy. The acoustic model section (see below) in the Simon main window tells you if training
will have an effect for your specific configuration. For background information on this
subject, please refer to the base model section.
The gathered training corpus can be managed by selecting Manage training data which
will open
the sample management dialog.
To help build a general, open speech corpus, please consider contributing your
training corpus
to the Voxforge project by selecting File → Contribute samples to bring up the
sample upload
assistant.
3.2.3 Main window: Acoustic model
Here, Simon shows information about the currently used base model and active model.
Select Configure acoustic model to configure the base model.
3.2.4 Main window: Recognition
This section shows information about the recognition status.
If Simon is connected to the server, you can activate and deactivate the
recognition by toggling
the Activate button. If this control element is not available, make sure you are
connected by
selecting File → Connect from Simon's menu.
An integrated volume calibration widget monitors the configured recognition
devices. The
sound setup can be modified by selecting Configure audio to bring up the sound
configuration.
3.3 Scenarios
This section describes how to import scenarios into your Simon configuration and how to remove
them. For
general information about scenarios, please refer to the background chapter. If you
want to
create, edit or export scenarios, please refer to the advanced usage section.
To modify your scenario configuration, first open the scenario management dialog by
pressing
Manage scenarios in the Simon main window.
To activate or deactivate a scenario you can use the arrow buttons between the two
lists or simply
double click the option you want to load / unload.
More information about individual scenarios can be found in the tooltips of the
list items.
3.3.1 Import Scenario
Scenarios can be imported from a local file in Simon’s XML scenario file format but
can also be
directly downloaded and imported from the internet.
When downloading scenarios, the list of scenarios is retrieved from the Simon Scenarios subsection
of the KDE Store, an OpenDesktop site.
If you create a scenario that might be valuable for other Simon users, please
consider uploading
it to this online repository and help other Simon users.
3.3.2 Delete Scenario
To delete a scenario, select the scenario and click the Delete button.
Because scenarios are synchronized with the recognition server, you can restore
deleted scenarios
through the model synchronization backup.
3.4 Recordings
If you are using user-generated or adapted models, Simon builds its acoustic model based on
transcribed samples of the user’s voice. Because of this, the recorded samples are of vital
importance for the recognition performance.
3.4.1 Volume
It is important that you check your microphone volume before recording any samples.
3.4.1.1 Simon Calibration
The current version of Simon includes a simple way of ensuring that your volume is
configured
correctly.
By default the volume calibration is displayed before starting any recording in
Simon.
To calibrate simply read the text displayed.
The calibration will monitor the current volume and tell you to either raise or lower the volume,
but you have to make that change manually in your system’s audio mixer.
During calibration, try to talk normally. Don’t yell but don’t be overly quiet
either. Take into
account that you should generally use the same volume setting for all your training
and for the
recognition too. You might speak a little bit louder (unconsciously) when you are
upset or at
another time of the day so try to raise your voice a little bit to anticipate this.
It is much better to
have slightly quieter samples than to start clipping.
In the Simon settings, both the text displayed and the levels considered correct
can be changed. If
you leave the text empty, the default text will be displayed. In the options you
can also deactivate
the calibration completely. See the training section for more details.
3.4.1.2 Audacity Calibration
Alternatively you can use an audio editing tool like the free Audacity to monitor
the recording
volume.
Too quiet:
Too loud:
Perfect volume:
3.4.2 Silence
To help Simon with the automatic segmentation it is recommended to leave about one
or two
seconds of silence on the recording before and after reading the prompted text.
Current Simon versions include a graphical notice on when to speak during
recording. The
message will tell the user to wait for about half a second:
... before telling the user to speak:
This method of visual feedback proved especially valuable when recording with
people who
cannot read the prompted text for themselves and therefore need someone to tell
them what they
have to say. The colorful visual cue tells them when to start repeating what the
facilitator said
without the need of unreliable hand gestures.
3.4.3 Content
Generally, we recommend recording roughly the same sentences that Simon should recognize
later.
(Obviously, that does not apply to massive sample acquisitions where other properties like
phonetic balance are more important.)
Care should be taken to avoid recordings like ‘One One One’ to quickly ramp up the
‘recognition
rate’ property. Such recordings often decrease recognition performance because the
pronuncia-
tion differs greatly from saying the word in isolation.
3.4.4 Microphone
For Simon to work well, a high quality microphone is recommended.
However, even relatively cheap headsets (around 30 Euros) achieve very good results,
magnitudes better than internal microphones.
For maximum compatibility we recommend USB headsets as they usually support the necessary
sample rate of 16 kHz, are very well supported by both Microsoft Windows and GNU/Linux,
and normally don’t require special, proprietary drivers to operate.
3.4.5 Sample Quality Assurance
Simon will check each recording against certain criteria to ensure that the
recorded samples are
not erroneous or of poor quality.
If Simon detects a problematic sample, it will warn the user to re-record the
sample.
Currently, Simon checks the following criteria:
• Sample peak volume
If the volume is too loud and the microphone started to ‘clip’ (see ‘Clipping’ on Wikipedia),
Simon will display a warning message urging the user to lower the microphone volume.
• Signal to noise ratio (SNR)
Simon will automatically determine the signal to noise ratio of each recording. If
the ratio is
below a configurable threshold, a warning message will be displayed.
The default value of 2300 % means that for Simon to accept a sample as correctly
recorded the
peak volume has to be 23 times louder than the noise baseline (lowest average over
50 ms).
A ratio below the threshold is often the result of a very low quality microphone, high levels
of ambient noise, or a low microphone gain coupled with a ‘microphone boost’ option in the
system mixer.
SNR warning message triggered by an empty sample. This information dialog is
displayed when
clicking on the More information button on the recording widget.
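The signal to noise check described above can be sketched as follows. This is only an illustration of the stated rule (peak volume divided by the lowest 50 ms average, compared against 2300 %), not Simon's actual implementation; the function names, the use of non-overlapping windows and the handling of all-zero recordings are assumptions.

```python
def snr_percent(samples, sample_rate=16000, window_ms=50):
    """Return the peak volume relative to the noise baseline, in percent."""
    window = max(1, sample_rate * window_ms // 1000)
    magnitudes = [abs(s) for s in samples]
    if len(magnitudes) < window:
        raise ValueError("recording is shorter than one averaging window")
    peak = max(magnitudes)
    # Noise baseline: lowest average magnitude over any full window.
    # Non-overlapping windows keep the sketch simple (assumption).
    averages = [
        sum(magnitudes[i:i + window]) / window
        for i in range(0, len(magnitudes) - window + 1, window)
    ]
    baseline = min(averages)
    if peak == 0:
        return 0.0           # completely silent recording (assumption)
    if baseline == 0:
        return float("inf")  # digital silence as noise floor (assumption)
    return 100.0 * peak / baseline


def passes_snr_check(samples, threshold_percent=2300, **kwargs):
    """True if the recording reaches the configured ratio (2300 % = 23 times)."""
    return snr_percent(samples, **kwargs) >= threshold_percent
```

With the default threshold, a recording whose loudest sample is 30 times the quietest 50 ms average passes, while one at 10 times is flagged.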
3.5 Contribute Samples
The base models that can be used with Simon to augment or replace training are built from other
people’s speech samples. In order to create high quality base models, a large number of training
samples is necessary.
If you trained your local Simon installation, you gathered valuable voice samples
that could
improve the quality of the general model.
Through Simon’s ‘Contribute Samples’ dialog you can upload those recordings to the
Voxforge project to help create high quality open source base models.
After connecting to the server, Simon will ask for some basic meta-information. This
information contains no personal data. Instead, it will later be used to group together
samples of similar speakers to build more accurate acoustic models.
The duration of the upload process itself will depend on your internet connection.
Generally
speaking, this only transmits relatively little data because the audio samples
collected by Simon
are generally very small: around 0.1 MB per sample.
3.6 Manage training data
To view and modify your personal training corpus, you can access the training data
management
dialog by selecting Manage training data in the Simon main window or the training
section of
any opened scenario.
3.6.1 Modifying samples
To listen to or re-record a sample, select it from the list and select Open Sample.
In this dialog you can also modify the sample’s group after it was recorded.
If you remove the opened sample and do not re-record it, Simon will offer to remove
it from the
corpus.
3.6.2 Clear training data
After a confirmation dialog, this will remove all personal training data of the
user.
3.6.3 Importing Training Samples
Using the import training data field you can import previously gathered training
samples from
previous Simon versions or manual training.
NOTE
This feature is very specific. Please use it with caution and make sure that you
know exactly what you
are doing before you continue.
You can either provide a separate prompts file or let Simon extract the
transcriptions from the
filenames.
When using prompts-based transcriptions, your prompts file (UTF-8) needs to contain lines of
the following form: ‘[filename] [content]’. Filenames are given without file extensions and the
content has to be uppercase. For example, the line demo_2007_03_20 DEMO imports the file
demo_2007_03_20.wav containing the spoken word ‘Demo’.
Because prompts files do not contain a file extension, Simon will try wav, mp3, ogg
and flac (in
that order). If one of those match, no other extension will be tested and only the
first file will be
imported (in contrast to file based transcription where all files would be
imported).
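The prompts-based lookup described above could look roughly like this. It is a minimal sketch of the stated rules only (one ‘[filename] [content]’ pair per line, extensions tried in the order wav, mp3, ogg, flac, first match wins); the function names are hypothetical and this is not Simon's own import code.

```python
from pathlib import Path

# Extension order as stated in the handbook.
EXTENSIONS = ("wav", "mp3", "ogg", "flac")


def parse_prompts(text):
    """Return (filename, content) pairs from '[filename] [content]' lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, content = line.partition(" ")
        entries.append((name, content.strip()))
    return entries


def find_sample(folder, name):
    """Return the first existing file for `name`, or None if there is none."""
    for extension in EXTENSIONS:
        candidate = Path(folder) / f"{name}.{extension}"
        if candidate.is_file():
            return candidate  # stop at the first match
    return None
```

If a folder contains both demo.mp3 and demo.ogg, `find_sample(folder, "demo")` returns demo.mp3 because mp3 is tried first.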
When using file based transcriptions, a file called this_is_a_test.wav must contain
‘This is a test’
and nothing else. Numbers and special characters (‘.’, ‘-’,...) in the filename are
ignored and
stripped.
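As an illustration of file-based transcription, the following sketch derives a transcription from a filename. The exact rules Simon applies are not spelled out here, so treating underscores as word separators and returning uppercase text (to match the prompts format) are assumptions, and the function name is hypothetical.

```python
import re
from pathlib import Path


def transcription_from_filename(filename):
    """Derive the spoken text from a sample's filename."""
    stem = Path(filename).stem            # drop the file extension
    words = stem.replace("_", " ")        # underscores separate words (assumption)
    # Strip digits and special characters such as '.' and '-'.
    words = re.sub(r"[^\w\s]|\d", "", words)
    return " ".join(words.split()).upper()
```

For example, `transcription_from_filename("this_is_a_test.wav")` yields `THIS IS A TEST`.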
Files recorded by Simon 0.2 will follow this naming scheme so you can safely import
them us-
ing the file name extraction method. Files generated by previous Simon versions
should not be
imported using this function but you can use the prompts based import for that.
Imported files and their transcription are then added to the training corpus.
To import a folder containing training samples just select the folder to import and
depending on
your import type also the prompts file.
The folder will be scanned recursively. This means that the given folder and all
its subfolders
will be searched for .wav, .flac, .mp3 and .ogg files. All files found will be
imported.
When importing the sound files, all configured post processing filters are applied.
If you import anything other than WAV files you are responsible for decoding them
during the
import process (for example through post processing filters) or the model creation
will fail.
3.7 Configuration
Simon was designed with high configurability in mind. Because of this, there are
plentiful pa-
rameters that can be fine-tuned to your specific requirements.
You can access Simon’s configuration dialog through the application’s main menu:
Settings →
Configure Simon....
3.7.1 General Configuration
The general configuration page lists some basic settings.
If you want to show the first run assistant again, deselect Disable configuration
wizard.
Please note that the option to start Simon at login will work on both Microsoft
Windows and
when you are using KDE’s Plasma on Linux. Support for other desktop environments
like
Gnome, XFCE, etc. might require manually placing Simon in the session autostart
(please re-
fer to the respective manuals of your desktop environment).
When the option to start Simon minimized is selected, Simon will minimize to the
system tray
immediately after starting.
Deselecting the option to warn when there are problems with samples deactivates the
sample
quality assurance.
3.7.2 Recordings
Simon uses fairly sophisticated internal sound processing to enable complex multi-
device setups.
3.7.2.1 Device Configuration
The sound device configuration allows you to choose which sound device(s) to use,
configure
them and define additional recording parameters.
Use the Refresh devices button if you have plugged in additional sound devices
since you started
Simon.
Most of the time you will want to use 1 channel and 16kHz (which is also the
default) because the
recognition only works on mono input and works best at 16kHz (8kHz and 22kHz being
other
viable options). Some low-cost sound cards might not support this particular mode
in which case
you can enable automatic resampling in the device’s advanced configuration.
NOTE
Only change the channel and the samplerate if you really know what you are doing.
Otherwise the
recognition will most likely not work.
You can use Simon with more than one sound device at the same time. Use Add device
to add a
new device to the configuration and Remove device to remove it from your
configuration. The
first device in your sound setup cannot be removed.
For each device, you can choose what it is used for: training or recognition (the latter
only applies to input devices).
If you use more than one device for training, you will create multiple sound files
for each utter-
ance. When using multiple devices for recognition each one feeds a separate sound
input stream
to the server resulting in recognition results for each stream.
If you use multiple output devices the playback of the training samples will play
on all configured
audio devices.
When using different sample rates for your input devices, the output will only play
on matching
output devices. If you for example have one input device configured to use 16kHz
and the other
to use 48kHz, the playback of samples generated by the first one will only play on
16kHz outputs,
the other one only on 48kHz devices.
In the device’s advanced configuration, you can also define the sample group tag of
the produced
training samples and set activation context conditions.
If you set up this device to be used for recognition and (any of) its activation requirements are
not met, the device will not record. This can be used to augment or even replace the traditional
voice activity detection with context information.
For example, add a face detection condition to the recording device’s activation requirements to
only enable the recognition when you’re looking at the webcam.
3.7.2.2 Voice Activity Detection
The recognition is done on the Simond server. See the architecture section for more
details.
The sound stream is not continuous but is segmented by the Simon client. This is
done by some-
thing called ‘voice activity detection’.
Here you can configure this segmentation through the following parameters:
• Cutoff level
Everything below this level is considered ‘silence’ (background noise).
• Head margin
Simon caches the input for the duration of the head margin before it starts to consider it a real
sample. During this whole time, the input level needs to be above the cutoff level.
• Tail margin
After the recording went below the cutoff level, Simon will wait for as long as
tail margin to
consider the current recording a finished sample.
• Skip samples shorter than
Samples that are shorter than this value are not considered for recognition
(coughs, etc.).
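How the four parameters interact can be sketched as a small state machine. This is a simplified illustration, not Simon's segmentation code: it works on a list of per-frame input levels, measures all margins in frames rather than time, and emits a still-open sample at the end of the input; all of these are assumptions made for the example.

```python
def segment(levels, cutoff, head_margin, tail_margin, min_length):
    """Return (start, end) frame ranges of detected samples, end exclusive."""
    segments = []
    start = None        # first frame of the current candidate
    confirmed = False   # True once the head margin has been satisfied
    silence = 0         # consecutive frames below the cutoff level
    for i, level in enumerate(levels):
        loud = level > cutoff
        if start is None:
            if loud:
                start, confirmed, silence = i, head_margin <= 1, 0
        elif not confirmed:
            if not loud:
                start = None              # fell below cutoff during head margin
            elif i - start + 1 >= head_margin:
                confirmed = True
        elif loud:
            silence = 0
        else:
            silence += 1
            if silence >= tail_margin:    # tail margin elapsed: sample finished
                end = i - silence + 1
                if end - start >= min_length:
                    segments.append((start, end))
                start, confirmed, silence = None, False, 0
    if start is not None and confirmed:   # input ended mid-sample (assumption)
        end = len(levels) - silence
        if end - start >= min_length:
            segments.append((start, end))
    return segments
```

A short burst that does not stay above the cutoff for the whole head margin is discarded, and a confirmed sample shorter than `min_length` is skipped, which matches the intent of filtering out coughs.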
3.7.2.3 Training settings
When the option Default to power training is selected, Simon will, when training, automatically
start and stop the recording when displaying and hiding (respectively) the recording prompt.
This option only sets the default value of the option, the user can change it at
any time before
beginning a training session.
The configurable font here refers to the text that is recorded to train the
acoustic model (through
explicit training or when adding a word).
This option was introduced after we had worked with a few clients with spastic
disabilities. While we used the mouse to control Simon during the training, they had to
ability. While we used the mouse to control Simon during the training, they had to
read what
was on the screen. At first this was very problematic as the regular font size is
relatively small
and they had trouble making out what to read. This is why we made the font and the
font size of
the recording prompt configurable.
Here you can also define the required signal to noise ratio for Simon to consider a
training sample
to be correct. See the Sample Quality Assurance section for more details.
On this configuration page you can also set the parameters for the volume
calibration.
It can be deactivated for both the add word dialog and the training wizard by
unchecking the
group box itself.
The calibration itself uses the voice activity detection to score your sound
configuration.
The prompted text can be configured by entering text in the input field below. If the field is
empty, a default text will be used.
3.7.2.4 Postprocessing
All recorded (training) and imported (through the import training data) samples can
be processed
using a series of postprocessing commands. Postprocessing chains are an advanced
feature and
shouldn’t be needed by the average user.
The postprocessing commands can be seen as a chain of filters through which the recordings have
to pass. Using these ‘filters’ one could define commands to suppress
background noise
in the training data or normalize the recordings.
Given the program process_audio which takes the input and output files as its
arguments (e.g.:
process_audio in.wav out.wav) the postprocessing command would be: process_audio %1
%2. The two placeholders %1 and %2 will be replaced by the input filename and the
output
filename respectively.
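The placeholder substitution can be illustrated with a short sketch. It only shows how %1 and %2 might be filled in and how several commands could be chained; the function names and the naming of intermediate files are hypothetical, and Simon's own handling may differ.

```python
import re
import shlex


def build_command(template, input_file, output_file):
    """Split a command template and fill in %1 (input) and %2 (output)."""
    mapping = {"%1": input_file, "%2": output_file}
    return [
        re.sub(r"%[12]", lambda match: mapping[match.group(0)], argument)
        for argument in shlex.split(template)
    ]


def plan_chain(templates, source, target):
    """Build one command per filter; each output feeds the next filter."""
    commands = []
    current = source
    for index, template in enumerate(templates):
        is_last = index == len(templates) - 1
        # Intermediate file names are made up for this example.
        output = target if is_last else f"{target}.stage{index}.wav"
        commands.append(build_command(template, current, output))
        current = output
    return commands
```

`build_command("process_audio %1 %2", "in.wav", "out.wav")` returns `["process_audio", "in.wav", "out.wav"]`, a list that could be passed to `subprocess.run`.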
The switch to ‘apply filters to recordings recorded with Simon’ enables the
postprocessing chains
for samples recorded during the training (including the initial training while
adding the word). If
you don’t select this switch the postprocessing commands are only applied to
imported samples
(through the import training data wizard).
3.7.2.5 Context
Every sample recorded with Simon is assigned a sample group.
When creating the acoustic model from the training samples Simon can take the
current situation
into account to only use a subset of all gathered training data.
For example, in a system where multiple, very different speakers use one shared
setup, context
conditions can be set up to automatically build separate models for both users
depending on the
current situation.
The above screenshot, for example, shows a setup where, given that all samples of ‘peter’ were
tagged ‘peters_samples’ and all samples of ‘mathias’ were tagged ‘mathias_samples’ (refer
to the device configuration for more information on how to set up sample groups), the active
acoustic model will only contain the current user’s own samples as long as the file
/home/bedahr/.username contains either ‘peter’ or ‘mathias’.
Another example use-case would be to switch to a more noise-resistant acoustic
model when the
user starts playing music.
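The two-speaker setup described above can be pictured with the following sketch. It is an assumption-laden illustration of the idea, not Simon's context system: the function names are hypothetical, and using all samples when the file names neither user is a guess.

```python
from pathlib import Path

# Sample group per user, as in the example setup.
GROUP_BY_USER = {"peter": "peters_samples", "mathias": "mathias_samples"}


def active_sample_group(username_file):
    """Return the sample group for the user named in the file, or None."""
    try:
        content = Path(username_file).read_text(encoding="utf-8")
    except OSError:
        return None
    for user, group in GROUP_BY_USER.items():
        if user in content:     # 'file contains' condition
            return group
    return None


def samples_for_model(samples, group):
    """Pick the samples to train on; `samples` holds (path, group) pairs."""
    # With no active group, all samples are used (assumption).
    return [path for path, tag in samples if group is None or tag == group]
```

Here `active_sample_group("/home/bedahr/.username")` would return `peters_samples` while the file contains ‘peter’, so only Peter's recordings would go into the active model.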
3.7.3 Speech Model
Here you can adjust the parameters of the speech model.
3.7.3.1 Base model
You can optionally use base models to limit / circumvent the training or to avoid
installing a
model creation backend. Please refer to the general base model section for more
details about
base models.
To use a user-generated model, select Do not use a base model. To use a static base
model,
select a base model and do not select Adapt base model using training samples. To
instead use
an adapted base model, check Adapt base model using training samples after
selecting a base
model.
Simon base models are packaged in .sbm files.
To add base models to the selection, you can either import local models (Open model
→ Import),
download them from an online repository (Open model → Download) or create new ones
from
raw files (Open model → Create from model files).
If you have raw model files produced by one of the supported model creation backends, you can
package them into an SBM container for use with Simon.
You can also export your currently active model by selecting Export active model.
The exported
SBM file will contain your full acoustic model (ignoring the current context) that
can be shared
with other Simon users.
3.7.3.2 Training data
This section allows you to configure the training samples.
The samplerate set here is the target samplerate of the acoustic model. It has
nothing to do
with the recording samplerate and it is the responsibility of the user to ensure
that the samples
are actually made available in that format (usually by recording in that exact
samplerate or by
defining postprocessing commands that resample the files; see the sound
configuration section
for more details).
Usually either 16kHz or 8kHz models are built / used. 16kHz models will have higher accuracy
than 8kHz models. Going higher than 16kHz is not recommended as it is very CPU-intensive and
in practice probably won’t result in higher recognition rates.
Moreover, the path to the training samples can be adjusted. However, be sure that
the previously
gathered training samples are also moved to the new location. If you use automatic
synchronization, Simond can alternatively provide Simon with the missing samples, but copying
them manually is still recommended for performance reasons.
3.7.3.3 Language Profile
In the language profile section you can select a previously built or downloaded
language profile
to aid with the transcription of new words.
3.7.4 Model Extensions
Here you can configure the base URL that is going to be used for the automatic bomp
import.
The default points to the copy on the Simon listens server.
3.7.5 Recognition
Here you can configure the recognition and model synchronization with the Simond
server.
3.7.5.1 Server
Using the server configuration you can set parameters of the connection to Simond.
3.7.5.1.1 General
The Simon main application connects to the Simond server (see the architecture
section for more
information).
To identify individual users of the system (one Simond server can of course serve multiple Simon
clients), Simon and Simond use user accounts. Every user has their own speech model. The
username / password combination given here is used to log in to Simond. If Simond does not know
the username or the password is incorrect, the connection will fail. See the Simond manual on how
to set up users for Simond.
The recognition itself - which is done by the server - might not be available at
all times. For
example it would not be possible to start the recognition as long as the user does
not have a
compiled acoustic and language model which has to be created first (during
synchronization
when all the ingredients - vocabulary, grammar, training - are present). Using the
option to start
the recognition automatically once it is available, Simon will request to start the
recognition when
it receives the information that it is ready (all required components are
available).
Using the Connect to server on startup option, Simon will automatically start the
connection to
the configured Simond servers after it has finished loading the user interface.
3.7.5.1.2 Network
Simon connects to Simond using TCP/IP.
As of now (Simon 0.4), encryption is not yet supported.
The timeout setting specifies how long Simon will wait for a first reply when contacting the
hosts. If you are on a very slow network and/or use ‘connect on start’ on a very slow
machine, you may want to increase this value if you keep getting timeout errors that
disappear when you retry.
Simon can be configured to use more than one Simond server. This is very useful if, for
example, you are going to use Simon on a laptop which connects to a different server depending
on where you are. You could, for example, add the server you use when you are at home and the
server you use when you are at work. When connecting, Simon will try to connect to each of the
servers (in order) until it finds one that accepts the connection.
To add a server, just enter the host name or IP address and the port (separated by
‘:’) or use the
dialog that appears when you select the blue arrow next to the input field.
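The ‘try each server in order’ behaviour can be sketched as below. This is a generic illustration, not Simon's networking code: the function names are hypothetical, the host names in the usage note are placeholders, and the timeout here only covers establishing the TCP connection, whereas Simon's setting refers to waiting for a first reply.

```python
import socket


def parse_server(entry):
    """Split a 'host:port' entry into (host, port)."""
    host, separator, port = entry.rpartition(":")
    if not separator or not host:
        raise ValueError(f"expected 'host:port', got {entry!r}")
    return host, int(port)


def connect_first(servers, timeout_s=3.0, connect=socket.create_connection):
    """Try each configured server in order; return (entry, connection)."""
    for entry in servers:
        host, port = parse_server(entry)
        try:
            return entry, connect((host, port), timeout=timeout_s)
        except OSError:
            continue    # unreachable or timed out: try the next server
    return None, None
```

Called with `["home.example:4444", "work.example:4444"]`, the sketch returns the first entry that accepts a connection, or `(None, None)` if none does; the port number is only an example value.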
3.7.5.2 Synchronization and Model Backup
Here you can configure the model synchronization and restore older versions of your
speech
model.
Simon creates the speech input files which are then compiled and used by the Simond
server (see
the section architecture for more details).
The process of sending the speech input files, compiling them and receiving the compiled
versions is called ‘synchronization’. Changes only take effect, and a new restore point is only
set, once the speech model has been synchronized. This is why, by default, Simon will always
synchronize the model with the server when it changes. This is called Automatic Synchronization
and is the recommended setting.
However, if you want more control you can instruct Simon to ask you before starting the
synchronization after the model has changed or to rely on manual synchronization altogether.
When selecting the manual synchronization you have to manually use the Actions → Synchronize
menu item of the Simon main window every time you want to compile the speech model.
The Simon server will maintain a copy of the last five iterations of model files. This
only includes the ‘source files’ (the vocabulary, grammar, etc.), not the compiled model.
The compiled model will be regenerated from the restored source files automatically.
After you have connected to the server, you can select one of the available models
and restore it
by clicking on Choose Model.
3.7.6 Actions
In the actions configuration you can configure the reactions to recognition
results.
3.7.6.1 Recognition
Simon’s recognition computes not only the most likely result but the top ten results.
Each result is assigned a confidence score between 0 and 1 (where 1 is 100% sure).
Using the Minimum confidence setting you can set a minimum confidence for recognition results
to be considered valid.
If more than one recognition result is rated higher than the minimum confidence score, Simon
will provide a popup listing the most likely options for you to choose from.
This popup can be disabled using the Display selection popup for ambiguous results
check box.
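The decision logic described in this section can be summarized in a few lines. This is a sketch under assumptions, not Simon's code: results are modelled as (text, confidence) pairs, a result exactly at the minimum counts as valid, and with the popup disabled the best-rated result is taken.

```python
def decide(results, minimum_confidence, show_popup=True):
    """Return ('reject' | 'ask' | 'accept', candidates) for recognition results."""
    candidates = sorted(
        (result for result in results if result[1] >= minimum_confidence),
        key=lambda result: result[1],
        reverse=True,
    )
    if not candidates:
        return "reject", []
    if len(candidates) > 1 and show_popup:
        return "ask", candidates        # let the user choose in the popup
    return "accept", candidates[:1]     # single or best-rated result
```

For instance, with a minimum confidence of 0.7, the results `[("Computer", 0.9), ("Commuter", 0.75), ("Internet", 0.2)]` lead to a popup offering the first two options.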
3.7.6.2 Dialog font
Many plugins of Simon have a graphical user interface.
The fonts of these interfaces can be configured here centrally and independently of the system’s
font settings.
3.7.6.3 Lists
Here you can find the global list element configuration. This serves as a template
for new scenar-
ios but is also directly used for the popup for ambiguous recognition results.
3.7.7 Text-to-speech
Some parts of Simon, most notably the dialog command plugin, employ text-to-speech (or ‘TTS’)
to read text aloud.
3.7.7.1 Backends
Multiple external TTS solutions can be used to allow Simon to talk. Multiple
backends can be
enabled at the same time and will be queried in the configured order until one is
found that can
synthesize the requested message.
The following backends are available:
• Recordings
Instead of an engine to convert arbitrary text into speech, text-snippets can be
pre-recorded
and will be simply played back.
• Jovie
Uses the Jovie TTS system. This requires a valid Jovie set-up.
• Webservice
The webservice backend can be used to talk to any TTS engine that has a web front-
end that
returns .wav files.
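Querying the enabled backends in their configured order amounts to a simple fallback chain, sketched below. The backend functions are stand-ins invented for this example (a dictionary of pre-recorded snippets and a placeholder for a web service); they do not reflect Simon's plugin interface.

```python
def synthesize(text, backends):
    """Ask each backend in order; return (name, audio) of the first that can speak."""
    for name, backend in backends:
        audio = backend(text)
        if audio is not None:
            return name, audio
    return None, None


# Hypothetical stand-ins for two of the backends listed above.
RECORDED_SNIPPETS = {"Hello": b"recorded-wav-bytes"}


def recordings_backend(text):
    """Only knows texts that were pre-recorded."""
    return RECORDED_SNIPPETS.get(text)


def webservice_backend(text):
    """Placeholder for an engine that can synthesize arbitrary text."""
    return b"synthesized-wav-bytes"
```

With `[("Recordings", recordings_backend), ("Webservice", webservice_backend)]`, the text ‘Hello’ is served from the recordings, while any other text falls through to the web service.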
3.7.7.2 Recordings
Instead of using an external TTS engine, you can also record yourself or other
speakers reading
the texts aloud. Simon can then play back these pre-recorded snippets when they are
requested
of its text-to-speech engine.
These recorded sound bites are organized into ‘sets’ of different speakers which can also be
imported and exported to share them with other Simon users.
3.7.7.3 Webservice
Through the webservice backend, Simon can use web-based TTS engines like MARY.
You can provide any URL. Simon will replace any instance of ‘%1’ within the configured URL
with the text to synthesize. The backend expects the queried webservice to return a .wav file
that will be streamed and played through Simon’s sound layer, respecting the
sound device
configuration.
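The URL substitution can be illustrated as follows. The URL is a made-up placeholder, the function name is hypothetical, and percent-encoding the text before inserting it is an assumption; whether Simon encodes the text itself is not stated here.

```python
from urllib.parse import quote


def build_tts_url(template, text):
    """Replace every '%1' in the configured URL with the text to synthesize."""
    # The text is percent-encoded so that it is safe inside a URL (assumption).
    return template.replace("%1", quote(text))
```

With the template `http://tts.example/say?text=%1`, the text ‘Hello world’ results in `http://tts.example/say?text=Hello%20world`.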
3.7.8 Social desktop
Scenarios can be uploaded and downloaded from within Simon.
For this we use KDE’s social desktop facilities and our own category for Simon scenarios on KDE
Store.
If you already have an account on opendesktop.org you can input the credentials
there. If you
don’t, you can register directly in the configuration module.
The registration is of course free of charge.
3.7.9 Webcam configuration
In Webcam configuration, you can configure frames per second (fps) and select the
webcam to
use when multiple webcams are connected to your system.
Frames per second is the rate at which the webcam will produce unique consecutive images
called frames. For proper performance, the optimal value is between 5 and 15 fps.
3.7.10 Advanced: Adjusting the recognition parameters manually
Simon is targeted towards end-users. Its interface is designed to allow even users
without any
background in speech technology to design their own language and acoustic models by
provid-
ing reasonable default values for simple uses.
In special cases (severe speech impairments for example), special configuration
might be needed.
This is why the raw configuration files for the recognition are also respected by
Simon and can of
course be modified to suit your needs.
3.7.10.1 Julius
There are basically two parts of the Julius configuration that can be adjusted:
• adin.jconf
This is the Simon client’s configuration for the sound stream sent from Simon to Simond.
This file is read directly by the adinstreamer.
Simon ships with a default adin.jconf without any special parameters. Changing this
system-wide configuration affects all user accounts on your machine that use Simon.
To change the configuration of just one of those users, copy
the file to the user path (see below) and edit this copy.
• julius.jconf
This is a configuration of the Simond server and directly influences the
recognition. This file is
parsed by libjulius and libsent directly.
Simond ships with a default julius.jconf. Whenever a new user is added to the Simond
database, Simond will automatically copy this system-wide configuration to the new user.
After that, the user is of course free to change it, but doing so won’t affect the other users.
This way the ‘template’ (the system-wide configuration) can also be changed without affecting
existing users.
The path to the Julius configuration files will depend on your platform:
File | Microsoft Windows | GNU/Linux
adin.jconf (system) | (installation path)\share\apps\simon\adin.jconf | `kde4-config --prefix`/share/apps/simon/adin.jconf
adin.jconf (user) | %appdata%\.kde\share\apps\simon\adin.jconf | ~/.kde/share/apps/simon/adin.jconf
julius.jconf (template) | (installation path)\share\apps\simond\default.jconf | `kde4-config --prefix`/share/apps/simond/default.jconf
julius.jconf (user) | %appdata%\.kde\share\apps\simond\models\(user)\active\julius.jconf | ~/.kde/share/apps/simond/models/(user)/active/julius.jconf
Table 3.1: Julius Configuration Files
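The relationship between the system-wide adin.jconf and a per-user copy can be pictured as a simple lookup: use the user's copy if it exists, otherwise fall back to the system file. The sketch below is an assumption about that precedence, based on the instruction to copy the file to the user path; the function name is hypothetical.

```python
from pathlib import Path


def effective_config(user_path, system_path):
    """Return the user's configuration file if present, else the system one."""
    user_file = Path(user_path).expanduser()   # resolves a leading '~'
    return user_file if user_file.is_file() else Path(system_path)
```

On GNU/Linux this could be called with `~/.kde/share/apps/simon/adin.jconf` as the user path and the system path from Table 3.1.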
Chapter 4
Advanced: Creating new scenarios with Simon
The following chapter is aimed towards more experienced users who want to design
their own
scenarios.
For general usage instruction, please refer to the chapter Using Simon: Typical
user.
4.1 Introduction
To add a new scenario, you first create a new scenario ‘shell’ by adding a new scenario object
and then open it in the Simon main window.
To modify an existing scenario instead, simply open it.
A Simon scenario contains the following components:
• Vocabulary
• Grammar
• Training texts
• Context
• Commands
Before describing how to configure these elements in Simon, the next section
provides back-
ground information that will help you understand the basic principles of speech
modelling. This
fundamental knowledge is necessary to design sensible scenarios.
4.2 Speech recognition: background
NOTE
Before explaining exactly how you can create new scenarios with Simon, this section introduces
some fundamentals of speech recognition in general.
Speech recognition systems take voice input (often from a microphone) and try to
translate it
into written text. To do that, they rely on statistical representations of human
voice. To put it into
simple terms: The computer learns how words - or more correctly the sounds that
make up those
words - sound.
A speech model consists of two distinct parts:
• Language Model
• Acoustic Model
4.2.1 Language Model
The language model defines the vocabulary and the grammar you want to use.
4.2.1.1 Vocabulary
The vocabulary defines what words the recognition process should recognize. Every
word you
want to be able to use with Simon should be contained in your vocabulary.
One entry in the vocabulary defines exactly one ‘word’. In contrast to the common
use of the
word ‘word’, in Simon ‘word’ means one unique combination of the following:
• Wordname
(The written word itself)
• Category
(Grammatical category; for example: ‘Noun’, ‘Verb’, etc.)
• Pronunciation
(How the word is pronounced; Simon accepts any kind of phonetic notation as long as it does not
use special characters or numbers)
That means that plurals or even different cases are different ‘words’ to Simon.
This is an impor-
tant design decision to allow more control when using a sophisticated grammar.
In general, it is advisable to keep your vocabulary as sleek as possible. The more
words, the
higher the chance that Simon might misunderstand you.
Example vocabulary (please note that the categories here are deliberately set to Noun / Verb to
aid understanding; please refer to the grammar section for why this might not be the best idea):
Word      Category  Pronunciation
Computer  Noun      k ax m p y uw t er
Internet  Noun      ih n t er n eh t
Mail      Noun      m ey l
close     Verb      k l ow s
Table 4.1: Sample Vocabulary
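To illustrate the rule that a ‘word’ is one unique combination of wordname, category and pronunciation, the following Python sketch models vocabulary entries as immutable records. The class name Word and its field names are chosen for this example only; they are not part of Simon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One vocabulary entry: the combination of all three fields is the identity."""
    name: str
    category: str
    pronunciation: str  # phonemes separated by spaces


vocabulary = {
    Word("Computer", "Noun", "k ax m p y uw t er"),
    Word("Internet", "Noun", "ih n t er n eh t"),
    Word("Mail", "Noun", "m ey l"),
    Word("close", "Verb", "k l ow s"),
}

# Same spelling but a different category: this is a second, distinct 'word'.
vocabulary.add(Word("close", "Command", "k l ow s"))
# An exact duplicate of an existing entry does not add anything.
vocabulary.add(Word("Mail", "Noun", "m ey l"))

categories_of_close = sorted(w.category for w in vocabulary if w.name == "close")
```

In this sketch the set ends up with five entries, two of which are spelled ‘close’.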
4.2.1.1.1 Active Dictionary
The vocabulary used for the recognition is referred to as active dictionary or
active vocabulary.
4.2.1.1.2 Shadow Dictionary
As mentioned above, you should keep your vocabulary / dictionary as lean as possible.
However, because every word in your vocabulary also needs information about its
pronunciation, it is useful to have a large dictionary in which you can look up the
pronunciation and other characteristics of words.
Simon provides this functionality. We refer to this large reference dictionary as
‘shadow dic-
tionary’. This shadow dictionary is not created by the user but can be imported
from various
sources.
As Simon is a multi-language solution we do not ship shadow dictionaries with
Simon. However,
it is very easy to import them yourself using the import dictionary wizard. This is
described in
the Import Dictionary section.
4.2.1.1.3 Language profile
In addition to a shadow dictionary, Simon can use a language profile to help with
transcribing words.
A language profile consists of rules that describe how words are pronounced in the target
language. It can be likened to the way that humans can often pronounce a word they have
never heard just because they know some implicit "pronunciation rules" of the language.
Just as with humans, this process is not perfect but can provide a solid starting ground.
This automatic deduction of a phoneme transcription from a written word is called
"grapheme to phoneme conversion".
Simon requires the Sequitur G2P grapheme to phoneme converter to be installed and
set up for
language profiles to work.
If you have selected a pre-built language profile or built your own, Simon will
automatically
transcribe new words with it when they are not found in your shadow dictionary.
4.2.1.2 Grammar
The grammar defines which combinations of words are correct.
Let’s look at an example: You want to use Simon to launch programs and close those
windows
when you are done. You would like to use the following commands:
• ‘Computer, Internet’
To open a browser
• ‘Computer, Mail’
To open a mail client
• ‘Computer, close’
To close the current window
Following English grammar, your vocabulary would contain the following:
Word Category
Computer Noun
Internet Noun
Mail Noun
close Verb
Table 4.2: Sample Vocabulary
To allow the sentences defined above Simon would need the following grammar:
• ‘Noun Noun’ for sentences like ‘Computer Internet’
• ‘Noun Verb’ for sentences like ‘Computer close’
While this would work, it would also allow the combinations ‘Computer Computer’,
‘Internet
Computer’, ‘Internet Internet’, etc. which are obviously bogus. To improve the
recognition accu-
racy, we can try to create a grammar that better reflects what we are trying to do
with Simon.
It is important to remember that you define your own ‘language’ when using Simon. That
means that you are not bound to the grammar rules of whatever natural language you use
Simon with. For a simple command and control use case it is advisable to invent new
grammatical categories, so that commands are not separated by grammatical distinctions
that are irrelevant to the task.
In the example above it does not matter that ‘close’ is a verb or that ‘Computer’ and
‘Internet’ are nouns. Instead, why not define them as something that better reflects what
we want them to be:
Word Category
Computer Trigger
Internet Command
Mail Command
close Command
Table 4.3: Improved Sample Vocabulary
Now we change the grammar to the following:
• ‘Trigger Command’
This allows all the combinations described above. However, it also limits the
possibilities to
exactly those three sentences. Especially in larger models a well-thought-out
grammar and vo-
cabulary can mean a huge difference in recognition results.
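The effect of the two grammars can be checked by enumerating every sentence each one allows. The following Python sketch does this for the sample vocabularies above; the function name expand and the list-of-pairs representation are assumptions made for this illustration and do not reflect how Simon stores its model.

```python
from itertools import product


def expand(grammar, vocabulary):
    """Return every word sequence the grammar allows, in a stable order."""
    by_category = {}
    for word, category in vocabulary:
        by_category.setdefault(category, []).append(word)
    sentences = []
    for structure in grammar:
        # One list of candidate words per category slot of the structure.
        slots = [by_category.get(category, []) for category in structure.split()]
        sentences.extend(" ".join(words) for words in product(*slots))
    return sentences


english = [("Computer", "Noun"), ("Internet", "Noun"), ("Mail", "Noun"), ("close", "Verb")]
custom = [("Computer", "Trigger"), ("Internet", "Command"),
          ("Mail", "Command"), ("close", "Command")]

loose = expand(["Noun Noun", "Noun Verb"], english)
tight = expand(["Trigger Command"], custom)
```

With the English categories the grammar admits twelve sentences, including bogus ones such as ‘Internet Internet’; with the Trigger / Command categories only the three intended sentences remain.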
4.2.2 Acoustic Model
The acoustic model represents your pronunciation in a machine readable format.
Let’s look at the following sample vocabulary:
Word      Category  Pronunciation
Computer  Noun      k ax m p y uw t er
Internet  Noun      ih n t er n eh t
Mail      Noun      m ey l
close     Verb      k l ow s
Table 4.4: Sample Vocabulary
The pronunciation of each word is composed of individual sounds which are separated
by spaces.
For example, the word ‘close’ consists of the following sounds:
• k
• l
• ow
• s
The acoustic model uses the fact that spoken words are composed of sounds much like
written
words are composed of letters. Using this knowledge, we can segment words into
sounds (repre-
sented by the pronunciation) and assemble them back when recognizing. These
building blocks
are called ‘phonemes’.
Because the acoustic model actually represents how you speak the phonemes of the
words, train-
ing material is shared among all words that use the same phonemes.
That means if you add the word ‘clothes’ to the language model, your acoustic model
already
has an idea how the ‘clo’ part is going to sound as they share the same phonemes
(‘k’, ‘l’, ‘ow’) at
the beginning.
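The sharing of training material between words can be pictured with a small Python sketch that lists the phonemes of a new word which no trained word covers yet. The transcription used for ‘clothes’ is an assumption for this illustration (only its first three phonemes are given above), and the function name is hypothetical.

```python
def untrained_phonemes(pronunciation, trained_pronunciations):
    """Return the phonemes of a new word that no trained word contains."""
    known = set()
    for trained in trained_pronunciations:
        known.update(trained.split())
    return [phoneme for phoneme in pronunciation.split() if phoneme not in known]


trained = ["k l ow s"]      # 'close', from the sample vocabulary
clothes = "k l ow dh z"     # assumed transcription, for illustration only
missing = untrained_phonemes(clothes, trained)
```

Here only the last two phonemes of the assumed transcription would be new to the acoustic model; ‘k’, ‘l’ and ‘ow’ are already covered by ‘close’.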
To train the acoustic model (in other words, to tell it how you pronounce the phonemes) you
have to ‘train’ words from your language model. That means that Simon displays a word which
you read out loud. Because the word is listed in your vocabulary, Simon already knows what
phonemes it contains and can thus ‘learn’ from your pronunciation of the word.
4.3 Scenarios
This section extends the previous one about basic scenario management and tells you
how to
create, edit and export scenarios.
4.3.1 Scenario hierarchies
You can create scenario hierarchies by dragging and dropping active scenarios on
top of each
other.
Scenario hierarchies serve two purposes:
• The context system respects scenario hierarchies: If the parent scenario gets
deactivated, all
child scenarios will become deactivated as well.
• If you attempt to export a scenario that has children, Simon will allow you to export them
in a joint scenario package. This way, you can share multiple logically co-dependent
scenarios (e.g. one "Office" scenario that contains sub-scenarios for "Word", "Excel", etc.).
4.3.2 Adding a new Scenario
To add a new scenario, select the Add button. A new dialog will be displayed.
When creating a new scenario, please give it a descriptive name. If you plan to upload the
scenario to the KDE Store later, we kindly ask you to follow a naming scheme, although this
is of course not a requirement: ‘[<language>/<base model>] <name>’. If, for example, you
create a scenario in English that works with the Voxforge base model and controls Mozilla
Firefox, this becomes: ‘[EN/VF] Firefox’. If your scenario is not specifically tailored to
one phoneme set (base model), just omit the second tag like this: ‘[EN] Firefox’.
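The naming scheme can be expressed as a small helper. This Python sketch is only an illustration of the convention; the function scenario_name and its parameters are hypothetical and not provided by Simon.

```python
def scenario_name(name, language, base_model=None):
    """Build a name following the '[<language>/<base model>] <name>' scheme."""
    tags = language if base_model is None else f"{language}/{base_model}"
    return f"[{tags}] {name}"


tailored = scenario_name("Firefox", "EN", "VF")
generic = scenario_name("Firefox", "EN")
```

The first call yields ‘[EN/VF] Firefox’, the second ‘[EN] Firefox’.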
The scenario version is just an incremental version number that makes it easier to
distinguish
between different revisions of a scenario.
If your scenario needs a specific feature of Simon (for example because you use a
new plugin),
you can define minimum and maximum version numbers of Simon here.
The license of your scenario can be set through the drop down. You can of course
also add an
arbitrary license text directly in the input field.
You can then add your name (or alias) to the list of scenario authors. There you
will also be
asked for contact information. This field is purely provided as a convenient way to
contact a
scenario author for changes, problems, fanmail etc. If you don’t feel comfortable
providing your
email address you can simply enter a dash ‘-’ denoting that you are not willing to
divulge this
information.
4.3.3 Edit Scenario
To edit scenarios, just select Edit from the Manage scenarios dialog.
The dialog works exactly the same as the add scenario dialog.
4.3.4 Export Scenario
Scenarios can be exported to a local file in Simon’s XML scenario file format and can also
be uploaded directly to the Simon Scenarios subsection of the OpenDesktop site KDE Store.
To upload to OpenDesktop sites, you need an account on the site. Registration is
very easy and
of course free of charge.
Simon allows you to upload new content directly from within Simon (Export >
Publish).
To use this functionality, simply enter your account credentials in the social
desktop configuration
in the Simon configuration.
4.4 Vocabulary
The vocabulary module defines the set of words of the scenario.
By default, the active vocabulary is shown. To display the shadow vocabulary, select the
tab Shadow Vocabulary.
Every word states its ‘recognition rate’, which at the moment is just a counter of how
often the word has been recorded (alone or together with other words).
4.4.1 Adding Words
To add new words to the active vocabulary, use the add word wizard.
Adding words to Simon is basically a two step procedure:
• Defining the word
• Initial training
4.4.1.1 Defining the Word
Firstly, the user is asked which word he wants to add.
When the user proceeds to the next page, Simon automatically tries to find as much
information
about the word in the shadow dictionary as possible.
If the word is listed in the shadow dictionary, Simon automatically fills out all
the needed fields
(Category and Pronunciation).
All suggestions from the shadow dictionary are listed in the table Similar words. By
default only exact word matches are shown. However, this can be changed by checking the
Include similar words check box below the suggestion table. Using similar words you can
quickly deduce the correct pronunciation of the word you are actually trying to add. See
below for details.
Of course this really depends on your shadow dictionary. If the shadow dictionary
does not
contain the word you are trying to add, the required fields have to be filled out
manually.
Some dictionaries that can be imported with Simon (SPHINX, HTK) do not
differentiate between
upper and lower case. Suggestions based on those dictionaries will always be
uppercase. You
are of course free to change these suggestions to the correct case.
Some dictionaries that can be imported with Simon (SPHINX, PLS and HTK) provide no
gram-
matical information at all. These will assign all the words to the category
Unknown. You should
change this to something appropriate when adding those words.
4.4.1.1.1 Manually Selecting a Category
The category of the word is defined as the grammatical category the word belongs
to. This
might be Noun, Verb or completely new categories like Command. For more information
see
the grammar section.
The list contains all categories used in both your active and your shadow lexicon
and in your
grammar.
You can add new categories to the drop-down menu by using the green plus sign next
to it.
4.4.1.1.2 Manually Providing the Phonetic Transcription
The pronunciation is a bit trickier. Simon does not need a certain type of
phonetics so you are
free to use any method as long as it uses only ASCII characters and no numbers.
However, if you
want to use a shadow dictionary and want to use it to its full potential you should
use the same
phonetics as the shadow dictionary.
If you do not know how to transcribe a word yourself, you can use your shadow dictionary
to help you with the transcription - even if the word is not listed in it. Let’s say we
want to add the word ‘Firefox’ (to launch Firefox), which is of course not listed in our
shadow dictionary.
(This example uses the English Voxforge HTK lexicon, available from Voxforge, as the shadow
dictionary.)
Because ‘Firefox’ is not listed in the shadow dictionary, we do not get any suggestions at
all.
However, we know that firefox sounds like ‘fire’ and ‘fox’ put together. So let’s
just open the vo-
cabulary (you can keep the wizard open) by selecting Vocabulary from your Simon
main toolbar.
Switch to the shadow vocabulary by clicking on the tab Shadow Vocabulary.
Use the Filter box above the list to search for ‘Fire’:
We can see that the word ‘Fire’ is transcribed as ‘f ay r’. Now filter for ‘fox’ instead of
‘Fire’ and we can see that ‘Fox’ is transcribed as ‘f ao k s’. We can assume that ‘Firefox’
should be transcribed as ‘f ay r f ao k s’.
Using this approach of deducing the pronunciation from parts of the word has the
distinct ad-
vantage that we not only get a high quality transcription but also automatically
use the same
phoneme set as the other words which were correctly pulled out of the shadow
dictionary.
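The manual lookup described above amounts to joining the transcriptions of the known parts. A minimal Python sketch, assuming the shadow dictionary is available as a plain mapping with uppercase keys (as is typical for HTK-based suggestions); the function name and data layout are hypothetical:

```python
def compose_pronunciation(parts, shadow_dictionary):
    """Join the transcriptions of known parts; return None if a part is missing."""
    phonemes = []
    for part in parts:
        transcription = shadow_dictionary.get(part.upper())
        if transcription is None:
            return None
        phonemes.extend(transcription.split())
    return " ".join(phonemes)


# The two entries looked up in the example above.
shadow = {"FIRE": "f ay r", "FOX": "f ao k s"}
firefox = compose_pronunciation(["fire", "fox"], shadow)
```

The result should still be checked by ear: joining parts gives a good starting point, but compound words are not always pronounced exactly like their components.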
We can now enter the pronunciation and change the category to something
appropriate.
4.4.1.2 Training the Word
To complete the wizard we can now train the word twice. If you don’t want to do this, or if
you use a static base model, for example, you can skip these two pages.
Because you are about to record some training samples, Simon will display the volume
calibration to make sure that your microphone is set up correctly. For more information
please refer to the volume calibration section.
Simon will try to prompt you for real-world examples. To do that, Simon will
automatically fetch
grammar structures using the category of the word and substitute the generic
categories with
example words from your active lexicon.
For example: You have the grammar structure ‘Trigger Command’ and have the word
‘Com-
puter’ of the category ‘Trigger’ in your vocabulary. You then add a new word
‘Firefox’ of the
category ‘Command’. Simon will now automatically prompt you for ‘Computer Firefox’
as it is -
according to your grammar - a valid sentence.
If Simon is unable to find appropriate sentences using the word (for example because there
is no grammar or there are not enough words in your active lexicon), it will just prompt
you for the word alone.
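The prompt generation described above can be sketched as follows. This is a simplified Python illustration under stated assumptions, not Simon's actual algorithm: the function name is hypothetical, the active lexicon is assumed to be a mapping from category to example words, and the first available word is used for every other category.

```python
def training_prompts(new_word, new_category, grammar, active_lexicon):
    """Build example sentences containing new_word; fall back to the word alone."""
    prompts = []
    for structure in grammar:
        categories = structure.split()
        if new_category not in categories:
            continue
        words = []
        used_new_word = False
        for category in categories:
            if category == new_category and not used_new_word:
                words.append(new_word)
                used_new_word = True
            elif active_lexicon.get(category):
                words.append(active_lexicon[category][0])
            else:
                # No example word for this category: the structure is unusable.
                words = None
                break
        if words:
            prompts.append(" ".join(words))
    return prompts or [new_word]


with_grammar = training_prompts(
    "Firefox", "Command", ["Trigger Command"], {"Trigger": ["Computer"]})
without_grammar = training_prompts("Firefox", "Command", [], {})
```

With the ‘Trigger Command’ structure the sketch prompts for ‘Computer Firefox’; without a usable structure it prompts for ‘Firefox’ alone.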
Although Simon ensures that the automatically generated examples are valid, you can
always
override its suggestions. Just switch to the Examples tab on the Define Word page.
You are free to change those examples to anything you like. You can even go so far as to
use words that are not yet in your active lexicon, as long as you add them before you
synchronize the model, although this is not recommended.
All that is left is to record the examples.
Make sure you follow the guidelines listed in the recording section.
4.4.2 Editing a word
To edit a word, simply select it from the vocabulary, and click on Edit.
There you can change name, category and pronunciation of the selected word.
4.4.3 Removing a word
To remove a word from your language model, select it in the vocabulary view and
click on Re-
move.
The dialog offers four choices:
• Move the word to the Unused category.
Because you (hopefully) don’t use the category Unused in your grammar, the word
will no
longer be considered for recognition. In fact, it will be removed from the active
vocabulary
before compiling the model because no grammar sentence references it.
If you want to use the category Unused in your grammar, you can of course use a
different
category for unused words. Just set the category through the Edit word dialog.
To use the word again, just set the right category again. No data will be lost.
• Move the word to the shadow lexicon
This will remove the selected word from the active lexicon (and thus from the
recognition) but
will keep a copy in the shadow vocabulary. All the recordings containing the word
will be
preserved.
To use the word again, add it again to the active vocabulary. When adding a ‘new’
word with
the same name the values of the moved word will be suggested to you. Therefore, no
data will
be lost.
• Delete the word but keep the samples
Removes the word completely but keeps the associated samples. Whenever you add
another
word with the same word name the samples will be re-associated.
Be careful with this option as the new word you add again might be transcribed
differently
and this difference cannot be taken into account automatically (Simon will then try
to force the
new transcription on the old recordings during the model compilation).
Do not use this option if the samples you recorded for this word were erroneous.
• Remove the word completely
Just remove the word. All the recordings containing the word will be removed too.
This option leaves no trace of either the word itself or the associated samples.
Because samples are global (not assigned to scenarios), even samples recorded from
training
sessions of other scenarios might be removed as well if they contain the word. Use
this option
carefully.
4.4.4 Special Training
Please see the special training section in the training section.
4.4.5 Importing a Dictionary
Simon provides the functionality to import large dictionaries as a reference. This
reference dic-
tionary is called shadow dictionary.
When adding a new word to the model, the user has to define the following characteristics:
• Wordname
• Category
• Phonetic definition
These characteristics are taken out of the shadow dictionary if it contains the
word in question.
A large, high quality shadow dictionary can thus help the user to easily add new
words to the
model without keeping track of the phoneme set or - in many cases - even let him
forget that the
phonetic transcription is needed at all.
Since version 0.3 you can also import dictionaries directly to the active
dictionary. This option is
mostly there to make it easier to move to Simon from custom solutions and to
encourage import-
ing of older models (for example one used with Simon 0.2). You will almost never
want to import
a very large dictionary as active dictionary.
You can find a list of available dictionaries that work with Simon on the Simon
wiki.
Simon is able to import five different types of dictionaries:
• HADIFIX
• HTK
• PLS
• SPHINX
• Julius
4.4.5.1 HADIFIX Dictionary
Simon can import HADIFIX dictionaries.
One example of a HADIFIX dictionary is the German HADIFIX BOMP.
HADIFIX dictionaries provide both categories and pronunciation.
Due to a special exemption in its license, the Simon listens team is proud to offer the
excellent HADIFIX BOMP for download directly from within Simon.
Using the automatic BOMP import you can, after providing your name and email address for
the team at the University of Bonn, directly download and import the dictionary from the
Simon listens server.
4.4.5.2 HTK Dictionary
Simon can import HTK lexica.
One example of an HTK lexicon is the English Voxforge dictionary.
HTK dictionaries provide pronunciation information but no categories. All words will be
assigned to the category Unknown.
4.4.5.3 PLS Dictionary
Simon can import PLS dictionaries.
One example of a PLS dictionary is the German GPL dictionary from Voxforge.
PLS dictionaries provide pronunciation information but no categories. All words
will be assigned
to the category Unknown.
4.4.5.4 SPHINX Dictionary
Simon can import SPHINX dictionaries.
One example of a SPHINX dictionary is this dictionary for Mexican Spanish.
SPHINX dictionaries provide pronunciation information but no categories. All words
will be
assigned to the category Unknown.
4.4.5.5 Julius Dictionary
Simon can import Julius vocabularies.
One example of Julius vocabularies is the set of word lists of Simon 0.2.
Julius dictionaries provide pronunciation information as well as category
information.
4.4.6 Create language profile
Here, you can build a language profile from your shadow dictionary.
After selecting Create profile, Simon will analyze your current shadow dictionary
and try to
deduce the transcription rules from it.
This is generally a very lengthy process and can, depending on the size of your shadow
dictionary, take up to several hours.
The created profile will be selected automatically after the process completes.
4.5 Grammar
Simon provides an easy to use text based interface to change the grammar. You can simply
list all the allowed sentences (without any punctuation marks, obviously) as described
above.
When you select a sentence on the left, the right pane will automatically show possible
real sentences built from the words of your vocabulary.
The example section will list at most 35 examples, so if more sentences than that match
the selected grammar entry, the list might not be complete.
4.5.1 Import a Grammar
Besides letting you enter your desired grammar sentence by sentence, Simon is able to
automatically deduce allowed grammar structures by reading plain text using the Import
Grammar wizard.
Simon can read and import text files but also provides an input field if you want
to simply type
the text into Simon.
Say we have a vocabulary like in the general section above:
Word Category
Computer Trigger
Internet Command
Mail Command
close Command
Table 4.5: Improved Sample Vocabulary
We want Simon to recognize the sentence ‘Computer Internet!’. So we either enter
the text using
the Import text option or create a simple text file with this content ‘Computer
Internet!’ (any
punctuation mark would work) and save it as simongrammar.txt to use the Import
files option.
Simon will then read the entered text or all the given text files (in this case the only
given text file is simongrammar.txt) and look up every single word in both the active and
the shadow dictionary (the definition in the active dictionary takes precedence if the
word is available in both). It will then replace the word with its category.
In our example this would mean that Simon would find the sentence ‘Computer Internet’. It
would determine that ‘Computer’ is of the category Trigger and ‘Internet’ of the category
Command. Because of this Simon would ‘learn’ that ‘Trigger Command’ is a valid sentence and
add it to its grammar.
The import automatically segments the input text by punctuation marks (‘.’, ‘-’, ‘!’, etc.)
so any natural text should work. The importer will automatically merge duplicate sentence
structures (even across different files) and add multiple sentences (all possible
combinations) when a word has multiple categories assigned to it.
The import will ignore sentences where one or more words could not be found in the
language
model unless you tick the Also import unknown sentences check box in which case
those words
are replaced with Unknown.
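The import steps described in this section (segmenting, category lookup, merging duplicates, expanding multiple categories, optional Unknown handling) can be sketched in Python as follows. This is a minimal illustration, not Simon's implementation: the function name, the dictionary layout (word mapped to a list of categories), the exact set of separators and the case-sensitive lookup are all assumptions.

```python
import re
from itertools import product


def import_grammar(text, active, shadow, import_unknown=False):
    """Turn plain text into a sorted list of category structures."""
    structures = set()  # a set merges duplicate structures automatically
    for sentence in re.split(r"[.!?\-]+", text):
        words = sentence.split()
        if not words:
            continue
        options = []
        for word in words:
            # The active dictionary takes precedence over the shadow dictionary.
            categories = active.get(word) or shadow.get(word)
            if not categories:
                if not import_unknown:
                    options = None  # skip sentences with unknown words
                    break
                categories = ["Unknown"]
            options.append(categories)
        if options:
            # One structure per combination of the words' categories.
            structures.update(" ".join(combo) for combo in product(*options))
    return sorted(structures)


active = {"Computer": ["Trigger"], "Internet": ["Command"]}
shadow = {"Internet": ["Noun"], "Mail": ["Command", "Noun"]}

simple = import_grammar("Computer Internet! Computer Internet.", active, shadow)
ambiguous = import_grammar("Computer Mail.", active, shadow)
skipped = import_grammar("Computer Amarok.", active, shadow)
kept = import_grammar("Computer Amarok.", active, shadow, import_unknown=True)
```

In this sketch the repeated sentence yields a single ‘Trigger Command’ structure, ‘Mail’ with its two categories yields two structures, and the sentence with the unlisted word is only kept when import_unknown is set.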
4.5.2 Renaming Categories
The rename category wizard allows you to rename categories in your active vocabulary,
your shadow dictionary and the grammar.
4.5.3 Merging Categories
The merge category wizard allows you to merge two categories into one new category in
your active vocabulary, your shadow dictionary and the grammar.
This functionality is especially useful if you want to simplify your grammar
structures.
4.6 Training
Using the Training module, you can improve your acoustic model.
The interface lists all installed training texts in a table with three columns:
• Name
A descriptive name for the text.
• Pages
The number of ‘pages’ the text consists of. Each page represents one recording.
• Recognition Rate
Analogous to the vocabulary; represents how likely Simon is to recognize the words (higher
is better). The recognition rate of the training text is the average recognition rate of
all the words in the text.
To improve the acoustic model - and thus the recognition rate - you have to record training
texts. This gives Simon the two things it needs:
• Samples of your speech
• Transcriptions of those samples
The active dictionary is used to transcribe the words that make up the text (mapping each
word to its phonetic transcription). Every word contained in the training text you want
to read (train) therefore has to be contained in your active dictionary. Simon will warn
you if this is not the case and give you the possibility to add all the missing words in
one go.
The procedure is the same as for adding a single word, but the wizard will prompt you for
details and recordings for all the missing words automatically. This procedure can be
aborted at any time; Simon will then offer both a way to add the words that are already
completely defined and a way to undo all changes made so far. Once you have added all the
missing words, the changes to the active dictionary / vocabulary are saved and the
training of the previously selected text starts automatically.
The training (reading) of the training text works exactly the same as the initial
training when
adding a new word.
Make sure you follow the guidelines listed in the recording section.
4.6.1 Storage Directories
Training texts are stored in two different locations:
• Linux: ~/.kde/share/apps/simon/texts
Windows: %appdata%\.kde\share\apps\simon\texts
The texts of the current user. Can be deleted and added with Simon (see below).
• Linux: `kde4-config --prefix`/share/apps/simon/texts
Windows: (install folder)\share\apps\simon\texts
System-wide texts. They will appear on every user account using Simon on this
machine
and cannot be deleted from within Simon because of the obvious permission
restrictions on
system-wide files.
This folder can be used by system administrators to provide a common set of
training texts for
all the users on one system.
The XML files (one for each text) can just be moved from one location to the other
but this will
most likely require admin privileges.
4.6.2 Adding Texts
The add texts wizard provides a simple way to add new training texts to Simon.
When importing text files, Simon will automatically try to recognize individual
sentences and
split the text into appropriate ‘pages’ (recordings). The algorithm treats text
between ‘normal’
punctuation (‘.’, ‘!’, ‘?’, ‘...’, ‘"’, ...) and line breaks as ‘sentences’. Each
‘sentence’ will be on its
own page.
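The splitting rule can be approximated with a regular expression. This Python sketch is an assumption about the behaviour described above rather than Simon's actual algorithm; the function name and the exact separator set are chosen for this example.

```python
import re


def split_into_pages(text):
    """Split text into one 'page' per sentence; punctuation and line breaks end a sentence."""
    parts = re.split(r'[.!?"\n]+', text)
    return [part.strip() for part in parts if part.strip()]


pages = split_into_pages("Computer Internet. Computer Mail!\nComputer close")
```

Runs of separators (such as ‘...’ or a ‘!’ followed by a line break) count as a single boundary here, so no empty pages are produced.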
Simon supports two different sources for new training texts.
4.6.2.1 Add training texts
Simply enter the training text in an input field.
4.6.2.2 Local text files
Simon can import normal text files to use them as training texts.
4.6.3 On-The-Fly Training
In addition to training texts, Simon also allows you to train individual words or word
combinations from your dictionary on-the-fly.
This feature is located in the vocabulary menu of Simon.
Select the words to train from the vocabulary on the left and simply drag them to
the selection
list to the right (you could also select them in the table on the left and add them
by clicking Add
to Training).
Start the training by selecting Train selected words. The training itself is
exactly the same as if it
were a pre-composed training text.
If there are more than 9 words to train Simon will automatically split the text
evenly across mul-
tiple pages.
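One way to split a word list evenly is shown in the following Python sketch. The limit of 9 words per page comes from the text above; the exact distribution strategy (as few pages as possible, page sizes differing by at most one word) is an assumption, and the function name is hypothetical.

```python
import math


def split_evenly(words, max_per_page=9):
    """Distribute words over as few pages as possible with nearly equal page sizes."""
    if not words:
        return []
    page_count = math.ceil(len(words) / max_per_page)
    base, extra = divmod(len(words), page_count)
    pages, start = [], 0
    for index in range(page_count):
        # The first 'extra' pages take one additional word.
        size = base + (1 if index < extra else 0)
        pages.append(words[start:start + size])
        start += size
    return pages


ten_words = [f"word{i}" for i in range(10)]
page_sizes = [len(page) for page in split_evenly(ten_words)]
```

Ten words end up on two pages of five words each instead of one page of nine and one page with a single word.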
Of course you are free to add words from the shadow lexicon to the list of words to train,
but Simon will prompt you to add the words before the training starts, just as it would
if you trained a text that contains unknown words (see above).
4.7 Context
Simon includes a context layer that allows you to let Simon automatically adjust
its configuration
depending on its context.
For example, you could set up Simon to only allow commands like "New tab" if Mozilla
Firefox is running and is the currently active window.
There are three major areas that contextual information can influence:
• Scenario selection
• Sample groups
• Active microphones
4.7.1 Scenario selection
Scenarios can be configured to be active only in certain contextual situations. If the
conditions for these situations are not met, Simon will temporarily deactivate the
affected scenario.
The local context conditions of this scenario are shown in the list of Activation
Requirements
and can be added, edited and deleted through the respective buttons.
The context conditions respect a possible hierarchy of scenarios: The activation
requirements of all direct or indirect parent scenarios also apply to the child
scenario(s). This condition "inheritance" is shown on the right side.
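The inheritance of activation requirements can be sketched in Python as a walk up the scenario hierarchy. The scenario names, the context layout and the function is_active are hypothetical; they only illustrate the rule and do not mirror Simon's internal data structures.

```python
def is_active(scenario, parents, requirements, context):
    """A scenario is active only if it and all of its ancestors meet their requirements."""
    current = scenario
    while current is not None:
        checks = requirements.get(current, [])
        if not all(check(context) for check in checks):
            return False
        current = parents.get(current)  # None once the top-level scenario is reached
    return True


# Hypothetical hierarchy: 'Word' and 'Excel' are children of 'Office'.
parents = {"Word": "Office", "Excel": "Office"}
requirements = {
    "Office": [lambda ctx: "office" in ctx["processes"]],
    "Word": [lambda ctx: ctx["active_window"] == "Word"],
}

office_open = {"processes": {"office"}, "active_window": "Word"}
office_closed = {"processes": set(), "active_window": "Word"}
```

With office_closed, both child scenarios are deactivated even though ‘Excel’ has no requirements of its own, because the requirement of the parent is not met.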
The Simon main window also shows a list of currently used scenarios. Scenarios that
are deac-
tivated because of their activation requirements (context conditions) are listed in
light gray and
italic. The screenshot below, for example, shows a temporarily deactivated Amarok
scenario.
The same visual hints (gray, italic font for unmet activation criteria) also apply
to the individual
context conditions in the context menu.
4.7.2 Sample groups
Every sample recorded with Simon is assigned a sample group. Sample groups can be
configured
to only be used for the building of the acoustic models if certain contextual
conditions are met.
If this is not the case, all samples tagged with the deactivated sample group will
be temporarily
removed from the training corpus.
For more information, an example use-case and instructions on how to work with
sample groups,
please refer to the section on sample groups.
4.7.3 Context conditions
In Simon, context is monitored through a set of context condition plugins.
In general, context conditions are combined through an "and" association. For example, if
the activation of a resource is bound by two conditions A and B, it will only be activated
if both A and B see their conditions met. To instead model alternatives ("A or B or
both"), use an Or Condition Association.
All conditions can optionally be inverted. Inverting a condition means that it will
evaluate to
true if it would otherwise evaluate to false and vice versa.
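The three ways of combining conditions described above ("and", "or" and inversion) can be modelled with small Python functions. The condition functions and the context dictionary are hypothetical stand-ins for Simon's condition plugins and are used here for illustration only.

```python
def firefox_running(context):
    return "firefox" in context["processes"]


def firefox_focused(context):
    return context["active_window"] == "Mozilla Firefox"


def inverted(condition):
    """Wrap a condition so that it reports the opposite result."""
    return lambda context: not condition(context)


def all_of(*conditions):
    """Default combination: every condition has to be met ("and")."""
    return lambda context: all(c(context) for c in conditions)


def any_of(*conditions):
    """Or condition association: one met child condition is enough."""
    return lambda context: any(c(context) for c in conditions)


browsing = {"processes": {"firefox"}, "active_window": "Mozilla Firefox"}
background = {"processes": {"firefox"}, "active_window": "Konsole"}

new_tab_allowed = all_of(firefox_running, firefox_focused)
firefox_somewhere = any_of(firefox_running, firefox_focused)
not_focused = inverted(firefox_focused)
```

Because any_of returns an ordinary condition, it can itself be passed to all_of or to another any_of, which corresponds to nesting or condition associations.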
4.7.3.1 Active window
True, if the title of the currently active foreground window matches the provided
window title.
4.7.3.2 D-Bus
The D-Bus condition plugin allows you to monitor third-party applications that export
state information on D-Bus.
The monitored application needs to provide two things: a signal that notifies of changes
and a method that returns the current state.
The screenshot above, for example, configures a D-Bus condition that will evaluate to
true while the music player "Tomahawk" is playing and to false otherwise.
4.7.3.3 Face detection
The face detection condition will evaluate to true, if Simon’s vision layer has
identified a person
sitting in front of the configured webcam.
4.7.3.4 File content
This condition plugin will return true, if the given file contains the provided
content.
The file will be monitored for changes.
4.7.3.5 Lip detection
The lip detection condition will evaluate to true if Simon’s vision layer has identified
a person who is sitting in front of the configured webcam and is speaking (lip movements).
The lip detection training will try to determine the optimal sensitivity of the detection
by monitoring your lip movements. For better accuracy of the lip detection condition, stop
the training once the sensitivity value shown on the slider becomes almost constant.
4.7.3.6 Or condition association
The or condition association allows you to configure a meta-condition that reports
to be satisfied
as soon as one or more of its child conditions evaluates to true.
Or condition associations can have an arbitrary number of child conditions that may
even also
be or condition associations.
4.7.3.7 Process opened
Is satisfied if there is a running process with the provided executable name.
4.8 Commands
When Simon is active and recognizes something, the recognition result is given to
the loaded
command plug-ins (in order) for processing.
The command system can be compared with a group of factory workers. Each one of
them knows
how to perform one task (e.g. ‘Karl’ knows how to start a program and ‘Joe’ knows
how to open
a folder, etc.). Whenever Simon recognizes something it is given to ‘Karl’ who then
checks if this
instruction is meant for him. If he doesn’t know what to do with it, it is handed
over to ‘Joe’ and
so on. If none of the loaded plugins know how to process the input it is ignored.
The order in
which the recognition result is given to the individual commands (people) is
configurable in the
command options (Commands > Manage plugins).
Each plugin can be associated with a ‘trigger’. Using triggers, the responsibility
of each plugin can easily be divided.
In terms of the factory worker analogy above, this is like stating the name of the
worker who should process your request. So instead of ‘Open my home folder’ you say
‘Joe, open
my home folder’ and ‘Joe’ (the plugin responsible for opening folders) will
instantly know that
the request is meant for him.
In practice you could have commands like the executable command ‘Firefox’ to open
the popular
browser and the place command ‘Google’ to open the web search engine. If you assign
the trigger
‘Start’ to the executable plugin and the trigger ‘Open’ to the place command you
would have to
say ‘Start Firefox’ (instead of just ‘Firefox’ if you don’t use a trigger for the
executable plugin)
and ‘Open Google’ to open the search engine (instead of just ‘Google’).
Triggers are of course no requirement and you can easily use Simon without defining
any plugin
triggers (although many plugins come with a default trigger of ‘Computer’ set which
you would
have to remove). But even if you use just one trigger for all your commands (like
‘Computer’, so that you say ‘Computer, Firefox’ and ‘Computer, Google’) it has the
advantage of greatly limiting the number of false positives.
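The hand-over between plugins and the effect of triggers can be sketched as follows.
This is an illustration only, not Simon's code: the Plugin class, the dispatch
function and the returned strings are hypothetical, and real plugins do far more
than look up a name in a dictionary.

```python
from typing import Callable, Dict, List, Optional


class Plugin:
    """Hypothetical stand-in for a command plugin with an optional trigger."""

    def __init__(self, trigger: str, commands: Dict[str, Callable[[], str]]):
        self.trigger_words = trigger.split()   # empty list means "no trigger"
        self.commands = commands               # command name -> action

    def process(self, result: str) -> Optional[str]:
        """Run the matching command and return its output, or None if not handled."""
        words = result.split()
        n = len(self.trigger_words)
        if words[:n] != self.trigger_words:
            return None                        # addressed to another plugin
        action = self.commands.get(" ".join(words[n:]))
        return action() if action is not None else None


def dispatch(plugins: List[Plugin], result: str) -> Optional[str]:
    """Hand the result to each plugin in order until one of them processes it."""
    for plugin in plugins:
        outcome = plugin.process(result)
        if outcome is not None:
            return outcome
    return None                                # no plugin knew what to do: ignored


plugins = [
    Plugin("Start", {"Firefox": lambda: "launching Firefox"}),
    Plugin("Open", {"Google": lambda: "opening Google"}),
]
```

With these two plugins, "Start Firefox" is handled by the first plugin and
"Open Google" by the second, while a bare "Firefox" is ignored because neither
trigger matches.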
Simon’s command dialog displays the complete phrase associated with a command in
the upper
right corner of the command configuration.
You can load multiple instances of one plugin even in one scenario. Each instance
can of course
also have a different plugin trigger.
Each Command has a name (which will trigger its invocation), an icon and more
fields depending
on the type of the plugin (see below).
Some command plugins might provide a configuration of the plugin itself (not the
commands it
contains). These configuration pages will be plugged directly into the action
configuration dialog
(below the General menu item) when you load the associated plugin.
Plugins that provide a graphical user interface (for example the input number
command plugin) can be customized by configuring Voice commands. You can, for
example, change the associated word that will trigger the button, but also change
the displayed icon, etc. If you remove all voice interface commands from a graphical
element, the element will be hidden automatically.
Voice interface commands are added just like normal commands through the command
configuration.
To add a new interface command to a function, just select the action you want to
associate with a
command, click Create from Action template and adapt the resulting command to your
needs.
Some plugins (for example the desktop grid or the calculator) might also provide a
menu item in
the Actions menu.
Scenarios can optionally define one command that will immediately be run when the
scenario
is initialized. If you require more than one command to run automatically, consider
the use of a
composite command.
Command triggers can contain placeholders in the form of "%<index>", referring to
any one word, or "%%<index>", describing one or more left out words. For example,
the recognition result "Next window" will be matched by the triggers "Next %1",
"Next %%1" and "%%1" but not by the triggers "%1", "Next window %1" or
"%%1 Next window".
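The matching rules for placeholders can be reproduced with a short sketch. It
translates a trigger into a regular expression and is only one possible way to
implement the behavior described above, not necessarily how Simon does it; the
function names trigger_to_regex and matches are made up for this example.

```python
import re


def trigger_to_regex(trigger: str) -> "re.Pattern[str]":
    """Translate a trigger with %N / %%N placeholders into a regular expression."""
    parts = []
    for token in trigger.split():
        if re.fullmatch(r"%%\d+", token):
            parts.append(r"\S+(?: \S+)*")      # %%N: one or more words
        elif re.fullmatch(r"%\d+", token):
            parts.append(r"\S+")               # %N: exactly one word
        else:
            parts.append(re.escape(token))     # literal word
    return re.compile(" ".join(parts))


def matches(trigger: str, result: str) -> bool:
    """Return True if the whole recognition result is covered by the trigger."""
    normalized = " ".join(result.split())      # collapse repeated whitespace
    return trigger_to_regex(trigger).fullmatch(normalized) is not None
```

Applied to the example above, matches("Next %1", "Next window") is true, while
matches("%1", "Next window") is false because "%1" stands for a single word.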
4.8.1 Executable Commands
Executable commands are associated with an executable file (‘Program’) which is
started when
the command is invoked.
Arguments to the commands are supported. If either the path to the executable or the
parameters contain spaces, they must be wrapped in quotes.
Given the executable file C:\Program Files\Mozilla Firefox\firefox.exe and the local
HTML file C:\test file.html, the correct line for the Executable would be:
"C:\Program Files\Mozilla Firefox\firefox.exe" "C:\test file.html".
The working folder defines where the process should be launched from. Given the
working folder C:\folder, the command
"C:\Program Files\Mozilla Firefox\firefox.exe" file.html would cause Firefox to
search for the file C:\folder\file.html.
The working folder usually does not need to be set and can be left blank most of
the time.
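The quoting rule and the effect of the working folder can be illustrated with a
short sketch. The two helper functions are hypothetical and are not Simon's parser;
they only show how a quoted command line splits into program and arguments, and how
a relative file argument resolves against the working folder.

```python
import re
from pathlib import PureWindowsPath
from typing import List


def split_command_line(line: str) -> List[str]:
    """Split on whitespace while keeping double-quoted parts together."""
    return [quoted or bare for quoted, bare in re.findall(r'"([^"]*)"|(\S+)', line)]


def resolve_argument(working_folder: str, argument: str) -> str:
    """Resolve a (possibly relative) file argument against the working folder."""
    return str(PureWindowsPath(working_folder) / argument)


line = r'"C:\Program Files\Mozilla Firefox\firefox.exe" file.html'
program, *arguments = split_command_line(line)
target = resolve_argument(r"C:\folder", arguments[0])
```

Without the quotes, the program path would be split at its spaces, which is why
paths and parameters containing spaces need to be quoted.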
4.8.1.1 Importing Programs
For even easier configuration Simon provides an import dialog which allows you to
select programs directly from the KDE menu.
NOTE
This option is not available on Microsoft Windows.
The dialog will list all programs that have an entry in your KDE menu in their
respective category. Subcategories are not supported and are thus listed on the same
level as top-level categories.
Just select the program you wish to start with Simon and press Ok. The correct
values for the
executable and the working folder as well as an appropriate command name and
description will
automatically be filled out for you.
4.8.2 Place Commands
With place commands you can allow Simon to open any given URL. Because Simon just
hands the address over to the platform’s URL handler, special protocols like
‘remote:/’ (on Linux®/KDE) or even KDE’s ‘Web-Shortcuts’ are supported.
Instead of folders, files can also be set as the command’s URL, which will cause the
file to be opened with its associated application when the command is invoked.
To associate a specific URL with the command you can manually enter it in the URL
field (select
Manual first) or import it with the import place wizard.
4.8.2.1 Importing Places
The import place dialog allows you to easily create the correct URL for the
command.
To add a local folder, select Local Place and choose the folder or file with the
file selector.
To add a remote URL (HTTP, FTP, etc.) choose Remote URL.
Please note that for URLs with authentication information the password will be
stored in clear
text.
4.8.3 Shortcut Commands
Using shortcut commands the user can associate commands with key-combinations.
The command will simulate keyboard input to trigger shortcuts like Ctrl-C or Alt-F4.
The plugin can press, release or press and release the configured key combination.
To select the shortcut you wish to simulate just toggle the shortcut button and
press the key
combination on your keyboard.
Simon will capture the shortcut and associate it with the command.
Due to technical limitations there are several shortcuts on Microsoft Windows that
cannot be
captured by Simon (this includes e.g. Ctrl-Alt-Del and Alt-F4). These special
shortcuts can be
selected from a list below the aforementioned shortcut button.
NOTE
This selection box is not visible in the screenshot above as the list is only
displayed in the Microsoft
Windows version of Simon.
4.8.4 Text-Macro Commands
Using text-macro commands, the user can associate text with a command. When the
command
is invoked, the associated text will be ‘written’ by simulating keystrokes.
4.8.5 List Commands
The list command is designed to combine multiple commands (all types of commands
are supported) into one list. The user can then select the n-th entry by saying the
associated number (1-9).
This is very useful to limit the amount of training required and provides the
possibility to keep
the vocabulary to a minimum.
List commands are especially useful when using commands with difficult triggers or
commands
that can be grouped under a general theme. A typical example would be a command
‘Startmenu’
to present a list of programs to launch. That way the specific executable commands
can still retain
very descriptive names (like ‘OpenOffice.org Writer 3.1’) without the user having
to include these
words in his vocabulary and consider them in the grammar just to trigger them.
Commands of different types can of course be mixed.
4.8.5.1 List Command Display
When invoked, the command will display the list centered on the screen. The list
will automatically expand to accommodate its items.
The user can invoke the commands contained in the list by simply saying their
associated number
(In this example: ‘One’ to launch Mozilla Firefox).
While a list command is active (displayed), all input that is not directed at the
list itself (other
commands, etc.) will be rejected. The process can be canceled by pressing the
Cancel button or
by saying ‘Cancel’.
If there are more than 9 items Simon will add ‘Next’ and ‘Back’ options to the list
(‘Zero’ will be
associated with ‘Back’ and ‘Nine’ with ‘Next’).
4.8.5.2 Configuring list elements
By default the list command uses the following trigger words. To use list commands
to their full potential, make sure that your language and acoustic model contain and
allow for the following ‘sentences’:
• ‘Zero’
• ‘One’
• ‘Two’
• ‘Three’
• ‘Four’
• ‘Five’
• ‘Six’
• ‘Seven’
• ‘Eight’
• ‘Nine’
• ‘Cancel’
Of course you can also configure these words in your Simon configuration:
• Commands > Manage plugins > General > Lists for the scenario-wide list
configuration.
• Settings > Configure Simon... > Actions > Lists for the global configuration.
When creating a new scenario, the scenario configuration will be initialized with a
copy of this list configuration.
List commands are internally also used by other plugins like for example the
desktop grid. The
configuration of the triggers also affects their displayed lists.
4.8.6 Composite Commands
Composite commands allow the user to group multiple commands into a sequence.
When invoked the commands will be executed in order. Delays between commands can be
inserted.
Composite commands can also work as "transparent wrappers" by selecting Pass
recognition result through to other commands. In that case, the recognition result
will be treated as "unprocessed" even if the composite command was executed.
For example, suppose you have a command to turn on the light in one scenario.
In addition to turning on the light, you now want to add some kind of reporting of
the activity by invoking
a script through a program plugin. You could then set up a reporting scenario that
contains a
transparent composite command with the same trigger as the command to turn on the
light and
make sure that this scenario is set before the original one in the scenario list.
You can then activate
and deactivate the reporting simply by loading and unloading this scenario.
Using the composite command the user can compose complex ‘macros’. The screenshot
above, for example, does the following:
• Start Kopete (Executable Command)
• Wait 2000ms for Kopete to be started
• Type ‘Mathias’ (Text-Macro Command) which will select Mathias in my contact list
• Press Enter (Shortcut Command)
• Wait 1000ms for the chat window to appear
• Write ‘Hi!’ (Text-Macro Command); the text associated with this command contains a
newline at the end so that the message will be sent.
• Press Alt-F4 (Shortcut Command) to close the chat window
• Press Alt-F4 (Shortcut Command) to close the Kopete main window
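The idea of a composite command, a fixed sequence of actions with optional delays
in between, can be sketched as follows. The run_composite function and the step
representation (callables for actions, integers for delays in milliseconds) are
assumptions made for this illustration; the real plugin invokes other Simon
commands instead of appending to a list.

```python
import time
from typing import Callable, List, Union

Step = Union[Callable[[], None], int]          # an action, or a delay in ms


def run_composite(steps: List[Step],
                  wait_ms: Callable[[int], None] = lambda ms: time.sleep(ms / 1000)) -> None:
    """Execute the steps in order; integers are treated as delays in milliseconds."""
    for step in steps:
        if callable(step):
            step()
        else:
            wait_ms(step)


log: List[str] = []
steps: List[Step] = [
    lambda: log.append("start Kopete"),
    2000,
    lambda: log.append("type 'Mathias'"),
    lambda: log.append("press Enter"),
    1000,
    lambda: log.append("type 'Hi!' and newline"),
]

# Record the delays instead of sleeping so that the example finishes instantly.
run_composite(steps, wait_ms=lambda ms: log.append(f"wait {ms} ms"))
```

The delays matter because the later steps simulate keyboard input: without them the
text would be typed before the target window exists.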
4.8.7 Desktop grid
The desktop grid allows the user to control his mouse with his voice.
The desktop grid divides the screen into nine parts which are numbered from 1-9.
Saying one of these numbers will divide the selected field into nine fields, again
numbered from 1-9, and so on. This is repeated 3 times. After the fourth selection
the desktop grid will be closed and Simon will click in the middle of the selected
area.
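The repeated subdivision can be expressed as a small calculation. The sketch below
assumes that the fields are numbered row by row starting at the top left, which the
handbook does not state; the function name grid_click_point is likewise made up for
this example.

```python
from typing import Iterable, Tuple


def grid_click_point(width: float, height: float,
                     selections: Iterable[int]) -> Tuple[float, float]:
    """Return the center of the area that remains after the given selections."""
    left, top, w, h = 0.0, 0.0, float(width), float(height)
    for number in selections:
        if not 1 <= number <= 9:
            raise ValueError("fields are numbered from 1 to 9")
        row, col = divmod(number - 1, 3)       # assumption: row-major numbering
        w, h = w / 3, h / 3                    # each step shrinks the area to 1/9
        left += col * w
        top += row * h
    return (left + w / 2, top + h / 2)


# Four selections on an 810x810 screen, always choosing the middle field.
point = grid_click_point(810, 810, [5, 5, 5, 5])
```

After four selections the remaining area is only 1/81 of the screen width and
height, which is usually precise enough to hit a button or link.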
The exact click action is configurable but defaults to asking the user: you will be
presented with a list of possible click modes. When selecting Drag and Drop, the
desktop grid will be displayed again to select the drop point.
While the desktop grid is active (displayed), all input that is not directed at the
desktop grid itself
(other commands, etc.) will be rejected. Say ‘Cancel’ at any time to abort the
process.
The desktop grid plugin registers a configuration screen right in the command
configuration
when it is loaded.
The trigger that invokes the desktop grid is of course completely configurable.
Moreover the user
can use ‘real’ or ‘fake’ transparency. If your graphical environment allows for
compositing effects
(‘desktop effects’) then you can safely use ‘real’ transparency which will make the
desktop grid
transparent. If your platform does not support compositing Simon will simulate
transparency
by taking a screenshot of the screen before displaying the desktop grid and display
that picture
behind the desktop grid.
If the desktop grid is configured to use real transparency and the system does not
support compositing it will display a solid gray background.
However, nearly all up-to-date systems will support compositing (real
transparency).
This includes:
• Microsoft Windows 2000 or higher (XP, Vista, 7)
• GNU/Linux using a composite manager like Compiz, KWin4, xcompmgr, etc.
By default the desktop grid uses numbers to select the individual fields. To use
the desktop grid, make sure that your language and acoustic model contain and allow
for the following ‘sentences’:
• ‘One’
• ‘Two’
• ‘Three’
• ‘Four’
• ‘Five’
• ‘Six’
• ‘Seven’
• ‘Eight’
• ‘Nine’
• ‘Cancel’
To configure these triggers, just configure the commands associated with the
plugin.
4.8.8 Input Number
Using the input-number plugin the user can input large numbers easily.
Using the Dictation or the Text-Macro plugin one could associate the numbers with
their digits
and use that as input method. However, to input larger numbers there are two ways
that both
have significant disadvantages:
• Adding the words eleven, twelve, etc.
While this seems like the most elegant solution as it would enable the user to say
‘fivehundredseventytwo’ we can easily see that it would be quite a problem to add
all these words - let alone train them. What about ‘twothousandninehundredtwo’?
Where to stop?
• Spell out the number using the individual digits
While this is not as elegant as stating the complete number it is much more
practical.
However, many applications (like the great Mouseless Browsing Firefox add-on) rely
on the user to input large numbers without too much time passing between the
individual keystrokes (Mouseless Browsing, for example, waits exactly 500ms by
default before it considers the input of the number complete). So if you want to
enter 52 you would say ‘Five (pause) Two’. Because of the needed pause, the
application (like the Mouseless Browsing add-on) would consider the input complete
after ‘Five’.
The input number plugin - when triggered - presents a calculator-like interface for
inputting a number. The input can be corrected by saying ‘Back’. It features a
decimal point accessible by saying ‘Comma’. When saying ‘Ok’ the number will be
typed out. As all the voice input and the correction are handled by the plugin
itself, the application that finally receives the input will only see a couple of
milliseconds between the individual digits.
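The behavior described above amounts to collecting digits in a buffer and only
typing them out at the end. The following sketch models this with a hypothetical
NumberInput class; that ‘Comma’ produces a "." and that a second ‘Comma’ is ignored
are assumptions of this example, not documented behavior.

```python
from typing import Iterable, Optional

DIGITS = {"Zero": "0", "One": "1", "Two": "2", "Three": "3", "Four": "4",
          "Five": "5", "Six": "6", "Seven": "7", "Eight": "8", "Nine": "9"}


class NumberInput:
    """Collects spoken digits and releases the whole number on 'Ok'."""

    def __init__(self) -> None:
        self.buffer = ""

    def say(self, word: str) -> Optional[str]:
        """Process one recognized word; return the number to type on 'Ok'."""
        if word in DIGITS:
            self.buffer += DIGITS[word]
        elif word == "Comma" and "." not in self.buffer:
            self.buffer += "."                 # assumption: typed as a decimal point
        elif word == "Back":
            self.buffer = self.buffer[:-1]     # correct the last input
        elif word == "Cancel":
            self.buffer = ""
        elif word == "Ok":
            number, self.buffer = self.buffer, ""
            return number                      # typed out in one go
        return None


def dictate(words: Iterable[str]) -> Optional[str]:
    """Feed a sequence of words and return what the last word produced."""
    entry = NumberInput()
    result = None
    for word in words:
        result = entry.say(word)
    return result
```

Because the number is only released on ‘Ok’, pauses between the spoken digits never
reach the receiving application.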
While the input number plugin is active (the user currently inputs a number), all
input that is not
directed at the input number plugin (other commands, etc.) will be rejected. Say
‘Cancel’ at any
time to abort the process.
As no command instances can be created for this plugin, it is not listed in the
New Command dialog. However, the input number plugin registers a configuration
screen right in the command configuration when it is loaded.
The trigger defines the word or phrase that will trigger the display of the
interface.
By default the input number plugin uses numbers to select the individual digits and
a couple of control words. To use the input number plugin, make sure that your
language and acoustic model contain and allow for the following ‘sentences’:
• ‘Zero’
• ‘One’
• ‘Two’
• ‘Three’
• ‘Four’
• ‘Five’
• ‘Six’
• ‘Seven’
• ‘Eight’
• ‘Nine’
• ‘Back’
• ‘Comma’
• ‘Ok’
• ‘Cancel’
To configure these triggers, just configure the commands associated with the
plugin.
4.8.9 Dictation
The dictation plugin writes the recognition result it gets using simulated
keystrokes.
Assuming you didn’t define a trigger for the dictation plugin it will accept all
recognition results
and just write them out. The written input will be considered as ‘processed input’
and thus not
be relayed to other plugins. This means that if you loaded the dictation plugin and
defined no
trigger for it, all plugins below it in the Selected Plug-Ins list in the command
configuration will
never receive any input.
As no command instances can be created for this plugin, it is not listed in the
New Command dialog.
The dictation plugin can be configured to append texts after recognition results to
for example
add a space after each recognized word.
4.8.10 Artificial Intelligence
The Artificial Intelligence is a just-for-fun plugin that emulates a human
conversation.
Using the text to speech system, the computer can ‘talk’ with the user.
The plugin uses AIML (Artificial Intelligence Markup Language) sets for the actual
‘intelligence’. Most AIML sets should be supported. The popular A. L. I. C. E. bot
and a German version work and are shipped with the plugin.
The plugin registers a configuration screen in the command configuration menu where
you can
choose which AIML set to load.
Simon will look for AIML sets in the following folder:
• GNU/Linux: `kde4-config --prefix`/share/apps/ai/aimls/
• Microsoft Windows: [installation folder (C:\Program Files\simon 0.2\ by
default)]\share\apps\ai\aimls\
To add a new set just create a new folder with a descriptive name and copy the
.aiml files into it.
To adjust your bot’s personality have a look at the bot.xml and vars.xml files in
the following folder:
• GNU/Linux: `kde4-config --prefix`/share/apps/ai/util/
• Microsoft Windows: [installation folder (C:\Program Files\simon 0.2\ by
default)]\share\apps\ai\util\
As no command instances can be created for this plugin, it is not listed in the
New Command dialog.
It is recommended to not use any trigger for this plugin to provide a more natural
‘feel’ for the
conversation.
4.8.11 Calculator
The calculator plugin is a simple, voice controlled calculator.
The calculator extends the Input Number plugin by providing additional features.
When loading the plugin, a configuration screen is added to the plugin
configuration.
There you can also configure the control mode of the calculator. Setting the mode
to something other than Full calculator will hide options from the displayed widget.
However, in contrast to removing all associated commands from the functions, the
hidden controls will still react to the configured voice commands.
When selecting Ok, the calculator will by default ask you what to do with the
generated result.
You can for example output the calculation, the result, both, etc. Besides always
selecting this
from the displayed list after selecting the Ok button, this can also be set in the
configuration
options.
4.8.12 Filter
Using the filter plugin, you can prevent recognition results from being passed on
to further command plugins. Using this plugin you can, for example, disable the
recognition by voice.
The filter command plugin registers a configuration screen in the command
configuration where
you can change what results should be filtered.
The pattern is a regular expression that will be evaluated each time the plugin
receives a recognition result for processing.
The plugin also registers voice interface commands for activating and deactivating
the filter.
In total, the filter therefore has three states:
• Inactive
The default state. All recognition results will be passed through.
• Half-active (if Two stage activation is selected)
– If the next command is the "Deactivate filter" command, the filter will enter
the "Inactive" state.
– If, however, the next result is something else and Relay results in stage one of
two stage activation is selected, this result will be passed on to other plugins.
The filter will reset to "Active" afterwards.
• Active
When activated, the filter will ‘eat’ all results that match the configured
pattern. By default
this means every result that Simon recognizes will be accepted by the filter and
therefore not
relayed to any of the plugins following the filter plugin.
If Two stage activation is enabled and the filter plugin receives the command to
directly enter the "Inactive" state, this command is ignored. In other words: If two
stage activation is enabled, the filter can only be disabled by going through the
intermediate stage.
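The three states can be modeled as a small state machine. The sketch below makes
several assumptions that the handbook does not confirm: the command that enters the
half-active state is given the hypothetical name "Prepare deactivation", the
activation command is called "Activate filter", the filter's own commands are never
relayed, the pattern has to match the whole result, and in the half-active state a
result that is not relayed is handled as in the active state.

```python
import re
from enum import Enum


class State(Enum):
    INACTIVE = "inactive"
    HALF_ACTIVE = "half-active"
    ACTIVE = "active"


ACTIVATE = "Activate filter"           # assumed command name
DEACTIVATE = "Deactivate filter"
STAGE_ONE = "Prepare deactivation"     # hypothetical name, not from the handbook


class Filter:
    def __init__(self, pattern: str = ".*", two_stage: bool = False,
                 relay_stage_one: bool = False) -> None:
        self.pattern = re.compile(pattern)
        self.two_stage = two_stage
        self.relay_stage_one = relay_stage_one
        self.state = State.INACTIVE

    def _eats(self, result: str) -> bool:
        return self.pattern.fullmatch(result) is not None

    def passes(self, result: str) -> bool:
        """Return True if the result is handed on to the following plugins."""
        if self.state is State.INACTIVE:
            if result == ACTIVATE:
                self.state = State.ACTIVE
                return False
            return True
        if self.state is State.HALF_ACTIVE:
            if result == DEACTIVATE:
                self.state = State.INACTIVE
                return False
            self.state = State.ACTIVE          # anything else resets the filter
            return self.relay_stage_one or not self._eats(result)
        # State.ACTIVE
        if self.two_stage and result == STAGE_ONE:
            self.state = State.HALF_ACTIVE
            return False
        if result == DEACTIVATE:
            if not self.two_stage:             # ignored when two stages are required
                self.state = State.INACTIVE
            return False
        return not self._eats(result)
```

With two_stage=True, a direct "Deactivate filter" in the active state leaves the
filter active; only the sequence stage-one command followed by "Deactivate filter"
disables it.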
4.8.13 Pronunciation Training
The pronunciation training, when combined with a good static base model, can be a
powerful
tool to improve your pronunciation of a new language.
Essentially, the plugin will prompt you to say specific words. The recognition will
then recognize
your pronunciation of the word and compare it to your speech model which should be
a base
model of native speakers for this to work correctly. Then Simon will display the
recognition rate
(how similar your version was to the stored base model).
The closer to the native speaker, the higher the score.
The plugin adds an entry to your Commands menu to launch the pronunciation training
dialog.
The training itself consists of multiple pages. Each page contains one word fetched
from your
active vocabulary. They are identified by a category which needs to be selected in
the command
configuration before starting the training.
4.8.14 Keyboard
The keyboard plugin displays a virtual, voice controlled keyboard.
The keyboard consists of multiple tabs, each possibly containing many keys. The
entirety of tabs
and keys are collected in ‘sets’.
You can select sets in the configuration but also create new ones from scratch in
the keyboard
command configuration.
Keys are usually mapped to single characters but can also hold long texts and even
shortcuts.
Because of this, keyboard sets can contain special keys like a ‘select all’ key or
a ‘Password’ key
(typing your password).
Next to the tabs that hold the keys of your set, the keyboard may also show special
keys like Ctrl,
Shift, etc. Those keys are provided as voice interface commands and are displayed
regardless of
what tab of the set is currently active.
As with all voice triggers, removing the associated command hides the buttons as
well.
Moreover, the keyboard provides a numpad that can be shown by selecting the
appropriate option in the keyboard configuration.
Next to the number keys and the delete key for the number input field (Number
backspace), the
numpad provides two options on what to do with the entered number.
When selecting Write number, the entered number will be written out using simulated
key
presses. Selecting Select number tries to find a key or tab in the currently active
set that has
this number as a trigger. This way you can control a complete keyboard just using
numbers.
The keys on the num pad are configurable voice interface commands.
4.8.15 Dialog
The dialog plugin enables users to engage in a scripted dialog with Simon.
4.8.15.1 Dialog design
Simon treats dialogs as a succession of different states. Each state can have a
text and several
associated options.
Dialogs can have more than one text variant - one of which will be randomly picked
when the dialog is displayed. This can help to make dialogs feel more natural by
providing several alternative formulations.
The texts can use bound values and template options.
Dialog options encapsulate the logic of the conversation. They are the active
components of the dialog.
Similar to commands, dialog options have a name (trigger) that, when recognized
while the
dialog is active and in the option’s parent state, will cause this option to
activate. Alternatively,
options can also be configured to trigger automatically after a set time period.
This time is relative
to when the state is entered.
Dialog options, when shown through the graphical output module can show an
arbitrary text
(that will most likely be equivalent to the trigger but doesn’t have to be) and,
optionally, an icon.
If the text-to-speech output module is used, the text (not the trigger) will be
read aloud unless
this is disabled by selecting the Silent option.
Every state can also optionally have an avatar that will be displayed when using
the graphical
output module.
4.8.15.2 Dialog: Bound values
The text of dialog states can contain variables - so called "bound values" - that
will be filled in during runtime.
For example, the dialog text "This is a $variable$" would replace "$variable$" with
the result of a bound value called "variable".
There are four types of bound values:
• Static
Static bound values will always be resolved to the same text. They are useful to
provide configuration options to be filled in to personalize the dialog (e.g., the
name of the user).
• QtScript
QtScript bound values resolve to the result of the entered QtScript code.
• Command arguments
If the dialog trigger command (the Simon command that initiates the dialog) uses
placeholders,
they can be accessed through command argument bound values. The Argument number
refers
to the index of the placeholder you want to access.
For example, if your dialog is started with the command "Call %1", and "name" is a
command argument bound value, then launching the dialog by recognizing "Call Peter"
will turn the dialog text "Are you sure you want to call $name$?" into "Are you sure
you want to call Peter?".
• Plasma data engine
This type of bound value can readily access a wide array of high-level information
through
plasma data engines.
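The substitution of bound values can be sketched in a few lines. This is an
illustration under assumptions: each bound value is represented by a Python callable
that returns its current text, and names that are not defined are left untouched,
which may differ from what Simon does.

```python
import re
from typing import Callable, Dict, List


def fill_bound_values(text: str, bound_values: Dict[str, Callable[[], str]]) -> str:
    """Replace every $name$ in the text with the result of the bound value 'name'."""
    def replace(match: "re.Match[str]") -> str:
        resolver = bound_values.get(match.group(1))
        return resolver() if resolver is not None else match.group(0)
    return re.sub(r"\$(\w+)\$", replace, text)


# Words captured by the placeholders of the trigger "Call %1" (hypothetical data).
arguments: List[str] = ["Peter"]

bound_values = {
    "user": lambda: "Anna",            # static bound value: always the same text
    "name": lambda: arguments[0],      # command argument bound value, argument 1
}

question = fill_bound_values("Are you sure you want to call $name$?", bound_values)
```

Resolving the values through callables means that they are evaluated when the text
is displayed, which matches the idea of values that are filled in at runtime.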
4.8.15.3 Template options
Dialog texts can further be parametrized through template options.
These boolean values choose between different or optional text snippets.
For example, the template option "formal" above would change the dialog text
"Would you please {{formal}}be quiet{{elseformal}}shut up{{endformal}}" to "Would
you please be quiet" or "Would you please shut up" depending on whether the template
option is set to true or false. The else-path can be omitted if it is not required
(e.g. "Would you {{formal}}please {{endformal}}be quiet").
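The template syntax can be evaluated with a single regular expression, as the
following sketch shows. The render function is hypothetical and handles only
non-nested blocks; treating an undefined option as false is an assumption of this
example.

```python
import re
from typing import Dict

# {{name}} ... [{{elsename}} ...] {{endname}}; the else part is optional.
_BLOCK = re.compile(
    r"\{\{(\w+)\}\}(.*?)(?:\{\{else\1\}\}(.*?))?\{\{end\1\}\}", re.DOTALL)


def render(text: str, options: Dict[str, bool]) -> str:
    """Keep the first branch if the option is true, otherwise the else branch."""
    def choose(match: "re.Match[str]") -> str:
        name, if_part, else_part = match.group(1), match.group(2), match.group(3) or ""
        return if_part if options.get(name, False) else else_part
    return _BLOCK.sub(choose, text)


template = "Would you please {{formal}}be quiet{{elseformal}}shut up{{endformal}}"
polite = render(template, {"formal": True})
blunt = render(template, {"formal": False})
```

When the else-path is omitted, a false option simply removes the enclosed snippet
from the text.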
4.8.15.4 Avatars
Every state can potentially show a different avatar.
These images can range from the picture of a (simulated) speaker to an image of
something
topically appropriate.
To use an avatar, first add it here and later define where to use it in the dialog
design section.
4.8.15.5 Output
Dialogs can be displayed graphically, use text-to-speech or combine both
approaches.
The Separator to options will be spoken between the dialog text and the current
state’s options
(if there are any). If there are no options to this state or all are configured to
be silent, this will
not be said. The whole announcement is repeated when the user says one of the
triggers configured under Repeat on trigger. Additionally, the text-to-speech output
can optionally
can optionally
be configured to repeat the listing of the available options (including the
configured separator)
when the user says a command that does not match any of the available dialog
options.
4.8.16 Akonadi
The Akonadi plugin allows Simon to plug into KDE’s PIM infrastructure.
The plugin fulfills two major purposes:
• Execute Simon commands at scheduled times
The Akonadi plugin can monitor a specific collection (calendar) and react to entries
whose summary starts with a specific prefix. By default, this prefix is
"[simon-command]", meaning that events of the form
"[simon-command] <plugin name>//<command name>" will trigger the appropriate Simon
command at the "start time" of the event.
The names of the plugins and commands are identical to the ones shown in the command
dialog and do not necessarily need to reference commands in the same scenario as the
Akonadi plugin instance.
• Show reminders for events in the given calendar
If configured to do so, the Akonadi plugin can show reminders for calendar events
with a set
alarm flag. These reminders will be shown through the Simon dialog engine.
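The event summary format used for scheduled commands can be taken apart as shown
below. The function parse_event_summary is a hypothetical helper written for this
illustration; it only demonstrates the "[simon-command] <plugin name>//<command
name>" convention and does not talk to Akonadi.

```python
from typing import Optional, Tuple

DEFAULT_PREFIX = "[simon-command]"


def parse_event_summary(summary: str,
                        prefix: str = DEFAULT_PREFIX) -> Optional[Tuple[str, str]]:
    """Return (plugin name, command name), or None for ordinary calendar entries."""
    if not summary.startswith(prefix):
        return None
    plugin, separator, command = summary[len(prefix):].partition("//")
    plugin, command = plugin.strip(), command.strip()
    if not separator or not plugin or not command:
        return None                            # prefix present but malformed
    return plugin, command


# "Executable" and "Firefox" are example names, not taken from a real setup.
scheduled = parse_event_summary("[simon-command] Executable//Firefox")
```

Entries without the prefix are left alone, so the same calendar can hold regular
appointments and scheduled commands side by side.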
4.8.17 D-Bus
With the D-Bus command plugin, Simon can call exported methods in 3rd party
applications
directly.
The screenshot below, for example, calls the "Pause" method of the MPRIS interface
of the Tomahawk music playing software.
4.8.18 JSON
Similar to the D-Bus command plugin, the JSON plugin also allows Simon to contact
3rd party applications to directly invoke functionality (instead of simulating user
activity).
4.8.19 VRPN
With the VRPN command plugin, Simon can act as a VRPN server and export voice
controlled
buttons.
The plugin configuration allows you to set the port the server should operate on
and to define an arbitrary list of buttons. Each of these button objects will have
exactly one "button" (in VRPN, a button may theoretically have more than one
clickable item).
After setting up the buttons, you can now configure Simon commands to act on them.
You can
set the commands to either Press & Release (consecutively), Press, Release or
Toggle the button
they manipulate.
For example, the command shown in the screenshot above would press and release
("click") the VRPN button at index 0 of the button device accessible as
"ButtonB@localhost".
Chapter 5
Questions and Answers
In an effort to keep this section always up-to-date it is available at our online
wiki.
Chapter 6
Credits and License
Simon
Program copyright 2006-2009 Peter Grasch peter.grasch@bedahr.org, Phillip Goriup,
Tschernegg
Susanne, Bettina Sturmann, Martin Gigerl
Documentation Copyright (c) 2009 Peter Grasch peter.grasch@bedahr.org
This documentation is licensed under the terms of the GNU Free Documentation
License.
This program is licensed under the terms of the GNU General Public License.
Appendix A
Installation
Please see our wiki for install instructions.