
Dragon Drive Embedded Technologies April 2018

Version 1.1
White Paper

Dragon Drive
Embedded
Technologies
Helping automotive manufacturers create superior user experiences

© 2018 Nuance Communications, Inc. All rights reserved.



Table of Contents

Introduction

The Embedded Portfolio
    Dragon Drive Encompasses Technologies, Platforms and Applications
    Dragon Drive Framework | Core Technologies and Features
        Nuance SSE (Speech Signal Enhancement)
        Automatic Speech Recognition (VoCon)
        Natural Language Understanding (NLU)
        Vocalizer: Text-to-Speech (TTS)
        Voice Biometry (VB)
        Push-to-Talk, Wake-up Word and Just Talk
        Multimodal Input (Non-Speech)
        Cognitive Arbitration

Benefits of Embedded Design

Company Background
    About Nuance
    About Nuance Automotive


Introduction

A lot of thought is involved in designing a vehicle cabin environment that reflects the unique brand experience a
manufacturer wants to create. Every element is carefully considered: the layout, controls, materials selected, heritage and
even the scent contribute to the completed space.

In recent years, the infotainment system has become one of the most challenging and laboriously considered aspects of
the automotive user experience. Infotainment systems are evolving rapidly into the “central nervous systems” of the
vehicle, complete with unique personas of their own. More than any other element, infotainment systems are impacting
how users perceive their vehicles. Therefore, it is not surprising that unprecedented resources are being invested in
designing better systems that can support customer needs in an increasingly connected, automated and data-saturated
world.

Nuance has long been a partner to vehicle manufacturers and Tier 1 suppliers throughout the evolution of what we today
call infotainment. Nuance has relationships with nearly every automotive company on the planet, and its Dragon Drive
technologies are in over 200 million vehicles on the road today.

Nuance Automotive maintains that the best contemporary infotainment systems leverage a very well-designed “hybrid”
embedded-cloud architecture. Consequently, we are resolute that no system design without a solid embedded platform
can adequately, consistently and reliably deliver the customer experience any brand wishes—and increasingly needs—to
deliver. This paper provides a high-level overview of the various embedded technologies Nuance offers and explains why
they are key elements to carefully consider when designing next-generation infotainment systems.


The Embedded Portfolio

Dragon Drive Encompasses Technologies, Platforms and Applications


The newest breed of infotainment systems includes an automotive assistant: a virtual, speech-enabled persona that users
communicate with to perform both routine and highly complex tasks while driving. Nuance Automotive Professional
Services specializes in developing custom, brand-tailored “intelligent” automotive assistants using a portfolio of
technologies, platforms and applications. Catering to the OEM’s unique requirements and specifications, Nuance
Automotive Professional Services tightly integrates these solutions with the vehicle’s native applications, sensors and
data.

The foundation of the architecture consists of a number of Nuance core technologies specifically optimized for automotive
use cases. Collectively, they are part of Dragon Drive Framework (DDFW), our award-winning software platform, which is
embedded in the vehicle head unit.

[Figure: Nuance Dragon Drive, the basis for an OEM-branded automotive assistant.
Platforms: Dragon Drive Framework, Dragon Drive Mobile, Dragon Drive Cloud.
Applications (Dragon Drive Domains): UDE, Calling, Music, Weather, News, Gas Stations, Parking Finder, Office, Car Manual, Custom.
Technologies: Speech Signal Enhancement, Automatic Speech Recognition, Natural Language Understanding, Text-to-Speech,
Voice Biometry, Push-to-Talk / Wake-up Word / Just Talk, Multimodal Input, Cognitive Arbitration.]


Dragon Drive Framework | Core Technologies and Features


Nuance SSE (Speech Signal Enhancement)

As in any type of system that processes data, the old adage “good input is necessary for good output” is a universal truth.
This without doubt applies to the speech recognition systems in passenger vehicles, where road noise and interior sounds
present challenges that must be dealt with properly for a frustration-free user experience.

Ambient interior sounds and noise from around the vehicle mingle with output from the infotainment system and
conversations between passengers. In this complex soundscape, the automotive assistant has to understand when it is
being spoken to and what is being said, without interference from all other sounds it captures. This is no easy task in a
single-user setup and is made even more difficult with multiple users, where commands can come from any position in the
vehicle and in any user’s voice, volume and intonation. For these reasons, a high-quality voice input signal is a
precondition of any reliable automotive speech recognition solution. The best implementations make use of a combination
of independent audio signal processing technologies.

For over fifteen years, Nuance has been highly successful in developing and combining several state-of-the-art audio
processing technologies. Collectively known as Nuance SSE (Speech Signal Enhancement), these technologies deliver
speech recognition results at benchmark levels. They are each named below and can be studied in detail in our white
paper titled Nuance SSE.

AEC ........................................ Acoustic Echo Cancellation

DT ............................................ Double-Talk (duplex speech, e.g., for phone calls)

ICC........................................... In-Car Communication

MIMO ....................................... Multiple Input–Multiple Output Channels

PIC ........................................... Passenger Interference Cancellation

RES ......................................... Residual Echo Suppression

WBS......................................... Wind Buffeting Suppression

Automatic Speech Recognition (VoCon)

The heart of any speech recognition system is its ability to efficiently and accurately recognize what the user has said to it.
Consumers focus on this aspect more than any other to judge whether a system is good or bad. VoCon (short for Voice
Control) has been recognized for many years as the gold standard for automatic speech recognition (ASR).

Today, VoCon supports 32 languages and dialects, with more planned, making it one of the largest language portfolios in
the speech industry. The latest generation of VoCon is based entirely on neural networks, which has improved overall
accuracy by 20% on average across the full portfolio.

A few key features of the newest generation of embedded VoCon include: free-format natural speech input, one-shot
unified destination entry, “barge-in” and top-level menu support for all domains. For example, a user could say, “Drive to
Madison Square Garden and send the ETA to my brother.” In a single utterance, the user has instructed the system to
launch the navigation system, identify the address of a named point of interest, calculate when the destination will be
reached, switch to the messaging domain, interpret in Contacts who the user’s brother is and finally, compose a text
message incorporating data from the navigation system.


Invariably, users will want to interrupt what the system is saying, whether to correct themselves, cancel a command or
answer a prompt more quickly. Embedded VoCon enables this with “barge-in.” Simply put, users need not wait for the
system to finish speaking before telling it what to do next.

Users greatly appreciate two things when they experience ASR: accuracy and speed. Both of these are best achieved
with an embedded platform, where the risk of connectivity and backend latency issues is eliminated. That said, there are
additional “best-in-class” features and benefits that can be delivered by integrating Embedded VoCon with Cloud VoCon
in a hybrid solution. These can be discussed in detail with your Nuance representative.

Natural Language Understanding (NLU)

Once speech has been captured and accurately converted into words, it needs to be analyzed for meaning and intent. To
do this, the recognized utterance is compared against known domains (i.e., fields of action, thought or topic), and the domain
with the highest probability of matching the user’s intent is selected. NLU not only enables the comprehension that “Drive to John’s
office” is a request to navigation, but also that “I am cold” refers to the climate control domain. This is again no easy task,
but one in which Nuance language scientists and software engineers continue to lead.
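
As a rough illustration of the domain-matching idea described above, the following Python sketch scores an utterance against a few domains by keyword overlap and picks the most probable one. The domain names, the keyword lists and the classify_domain function are hypothetical and exist only for this example; they are not part of any Nuance API, and a production NLU engine would rely on statistical or neural models rather than keyword counts.

```python
from typing import Dict, Set, Tuple

# Hypothetical keyword sets per domain (illustrative assumption only).
DOMAIN_KEYWORDS: Dict[str, Set[str]] = {
    "navigation": {"drive", "route", "navigate", "destination", "office"},
    "climate": {"cold", "hot", "warm", "temperature", "fan"},
    "weather": {"umbrella", "rain", "forecast", "sunny", "tomorrow"},
}


def classify_domain(utterance: str) -> Tuple[str, Dict[str, float]]:
    """Return the most probable domain and a normalized score per domain."""
    # Lowercase each word and strip surrounding punctuation.
    words = {w.strip(".,!?\"'").lower() for w in utterance.split()}
    # Raw score: number of domain keywords present in the utterance.
    raw = {d: len(words & kw) for d, kw in DOMAIN_KEYWORDS.items()}
    total = sum(raw.values())
    if total == 0:
        # No keyword matched any domain.
        return "unknown", {d: 0.0 for d in raw}
    # Normalize so the scores sum to 1 and can be read as rough probabilities.
    scores = {d: n / total for d, n in raw.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

With these assumed keyword lists, "Drive to John's office" resolves to the navigation domain and "I am cold" to the climate domain, mirroring the two examples in the text.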

Further, when AI reasoning is applied, pre-defined or learned user preferences and real-time contextual information
are factored into the results the automotive assistant gives the user.

For example, if the user asks, “Will I need an umbrella tomorrow?” the automotive assistant will determine that the
structure of the sentence and the word “umbrella” refer to the weather domain. It reasons that “umbrella” is usually
associated with rain and that “tomorrow” indicates a future forecast. Next, applying AI reasoning, if the assistant has
access to the user’s calendar and determines that they will be in another city the following day, the recommendation to
take an umbrella would be based upon the forecast in that location, as opposed to the location at the time of the inquiry.
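
The contextual step in this example can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the CalendarEntry type, the forecast dictionary and both function names are hypothetical stand-ins for whatever calendar and weather sources an actual assistant would have access to.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class CalendarEntry:
    """Hypothetical calendar record: where the user will be on a given day."""
    day: date
    city: str


def resolve_city(day: date, current_city: str,
                 calendar: List[CalendarEntry]) -> str:
    """Prefer the city from the calendar for that day, else the current city."""
    for entry in calendar:
        if entry.day == day:
            return entry.city
    return current_city


def needs_umbrella(day: date, current_city: str,
                   calendar: List[CalendarEntry],
                   forecasts: Dict[Tuple[str, date], str]) -> Optional[bool]:
    """Return True/False based on the forecast, or None if no forecast exists."""
    city = resolve_city(day, current_city, calendar)
    condition = forecasts.get((city, day))
    if condition is None:
        return None
    return condition == "rain"
```

The point of the sketch is the lookup order: the calendar is consulted first, so the answer reflects the forecast where the user will be rather than where the question was asked.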

For optimum performance, reliability and security, NLU processing should be primarily performed on-board. This is
particularly appropriate for the most-used driver commands involving elemental automotive domains, such as navigation,
telephony, messaging, audio tuner and vehicle systems control. Sending every query to the cloud at the same time adds
some benefits, which is why Nuance recommends a hybrid approach whenever possible for the best user experience in all
circumstances. Because on-board processing underpins performance and reliability, cloud-based NLU on its own is not recommended.
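
One way to picture the hybrid approach is sketched below in Python: the query goes to an embedded and a cloud interpreter at the same time, and the cloud result is used only if it arrives in time and is more confident. The function name hybrid_understand, the result dictionaries with a "confidence" key and the timeout value are all assumptions made for this illustration; the paper does not describe how Dragon Drive actually arbitrates between the two.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

Result = Dict[str, object]
Interpreter = Callable[[str], Result]


def hybrid_understand(utterance: str,
                      embedded_nlu: Interpreter,
                      cloud_nlu: Interpreter,
                      cloud_timeout_s: float = 0.3) -> Result:
    """Query both interpreters; fall back to the embedded result when needed."""
    pool = ThreadPoolExecutor(max_workers=1)
    # Start the cloud request in the background.
    cloud_future = pool.submit(cloud_nlu, utterance)
    # The embedded interpreter always produces a result locally.
    local = embedded_nlu(utterance)
    try:
        cloud = cloud_future.result(timeout=cloud_timeout_s)
    except Exception:
        # Timeout, lost connectivity or a backend error: ignore the cloud.
        cloud = None
    finally:
        # Do not block on a slow cloud call.
        pool.shutdown(wait=False)
    if cloud is not None and cloud["confidence"] > local["confidence"]:
        return {**cloud, "source": "cloud"}
    return {**local, "source": "embedded"}
```

In this design the embedded path is never skipped, so the user still gets an answer when the vehicle has no coverage, while the cloud can improve the result whenever it responds quickly enough.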

Vocalizer: Text-to-Speech (TTS)

The persona of an infotainment system is directly determined by the speech output it generates when conversing with
users. Is the voice human-like, natural, expressive, smooth, consistent, intelligent-seeming and so on? Known as
Vocalizer, Nuance’s text-to-speech solution consists of an expansive and growing portfolio of more than 56 languages
and 122 voices, covering all major global languages and dialects.

In addition, Nuance develops custom voices for customers that wish to differentiate themselves through an exclusive
personality which best represents their brand. A custom solution is truly bespoke and can be tailored to virtually any
language, accent, gender and style.

Some high-level benefits of Nuance’s embedded vocalizer include: superior-quality audio fidelity output that sounds
smooth and natural, pronunciation consistency between speech input and output, accurate global cross-lingual language
identification, multi-style voice support—formal versus conversational, emotional styling such as neutral, joyful and
didactic—and footprint scalability based on hardware memory resources.

Embedded TTS also has the great advantages of providing voice output with minimal latency and remaining available
regardless of connectivity status. This is very important not only for route guidance, but also for other use cases where
system feedback is expected.

For an in-depth understanding of embedded vocalizer and the additional benefits that augmenting it with our cloud
vocalizer creates, ask your Nuance representative to schedule a presentation.


Voice Biometry (VB)

Our voices are one of the most unique and distinguishing aspects of who we are as individuals. When someone speaks,
others can usually determine the speaker’s identity simply from the sound of his or her voice. Our voices are actually
much more unique than most people realize. As the preeminent expert in the business of speech and voice recognition
technologies, Nuance knows that there are many characteristics and attributes of a voice that make it unique and useful
for accurately distinguishing one person from another. In fact, humans cannot detect many of these unique identifiers with
as much accuracy and certainty as advanced voice biometric technology can. Even two voices that sound identical to
most people (think of the best impersonators you’ve seen) can be discerned by machine with surprising precision. This is
why voice biometry is being used more and more to authenticate and protect sensitive transactions in industries such as
financial services, utilities and healthcare.

Nuance Automotive uses VB in Dragon Drive Framework to identify users, distinguish one user from others, link users’
voices to individual profiles and permissions, protect personal data and ultimately, allow multiple users—drivers and
passengers—to interact with the automotive assistant, simultaneously and in ways specific to each person. For example,
when one passenger says, “Call my husband,” the system can use VB to identify the speaker and determine the correct husband
to dial.

With VB embedded in the vehicle, the system can almost instantly match speakers’ voices to user voiceprints securely
stored in the head unit, adjust account preferences and permissions and interact with the user(s) without any extra effort
on their part. Further, because this processing occurs on-board, weak or absent cellular connectivity is not a factor, and
user privacy is not compromised.
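
The matching step can be illustrated with a simplified Python sketch. It assumes, purely for illustration, that each enrolled voiceprint and each incoming speech sample has already been reduced to a numeric feature vector, and it compares them with cosine similarity against a threshold. The vectors, the threshold and the function names are hypothetical; the paper does not state which representation or scoring method Nuance VB uses.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(sample: Sequence[float],
                     voiceprints: Dict[str, Sequence[float]],
                     threshold: float = 0.8) -> Optional[str]:
    """Return the best-matching enrolled user, or None if nobody is close enough."""
    best_user, best_score = None, threshold
    for user, voiceprint in voiceprints.items():
        score = cosine_similarity(sample, voiceprint)
        if score >= best_score:
            best_user, best_score = user, score
    return best_user
```

Once a speaker is identified, the returned user name could key into a locally stored profile, for instance to resolve "my husband" to the right contact. Returning None for an unrecognized voice lets the system fall back to a guest profile instead of guessing.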

Push-to-Talk, Wake-up Word and Just Talk

Invoking the automotive assistant has never been easier and Nuance Dragon Drive Framework supports three methods,
which can be implemented alone or in combination:

PTT .......................................... Push-to-Talk

WuW ........................................ Wake-up Word

Just Talk .................................. Active Listening Mode

Push-to-Talk—a button, usually on the steering wheel or center console—has been the activation method for speech
recognition systems since their inception. It is still the most common. Recently, the use of wake-up words (aka trigger words),
such as “Hey Dragon,” has begun to emerge in automobiles.

In 2017, Nuance introduced Just Talk, a new embedded technical innovation that requires neither the press of a button
nor the use of a specific utterance to get the automotive assistant’s attention. When turned on by the user, the automatic
speech recognition and natural language understanding systems actively monitor all speech in the vehicle, filter out audio
from the infotainment system and wait for words, phrases and grammars that they recognize as a command related to a
domain they understand and can respond to.

False triggers are largely eliminated through sophisticated syntax, cadence and intonation analysis performed in real time.
This analysis can be further augmented with in-vehicle sensors such as head/body movement trackers.

We believe Just Talk is a true game-changer for creating the most pleasant user experience, because it mimics how
people naturally communicate with one another. Nuance has conducted research on users’ experiences with the various
methods that we would be happy to share upon request.


Multimodal Input (Non-Speech)

The vehicle human-machine interface (HMI) becomes more sophisticated every year. Yesterday’s innovations in cars,
such as touchscreens, head-up displays and touchpads, which were available only in the premium
segment, are being added to mass-market vehicles faster than ever. Not long ago, speech recognition was mostly
limited—both in terms of availability and capability—to the high-end set. The same can be said about advanced driver
assistance systems (ADAS).

All this new HMI and ADAS tech necessitates a really “smart” automotive assistant that is capable of receiving and
seamlessly handling input from a wide array of systems and sensors. Nuance Dragon Drive Framework enables this, with
solution modules that support several forms of driver input besides voice and haptic:

XT9 ......................................... Text Prediction and Correction

T9Write .................................... Handwriting Recognition combined with Intuitive Predictive Text

Gaze ........................................ Eye/Head Tracking Recognition Input

Detailed information about each of these is available in another white paper titled Humanizing the Car Experience.

Cognitive Arbitration

In a world of proliferating virtual assistants—both general-purpose and highly specialized—it may be preferable for the
user to not think about which assistant to invoke for a given request. This becomes particularly relevant when considering
driver distraction and “cognitive load” in UX design. Dragon Drive can route any particular command to the most
appropriate virtual assistant or bot. These could be OEM enterprise systems and/or third-party assistants (e.g., Amazon,
Apple, Google, Microsoft, etc.).

For example, if the user tells the automotive assistant to unlock the back door at home, it can recognize that this is a
command for the user’s smart home system to fulfill and automatically pass it along, as well as receive and share the
confirmation with the user. Similarly, when the user asks the automotive assistant how many payments remain on the
vehicle’s lease, the query could be referred to the OEM’s captive finance company’s assistant or backend interface. We
refer to this as interoperability enabled through cognitive arbitration.
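
The routing behavior in these examples can be sketched as a simple dispatch table in Python. This is a minimal sketch, assuming the domain of a request has already been determined; the Arbitrator class, the domain labels and the three handler functions are hypothetical placeholders and do not represent the Dragon Drive arbitration interface.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


class Arbitrator:
    """Routes a request to the assistant registered for its domain."""

    def __init__(self, default: Handler) -> None:
        self._routes: Dict[str, Handler] = {}
        self._default = default

    def register(self, domain: str, handler: Handler) -> None:
        """Associate a domain with the assistant that should fulfill it."""
        self._routes[domain] = handler

    def route(self, domain: str, utterance: str) -> str:
        """Forward the utterance and return the assistant's confirmation."""
        handler = self._routes.get(domain, self._default)
        return handler(utterance)


# Hypothetical assistants used only for illustration.
def in_car_assistant(utterance: str) -> str:
    return f"in-car: handled '{utterance}'"


def smart_home_assistant(utterance: str) -> str:
    return f"smart-home: handled '{utterance}'"


def finance_assistant(utterance: str) -> str:
    return f"finance: handled '{utterance}'"


arbitrator = Arbitrator(default=in_car_assistant)
arbitrator.register("smart_home", smart_home_assistant)
arbitrator.register("lease", finance_assistant)
```

Here a request tagged with the assumed "smart_home" domain is passed to the smart home assistant and its confirmation is returned to the caller, while any domain without a registered assistant stays with the in-car assistant.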

There are a few approaches an OEM could take to implement interoperability into their automotive assistant design. In an
ideal architecture, embedded and cloud arbitration modules would work together to support both explicit and implicit
arbitration cases. Please read our white paper titled Cognitive Arbitration for a deeper look into this truly innovative
advancement that aims to address one of the biggest unmet consumer desires today.


Benefits of Embedded Design

The benefits of an infotainment architecture with embedded components are numerous and essential to provide an
experience that meets the needs of even “typical” users in the next vehicles they purchase or lease. Customer
expectations of virtual assistants are rising quickly, and automotive brands that do not keep pace in their systems will risk
becoming competitively disadvantaged. Here are the top-line benefits of having an embedded foundation for the
automotive assistant in the design of your infotainment system:

Performance
Nearly instantaneous speech recognition and understanding processing, as well as ultra-low latency
for speech output from the system.

Quality
Combining Speech Signal Enhancement and Automatic Speech Recognition technologies on-board
ensures the cleanest, most accurate speech input, for a truly automotive-grade solution.

Reliability
On-board systems are available anywhere, anytime, regardless of cellular connectivity.

Sustainability
Future-proofing is possible with over-the-air software updates. With DDFW dialog plug-ins, complete
domains can be added to an existing deployed speech system.

Features
Just Talk, Wake-up Word and Barge-in are only possible on an embedded platform.

Vehicle Systems Integration
Dragon Drive Framework enables tight integration with native vehicle applications, sensors and HMI.

Privacy
Users’ utterances and the system’s output are processed on-board and immediately purged.

Cost
On-board processing reduces or eliminates cellular data transmission costs to the cloud.


Company Background

About Nuance
Nuance Communications, Inc. (NASDAQ: NUAN) is seen as the leading provider of voice and language solutions for
businesses and consumers around the world. Its technologies, applications and services make the user experience more
compelling by transforming the way people interact with information. Every day, millions of users and thousands of
businesses experience Nuance’s proven applications and professional services.

Nuance is reinventing the relationship between people and technology through speech and language solutions driven by
advances in Artificial Intelligence and cognitive computing. It has pioneered the evolution of speech recognition
technology that today integrates Artificial Intelligence (AI) to transform the way people interact with the devices, systems,
apps, and services that surround them. Every day, millions of people and thousands of organizations experience our
technology through intelligent systems that can listen, understand, learn, reason and facilitate life and work. Our clients
span large companies and organizations, including hospitals, banks, airlines, carriers and car manufacturers that leverage
our technologies and services to make businesses and products run more smoothly and create a better experience.

Speech is one of the most natural and intuitive ways to interact with devices, applications and systems, lessening our
reliance on the mouse, keyboard and touchscreen. We have developed a broad portfolio of speech recognition and
Natural Language Understanding (NLU) technologies that integrate machine learning and big knowledge for the variety of
systems and services that leverage virtual and collaborative assistant offerings across devices and services in the Mobile,
Enterprise and Healthcare industries. Further, our Document Imaging business drives increased productivity and security
for the world’s largest enterprises that need to gain control over document capture and workflows.

About Nuance Automotive
Speech recognition, NLU, AI and predictive touch solutions from Nuance have pioneered many of the personal assistant
technologies and intelligent systems in the devices we use every day from the world’s leading brands—including mobile
devices, cars, televisions, wearable devices, and now the emerging ecosystem of the Internet of Things. We deliver a
more human experience with technology, keeping consumers better connected and informed—consistently adapting to
and predicting their needs.

The Nuance Automotive business delivers automotive-grade solutions giving drivers all over the world access to
information and services and providing the safest, smartest and most natural user experience. Nuance’s voice technology
has been shipped in more than 200 million cars from BMW, Ford, GM, Mercedes-Benz, Toyota, Volkswagen and other
major automakers and is at the heart of over 14 million connected car experiences on the market today. Nuance’s Dragon
Drive provides the industry’s most comprehensive suite of solutions for the connected car, giving automakers and
suppliers the ability to integrate a natural language voice interface, content and connectivity that is customized for their
individual brand.
