Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Wideband speech
Wideband
Telephone-band
capability
capability Wideband
G.711 bitstream enhancement
bitstream
2. Process up to the approval of G.711.1 AAP was completed in March 2008. That is, the pro-
posed candidate was approved as a new ITU-T stan-
To provide wideband speech communication in dard, G.711.1.
commercial NGN services, NTT Labs. proposed
beginning standardization of wideband speech cod- 3. Concepts of G.711.1
ing scalable with G.711 at the ITU-T meeting held in
January 2008. It was recognized that such a codec 3.1 Wideband speech capability
would have advantages in situations where new wide- Since the frequency bandwidth of the telephone-
band terminals and legacy G.711 terminals co-exist band speech coded by G.711 is limited to the range
and the start of standardization was agreed. After- from 300 Hz to 3.4 kHz, it has enough quality to
ward, NTT acted as a leader of the task and held two handle conversations, but it loses the clearness and
responsible positions: moderator in charge of coordi- naturalness of human voices slightly. The first prior-
nation and editor in charge of drafting the recommen- ity for the concepts required for G.711.1 was the abil-
dation. NTT and four other organizationsETRI ity to handle wideband speech (50 Hz to 7 kHz),
(Korea), France Telecom (France), Huawei Technolo- which can transmit properties that are lost in the tele-
gies (China), and VoiceAge (Canada)jointly pro- phone band. It must also transmit both music and
posed a candidate algorithm, which combined UEM- environmental sounds with a high level of listening
CLIP and technologies of the other four organiza- quality.
tions. Characterization tests, which were conducted
in ITU-T using subjective listening, confirmed that 3.2 Bitstream scalability with G.711
this algorithm met all the requirements in terms of Several wideband speech coding standards have
listening quality. As a result, at the ITU-T SG16 already been established, but their concepts do not
WP3*3 meeting held in February 2008, consent was include scalability with G.711. If the bitstream of the
given for the candidate codec to progress to the alter- G.711 core layer and that of the enhancement layer
native approval process (AAP)*4, which is the final are multiplexed for bandwidth expansion, wideband
approval process for ITU-T standardization. This speech can be obtained using all parts of the bit-
stream, while telephone-band speech can be obtained
*3 ITU-T SG16 WP3: Study Group 16 (SG16) is responsible for from the G.711 part (Fig. 1). This bitstream structure
standardization related to multimedia terminals, systems, and ap- brings two advantages, as described below.
plications. Working Party 3 (WP3) manages overall issues about
media coding.
*4 AAP: The alternative approval process is applied for technical
3.2.1 Transcoding between G.711 and G.711.1
standards. The last call for comments is conducted online after Until wideband speech terminals completely
consent has been granted at the ITU-T SG meeting. This can re- replace the telephone-band ones, both types of termi-
duce the period until approval to about two months. nals will continue to coexist. The codec to be used
Private Public
VoIP gateway
Need
Wideband
transcoding
terminal A
Private Public
IP phone IP phone
Transfer systems services
during a communication session is usually negotiated sary to complete all the following processes in the
between the terminals when the call is set up. In the mixing server simultaneously: N decoding processes
coexistence situation, in call transfers from wideband that produce N speech signals from N locations, N
terminals to legacy terminals, transcoding will be mixing processes that prepare N mixed signals for N
needed after the negotiation has been established locations, and N re-encoding processes that generate
(Fig. 2). Such transcoding is generally performed in N bitstreams for N locations. Since the existing wide-
the following manner: decoding of the bitstream from band speech codecs require much more computation
the wideband terminal and re-encoding of the decod- than G.711 ones, they require much greater facilities
ed speech signals into the G.711 bitstream for the in terms of both cost and scale to provide multipoint
legacy terminals. For transcoding between different conferencing systems and services than the usual
types of bitstreams at a media gateway on a network, wideband codecs. This problem regarding the wide-
much more computation would be expected for the band speech mixing can be solved by introducing
wideband case. Moreover, quality degradation caused Partial Mixing [2], which was developed by NTT
by the transcoding cannot be ignored. However, the Labs. As shown in Fig. 3, partial mixing is performed
introduction of a bitstream structure scalable with the in the following way. Since the amount of computa-
G.711 bitstream, as described here, enables transcod- tion needed for decoding and re-encoding G.711 is
ing to be completed just by extracting the G.711 bit- much less, the G.711 core bitstreams are merged
stream. This needs hardly any additional computation using the conventional mixing. The conventional
and there will never be any quality degradation mixing is not applied to the enhancement layer. The
caused by re-encoding. current location of the speaker is detected by analyz-
ing the G.711 decoded signals and then the enhance-
3.2.2 Wideband speech mixing ment bitstream, received from the speaker site, is
In multipoint conferencing, the mixing process used for all locations. Each G.711 bitstream obtained
must be performed at a signal mixer. In general, the by conventional mixing and the enhancement bit-
signal mixer decodes all bitstreams from all loca- stream from the speakers location are multiplexed
tions, merges the decoded signals into one speech and transmitted to every location. Partial mixing
signal, re-encodes the merged signal into the bit- enables wideband speech mixing by just adding the
stream, and then transmits it to all locations. Howev- slight computation required for detecting the speak-
er, to avoid echoes generated by sending back the ers location to that for conventional G.711 mixing.
signal received from each location, the mixed signal
for each location must be prepared individually by 3.3 Packet loss concealment
removing that locations decoded signal from the full Packet loss concealment is necessary for the real-
signal and then re-encoding it. That is, for multipoint- time voice communication provided on an IP net-
conferencing between N different points, it is neces- work. On a best-effort network, packets might not
Location B
reach the receiver side with the proper timing. In that first wideband speech coding method ever standard-
case, the speech would be interrupted. Conventional ized in ITU-T.
VoIP systems usually have jitter buffers to minimize
the delay variation in packet arrival. However, mak- 4. Technical descriptions of G.711.1
ing the jitter buffer longer would mean increasing the
speech delay, so the buffer length cannot be length- A high-level block diagram of G.711.1 is shown in
ened without limit. G.711.1 has a mechanism for Fig. 4. The input signal, which is sampled at 16 kHz,
producing speech signals without interruption under is processed frame-by-frame with a frame length of 5
conditions where several packet losses might occur ms. The input is split into lower-band and higher-
successively. band signals by an analysis quadrature mirror filter.
The lower-band signal is encoded with an embedded
3.4 Low delay and low computation lower-band pulse code modulation (PCM) encoder,
The lower the speech delay arising from the speech which generates a G.711-compatible core bitstream
coding algorithm itself, the better the conversational at 64 kbit/s and a lower-band enhancement bitstream
quality. The frame length of G.711.1 has been con- at 16 kbit/s. The higher-band signal is transformed
firmed to be 5 ms, so the speech packet length is set into the modified discrete cosine transform (MDCT)
to the minimum of 5 ms or to a multiple of 5 ms. The domain and the frequency domain coefficients are
algorithmic delay including the frame length is set to encoded by the higher-band MDCT encoder, which
11.875 ms. generates a higher-band enhancement bitstream at 16
Lower cost is also one of the most important factors kbit/s. The lower-band enhancement bitstream
for promoting the spread of wideband voice commu- improves the speech quality of the lower-band (50 Hz
nication services. The algorithm of G.711.1 is to 4 kHz) and the higher-band enhancement layer
designed to process speech signals with as little com- adds wideband capability (4 to 7 kHz). These three
putation as possible so that it can be installed on bitstreams are multiplexed and transmitted to the
inexpensive digital signal processors. The complexity decoder. The bitstream received at the decoder is
of G.711.1, which is estimated using fixed-point demultiplexed into three bitstreams. The G.711 bit-
simulation software, is 8.70 WMOPS (weighted mil- stream and the lower-band enhancement bitstream
lion operations per second)*5 in the worst case. This are handed to the lower-band embedded PCM decod-
is comparable to that for ITU-T G.722, which is the ers. The higher-band enhancement bitstream is given
to the higher-band MDCT decoder, and the decoded
*5 WMOPS: An abbreviation of weighted million operations per signal in the frequency domain is subsequently fed to
second. It indicates how many operations, simulating digital sig- an inverse MDCT (IMDCT), and the higher-band
nal processor operations, are performed per second. signal in the time domain is obtained. The lower- and
G.711.1 encoder
Noise shaping
Lower-band G.711
Wideband signal Lower-band embedded G.711 bitstream
input PCM encoder encoder
Lower-band
enhancement
Pre-processing Analysis bitstream Output
MUX
filter QMF bitstream
Higher-band
Higher-band enhancement
signal Higher-band bitstream
MDCT MDCT
encoder
G.711.1 decoder
Noise shaping
G.711 Lower-band
bitstream Lower-band Lower-band signal
G.711 embedded PCM frame erasure
decoder decoder concealment
Lower-band enhancement
Input bitstream Synthesis Wideband
DeMUX Higher-band QMF output
bitstream
Higher-band frame erasure
enhancement concealment Higher-band
bitstream signal
Higher-band
MDCT IMDCT
decoder
R1 8 64
R2a 8 80
R2b 16 80
R3 16 96
higher-band signals are combined using a synthesis can perceptually suppress the G.711 quantiza-
quadrature mirror filter to generate a wideband output tion noise in encoding and also improves the
signal. Thus, one of the concepts mentioned above, listening quality of the speech decoded at legacy
bitstream scalability with G.711, can be obtained. telephone terminals.
The four modes, which differ in sampling rate and T he higher-band MDCT encoder efficiently
bitrate, and respective combinations of the three bit- compresses the input MDCT coefficients at 16
streams are given in Table 1. The structure described kbit/s using interleaved conjugate-structured
here is basically equal to that of UEMCLIP. vector quantization (compression rate of the
The main technologies introduced in G.711.1 are higher-band MDCT encoder: 1/8). The pre-
listed below. selection method and table optimization can
N
oise shaping using linear prediction is applied greatly decrease the amount of computation
to the lower-band embedded PCM encoder. It required for vector quantization.
5 5
Speech Music
4 4
MOS 3 MOS 3
2 2
1 1
Original G.722 G.722 R2b R3 Original G.722 G.722 R2b R3
56 kbit/s 64 Kbit/s 80 kbit/s 96 kbit/s 56 kbit/s 64 Kbit/s 80 kbit/s 96 kbit/s