Translation and the Web in the era of Machine Translation

Translation and the Web in the era of Machine Translation (MT)

How to use MT and write/edit texts

Hellmut Riediger, Gabriele Galati

Key words: machine translation, simplified writing, writing for the web, sustainable communication


Machine Translation has become an essential tool for professional translators.

Post-Editing Machine Translation (PEMT) is a key trend in the domain of professional translation.

The volume of text requiring MT is on the rise; this trend has forced the market to look for professionals able to analyse and post-edit the output of an automatic translator in line with customer requirements. The work also includes extending the usability and improving the performance of MT itself.

Here we describe the skills and uses that MT involves beyond post-editing, especially in connection with web pages written in simplified language.

In 1965 Italo Calvino wrote a famous article (Calvino 1980) in which he inveighed against the "antilingua" ("anti-language") of bureaucracy. At the end of the article, he predicted that

«…Our age is characterized by this contradiction: on the one hand we need everything that is said to be immediately translatable into other languages; on the other we realise that every language is a self-contained system of thought and by definition untranslatable. My prediction is this: each language will revolve around two poles. One pole is immediate translatability into other languages, which will come close to a sort of all-embracing, high-level interlanguage; the other pole will be where the singular and secret essence of the language, which is by definition untranslatable, is distilled and entrusted to diverse linguistic institutions such as popular argot and the poetic creativity of literature»

Fifty years on, MT has become the major instrument of the first pole, i.e. "immediate translatability". Despite the scepticism and resistance of many human translators, MT has for several years been taking on a central role in the language and translation industry, to the extent that in 2014 the MT market was worth 250 million dollars, with a growth trend that looks irreversible (van der Meer 2014b).

Post-editing of machine translation (PEMT) is a skill increasingly demanded of professional translators, while using MT for multilingual terminology search and for retrieving parallel texts is now part of routine translation assignments.

Technology is steadily improving, but a few limitations temper the easy enthusiasm of those who think that human input can one day be dispensed with altogether. To some extent, the question is not whether humans will be replaced, but what can be done to improve the performance of these systems so as to reduce costs, take over less gratifying tasks and create new professional opportunities.

One paramount question is how to write or edit source texts so that they are easily intelligible and translatable, using simplified language and pre-editing (see also Muegge 2007), in order to optimise:

- translation time and the related cost;

- texts for web users who regularly use online automatic translation to get the gist of content in languages unknown to them.

We present the results of research we have carried out on this issue in the Weaver laboratory.

1 MT development and use

1.1 Development

Until a few years ago, scepticism about MT was widespread among translators. A renowned spokesperson for the MT sceptics was Umberto Eco. His essay on translation, "Say almost the same thing", opens with the chapter "Synonyms of Altavista" (Eco 2007, 25 et seq.), which presents a series of translation examples produced in the late 1990s by Babel Fish, the MT system then used by the Altavista search engine. Among the examples are hilarious automatic translations of the Bible (Genesis) in several language combinations, together with the terms listed in the following table:

English source text | Italian translation by Babel Fish
The Works of Shakespeare | Gli impianti di Shakespeare
Harcourt Brace | sostegno di Harcourt
Speaker of the Chamber of Deputies | Altoparlante dell’alloggiamento dei delegati
Studies in the logic of Charles Sanders Peirce | Studi nella logica delle sabbiatrici Peirce del Charles

In 2014 we translated the same phrases using Google Translate and obtained these results:

English source text | Italian translation by Google Translate
The Works of Shakespeare | Le opere di Shakespeare
Harcourt Brace | Harcourt Brace
Speaker of the Chamber of Deputies | Presidente della Camera dei deputati
Studies in the logic of Charles Sanders Peirce | Studi nella logica di Charles Sanders Peirce

This is an undeniable improvement. What has happened in these past few years?

Modern MT systems use statistical approaches. In other words, they derive translations from online corpora of multilingual, human-produced texts, which are constantly updated and corrected through user contributions, by looking for identical or similar pre-existing texts and translations.
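The idea of reusing "the same or similar pre-existing texts and translations" can be illustrated with a deliberately tiny sketch. It is only a fuzzy lookup in a three-entry parallel corpus, not a statistical MT engine: real systems estimate probabilities over very large corpora. The function name lookup_similar and the similarity cutoff are assumptions made for this illustration; the corpus entries are the Google Translate results quoted in this article.

```python
import difflib

# Toy parallel corpus (English -> Italian), taken from the examples in this article.
PARALLEL_CORPUS = {
    "The Works of Shakespeare": "Le opere di Shakespeare",
    "Speaker of the Chamber of Deputies": "Presidente della Camera dei deputati",
    "Studies in the logic of Charles Sanders Peirce":
        "Studi nella logica di Charles Sanders Peirce",
}


def lookup_similar(source, corpus=PARALLEL_CORPUS, cutoff=0.6):
    """Return (stored_source, stored_translation) for the closest stored
    segment, or None when nothing is similar enough.

    Hypothetical helper: the cutoff of 0.6 is an arbitrary choice.
    """
    matches = difflib.get_close_matches(source, list(corpus), n=1, cutoff=cutoff)
    if not matches:
        return None
    best = matches[0]
    return best, corpus[best]
```

A query that differs only slightly from a stored segment, such as "The works of Shakespeare", retrieves the stored human translation; a query with nothing similar in the corpus returns None. The larger and better maintained the corpus, the more often a usable match exists.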

In short, two things can be said of statistical MT systems:

- the more you use them (properly), the better they work;

- the longer they are online (in the "cloud" or in other shared databases), the more accessible they become to an ever larger number of people.

This is a revolutionary development. Research (see e.g. Plitt and Masselot 2010; Pym 2012) has shown a significant increase in productivity, both in the classroom and at professional level. The revolution affects technology and, above all, the role and social function of translation itself.

TAUS is an organisation that promotes MT. It has coined the expression "Convergence Era", meaning that MT is not an end in itself but an essential tool in an era in which content must be available in all languages at all times. Translation becomes a utility in its own right, like water, electricity or the internet, and merges into apps, search tools, social media and the Internet of Things. It will not be perfect, but the need for real-time communication prevails over linguistic excellence (cf. van der Meer 2014a).

What are the consequences for professionals and for translator training? We expect that statistical MT, with its many hybrids, will soon turn many translators into post-editors, pre-editors and managers of translation systems. There is an urgent need to rethink the basic make-up of our training programmes and to update our models of translation skills.

1.2 MT use and connected activities

- CAT-tool integrated MT

- CAT-tool integrated pre-MT

- full post-editing: human translation and high-quality revision, also known as "publishable quality"

- light post-editing: human translation and low-level revision, also known as "good enough" or "fit for purpose" (cf. TAUS-Evaluation 2010)

- source text alteration (abbreviation, simplification) to ease post-editing and make it economically viable.

1.3 Other uses

Use MT as a dictionary

Figure 1 - MT of a term with Google Translate


An increasingly common practice is to use MT as a dictionary. Google Translate for instance contains various bilingual dictionaries. A term is listed against the corresponding entry. In addition, Google allows its users to create their own customised dictionaries that one can use along with MT. Many online dictionaries offer similar resources.

Statistical MT is widely used as a tool for finding the equivalent of a specialised term across a set of languages. In many cases, you can achieve better results in far less time than with dictionaries and traditional terminology databases.

Find parallel texts with MT

Some translation tools use MT to translate automatically the search string entered by the user and show the Google results retrieved in two languages on the same page at the same time.

Figure 2


2 Sustainable communication: the importance of using simplified language and writing

Within the MT domain, a controlled language means writing texts made up of simple, intelligible sentences with a basic vocabulary and simplified syntax. This kind of writing resembles other initiatives and recommendations linked to the many forms of sustainable communication, such as the use of simplified language and writing at European and international level in various contexts (educational, technical, bureaucratic), web writing and accessibility.

In short, a controlled language is based on a set of rules such as short sentences, a basic vocabulary, simplified verb forms, limited subordination and explicit subjects. You can check readability indexes with tools such as Gulpease, as found in MS Word.
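As a rough sketch of how such a readability check works, the Gulpease index for Italian can be computed as 89 + (300 × sentences − 10 × letters) / words, where higher values mean easier text. The tokenisation below is an assumption of this sketch (letters are Unicode letters, a sentence ends at a run of ".", "!" or "?", and an apostrophe splits a word in two), so results may differ slightly from those reported by MS Word or other tools.

```python
import re


def gulpease(text):
    """Gulpease readability index: 89 + (300 * sentences - 10 * letters) / words.

    Simplifying assumptions: words are runs of letters, and every run of
    '.', '!' or '?' closes a sentence (so abbreviations such as "D." are
    counted as sentence ends).
    """
    words = re.findall(r"[^\W\d_]+", text)          # runs of Unicode letters
    if not words:
        raise ValueError("text contains no words")
    letters = sum(len(word) for word in words)
    sentences = len(re.findall(r"[.!?]+", text)) or 1  # at least one sentence
    return 89 + (300 * sentences - 10 * letters) / len(words)
```

Very short, simple texts can score above 100 with this formula; the sketch does not clip the value. Comparing the score of a source text before and after rewriting gives a quick, if crude, indication of whether the rewriting made it simpler.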

We have summarized the essential principles of controlled language in the acronym BASICO with each letter standing for:

- Brief: fewer than 25 words per sentence or 500 characters per paragraph.

- Active: prefer the active form to the passive; avoid gerunds and impersonal clauses.

- Simple: use simple words; avoid rare or polysemous words and technical terms.

- Incisive (or trenchant): one idea per sentence; no subordinate or non-restrictive clauses.

- Clear: always state subject and object; avoid pronouns and pronominal particles; say the same thing in the same way, without fearing repetition and without using synonyms for the same word.

- Optimise for the destination medium.
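Of the BASICO principles, "Brief" is the easiest to check mechanically. The following is a minimal sketch of such a check, assuming that paragraphs are separated by blank lines and that a sentence ends at ".", "!" or "?" followed by whitespace; the function name check_brief and the wording of the warnings are illustrative only.

```python
import re

MAX_WORDS_PER_SENTENCE = 25     # "Brief": fewer than 25 words per sentence
MAX_CHARS_PER_PARAGRAPH = 500   # "Brief": fewer than 500 characters per paragraph


def check_brief(text):
    """Return a list of warnings for sentences and paragraphs that are too long."""
    warnings = []
    # Assumption: a blank line separates paragraphs.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    for p_no, paragraph in enumerate(paragraphs, start=1):
        if len(paragraph) >= MAX_CHARS_PER_PARAGRAPH:
            warnings.append(f"paragraph {p_no}: {len(paragraph)} characters")
        # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]
        for s_no, sentence in enumerate(sentences, start=1):
            n_words = len(sentence.split())
            if n_words >= MAX_WORDS_PER_SENTENCE:
                warnings.append(
                    f"paragraph {p_no}, sentence {s_no}: {n_words} words"
                )
    return warnings
```

An empty list means the text respects the length limits. The other principles (active voice, simple vocabulary, explicit subjects) need linguistic analysis and are better left to a human pre-editor.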

2.1 MT in the multilingual web

If you want to be competitive in foreign markets and attract traffic to your website, you must speak the language of your clients or at least make your texts intelligible to those who do not speak your language. There are two ways to boost online visibility and reach new language areas: translate your website into other languages, or make the site accessible to those who use automatic translation tools. More and more people use MT to translate web pages from languages they do not understand. For instance, in 2012 Google Translate received 200 million translation requests per month (Och 2012), equivalent to 2.4 billion requests per year, 92% of which came from countries outside the US for languages other than English. If you write translatable texts, the potential number of readers and users increases, to the benefit of internationalisation.

2.2 Example 1: Expo 2015

Milan Expo 2015 is a great opportunity to attract foreign visitors to Italy. Twenty million visitors from over 140 countries, about two thirds of all the countries in the world, are expected to visit the Expo, and they speak more than 60 languages.

Regrettably, the Expo website is only available in Italian, English and French. Here is one of the earliest texts taken from the presentation section and published on the web:

Expo Milano 2015 intende affrontare la tematica universale e complessa della nutrizione da un punto di vista ambientale, storico, culturale, antropologico, medico, tecnico-scientifico ed economico.

Tale impostazione multidisciplinare crea interessanti intrecci, correlazioni e collegamenti: Expo Milano 2015 propone di affrontare il tema secondo una scansione molto ampia, capace di interrogare e stimolare tutti i livelli della società, affinché emerga la consapevolezza della vastità e della complessità dei fattori che coinvolgono ognuno di noi.

Fin dal Dossier di candidatura il tema generale di Expo Milano 2015 Nutrire il Pianeta, Energia per la Vita è stato declinato nei seguenti sottotemi:

1. Scienza e tecnologia per la sicurezza e la qualità alimentare;

2. Scienza e tecnologia per l’agricoltura e la biodiversità;

3. Innovazione della filiera agroalimentare […]

The text posted on the site is 11,843 characters: too long for a webpage. Even for Italians such a text is convoluted and tortuous.

2.2.1 The following is an unedited MT of the above text into English

Expo 2015 thinks to address the issue of universal and complex nutrition from the point of view of environmental, historical, cultural, anthropological, medical, scientific-technical and economic. This multidisciplinary approach creates interesting plots, correlations and connections: Milan Expo 2015 aims to address the issue according to a scan very large, capable of interrogating and stimulate all levels of society to show awareness of the vastness and complexity of the factors that affect each our [ ]

2.2.2 The following is an unedited MT of the above text into German

Expo Milano 2015 beabsichtigt, die Frage der universellen und komplexe Ernährung aus anzugehen Sicht, historische organic, kulturelle, anthropologische, medizinische, wissenschaftlich-technischen und wirtschaftlichen. Dieser Ansatz multidisziplinäre schafft interesting Grundstücke, Zusammenhänge und Verbindungen: Milan Expo 2015 zielt darauf ab, die Frage nach einem sehr groß Scan ist, in der Lage, Abfragen und regen to Ebenen der Gesellschaft das Bewusstsein für die Weite und Komplexität der Faktoren, die Einfluss auf die jeweils hervor Adresse von uns [---]

2.2.3 Rewriting the original Italian text

This is an excerpt of the source text rewritten in a controlled language. We cut the text from over 11,000 characters to just 4,200.

Expo Milano 2015 Nutrire il Pianeta, Energia per la Vita ha individuato tre aree tematiche che si sviluppano in diversi sottotemi:

A. Area tecnico-scientifica

Prende in esame i processi produttivi, le politiche e i meccanismi di mercato. I sottotemi sono:

Scienza e tecnologia per la sicurezza e la qualità alimentare;

Scienza e tecnologia per l’agricoltura e la biodiversità;

Innovazione della filiera agroalimentare […]

2.2.4 MT of the rewritten text into English

The following is an MT rendering into English using Google Translate. It is easy to see that rewriting the source also improved the translation.

Milan Expo 2015 "Feeding the Planet, Energy for Life" has identified three areas that develop in different sub-themes: a. Technical-scientific

It examines the processes, policies and market mechanisms. The sub-themes are:

Science and technology for safety and food quality;

Science and technology for the agriculture and biodiversity;

Innovation in the food industry [

2.2.5 MT of the rewritten text into German

Milan Expo 2015 Den Planet ernähren, Energie für das Leben hat drei Bereiche, die in verschiedenen Unterthemen entwickeln identifiziert:


A. Technisch-wissenschaftliche

ER untersucht die Prozesse, Richtlinien und Marktmechanismen. Die Unterthemen sind:

Wissenschaft und Technologie für Sicherheit und Lebensqualität.

Wissenschaft und Technik für die Landwirtschaft und Biodiversität [ ]

2.3 Example 2: research into the performance of MT

In 2013 we conducted a study of the efficacy of MT (Galati 2013) based on an adequacy criterion. With the help of nine online translators, we translated a brief text taken from the Encyclomedia into six languages.

We rated the adequacy of each translation on the following scale:

3 - perfectly acceptable translation; does not need revision

2 - intelligible translation requiring style adjustments

1 - intelligible translation with inconsistent grammar and with language and style errors

0 - unintelligible translation; requires rewriting
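To compare engines or languages, ratings on this scale have to be aggregated in some way. The sketch below shows one possible way, a plain arithmetic mean; the aggregation method, the class name Adequacy and both function names are assumptions of this illustration and are not taken from the study.

```python
from enum import IntEnum
from statistics import mean


class Adequacy(IntEnum):
    """The four-point adequacy scale listed above."""
    UNINTELLIGIBLE = 0       # requires rewriting
    INTELLIGIBLE_ERRORS = 1  # intelligible, with grammar, language and style errors
    INTELLIGIBLE_STYLE = 2   # intelligible, requires style adjustments
    ACCEPTABLE = 3           # perfectly acceptable, needs no revision


def average_adequacy(ratings):
    """Mean score of a non-empty sequence of Adequacy ratings."""
    return mean(int(rating) for rating in ratings)


def pre_editing_gain(before, after):
    """Mean adequacy with the pre-edited source minus mean adequacy with
    the original source; a positive value means pre-editing helped."""
    return average_adequacy(after) - average_adequacy(before)
```

Given the ratings of one engine for the original and for the rewritten source text, pre_editing_gain returns a positive number when pre-editing improved the output on average.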

For the sake of comparison, the source text and the rewritten text are listed in the table below:

Source text | Rewritten text
Poeta italiano. | Poeta italiano.
Studia grammatica e retorica, entrando in contatto con i principali esponenti dell'ambiente culturale fiorentino, tra cui Brunetto Latini, Lapo Gianni, Guittone d'Arezzo e Guido Cavalcanti. | D. studia grammatica e retorica, entrando in contatto con i principali rappresentanti dell'ambiente culturale fiorentino, tra cui Brunetto Latini, Lapo Gianni, Guittone d'Arezzo e Guido Cavalcanti.
A partire dal 1292 partecipa alla vita politica del comune fiorentino, sostenendo la fazione guelfa contro i ghibellini. | A partire dal 1292 egli prese parte alla vita politica del comune fiorentino, sostenendo la fazione guelfa contro i ghibellini.
Dopo la spaccatura dei guelfi, nel 1302 viene esiliato dai Neri. | Dopo la spaccatura dei guelfi, nel 1302 D. venne esiliato dai Neri.
Con il trattato De vulgari eloquentia conferisce al volgare dignità di lingua letteraria e scientifica, come lui stesso dimostra nel Convivio. | Con il trattato De vulgari eloquentia D. promuove l'uso del "volgare" nelle scienze e in letteratura, come lui stesso dimostra nel Convivio.
Esponente del "dolce stil novo", ha tra le sue opere più importanti la Vita Nova, il De Monarchia e la Commedia. | Rappresentante del "Dolce stil novo", egli ha tra le sue opere più importanti la Vita Nova, il De Monarchia e la Commedia.

The study showed that MT output clearly improves when the source text is pre-edited.

2.4 Example 3: PROMAC site

In 2014 we conducted a further study on a sample from the multilingual website of an Italian coffee-machine manufacturer, Promac Italy. The study compared the quality of the English, French, Spanish and German translations published on the website with that of Google MT, applying the acceptability criterion described above. The two texts examined were rather different: one was more descriptive and presented the company, the other was more technical and presented its products. First, we assessed the quality of the translations on the website, which ranged from poor to sufficient. When the original Italian source text was used, MT quality was on average equal to or lower than that of the website translations, with results varying considerably by language and by type of text. After light pre-editing of the Italian text, however, Google MT on average fared better than the human translations on the site.

Figure 3


The study confirms a hypothesis that had already been verified: the combined use of simplified writing and MT can pay off in terms of both quality and cost.

2.5 Conclusions: why is a "controlled language" useful and profitable?

Drafting Italian texts in a controlled language makes them:

- intelligible to ordinary readers;

- intelligible to web page readers;

- easily translatable by anyone who knows our language;

- better translated with MT;

- available via MT to those who do not know our language.

Editing texts in a controlled language saves money on translation.

The saving comes from simplifying the texts and reducing their overall length.

It also comes from easing the human translator's work:

- the final text is easier to translate, so the translator needs less time;

- the final text can be translated automatically and then revised by the translator.

A simplified text also lets web users run MT into other languages and obtain a more intelligible translation.




