Minimum Requirements on E-PUB for Indian languages text layout

This Version: 1.0


Latest Version: http://w3cindia.in/word_pdf/epub_mini_requr.pdf
Working Draft: First working draft

Introduction:
This document describes the minimum requirements for Indian-language text
layout in content formats used for e-publishing.
This document covers the major issues of e-content in Indian languages in order
to create a standardized format for text layout, including storage, rendering
problems, vertical writing, margin areas, page numbers, running heads, and line
breaking, as well as CSS requirements for Indian languages.
The main purpose of this document is to gather the information from E-publishers
about the page text layout they are using for E-publishing.
Storage Requirements:
1. Support of Unicode 6.2 and IVS
For global language support, EPUB should support Unicode, and should also
support SVG Fonts and IVS (Ideographic Variation Sequences).
Unicode is the universal character encoding standard used for representing
text for information processing. It aims to encode the individual characters
of all the written languages of the world, and the standard provides
information about the characters and their use. Unicode is not limited to
16 bits: its code space runs from U+0000 to U+10FFFF (1,114,112 code
points), and text is stored using an encoding form such as UTF-8, UTF-16,
or UTF-32. It assigns each character a unique numeric value (code point),
conventionally written in hexadecimal, and a name.
Reference URL : http://www.unicode.org/versions/Unicode6.2.0/

Copyright@W3C India

The Common Locale Data Repository (CLDR) is the largest standard repository
of locale data in the world. It is a project of the Unicode Consortium. It
provides locale data in an XML format for use in computer applications. It
facilitates the sharing of locale-related information among applications
regardless of their domains. Its goal is to provide basic linguistic
information for diverse locales in an open, interoperable form.
This data is usable for localizing applications.
Some examples of the information that CLDR gathers for languages and
territories are:
- Date formats
- Time zones
- Number formats
- Currency and its formats
- Measurement systems
- Collation (sort order) specification: sorting, searching and matching
- Translations of names for language, territory, script, time zones, currencies
- Script and exemplar characters used by a language
- Calendaring rules, formats and important dates
- Specification of selected but universal cultural terminologies
Reference URL: http://cldr.unicode.org/

IVS (Ideographic Variation Sequence): Characters in the Unicode Standard
can be represented by a wide variety of glyphs. Occasionally the need arises
in text processing to restrict or change the set of glyphs that are to be used
to display a character. In special circumstances, this restriction needs to be
expressed in plain text rather than by font selection or some other rich text
mechanism. The Unicode Standard accommodates those circumstances
with variation selectors: the code point of a graphic character can be
followed by the code point of a variation selector to identify a restriction on
the graphic character. The combination of a graphic character and a
variation selector is known as a variation sequence. An Ideographic
Variation Sequence (IVS) is a sequence of two coded characters, the first
being a character with the Unified Ideograph property, the second being a
variation selector character in the range U+E0100 to U+E01EF.
A glyphic subset for a given character is a subset of the glyphs that are
appropriate for displaying that character.
Reference URL : http://www.unicode.org/reports/tr37/

2. Fonts
OpenType fonts map Unicode code points to the glyphs shown on the display
interface. They are directly based on Unicode. OpenType provides a
series of enhancements to the TrueType format, the most significant of
which allows PostScript font data to nest inside a TrueType software
wrapper.
OpenType allows type designers and font foundries to create larger
character sets within fonts. Under the older TrueType and Type 1 formats,
fonts were in practice limited to 256 characters. If a typeface designer
wanted to create an extended ligature set, small caps, swash and alternate
characters, or characters to support multiple languages, these had to be
put into another font. The large character set capabilities of OpenType
allow type designers much more latitude in typeface design, resulting in
better graphic communication.
SVG Fonts: The purpose of SVG fonts is to allow for delivery of glyph
outlines in display-only environments. SVG fonts that accompany Web
pages must be supported only in browsing and viewing situations. Graphics
editing applications or file translation tools must not attempt to convert
SVG fonts into system fonts.
Reference URL: http://www.w3.org/TR/SVG/fonts.html

WOFF (Web Open Font Format):
This format was designed to provide lightweight, easy-to-implement
compression of the font data, suitable for use in conjunction with the
@font-face CSS declaration. Any TrueType/OpenType/Open Font Format
file can be losslessly converted to WOFF for Web use (subject to licensing
of the font data). Once decoded by a user agent, the WOFF font will display
identically to the original desktop font from which it was created.
The WOFF format also allows additional metadata to be attached to the
file; this can be used by font designers to include licensing or other
information, beyond that present in the original font. Such metadata does
not affect the rendering of the font in any way, but may be displayed to the
user on request.
Reference URL: http://www.w3.org/TR/WOFF/
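
As an illustration of how a WOFF file is typically referenced from a
stylesheet, the following is a minimal sketch only. The font name
"Example Devanagari" and the file paths are hypothetical, and the TrueType
fallback is an assumption about older reading systems rather than a
requirement stated in this document.

```css
/* Hypothetical font name and file paths, for illustration only. */
@font-face {
  font-family: "Example Devanagari";
  font-style: normal;
  font-weight: normal;
  /* WOFF is listed first; the TrueType file is a fallback. */
  src: url("Fonts/ExampleDevanagari.woff") format("woff"),
       url("Fonts/ExampleDevanagari.ttf") format("truetype");
  /* Restrict the font to the Devanagari block (U+0900 to U+097F). */
  unicode-range: U+0900-097F;
}

/* Apply the font to content marked as Hindi. */
:lang(hi) {
  font-family: "Example Devanagari", serif;
}
```

Using unicode-range keeps the embedded font from being applied to Latin or
other text that it was not designed for; whether a given reading system
honours it should be tested.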

Page text layout Requirements:


The following issues should help in the implementation of text layout for
Indian languages:
Arrangement of Running Heads and Page Numbers

Positioning of all running heads and page numbers in the same book should
be consistent. The following ways might be used for positioning running
heads and page numbers in horizontal writing system:

Positioning of Consecutive Opening Brackets, Closing Brackets, Commas,
Purna Virama, etc.
In cases where multiple punctuation marks, such as opening brackets,
closing brackets, commas, and the purna virama, come one after the other,
the spacing between them is adjusted.
Vertical writing and horizontal writing
When the principal text direction is horizontal, all text, including page
headers/footers, page numbers, figure captions, table captions, and table
entries, is set in horizontal writing mode.

When the principal text direction is vertical, all text, including table
entries, is set in vertical writing mode.
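
The rule above can be sketched in CSS with the writing-mode property. This
is an illustrative sketch, not a prescribed implementation: the class names
are hypothetical, and the prefixed -epub-writing-mode declaration is included
on the assumption that some EPUB 3 reading systems still expect it.

```css
/* Hypothetical class names set on the root element of each content file. */

/* Principal direction horizontal: lines run left to right, top to bottom. */
.horizontal-book {
  -epub-writing-mode: horizontal-tb;
  writing-mode: horizontal-tb;
}

/* Principal direction vertical: lines run top to bottom, right to left. */
.vertical-book {
  -epub-writing-mode: vertical-rl;
  writing-mode: vertical-rl;
}
```

Because writing-mode is an inherited property, captions, table entries and
other descendants follow the principal direction unless a rule overrides it.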

Paragraph Adjustment Rules

Line Head Indent at the Beginning of Paragraphs:

A paragraph, a section of a document consisting of one or more sentences
that express a distinct idea, usually begins on a new line.

Widow Adjustment of Paragraphs:

The intent of widow adjustment is to prevent the last line of a paragraph
from containing fewer than a given number of characters. This is also
called "widow" processing.

Mixed Text Composition in Horizontal Writing Mode

In horizontal writing mode the basic approach is to use proportional
Western fonts. Example of a proportional Western font used with Indian
languages in horizontal writing mode:
India

Mixed Text Composition in Vertical Writing Mode

In vertical writing mode the basic approach is to use proportional
Western fonts. Example of a proportional Western font used with Indian
languages in vertical writing mode:

I
n
d
i
a
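
The stacked arrangement shown in the example corresponds to upright glyph
orientation in vertical text. A minimal sketch follows, assuming a
hypothetical class "upright-latin" on the Latin run and a hypothetical
"vertical-book" class on the root element; reading-system support for
text-orientation should be verified.

```css
/* Vertical page: by default (mixed), Latin runs are rotated sideways. */
.vertical-book {
  writing-mode: vertical-rl;
  text-orientation: mixed;
}

/* Hypothetical class: set each Latin letter upright, one below the other. */
.vertical-book .upright-latin {
  text-orientation: upright;
}
```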

Styling Requirements:
The following CSS issues should help in the implementation of text layout
for Indian languages:

Drop First letter

The first-letter pseudo-element represents the first letter of the first line of
a block, if it is not preceded by any other content (such as images or inline
tables) on its line. It allows that first letter to be styled individually, without
markup. It may be used for "initial caps" and "drop caps", which are
common typographical effects in text in Latin script.
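
A minimal drop-letter sketch is given below. The class name is hypothetical
and the sizes are arbitrary. For Indic scripts the unit a reader expects to
be enlarged is usually the whole consonant cluster together with its matras;
how much text a given reading system actually selects as the "first letter"
is an assumption that needs testing per script.

```css
/* Hypothetical class for the first paragraph of a chapter. */
p.chapter-opening::first-letter {
  float: left;          /* let the following lines wrap around the letter */
  font-size: 3em;       /* roughly three lines tall; adjust per font */
  line-height: 1;
  margin-right: 0.1em;
  font-weight: bold;
}
```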

Vertical & horizontal writing

Vertical arrangement of characters: if a string is written in vertical
mode, writing each character on a new line may not be suitable, for
example in the vertical arrangement of characters in Hindi.

Line breaking
Unicode Line Breaking Algorithm, UAX #14 (word wrapping)

Characters not starting a line: a line should not begin with the
characters shown below:
closing brackets (cl-02),
hyphens (cl-03),
dividing punctuation marks (cl-04),
middle dots (cl-05),
full stops (cl-06),
commas (cl-07),
iteration marks (cl-09).
Reference URL: http://unicode.org/reports/tr14/
Reference URL: http://www.w3.org/TR/2007/WD-css3-text-20070306/#linebreaking
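
The CSS properties that influence these decisions can be sketched as
follows. This is a hedged illustration: the language selectors are only
examples, and line-break mainly tunes rules for CJK text, so its effect on
Indic scripts is implementation-dependent and is an assumption here.

```css
/* Example selectors for Hindi and Bangla content. */
:lang(hi),
:lang(bn) {
  line-break: strict;        /* request the strictest set of break rules */
  word-break: normal;        /* break only at ordinary opportunities */
  overflow-wrap: break-word; /* last resort for strings that overflow */
  hyphens: manual;           /* hyphenate only at explicit soft hyphens */
}
```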

Indentation
Sometimes some of the characters of a word are indented, as in figure 3:
the is indented
Example in Bangla:

What should the solution or rule be for this type of styling issue in
Indian languages? It is sometimes said that styling is done on the basis
of the syllable, but what is the definition of a syllable? The definition
of a syllable depends on the pronunciation of the word. In the example
the syllables are , , , , but styling is done as
which is not as per the syllable. So the rule should be defined explicitly
instead of being defined on a syllable basis.

Letter spacing

The same applies to horizontal spacing in Indic languages. In English,
horizontal spacing such as C E R T I F I C A T E puts a space between
every character. In Indian languages such as Bangla and Assamese, however,
the space may be placed not after every character but after a portion of
the character sequence, as in the figure below:

Reference URL : http://www.w3.org/TR/2007/WD-css3-text-20070306/#letterspacing
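
In CSS the effect is requested with letter-spacing. The sketch below uses a
hypothetical class name and an arbitrary value. Current CSS text
specifications describe the spacing as applying between typographic
character units rather than between individual code points, but where a
reading system places the gaps in Bangla or Assamese text is an assumption
that should be checked against the grouping expected by readers.

```css
/* Hypothetical class for a spaced-out title. */
.spaced-title {
  letter-spacing: 0.4em;
}

/* Spacing can be tuned, or switched off, per language. */
.spaced-title:lang(bn),
.spaced-title:lang(as) {
  letter-spacing: 0.25em;
}
```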

Underlining

There are examples in Indian languages in which matras are not readable
because the characters are underlined.
Examples (images not reproduced): Hindi, Punjabi, Bengali, Gujarati,
Marathi, Tamil, Telugu.
When we see these pages on the internet, the information is not clearly
readable, because when text in Indian languages is hyperlinked some
modifiers (matras) are cut off, and in Punjabi the underline coincides
with a few matras (small u). This can create problems in reading the
information correctly. Therefore some changes may need to be implemented
in the CSS standards developed by W3C with respect to Indian languages.
Reference URL: http://www.w3.org/TR/CSS2/text.html#decoration
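
Later CSS text decoration modules define properties aimed at this kind of
collision. The sketch below is illustrative only: these properties are newer
than the CSS2 reference above, their support in EPUB reading systems is an
assumption, and the class name and offsets are arbitrary.

```css
/* Keep the underline, but move it clear of below-base matras. */
a:link,
a:visited {
  text-decoration: underline;
  text-decoration-skip-ink: auto; /* interrupt the line where glyphs cross it */
  text-underline-offset: 0.25em;  /* push the line further below the baseline */
}

/* Fallback idea (hypothetical class): draw the line as a border instead. */
a.border-link {
  text-decoration: none;
  border-bottom: 1px solid currentColor;
  padding-bottom: 0.15em;
}
```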

CSS Embedded fonts

First, add the font to your book files in the normal way, by adding an
@font-face statement at the beginning of your CSS, something like this:
@font-face {
    /* Quote family names that contain spaces. */
    font-family: "Prophecy Script";
    font-style: normal;
    font-weight: normal;
    src: url("Fonts/Prophecy_Script.ttf");
}
That makes the font available. To apply it to your text, add it to one of
your styles, also in the CSS:
p.letter {
    font-family: "Prophecy Script";
    font-weight: normal;
    font-style: normal;
    font-size: 1em;
    margin: 1em 0 0 0;
    /* Disable hyphenation; the prefixed form is kept for older WebKit-based readers. */
    -webkit-hyphens: none;
    hyphens: none;
}

Reference URL: http://w3cindia.in/cssdocument.html

Styling issues for Urdu:

Horizontal writing for Urdu

Direction of writing: words are written in horizontal lines from right to
left; numerals are written from left to right.
Number of letters: 28 (in Arabic). Some additional letters are used in
Arabic when writing place names or foreign words containing sounds
which do not occur in Standard Arabic, such as /p/ or /g/. Additional
letters are used when writing other languages.
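
A minimal sketch of right-to-left styling for an Urdu block follows. The
class name is hypothetical. W3C internationalization guidance generally
prefers declaring direction in the markup (dir="rtl") over CSS, so this rule
is shown only as an assumption about how a stylesheet-only setup might look.

```css
/* Hypothetical class on a block of Urdu text. */
.urdu-text {
  direction: rtl;    /* base direction: lines start at the right edge */
  text-align: right;
}
```

Numerals inside such a block do not need separate styling: the Unicode
Bidirectional Algorithm lays out digit sequences left to right within the
right-to-left line.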

First Letter
In cursive scripts such as Arabic and Urdu, the styling is applied to the whole word.

Styling Requirements for Mobile:


The following CSS mobile properties must be supported for Indian languages
in order to render e-content properly on mobile devices:

Vertical-align
This property affects the vertical positioning inside a line box of the
boxes generated by an inline-level element.

Text-decoration
This property describes decorations that are added to the text of an
element using the element's color. When specified on or propagated to
an inline element, it affects all the boxes generated by that element, and
is further propagated to any in-flow block-level boxes that split the
inline.
Value: none | [underline || overline || line-through || blink] |

Letter-spacing
This property specifies spacing behavior between text characters.

Text-indent
This property specifies the indentation of the first line of text in a block
container. More precisely, it specifies the indentation of the first box
that flows into the block's first line box. The box is indented with respect
to the left (or right, for right-to-left layout) edge of the line box. User
agents must render this indentation as blank space.
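
The four properties above can be seen together in a short sketch. The
selectors and values are hypothetical and chosen only to show each property
in use.

```css
/* vertical-align: raise a footnote marker inside its line box. */
sup.note-ref {
  vertical-align: super;
  font-size: 0.75em;
}

/* text-decoration: underline links using the element's own color. */
a {
  text-decoration: underline;
}

/* letter-spacing: slightly open up heading text. */
h2 {
  letter-spacing: 0.05em;
}

/* text-indent: indent the first line of each paragraph. */
p {
  text-indent: 1.5em;
}
```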

Reference URL: http://www.w3.org/TR/css-mobile/

CSS Speech Module Requirements:


The CSS Speech module provides properties that enable authors to declaratively
control presentational aspects of the aural dimension (e.g. TTS voice, pitch, rate,
and volume levels). These style sheet properties can be used together with visual
properties (mixed media), or as a complete aural alternative to a visual
presentation.
Typical examples include in-car use of an e-book reader, industrial and medical
documentation systems, home entertainment, helping users to learn reading, or
supporting users who have reading difficulties (print disabilities).
Properties

voice-volume
The voice-volume property allows authors to control the amplitude of
the audio waveform generated by the speech synthesizer, and is also
used to adjust the relative volume level of audio cues within the audio
"box" model.

voice-balance
The voice-balance property controls the spatial distribution of audio
output across a lateral sound stage: one extremity is on the left, the
other extremity is on the right hand side, relative to the listener's
position.

speak
The speak property determines whether or not to render text aurally.

speak-as
The speak-as property determines in what manner text gets rendered
aurally, based upon a basic predefined list of possible values.

Pause properties
The pause-before and pause-after properties specify a prosodic
boundary (silence with a specific duration) that occurs before (or after)
the speech synthesis rendition of the selected element, or if any
cue-before (or cue-after) is specified, before (or after) the cue within
the audio "box" model.

Rest properties
The rest-before and rest-after properties specify a prosodic boundary
(silence with a specific duration) that occurs before (or after) the speech
synthesis rendition of an element within the audio "box" model.

Cue properties
The cue-before and cue-after properties specify auditory icons (i.e.
pre-recorded / pre-generated sound clips) to be played before (or after)
the selected element within the audio "box" model.

Voice characteristic properties


a. voice-family
b. voice-rate
c. voice-pitch
d. voice-range
e. voice-stress
f. voice-duration
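
The properties listed in this section might be combined as in the sketch
below. It is illustrative only: the selectors and the audio file path are
hypothetical, and support for the CSS Speech module in reading systems is
limited, so every rule here should be treated as an assumption to verify.

```css
/* Chapter titles: distinct voice, slower rate, silence around the title. */
h1 {
  voice-family: female;
  voice-rate: slow;
  voice-pitch: low;
  voice-volume: loud;
  pause-before: strong;
  rest-after: 500ms;
}

/* Hypothetical class: purely decorative text is not read aloud. */
.decorative {
  speak: never;
}

/* Hypothetical class: read identifiers digit by digit. */
.isbn {
  speak-as: digits;
}

/* Quotations: play an auditory icon first, shifted to the left. */
blockquote {
  cue-before: url("Audio/quote-start.mp3");
  voice-balance: left;
}
```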

Reference URL: http://www.w3.org/TR/css3-speech/

E-publishing survey
Types of survey:
1. Online survey
2. Offline survey

1 Online survey: Survey by online form submission.


a) Online analysis
1. Search through websites:
1.1 Categories of websites
a. Online newspaper publishers
b. Portals such as rediffmail, indiatimes, yahoo, etc.
c. E-publishers (e-books, magazines, entertainment)
d. Mobile VAS content (as per the list provided by IAMAI)

1.2 Items to look for on the websites
a. Encoding used
b. File format
c. Image format
d. Number of languages used
e. Frequency and circulation of publishing (daily/monthly, etc.)
f. Type of publication (national/state)
g. Mobile compatible or not
h. Fonts used for publishing
i. Whether data is rendered flawlessly

b) Offline analysis through e-publishers
1. Survey through a relevant contact person belonging to one of the categories
listed in section 1.1
2. Collect the information by filling in the questionnaire manually

2 Offline survey
1. Offline newspaper publishers
2. Offline magazine publishers
3. Offline course material publishers (school, institute, college, etc.)
4. Survey through advertisement, email, telephone

Outcome:
I. Free survey for the identification of the sources.
II. Survey forms (placed as Annexure I) shall be collected from different
organizations for 12 major languages (Hindi, Bangla, Punjabi, Gujarati, Marathi,
Malayalam, Tamil, Telugu, Assamese, Oriya, Kannada, and Manipuri); data for the
remaining languages shall be collected as available.
III. A final report should be prepared that clearly brings out objective and
concrete outcomes, so that they can be used for future actions.

The final outcome should also help in the implementation of Indian languages text layout in the
following areas:
1. E-Publishing in Indian languages
- Page Formats for Indian languages Documents.
- Positioning of Running Heads and Page Numbers.
- Positioning of Closing Brackets, Purnaviram at Line End
- Vertical Writing Mode and Horizontal Writing Mode.
- Paragraph Adjustment Rules.
- Mixed Text Composition in Horizontal Writing Mode.
- Mixed Text Composition in Vertical Writing Mode.
2. CSS
- Drop first letter
- Vertical & horizontal writing
- Line breaking
- Indentation
- Letter spacing
- Underlining
- CSS Embedded fonts

Annexure I
E-Publishing related questions
A. General Questions:
1. 1) Does your organization work in Indian-language publishing?
a. Yes
b. No
2) If no, do you have any plans to localize your content in Indian languages?
2. Are you using e-publishing? If so, how does e-publishing supplement your publishing?
a. Increased circulation
b. Increased revenue
c. Advertisement only
3. Is your organization also involved in Indian-language translation services?
a. Yes
b. No
4. Which file format is most widely compatible in e-publishing?
a. Doc
b. PDF
c. HTML
d. Other (specify)
5. Which Indian languages are you using in e-publishing, and in which scripts?
6. 1) Are you using Unicode for content declaration?
a. Yes
b. No
2) If no, which font are you using?

7. Which encoding are you using for saving the data?
a. Unicode
b. ISFOC
c. Proprietary font
d. Others

8. Does your organization work on Web development in Indian languages?
a. Yes
b. No
9. Does your organization follow the rules mentioned below?
- Vertical writing and horizontal writing
- Line breaking rules
- Ruby and emphasis dots
10. What proactive measures has your organization taken to avoid bugs
and problems regarding picture clarity and the simplified use of script and
language?
11. How much space or memory do you use on the web server so that your data can be
easily retrieved over the network?
12. Which image formats does your organization use for the images published
in your paper?
a. JPEG
b. TIFF
c. BMP
d. Others
13. Are there different formats for electronic/mobile publishing?
a. Yes
b. No
14. Are your publications suitable for mobile communication devices?
a. Yes
b. No
15. What are some ePub compatible readers/devices?
a. Sony Reader (Touch Edition)
b. PRS-505
c. Apple iPhone
d. Others

16. What is your experience in mobile publishing?
a. Satisfied
b. Need improvement
c. Unsatisfied
d. Please specify...............................
17. What are the trends of publishing?
18. What types of problems do you face regarding publication?
19. Frequency and circulation of publication?
a. Daily
b. Monthly
c. Quarterly
d. Others
20. Which type of publication do you have?
a. Nationwide
b. Statewide
c. Others

B. Questions related to Indic text layout for EPUB
Give information on the following:
1. EPUB content fidelity:
1.1 Size of the two columns.
1.2 Margins, padding, borders.
1.3 Trim size and binding margins.
1.4 Position of running head/page number
1.5 Position of page number relative to trim size
1.6 Line gap in horizontal writing mode
2. Format of the table of contents.
3. How should an incomplete number of lines on a multi-column page be processed?
4. How should tables rotated 90 degrees counterclockwise be arranged?
5. How should lines containing multiple illustrations/images be arranged?
