CHAPTER II
NICHE OF THE THESIS
“Karkai Nandre Karkai Nandre! Pichai Puhinum Karkai Nandre” - Aavaiyar
in Aathichudi. One should pursue learning even if he begs to meet his needs.
“Test a servant in the discharge of his duty, a relative in difficulty, a friend in
adversity, and a wife in misfortune” - Arthashastra. The Arthashastra is an ancient
Indian book on statecraft, economic policy and military strategy.
These vital statements from ancient literature and manuscripts are distinct
sources of precious information to various fields. They store, organize and elucidate
information for education, enhancement and for the enrichment of civilization.
Documentation is the process of preparing documents for each field according to that
field's standards and conventions of presentation. Thus, the essential purpose of
documentation is to record and communicate information to the community of the
respective field.
In the finance and business sectors, documents are contract agreements between
companies, proposals for new projects and bills issued on purchase. In the
education sector, documents are research articles, journals, books, dissertations etc.
In the government sector, documents are government orders and white papers used in
policy and decision making. In media, documents are scripts and screenplays. In science
and technology, all discoveries and inventions are reported as documents. In
archaeology and forensic sciences, pottery fragments of ancient civilizations, historical
buildings, manuscripts, inscriptions and sculptures are documents.
Information from documents provides knowledge and conveys messages to
society as symbols, signals, controls, instructions, orders and data. Information of any
form has significance in science and humanities. Information is a source of learning
[88] and a seed for decision making. Organized and accessible information from earlier
times paved the way for modern technologies, whose principles are rooted in ancient
knowledge. Hence, information of the past and present has to be made available to
future society so that it can lead improved lives, and the best way of preserving it is
documentation. Ancient documents include manuscripts, sculptures, inscriptions and
paintings.
2.1 Manuscripts
Preservation of information started with ancient manuscripts. These manuscripts
include handwritten records on paper, writings on wax or clay tablets, inscriptions
made on hard materials, sculptures, paintings and hieroglyphics. They also include
handwritten letters and books created by ancient people.
2.1.1 Hand Written Manuscripts. Papyrus is an ancient writing material
obtained from the plant Cyperus papyrus. The stems were cut into long strips, placed
one beside the other [25-27], wetted with water from the Nile, subjected to strong
pressure, dried in the sun, rubbed with shells to render them solid and rolled onto a
cylinder of wood or bone. Egyptians used papyrus widely. Parchment is a writing
material that slowly replaced papyrus. Parchment and vellum [21] were made of animal
skin. After preprocessing, the parchment was cut into leaves and made into a book.
Parchments [22-24] were written on both sides. Jews have used parchment to prepare
documents since ancient times; the Torah scrolls are shown in figure 2.1(a).
Manuscripts were also chiseled in clay tablets [28], [30], as in cuneiform writing.
Sumerians used clay tablets to prepare manuscripts, and the tablets were baked after
writing. Figure 2.1(b) depicts Sumerian writing on a clay tablet. Romans used wax
tablets [29] to prepare notes. A wax tablet is a wooden board covered with wax; a
stylus was used to write on it, and the tablet was reusable. Russians used birch bark
documents [48]. Instead of chiseling, people of the Philippines made their
manuscripts by punching.
Figure 2.1 Hand written manuscripts (a) Sefer Torah - traditional Hebrew Bible scroll
of parchment (b) Clay tablet (c) Thirukkural in palm leaves
Asians used dried, smoothed and smoke-treated palm leaves [31] to prepare
manuscripts. Letters were etched on the palm leaves with an iron stylus, and lampblack
or turmeric was applied to enhance contrast and legibility [18]. Indians used palm
leaves to record their literary [18], [33], [34] and scientific heritage. Palm leaves
are shown in figure 2.1(c). The information available from Indian manuscripts covers
medicine [32], siddha, ayurveda, the unani system, human anatomy, veterinary sciences,
agriculture, traditional art, temple art, temple architecture, ship building, carpentry,
metal working, sculpture, traditional musicology, techniques of writing, astrology and
astronomy, yoga, animal husbandry, martial arts and physiognomy.
2.1.2 Inscriptions. Inscriptions are engravings on hard materials like stone,
bronze, copper and wood. Many stone inscriptions are available in Indian temples,
palaces, historical buildings and community halls. Indian epigraphy began in the second
century to depict the kings’ administration, victories, grants and adventures.
The Uttaramerur inscription shown in figure 2.2(a), dated around 920 AD [49-51] during
the reign of Paranthaka Chola, describes the Cholas’ village administrative election
process and the qualifications of nominees. Names of contestants written on palm leaves
were put into a pot and shuffled, and the person whose name was picked was elected as
the representative. This democratic system practiced by the Chola kingdom is called
the Kudavolai system. Contestants had to be between 35 and 70 years of age, possess one
veli of land and a house, have knowledge of the Vedas and mantras, and not be accused
of any offence. It is a unique feature of the Cholas’ administration. The Uttaramerur
inscription is an outstanding document for understanding the Indian democratic system
of ancient days.
Figure 2.2 Inscriptions (a) Stone inscriptions - Cholas’ Kudavolai system
(b) Iron pillar - Chandragupta Vikramadithya
Tamil copper plate inscriptions have information about grants of villages,
plots of cultivable land and highly structured taxation. The available copper plate
inscriptions pertain to the Chola, Chalukya and Vijayanagar kings [37]. They provide
insight into the social conditions of medieval south India. Copper plate inscriptions
date from the tenth century. The iron pillar in Delhi, shown in figure 2.2(b), weighs
6 tons, is 7 meters tall and was constructed in a single forge [35], [36]. Its
inscription talks about the Gupta ruler Chandragupta Vikramadithya. The pillar has not
rusted even after 1600 years. A huge pillar built in a single forge from 98% pure iron
and free of rust after 1600 years proclaims the knowledge of ancient Indians in
metallurgy. Similar rust-resistant iron pillars are available in the Konark temple of
Orissa and the Kodachadri hills of South India. Buddhist inscriptions are available on
brass, copper and ivory sheets.
2.1.3 Sculptures and Paintings. Sculptors chisel sculptures out of natural stone.
They realize an image from their own vision or one suggested by the nature of the stone.
Once the shape is determined, it is roughed out on the stone and chiseled. Indian
temples are enriched with sculptures. The famous Lion capital of Asoka dates
back to 250 BC [52]. The four animals on the Saranath Lion capital symbolize different
phases of Lord Buddha’s life [47]. The four lions standing back to back on a cylindrical
abacus in the Indian national emblem are adapted from Asoka’s pillar. The wheel
“Asoka Chakra” from its base is placed in the center of the Indian national flag, as
shown in figure 2.3(a).
Sculptures and shrines of the Ajanta caves [44] and Mahabalipuram are carved
in rock. The Ajanta caves and the Mahabalipuram rock temples have been declared world
heritage sites by UNESCO. The Ajanta caves include temples, sculptures and
paintings, and the paintings depict Jataka tales. The Ellora caves [42,43] have both
Buddhist and Hindu temples. The natural colors of the Ajanta and Ellora [44] paintings
shown in figure 2.3(b) still remain a mystery to the world. They are an example of
Indian knowledge of the herbal plants used for making painting dyes. Mahabalipuram is
an art treasure of the Pallavas [46]. Its temples are carved out of rock, and an
example is shown in figure 2.3(c). Historians concluded that Mahabalipuram acted as
a learning place for new sculptors [45]. The Taj Mahal, the Madurai Meenakshi Amman
temple and Thanjai Periya Koil stand for ancient Indian knowledge of architecture.
Documentation of information entered a new era with the invention of paper by the
Chinese. Paper replaced parchment, vellum and palm leaves, and documents were produced
and reproduced on paper. The typewriter and the printing machine made the
reproduction of information easier.
Figure 2.3 Sculptures and paintings (a) Indian Government’s emblem - Lion
capital of Asoka (b) Ellora cave wall paintings (c) Varaha mandapam in
Mahabalipuram
2.2 Importance of Manuscripts and Ancient Documents
Information about languages, epics like the Ramayana and Mahabharata,
literature, civilizations like Harappa and Mohenjo-daro, traditions, the culture of a
particular group of people in a particular span of time, the art of living, techniques
of various skills in sculpture and painting, science, technology, architecture,
medicine, the history of nations and kings, business, religions, ceremonies and
postures of gods and goddesses is presented in manuscripts, inscriptions, paintings and
sculptures. Though the information was written in olden days, it helps younger
generations to know the history, science and technology, culture, customs and moral
values of their ancestors and the prosperity of races. Information in documents keeps
society philosophically empowered and scientifically updated. So a document of any
kind which provides knowledge on any subject is invaluable information to be
preserved. In this electronic era, digitization allows documents to be preserved for a
long time.
2.3 Need for Digitization
Information in the form of manuscripts, inscriptions, paintings and printed
papers is preserved for later generations. Digitization of manuscripts, inscriptions
and paintings makes this preservation easier. The major advantages of
digitization are:
· Preservation - It preserves the fragile original material by removing the need to
handle it.
· Anywhere Access - It facilitates access from anywhere according to the user’s need.
· Multiple Access - A physical book is handled by a single person at a time, while a
digitized book is accessed by several users simultaneously.
· Knowledge Sharing - It enhances access to resources and supports lifelong learning.
· Storage - It allows a safer storage medium, a paperless environment and easy
information maintenance and retrieval.
· Virtual Unification - It unifies material from different sources and thereby
enhances research. Otherwise, these sources exist separately, which makes them
difficult to integrate and use in research.
· Easy Reproduction - Reproduction of documents is easy after digitization.
Because of these advantages, digitization of documents is of prime
importance for document preservation.
2.3.1 Digitization Process. Digitization is the process of converting a
continuous analog signal into a discrete digital signal; digitized documents are
sampled versions of continuous real world documents [19]. Numerous tools are
available to digitize manuscripts, inscriptions, sculptures and papers [86]. The
digitization and allied processing tools include the scanner, the digital camera and
image processing software.
Figure 2.4 Digital Roy image (height h, length l)
A scanner converts analogue data into digital data. A digital camera is used for
digitizing documents like sculptures, inscriptions, paintings, historical
buildings, shrines, temples and manuscripts that cannot bear the pressure of scanning.
The camera’s sensor captures the light reflected from the object. An image is a
two dimensional light intensity function g(x, y), where x and y are spatial
coordinates and g is the amplitude at the spatial position (x, y). g(x, y) is called
the intensity or gray level of the image at (x, y) [20]. Consider the image
in figure 2.4. This image has a certain height h and length l. The x-axis coordinate
points are taken vertically downwards along h and the y-axis coordinate points are
taken horizontally towards the right along l. Every coordinate in the two dimensional
image is bounded: the x coordinate varies from 0 to h and the y coordinate varies
from 0 to l as
$$0 \le x < h \tag{2.1}$$
$$0 \le y < l \tag{2.2}$$
The intensity value g(x, y) at a point (x, y) is the product of two terms,
r(x, y) and i(x, y). Here r(x, y) is the reflectance of the object surface and i(x, y)
is the intensity of light falling on the object surface at position (x, y).
Theoretically, r(x, y) varies from zero to one and i(x, y) varies from zero to
infinity, so the intensity g(x, y) at a point can range from zero to infinity.
Practically, g(x, y) varies from a minimum intensity value i_min to a maximum
intensity value i_max. According to the theory of real numbers, an infinite number of
points exists between any two points. Hence the number of intensity values between
i_min and i_max is infinite. Similarly, the x coordinate varies from 0 to h and the
y coordinate varies from 0 to l, and each interval contains an infinite number of
points. Representing such an image exactly would require an infinite number of bits,
whereas a digital computer represents images with a finite number of bits. This
discrete representation is achieved through the digitization process, which
represents the image as a finite two dimensional matrix as shown in equation 2.3.
Equation 2.3 shows the matrix representation of an image that has a finite
number of rows and columns. Every matrix element represents the discrete intensity
value at a discrete image coordinate point. Digitization involves two
processes: sampling and quantization. Sampling is the process of digitizing the spatial
coordinates of the image. It converts the infinitely many values of the image
coordinates into discrete values known as samples, and the sampling rate determines
the spatial resolution of the digitized image. Quantization is the process of
digitizing the amplitude values of the image.
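The sampling step described above can be sketched in Python with NumPy. This is only an illustration: the continuous image `scene`, its illumination and reflectance formulas and the grid sizes are hypothetical choices made for the example, not values taken from this thesis. The sketch samples the same continuous function g(x, y) = i(x, y) r(x, y) at two sampling rates and obtains matrices of different spatial resolution.

```python
import numpy as np

def scene(x, y):
    """Hypothetical continuous image g(x, y) = i(x, y) * r(x, y)."""
    illumination = 100.0 + 50.0 * y               # i(x, y): brighter towards the right
    reflectance = 0.5 + 0.5 * np.cos(np.pi * x)   # r(x, y): stays within [0, 1]
    return illumination * reflectance

def sample(g, h, l, rows, cols):
    """Sample g on a rows x cols grid covering 0 <= x < h and 0 <= y < l."""
    x = np.arange(rows) * (h / rows)   # x runs vertically downwards
    y = np.arange(cols) * (l / cols)   # y runs horizontally to the right
    xx, yy = np.meshgrid(x, y, indexing="ij")
    return g(xx, yy)

# The same scene at two sampling rates: a 4 x 4 and an 8 x 8 matrix.
coarse = sample(scene, h=1.0, l=1.0, rows=4, cols=4)
fine = sample(scene, h=1.0, l=1.0, rows=8, cols=8)
print(coarse.shape, fine.shape)
```

In this sketch every second sample of the fine grid coincides with a sample of the coarse grid, which shows that a higher sampling rate only adds spatial detail between the existing samples.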
$$
g(x, y) =
\begin{bmatrix}
g(0,0) & g(0,1) & g(0,2) & g(0,3) \\
g(1,0) & g(1,1) & g(1,2) & g(1,3) \\
g(2,0) & g(2,1) & g(2,2) & g(2,3) \\
g(3,0) & g(3,1) & g(3,2) & g(3,3)
\end{bmatrix}
\tag{2.3}
$$
Figure 2.5 Digitization of one dimensional signal (labels: input signal, sampling
points, sampling, quantization)
Figure 2.5 shows the digitization of a one dimensional signal [20], where each
sample is marked by a dashed line meeting the x axis. The black dots superimposed on
the signal are the samples, and the values at these discrete locations form the
sampled function. In quantization, the continuous range of gray level values is
converted into discrete values. In the example shown in figure 2.5, the gray level
values are divided into eight discrete levels (3 bits per pixel) ranging from black to
white. The continuous gray level values are quantized by assigning one of the eight
discrete gray levels to each sample point superimposed on the signal. Figure 2.6
illustrates the digitization process for an image, where samples are taken only at the
crossings of the horizontal and vertical stripes.
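The eight-level quantization described above can be sketched as follows. The signal `t ** 2` and the nine sampling points are hypothetical and stand in for the curve of figure 2.5; the function name `quantize` and the uniform spacing of the levels are likewise assumptions made for the illustration.

```python
import numpy as np

def quantize(samples, g_min, g_max, bits=3):
    """Map gray values in [g_min, g_max] to 2**bits uniformly spaced levels."""
    levels = 2 ** bits
    scaled = (np.asarray(samples, dtype=float) - g_min) / (g_max - g_min)
    # Floor into bins 0 .. levels-1; the top value g_max is clipped into the last bin.
    return np.clip(np.floor(scaled * levels), 0, levels - 1).astype(int)

t = np.linspace(0.0, 1.0, 9)   # nine sampling points on [0, 1]
signal = t ** 2                # hypothetical continuous signal with values in [0, 1]
codes = quantize(signal, 0.0, 1.0, bits=3)
print(codes.tolist())          # [0, 0, 0, 1, 2, 3, 4, 6, 7]
```

Each sample is now stored as one of eight integer codes, so 3 bits per sample suffice, at the cost of the rounding error introduced by the bins.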
Figure 2.6 Digitization of an image. Blue color dots are the sampled points
2.4 Digital Image Processing Fundamentals
This section details various methodologies that are applied on images for
different applications to meet different objectives. Figure 2.7 shows various steps in
digital image processing.
· Image Acquisition. Image acquisition is the process of capturing a natural
image and converting it to a digital format such as png, jpeg, gif, tif or bmp so
that the captured image can be processed by a digital computer. The
image is captured by sensors suited to the application [20].
Figure 2.7 Fundamental steps of digital image processing (blocks: image acquisition,
image enhancement, image restoration, color image processing, wavelets and multi
resolution processing, compression, morphological processing, segmentation,
representation & description, object recognition, knowledge base)
Figure 2.8 Image enhancement (a) Low contrast image of grain (b) Enhanced
image of grain
· Image Enhancement. Image enhancement techniques bring out obscured details
by modifying the image for a specific application. They highlight features of
interest in the image that suit the given application. Image enhancement is a very
subjective area of image processing, where the enhanced image looks better than the
original image [20] for the specific application. Figure 2.8(a) shows a low contrast
image of grain and (b) shows its enhanced image.
· Restoration. Image restoration reconstructs or recovers a degraded image by
using a priori knowledge of the degradation phenomenon [20]. Restoration techniques
are based on mathematical or probabilistic models of image degradation. In most
applications, the degraded pixels are restored from the unaffected pixels of the
image. Figure 2.9(a) shows an image degraded by periodic noise and (b) shows the
image restored from the corrupted version.
Figure 2.9 Image restoration (a) Degraded image (b) Reconstructed image
· Color Image Processing. Color image processing uses the color features of the
image, as shown in figure 2.10. Humans can discern thousands of color shades and
intensities, and hence color is a powerful descriptor in the object identification
and feature extraction phases [20].
· Wavelets. Wavelets represent images at various degrees of scale and
resolution. They are mainly used for image compression and restoration.
Figure 2.10 Color image processing (a) Original image (b) Features
of original image are highlighted in different colors
· Compression. Compression techniques reduce the storage space of an image
where the compression ratio depends on the application. Transform based
compression algorithms convert 2D pixel array into a statistically uncorrelated
data set. The transformation is applied prior to storage or transmission of the
image.
Figure 2.11 Morphological operations (a) Original image (b) Dilated
image of original image (c) Eroded image of dilated image
· Morphological Processing. Mathematical morphology is a tool for extracting
image components which are useful in the representation and description of
shapes. Figure 2.11 illustrates the morphological dilation and erosion
operations.
· Segmentation. Image segmentation partitions an image into its constituent
regions or objects. Figure 2.12(a) shows the original image and (b) shows the objects
segmented from the background.
Figure 2.12 Segmentation (a) Original image (b) Segmented objects from
background
· Image Representation and Description. Representation techniques are used
for representing the image in terms of its boundary/ internal/ shape features.
Boundary representation focuses on external shape characteristics such as
corners and inflections. Figure 2.13 shows an image and its chain code
representation. Region based representation focuses on internal properties,
such as texture or skeletons. Description concentrates on feature selection that
extracts the needed attributes to differentiate one class of object from another.
· Knowledge Base. The knowledge base supports efficient processing and controls
the interaction between all other processing stages.
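Of the steps listed above, the morphological operations of figure 2.11 are the easiest to illustrate in a few lines. The following is a minimal sketch for binary images, assuming a 3 x 3 square structuring element and assuming that pixels outside the image count as foreground during erosion; both are choices made for this example, and image processing libraries may use other defaults.

```python
import numpy as np

def dilate(img):
    """Binary dilation with a 3 x 3 square structuring element."""
    padded = np.pad(img, 1, mode="constant", constant_values=False)
    out = np.zeros_like(img, dtype=bool)
    for dx in range(3):
        for dy in range(3):
            # A pixel becomes foreground if any of its 9 neighbours is foreground.
            out |= padded[dx:dx + img.shape[0], dy:dy + img.shape[1]]
    return out

def erode(img):
    """Binary erosion with a 3 x 3 square structuring element."""
    padded = np.pad(img, 1, mode="constant", constant_values=True)
    out = np.ones_like(img, dtype=bool)
    for dx in range(3):
        for dy in range(3):
            # A pixel stays foreground only if all 9 neighbours are foreground.
            out &= padded[dx:dx + img.shape[0], dy:dy + img.shape[1]]
    return out

# Hypothetical test image: a 3 x 3 square with a one-pixel hole in its centre.
original = np.zeros((7, 7), dtype=bool)
original[2:5, 2:5] = True
original[3, 3] = False

dilated = dilate(original)   # grows the square to 5 x 5 and fills the hole
closed = erode(dilated)      # shrinks it back to 3 x 3; the hole stays filled
print(int(original.sum()), int(dilated.sum()), int(closed.sum()))   # 8 25 9
```

Eroding the dilated image, as in figure 2.11(c), restores the original outline while keeping small gaps closed.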
Digital image processing operations, from image acquisition to image recognition, are
classified into low level, mid level and high level processes as shown in figure 2.14.
Low level processes are preprocessing techniques that involve noise filtering, contrast
enhancement and sharpening. At this level, both inputs and outputs are images. Mid
level processing involves segmentation, description and classification. At this
level, inputs are images and outputs are image attributes. In high level processing,
objects are identified or recognized for a specific application.
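The three levels can be illustrated with a toy pipeline. The sketch below is not the processing chain used in this thesis; the function names, the threshold of 128, the `min_area` rule and the test image are hypothetical. It only shows the change of data type from level to level: image to image, image to attributes, attributes to decision.

```python
import numpy as np

def stretch_contrast(img):
    """Low level: image in, image out (linear contrast stretching to 0..255)."""
    img = img.astype(float)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img, dtype=np.uint8)
    return np.round((img - lo) / (hi - lo) * 255).astype(np.uint8)

def segment(img, threshold=128):
    """Mid level, step 1: separate bright object pixels from the background."""
    return img >= threshold

def describe(mask):
    """Mid level, step 2: image in, attributes out (area and bounding box)."""
    xs, ys = np.nonzero(mask)
    if xs.size == 0:
        return {"area": 0, "bbox": None}
    return {"area": int(xs.size),
            "bbox": (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))}

def recognize(attrs, min_area=4):
    """High level: a toy decision rule applied to the attributes."""
    return "object" if attrs["area"] >= min_area else "background only"

# Hypothetical low contrast image: gray level 100 with a faint 2 x 4 rectangle at 120.
low_contrast = np.full((6, 6), 100, dtype=np.uint8)
low_contrast[2:4, 1:5] = 120

enhanced = stretch_contrast(low_contrast)
attrs = describe(segment(enhanced))
label = recognize(attrs)
print(attrs, label)
```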
Figure 2.13 Image representation (a) Original image (b) Result of
sampling (c) 4 directional chain code representation of original image
(d) 8 directional chain code representation of original image
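The 4 directional and 8 directional chain codes of figure 2.13 can be sketched as follows. The direction numbering is an assumption of this example (code 0 points right and the codes increase counterclockwise, a common convention for Freeman chain codes), and the boundary is assumed to be given already as an ordered, closed list of (row, column) points; tracing the boundary from an image is not shown.

```python
# (d_row, d_col) -> code; rows grow downwards, so "up" is d_row = -1.
CODES_8 = {(0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
           (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7}
CODES_4 = {(0, 1): 0, (-1, 0): 1, (0, -1): 2, (1, 0): 3}

def chain_code(boundary, codes):
    """Encode an ordered, closed list of (row, col) boundary points."""
    result = []
    # Pair every point with its successor; the last point wraps to the first.
    for (r0, c0), (r1, c1) in zip(boundary, boundary[1:] + boundary[:1]):
        step = (r1 - r0, c1 - c0)
        if step not in codes:
            raise ValueError(f"step {step} is not allowed in this connectivity")
        result.append(codes[step])
    return result

# Boundary of a 2 x 2 pixel square, traversed clockwise from the top-left pixel.
square = [(0, 0), (0, 1), (1, 1), (1, 0)]
print(chain_code(square, CODES_4))    # [0, 3, 2, 1]
print(chain_code(square, CODES_8))    # [0, 6, 4, 2]

# A diamond needs diagonal steps, so only the 8 directional code can describe it.
diamond = [(0, 1), (1, 2), (2, 1), (1, 0)]
print(chain_code(diamond, CODES_8))   # [7, 5, 3, 1]
```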
2.5 Applications of Digital Image Processing
Digital image processing is used in different fields like medicine, remote
sensing, communications, automobiles, environment and forensics [65].
Figure 2.14 General image processing system (levels: low level, mid-level and high
level image processing; blocks: image acquisition, enhancement, restoration,
segmentation, representation, description, compression, image storage, transmission,
recognition & interpretation)
2.5.1 Applications in Medical Field. Image processing techniques in
medicine create visual representations of internal structures hidden by the skin and
bones for clinical analysis and medical intervention. For medical diagnosis,
different imaging tools such as X-ray, Ultrasound and Computer aided
Tomography (CT) are used. Image processing techniques like image segmentation
and pattern recognition are used to identify tumors and extract information from
images. In telemedicine, compression algorithms help transmit medical images without
loss of valuable information. Figure 2.15(a) shows a normal brain image obtained
through a CT scan. An example of identifying a crack and its direction from the X-ray
image of a hand is shown in figure 2.15(b). Figure 2.15(c) shows an abdomen image used
for tumor identification.
Figure 2.15 Medical images (a) CT scan image of brain (b) X-ray image of
broken finger (c) Tumor identified in abdomen image
2.5.2 Applications in Remote Sensing. Remote sensing is the process of
observing remote targets to make useful inferences. Remote sensing observations
consist of measuring the interactions between earth surface materials and
electromagnetic energy. Electromagnetic radiation of different wavelengths carries a
variety of information about the earth’s surface and atmosphere. For remote sensing
applications, sensors on remote sensing satellites or multi-spectral scanners mounted
on an aircraft capture images of the earth’s surface. Remote sensing provides data for
diverse applications like city planning (figure 2.16(a)), hydrology, agriculture,
geology (figure 2.16(b)), weather forecasting (figure 2.16(c)), observation of other
planets (figure 2.16(d)) and forestry. Image processing techniques used in remote
sensing include image enhancement, image merging, image classification,
multi-spectral processing and texture enhancement.
2.5.3 Applications in Visual Communications. Effective communication needs
fast transmission of information. In this internet era, information is easily
transmitted through advanced digital carriers. Video conferencing helps people in
different places to communicate. Compression standards like JPEG, JPEG2000 and H.26X
help transmit data effectively in video conferencing applications.
Figure 2.16 Remote sensing images (a) Urban city planning (b) Geological
hazard of India (c) Weather forecasting (d) Satellite based thermo-physical
analysis of volcanic lactic deposits
2.5.4 Applications in Machine Vision. Machine vision applications include
automatic visual inspection systems. These systems improve the quality and
productivity of industrial production by automating tasks such as the inspection of
incandescent lamp filaments, surface inspection, faulty component identification and
the inspection of packaged pills as shown in figure 2.17(a).
Automatic target detection and tracking detects and tracks moving objects for
security and surveillance purposes and finds the trajectory of a moving target. It also
monitors the movement of organs in medical applications. Defense surveillance
includes target recognition as shown in figure 2.17(b) and interpretation of aerial
photography as shown in figure 2.17(c).
Figure 2.17 Machine vision applications (a) Packaged pills strips (b) Target
identification (c) Aerial photography
2.5.5 Applications in Forensic Sciences. Digital image processing is used in
many areas of forensic sciences. Personal identification is done with biometrics like
the face, the fingerprint (figure 2.18(a)) and the iris (figure 2.18(b)). Law
enforcement deals with signature verification (figure 2.18(c)), handwriting
verification and reconstruction of shredded questioned documents (figure 2.18(d)).
Image processing techniques like edge enhancement, denoising, skeletonisation,
pattern recognition and matching are also used in forensic applications.
Thus digital image processing converts manuscripts into digital
images composed of pixels. Image processing techniques also edit and modify the
digital images according to various preservation requirements. Enhancement techniques
are used for increasing the legibility of the document. Restoration techniques
serve as preprocessing stages for many other techniques by removing acquisition and
transmission errors. Segmentation techniques extract image features that are used for
comparing characteristics like artistic and decorative styles. Etched letter contours
are retrieved by using wavelets [53]. In this thesis, various concepts of digital image
processing are used for the reconstruction of digitized shredded documents.
Figure 2.18 Forensic applications (a) Finger print recognition
(b) Iris recognition (c) Signature verification
(d) Shredded document reconstruction
2.6 Unfeasibility in Document Preservation
Preservation of a document becomes significant when the document changes
due to deterioration. Climatic variations, sunlight, insects, constant
handling, adverse storage and wars deteriorate manuscripts, inscriptions, sculptures
and paintings. The painted walls found on the Greek island of Thera no
longer exist, since they collapsed together with their painted coat due to a volcanic
eruption and a strong earthquake [54].
Figure 2.19 Various manuscripts’ treasures (a) Water damage - Ellora temple
paintings (b) Tsunami affected - ancient Minoan Athens (c) Fragments of ancient
manuscript (d) Deteriorated palm leaves
A single wall painting is scattered into many small fragments, and the collected
fragments are a mixture from different wall paintings. Similarly, ceramic vessels and
ancient buildings produce hundreds of fragments due to natural or manmade
destruction. These deteriorated manuscripts, inscriptions, wall paintings and pottery
fragments are basic tools for tracing technical, artistic and decorative changes and
for understanding history and civilization [55]. Sculptures and inscriptions broken
into pieces are a valuable heritage of ancient lifestyles. Reconstruction of
deteriorated documents is essential for recovering the information in them and for
their digital preservation. Reconstruction of original documents from scattered
fragments of wall paintings as shown in figure 2.19(a), (b), potsherds [61], [62] and
fragile fragments of manuscripts as shown in figure 2.19(c), (d) has wide applications
in many disciplines such as archaeology, paleontology, art conservation and forensic
sciences. Archaeology, art conservation and paleontology preserve documents and
extract invaluable information.
2.7 Archaeology
Archaeology is the scientific study of past human culture and behavior. This
field provides knowledge of human culture and behavior through the examination of
the material remains of human societies. The material remains include human artifacts
like potsherds, tools, coins, ornaments, ruins of buildings, food remains and human
fossils. These materials survive in nature due to favorable preservation conditions in
the soil or atmosphere. The three major stages of archaeological study are:
· Chronology - finding the age of excavated materials
· Reconstruction - understanding people's campsites, settlements and cities,
their lifestyle and their environment
· Explanation - evolving scientific theories about the thoughts and deeds
of people who lived in the past.
In the reconstruction phase of archaeological study, human artifacts play an
important role. Potsherds, broken tools and ruins of buildings have to be reassembled
from their broken or shredded parts in order to extract the actual information
in them [61].
2.8 Art Conservation
The field of art conservation deals with the preservation of ancient art. This
field provides cultural knowledge from arts like wall paintings and hieroglyphics. The
major tasks of art conservation are:
· Preventive conservation - Preventive conservation is the effort to prevent or
minimize the causes of deterioration of art collections. Deterioration of
documents occurs due to incorrect handling and packing, vandalism, fire, water,
pests, airborne contaminants, vibration, light-induced damage and variations in
temperature and humidity.
· Conservation treatment - In conservation treatment, trained personnel work on
art collections that have deteriorated or suffered damage or disfigurement.
Treatment processes include cleaning, consolidation, reinforcement,
stabilization, restoration and protection.
· Research and documentation - Technical analysis and photographic
documentation of art collections are carried out to further the understanding of
the fabrication, history and meaning of collections. Techniques such as infrared
photography, radiography, ultraviolet illumination, ultraviolet fluorescence
photography and material identification spot-tests are used for art
documentation.
Findings of archaeology and art conservation are preserved in museums and
research centers to serve as information reservoirs. In this electronic era, these
findings are digitized so that they are preserved for long term benefit. The
subsequent section details the techniques followed in forensic sciences to extract
information from preserved documents.
2.9 Forensic Sciences
Forensic science collects and analyses evidence to solve criminal and civil
cases. Subjects like Anatomy, Pathology, Psychiatry, Anthropology, Biometrics,
Chemistry, Botany, Entomology, Geophysics, Geology, Intelligence, Limnology,
Linguistics, Meteorology, Serology etc. contribute to forensic science in arriving at
solutions to various crimes. Forensic science is applied in cases concerning blood
relationship, mental illness, injury, and the cause and manner of death. Fingerprints
provide clues about the criminal’s identity. Forensic sciences contribute to
different investigations, and some of the sub-areas of forensic science are briefed
in the following sub-sections.
2.9.1 Computational Forensics. Computational forensics involves computer
based modeling, simulation, analysis and recognition for studying and solving
problems posed in various forensic disciplines. It integrates expertise from
computational science and forensic sciences.
2.9.2 Questioned Document Examination. Questioned document
examination analyses cases of forgery, counterfeiting, mail fraud, embezzlement,
organized crime, white collar crime, art crime, theft, robbery, arson and homicide.
Questioned document examination is done for:
· Historical dating to find the age of the document.
· Fraud investigation to focus on the money trail and criminal intent.
· Paper and ink investigation to deal with the date, the type of paper used,
watermarks, ink and hard copying machines.
· Forgery investigation to analyze the changes made in documents and photos.
· Stylistics and handwriting (graphologist) analysis to reconstruct character
from semantics, spelling, word choice, syntax and phraseology.
· Shredded document analysis to reconstruct shredded paper documents
which are important as evidence in judicial proceedings.
Forensic investigators are interested in computer-assisted reconstruction
because it avoids continuous handling of evidence and the use of adhesives [10].
Analysis of manually or machine shredded paper documents is a subfield of
forensic sciences related to federal and civil law enforcement and the justice system
[9]. Shredded documents have to be reconstructed to their original form so that
forensic examiners can analyze them. Questioned documents are often shredded to destroy
evidence that the forensic department needs to reach conclusions about crimes. Manual
reconstruction of shredded paper fragments is time consuming and tiresome, since
no prior knowledge of the fragments is available, and it can require many years of
hard work by experienced persons. Reconstruction of fragmented documents is
possible through digital computers with image processing techniques. The image
processing techniques for document reconstruction have their roots in puzzle
solving techniques.
2.10 Puzzle Solving Techniques
Puzzles are sets of irregularly shaped pieces that, when properly assembled,
form a picture or document. According to the literature, John Spilsbury made a
jigsaw puzzle out of a wooden map by cutting along the borders of countries using a jigsaw
[56]. The idea behind his wooden jigsaw puzzle was to create an educational tool for
children to learn geography [57]. Puzzle solving is a pattern recognition
game that has been played for centuries, and it requires matching the shapes of the
pieces or the content or texture printed on the material. Freeman and Garder envisaged jigsaw puzzle
solving through digital computers and classified 2D puzzles as pictorial and apictorial
[1]. In apictorial puzzles, only the shapes of the pieces are considered to
reconstruct the original document. Pictorial puzzles take into account the shapes as well as the
texture of the pieces to find the correct solution [60]. The objective of jigsaw puzzle
solving is to arrange the set of given pieces into a single, well-fitting structure with no
gaps between adjacent pieces [1].
2.10.1 Characteristics of Puzzles. The puzzle solving techniques are based
on characteristics of puzzles defined by Freeman and Garder [1]. The
characteristics are:
· Orientation of puzzle pieces: Jigsaw puzzle pieces are usually given without
information about their orientation. Orientation is provided only for
geographic map based puzzles, where the north direction is indicated on each
piece.
· Connectedness: An assembled puzzle covers a simply connected or a
multiple-connected area. Simply connected puzzles may be mixtures of two or more puzzles
that assemble into disjoint areas, as shown in figure 2.20(a). A
multiple-connected puzzle encloses holes within its assembled area,
as shown in figure 2.20(b).
· Exterior boundary: It defines the shape of the outer boundary of a puzzle. The
puzzle has a regular or irregular shape, and knowledge
of its features such as length and width may or may not be provided.
Figure 2.20 Connectedness of puzzles (a) Simply-connected puzzles, (b) Multiple-connected puzzle
· Uniqueness: Most available puzzles are unique, i.e., the pieces can be assembled
properly in only one way. In non-unique puzzles, more
than one configuration of interior or exterior boundaries is possible. A puzzle
may be non-unique even though its exterior boundary is fixed,
and hence there is no direct relation between puzzle and piece uniqueness.
· Radiality: It refers to the kind of interior and exterior junctions in the
assembled puzzle. A tri-radial junction is the junction of three boundary lines
as shown in figure 2.21(a). A quad-radial junction as shown in figure 2.21(b)
joins four boundary lines and a quint-radial junction joins five boundary lines
as shown in figure 2.21(c).
Figure 2.21 Radiality of puzzles (a) Tri-radial (b) Quad-radial and (c) Quint-radial junctions
Forensic puzzles are paper documents that have been intentionally shredded to destroy
evidence. The need to reproduce the original document for forensic
applications necessitates the reconstruction of shredded paper documents using digital
computers. Documents are shredded in two ways, mechanically and manually. The
characteristic features of mechanically and manually shredded papers are
listed in the following sections.
2.11 Characteristic Features of Mechanically Shredded Paper Documents
Outdated documents are destroyed using mechanical shredders, which
shred paper by cross-cut or strip-cut methods. Of these,
strip-cut shredders are the most commonly used. In strip-cut
shredders, rotating blades cut the paper vertically into rectangular strips as shown in figure
2.22. A cross-cut shredder has two rotating drums which stamp out small rectangular or
diamond shaped pieces.
Ideal features of machine shredded rectangular paper strips [14] are:
· Machine shredded paper fragments have one or two sides that join with the
sides of other fragments.
· The fragments have four corners which represent a rectangle with the same
dimensions for all the strips.
· The corners of matching fragments join without gaps when assembled
correctly.
· A mechanically shredded strip matches with two other strips, except for frame
strips, which join with only one strip, when a single page is
reconstructed.
Figure 2.22 Machine shredding (a) Mechanical shredder (b) The shreds
The observed shape and contour of machine shredded strips differ from the
ideal features [58]. The observations on the shape features of
machine shredded rectangular strips are:
· The strips are not exactly rectangular and do not possess the same shape.
· During shredding, some pieces or corners are slightly torn or bent due to
continuous usage of the blades, which produces gaps between correctly matching
pieces.
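The ideal strip properties listed in this section (each interior strip has two neighbours, a frame strip only one) suggest viewing strip reconstruction as building a left-to-right chain. The following is a minimal Python sketch under stated assumptions, not a method taken from the cited works: each strip is a hypothetical list of grey-value rows, the cost of placing one strip to the right of another is the sum of absolute differences between the touching pixel columns, and the left frame strip is assumed to be known. The names edge_cost and order_strips are illustrative.

```python
from typing import List

Strip = List[List[int]]  # rows of grey values; all strips share one height


def edge_cost(left: Strip, right: Strip) -> int:
    """Sum of absolute differences between left's last column and right's first."""
    return sum(abs(row_l[-1] - row_r[0]) for row_l, row_r in zip(left, right))


def order_strips(strips: List[Strip], first: int) -> List[int]:
    """Greedily chain strips left to right, starting from the index `first`."""
    order = [first]
    remaining = set(range(len(strips))) - {first}
    while remaining:
        last = strips[order[-1]]
        # iterating over sorted indices keeps tie-breaking deterministic
        best = min(sorted(remaining), key=lambda i: edge_cost(last, strips[i]))
        order.append(best)
        remaining.remove(best)
    return order


# toy page: 3 rows x 6 columns, cut into three 2-column strips and shuffled
page = [
    [0, 10, 20, 30, 40, 50],
    [5, 15, 25, 35, 45, 55],
    [9, 19, 29, 39, 49, 59],
]
strips_in_order = [[row[c:c + 2] for row in page] for c in (0, 2, 4)]
shuffled = [strips_in_order[2], strips_in_order[0], strips_in_order[1]]
print(order_strips(shuffled, first=1))
```

Because observed strips are bent or torn and leave gaps, as noted above, a cost this simple is only a starting point; a greedy chain can also go wrong when several strips have near-identical borders.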
2.12 Characteristic Features of Hand Shredded Paper Documents
Ideal features of hand shredded paper documents are difficult to predefine, since
the tearing depends on the differing emotions of the people doing it. The observed characteristic
features of manually shredded pieces [8] are:
· Number of edges: A hand shredded piece of paper has an arbitrary shape and
has many edges, depending on how many times the corresponding document
has been torn. Each shredded piece contains sudden discontinuities
(corners) in its exterior boundary.
· Straight edge: A torn piece may contain straight
edges.
· Shear: Shear occurs while tearing off a piece of paper. An example
is shown in figure 2.23(b).
· Knowledge about the content: A priori
knowledge about the content of the document may exist.
· Corners: Most of the time, the corners of shredded pieces are topologically
different from those of their counterparts due to the nature of hand movements
while tearing the paper.
· Gaps: Small gaps may occur between the images of
correctly matching pieces.
· Transformations: The scanned images of the shredded pieces undergo
unpredictable rigid transformations, such as translation and rotation, during
image acquisition, as shown in figure 2.23(a).
· Knowledge about the surface of shredded pieces: Since no
information about the front and back surfaces of the shredded pieces is available, the
scanned image of a shredded piece may come from either of its
two surfaces.
Figure 2.23 Scanned images (a) Transformation (b) Fragments with shear
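The rigid transformations named in the list above can be written as a rotation by an angle theta followed by a translation (tx, ty). The sketch below is a minimal illustration, assuming a fragment contour is available as a list of (x, y) points. The function names are hypothetical and not taken from [8]. It also shows why shape features such as edge lengths remain usable for matching: they do not change under rotation and translation.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rigid_transform(points: List[Point], theta: float, tx: float, ty: float) -> List[Point]:
    """Rotate points by theta radians about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def edge_lengths(points: List[Point]) -> List[float]:
    """Edge lengths of the closed polygon; invariant under rigid transforms."""
    n = len(points)
    return [math.dist(points[i], points[(i + 1) % n]) for i in range(n)]


# a 4 x 3 rectangle standing in for a scanned fragment contour
contour = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
moved = rigid_transform(contour, math.radians(90), 10.0, 5.0)
```

Comparing edge_lengths(contour) with edge_lengths(moved) gives the same values up to floating-point error, whereas the raw coordinates differ.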
2.13 Review of Literature
Various approaches and solutions [1-17], [71-74], [76-81] are reported in the
literature for the reconstruction of jigsaw puzzles and shredded paper documents.
This section briefs the general aspects of reconstruction algorithms for jigsaw puzzles
and shredded documents.
A jigsaw puzzle is a tiling puzzle in which discrete pieces are put together to
form a complete picture, where each piece interlocks with specific pieces. Ever since
Freeman and Garder [1] envisaged solving jigsaw puzzles through digital computers,
computer techniques have been used to manipulate arbitrary geometric patterns,
identify patterns and solve games. Freeman and Garder’s work remains fundamental
in the field of reconstruction. In their approach [1], the chain code of each puzzle piece is
traced, and the corners of the piece are identified as the points that split the chain code.
The features associated with each chain code segment between corners are length, distance and
area between corners. Pieces are merged based on their boundary shape information.
Wolfson et al. [2] assembled the border pieces by solving a traveling salesman problem, and
corner points in the indents/outdents are calculated to find the best translation and
rotation parameters to match the inner puzzle pieces. The algorithm solves large
puzzles but enforces many constraints on the shape of the puzzle pieces.
Although these algorithms produce good results, they are not practical in
real time applications [87]. B. Burdea and H. Wolfson [4] use shape matching for the
assembly of rectangular jigsaw puzzles. The algorithm uses the characteristic of
jigsaw puzzles that the frame has a rectangular shape known a priori. For the automated
assembly, a common human heuristic is used: the frame is assembled
first and then the interior pieces. The frame pieces are identified by
their one or two straight sides. Wolfson [3] describes two curve matching
algorithms where the boundaries are represented by shape features obtained by
polygonal approximation. The matching stage finds the longest common substring by
geometric hashing. But the algorithm fails when the number of puzzle pieces is large.
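The chain-code representation used in [1] can be illustrated briefly. The sketch below assumes an 8-connected closed boundary given as an ordered list of integer pixel coordinates with the y axis pointing up. It encodes the boundary with the standard 8-direction Freeman code and marks as candidate corners the pixels where the code changes. This corner rule is a deliberate simplification for illustration, not the criterion of Freeman and Garder, and the function names are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int]

# Freeman 8-direction codes: 0 = east, increasing counter-clockwise (y up)
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def chain_code(boundary: List[Pixel]) -> List[int]:
    """Chain code of a closed boundary; consecutive pixels must be 8-neighbours."""
    n = len(boundary)
    codes = []
    for i in range(n):
        (x0, y0), (x1, y1) = boundary[i], boundary[(i + 1) % n]
        codes.append(DIRECTIONS[(x1 - x0, y1 - y0)])
    return codes


def corner_indices(codes: List[int]) -> List[int]:
    """Indices of boundary pixels whose incoming and outgoing codes differ."""
    return [i for i in range(len(codes)) if codes[i] != codes[i - 1]]


# 2 x 2 square traced counter-clockwise from the origin
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
codes = chain_code(square)
corners = [square[i] for i in corner_indices(codes)]
```

On a noisy scanned contour the code changes at almost every pixel, so a practical corner detector would smooth the code or threshold the accumulated direction change before splitting the boundary.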
Kong and Kimia [5] introduced re-sampling of boundaries in order to reduce
the complexity of curve matching. The algorithm makes a coarse alignment and uses
dynamic programming to get a fine scale alignment of the original boundaries. In the
algorithm of F. H. Yao et al. [6], the shape matching approach is combined with an image
merging process for classifying and joining the puzzle pieces. All puzzle pieces are
classified into defined types and the four sides of individual pieces are extracted
using dominant points. The matching edge is decided according to image features and
the fragments are assembled using a boundary shape matching technique. G. M.
Radack and N. I. Badler proposed an algorithm [70] which matches boundaries of
jigsaw puzzle pieces. This algorithm uses a polar coordinate system centered at
curvature maxima and minima [6]. This method for general boundary curve
representation and matching does not provide effective reconstruction of canonical
jigsaw puzzles. R. W. Webster et al. [71] use isthmus critical points to solve jigsaw
puzzles and canonical jigsaw puzzles. The isthmus and isthmus critical point features
are effective for assembling jigsaw puzzles. D. A. Kosiba et al. [72] proposed the first
method using chromatic information of the puzzle pieces. This approach considers
both shape and color information of the image on the puzzle. The matching process
considers color information along the borders of shapes, curvature parameters,
and the concavity and convexity of the pieces.
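Re-sampling a boundary, as mentioned for [5], reduces the number of points that a curve matcher has to compare. The following minimal sketch assumes a closed polygonal boundary and uniform arc-length spacing. Both the spacing rule and the function name resample_closed are assumptions made for illustration, and [5] may re-sample differently.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def resample_closed(points: List[Point], n_samples: int) -> List[Point]:
    """Return n_samples points equally spaced by arc length along a closed polygon."""
    closed = points + [points[0]]
    seg = [math.dist(closed[i], closed[i + 1]) for i in range(len(points))]
    step = sum(seg) / n_samples
    out = []
    i, start = 0, 0.0  # current segment index and arc length at its start
    for k in range(n_samples):
        target = k * step
        # advance to the segment that contains the target arc length
        while i < len(seg) - 1 and start + seg[i] <= target:
            start += seg[i]
            i += 1
        t = (target - start) / seg[i] if seg[i] > 0 else 0.0
        (x0, y0), (x1, y1) = closed[i], closed[i + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


# a square of side 4 (perimeter 16) re-sampled to 8 points, one every 2 units
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
samples = resample_closed(square, 8)
```

A smaller n_samples gives a coarser curve that is cheaper to align. The alignment found on the coarse curve can then seed a finer comparison on the original boundary.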
The algorithm of M. G. Chung et al. [73] uses both color and shape information of
the boundaries. Shape matching is done by calculating the distance from points on
boundary curves determined between the neighboring corner points. P. N. Suganthan
et al. [74] proposed a method for solving jigsaw puzzles using neural networks, but
the results obtained are not satisfactory for real time applications
[6]. In Glassner’s approach [76], a photograph of the entire picture is considered. A
jigsaw shape is drawn on the image using Photoshop and the resulting puzzle pieces are
scattered with fixed orientation. These jigsaw puzzle pieces are then reassembled by
shape matching and image merging. But this approach is not applicable to the
reconstruction of real world canonical jigsaw puzzles.
The algorithm of Goldberg et al. [16] computes the centre of the ellipse fitted to the
indents and outdents of the pieces. By computing these points, the algorithm finds the
best translation and rotation for each puzzle piece. The puzzle assembling process starts
with the edge pieces, using a greedy algorithm. The algorithm of Toyama et al. [17] assembles
rectangular jigsaw puzzles that are black and white in color. This genetic algorithm
based approach utilizes border pixel values as the key feature for piece
matching. Zhao et al. [77] present two algorithms based on human heuristics and the ant
colony system (ACS) [78] to solve puzzles based on pictorial information.
Since these methods use only ACS, they lack technical aspects specific to puzzle
reconstruction [79]. The algorithm of T. R. Nielson et al. [7] solves classical jigsaw puzzles
by using image features. One pixel wide edge strips are extracted from one side of the
puzzle pieces and an edge detector is used to determine the similarity of two
adjacent strips. To assemble the entire puzzle, the algorithm uses an adaptation of the
algorithm proposed by Burdea and Wolfson [4]. M. Makridis et al. [80] exploit both curve and
color similarity features for matching and assembling puzzle pieces. The characteristic
points of the puzzle pieces are high curvature points, which are determined by a corner
detection algorithm [81].
Skeoch [59] proposed an algorithm for examining image documents. The
strips are compared based on the information at their borders. The strips of image
documents contain more information, whereas strips of text documents mainly consist
of binary data. Although the algorithm attempts to reconstruct single-page, multiple-page and
double-sided documents, it is not suitable for real life data or for
text-based images. Ukovich et al. [14] identified a method to overcome the difficulties of
Skeoch’s algorithm by matching the curves of edges and the information on the pieces, with
the constraint that the shapes of the pieces are identical. The algorithm of J. Peal et al. [82]
recognizes the characters at the strip borders and assembles the matching fragments
subsequently. An optical character recognition system is used to recognize the characters
and the matching fragments are fused based on Euclidean distance features. This
approach is less suitable for documents with a large number of characters of complex
construction, such as Chinese characters [83]. The algorithm of M. Prandtstetter et al. [84]
reconstructs cross-cut shredded text documents based on variable neighborhood
search and ant colony optimization. These two approaches increase the computation
time significantly. In [15], notebook paper is used as source material, which has
characteristics different from ordinary documents, such as uniform size, paper color
and width. Color features describe the kind of paper and the color of ink, and are used to
segregate the fragments. R. Ranca [85] proposed a score function formulation based on two
probabilistic models, Probscore and ProbrowScore, to estimate the likelihood of two
edges matching. A Kruskal heuristic search is used for assembling the matching
shreds based on the scoring function. In this algorithm, shreds with little
information on their edges have a high matching possibility when compared to the
wrongly matching shreds.
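A Kruskal-style heuristic of the kind mentioned for [85] can be sketched as follows. This is not the algorithm of [85]: its probabilistic scores are replaced by a hypothetical dictionary of pair costs, only strips in a single row are handled, and all names are illustrative. Candidate pairs are taken in order of increasing cost, and a pair is accepted only if both touching sides are still free and joining them would not close a cycle, which is checked with a union-find structure.

```python
from typing import Dict, List, Tuple


def kruskal_chain(costs: Dict[Tuple[int, int], float], n: int) -> List[int]:
    """Order n strips from pair costs; costs[(i, j)] scores j placed right of i."""
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    right: Dict[int, int] = {}  # accepted right neighbour of each strip
    has_left = set()            # strips that already have a left neighbour
    for (i, j), _ in sorted(costs.items(), key=lambda kv: kv[1]):
        if i in right or j in has_left or find(i) == find(j):
            continue  # a side is already used, or the pair would close a cycle
        right[i] = j
        has_left.add(j)
        parent[find(i)] = find(j)
    # the chain starts at the strip that has no left neighbour
    start = next(i for i in range(n) if i not in has_left)
    order = [start]
    while order[-1] in right:
        order.append(right[order[-1]])
    return order


# hypothetical costs for three strips whose true order is 1, 2, 0
pair_costs = {
    (1, 2): 1.0, (2, 0): 2.0, (2, 1): 6.0,
    (1, 0): 7.0, (0, 2): 8.0, (0, 1): 9.0,
}
order = kruskal_chain(pair_costs, 3)
```

Unlike a chain grown from one end, this procedure commits to the most confident pairs first, wherever they lie on the page. When costs are supplied for every ordered pair, the accepted pairs always link up into a single chain.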
A. Biswas et al. [8] proposed a technique for the reconstruction of hand
shredded pages of documents from images using contour based shape matching
techniques. The chain code of the closed digital arc and its Minkowski sum are used in the
reconstruction phase. The algorithm needs modifications to handle missing
pieces and multiple documents shredded simultaneously. In the method proposed by
Justino et al. [9], polygonal approximation is used to reduce the complexity of the
boundaries and the extracted geometrical features are used for local reconstruction.
This method works well for a small number of hand shredded paper fragments.
When several fragments satisfy the matching criteria for a single
fragment, this approach needs technical modifications. De Smet’s method [10] works
only on a partially ordered set of fragments, where onsite recovery
guarantees that the fragments are carefully picked up and stored. The algorithm
works with the constraint that the relative stack position of each fragment with respect
to all other fragments is available and retained. Pimento et al. [11] proposed an
algorithm where geometrical features are extracted from a simplified polygon.
A Longest Common Subsequence (LCS) score determined from the features is
fed to a modified Prim’s algorithm in the reconstruction phase. Fragments are fused
incorrectly when the fragment under consideration has several matching fragments with equal
LCS score. S. Cao et al. [12] proposed a method that finds matches between
fragments by using both geometrical shape and appearance features. In the
reconstruction phase of the algorithm, the rearranged results are obtained by
searching the spanning tree of each subgroup. Though this method handles pieces of
multiple photos, it is not suitable for reconstructing papers with shear. F.
Richter et al. [13] proposed an algorithm that determines the groups of fragments that
fit together by finding the best match from the spanning tree.
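The LCS score mentioned for [11] builds on the classic longest common subsequence recurrence. The sketch below computes the plain LCS length of two feature sequences by dynamic programming. The symbolic side features are hypothetical, and the actual feature definition and score of [11] are not reproduced here.

```python
from typing import List, Sequence


def lcs_length(a: Sequence, b: Sequence) -> int:
    """Length of the longest common subsequence of a and b, in O(len(a) * len(b))."""
    prev: List[int] = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)  # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))  # drop one element
        prev = curr
    return prev[-1]


# hypothetical quantised features (e.g. angle classes) along two fragment sides
side_a = ["L", "R", "S", "L", "R"]
side_b = ["L", "S", "L", "L", "R"]
score = lcs_length(side_a, side_b)
```

Two candidate sides can reach the same LCS length against a given side, which is exactly the tie situation described above. A reconstruction procedure then needs an additional criterion to choose between them.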
Algorithms that work well for machine shredded documents do not necessarily
perform equally well for hand shredded documents. Comparing the previous works on the
reconstruction of machine shredded and hand shredded documents, it is observed that
reconstruction algorithms for hand shredded paper need further attention in the following
respects: dealing with shredded fragments without shape constraints, reconstructing
larger numbers of fragments, providing a remedy
when several fragments satisfy the matching criteria for a fragment
side, identifying the matching pair for a fragment side affected by shear,
tackling the features of inappropriate boundary segments in the matching phase, and
reconstructing the document with less time complexity. The thesis concentrates on exactly
this scope for improvement over existing algorithms for the reconstruction of hand
shredded documents, and through the course of the work, improved algorithms have
evolved for the automated reconstruction of hand shredded paper documents.