
Nishad KV, SHM Engg College, Kollam

WELCOME

TRANSLATAR: A MOBILE AUGMENTED REALITY TRANSLATOR


NISHAD KV, Roll No: 10, S7 IT

ABSTRACT
TranslatAR is a mobile augmented reality (AR) translation system that uses a smartphone's camera and touch screen. The user simply taps once on the word of interest, and the translation is produced and presented as an AR overlay.


CONTENTS
1) Introduction
2) What is Augmented Reality?
3) Overview of the System
4) Implementation Details
5) Advantages
6) Disadvantages
7) Future Enhancements
8) Conclusion
9) References


INTRODUCTION
Written text is one of the most common means of conveying information in daily life. However, when text is written in a language unfamiliar to the reader, that information is lost. Using a smartphone with a touch screen and camera as the physical device, we present a system for automatic translation of visual text that combines an efficient, easy-to-use input method with a natural and compelling form of presentation. The use of our system, TranslatAR, is shown in Fig. 1.

FIG.1

WHAT IS AUGMENTED REALITY?

 The goal of augmented reality is to add information and meaning to a real object or place.
 Augmented reality does not create a simulation of reality. Instead, it takes a real object or space as its foundation and incorporates technologies that add contextual data to deepen a person's understanding of the subject.

OVERVIEW OF THE SYSTEM


 It was designed such that all expensive operations run in a background thread, while the system maintains interactive frame rates for tracking and augmentation.
 The system's architecture, shown in Fig. 2, consists of four stages:
1) Text Detection
2) Text Extraction, Recognition and Translation
3) Visual Tracking
4) AR Overlay

FIG 2

1. TEXT DETECTION
Given the point c on which the user tapped, the system first finds the bounding box around the word, then the exact location and orientation of the text within:
1) Bounding box
2) Location & orientation refinement

1.1 BOUNDING BOX
 It is the area around the text to be translated.
 First, the image gradients Ix and Iy are computed.
 A short horizontal line segment Sh around the input point c is moved vertically upwards and downwards (Fig. 3a).
 A short vertical line segment Sv around the input point c is moved horizontally (Fig. 3b).
 The method is susceptible to failure on non-uniform backgrounds.
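The segment-moving procedure above can be sketched as follows. This is a minimal illustration on a synthetic grayscale image, not the authors' code; the threshold, segment half-length, and crude forward-difference gradients are simplifying assumptions of this sketch:

```python
def find_vertical_extent(img, cx, cy, half_len=3, thresh=10):
    """Grow a short horizontal segment around (cx, cy) upwards and
    downwards until the gradient energy along it drops below `thresh`,
    approximating the top and bottom of the tapped word.

    `img` is a 2D list of grayscale intensities.
    """
    h, w = len(img), len(img[0])

    def seg_energy(y):
        # Forward-difference gradients |Ix| + |Iy| summed along the
        # segment: high inside text, near zero on a uniform background.
        e = 0
        for x in range(max(cx - half_len, 0), min(cx + half_len, w - 2) + 1):
            e += abs(img[y][x + 1] - img[y][x])
            e += abs(img[min(y + 1, h - 1)][x] - img[y][x])
        return e

    top = cy
    while top > 0 and seg_energy(top) >= thresh:
        top -= 1
    bottom = cy
    while bottom < h - 1 and seg_energy(bottom) >= thresh:
        bottom += 1
    return top, bottom
```

The horizontal extent is found analogously with the vertical segment Sv moved left and right.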

FIG. 3 : TEXT DETECTION IN OPERATION


1.2 LOCATION & ORIENTATION REFINEMENT

 Only pixels within the bounding box are considered.
 Lines that cross the vertical line through c at an angle are considered as candidate baselines.
 A voting scheme is used for the task of finding the text baselines.
 The resulting quadrilateral region of interest is warped into a rectangle, correcting any perspective distortion and showing the text as if seen frontally (Fig. 3c, Fig. 3d).
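The baseline voting can be illustrated with a tiny Hough-style accumulator: each edge point votes for the candidate lines through it, parameterised by slope and by intercept at the vertical line through c. The slope grid and integer rounding here are assumptions of this sketch, not the paper's exact scheme:

```python
from collections import Counter

def vote_baseline(points, cx, slopes=(-0.4, -0.2, 0.0, 0.2, 0.4)):
    """Each edge point (x, y) votes for every candidate baseline
    (slope m, intercept b at x = cx) that passes through it; the
    accumulator bin with the most votes wins."""
    acc = Counter()
    for x, y in points:
        for m in slopes:
            b = round(y - m * (x - cx))  # intercept where line crosses x = cx
            acc[(m, b)] += 1
    (m, b), _ = acc.most_common(1)[0]
    return m, b
```

Edge points lying on a common slanted baseline all vote into the same bin, so the winning (slope, intercept) pair recovers the text orientation used for the perspective warp.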

2. TEXT EXTRACTION, RECOGNITION AND TRANSLATION

 The warped image is used to extract the background and foreground colors, as well as to read the word via OCR:
1) Foreground and background color estimation
2) OCR
3) Dictionary lookup
4) Translation

2.1 FOREGROUND AND BACKGROUND COLOR ESTIMATION

 We may assume that the foreground color has strong contrast with the background.
 The background color is extracted by sampling the borders of the bounding box, assuming that the background is roughly uniform.
 The foreground color is estimated by scanning pixels starting from a point in the center.
 A sample is accepted as an estimate of the foreground color if its intensity is sufficiently different from the background.
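A minimal version of this two-step estimation on a grayscale patch; the contrast threshold `delta` is an assumed parameter, and real images would use full RGB rather than a single intensity:

```python
def estimate_colors(img, delta=50):
    """Estimate the background color as the mean of the border pixels,
    then scan outward from the center for the first pixel whose
    intensity differs from it by more than `delta`; that pixel is
    taken as the foreground color."""
    h, w = len(img), len(img[0])
    border = ([img[0][x] for x in range(w)] +
              [img[h - 1][x] for x in range(w)] +
              [img[y][0] for y in range(1, h - 1)] +
              [img[y][w - 1] for y in range(1, h - 1)])
    bg = sum(border) / len(border)
    cy, cx = h // 2, w // 2
    # Scan a growing neighbourhood around the centre pixel.
    for r in range(max(h, w)):
        for y in range(max(cy - r, 0), min(cy + r, h - 1) + 1):
            for x in range(max(cx - r, 0), min(cx + r, w - 1) + 1):
                if abs(img[y][x] - bg) > delta:
                    return bg, img[y][x]
    return bg, None  # no sufficiently contrasting pixel found
```

These two colors are later reused to render the translated word in roughly the original appearance.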

2.2 OCR (OPTICAL CHARACTER RECOGNITION)

 With the rectified image of the word, we rely on a standard OCR system for extraction and recognition of the letters.
 We used Tesseract, as it is freely available and was easy to integrate.
 Tesseract is an OCR engine developed by HP from 1985 to 1995, until the project was continued and released as open source by Google in 2005.

OCR (Contd.)

 Tesseract is a command-line tool that accepts a TIFF image as input.
 Tesseract is a raw OCR engine that focuses primarily on character-recognition accuracy.
 It is open source, compiles easily on any platform, and performs all that is necessary for TranslatAR's purpose.
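Because Tesseract is driven from the command line, integrating it amounts to spawning a process. A sketch of the invocation (the actual call is commented out so the example does not require the binary; the output base name and language code are illustrative choices):

```python
import subprocess

def ocr_word(image_path, out_base="ocr_out", lang="eng"):
    """Build the argument list for the Tesseract CLI. The classic
    interface is `tesseract <image> <outbase>`, which writes the
    recognized text to <outbase>.txt; `-l` selects the language pack."""
    cmd = ["tesseract", image_path, out_base, "-l", lang]
    # subprocess.run(cmd, check=True)                # would invoke the engine
    # text = open(out_base + ".txt").read().strip()  # and read the result
    return cmd
```

On the N900 the same invocation runs in the background thread so the tracker keeps its interactive frame rate.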

2.3 DICTIONARY LOOKUP

 The system searches a dictionary of valid words to identify the nearest neighbor with respect to the Levenshtein distance.
 The Levenshtein distance to the found string is computed for each dictionary word whose length is within 2 of the length of the found string.
 The word with the smallest distance is taken as the replacement for the original string returned by the OCR.
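The lookup described above can be sketched with the standard dynamic-programming edit distance; the dictionary contents and the OCR string below are illustrative:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def correct(word, dictionary):
    """Replace the OCR output with the nearest dictionary word,
    considering only candidates whose length is within 2 of it."""
    candidates = [w for w in dictionary if abs(len(w) - len(word)) <= 2]
    return min(candidates, key=lambda w: levenshtein(word, w))
```

The length filter keeps the search cheap enough for a phone: most dictionary entries are rejected before any distance is computed.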

2.4 TRANSLATION

 The system uses Google Translate, an existing free online translation service, for the actual text-to-text translation.
 The input language is detected automatically by Google Translate.
 The desired output language can be selected by the user in the GUI.
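The request to the online service reduces to a simple HTTP GET. A hedged sketch of its shape; the endpoint below is a placeholder, not Google Translate's real URL or parameter set, and the source language is omitted to mirror the server-side auto-detection described above:

```python
from urllib.parse import urlencode

def build_translate_url(text, target, base="https://translate.example/api"):
    """Assemble the query string for a hypothetical translation
    endpoint: the recognized word and the user-selected target
    language, with source-language detection left to the server."""
    query = urlencode({"q": text, "target": target})
    return base + "?" + query
```

In the actual system this request is issued with the curl library from the background thread, so the UI never blocks on the network.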


3. VISUAL TRACKING

 Visual tracking enables the system to keep track of the word of interest in the live video stream and to present the translation as a live, AR-style overlay.
 Several circumstances make tracking in our application easier than in the general case:
1) We may assume that the text is displayed on a near-planar surface.
2) As the region of interest consists of text, it is automatically well-textured and contains high-contrast features, which is important for tracking.
3) We are only interested in tracking over short periods of time (as long as it takes the system to obtain the translation and the user to read it).
4) We can assume a cooperative user who will not move the phone jerkily.
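Those favourable conditions mean even a very simple tracker works over short spans. A translation-only sketch (the real system estimates a full perspective transformation; the box size and search radius here are assumptions):

```python
def track_translation(prev, cur, box, search=2):
    """Minimal translation-only tracker: slide the region
    `box` = (x, y, w, h) from the previous frame over the current
    frame within +/- `search` pixels and return the shift with the
    smallest sum of squared differences. High-contrast text makes
    this minimum sharp and easy to find."""
    x, y, w, h = box

    def ssd(dx, dy):
        return sum((prev[y + j][x + i] - cur[y + dy + j][x + dx + i]) ** 2
                   for j in range(h) for i in range(w))

    shifts = [(dx, dy) for dx in range(-search, search + 1)
                       for dy in range(-search, search + 1)]
    return min(shifts, key=lambda s: ssd(*s))
```

Because tracking only has to survive a few seconds of gentle motion, such a small search window is usually sufficient.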


4. AR OVERLAY

 Based on the transformation computed by the tracker, a graphical augmentation is rendered onto the live video screen.


 First, a placeholder (please wait...) is displayed while the text is being translated.


 Then, as soon as it becomes available, the translation itself is displayed.
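Placing the overlay amounts to projecting the corners of the text quadrilateral through the tracker's 3x3 transformation before handing them to the renderer. A minimal sketch of that projection (homogeneous coordinates with a perspective divide):

```python
def project(H, pt):
    """Apply a 3x3 homography (row-major nested lists) to a 2D
    point, as done for each corner of the text quad before the
    augmentation is drawn over the live video."""
    x, y = pt
    xs = H[0][0] * x + H[0][1] * y + H[0][2]
    ys = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return xs / w, ys / w  # perspective divide
```

With the four projected corners, the renderer (OpenGL ES 2 in TranslatAR) draws the translated word so it appears attached to the original surface.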


FIG 2


IMPLEMENTATION DETAILS

 The system was implemented on the Nokia N900, which runs the Linux-based OS Maemo.
 The code was developed in C++.
 The graphical augmentation was implemented in OpenGL ES 2.
 HTTP requests to and responses from Google's online translation service are handled with the curl library.

ADVANTAGES

 Since it is implemented on a Linux platform, TranslatAR is easy for other developers to extend and improve compared with systems on other platforms.
 The application presents a comparatively simple tracking case.
 Processing is done on the mobile device itself, providing immediate feedback.
 No peripheral devices are needed to translate the text.

DISADVANTAGES

 Lack of control over the viewfinder's focus currently limits TranslatAR to relatively large fonts.
 The frame rate needs to be significantly improved; the most obvious route is hardware acceleration for many of the image processing and rendering tasks.
 Real-world imaging problems such as illumination variance, glare, and dirt cause significant problems for both text detection and the Tesseract OCR module.

FUTURE ENHANCEMENTS

 Translation of text in all font formats.
 Translation of text at faster frame rates.
 Integration of a spell checker before translation, which would absorb some of the OCR errors.
 Text-to-voice translation and vice versa.
 Translation of text without the use of online services.
 Implementation of similar systems on mobile devices other than smartphones.

CONCLUSION
In this work, we presented a prototype for a real-time, mobile, visual translation system which requires only a single tap on the word of interest and presents its result as a live AR rendering. The application overlays an automatically translated text on a region of interest that is extracted and tracked in the camera's video stream.

REFERENCES

 T. N. Dinh, J. Park, and G. Lee. Low-complexity text extraction in Korean signboards for mobile applications. In Proc. 8th IEEE Intl. Conf. on Computer and Information Technology (CIT 2008), pages 333-337, 2008.
 B. Epshtein, E. Ofek, and Y. Wexler. Detecting text in natural scenes with stroke width transform. In Proc. IEEE CVPR 2010, July 2010.
 http://www.wikipedia.org/
 http://ieeexplore.ieee.org
 http://ilab.cs.ucsb.edu/index.php/component/content/article/10/14

THANK YOU
