CHAPTER 1 INTRODUCTION
1.1 Problem Definition
Almost everybody carries a mobile device, and it can be used for far more than communication. Today, when an image file containing text is received, for example as a fax or a scanned page of a book, the recipient must retype the whole document before saving it in the desired format. Consider a real-world scenario: a boss is travelling, and the secretary finds a document that needs the boss's approval. She can scan the document and send it; our software receives the image and converts it into editable text, which the boss can edit and send back as an image that the secretary can print or use in a presentation.
1.2 Objective
This project aims at developing software that can convert an image file into an editable text file using Artificial Neural Networks.
The existing system can convert an image into editable text. Many people today are trying to write their own OCR (Optical Character Recognition) system or to improve the quality of an existing one. This project shows how the use of an artificial neural network simplifies the development of an optical character recognition application while achieving high recognition quality and good performance. Artificial Neural Networks, usually abbreviated to ANNs, are a development tool modelled on biological neural networks. The strength of this tool is its ability to solve problems that are very hard to solve by traditional computing methods (e.g. by fixed algorithms). This work briefly explains artificial neural networks and their applications, and describes how to implement a simple ANN for character recognition.
Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. It is widely used to convert books and documents into electronic files, to computerize a record-keeping system in an office, or to publish text on a website. OCR makes it possible to edit the text, search for a word or phrase, store it more compactly, display or print a copy free of scanning artifacts, and apply techniques such as machine translation, text-to-speech and text mining to it. OCR is a field of research in pattern recognition, artificial intelligence and computer vision.
Developing a proprietary OCR system is a complicated task that requires a lot of effort. Such systems are usually complex and hide a great deal of logic behind the code. The use of an artificial neural network in OCR applications can dramatically simplify the code and improve recognition quality while achieving good performance. Another benefit of using a neural network in OCR is extensibility: the ability to recognize more character sets than initially defined. Most traditional OCR systems are not extensible enough, because a task such as working with tens of thousands of Chinese characters is far harder than working with a 68-character English typed character set, and it can easily overwhelm a traditional system.
Many reports have already been published on this topic, and many commercial establishments have manufactured recognizers of varying capabilities. Handheld, desktop, medium-size and large systems costing as much as half a million dollars are available and in use for various applications. However, the ultimate goal of a reading machine with the same reading capabilities as a human remains unachieved: there is still a great gap between human and machine reading, and a great amount of further effort is required to narrow that gap, if not bridge it. Today's systems use traditional algorithms to accomplish optical character recognition tasks. In these algorithms, the steps of execution as well as the complete input set must be known to the programmer, which is very difficult if not impossible. Furthermore, traditional algorithms are not flexible enough to handle unanticipated inputs; if the algorithm encounters such a state, the system fails.
The resulting application achieves high recognition quality and good performance. It is highly portable and removes the need to carry around a fax machine. By converting an image into a text file, we can make whatever changes we want without retyping the whole document manually; this saves time and reduces manual work.
Once the error produced by the patterns in the training set falls below a given tolerance, training is complete; the network is then presented with new input patterns and produces output based on the experience it gained during learning.
The ANN consists of a large number of highly interconnected processing elements (nodes) that are tied together with weighted connections (links), as shown in figure 1.2. Learning in biological systems involves adjustments to the synaptic connections that exist between neurons, and this is true for ANNs as well. Learning typically occurs by example, through training or exposure to a set of input/output data (patterns), where the training algorithm adjusts the link weights. The link weights store the knowledge necessary to solve specific problems.
Figure 1.3 Nodes and link connections
Although they originated in the late 1950s, neural networks did not gain much popularity until the 1980s, a computer boom era. Today ANNs are mostly used for complex real-world problems. They are often good at solving problems that are too complex for conventional technologies (e.g., problems that do not have an algorithmic solution, or for which an algorithmic solution is too complex to find) and are often well suited to problems that people are good at solving but traditional methods are not. They are good pattern-recognition engines and robust classifiers, with the ability to generalize when making decisions based on imprecise input data. They offer good solutions to a variety of classification problems such as speech, character and signal recognition, as well as functional prediction and system modelling where the physical processes are not understood or are highly complex. The advantage of ANNs lies in their resilience against distortions in the input data and their capability to learn.
Figure 1.4 Back Propagation ANN
A typical back-propagation ANN is depicted above (figure 1.4). The black nodes (on the extreme left) are the initial inputs. Training such a network involves two phases. In the first phase, the inputs are propagated forward to compute the output of each output node; each of these outputs is then subtracted from its desired output, yielding an error for each output node. In the second phase, each output error is passed backward and the weights are adjusted. These two phases are repeated until the sum of squared output errors reaches an acceptable value. Instead of training a single network to recognize multiple fonts, the network could have been implemented as a bank of single-font networks. That approach was not chosen because the individual networks would not be able to benefit from associating the "correct" character of a different font with the "correct" character of their own font (and similarly for wrong characters). Creating a single network that can successfully recognize any of the fonts increases the redundancy and durability of the network, at the cost of added complexity. The input layer contains 2,500 neurons: since the input images consist of a 50x50 matrix of pixels, the feature vector is a 2,500-element vector, which is fed directly into the neural network.
The input layer then connects to a hidden layer of 100 neurons, which in turn connects to an output layer of 94 neurons, each corresponding to a given character class. An artificial neuron is a device with many inputs and one output.
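To make the two training phases concrete, the computation can be summarized as follows (a standard back-propagation formulation; the bipolar sigmoid and the update rule mirror the sigmoid(), calculate_errors() and calculate_weights() routines listed in Chapter 5, with $\lambda$ the sigmoid slope and $\eta$ the learning rate):

Forward pass: $o_j = f\big(\sum_k w_{jk}\, o_k\big)$, with the bipolar sigmoid $f(x) = \frac{2}{1 + e^{-\lambda x}} - 1$

Output-layer error: $\delta_j = (d_j - o_j)\, f'(o_j)$

Hidden-layer error: $\delta_j = f'(o_j) \sum_k \delta_k\, w_{kj}$

Weight update: $w_{jk} \leftarrow w_{jk} + \eta\, \delta_j\, o_k$

Training stops when the average output error falls below the chosen tolerance (error_threshold in the code).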
Figure 1.5 Artificial Neuron
The firing rule is an important concept in neural networks and accounts for their high flexibility. A firing rule determines whether a neuron should fire for any input pattern, not only the patterns on which the node was trained. Take a collection of training patterns for a node, some of which cause it to fire (the 1-taught set of patterns) and others which prevent it from doing so (the 0-taught set). A pattern not in the collection causes the node to fire if, on comparison, it has more input elements in common with the 'nearest' pattern in the 1-taught set than with the 'nearest' pattern in the 0-taught set; if there is a tie, the pattern remains in the undefined state. For example, a 3-input neuron is taught to output 1 when the input (X1, X2, X3) is 111 or 101 and to output 0 when the input is 000 or 001.
Then, before applying the firing rule, the truth table is:

X1:  0   0   0   0   1   1   1   1
X2:  0   0   1   1   0   0   1   1
X3:  0   1   0   1   0   1   0   1
OUT: 0   0  0/1 0/1 0/1  1  0/1  1

Table 1.1: Before Firing Truth Table

As an example of the way the firing rule is applied, take the pattern 010. It differs from 000 in 1 element, from 001 in 2 elements, from 101 in 3 elements and from 111 in 2 elements. Therefore the 'nearest' pattern is 000, which belongs to the 0-taught set. Thus the firing rule requires that the neuron should not fire when the input is 010. On the other hand, 011 is equally distant from two taught patterns that have different outputs, so its output stays undefined (0/1). Applying the firing rule to every column yields the following truth table:
X1:  0   0   0   0   1   1   1   1
X2:  0   0   1   1   0   0   1   1
X3:  0   1   0   1   0   1   0   1
OUT: 0   0   0  0/1 0/1  1   1   1

Table 1.2: After Firing Truth Table

The difference between the two truth tables is the generalization of the neuron: the firing rule gives the neuron a sense of similarity, letting it respond sensibly to patterns it did not see during training.
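As an illustration only (not part of the project code), a minimal C# sketch of this nearest-pattern firing rule; the class, method and set names are invented for the example:

using System;
using System.Linq;

class FiringRuleDemo
{
    // Hamming distance between two equal-length binary patterns.
    static int Distance(int[] a, int[] b) =>
        a.Zip(b, (x, y) => x == y ? 0 : 1).Sum();

    // Returns 1 (fire), 0 (do not fire) or -1 (undefined) for an input,
    // by comparing its distance to the nearest 1-taught and 0-taught patterns.
    static int Fire(int[] input, int[][] taught1, int[][] taught0)
    {
        int d1 = taught1.Min(p => Distance(input, p));
        int d0 = taught0.Min(p => Distance(input, p));
        if (d1 < d0) return 1;
        if (d0 < d1) return 0;
        return -1; // tie: undefined (0/1)
    }

    static void Main()
    {
        var taught1 = new[] { new[] { 1, 1, 1 }, new[] { 1, 0, 1 } };
        var taught0 = new[] { new[] { 0, 0, 0 }, new[] { 0, 0, 1 } };
        Console.WriteLine(Fire(new[] { 0, 1, 0 }, taught1, taught0)); // 0
        Console.WriteLine(Fire(new[] { 0, 1, 1 }, taught1, taught0)); // -1 (0/1)
    }
}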
Figure 1.6 Network
For example, the network of figure 1.6 is trained to recognize the patterns T and H. The associated output patterns are all black and all white respectively, as shown below:
Figure 1.7 Pattern T and H recognition
If we represent black squares with 0 and white squares with 1, then the truth tables for the 3 neurons after generalization are as follows:
Top neuron:
X11: 0   0   0   0   1   1   1   1
X12: 0   0   1   1   0   0   1   1
X13: 0   1   0   1   0   1   0   1
OUT: 0   0   1   1   0   0   1   1

Middle neuron:
X21: 0   0   0   0   1   1   1   1
X22: 0   0   1   1   0   0   1   1
X23: 0   1   0   1   0   1   0   1
OUT: 1  0/1  1  0/1 0/1  0  0/1  0

Bottom neuron:
X31: 0   0   0   0   1   1   1   1
X32: 0   0   1   1   0   0   1   1
X33: 0   1   0   1   0   1   0   1
OUT: 1   0   1   1   0   0   1   0
From the tables, the following associations can be extracted:
Figure 1.8 Associations from the tables
In this case, it is obvious that the output should be all black, since the input pattern is almost the same as the 'T' pattern.
Figure 1.9 T Pattern conclusion
Here also, it is obvious that the output should be all white, since the input pattern is almost the same as the 'H' pattern.
Figure 1.10 H Pattern conclusion
Here, the top row is 2 errors away from a T and 3 from an H, so the top output is black. The middle row is 1 error away from both T and H, so the output is random. The bottom row is 1 error away from T and 2 away from H; therefore the output is black. The total output of the network is still in favour of the T shape.
1.7 Purpose
This software is developed in response to the ever-increasing demand for emerging technologies. It is:
- Time efficient.
- Highly portable, i.e. the software works on almost all systems.
- Inexpensive: minimal hardware cost is involved.
1.8 Scope
It fulfils the requirements of BPOs. The optical character recognition provided can be used for various applications such as image recognition. In future it can be applied to voice recognition applications to further develop and enhance voice recognizing systems.
1.9 Limitations
It is limited to black-and-white text only, and it is constructed for only a limited set of fonts.
1.10 Deliverables
The optical character recognition system using ANN is designed to convert an image document into text, so that the information present in the image document can be edited and then sent back to the user as an image document after editing. The deliverable is a massively parallel distributed system that has a natural propensity for storing experiential knowledge and making it available for use, packaged as highly time-efficient software.
MODULES
- TRAINING A CHARACTER
- CHARACTER RECOGNITION
- CONVERSION TO IMAGE
2.1.1 Load Character Trainer Set
Description: Takes the .cts file of the required font, which contains the alphabets, numerics and special characters written in the particular font style. It then loads the corresponding image of the font style and trains all the characters of the image by identifying the lines; it basically trains the network.
Class Used: Form1
Methods Used: load_character_trainer_set()
Attributes: file_stream, TextField, Button, picturebox, bitmap image
After training all the characters of a language, the network is saved for that particular font style. It works as follows:

2.1.2 Save Network
Input: Output of the .cts file
Output: Particular network file
Description: Creates a network file with the .ann extension. It contains the weights of all characters present in the .cts file of the specific style; this is useful in identifying characters of that particular font style.
Class Used: Form1
Methods Used: save_network()
Attributes: file_stream, TextField, Button
Description: Creates a new bitmap image and copies the input image to it. Saves the filename, path, height and width of the input image in different variables. Identifies the number of lines and stores it for further reference. Measures the line top and line bottom of each line and stores them in the respective line-top and line-bottom arrays. Displays the input image.
Class Used: Form1
Methods Used: load_image(), identify_lines()
Attributes: file_stream, TextField, Button, picturebox

Once the input image is loaded, the next step is to recognise each and every character in the input image file. This works as follows:

2.2.3 Next Character
Input: Input image with the number of lines and their respective line-top and line-bottom values
Output: Displays the character and its matrix mapping, and appends the corresponding character to the output string
Description: Finds the character bounds, i.e. the character's top, bottom, right and left values, as well as the character height and width. First extracts a single line from the input image and gets the character bounds of all the characters present in the line.
The same procedure is repeated for all the lines present in the input image. Once the character bounds of a single character are found, the module:
- Records the character image's pixels in a matrix.
- Creates a new bitmap image and copies the detected character onto it using the pixel matrix of the character.
- Displays the detected character.
- Maps this character onto the matrix using the pick-sampling-pixels method, and stores these matrix values of the character in another array.
- Compares this matrix-values array with the arrays of the standard characters stored in the network file, and displays the identified character in the output tab.
- Repeats the same procedure for all characters in the input image.
Class Used: Form1
Methods Used: detect_next_character(), get_next_character(), analyze_image(), get_character_bounds(), map_character_image_pixel_matrix(), create_character_image(), map_ann_input_matrix(), calculate_outputs()
Attributes: file_stream, TextField, Button, picturebox
In software engineering, a functional requirement defines a function of a software system or its component. A function is described as a set of inputs, the behavior, and outputs. Functional requirements may be calculations, technical details, data manipulation and processing, and other specific functionality that define what a system is supposed to accomplish. Behavioral requirements, describing all the cases where the system uses the functional requirements, are captured in use cases. Functional requirements are those that refer to the functionality of the system, i.e., what services it will provide to the user. OCR converts an image into a text file, which involves:

Training
Input: Image document
Action: A line is extracted; from the line a character is extracted; the extracted character is converted into a matrix, grey-scaled, and converted into pixel values from which the input vector is prepared.
Output: Binary value for the converted character.

Recognition
Input: Converted input file
Action: The image file is loaded; the output is compared with values from the training set; the values are converted into text output; the weights and errors are adjusted and checked against the acceptable range; and the result is matched with the image file.
Output: Text file.
Editing
Input: Text document
Output: Edited text
In editing, files can be saved, fonts can be set, and alignment and colour changes can be made.

Conversion: The edited document is converted back into an image file.
Input: Edited document
Output: Image document
Non-functional requirements include qualities such as extensibility and scalability, which are embodied in the static structure of the software system.
3.2.1 Scalability
Scalability is a desirable property of a system, a network, or a process, which indicates its ability either to handle growing amounts of work gracefully or to be readily enlarged. For example, it can refer to the capability of a system to increase total throughput under an increased load when resources (typically hardware) are added. An analogous meaning is implied when the word is used in a commercial context, where scalability of a company implies that the underlying business model offers the potential for economic growth within the company.

3.2.2 Reliability
Reliability is the ability of a person or system to perform and maintain its functions in routine circumstances, as well as in hostile or unexpected circumstances.

3.2.3 Integrity
Integrity as a concept has to do with the perceived consistency of actions, values, methods, measures, principles, expectations and outcomes. People use integrity as a holistic concept, judging the integrity of systems in terms of those systems' ability to achieve their own goals (if any). A value system's abstraction depth and range of applicable interaction may also function as significant factors in identifying integrity, due to their congruence or lack of congruence with empirical observation. A value system may evolve over time while retaining integrity if those who espouse the values account for and resolve inconsistencies.

There are also requirements that are not functional in nature; specifically, these are the constraints the system must work within.
Extensibility: The system should be extensible, i.e. able to recognize more character sets than initially defined.
Robustness: The system should provide good pattern-recognition engines and robust classifiers, with the ability to generalize when making decisions based on imprecise input data.
Scalability: The application should be scalable in the sense that a new service can be added without affecting the existing services.
These perspectives become evident as the diagram is created and help solidify the design.
A use case describes the interaction between one or more actors (an actor that initiates the interaction may be referred to as the 'primary actor') and the system itself, represented as a sequence of simple steps.
CHAPTER 5
SYSTEM IMPLEMENTATION
5.1 High Level Algorithm

5.1.1 Detecting character lines algorithm
1. Start at the first x and first y pixel of the image, pixel(0,0); set the number of lines to 0.
2. Scan up to the width of the image on the same y-component of the image.
   a. If a black pixel is detected, register y as the top of the first line.
   b. If not, continue to the next pixel.
   c. If no black pixel is found up to the width, increment y and reset x to scan the next horizontal line.
3. Start at the top of the line found and the first x-component, pixel(0, line_top).
4. Scan up to the width of the image on the same y-component of the image.
   a. If no black pixel is detected, register y-1 as the bottom of the first line and increment the number of lines.
   b. If a black pixel is detected, increment y and reset x to scan the next horizontal line.
5. Start below the bottom of the last line found and repeat steps 1-4 to detect subsequent lines.
6. Stop when the bottom of the image (the image height) is reached.
A compact sketch of this scan is given below.
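The following is a simplified illustration of steps 1-6 in C#; it is a sketch, not the project's identify_lines() routine (which appears later in this chapter), and the helper names IsBlack, RowHasInk and FindLines are assumptions:

using System.Collections.Generic;
using System.Drawing;

static class LineScanner
{
    // A pixel is treated as black when all colour channels are 0.
    static bool IsBlack(Bitmap img, int x, int y)
    {
        Color c = img.GetPixel(x, y);
        return c.R == 0 && c.G == 0 && c.B == 0;
    }

    // True if row y contains at least one black pixel.
    static bool RowHasInk(Bitmap img, int y)
    {
        for (int x = 0; x < img.Width; x++)
            if (IsBlack(img, x, y)) return true;
        return false;
    }

    // Returns (top, bottom) pairs for each detected text line.
    public static List<(int, int)> FindLines(Bitmap img)
    {
        var lines = new List<(int, int)>();
        int y = 0;
        while (y < img.Height)
        {
            while (y < img.Height && !RowHasInk(img, y)) y++; // skip blank rows
            if (y >= img.Height) break;
            int top = y;                                      // step 2a: line top
            while (y < img.Height && RowHasInk(img, y)) y++;  // step 4b: advance
            lines.Add((top, y - 1));                          // step 4a: line bottom
        }
        return lines;
    }
}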
5.1.2 Detecting character bounds algorithm
1. Start at the first x-component and the top of the current line, pixel(0, line_top).
2. Scan up to the width of the image on the same y-component.
   a. If a black pixel is detected, register y as the top of the character.
   b. If not, continue to the next pixel.
   c. If no black pixel is found up to the width, increment y and reset x to scan the next horizontal line.
3. Start at the top of the character found and the first x-component, pixel(0, character_top).
4. Scan up to the line bottom on the same x-component.
   a. If a black pixel is detected, register x as the left of the symbol.
   b. If not, continue to the next pixel.
   c. If no black pixel is detected, increment x and reset y to scan the next vertical line.
5. Start at the left of the symbol found and the top of the current line, pixel(character_left, line_top).
6. Scan up to the width of the image on the same x-component.
   a. If no black pixel is found, register x-1 as the right of the symbol.
   b. If a black pixel is found, increment x and reset y to scan the next vertical line.
7. Start at the bottom of the current line and the left of the symbol, pixel(character_left, line_bottom).
8. Scan up to the right of the character on the same y-component.
   a. If a black pixel is found, register y as the bottom of the character.
   b. If no black pixel is found, decrement y and reset x to scan the next horizontal line.
A corresponding sketch follows.
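Again as a simplified illustration (a sketch under the same assumptions as the previous one, reusing its IsBlack helper; the method name NextCharacterBounds is invented), the left/right scan of steps 3-6 can be expressed as:

using System.Drawing;

static partial class LineScanner
{
    // Find the left and right edges of the next character on a line,
    // scanning columns from startX; a blank column after ink marks the right.
    public static (int, int) NextCharacterBounds(Bitmap img, int startX, int top, int bottom)
    {
        bool ColumnHasInk(int x)
        {
            for (int y = top; y <= bottom; y++)
                if (IsBlack(img, x, y)) return true;
            return false;
        }
        int col = startX;
        while (col < img.Width && !ColumnHasInk(col)) col++; // steps 3-4: find left
        if (col >= img.Width) return (-1, -1);               // no character left on the line
        int left = col;
        while (col < img.Width && ColumnHasInk(col)) col++;  // steps 5-6: find right
        return (left, col - 1);
    }
}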
public void initialize_weights()
{
    // Seed every weight (layer i, node j, incoming link k) with a random
    // value in [-weight_bias, weight_bias).
    for (int i = 1; i < number_of_layers; i++)
        for (int j = 0; j < layers[i]; j++)
            for (int k = 0; k < layers[i - 1]; k++)
                weight[i, j, k] = (float)(rnd.Next(-weight_bias, weight_bias));
}
public void form_input_set()
{
    // Build one training column per character: sample the ann_input_value
    // grid down to a 10x15 = 150-element input vector.
    for (int k = 0; k < number_of_input_sets; k++)
    {
        get_next_character();
        label16.Text = (k + 1).ToString();
        label16.Update();
        for (int i = 0; i < 10; i++)
            for (int j = 0; j < 15; j++)
                input_set[i * 15 + j, k] = ann_input_value[i * 2 + 1, j * 2 + 1];
    }
}
public void train_network()
{
    int set_number;
    float average_error = 0.0F;
    for (int epoch = 0; epoch <= epochs; epoch++)
    {
        average_error = 0.0F;
        // Present every pattern once per epoch, in random order.
        for (int i = 0; i < number_of_input_sets; i++)
        {
            set_number = rnd.Next(0, number_of_input_sets);
            get_inputs(set_number);
            get_desired_outputs(set_number);
            calculate_outputs();
            calculate_errors();
            calculate_weights();
            average_error = average_error + get_average_error();
        }
        average_error = average_error / number_of_input_sets;
        // Stop early once the mean error falls below the threshold.
        if (average_error < error_threshold)
        {
            epoch = epochs + 1;
            progressBar1.Value = progressBar1.Maximum;
            label22.Text = "<" + error_threshold.ToString();
        }
        // NOTE: the closing braces below are reconstructed; the tail of
        // this method is cut off in the source listing.
    }
}
public void get_inputs(int set_number)
{
    // Copy one training pattern's input values into current_input.
    for (int i = 0; i < number_of_input_nodes; i++)
        current_input[i] = input_set[i, set_number];
}

public void get_desired_outputs(int set_number)
{
    // Copy the matching target bit pattern into desired_output.
    for (int i = 0; i < number_of_output_nodes; i++)
        desired_output[i] = desired_output_set[i, set_number];
}
public void calculate_outputs()
{
    float f_net;
    int number_of_weights;
    for (int i = 0; i < number_of_layers; i++)
        for (int j = 0; j < layers[i]; j++)
        {
            f_net = 0.0F;
            if (i == 0) number_of_weights = 1;
            else number_of_weights = layers[i - 1];
            for (int k = 0; k < number_of_weights; k++)
                if (i == 0) f_net = current_input[j];  // input layer passes values through
                else f_net = f_net + node_output[i - 1, k] * weight[i, j, k]; // weighted sum
            node_output[i, j] = sigmoid(f_net);
        }
}
public float sigmoid(float f_net)
{
    //float result = (float)(1 / (1 + Math.Exp(-1 * slope * f_net)));       // Unipolar
    float result = (float)((2 / (1 + Math.Exp(-1 * slope * f_net))) - 1);   // Bipolar
    return result;
}
public int threshold(float val)
{
    // Map an analogue node output to a crisp output bit.
    if (val < 0.5) return 0;
    else return 1;
}
public void calculate_errors()
{
    float sum = 0.0F;
    // Output-layer error: (desired - actual) * f'(output).
    for (int i = 0; i < number_of_output_nodes; i++)
        error[number_of_layers - 1, i] = (float)((desired_output[i] - node_output[number_of_layers - 1, i])
            * sigmoid_derivative(node_output[number_of_layers - 1, i]));
    // Back-propagate: each node's error is the weighted sum of the errors
    // in the layer above, scaled by f'(its own output).
    for (int i = number_of_layers - 2; i >= 0; i--)
        for (int j = 0; j < layers[i]; j++)
        {
            sum = 0.0F;
            for (int k = 0; k < layers[i + 1]; k++)
                sum = sum + error[i + 1, k] * weight[i + 1, k, j];
            error[i, j] = (float)(sigmoid_derivative(node_output[i, j]) * sum);
        }
}
public float get_average_error()
{
    // Mean error over the output nodes (absolute value returned).
    float average_error = 0.0F;
    for (int i = 0; i < number_of_output_nodes; i++)
        average_error = average_error + error[number_of_layers - 1, i];
    average_error = average_error / number_of_output_nodes;
    return Math.Abs(average_error);
}
public void calculate_weights()
{
    // Delta rule: w += learning_rate * error(j) * output of the upstream node.
    for (int i = 1; i < number_of_layers; i++)
        for (int j = 0; j < layers[i]; j++)
            for (int k = 0; k < layers[i - 1]; k++)
                weight[i, j, k] = (float)(weight[i, j, k] + learning_rate * error[i, j] * node_output[i - 1, k]);
}
public void load_character_trainer_set()
{
    string line;
    openFileDialog1.InitialDirectory = "C:\\ocr\\myocr\\Data\\Trainer Sets";
    openFileDialog1.Filter = "Character Trainer Set (*.cts)|*.cts";
    if (openFileDialog1.ShowDialog() == DialogResult.OK)
    {
        // Read the trainer string: one character per pattern to be learned.
        character_trainer_set_file_stream = new System.IO.StreamReader(openFileDialog1.FileName);
        trainer_string = "";
        while ((line = character_trainer_set_file_stream.ReadLine()) != null)
            trainer_string = trainer_string + line;
        number_of_input_sets = trainer_string.Length;
        character_trainer_set_file_name = Path.GetFileNameWithoutExtension(openFileDialog1.FileName);
        character_trainer_set_file_path = Path.GetDirectoryName(openFileDialog1.FileName);
        label20.Text = character_trainer_set_file_name;
        character_trainer_set_file_stream.Close();
        // The training image shares the .cts file's name, with a .bmp extension.
        image_file_name = character_trainer_set_file_path + "\\" + character_trainer_set_file_name + ".bmp";
        image_file_stream = new System.IO.StreamReader(image_file_name);
        input_image = new Bitmap(image_file_name);
        pictureBox1.Image = input_image;
        input_image_height = input_image.Height;
        input_image_width = input_image.Width;
        if (input_image_width > pictureBox1.Width)
            pictureBox1.SizeMode = PictureBoxSizeMode.StretchImage;
        else
            pictureBox1.SizeMode = PictureBoxSizeMode.Normal;
        right = 1;
        image_start_pixel_x = 0;
        image_start_pixel_y = 0;
        identify_lines();
        current_line = 0;
        character_present = true;
        character_valid = true;
        output_string = "";
        label36.Text = "Input Image : [" + character_trainer_set_file_name + ".bmp]";
    }
}

public void save_network()
{
    saveFileDialog1.Filter = "Artificial Neural Network Files (*.ann)|*.ann";
    saveFileDialog1.FileName = character_trainer_set_file_name;
    if ((saveFileDialog1.ShowDialog() == DialogResult.OK))
    {
        if (saveFileDialog1.FileName != "")
        {
            network_save_file_stream = new StreamWriter(saveFileDialog1.FileName);
            // Header lines describing the trained network.
            network_save_file_stream.WriteLine("Unicode OCR ANN Weight values. ");
            network_save_file_stream.WriteLine("Network Name = " + character_trainer_set_file_name);
            network_save_file_stream.WriteLine("Hidden Layer Size = " + maximum_layers.ToString());
            network_save_file_stream.WriteLine("Number of Patterns= " + number_of_input_sets.ToString());
            network_save_file_stream.WriteLine("Number of Epochs = " + epochs.ToString());
            network_save_file_stream.WriteLine("Learning Rate = " + learning_rate.ToString());
            network_save_file_stream.WriteLine("Sigmoid Slope = " + slope.ToString());
            network_save_file_stream.WriteLine("Weight Bias = " + weight_bias.ToString());
            network_save_file_stream.WriteLine("");
            // One line per weight, e.g. Weight[1 , 0 , 0] = 0.5
            for (int i = 1; i < number_of_layers; i++)
                for (int j = 0; j < layers[i]; j++)
                    for (int k = 0; k < layers[i - 1]; k++)
                    {
                        network_save_file_stream.Write("Weight[" + i.ToString() + " , " + j.ToString() + " , " + k.ToString() + "] = ");
                        network_save_file_stream.WriteLine(weight[i, j, k]);
                    }
            network_save_file_stream.Close();
        }
    }
}
public void load_network()
{
    form_network();
    openFileDialog1.InitialDirectory = "C:\\ocr\\myocr\\Data\\Networks";
    openFileDialog1.Filter = "Artificial Neural Network Files (*.ann)|*.ann";
    string line;
    char[] weight_char = new char[20];
    string weight_text = "";
    int title_length, weight_length;
    if ((openFileDialog1.ShowDialog() == DialogResult.OK))
    {
        if (openFileDialog1.FileName != "")
        {
            network_load_file_stream = new StreamReader(openFileDialog1.FileName);
            network_file_name = Path.GetFileNameWithoutExtension(openFileDialog1.FileName);
            label18.Text = network_file_name;
            // Skip the nine header lines written by save_network().
            for (int i = 0; i < 9; i++)
                network_load_file_stream.ReadLine();
            for (int i = 1; i < number_of_layers; i++)
                for (int j = 0; j < layers[i]; j++)
                    for (int k = 0; k < layers[i - 1]; k++)
                    {
                        weight_text = "";
                        line = network_load_file_stream.ReadLine();
                        // Strip the "Weight[i , j , k] = " prefix, leaving the value text.
                        title_length = ("Weight[" + i.ToString() + " , " + j.ToString() + " , " + k.ToString() + "] = ").Length;
                        weight_length = line.Length - title_length;
                        line.CopyTo(title_length, weight_char, 0, weight_length);
                        for (int counter = 0; counter < weight_length; counter++)
                            weight_text = weight_text + weight_char[counter].ToString();
                        weight[i, j, k] = (float)Convert.ChangeType(weight_text, typeof(float));
                    }
            network_load_file_stream.Close();
        }
    }
}
public void identify_lines()
{
    int y = image_start_pixel_y;
    int x = image_start_pixel_x;
    bool no_black_pixel;
    int line_number = 0;
    line_present = true;
    while (line_present)
    {
        x = image_start_pixel_x;
        // Walk pixel by pixel until the first black pixel marks a line top.
        while (Convert.ToString(input_image.GetPixel(x, y)) == "Color [A=255, R=255, G=255, B=255]")
        {
            x++;
            if (x == input_image_width) { x = image_start_pixel_x; y++; }
            if (y >= input_image_height) { line_present = false; break; }
        }
        if (line_present)
        {
            line_top[line_number] = y;
            no_black_pixel = false;
            // Advance until a fully white row marks the line bottom.
            while (no_black_pixel == false)
            {
                y++;
                no_black_pixel = true;
                for (x = image_start_pixel_x; x < input_image_width; x++)
                    if ((Convert.ToString(input_image.GetPixel(x, y)) == "Color [A=255, R=0, G=0, B=0]"))
                        no_black_pixel = false;
            }
            line_bottom[line_number] = y - 1;
            line_number++;
        }
    }
    number_of_lines = line_number;
}
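A note on the pixel test: comparing GetPixel's formatted string against "Color [A=255, R=255, G=255, B=255]" works for pure black-and-white bitmaps, but the colour channels can also be read directly. The following is a sketch of an equivalent, less fragile check (not taken from the original listing; the class and method names are invented):

using System.Drawing;

static class PixelTests
{
    // Channel-based equivalents of the string comparisons used above.
    public static bool IsWhite(Color c) => c.R == 255 && c.G == 255 && c.B == 255;
    public static bool IsBlack(Color c) => c.R == 0 && c.G == 0 && c.B == 0;
}
// e.g. while (PixelTests.IsWhite(input_image.GetPixel(x, y))) { x++; ... }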
public void load_image()
{
    openFileDialog1.InitialDirectory = "C:\\ocr\\myocr\\Data\\Sample Images";
    openFileDialog1.Filter = "Bitmap Image (*.bmp)|*.bmp";
    if (openFileDialog1.ShowDialog() == DialogResult.OK)
    {
        System.IO.StreamReader image_file_stream = new System.IO.StreamReader(openFileDialog1.FileName);
        input_image = new Bitmap(openFileDialog1.FileName);
        pictureBox1.Image = input_image;
        image_file_name = Path.GetFileNameWithoutExtension(openFileDialog1.FileName);
        image_file_path = Path.GetDirectoryName(openFileDialog1.FileName);
        image_file_stream.Close();
        input_image_height = input_image.Height;
        input_image_width = input_image.Width;
        if (input_image_width > pictureBox1.Width)
            pictureBox1.SizeMode = PictureBoxSizeMode.StretchImage;
        else
            pictureBox1.SizeMode = PictureBoxSizeMode.Normal;
        right = 1;
        image_start_pixel_x = 0;
        image_start_pixel_y = 0;
        identify_lines();
        current_line = 0;
        character_present = true;
        character_valid = true;
        output_string = "";
        label36.Text = "Input Image : [" + image_file_name + ".bmp]";
    }
}

public int binary_to_decimal()
{
    // Interpret output_bit[] as a little-endian binary number.
    int dec = 0;
    for (int i = 0; i < number_of_output_nodes; i++)
        dec = dec + output_bit[i] * (int)(Math.Pow(2, i));
    return dec;
}
public void character_to_unicode(string character)
{
    // Encode the character as UTF-16 bytes, then expose each bit as a
    // desired network output (1 for a set bit, 0 otherwise).
    int byteCount = unicode.GetByteCount(character.ToCharArray());
    byte[] bytes = new Byte[byteCount];
    bytes = unicode.GetBytes(character);
    BitArray bits = new BitArray(bytes);
    System.Collections.IEnumerator bit_enumerator = bits.GetEnumerator();
    int bit_array_length = bits.Length;
    bit_enumerator.Reset();
    for (int i = 0; i < bit_array_length; i++)
    {
        bit_enumerator.MoveNext();
        if (bit_enumerator.Current.ToString() == "True")
            desired_output_bit[i] = 1;
        else
            desired_output_bit[i] = 0;
    }
}
public char unicode_to_character()
{
    // Rebuild the character from the network's output bits: convert the
    // bit pattern to an integer, then decode it as a UTF-16 code unit.
    int dec = binary_to_decimal();
    Byte[] bytes = new Byte[2];
    bytes[0] = (byte)(dec);
    bytes[1] = 0;
    int charCount = unicode.GetCharCount(bytes);
    char[] chars = new Char[charCount];
    chars = unicode.GetChars(bytes);
    return chars[0];
}
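To see the encode/decode pair in isolation, here is a self-contained sketch of the same idea; the class and method names are invented for illustration, whereas the project code above works on the Form1 fields:

using System;
using System.Text;

class UnicodeBitsDemo
{
    // A character's 16 little-endian bits (as in character_to_unicode).
    static int[] ToBits(char c)
    {
        byte[] bytes = Encoding.Unicode.GetBytes(new[] { c });
        int[] bits = new int[16];
        for (int i = 0; i < 16; i++)
            bits[i] = (bytes[i / 8] >> (i % 8)) & 1;
        return bits;
    }

    // Reverse mapping (as in binary_to_decimal + unicode_to_character).
    static char FromBits(int[] bits)
    {
        int dec = 0;
        for (int i = 0; i < bits.Length; i++)
            dec += bits[i] << i;
        return (char)dec;
    }

    static void Main()
    {
        int[] bits = ToBits('A');          // 0x0041 -> bits 0 and 6 set
        Console.WriteLine(FromBits(bits)); // prints: A
    }
}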
public string binary_to_hex()
{
    // Convert the 16 output bits to four hexadecimal digits,
    // one nibble (4 bits) at a time.
    int dec;
    string hex = "";
    for (int i = 3; i >= 0; i--)
    {
        dec = 0;
        for (int j = 3; j >= 0; j--)
            dec = dec + (int)(output_bit[i * 4 + j] * Math.Pow(2, j));
        if (dec > 9)
            switch (dec)
            {
                case 10: hex = hex + "A"; break;
                case 11: hex = hex + "B"; break;
                case 12: hex = hex + "C"; break;
                case 13: hex = hex + "D"; break;
                case 14: hex = hex + "E"; break;
                case 15: hex = hex + "F"; break;
            }
        else
            hex = hex + dec.ToString();
    }
    return hex;
}
public void analyze_image()
{
    int analyzed_line = current_line;
    comboBox1.Items.Clear();
    get_character_bounds();
    if (character_present)
    {
        // A character was found: capture its pixels, draw it, and
        // prepare the ANN input matrix.
        map_character_image_pixel_matrix();
        create_character_image();
        map_ann_input_matrix();
    }
    else
        MessageBox.Show("Character Recognition Complete!", "Unicode OCR",
            MessageBoxButtons.OK, MessageBoxIcon.Exclamation);
}
public void get_character_bounds()
{
    int x = image_start_pixel_x;
    int y = image_start_pixel_y;
    bool no_black_pixel = false;
    if (y <= input_image_height && x <= input_image_width)
    {
        // Scan for the first black pixel: its row is the character top.
        while (Convert.ToString(input_image.GetPixel(x, y)) == "Color [A=255, R=255, G=255, B=255]")
        {
            x++;
            if (x == input_image_width) { x = image_start_pixel_x; y++; }
            if (y >= line_bottom[current_line]) { character_present = false; break; }
        }
        if (character_present)
        {
            top = y;
            x = image_start_pixel_x;
            y = image_start_pixel_y;
            // Scan column by column for the first black pixel: the character left.
            while (Convert.ToString(input_image.GetPixel(x, y)) == "Color [A=255, R=255, G=255, B=255]")
            {
                y++;
                if (y == line_bottom[current_line]) { y = image_start_pixel_y; x++; }
                if (x > input_image_width) break;
            }
            if (x < input_image_width) left = x;
            no_black_pixel = true;
            // Work upward from below the line to find the character bottom.
            y = line_bottom[current_line] + 2;
            while (no_black_pixel == true)
            {
                y--;
                for (x = image_start_pixel_x; x < input_image_width; x++)
                    if ((Convert.ToString(input_image.GetPixel(x, y)) == "Color [A=255, R=0, G=0, B=0]"))
                        no_black_pixel = false;
            }
            bottom = y;
            // Work rightward from the left edge to find the character right.
            no_black_pixel = false;
            x = left + 10;
            while (no_black_pixel == false)
            {
                x++;
                no_black_pixel = true;
                for (y = image_start_pixel_y; y < line_bottom[current_line]; y++)
                    if ((Convert.ToString(input_image.GetPixel(x, y)) == "Color [A=255, R=0, G=0, B=0]"))
                        no_black_pixel = false;
            }
            right = x - 1;
            top = confirm_top();
            bottom = confirm_bottom();
            character_height = bottom - top + 1;
            character_width = right - left + 1;
            confirm_dimensions();
            // A wide gap since the previous character means a space.
            if (left - prev_right >= 20)
                output_string = output_string + " ";
            prev_right = right;
            // Report the detected bounds and dimensions in the UI.
            textBox1.Text = Convert.ToString(top, 10); textBox1.Update();
            textBox2.Text = Convert.ToString(left, 10); textBox2.Update();
            textBox3.Text = Convert.ToString(bottom, 10); textBox3.Update();
            textBox4.Text = Convert.ToString(right, 10); textBox4.Update();
            textBox6.Text = Convert.ToString(character_width, 10); textBox6.Update();
            textBox7.Text = Convert.ToString(character_height, 10); textBox7.Update();
        }
        else if (current_line < number_of_lines - 1)
        {
            // End of this line: move to the next line and try again.
            current_line++;
            image_start_pixel_y = line_top[current_line];
            image_start_pixel_x = 0;
            prev_right = 20;
            output_string = output_string + "\n";
            character_present = true;
            get_character_bounds();
        }
    }
    else
        character_present = false;
}
public void pick_sampling_pixels()
{
    // Pick 30 sample rows spread over the character height:
    // endpoints first, then thirds, then repeated halving.
    sample_pixel_y[0] = 0;
    sample_pixel_y[29] = character_height - 1;
    sample_pixel_y[19] = (int)(2 * sample_pixel_y[29] / 3);
    sample_pixel_y[9] = (int)(sample_pixel_y[29] / 3);
    sample_pixel_y[4] = (int)(sample_pixel_y[9] / 2);
    sample_pixel_y[5] = sample_pixel_y[4] + step;
    sample_pixel_y[2] = (int)(sample_pixel_y[4] / 2);
    sample_pixel_y[3] = sample_pixel_y[2] + step;
    sample_pixel_y[1] = sample_pixel_y[0] + step;
    sample_pixel_y[6] = sample_pixel_y[1] + sample_pixel_y[5];
    sample_pixel_y[7] = sample_pixel_y[2] + sample_pixel_y[5];
    sample_pixel_y[8] = sample_pixel_y[3] + sample_pixel_y[5];
    for (int i = 10; i < 19; i++)
        sample_pixel_y[i] = sample_pixel_y[i - 10] + sample_pixel_y[9];
    for (int i = 20; i < 29; i++)
        sample_pixel_y[i] = sample_pixel_y[i - 20] + sample_pixel_y[19];

    // The 20 sample columns are picked the same way over the character
    // width. NOTE: the initial sample_pixel_x assignments (analogous to
    // the sample_pixel_y ones above) are cut off in the source listing;
    // it resumes here:
    sample_pixel_x[3] = sample_pixel_x[2] + step;
    sample_pixel_x[1] = sample_pixel_x[0] + step;
    sample_pixel_x[6] = sample_pixel_x[1] + sample_pixel_x[5];
    sample_pixel_x[7] = sample_pixel_x[2] + sample_pixel_x[5];
    sample_pixel_x[8] = sample_pixel_x[3] + sample_pixel_x[5];
    for (int i = 10; i < 19; i++)
        sample_pixel_x[i] = sample_pixel_x[i - 10] + sample_pixel_x[9];

    // Show the chosen sample coordinates in the UI.
    comboBox1.BeginUpdate();
    for (int i = 0; i < 20; i++)
        comboBox1.Items.Add("[" + (i + 1).ToString() + "] " + sample_pixel_x[i].ToString());
    comboBox1.EndUpdate();
    comboBox1.BeginUpdate();
    for (int i = 0; i < 30; i++)
        comboBox1.Items.Add("[" + (i + 1).ToString() + "] " + sample_pixel_y[i].ToString());
    comboBox1.EndUpdate();
}
public void map_character_image_pixel_matrix()
{
    // Copy the character's pixels (offset by its bounding box) into a matrix.
    for (int j = 0; j < character_height; j++)
        for (int i = 0; i < character_width; i++)
            character_image_pixel[i, j] = input_image.GetPixel(i + left, j + top);
}
public void create_character_image()
{
    // Build a bitmap of just the detected character and display it.
    character_image = new System.Drawing.Bitmap(character_width, character_height);
    for (int j = 0; j < character_height; j++)
        for (int i = 0; i < character_width; i++)
            character_image.SetPixel(i, j, character_image_pixel[i, j]);
    pictureBox2.Image = character_image;
    pictureBox2.Update();
}
public void map_ann_input_matrix()
{
    pick_sampling_pixels();
    // Sample the character bitmap at the chosen grid points:
    // black pixels become 1, everything else 0.
    for (int j = 0; j < matrix_height; j++)
        for (int i = 0; i < matrix_width; i++)
        {
            ann_input_pixel[i, j] = character_image.GetPixel(sample_pixel_x[i], sample_pixel_y[j]);
            if (ann_input_pixel[i, j].ToString() == "Color [A=255, R=0, G=0, B=0]")
                ann_input_value[i, j] = 1;
            else
                ann_input_value[i, j] = 0;
        }
    groupBox6.Invalidate();
    groupBox6.Update();
}
public void detect_next_character() // signature inferred from the module description
{
    get_next_character();
    if (character_present)
    {
        // Feed the sampled values into the network and read the output bits.
        for (int i = 0; i < 10; i++)
            for (int j = 0; j < 15; j++)
                input_set[i * 15 + j, 0] = ann_input_value[i * 2 + 1, j * 2 + 1];
        get_inputs(0);
        calculate_outputs();
        comboBox3.Items.Clear();
        comboBox3.BeginUpdate();
        for (int i = 0; i < number_of_output_nodes; i++)
        {
            output_bit[i] = threshold(node_output[number_of_layers - 1, i]);
            comboBox3.Items.Add("bit[" + (i).ToString() + "] " + output_bit[i].ToString());
        }
        comboBox3.EndUpdate();
        // Decode the bit pattern to a character and append it to the output.
        char character = unicode_to_character();
        output_string = output_string + character.ToString();
        textBox8.Text = " " + character.ToString();
        string hexadecimal = binary_to_hex();
        label11.Text = hexadecimal + " h";
        label11.Update();
        richTextBox1.Text = output_string;
        textBox8.Update();
        richTextBox1.Update();
    }
}
private Bitmap CreateBitmapImage(string sImageText)
{
    Bitmap objBmpImage = new Bitmap(1, 1);
    int intWidth = 0;
    int intHeight = 0;
    // Create the font for drawing the image text.
    Font objFont = new Font("Arial", 12, System.Drawing.GraphicsUnit.Pixel);
    // Create a Graphics object to measure the text width and height.
    Graphics objGraphics = Graphics.FromImage(objBmpImage);
    // Determine the bitmap size from the measured text.
    intWidth = (int)objGraphics.MeasureString(sImageText, objFont).Width;
    intHeight = (int)objGraphics.MeasureString(sImageText, objFont).Height;
    // Create the bitmap with the correct size for the text and font.
    objBmpImage = new Bitmap(objBmpImage, new Size(intWidth, intHeight));
    objGraphics = Graphics.FromImage(objBmpImage);
    // Set the background colour, then draw the text.
    objGraphics.Clear(Color.White);
    objGraphics.DrawString(sImageText, objFont, new SolidBrush(Color.FromArgb(102, 102, 102)), 0, 0);
    return objBmpImage; // assumption: the truncated source returns the rendered bitmap
}
CHAPTER 6 TESTING
Information processing has undergone major improvements in the past two decades in both hardware and software. Hardware has decreased in size and price while providing more and faster processing power; software has become easier to use while providing increased capabilities. There is an abundance of products available to assist both end-users and software developers in their work. Software testing, however, has not progressed significantly. It is still largely a manual process, conducted as an art rather than a methodology. It is almost accepted practice to release software that contains defects: software that is not thoroughly tested is released to production, and this is true for both off-the-shelf software products and custom applications. Software vendors and in-house systems developers release an initial system and then deliver fixes to the code; they continue delivering fixes until they create a new system and stop supporting the old one. The user is then forced to convert to the new system, which again will require fixes. In-house systems developers generally do not provide any better level of support: they require users to submit incident reports specifying the system defects, which are then assigned a priority and fixed as time and budgets permit.
Test conditions, test data, and expected results are generally created manually.
System testing is also one of the final activities before the system is released for production, and there is always pressure to complete it promptly to meet the deadline. Nevertheless, systems testing is important: when the system is distributed to multiple sites, any errors or omissions in the system will affect several groups of users, and any savings realized in downsizing the application will be negated by the costs to correct software errors and reprocess information. Software developers must deliver reliable and secure systems that satisfy the users' requirements. A key item in successful systems testing is developing a testing methodology, rather than relying on the individual style of the test practitioner. The systems testing effort must follow a defined strategy; it must have an objective, a scope, and an approach. Testing is not an art; it is a skill that can be taught. Testing is generally associated with the execution of programs, with the emphasis on the outcome of the testing rather than on what is tested and how it is tested. But testing is not a one-step, execute-the-test activity: it requires planning and design. Tests should be reviewed prior to execution to verify their accuracy and completeness, and they must be documented and saved for reuse. System testing is the most extensive testing of the system. It requires more manpower and machine processing time than any other testing level and is therefore the most expensive testing level. It is a critical process in system development: it verifies that the system performs the business requirements accurately, completely, and within the required performance limits, and it must be thorough, controlled and managed.
Test cases must be designed in such a way that they have the highest likelihood of finding the maximum number of errors with a minimum amount of time and effort. There are two approaches to designing test cases: white box testing and black box testing.
Black box testing attempts to find errors such as:
- Errors in data structures or external database access
- Performance errors
- Initialization and termination errors

White box testing is performed early in the testing process; black box testing is applied during later stages. Black box testing purposefully disregards control structures and focuses on the information domain. Software development has several levels of testing:
- Unit testing
- System testing
- Acceptance testing
In unit testing, individual units of source code are tested to determine whether they are fit for use. A unit is the smallest testable part of an application. Unit testing allows for automation of the testing process, reduces the difficulty of discovering errors contained in more complex pieces of the application, and often enhances test coverage because attention is given to each unit. This first level of testing is done during development of the system: unit testing is essential for verification of the code produced during the coding phase, and errors were noted down and corrected immediately. It is performed by the programmer, using the program specifications and the program itself as its source. Thus, our modules are individually tested here. No formal documentation is required for unit testing.
In integration testing, tested modules are combined into groups; eventually all the modules making up a process are tested together. Beyond that, if the program is composed of more than one process, the processes should be tested in pairs rather than all at once. Integration testing identifies problems that occur when units are combined. By using a test plan that requires each unit to be tested and verified before units are combined, any errors discovered when combining units are likely related to the interface between units; this reduces the number of possibilities to a far simpler level of analysis. In this second level of testing, different dependent modules are assembled and tested for any bugs that may surface due to the integration of modules. Thus, the training, recognition and conversion modules are tested together here.
System testing is performed on the complete, integrated system in an environment close to the production environment. System testing also evaluates the system's compliance with both functional and non-functional requirements. It is important to understand that relatively few test cases are written specifically for system testing: they are derived from the architecture and design of the system, from end-user input, and from user stories. It does not make sense to exercise extensive testing in the system testing phase, as most functional defects should have been caught and corrected during earlier testing phases. This third level of testing verifies that the system performs the business functions while meeting the specified performance requirements. It is performed by a team consisting of software technicians and users, using the Systems Requirements document, the System Architectural Design and Detailed Design documents, and the Information Systems Department standards as its sources. Documentation is recorded and saved for systems testing.
Acceptance testing feeds inputs into the system and verifies that the resulting outputs are correct, without knowledge of the system's internal workings. User acceptance testing (UAT) is the term used when the acceptance tests are performed by the person or persons who will be using the live system once it is delivered. If the system is being built or developed by an external supplier, this is sometimes called customer acceptance testing (CAT). The UAT or CAT acts as a final confirmation that the system is ready for go-live; a successful acceptance test at this stage may be a contractual requirement prior to the system being signed off by the client. This final level of testing provides the users with assurance that the system is ready for production use. It is performed by the users, using the System Requirements document as its source; no formal documentation is required. Systems testing is the major testing effort of the project. It is the functional testing of the application and is concerned with the following:
- Quality/standards compliance
- Business requirements
- Performance capabilities
- Operational capabilities
CHAPTER 7 SNAPSHOTS
CONCLUSION
Artificial neural networks are commonly used to perform character recognition due to their high noise tolerance, and such systems have the ability to yield excellent results. The feature extraction step of optical character recognition is the most important: a poorly chosen set of features will yield poor classification rates from any neural network. Features must be chosen such that they are lossless, i.e. they still accurately represent the character; however, lossless feature extraction alone does not guarantee good results. Choices for feature extraction include:
- Use of the entire input image as the feature vector; however, this requires a huge network that must be trained over millions of iterations.
- Computing the vertical and horizontal projections of the characters; this is an easy reduction.
- Computing the run lengths of the character, providing a lossless reduction in information.
- Use of hand-selected features chosen by human experts to classify characters; this is also an easy reduction.
These feature vectors are fed into feed-forward, back-propagation artificial neural networks, and the output of the network determines the correct character class. Trivial systems work with only a single font, while more complex systems can recognize many fonts using the same network. Most systems can accept a character printed in any size of font by performing scaling and normalizing before computing the feature vectors. A sketch of the projection features mentioned above is given below.
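As an illustration of the projection-based features (a sketch, not code from this project; the binary-matrix convention follows the ann_input_value mapping used earlier, with 1 for a black cell, and the class and method names are invented):

using System;

class ProjectionFeatures
{
    // Horizontal projection: count of black (1) cells per row.
    static int[] RowProjection(int[,] m)
    {
        var rows = new int[m.GetLength(0)];
        for (int i = 0; i < m.GetLength(0); i++)
            for (int j = 0; j < m.GetLength(1); j++)
                rows[i] += m[i, j];
        return rows;
    }

    // Vertical projection: count of black (1) cells per column.
    static int[] ColumnProjection(int[,] m)
    {
        var cols = new int[m.GetLength(1)];
        for (int i = 0; i < m.GetLength(0); i++)
            for (int j = 0; j < m.GetLength(1); j++)
                cols[j] += m[i, j];
        return cols;
    }

    static void Main()
    {
        // A 3x3 'T': top row black, centre column black below it.
        int[,] t = { { 1, 1, 1 }, { 0, 1, 0 }, { 0, 1, 0 } };
        Console.WriteLine(string.Join(",", RowProjection(t)));    // 3,1,1
        Console.WriteLine(string.Join(",", ColumnProjection(t))); // 1,3,1
    }
}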
Curriculum Vitae:

EDUCATION
Examination           Board/University    Percentage    Session
B.Tech (pursuing)     -                   80.5 %        2008-2012
Class XII             -                   82.5 %        2007
Class X               ICSE Board          87.17 %       2005
PERSONAL INFORMATION
Date of Birth: 09.12.1989
Gender: Male
Father's Name: Mr. Sunil Kumar Srivastava
Permanent Address: Village & Post - Tengerpur, Rani ki Sarai, Azamgarh (U.P.) - 276207
Email id: ajayaeccs@gmail.com
Contact no: +91 9411464705
Hobbies: Playing cricket, listening to songs.
DIVYA CHAWLA
Divya Chawla
B.Tech (8th Semester), Computer Science and Engineering
Email id: vickychawla.chawla@gmail.com
EDUCATION
Examination           Board/University    Percentage    Session
B.Tech (pursuing)     -                   75.6 %        2008-2012
Class XII             -                   83.4 %        2008
Class X               -                   74 %          2006
PERSONAL INFORMATION
Date of Birth: 09.03.1990
Gender: Male
Father's Name: Shri Bhim Sen Chawla
Permanent Address: 371, Adarsh Nagar, Sipri Bazar, Jhansi (U.P.)
Email id: vickychawla.chawla@gmail.com
Contact no: +91 9454959392
Hobbies: Reading novels (Chetan Bhagat and Arpit Dugar), watching Telugu movies.
KAHKASHAN AHMAD
Kahkashan Ahmad
B.Tech (8th Semester), Computer Science and Engineering
Email id: kasha.virgo@gmail.com
EDUCATION
Examination           Institution                           Percentage    Session
B.Tech (pursuing)     Anand Engineering College, Agra       77.2 %        2008-2012
Class XII             St. Patrick's Junior College, Agra    85.5 %        2008
Class X               St. Patrick's Junior College, Agra    92.8 %        2006
PERSONAL INFORMATION
Date of Birth: 20.02.1990
Gender: Female
Father's Name: Mr. Syed Rashid Ahmad
Permanent Address: 15/3, Soron Katra, Shahganj, Agra
Email id: kasha.virgo@gmail.com
Contact no: +91 9557860306
Hobbies: Embroidery; reading articles, collecting and pasting them in my collection.
KANIKA AGARWAL
Kanika Agarwal
B.Tech (8th Semester), Computer Science and Engineering
Email: kanika.303@gmail.com
EDUCATION
Examination           Board/University    Marks      Year
B.Tech. (pursuing)    U.P.T.U.            74.67 %    2011
Class XII             U.P. Board          69.2 %     2007
Class X               U.P. Board          62.5 %     2005
PERSONAL INFORMATION
Date of Birth: 03.03.1990
Gender: Female
Father's Name: Mr. Satish Chand Agarwal
Permanent Address: 207, Puneet Apartment, Teacher Colony, Jaipur House, Agra (U.P.)
Email id: kanika.303@gmail.com
Contact no: +91 9368049696
Hobbies: Sketching (portraits etc.), indulging in artistic and creative works.
NADA KHALEEQUE
Nada Khaleeque
B.Tech (8th Semester), Computer Science and Engineering
Email: nada_shamsi@yahoo.co.in
EDUCATION:
Examination    Board/University    Year    Institution                           Percentage
10th           ICSE                2006    St Anthony's Junior College, Agra     91.3 %
12th           -                   2008    Mount Carmel College, Bangalore       85.6 %
B.Tech         -                   -       Anand Engineering College, Agra       73 % (Avg.)
PERSONAL INFORMATION
Date of Birth: 16.01.1990
Gender: Female
Father's Name: Khaleeque Ahmad
Permanent Address: 5-A, North Idgah Colony, Agra - 282001
Email id: nada_shamsi@yahoo.co.in
Contact no: +91 9368049696
Hobbies: Sketching (portraits etc.), indulging in artistic and creative works.