Late Shri Vishnu Waman Thakur Charitable Trusts Bhaskar Waman Thakur College of Science, Yashwant Keshav Patil College of Commerce, Vidya Dayanand Patil College of Arts. Virar (W).
CERTIFICATE
This is to certify that the case study on Image Processing: Bar Code Recognition was done by SHAYANI S. BATABYAL, Seat no. 29, in partial fulfillment of the B.Sc. IT degree (SEM VI) examination. It has not been submitted for any other examination and does not form part of any other course undergone by the candidate.
Date:
ACKNOWLEDGEMENT
It is indeed a matter of great pleasure and a proud privilege to present this project on Bar Code Recognition. I am thankful to our honourable Principal, Dr. R. Bhagat, and express my deep regards to him. I am highly indebted to my project guide, Prof. Pranali Thakare, for her valuable guidance, and I wish to record my deep sense of gratitude and appreciation to her for giving form and substance to this project. The completion of project work is a milestone in the life of a student, and its execution is impossible without the cooperation of the project guide, professors, librarians, classmates and seniors, who provided valuable help and advice from time to time. Lastly, I would like to specially thank the department and non-teaching staff for their support and cooperation throughout the project. It is truly impossible to acknowledge and repay the debts owed to all the people who have directly or indirectly helped in the successful completion of this project.
INDEX
SERIAL NO. TOPIC
1 Introduction
2 Uses of Bar codes
3 Benefits of Bar code
4 Types of Bar codes
5 Symbology
6 Choosing an appropriate code for simulation
6.1 Code UPC-A
7 Structure of UPC Number
7.1 Reason for choosing UPC-A code
8 MATLAB Simulation for Bar Code scanner
9 Working of Bar Code
10 Image Processing
10.1 Introduction
10.2 Algorithm development in bar code recognition
10.3 Detecting and Rotating Slanted image
10.4 Advance noise elimination techniques
10.5 De-blur an image: Original de-blur function
11 Future enhancements to image processing techniques
12 Conclusion
13 Bibliography and references
INTRODUCTION
Document Management tools often allow for bar coded sheets to facilitate the separation and indexing of documents that have been imaged in batch scanning applications.
Bar codes are used to track the movement of items, including rental cars, airline luggage, nuclear waste, mail and parcels.
Recently, researchers have placed tiny barcodes on individual bees to track the insects' mating habits.
Many tickets now have barcodes that need to be validated before allowing the holder to enter sports arenas, cinemas, theatres, fairgrounds, transportation etc.
When a manufacturer packs a box with any given item, a Unique Identifying Number (UID) can be assigned to the box. A relational database can be created to relate the UID to relevant information about the box, such as order number, items packed, quantity packed, final destination, etc. The information can be transmitted through a communication system such as Electronic Data Interchange (EDI) so the retailer has the information about a shipment before it arrives. Tracking results when shipments are sent to a Distribution Center (DC) before being forwarded to the final destination. When the shipment gets to the final destination, the UID is scanned, and the store knows where the order came from, what is inside the box, and how much to pay the manufacturer.
Bar codes are business friendly because bar code scanners are relatively low cost and extremely accurate: only about 1 in 100,000 entries will be wrong.
techniques to decode the bar code. Video cameras use the same CCD technology as in a CCD bar code reader except that instead of having a single row of sensors, a video camera has hundreds of rows of sensors arranged in a two dimensional array so that they can generate an image.
SYMBOLOGY
The mapping between messages and barcodes is called a symbology. The specification of a symbology includes the encoding of the single digits/characters of the message, as well as the start and stop markers, into bars and spaces; the size of the quiet zone required before and after the barcode; and the computation of a checksum. Linear symbologies can be classified mainly by two properties:
Continuous vs. discrete: Characters in continuous symbologies usually abut, with one character ending with a space and the next beginning with a bar, or vice versa. Characters in discrete symbologies begin and end with bars; the intercharacter space is ignored, as long as it is not wide enough to look like the code ends.
Two-width vs. many-width: Bars and spaces in two-width symbologies are wide or narrow; exactly how wide a wide bar is has no significance as long as the symbology requirements for wide bars are adhered to (usually two to three times wider than a narrow bar). Bars and spaces in many-width symbologies are all multiples of a basic width called the module; most such codes use four widths of 1, 2, 3 and 4 modules.
Some symbologies use interleaving. The first character is encoded using black bars of varying width. The second character is then encoded by varying the width of the white spaces between these bars. Thus characters are encoded in pairs over the same section of the barcode.
The different bar code symbologies support different types and amounts of data, so you normally choose a particular symbology based on the type and amount of data that you want to encode in your bar codes.
Symbology: Data Capacity
UPC-A: 12 numeric digits - 11 user specified and 1 check digit.
UPC-E: 7 numeric digits - 6 user specified and 1 check digit.
EAN-8: 8 numeric digits - 7 user specified and 1 check digit.
EAN-13: 13 numeric digits - 12 user specified and 1 check digit.
Code 39, Code 93, Code 128: Variable length alphanumeric data - the practical upper limit is dependent on the scanner and is typically between 20 and 40 characters. Code 128 is more efficient at encoding data than Code 39 or Code 93. Code 128 is the best choice for most general bar code applications. Code 39 and Code 128 are both very widely used while Code 93 is rarely used.
I 2 of 5: Variable length numeric data - the practical upper limit is dependent on the scanner and is typically between 20 and 50 characters.
Data Matrix: Data can consist of any type of data including binary or alphanumeric and be up to 3116 bytes in length.
Aztec: Data can consist of any type of data including binary or alphanumeric and be up to 3750 bytes in length.
Maxicode: Maxicode can hold up to 93 alphanumeric characters or 138 numeric digits. Maxicode is used almost exclusively for United Parcel Service package identification.
PDF417: PDF417 is a little more complex, and it is difficult to say exactly what its capacity is because it depends greatly on the type of data that you encode in a PDF417 symbol as well as the amount of error correction capacity that you choose to use. For general binary data with no error correction enabled, a single PDF417 symbol can hold up to 1108 bytes. If the data consists of all numeric digits, a single PDF417 symbol can hold up to 2725 digits. If the data consists of alphanumeric data, you can encode a maximum of 1850 bytes. If you have a mix of alphanumeric and binary data, the capacity will be somewhere between 1108 and 1850 bytes and will depend on the content of the data.
All of our bar code software products use an extremely efficient encoding algorithm that will squeeze the maximum number of bytes possible into a PDF417 symbol however it still must work within the limits of the symbology specification.
9: 0001011 3-1-1-2
On the other hand, the right hand side bit patterns relating to each digit are essentially ones complements of the left hand side patterns. They have even parity and start with a bar. The bit patterns are as follows:
Right Hand Side Codes (remember these are the ones complement!):
0: 1110010
1: 1100110
2: 1101100
3: 1000010
4: 1011100
5: 1001110
6: 1010000
7: 1000100
8: 1001000
9: 1110100
One thing to note is that at the start and end of each barcode there is a particular bit pattern, 101, indicating to the deciphering program where the barcode begins and ends.
As stated above, because the bars and spaces differ in width, a barcode can essentially be written as binary code, i.e. 1s and 0s. Barcode recognition within MATLAB then becomes simpler to think about: we need a way to convert the barcode image of bars and spaces into a graph whose vertical axis ranges from 0 to 1, corresponding to the spaces and bars. Once the image has been converted to such a graph, MATLAB can read the graph and determine the numbers involved.
The UPC-A barcode is the most common and well-known symbology in the United States. You can find it on virtually all consumer goods in your local supermarket, as well as on books, magazines, and newspapers.
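The complement relationship between the two sides can be checked with a short sketch. The right hand side table is copied from the list above; the helper names `complement` and `run_lengths` are illustrative only and are not part of the project's MATLAB code.

```python
# Right-hand 7-module patterns as listed in the text (1 = bar, 0 = space).
RIGHT_CODES = {
    0: "1110010", 1: "1100110", 2: "1101100", 3: "1000010", 4: "1011100",
    5: "1001110", 6: "1010000", 7: "1000100", 8: "1001000", 9: "1110100",
}


def complement(bits):
    """Flip every module: bars become spaces and spaces become bars."""
    return "".join("1" if b == "0" else "0" for b in bits)


def run_lengths(bits):
    """Return the widths of consecutive runs of identical modules."""
    runs = []
    count = 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs


# Left-hand patterns derived as the ones complement of the right-hand ones.
LEFT_CODES = {digit: complement(bits) for digit, bits in RIGHT_CODES.items()}
```

For digit 9 this gives the left hand pattern 0001011 with run widths 3-1-1-2, which matches the surviving row of the left hand table.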
There are a number of UPC variants, such as UPC-E, the UPC 2-digit supplement and the UPC 5-digit supplement. UPC-A encodes 11 digits of numeric data along with a trailing check digit, for a total of 12 digits of barcode data. The valid characters are the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
Number System (NS): Description
0: Regular UPC code
1: Reserved
2: Weight Items
3: Drug/Health Items
4: In-store use on non-food items
5: Coupons
6: Reserved
7: Regular UPC code
8: Reserved
9: Reserved
1. Number System: The number system is the first digit of the UPC number and identifies the type of product. For example, if the barcode starts with the digit 5, the barcode is a coupon code.
2. Manufacturer Code: The manufacturer code is assigned by the UCC council to each manufacturer or company that distributes goods using the UPC-A barcode. Note that the UCC has started to assign manufacturer codes longer than 5 digits to conserve the numbering resource.
3. Product Code: The product code is assigned by the manufacturer. The product code is a 5-digit number, so it can accommodate 100,000 possible product codes (00000 to 99999) for each manufacturer. That is enough for any manufacturer in the world!
4. Check Digit: The check digit is used to verify that the barcode is generated or scanned correctly. The check digit is calculated from the rest of the barcode digits. Read the following section to learn how to calculate the check digit.
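As a minimal sketch of item 4, the function below applies the standard UPC-A check digit rule: three times the sum of the digits in odd positions, plus the sum of the digits in even positions, rounded up to the next multiple of ten. The rule is the standard one for UPC-A rather than something spelled out in this report, and the function names are illustrative.

```python
def upc_a_check_digit(first11):
    """Compute the UPC-A check digit for the first 11 digits (given as a string)."""
    if len(first11) != 11 or not first11.isdigit():
        raise ValueError("expected exactly 11 digits")
    digits = [int(c) for c in first11]
    odd_sum = sum(digits[0::2])   # positions 1, 3, 5, 7, 9, 11
    even_sum = sum(digits[1::2])  # positions 2, 4, 6, 8, 10
    total = 3 * odd_sum + even_sum
    return (10 - total % 10) % 10


def is_valid_upc_a(code12):
    """Return True if the 12-digit string ends with the correct check digit."""
    if len(code12) != 12 or not code12.isdigit():
        return False
    return upc_a_check_digit(code12[:11]) == int(code12[11])
```

For example, the 11 digits 03600029145 give an odd-position sum of 14 and an even-position sum of 16, so the total is 3 x 14 + 16 = 58 and the check digit is 2.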
The main reason for selecting this code is that the barcode always has a standard length of 95 bits. This length does not vary with its usage or with the number of products that it represents. Initially, Code 39 was chosen as the standard for the simulation, but the length of a Code 39 barcode varies greatly depending on the product for which it is made. Using a standard 95 bit format therefore makes the image processing algorithms easier to implement, because the image cropping and barcode reading algorithms become easier to define.
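The fixed length can be seen by assembling the bit string: 3 start bits, 6 left hand digits of 7 bits each, a 5 bit centre pattern, 6 right hand digits of 7 bits each, and 3 end bits, for 3 + 42 + 5 + 42 + 3 = 95. The sketch below is illustrative only; the centre pattern 01010 comes from the UPC-A specification and is not stated in this report, and `encode_upc_a` is a hypothetical helper name.

```python
# Right-hand patterns indexed by digit; left-hand patterns are their complement.
RIGHT = ["1110010", "1100110", "1101100", "1000010", "1011100",
         "1001110", "1010000", "1000100", "1001000", "1110100"]
LEFT = ["".join("1" if b == "0" else "0" for b in bits) for bits in RIGHT]

START_END_GUARD = "101"   # start/end pattern described in the text
CENTRE_GUARD = "01010"    # assumption: taken from the UPC-A specification


def encode_upc_a(code12):
    """Encode a 12-digit UPC-A string as a 95-character string of 0s and 1s."""
    if len(code12) != 12 or not code12.isdigit():
        raise ValueError("expected exactly 12 digits")
    digits = [int(c) for c in code12]
    left = "".join(LEFT[d] for d in digits[:6])
    right = "".join(RIGHT[d] for d in digits[6:])
    return START_END_GUARD + left + CENTRE_GUARD + right + START_END_GUARD
```

Whatever 12 digits are supplied, the result has the same length, which is what makes fixed cropping and sampling rules practical.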
To provide the user with an interface where they can input any scanned image containing a barcode
To correctly locate the barcode segment in the scanned input image
To decode the barcode segment
To present the user with the decoded output
To ensure that the barcode segment is read as accurately as possible, the program needs to handle the following conditions in the scanned image once the user has supplied it:
Image rotation (the barcode is rotated with respect to the camera)
Noise (poor signal to noise ratio, bad lighting conditions, image taken through glass, and so on)
Blurriness of the image (the camera is out of focus)
Barcode recognition involves a wide range of activities to ensure that the given image is properly processed and deciphered by the program. This project aims to correctly decode as many images as possible, though it may not be possible to accurately decipher every image. The order followed to process a scanned image is:
1) Create a GUI for the user
2) Clean the image by de-blurring it or by removing noise (if required)
3) Rotate the image in case it is not properly aligned
4) Recognise the barcode in the image
The challenge in this case study is to be able to detect a barcode on an image and we have to account for the following situations: blurriness, slanted barcodes, light intensity of images, noise in images, more than one barcode in the image and upside down barcodes.
The following flow chart outlines the image processing algorithm that our group has used in order to recognise a barcode in an image:
This barcode detection algorithm can also detect whether the barcode is slanted and rotate it so that it is straight. The maths behind this part of the algorithm is actually very simple. The Bounding Box property of Regionprops outputs the upper-left x and y coordinates, the width and the height of each area found in the image (i.e. the bars in the barcode). If we take the x and y values of the first and last bar and calculate the gradient between them, a non-zero gradient means the barcode must be slanted. Since we have the gradient, we can then calculate the angle by which to rotate the image. The only major disadvantage is that this works only for barcodes that are rotated between -90 and 90 degrees; if the barcode is rotated more than this, the resulting barcode will still be straightened but will be upside down. Below shows this part of the algorithm in action!
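The gradient-to-angle step described above can be sketched as follows. This is not the project's MATLAB code: the function name is hypothetical, and the boxes are assumed to be (x, y, width, height) tuples in the order the Bounding Box property reports them.

```python
import math


def skew_angle_degrees(first_box, last_box):
    """Estimate the slant of a barcode from the bounding boxes of its first and last bar.

    Each box is (x, y, width, height), where x and y are the upper-left corner.
    The result lies strictly between -90 and 90 degrees.
    """
    x1, y1 = first_box[0], first_box[1]
    x2, y2 = last_box[0], last_box[1]
    if x2 == x1:
        raise ValueError("first and last bar share the same x coordinate")
    gradient = (y2 - y1) / (x2 - x1)
    return math.degrees(math.atan(gradient))
```

Because the arctangent only returns angles between -90 and 90 degrees, a barcode rotated further than that is indistinguishable from one rotated less, which is why it ends up upside down. Image coordinates usually have y increasing downwards, so the sign of the correction depends on the rotation routine used.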
The basic algorithm goes through three elimination stages:
1. Clear any groups of pixels that are touching the edge/border of the image.
2. Clear any groups of pixels less than 100 pixels^2 in area for image sizes > [300 300], or less than 50 pixels^2 for smaller images.
3. Clear any groups of pixels that are not straight (since the eccentricity of a straight line is 1).
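The three stages can be sketched as a filter over already-measured regions. Everything named here is an assumption for illustration: each region is a dictionary with hypothetical keys `area`, `eccentricity` and `touches_border`, "image sizes > [300 300]" is read as both dimensions exceeding 300, and the eccentricity cut-off of 0.98 is an arbitrary stand-in for "close to 1".

```python
def min_area_for(image_shape):
    """Area threshold from stage 2: 100 for large images, 50 otherwise."""
    rows, cols = image_shape
    return 100 if rows > 300 and cols > 300 else 50


def keep_bar_candidates(regions, image_shape, min_eccentricity=0.98):
    """Apply the three elimination stages and return the surviving regions."""
    threshold = min_area_for(image_shape)
    kept = []
    for region in regions:
        if region["touches_border"]:                  # stage 1: border contact
            continue
        if region["area"] < threshold:                # stage 2: too small
            continue
        if region["eccentricity"] < min_eccentricity:  # stage 3: not line-like
            continue
        kept.append(region)
    return kept
```

Tall thin letters such as I or L also have an eccentricity close to 1, so they can survive all three stages.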
What happens when there is noise in the image, for example letters such as I, L or T, that makes it through the first three elimination stages? The resulting output image will then include other noise, not just the barcode. Hence we need to use more of the Region Properties to eliminate this noise. There are several options for doing this.
1. Firstly, you can use the property Centroids, which outputs the x and y values of the centre of each area in the image. From this you can calculate the distance between the centroids of the groups of pixels in the image. The bars of a barcode will have a small distance between them. The advantage here is that if a bit of noise lies on the same x axis as the barcode, and so possibly makes it through the elimination stages above, it would be too far away from the barcode and hence would be eliminated. The disadvantage of this technique is that if there is noise scattered around the barcode, distances will be calculated from bars to the noise and thus bars will be eliminated.
2. This option uses the property Area, which outputs the area of each of the groups of pixels left in the image, both barcode and noise. Generally, by this stage there are just small groups of noise left that were not eliminated by the 100 pixel threshold, or perhaps one really large group. The advantage of this option is that it does not matter whether noise has made it through to this stage or not, and it is easy to calculate (a lot more efficient than calculating the distances). The disadvantage is that if the area of the noise is the same as that of the bars, it will still make it through.
3. Lastly, you can use the property Centroids again, this time doing something different with the x and y values. A barcode will generally sit in one region of the image, so you can use the median and standard deviation to determine the region of the barcode (assuming that any noise will mainly be random outliers in the image). The advantage of this technique is that it can identify the region of the image where the barcode is and eliminate any outlier noise. The disadvantage is that if all the noise was eliminated in the first three stages and the barcode takes up the whole image, you may lose bars of the barcode. A second disadvantage is that if the barcode is at a 45 degree angle, you may lose its outer bars. Neither is good.
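Option 3 can be sketched with the standard library alone. This is only an illustration of the median and standard deviation idea: the function name is hypothetical, and the cut-off of two standard deviations (`k=2.0`) is an assumed value that the report does not specify.

```python
import statistics


def filter_by_centroid_region(centroids, k=2.0):
    """Keep centroids within k standard deviations of the median in both x and y.

    centroids is a list of (x, y) tuples; the returned list preserves their order.
    """
    if not centroids:
        return []
    xs = [c[0] for c in centroids]
    ys = [c[1] for c in centroids]
    mid_x, mid_y = statistics.median(xs), statistics.median(ys)
    spread_x, spread_y = statistics.pstdev(xs), statistics.pstdev(ys)
    return [
        (x, y)
        for x, y in centroids
        if abs(x - mid_x) <= k * spread_x and abs(y - mid_y) <= k * spread_y
    ]
```

The median is used for the centre because a single far-away blob shifts it much less than it shifts the mean. The first disadvantage noted above is visible here too: with no noise present, the outermost bars of a wide barcode can fall outside the k-sigma band, so a small `k` will remove real bars.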
De-blur an Image
One of the requirements for our program is the capability to de-blur an image, recognise whether there is a barcode in the image and, if possible, read that barcode. De-blurring an image is quite difficult, and it all depends on how badly blurred the image is in the first place. The blurring can be modelled by the following equation (based on the help in the Image Processing Toolbox):
g = H*f + n
where
g = blurred image
H = distortion operator, also called the Point Spread Function (PSF)
f = original image
n = additive noise
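The model above can be tried on a single scan line of a barcode. The sketch below is an illustration under stated assumptions rather than the toolbox routine: the PSF is taken to be a one-dimensional Gaussian, edges are handled by repeating the end values, and the names `gaussian_psf` and `blur` are hypothetical.

```python
import math
import random


def gaussian_psf(size, sigma):
    """Return a normalised 1-D Gaussian point spread function with `size` taps."""
    half = size // 2
    weights = [math.exp(-((i - half) ** 2) / (2 * sigma ** 2)) for i in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal, psf, noise_std=0.0, seed=0):
    """Compute g = H*f + n for a 1-D signal f, PSF H and Gaussian noise n."""
    rng = random.Random(seed)
    half = len(psf) // 2
    blurred = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(psf):
            # Clamp the index so samples past either end repeat the edge value.
            k = min(max(i + j - half, 0), len(signal) - 1)
            acc += weight * signal[k]
        blurred.append(acc + rng.gauss(0.0, noise_std))
    return blurred
```

Because the Gaussian PSF is symmetric, sliding it over the signal in this way gives the same result as a true convolution. A sharp 0-to-1 edge in the scan line comes out as a gradual ramp, which is what makes bar widths hard to measure in a blurred image.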
The PSF is important because it describes the degree to which an optical system blurs a point of light. In mathematical terms, the PSF is the inverse Fourier transform of the optical transfer function (OTF). The distortion operator, when convolved with the image, creates the distortion. The problem with our blurred images is that we do not know the distortion operator: if the picture is blurred to begin with, we do not know its exact PSF. This is why we implemented the blind deconvolution algorithm, which performs the de-blurring without knowledge of the original PSF. This function is also computationally expensive, so images smaller than 1000 by 1000 pixels are recommended. Below is an example of how our original de-blur algorithm worked. We start with our original image:
We apply the de-blur function, which performs five steps (this algorithm is based on the help file 'De-blurring Images Using the Blind De-convolution Algorithm'):
1. Use an initial PSF based on a Gaussian distribution; this was chosen because it best represents the distortion of a camera lens being out of focus.
2. The first iteration uses an array 4 pixels smaller than the PSF, i.e. UNDERPSF.
3. The second iteration uses an array 4 pixels bigger than the PSF, i.e. OVERPSF.
4. The third iteration uses a PSF of the same size, INITPSF.
5. By using the edge function and changing threshold values, we obtain an array of "weighted" values that help with the final iteration of the blind deconvolution function.
This is a computationally expensive function, and to keep processing time practical it was decided to use only the first iteration, with UNDERPSF. Furthermore, we found that putting images that were only slightly blurred through all of the above steps made them worse than before, while the UNDERPSF iteration alone generally did a good job of de-blurring these slightly blurred images. Hence we use only one iteration of the blind deconvolution, and as you can see below we were able to successfully de-blur several images and thus read the barcodes. It must be noted that the blurring that occurs in the first place is RANDOM; hence this is a very basic de-blur function that will only work on images that happen to be blurred in a particular way.
1. We would like to improve the de-blur function so that the user can input different variables, allowing other kinds of blurred images to be read. At the moment the Gaussian PSF has fixed variables. This enhancement would require user input fields in the GUI whose values would be passed to the de-blur function. The user would also need to understand what kinds of values to enter, so a help page on suitable input variables would be needed.
2. We would like to improve the Graythresh technique for converting the image from uint8 to a binary (black and white) image. The design process describes the trouble caused by light intensity in photos: if there is a lot of light, some of the black parts of the image will be converted to white instead of black. One way to improve this would be to look at a histogram of the light intensity of the image and, if there is not a clear enough distinction between white and black, apply an image enhancement function that attempts to lessen the light intensity in the image. This would require another checkbox that the user could tick if the image has been distorted by light intensity.
3. We would like to be able to identify two or more barcodes in an image. At the moment, if there are two barcodes and you tick noise elimination, one barcode will be output but the function does not know that there is another barcode in the image. This would again involve the region properties function, treating the regions of the barcodes as two distinct barcodes and outputting two arrays. This functionality would require changes to the GUI so that it can handle separate outputs for different barcodes.
4. We would like to be able to identify 2D barcodes. These are not made up only of bars, so our original find-barcode function would not work. We would have to come up with another function altogether for finding 2D barcodes in an image.
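For item 2, it may help to see what the thresholding step computes. MATLAB's graythresh is based on Otsu's method, which picks the grey level that best separates the histogram into a dark class and a light class. The sketch below is a plain Python rendering of that idea under simplifying assumptions (8-bit pixel values supplied as a flat list); it is not the toolbox implementation, and `otsu_threshold` is an illustrative name.

```python
def otsu_threshold(pixels):
    """Return the grey level t (0-255) that maximises between-class variance.

    Pixels with value <= t form the dark class (bars); the rest form the light class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        if weight_dark == 0:
            continue          # no dark pixels yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break             # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t
```

When strong lighting pushes the dark bars towards the light end of the histogram, the two classes overlap and no single threshold separates them cleanly, which is the situation the proposed histogram check is meant to detect.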
CONCLUSION
Bar code technology is very useful, very cheap, and well suited to identifying objects. In this case study, we presented an image processing framework for the recognition of 1D bar codes. The project aims at simulating a camera-based barcode scanner. As described above, a camera-based scanner captures the entire image of a product and then uses advanced image processing techniques to decode it. We process the image (if the user wishes) and clean it by removing noise. The de-blurring algorithm modifies the image to make it machine readable. The image reading algorithm crops and resizes the image to convert the machine-readable image into a 95 bit array, which is then simple to decode with the barcode decoding program. One can use any simple scanner to scan the barcode segment of the image and then process it. The decoded output can then be linked to any database management software; in this way, for example, the issuing of books in a library can be monitored. Similarly, applications where the volume of images is not large, and where products do not need to be scanned at an extremely rapid rate, can make use of this software for daily activities. The UPC-A code was chosen only for the convenience of its standard length of 95 bits. The barcode recognition algorithm can enable the user to use the same program to read barcodes of other formats.