
Literature Review: Food Recognition Methods

Suvarna Pansambal, Yamini Tawde, Chetali Surti, Chhaya Patil, Dhara Patel

Department of Computer Engineering,

University of Mumbai, Atharva College of Engineering,

Malad West, Mumbai, India

Abstract—Nowadays, sharing food-related photographs on social media has become a trend, and people search for interesting dishes and restaurants. Identifying food items, categorising them and analysing them have therefore been the subject of in-depth study for various applications related to food recognition, dietary habits and dietary assessment. This paper gives a broad survey of food recognition methods. It also focuses on the different feature extraction strategies as well as the classification techniques, and gives a brief review of the datasets available for food.

Index Terms—Feature extraction, food recognition, multiple food recognition, classification.

I. INTRODUCTION

Food-related photographs have become popular thanks to social networks, food recommendation and dietary assessment systems. Social networking sites are nowadays flooded with food-related photographs; for example, a new trend is sharing eating-out experiences on social networks. Indeed, people are increasingly interested in finding and sharing new cuisines and in knowing more about the various aspects of the food they consume. Many works on food recognition have recently been proposed based on various visual representations [1] [2] [3], but the vast majority of them are restricted to a few food classes in controlled settings. Accurate food recognition from visual data alone is still a difficult task. Unlike rigid objects, food items are deformable and exhibit high intra-class variability: different cooking styles and seasonings lead to different appearances of the same food. In addition, different foods share many ingredients, and the differences between some food classes are often hard to distinguish. The differences in appearance and presentation of the same dish at different restaurants further add to the complexity of recognising it. Food recognition therefore consists of detecting the food in the image and subtracting the background [10][11]; after that, food features are extracted and classification is performed [11]. In this paper, the various food recognition methods are reviewed in Section II. Section III then gives a brief review of publicly available food datasets, followed by the future work and the conclusion.
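The pipeline just outlined (locating the food against the background, extracting features, then classifying) can be sketched as follows. This is a minimal illustration, not the method of any surveyed paper: the near-white background assumption, the colour-histogram feature and the nearest-centroid classifier are all placeholder choices.

```python
import numpy as np

def segment_food(img, bg_thresh=0.9):
    # Background subtraction (toy version): treat near-white pixels as
    # background; img is an H x W x 3 array with values in [0, 1].
    return img.min(axis=2) < bg_thresh

def colour_histogram(img, mask, bins=4):
    # Feature extraction: normalised joint RGB histogram of the food pixels.
    hist, _ = np.histogramdd(img[mask], bins=(bins,) * 3, range=[(0, 1)] * 3)
    hist = hist.ravel()
    return hist / max(hist.sum(), 1)

def nearest_centroid(feature, centroids, labels):
    # Classification: label of the closest class centroid in feature space.
    return labels[int(np.argmin(np.linalg.norm(centroids - feature, axis=1)))]
```

In a real system each stage would be replaced by the techniques the survey describes, e.g. graph-cut segmentation, SIFT-based descriptors and SVM classifiers.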
II. LITERATURE SURVEY

M. M. Zhang et al. [12] used attribute-based classification to assign dishes to the correct cuisine by country, using the ingredients as the attributes of a dish. They first detected the ingredients, gave every ingredient a probability label, and then used pairwise local features among the ingredients to determine the food category, computing the distance, orientation and other properties between each pair of ingredients. To identify the cuisine from the ingredients, attribute-based classification is used. The first layer uses the Earth Mover's Distance (EMD) as the low-level feature; EMD turns out to be problematic for dishes in which only a single ingredient is present. At the middle level, the ground truth for a training sample is the image's area ratio for the particular ingredient classifier, as opposed to the image's attribute vector. In the last layer, which predicts the cuisine category from the attributes, the attribute vectors are used to train the final classifier with the food class ID as ground truth. The final classification of cuisines is thus decoupled from the raw images, since it uses the intermediate high-level attribute layer to predict the outcome.

Y. Maruyama et al. [14] build on the FoodLog framework, in which the user takes photographs of foods and uploads them to the system, which performs image processing to detect food images and estimate the dietary balance. FoodLog allows the user to correct the results of the system. The authors proposed a technique that exploits the corrections made by the user through Naive Bayes, one of the Bayesian methods, and investigated how accuracy can be improved by using user feedback. First, the performance of SVM and Naive Bayes is compared; then, using the user's corrections as input, the Bayesian model is updated to improve performance.

Ciocca et al. [20] proposed a new dataset for the evaluation of food recognition algorithms to be used in dietary monitoring applications. Each image depicts a real canteen tray with dishes and foods arranged in different ways, containing multiple instances of food classes. The dataset contains 1,027 canteen trays for a total of 3,616 food instances belonging to 73 food classes. They designed an automatic tray analysis technique that takes a tray image as input, finds the regions of interest, and predicts the corresponding food class for each region. The first module is tray analysis, which takes the tray image as input; the output is a list of regions of interest, i.e. a list of recognised foods, which are processed by the food class predictor. To predict the food class, a first, global approach extracts the visual features from the whole region of interest, while a second, local approach extracts the visual features from local patches of the region of interest. A final approach combines the posterior probabilities calculated by the global and local classifiers with the sum and product operators. The second module is tray segmentation, used to detect the regions of interest, and is composed of four main steps. First, the input RGB image is resized to a height of 320 pixels and undergoes two separate processing pipelines: a saturation-based one and a colour-texture one. The segmentation detects regions with similar visual characteristics; the segmented image is then processed to remove non-relevant regions, so that the final segmented image contains, with high probability, the food regions and few non-relevant ones. Overall, the proposed segmentation method outperforms JSEG, with an F-measure of 0.724 against 0.323, showing that it is able to effectively locate the food regions. The detected regions are then fed to a feature extractor where several visual descriptors are computed, and two classifiers are used as predictors: k-Nearest Neighbour (k-NN) and Support Vector Machines. The metric for evaluating the tray analysis is the tray accuracy, defined as the percentage of trays correctly analysed; a tray is correctly analysed when all the foods it contains are correctly recognised.
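The fusion step of the food class predictor described above, which combines the posterior probabilities of the global and the local classifier with the sum and product operators, can be sketched as follows (the example posteriors in the usage are invented for illustration):

```python
import numpy as np

def fuse_posteriors(p_global, p_local, op="sum"):
    # Combine the class posteriors of a global and a local classifier
    # with the sum or the product rule, then renormalise.
    p_global, p_local = np.asarray(p_global), np.asarray(p_local)
    if op == "sum":
        fused = (p_global + p_local) / 2.0
    elif op == "product":
        fused = p_global * p_local
    else:
        raise ValueError("op must be 'sum' or 'product'")
    return fused / fused.sum()

def predict_class(p_global, p_local, op="sum"):
    # Final prediction: the class with the highest fused posterior.
    return int(np.argmax(fuse_posteriors(p_global, p_local, op)))
```

The two rules behave differently: the sum rule is robust when one classifier is uncertain, while the product rule strongly penalises classes that either classifier considers unlikely.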
C. Rother et al. [13] developed the graph cut approach in three respects. First, they built a more powerful, iterative version of the optimisation. Second, the power of the iterative algorithm is used to substantially simplify the user interaction needed for a given quality of result. Third, a robust algorithm for border matting has been developed to estimate simultaneously the alpha-matte around an object boundary and the colours of foreground pixels. Graph cut is used to achieve robust segmentation even under camouflage, when the foreground and background colour distributions are not well separated. In terms of user interaction, the graph cut technique fails in three cases: (i) regions of low contrast at the transition from foreground to background; (ii) camouflage, in which the true foreground and background distributions overlap partially in colour space; and (iii) background material inside the user rectangle that happens not to be adequately represented in the background region. GrabCut is effective where the bounding rectangle alone is sufficient user interaction for foreground extraction to be completed automatically.

Bettadapura et al. [19] proposed a technique that leverages the sensor data (i.e. location) captured at the time photos are taken, together with additional information about the restaurant available online, combined with computer vision techniques to recognise the food being consumed. To train the classifier, images from restaurants' online menu databases were used. For feature extraction from the training and test data, a Harris-Laplace point detector is used together with six feature descriptors: two colour-based ones (Colour Moment Invariants and the Hue Histogram) and four SIFT-based ones (C-SIFT, Opponent SIFT, RGB-SIFT and SIFT). For image classification, the SMO-MKL multi-class SVM framework is used. They evaluated the performance of the system in unconstrained, real-world settings with food images taken in 10 restaurants across 5 different types of food (American, Indian, Italian, Mexican and Thai). First, using the geo-location information, the menu for each restaurant was automatically retrieved. Next, they performed interest point detection, feature extraction, codebook building for the Bag of Words representation, kernel pre-computation and, finally, classification using SMO-MKL, summarising the results as per-cuisine confusion matrices. Using this approach they achieved good classification accuracy for the American, Indian and Italian cuisines. For the Mexican and Thai cuisines, however, the accuracy is limited by the low degree of visual variability between food types belonging to the same cuisine. The approach is based on the hypothesis that knowing the location (through geo-tags) helps reduce the number of candidate food categories, which in turn increases recognition rates. To test this, the location information was dropped and an SMO-MKL classifier was trained on all training images; this setup yielded an accuracy of 15.67% on 600 images, whereas the overall average accuracy across the 5 cuisines was 63.33%. The average performance thus increased by 47.66% when the location prior was included. It was concluded that knowing the location of eating activities helps food recognition, and that it is better to build several smaller restaurant- or cuisine-specific classifiers rather than one all-category food classifier.

Beijbom et al. [18] concentrate on the restaurant scenario and introduce a technique, called Menu-Match, that addresses this problem specifically. In their work, the database contains atomic items as they are served at specific restaurants; a meal can thus be directly classified as belonging to a particular restaurant (for example, the cheeseburger at Joe's at Solo Grill in Toronto), after which accurate nutritional statistics can be read from the database. They propose an application in which the location information available on mobile devices (e.g., GPS) limits the search for a particular image to a small set of nearby restaurants, which greatly simplifies recognition and offers accurate mapping of images to nutritional information. The image recognition framework is based on the bag of visual words approach. In the first step, five types of base features are extracted from the images: colour, histograms of oriented gradients (HOG), scale-invariant feature transforms (SIFT), local binary patterns (LBP), and filter responses from the MR8 filter bank. These base features are encoded with locality-constrained linear coding (LLC), using a dictionary of 1024 words learned via k-means clustering. The encoded base features are then pooled using max-pooling in a rotation-invariant pooling scheme.
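The dictionary learning and encoding steps just described can be sketched as follows. Hard assignment to the nearest visual word is used here as a simplified stand-in for LLC, and the dictionary is tiny; only the overall structure (k-means codebook, per-descriptor coding, max-pooling into one image-level vector) mirrors the text.

```python
import numpy as np

def kmeans_dictionary(descriptors, k, iters=10):
    # Learn a visual dictionary with plain k-means (deterministic init:
    # the first k descriptors serve as the starting words).
    words = descriptors[:k].astype(float)
    for _ in range(iters):
        d = np.linalg.norm(descriptors[:, None] - words[None], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                words[j] = descriptors[assign == j].mean(axis=0)
    return words

def encode_max_pool(descriptors, words):
    # Code each descriptor by its similarity to the nearest word
    # (hard assignment), then max-pool over all descriptors of the image.
    d = np.linalg.norm(descriptors[:, None] - words[None], axis=2)
    nearest = d.argmin(axis=1)
    codes = np.zeros((len(descriptors), len(words)))
    codes[np.arange(len(descriptors)), nearest] = np.exp(-d.min(axis=1))
    return codes.max(axis=0)
```

The pooled vector has one entry per visual word, so its length is fixed regardless of how many descriptors an image produces, which is what makes it usable as an SVM input.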
The implementation includes semi-automated food item identification using a one-versus-rest linear Support Vector Machine (SVM), and fully automated estimation of food statistics based on regression. The results show that fusing several features, i.e. the joint feature representation, significantly outperforms any single feature type, and the rotation-invariant pooling method increased the mean average precision for the joint feature from 38.3% to 51.2% compared to traditional spatial pyramid pooling. They collected a dataset of actual meal images from three local restaurants, called the Menu-Match dataset, which contains realistic food images with accurate nutritional information: a total of 646 images with 1,386 tagged food items across 41 categories, together with calorie counts for all food items as nutritional metadata. The advantage of this approach is that it is not necessary to identify every item: the meal is considered as a whole entity, ingredients and preparation details are encoded into the database, and volume estimation is no longer needed. The technique has some limitations: it does not handle dishes that do not come in discrete serving sizes, such as salad bars; to provide results for home-cooked food it requires building a custom menu; and for takeout food the location where the food is consumed is not relevant, which requires the user to specify the restaurant manually.

Martinel et al. [17] proposed a committee-based recognition system that chooses the optimal features out of the existing plethora of available ones (e.g., colour, texture, etc.). Each committee member is an Extreme Learning Machine trained to categorise food plates on the basis of a single feature type. The single members' classifications are then considered by a structural Support Vector Machine to produce the final ranking of expected matches. This is accomplished by filtering out the irrelevant features/classifiers, thus considering only the relevant ones: the SCORE approach uses as many different features as possible but exploits only a subset of them to obtain optimal ranking performance. To demonstrate the benefits of the proposed SCORE approach, extensive evaluations on three benchmark datasets have been carried out.

Fig. 1. Example of the UNIMIB2016 food dataset [22].

Yanai et al. [16] examined the effectiveness of deep convolutional neural networks (DCNN) for the food photo recognition task. In the DCNN approach, the input is a resized image and the output is a class-label probability; the DCNN thus comprises all the object recognition phases, such as local feature extraction, feature coding and learning. The first technique is to use a DCNN pre-trained on a large-scale dataset, such as the ILSVRC dataset, as a feature extractor for small-scale data. The second technique is fine-tuning of the pre-trained DCNN: the parameters pre-trained on large-scale data are tuned using additional small-scale data, so that a DCNN originally trained for a large-scale task is adapted to other tasks. In this case, the DCNN is used as both a feature extractor and a classifier. In the experiments, they achieved the best classification accuracies, 78.77% and 67.57%, on the UEC-FOOD100/256 datasets, which proved that fine-tuning a DCNN pre-trained with a large number of food-related categories (DCNN-FOOD) boosts the classification performance the most. In addition, they applied the food classifier employing the best combination of DCNN techniques to Twitter photo data.

He et al. [15] present an ingredient-based food recognition method, DietCam, which specifically addresses the variation of food appearances. DietCam consists of two major components: ingredient detection and food classification. Food ingredients are identified through a combination of a deformable part-based model and a texture verification model, and a multi-view multi-kernel SVM is applied to classify the food ingredients. Their first innovation is the improvement of the current part-based object recognition model to be texture-oriented and location-flexible in order to detect food ingredients; they modified the detector in three ways for this purpose. The ingredient detector tries to find food ingredients on a single scale in order to retain the relative ingredient scales.
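Both SMO-MKL and DietCam's multi-view multi-kernel SVM rely on combining one kernel per feature type (or per view) into a single kernel fed to the SVM. A minimal sketch of that combination step is shown below, with hand-fixed weights where MKL would learn them, and invented feature dimensions:

```python
import numpy as np

def rbf_gram(X, Y, gamma):
    # Gram matrix of an RBF kernel between the rows of X and Y.
    sq = ((X[:, None] - Y[None]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)

def combined_kernel(views_a, views_b, gammas, weights):
    # One kernel per feature type/view, merged as a weighted sum;
    # in true MKL the weights would be learned jointly with the SVM.
    K = sum(w * rbf_gram(Xa, Xb, g)
            for Xa, Xb, g, w in zip(views_a, views_b, gammas, weights))
    return K / sum(weights)
```

Since a convex combination of kernels is itself a valid kernel, the resulting Gram matrix can be passed to any kernel SVM implementation that accepts precomputed kernels.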
After that, scale invariance is achieved in a multi-scale support vector machine (SVM) during food classification. Their second contribution is the development of a multi-view multi-kernel SVM to classify various combinations of food ingredients under occlusion. In the experiments, DietCam shows reliable and superior recognition of foods with complex ingredients on a database of 55 food types with 15,262 food images; its recognition precision was around 90% for general food items and 85% for food items in DCs 5 and 6.

III. DATASETS FOR FOOD RECOGNITION

A. UNIMIB2015 DATASET
The dataset contains 2,000 images of 15 classes of foods placed on trays. The images were acquired in a real canteen location and are paired with the corresponding leftover images acquired after the meals. It is the first dataset explicitly designed for both food recognition and leftover estimation, and the only dataset presented to date specifically designed for canteen environments. The annotation is done using polygonal areas and the dataset is publicly available [20] [22].

B. THE UNIMIB2016 FOOD DATASET
The dataset contains 1,027 tray images, 73 food categories, and a total of 3,616 food instances, and has been collected in a real canteen environment. Each image depicts different foods on a tray, and some foods (e.g. fruit, bread and dessert) are placed on the placemat rather than on plates. The acquisition of the images was performed in a semi-controlled environment, so the images present visual distortions as well as illumination changes due to shadows, making this dataset challenging: it requires both the segmentation of the trays for food localisation and a robust way to deal with dishes containing multiple foods. The UNIMIB2016 dataset is characterised by images that contain multiple foods with accurate segmentation. The dataset is annotated using an improved version of the authors' Image Annotation Tool, which will support research on methods for food segmentation as well as food quantity estimation [20][22].

C. FOOD 50 AND FOOD 85 DATASETS
Food 50 contains 50 food categories (mostly Japanese food) and a total of 5,000 single-food images gathered from the Web. Using MKL-based feature fusion, a recognition accuracy of 61.34% was obtained. This dataset was later enlarged to 85 food categories containing 8,500 food images, called the Food 85 dataset. The annotation of both datasets is done using labels, and both are proprietary [23].

D. TADA DATASET
The TADA dataset contains images of real foods (256 images) as well as food replicas (50 images), acquired in lab-controlled settings. The images can contain multiple foods, making the dataset more challenging as it requires the segmentation of each food in the image. It is a proprietary dataset [4].

E. UEC FOOD 256
The UEC FOOD 256 dataset contains photos of 256 kinds of food, both single- and multi-food, acquired in an unconstrained environment. It contains 31,397 food images, each with a bounding box indicating the location of the food item in the photo. For recognition, SVM classifiers with colour histogram and SURF features are used, achieving a classification rate of 74.4% for the top 5 category candidates when the ground-truth bounding boxes are given. Most of the food categories in this dataset are foods popular in Japan and other countries. The annotation is done using bounding boxes and the dataset is publicly available [5].

F. UNICT889 DATASET
UNICT889 is a single-food-type dataset with the largest number of food categories: 889 classes over a total of 3,583 images acquired in an unconstrained environment. The purpose of the dataset was near-duplicate food retrieval. Different features were tested, and the best results for near-duplicate retrieval were achieved by colour Bag-of-Textons with a mean average precision of 67.5%. The annotation is done using labels and the dataset is publicly available [6].

G. DIABETES DATASET
Anthimopoulos et al. presented a dataset of 4,868 single-food images organised into 11 classes, acquired in an unconstrained environment. The dataset can be used for a food recognition system based on the Bag-of-Features model. The system is intended to help diabetic patients control their daily carbohydrate consumption.
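Several of the figures quoted in this section, e.g. the 74.4% "for the top 5 category candidates" on UEC FOOD 256, are top-k classification rates. The metric can be computed as follows; the scores in the test are invented for illustration:

```python
import numpy as np

def top_k_rate(scores, labels, k=5):
    # Fraction of samples whose true label appears among the k classes
    # with the highest classifier scores.
    scores = np.asarray(scores)
    topk = np.argsort(-scores, axis=1)[:, :k]
    hits = [label in candidates for label, candidates in zip(labels, topk)]
    return float(np.mean(hits))
```

Top-k rates are always at least as high as top-1 accuracy, which is worth keeping in mind when comparing results reported under different protocols.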
Different visual features and classification strategies were tested, and the best combination gave a classification accuracy of slightly less than 78% using a 10,000-word dictionary. The annotation is done using labels and the dataset is publicly available [7].

H. FOOD 101
Food-101 is the largest dataset currently available acquired in an unconstrained environment. It contains 101,000 single-food images divided into 101 food categories. Random forests are used to mine discriminative superpixel parts in the food images; these parts are then classified with an SVM, achieving an average accuracy of 50.76% on the 101 classes. The annotation is done using labels and the dataset is publicly available [8].

I. PITTSBURGH FAST FOOD IMAGE DATASET
The Pittsburgh fast-food image dataset contains 61 categories of food items, a subset of the 101 categories. There are 3 instances of each food item, each acquired on a different day. Classification accuracies on the 61 categories are 9.2% with a bag of SIFT features and 11.3% with a colour histogram [9].

IV. CONCLUSION

This paper has given a broad review of recent developments in food recognition methods. Researchers have worked on many different problems, such as dietary monitoring, photos of food taken together with additional information about the restaurant, restaurant menu matching, etc. It is observed that the quality of the photo does affect the recognition of food. Given today's eating habits and the trend of searching for good food on the internet, many researchers are working in the food recognition domain. Recognising a single food item is easier than recognising multiple food items in a dish, and results are accordingly better for single-item recognition than for multiple-item recognition. Photos taken with location information also help the user find the place quickly.

V. FUTURE WORK

Researchers have worked on many approaches to food recognition, but practical systems are still far off. Some further research directions are outlined as follows. 1] Researchers can work on categorising food country-wise. 2] To obtain high-quality images and make food recognition easier, the uniformity of the picture-taking devices should be maintained; this would help in building a centralised system for all the restaurants in a particular continent.

REFERENCES

[1] Z. Zong, D. T. Nguyen, P. Ogunbona, and W. Li, "On the combination of local texture and global structure for food classification," in International Symposium on Multimedia, 2010, pp. 204-211.
[2] F. Kong and J. Tan, "DietCam: Regular shape food recognition with a camera phone," in International Conference on Body Sensor Networks, 2011, pp. 127-132.
[3] S. Yang, M. Chen, D. Pomerleau, and R. Sukthankar, "Food recognition using statistics of pairwise local features," in International Conference on Computer Vision and Pattern Recognition, 2010, pp. 2249-2256.
[4] X. Wang, D. Kumar, N. Thome, M. Cord, and F. Precioso, "Recipe recognition with large multimodal food dataset," in Multimedia & Expo Workshops (ICMEW), 2015 IEEE International Conference on, 2015, pp. 1-6.
[5] H. Hassannejad, G. Matrella, P. Ciampolini, I. De Munari, M. Mordonini, and S. Cagnoni, "Food image recognition using very deep convolutional networks," in Proceedings of the 2nd International Workshop on Multimedia Assisted Dietary Management. ACM, 2016.
[6] G. M. Farinella, D. Allegra, and F. Stanco, "A benchmark dataset to study the representation of food images," in European Conference on Computer Vision. Springer International Publishing, 2014.
[7] M. M. Anthimopoulos, L. Gianola, L. Scarnato, P. Diem, and S. G. Mougiakakou, "A food recognition system for diabetic patients based on an optimized bag-of-features model," IEEE Journal of Biomedical and Health Informatics, vol. 18, no. 4, pp. 1261-1271, 2014.
[8] L. Bossard, M. Guillaumin, and L. Van Gool, "Food-101 - mining discriminative components with random forests," in European Conference on Computer Vision. Springer International Publishing, 2014.
[9] M. Chen, K. Dhingra, W. Wu, L. Yang, R. Sukthankar, and J. Yang, "PFID: Pittsburgh fast-food image
dataset," in 2009 16th IEEE International Conference on Image Processing (ICIP). IEEE, 2009.
[10] S. Shirke and S. Pansambal, "Enhancement of IRIS recognition using Gabor over FFBPANN," in Computing for Sustainable Global Development (INDIACom), 2015 2nd International Conference on. IEEE, 2015.
[11] S. Shirke, S. Pawar, and K. Shah, "Literature review: model free human gait recognition," in 2014 Fourth International Conference on Communication Systems and Network Technologies (CSNT). IEEE, 2014, pp. 891-895.
[12] M. M. Zhang, "Identifying the cuisine of a plate of food," Univ. of California at San Diego, La Jolla, CA, USA, Tech. Rep. CSE 190, 2011.
[13] C. Rother, V. Kolmogorov, and A. Blake, "GrabCut: Interactive foreground extraction using iterated graph cuts," ACM Transactions on Graphics, vol. 23, no. 3, pp. 309-314, 2004.
[14] Y. Maruyama, G. C. de Silva, T. Yamasaki, and K. Aizawa, "Personalization of food image analysis," in International Conference on Virtual Systems and Multimedia, 2010, pp. 75-78.
[15] H. He, F. Kong, and J. Tan, "DietCam: Multiview food recognition using a multikernel SVM," IEEE Journal of Biomedical and Health Informatics, vol. 20, no. 3, pp. 848-855, 2016.
[16] K. Yanai and Y. Kawano, "Food image recognition using deep convolutional network with pre-training and fine-tuning," in Multimedia & Expo Workshops (ICMEW), 2015 IEEE International Conference on. IEEE, 2015.
[17] N. Martinel, C. Piciarelli, C. Micheloni, and G. L. Foresti, "A structured committee for food recognition," in Proceedings of the IEEE International Conference on Computer Vision Workshops, 2015, pp. 92-100.
[18] O. Beijbom, N. Joshi, D. Morris, S. Sapona, and S. Khullar, "Menu-Match: Restaurant-specific food logging from images," in 2015 IEEE Winter Conference on Applications of Computer Vision.
[19] V. Bettadapura, E. Thomaz, A. Parnami, and G. D. Abowd, "Leveraging context to support automated food recognition in restaurants," in 2015 IEEE Winter Conference on Applications of Computer Vision.
[20] G. Ciocca, P. Napoletano, and R. Schettini, "Food recognition: a new dataset, experiments and results," IEEE Journal of Biomedical and Health Informatics, 2016.
[21] http://www.ivl.disco.unimib.it/activities/food-recognition/
[22] H. Z. Senyuva and J. Gilbert, "Immunoaffinity column cleanup techniques in food analysis: A review," Journal of Chromatography B, vol. 878, no. 2, pp. 115-132, 2010.