
CHAPTER 1

INTRODUCTION
1.1 Introduction

In the human body, the cells of certain organs and tissues divide and multiply. Some cells
increase through duplication while others die, which maintains the equilibrium that preserves
the integrity of the organs. DNA or genetic defects can upset this balance between the creation
and death of cells. The cells then propagate and replicate in ways that produce new changes.

This can result in uncontrolled cell growth, which turns into a tumor. These cancer cells
take nutrients from healthy cells and start to encroach on nearby tissue. Some cancer cells
remain dormant and do not replicate, but others spread to other parts of the body through the
glands or other body parts.

Owing to the faster pace of life in recent years, a lack of mental and physical exercise,
changing daily habits, and unhealthy diets, the incidence of disease in the human body has
increased. Breast cancer is the most dangerous type of tumor diagnosed in females and a leading
cause of death among women worldwide, with about 522,000 deaths in 2015 [1].

Early diagnosis is necessary for treatment; however, high-density breast tissue is difficult to
analyze. The key to reducing the death rate is early recognition of masses in mammograms for the
diagnosis of breast cancer. Screening for masses is a difficult task for the radiologist because
of variations in contrast, fuzzy edges, and noise in mammograms.

Masses and micro-calcifications are the two distinctive signs used in breast cancer diagnosis.
Computer-aided diagnosis systems have been proposed for classifying the density of mammograms;
a major issue for them is defining the features that best represent the images to be classified.

According to the World Health Organization (WHO), cancer accounted for 13% of all deaths in
2004, breast cancer being among the causes [2]. Radiologists use mammography as a tool for
diagnosis and screening to recognize breast cancer at an early stage. Early detection with this
technique has reduced mortality rates by up to 25%. Screening mammograms is not an easy task for
radiologists; during routine screening, 10–30% of lesions are missed [1, 3].

Masses and micro-calcifications are the two different signs of breast cancer. A
micro-calcification is an abnormal accumulation of calcium in the breast that frequently appears
as a small bright spot; its average size is 0.4 mm.

A mass is a probable abnormality on a mammogram. There are mainly two types of masses, benign
and malignant; a malignant mass has the appearance of irregular edges or a star-burst.

Masses come in three sizes: large (30 to 50 mm), medium (15 to 30 mm), and small (3 to 15 mm).
They vary in shape, have low contrast and thickness, and are surrounded by non-uniform tissue
with the same characteristics as some supporting tissues [4].

The digitized mammographic image is analyzed for masses, micro-calcifications, and areas of
abnormal density to establish the presence of cancerous cells. Automated systems that inspect
mammographic images and differentiate them into benign or malignant will help medical experts
take correct decisions for the diagnosis.

Early detection has saved the lives of a large number of women, who received cancer treatment
after stage-zero or stage-one cancer was identified.

1.2 Motivation

Breast cancer is the main cause of cancer death globally; it can affect females at any age and
increases mortality. A lack of mental and physical exercise, excess weight, and postmenopausal
issues can lead to breast cancer in females. Diagnosing the particular area affected in the
breast is always a challenging task.

There are two types of growth: benign and malignant. Identifying cancer cells at an early stage
helps radiologists analyze the growth of the cells under different stages and conditions. Cancer
cells affect healthy cells and can also develop in other organs of the body, which is more
harmful and may remain undetected.

Detecting the location of breast cancer and identifying and classifying it at an early stage is
therefore a serious issue in medical science. Raising awareness among women of this disease and
of the technologies available to treat it is another important issue that motivated us to work
on this research topic. This technology also helps doctors analyze the internal organs of the
human body for correct diagnosis.

1.3 Project Objective

Awareness of the effects of breast cancer is increasing, and so is interest in early detection
and diagnosis among females. The main goal of the project is therefore to “Design an automated
system for breast cancer classification and detection by k-means clustering and Linear Shift
Invariant Wavelets based Enhancement”. This tool will greatly help in the detection and
classification of cancer cells from an arbitrary collection of mammographic images. The
objectives are:

• To perform a study based on image processing techniques.
• To perform image preprocessing of mammograms.
• To segment the region of interest and enhance the quality of the image.
• To detect pectoral muscles using k-means clustering.
• To extract the statistical features.
• To classify using a support vector machine.

1.4 Existing & Proposed System
The wavelet transform is one of the most useful tools for representing important information at
different levels and resolutions in various applications [5].

A sequence of iterative elementary point-wise transformations and contour dilations is used
for pectoral muscle removal [6]. Forward and backward scanning reconstructs the artifact-removed
image, over which the gray-scale windowing technique is applied. Gray-scale windowing depends on
the data set and needs a different window range for different data sets.

The decimated wavelet transform has the limitation that reconstructing the wavelet coefficients
after processing produces a large number of artifacts. For example, Gibbs phenomena appear in
the neighborhood of discontinuities because its wavelet basis lacks translation invariance [3].
Denoising results can be better with the undecimated wavelet transform [7]. For undecimated
analysis a Haar filter is used, and each band has the same size as the original image.

In the proposed system, k-means clustering shall be performed for pectoral muscle removal.
Because it is based on unsupervised learning, this method may suit any dataset with slight
modifications to the algorithm.

The proposed system shall classify breast cancer into different stages based on extracted
features such as geometrical and textural features.

1.5 Project Description


This research work develops an automated system for the detection and classification of breast
cancer in females. Mammogram images are given as input. These input images are preprocessed and
segmented by selecting the region of interest, and the image quality is enhanced to support
classification. Segmentation is done using a region growing algorithm, and statistical features
are extracted.

A database is created from the extracted features. The features of a query image are matched
against the features present in the feature database. If they match, the image is recognized as
a breast cancer image; otherwise it is recognized as a non-cancer image, so that images are
labeled either normal or abnormal.

1.6 Outcome of the Project


To develop a computerized system for classifying breast cancer from mammographic images, we
first preprocess the images to remove noise, which gives an image of enhanced quality.
Segmentation then produces a segmented image of the region of interest.

Statistical features such as mean, standard deviation (SD), skewness, kurtosis, and asymmetry
are extracted. The features extracted from the test images are fed to the trained Support Vector
Machine classifier, which classifies the breast cancer as benign or malignant.

1.7 Advantages & Applications

• Breast cancer detection.
• Breast cancer assessment and classification.
• Provides diagnostic data for cancer therapy.
• Pectoral muscle detection and removal is performed based on unsupervised learning.
• Effective segmentation of the region of interest.

1.8 Organization of the Report


Chapter 1 gives a brief introduction to the project. It describes the issues faced today and
how our project overcomes them.

Chapter 2 describes the previously available technologies that can be used to implement the
present project. It also includes references to the various technologies we have used.

Chapter 3 explains the creation of the database and the requirement specification. It describes
the implemented methodology using block diagrams.

Chapter 4 explains the implementation. It describes the implementation of the work using
algorithms.

Chapter 5 provides details about the Performance evaluation.

Chapter 6 explains the Result in detail.

Chapter 7 gives the details about the Conclusion & Future scope.
CHAPTER 2
LITERATURE SURVEY
Several Computer-Aided Diagnosis (CAD) methods and a variety of efficient algorithms are found
in the literature [4, 5, 6, 7, 10, 15].

Pelin Gorgel et al. [8] worked on a technique for the enhancement and denoising of mammograms
using the wavelet transform and a homomorphic filter. The method has limited effect on highly
dense breasts, where dense tissue has the same contrast as a mass.

Mencattini et al. [9] proposed adaptive gain setting for mammographic images, based on
mammogram denoising with micro-calcification enhancement and local iterative noise variance
estimation. A novel segmentation method was proposed that combines the dyadic wavelet with
morphological operations. Over-segmentation of masses may occur with dense-glandular
mammograms.

Xinbo Gao et al. [10] proposed a method based on morphological component analysis, which
decomposes mammographic images into a piecewise-smooth component and a texture component. Layer
criteria are used to detect suspicious areas of the mammogram. The approach has limited effect
when the mass is hidden by the surrounding breast tissue because of low contrast.

Maciej A. Mazurowski et al. [11] used a template matching scheme with intelligently selected
templates in CAD for mammographic mass recognition. These techniques give weaker results where
malignant masses are concerned.

Jeong Hyun Yoon et al. [12] proposed a method based on wavelet processing of mammographic
images with enhancement by a non-linear homomorphic filter. The results of the system are
unsatisfactory for dense-glandular and fatty-glandular mammograms.

Yoshitaka Kimori [13] proposed Rotational Morphological Processing (RMP), a mathematical
morphology technique that enhances masses and abnormalities in a medical image while extracting
desired features. However, it fails when suspicious regions are invisible and their boundaries
cannot be identified properly because of poor contrast with the nearby breast tissue in which
they are embedded.

Guillaume Kom et al. [14] proposed a method for the automatic recognition of masses in
mammographic images, based on a linear transformation enhancement filter and an adaptive
threshold technique. Because it uses a linear filter enhancement technique, it is only partially
effective on extremely dense mammograms, where high-density tissue and suspicious masses have
similar characteristics.

Digambar A. Kulkarni et al. [15] worked on automated detection in mammographic images, using
adaptive histogram equalization for enhancement to find irregularities in the images, the
k-means clustering algorithm for segmentation, and a support vector machine for classification
into benign and malignant.

Dubey A.K. et al. [16] addressed the early recognition and treatment of breast cancer using the
k-means clustering algorithm with various computing factors such as centroids, distances, split
methods, epochs, attributes, and iterations. These combinations provided potentially high
clustering accuracy.

Snehali D. Sable et al. [17], in their execution and analysis of K-means and Fuzzy C-means
clustering techniques, adopted an X-ray method with high-resolution films for detecting tumors
in breasts. The centroid-based K-means algorithm and the object-representation-based Fuzzy
C-means algorithm were proposed. The performance tests assessed the quality of the resulting
clusters.

A study of breast cancer classification techniques with FNA biopsy data was proposed by Haowen
You and George Rumbe [18], which gave accurate detection of cancer cells. Classifiers such as
the Bayesian classifier, the Support Vector Machine, artificial neural network classifiers
(linear training, back propagation, learning vector quantization), and k-nearest neighbor were
applied to the Wisconsin breast cancer data.

K. Menaka and S. Karpagavalli [19] proposed the classification of breast cancer using genetic
programming and a Support Vector Machine as an efficient tool for diagnosing breast cancer. The
experiments were carried out on the Wisconsin breast cancer database. A support vector machine
was used to classify the type of cancer, and a genetic programming evolutionary algorithm was
used to train the models. It appeared to be a fast and elegant method.

R. Ramani et al. [20] proposed breast cancer detection in mammography images using
pre-processing techniques for the enhancement of a mammogram image.

In that paper, image quality is improved by removing noise, preserving the edge information in
the image, and enhancing and smoothing the image. Different filters such as the average, median,
mean, and Wiener filters were used for filtering the images.

Ebrahim Edriss Ebrahim Ali and Wu Zhi Feng [21] proposed the neural network and the Support
Vector Machine for classifying breast cancer cells as benign or malignant. Both classifiers were
used for the classification, and their performance was compared by calculating accuracy and
precision.

2.1 Problem Statement

Having analyzed the various techniques proposed by the authors above, we state our problem as
“The decimated wavelet transform has the limitation that reconstruction of the wavelet
coefficients after processing produces a large number of artifacts”.

The undecimated wavelet transform [11] gives improved denoising results. In this transform each
band has the same size as the original image, and it can be implemented using Haar filters.

In the proposed system, k-means clustering shall be performed for pectoral muscle removal.
Because it is based on unsupervised learning, this method may suit any dataset with slight
modifications to the algorithm.

The proposed system shall classify breast cancer into different stages based on extracted
features such as geometrical and textural features.
CHAPTER 3
METHODOLOGY
3.1 Data collection
The database used in this research work is from the Mammographic Image Analysis Society (MIAS).
It contains digitized mammograms of good quality. The database consists of X-ray films chosen
from the UK National Breast Screening Programme.

The images are digitized at a resolution of about 50 µm in both height and width, measured with
a Joyce-Loebl scanning microdensitometer. The pixel values of the images are stored in bit form.
The images are 512 × 512 pixels or 1024 × 1024 pixels in size. Figure 3.1 shows an original
mammogram image selected from the database.

Figure 3.1 Original Mammogram Image.

3.2 Block diagram

Mammogram image → Preprocessing → Segmentation → Feature extraction → Classification →
Benign / Malignant

Figure 3.2 Block diagram of mammogram cancer classification

The automated detection and classification of mammogram images is designed as shown in the
block diagram in figure 3.2. It shows the steps that have to be carried out to design an
automated tool using image processing techniques. Each step is described as follows:

3.2.1 Image Acquisition:

This is the first step: choosing images from the database. It involves collecting different
mammogram images from the MIAS database, which is available through websites. Some of the
images we collected are normal and others are abnormal.

3.2.2 Image Preprocessing:

The collected mammogram images were preprocessed to remove noise using filtering techniques.
Here we work mainly with spatial domain methods, i.e. directly manipulating the pixels of an
image.

3.2.3 Image segmentation:

Segmentation divides an image into different parts and analyzes each part so that recognition
becomes easier. Here we use a region of interest algorithm to select the region in which we are
interested.

3.2.4 Feature Extraction:

Feature extraction obtains significant quantitative information for the overall understanding of
an image. The features extracted here are statistical features such as mean, standard deviation,
skewness, kurtosis, and asymmetry.

3.2.5 Feature Database:

The features extracted from the mammogram images are stored in the feature database. A support
vector machine is trained on these features for classification.

3.2.6 Image recognition and classification:

The features of the test images are compared with the features already present in the database.
If the features of the trained images match those of a test image, the system identifies the
image and classifies it as benign or malignant breast cancer.

3.3 Methodology
To detect cancer and classify it as benign or malignant, the methodology uses different image
processing techniques and efficient algorithms. Preprocessing, segmentation, feature extraction,
recognition, and classification are carried out using image processing techniques. The mammogram
images shown in figure 3.1 are given as input. Preprocessing is applied to remove noise and to
enhance the quality of the image.

The region of interest is selected by applying segmentation methods. For effective segmentation,
the region growing algorithm is used. K-means clustering is used for the detection and removal
of pectoral muscles. Statistical features such as mean, standard deviation, skewness, kurtosis,
and asymmetry are extracted and stored in the feature database.

Mammogram image
→ Noise removed using morphological operations
→ Pectoral muscles removal using K-means clustering
→ Segmentation using region growing method
→ Statistical feature extraction
→ Feature database
→ Classification using support vector machines (trained)
→ Benign / Malignant

Figure 3.3 Training of mammogram images


After the mammogram images were used for training, the query images were sent for testing. The
test images undergo the same steps, as shown in figure 3.4. Their features were matched against
the trained features, and the images were classified as benign or malignant.

Mammogram image
→ Noise removed using morphological operations
→ Pectoral muscles removal using K-means clustering
→ Segmentation using region growing method
→ Statistical feature extraction
→ Feature database
→ Classification using trained support vector machines (testing)
→ Benign / Malignant

Figure 3.4 Testing of mammogram images

3.4 Requirements
This project requires the following hardware and software.

3.4.1 Hardware Requirements

The hardware requirements are as follows:

- AMD or Intel processor with a minimum clock speed of 2.5 GHz

- Minimum 2 GB of RAM

- 500 GB of hard disk storage

- Input and output devices for operation

3.4.2 Software Requirements

The software requirements are as follows:

- MATLAB 2010 and above


CHAPTER 4
IMPLEMENTATION
This chapter briefly describes the steps carried out in the proposed methodology, along with an
explanation of the algorithms. A typical system for classifying mammogram breast cancer images
comprises pre-processing, segmentation, feature extraction, and classification phases.

4.1 Input data

The input data used in this research are mammogram images collected from the MIAS database,
which consists of approximately 300 images. In this research we considered 70 images, comprising
both normal and abnormal images, i.e. some benign and some malignant. A mammogram image is shown
in figure 3.1.

4.2 Preprocessing

Preprocessing improves the quality of the mammogram images, making them suitable for feature
extraction. This step is also called image enhancement; it transforms the image into a form that
is easier to interpret. Here we have used morphological operations such as erosion and dilation,
as shown in figure 4.1.

Artifact suppression and pectoral muscle segmentation are used for preprocessing. In a properly
positioned Medio-Lateral Oblique (MLO) view, the pectoral muscle emerges as a high-intensity
triangular area across the upper posterior margin of the image, connecting to the chest. Hence,
the pectoral muscle and radiopaque artifacts (labels, wedges, etc.) shown in the figure are
removed for better detection of lesions in mammographic images [2, 8, 12].

Artifact suppression is performed using morphological operations. Erosion removes layers of
pixels to eliminate unwanted elements, and dilation regains the layers of pixels that were
removed by erosion but that contribute to further processing steps.

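
The erosion-then-dilation idea described above can be sketched as follows. This is a minimal illustration and not the MATLAB code used in this work: it assumes a binary image stored as lists of 0/1 values, a 3 × 3 structuring element, and zero padding at the border, and the function names (`erode`, `dilate`, `opening`) are hypothetical.

```python
# Minimal sketch of binary erosion and dilation with an assumed 3x3 structuring element.
# Images are lists of rows holding 0/1; pixels outside the image are treated as 0.

def _neighbourhood(img, r, c):
    """Return the nine values of the 3x3 window centred on (r, c), zero-padded."""
    rows, cols = len(img), len(img[0])
    values = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < rows and 0 <= cc < cols
            values.append(img[rr][cc] if inside else 0)
    return values

def erode(img):
    """Keep a pixel only if every pixel in its 3x3 window is set."""
    return [[1 if all(_neighbourhood(img, r, c)) else 0
             for c in range(len(img[0]))] for r in range(len(img))]

def dilate(img):
    """Set a pixel if any pixel in its 3x3 window is set."""
    return [[1 if any(_neighbourhood(img, r, c)) else 0
             for c in range(len(img[0]))] for r in range(len(img))]

def opening(img):
    """Erosion followed by dilation: small bright artifacts vanish, larger objects are restored."""
    return dilate(erode(img))
```

Applied to an image holding a 3 × 3 bright block and one isolated bright pixel, `opening` removes the isolated pixel and returns the block unchanged, which mirrors how small labels can be suppressed while larger structures are kept.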
4.2.1 Low-level Designs:
Level 0: User → IMG → Segmentation → tumor → User

Level 1: User → IMG → Region growing → K-means clustering → Support vector machine →
Segmented region → User

Level 2: User → Mammogram image → Read the mammogram → Conversion to gray level → Segmentation

Figure 4.1 Representing Low-level designs


4.2.2 Pectoral Muscles Removal

While a mammogram is being acquired, the pectoral muscle is captured as well, as shown in
figure 4.2, and it interferes with processing the mammogram to find suspicious masses. In a
mammogram the pectoral muscle appears with the same density as dense breast tissue, which
affects the breast cancer detection process. Identifying the pectoral muscle allows breast
abnormalities to be searched for only in the breast region. Figure 4.1 shows the original
mammogram image with the detected pectoral muscle and identified artifacts, along with the
mammogram image in which the pectoral muscle and artifacts are suppressed.

Figure 4.2 Mammogram image with pectoral muscles suppressed.

4.3 Segmentation

Segmentation is the partitioning of an image into distinct parts. Let the image region be
represented by B. Segmentation partitions B into d sub-regions B1, B2, B3, ..., Bd. For the
segmentation to be accurate, certain conditions have to be satisfied: no pixel may lie outside
the image region, all the points of a region must be connected in one way or another, and the
regions must be disjoint.
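
The conditions above can be written compactly as follows. This is one conventional formalization, offered as a sketch; the notation beyond B and B1, ..., Bd is an assumption.

```latex
\[
  \bigcup_{i=1}^{d} B_i = B, \qquad
  B_i \cap B_j = \emptyset \ \ \text{for } i \neq j, \qquad
  B_i \ \text{is a connected set of pixels for } i = 1, \ldots, d .
\]
```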

Region growing grows sub-regions into larger regions according to predefined conditions. The
first step is the selection of “seed” points; these seed points connect with neighbouring
pixels that have the same property to grow into a larger region. A specified set of seeds is
given as input with the image and marks the pixels to be segmented. The region grows by
considering all the closest neighbouring pixels in the image region.

4.3.1 Region growing based segmentation

The seeded region growing algorithm for segmentation was proposed by Adams and Bischof. It is a
robust and fast method that is very easy to implement. It is used effectively in machine vision
applications where the image data vary widely. It can be applied to gray-scale images, and an
extension has also been made to color images with the selection of an apt color space. The
first step is the selection of firm seed points corresponding to each and every region. The seed
points are then compared with their neighbours on the basis of a similarity criterion.

To compute the neighbours of a pixel we use 8-connectivity or 4-connectivity. 4-connected pixels
are adjacent to each other either horizontally or vertically. 8-connected pixels are
neighbours that are connected horizontally, vertically, or diagonally. The difference between
the intensity of the pixel and the mean of the related region is then calculated to measure
similarity. The pixel is labeled only if this difference is less than the given threshold;
otherwise the pixel is omitted and is not labeled.
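
As an illustration of the two connectivity schemes and the threshold test, a minimal sketch follows. The names `N4`, `N8`, `neighbours`, and `is_similar` are hypothetical and are not taken from the implementation used in this work.

```python
# Offsets for 4-connectivity (horizontal and vertical) and 8-connectivity (plus diagonals).
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def neighbours(r, c, rows, cols, offsets=N8):
    """Return the neighbours of pixel (r, c) that lie inside a rows x cols image."""
    return [(r + dr, c + dc) for dr, dc in offsets
            if 0 <= r + dr < rows and 0 <= c + dc < cols]

def is_similar(intensity, region_mean, threshold):
    """Similarity criterion: the pixel may join the region when its
    distance to the region mean is below the threshold."""
    return abs(intensity - region_mean) < threshold
```

For a corner pixel of a 3 × 3 image this gives two 4-connected and three 8-connected neighbours, while the centre pixel has all eight.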

The seeded region growing algorithm is implemented as follows:

Step 1: Seed points are calculated by the seed selection algorithm, and each seed point is
associated with a single region.

Step 2: For the selected seed point, the region is labeled with the label of the seed point. The
region mean is initialized to the intensity of the seed pixel. The neighbours of the seed point
are computed and stored in the neighbour matrix, which holds the addresses of the pixels to be
examined.

Step 3: For every pixel stored in the neighbour matrix, check the similarity criterion. If the
pixel is not labeled and satisfies the criterion, it is labeled as part of the region and the
mean of the enlarged region is recomputed. The neighbours of that pixel are then computed and
stored in the neighbour matrix if they are not labeled. Otherwise, the pixel is skipped and the
next neighbouring pixel is selected. Step 3 is repeated until all the neighbouring pixels have
been checked.

Step 4: The next seed is selected and step 2 is repeated for the next region.

Step 5: If some pixels are still not labeled, the seed points are computed once again for the
unlabeled pixels only.

Step 6: The above process is repeated until all pixels are labeled with their corresponding
region. The image is then divided into different region segments.
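
Steps 2 to 4 above can be sketched as follows. This is a simplified illustration under stated assumptions: the image is a list of rows of gray values, seeds are supplied by the caller, 4-connectivity is used, and the re-seeding of leftover pixels (steps 5 and 6) is omitted. The function name `region_grow` is hypothetical.

```python
from collections import deque

def region_grow(img, seeds, threshold):
    """Grow one region per seed; returns a label map in which 0 means unlabeled."""
    rows, cols = len(img), len(img[0])
    labels = [[0] * cols for _ in range(rows)]
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # 4-connectivity (assumed)
    for label, (sr, sc) in enumerate(seeds, start=1):
        if labels[sr][sc]:
            continue  # this seed was already absorbed by an earlier region
        labels[sr][sc] = label
        total, count = img[sr][sc], 1  # running sum and size give the region mean
        queue = deque([(sr, sc)])      # plays the role of the neighbour matrix
        while queue:
            r, c = queue.popleft()
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if not (0 <= rr < rows and 0 <= cc < cols) or labels[rr][cc]:
                    continue
                # Similarity criterion: distance to the current region mean
                if abs(img[rr][cc] - total / count) < threshold:
                    labels[rr][cc] = label
                    total += img[rr][cc]
                    count += 1
                    queue.append((rr, cc))
    return labels
```

For example, a small image with a dark area on the left and a bright area on the right, seeded once in each area, is split into two labeled regions as long as the threshold is smaller than the gap between the two intensity levels.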

4.3.2 Appropriate Seed Points selection for Region Growing Segmentation:

The important consideration in region growing segmentation is the proper selection of seed
points. The pixels selected as seeds have to satisfy certain conditions: the similarity between
a seed pixel and its neighbouring pixels should be very high, and at least one seed pixel has to
be generated for every expected region in the image.

Seeds of different regions must not be connected. Seed selection is based on intensity values
and frequencies of occurrence, followed by a merging method. For a given image, the algorithm
for seed point computation is as follows:

Step 1: The gray levels of the image are computed and sorted in ascending order.

Step 2: The frequencies of the gray levels in the image are computed.

Step 3: The first pixel is assigned as the first seed point.

Step 4: Pixels are merged to obtain accurate seed points. If a pixel can be merged with the
current seed point, the seed point is replaced by the mean of the merged pixels, the pixel
frequency is added to the seed frequency, and the seed keeps the location of the initial seed
point. Otherwise, the pixel is assigned as a new seed point. Step 4 is repeated until all pixels
of the image have been merged.

Step 5: The final seed points are sorted in descending order of frequency of occurrence. These
final seed points are then used for region growing image segmentation.
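
One possible reading of the seed computation steps is sketched below. It is an interpretation rather than the exact procedure used in this work: it operates on gray levels only (seed locations are not tracked), and the merge rule, a fixed `merge_tolerance` on the gray-level distance, is an assumption, as is the function name `select_seed_levels`.

```python
from collections import Counter

def select_seed_levels(pixels, merge_tolerance):
    """Return (seed gray level, frequency) pairs, most frequent first."""
    freq = Counter(pixels)                          # Step 2: gray-level frequencies
    levels = sorted(freq)                           # Step 1: ascending gray levels
    seeds = [[float(levels[0]), freq[levels[0]]]]   # Step 3: first seed
    for level in levels[1:]:                        # Step 4: merge or start a new seed
        value, count = seeds[-1]
        if abs(level - value) <= merge_tolerance:
            merged = count + freq[level]
            # The seed becomes the frequency-weighted mean of the merged levels
            seeds[-1] = [(value * count + level * freq[level]) / merged, merged]
        else:
            seeds.append([float(level), freq[level]])
    # Step 5: order the final seeds by descending frequency
    return sorted(((v, n) for v, n in seeds), key=lambda s: -s[1])
```

With the pixel values 10, 10, 11, 12, 200 and five copies of 201, and a tolerance of 5, the sketch yields two seeds: one near gray level 200.8 with frequency 6 and one at 10.75 with frequency 4.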

4.4 K-means Clustering

Clustering is considered an important method for unsupervised learning problems. The idea is to
find structure in a collection of unlabeled data. A cluster is a collection of objects of
similar type: similar objects form a cluster, and objects that differ from each other form
different clusters.

Partitioning methods are divided into two types: medoid algorithms and centroid algorithms. A
medoid algorithm represents each cluster by the instance nearest to its gravity center. A
centroid algorithm represents each cluster by its gravity center. K-means is a prominent
centroid-based algorithm. In the k-means method the dataset is divided into k subsets, each of
which contains all the points nearest to the same center.

K instances are selected at random as the initial cluster centers. Based on the selection
criterion, the remaining instances are allocated to their nearest centers. K-means then computes
the new centers as the mean of the data points in the same cluster. This is repeated until the
gravity centers no longer change, as shown in figure 4.3. If the value of K is not known in
advance, different values of K are evaluated until a suitable value is found. The objective
lies in determining the correct distances between the instances.

The advantages of the K-means clustering algorithm are:

• It is simple and can be applied to huge datasets.
• It is fast compared with hierarchical clustering for smaller values of k.
• It gives tighter clusters than hierarchical clustering, especially when the clusters are
globular.
• It has high performance and is highly reliable.

4.4.1 K-means Clustering Algorithm:

Step 1: Let M be the number of clusters, which is known in advance.

Step 2: Select M cluster centers such that they are far apart from each other.

$\mu_p = \text{some value}, \quad p = 1, 2, 3, \ldots, M$ ……………………… (1)

Step 3: Each pixel is considered and is assigned to the closest cluster, where $s$ denotes the
distance between a pixel and a cluster center.

$B_p = \{\, k : s(x_k, \mu_p) \le s(x_k, \mu_l),\ l \ne p,\ k = 1, \ldots, n \,\}$ ……………………… (2)

Step 4: Compute each cluster center once again as the mean of the pixels belonging to that
cluster.

$\mu_p = \frac{1}{|B_p|} \sum_{k \in B_p} x_k$, where $|B_p|$ is the number of elements in $B_p$ ……………………… (3)

Step 5: Steps 3 and 4 are repeated until the cluster centers no longer shift.

Start → Total number of clusters K → Centroids selected → Calculation of distance from the
centroids → Grouping related to minimum distance → No object moved to another group? → End
(otherwise the loop returns to the centroid step)

Figure 4.3 K-means clustering algorithm
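
The steps of the algorithm can be sketched for scalar pixel intensities as follows. This is a minimal illustration, not the MATLAB code used in this work: the function name `kmeans_1d` is hypothetical, the distance is the absolute intensity difference, and spreading the initial centers evenly over the intensity range is an assumed way of choosing centers that are far apart.

```python
def kmeans_1d(values, m, max_iter=100):
    """Cluster scalar intensities into m groups; returns (centers, assignments)."""
    lo, hi = min(values), max(values)
    # Step 2: initial centers spread evenly over the intensity range (assumed choice)
    if m > 1:
        centers = [lo + (hi - lo) * p / (m - 1) for p in range(m)]
    else:
        centers = [sum(values) / len(values)]
    assign = []
    for _ in range(max_iter):
        # Step 3: assign each value to the closest center
        assign = [min(range(m), key=lambda p: abs(x - centers[p])) for x in values]
        # Step 4: recompute each center as the mean of its members
        new_centers = []
        for p in range(m):
            members = [x for x, a in zip(values, assign) if a == p]
            new_centers.append(sum(members) / len(members) if members else centers[p])
        # Step 5: stop once the centers no longer move
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign
```

For the intensities 10, 12, 11, 200, 205, 198 with m = 2, the sketch converges to centers 11.0 and 201.0, separating the dark values from the bright ones. In the pectoral muscle setting, the bright cluster would correspond to the high-intensity muscle region.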


4.5 Feature Extraction

Feature extraction produces a set of features from the transformed image. Features are
quantitative attributes of the objects of interest that give accurate, relevant, and complete
information for a detailed understanding of an image.

Feature extraction methods analyze the objects and the image to extract the important features
that characterize the various classes of objects. The extracted features are given as input to
the classifier, which assigns them to the class they represent, either benign or malignant. The
statistical features extracted are:

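In the expressions below, $x_i$ ($i = 1, \ldots, N$) are the pixel intensities of the image.

1. Mean: It represents the average value of an image. It is given as follows:

$\mathrm{Mean} = \mu = \frac{1}{N} \sum_{i=1}^{N} x_i$ ……………….. (4)

2. Standard Deviation: It is calculated using the expression

$\mathrm{SD} = \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$ …………………… (5)

3. Skewness: It measures the asymmetry of the probability distribution of a random variable. It
can be either a positive value or a negative value.

$\mathrm{SK} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{x_i - \mu}{\sigma} \right)^{3}$ …………………… (6)

4. Kurtosis: It characterizes the tails of the intensity distribution and is associated with
noise and resolution measurement. The equation is as follows:

$\mathrm{KUR} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{x_i - \mu}{\sigma} \right)^{4}$ ……………………. (7)
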
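
The four statistical features can be computed as in the sketch below. It assumes the population (divide by N) form of each statistic, which may differ from the normalization used in the original implementation, and the function name `statistical_features` is hypothetical.

```python
import math

def statistical_features(pixels):
    """Mean, standard deviation, skewness and kurtosis of a list of pixel
    intensities, using the population (1/N) form of each statistic."""
    n = len(pixels)
    mean = sum(pixels) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in pixels) / n)
    if sd == 0:
        # A constant region has no spread; higher moments are reported as 0 here
        return {"mean": mean, "sd": 0.0, "skewness": 0.0, "kurtosis": 0.0}
    skewness = sum((x - mean) ** 3 for x in pixels) / (n * sd ** 3)
    kurtosis = sum((x - mean) ** 4 for x in pixels) / (n * sd ** 4)
    return {"mean": mean, "sd": sd, "skewness": skewness, "kurtosis": kurtosis}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 this gives a mean of 5, a standard deviation of 2, a skewness of 0.65625, and a kurtosis of 2.78125. The resulting dictionary is the kind of feature vector that would be stored in the feature database.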
4.6 Classification

With Support Vector Machine (SVM) extracted features are trained and tested. In MAT
lab inbuilt functions are available “svm train” through which the features are trained using
classifier. The mammogram images were tested that follow same steps as shown in methodology
figure 4.2 till the features extraction step and was compared with extracted features stored in
feature database.

The MAT lab inbuilt function “SVM classify” was used to compare the features and
classify the mammogram image as either malignant or benign.SVM gives a strong mathematical
foundation with an appropriate working environment with more accurate and efficient feature
dimensionality in simple way and powerful classifier.

A standard SVM is a binary classifier: it constructs a hyperplane that separates the
members of a class from the non-members in the given input. It can also learn nonlinear
decision functions by mapping the input data into a higher-dimensional feature space and
partitioning that space with a maximum-margin hyperplane.

It automatically identifies a subset of the training points called support vectors. The
separating hyperplane is expressed as a combination of these points. Finally, the SVM solves
the resulting optimization problem.
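
To make the idea of a separating hyperplane concrete, the sketch below trains a small linear
SVM in plain Python by subgradient descent on the regularised hinge loss. It is a simplified
stand-in, not the solver behind the MATLAB functions named above: there is no kernel, the class
labels are assumed to be +1 and -1, and the names train_linear_svm and predict, the learning
rate, the regularisation weight and the toy data are all assumptions.

```python
def train_linear_svm(samples, labels, lr=0.1, lam=0.01, epochs=100):
    """Fit w, b of a linear SVM with labels in {+1, -1}."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for p, q in zip(samples, labels):
            margin = q * (sum(wi * pi for wi, pi in zip(w, p)) + b)
            if margin < 1:
                # Sample is inside the margin or misclassified:
                # step along the hinge-loss subgradient.
                w = [wi - lr * (lam * wi - q * pi) for wi, pi in zip(w, p)]
                b += lr * q
            else:
                # Sample is safely classified: only the regulariser acts.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, p):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * pi for wi, pi in zip(w, p)) + b >= 0 else -1


if __name__ == "__main__":
    # Toy two-feature data, linearly separable by construction.
    X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
    y = [1, 1, 1, -1, -1, -1]
    w, b = train_linear_svm(X, y)
    print([predict(w, b, p) for p in X])
```

In practice the feature vectors would be the statistical features of each mammogram, and a
library implementation with kernel support would normally replace this hand-written loop.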

A set of training data (pj, qj) is given to the machine, where pj is the data of a real-world
instance and qj is the label indicating the class to which the instance belongs. For a
two-class pattern recognition problem, qj = +1 or qj = -1. A training example (pj, qj) is
positive if qj = +1 and negative otherwise. The hyperplane partitions the two classes with the
maximum separation between them. Separating the classes with a wide margin minimizes a bound
on the expected generalization error.
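
For reference, the maximum-margin idea described above is usually written as the following
optimization problem for linearly separable data. The symbols w (normal vector of the
hyperplane), b (offset) and m (number of training examples) are introduced here and do not
appear elsewhere in this report.

```latex
\[
  \min_{w,\,b}\ \frac{1}{2}\,\lVert w \rVert^{2}
  \quad \text{subject to} \quad
  q_j \,( w \cdot p_j + b ) \ \ge\ 1, \qquad j = 1, \dots, m
\]
\[
  f(p) = \operatorname{sign}( w \cdot p + b ),
  \qquad
  \text{margin width} = \frac{2}{\lVert w \rVert}
\]
```

Minimizing the norm of w is equivalent to maximizing the margin width, and the training points
for which the constraint holds with equality are the support vectors.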

CHAPTER 5
RESULT ANALYSIS
This chapter gives the experimental details: the image database used, the categories of
mammogram images, the training and testing samples, and the results obtained after feature
extraction and classification.

In this research we used the MIAS database for the early detection and classification
of mammogram images. Figure 5.1 shows snapshots of the MIAS database containing different
mammogram images. We considered 28 images, of which 70% were used as the training data set
and 30% as the testing data set, covering both benign and malignant classes.
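
A 70/30 split of this kind can be sketched as below. The exact image counts, the shuffling and
the random seed are not stated in this report, so the rounding rule, the seed value and the
function name split_dataset are assumptions of the example.

```python
import random


def split_dataset(items, train_fraction=0.7, seed=0):
    """Shuffle items reproducibly and split them into train and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Assumed rule: round the training share to the nearest whole image.
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    image_ids = list(range(28))
    train, test = split_dataset(image_ids)
    print(len(train), len(test))
```

To keep both classes represented in each part, the same function could be applied to the benign
and the malignant images separately and the results merged.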

After enhancing the mammogram image, we segmented it using the region growing algorithm,
which isolates the affected (cancerous) region. We then extracted statistical features such as
mean, standard deviation, skewness, kurtosis and asymmetry from the 28 mammogram images.

The performance of the SVM classifier was then evaluated by calculating the accuracy and
precision values as shown in equations 1 and 2. The results were analyzed by counting the true
positives, true negatives, false positives and false negatives yielded by the SVM classifier.

Accuracy: The accuracy is calculated by the expression:

$$\text{Accuracy} = \frac{TP + TN}{N} \times 100 \qquad (1)$$

Precision: The precision is calculated using the expression:

$$\text{Precision} = \frac{TP}{TP + FP} \times 100 \qquad (2)$$


True Positive (TP) – An abnormal mammogram image identified as abnormal.

True Negative (TN) – A normal mammogram image identified as normal.

False Positive (FP) – A normal mammogram image identified as abnormal.

False Negative (FN) – An abnormal mammogram image identified as normal.

N – Total number of mammogram input images.

True Positive (TP) and True Negative (TN) give the ratio or percentage of positive and
negative instances that were correctly identified. The cases reported as positive belong to
the benign class and the cases reported as negative belong to the malignant class.

False Positive (FP) and False Negative (FN) denote, respectively, the cases that are
negative but were incorrectly identified as positive and the cases that are positive but were
incorrectly identified as negative.

The accuracy and precision values of the images classified using SVM are summarized in
table 2.

Calculation:
Total number of images: 28

True Positive images: 21

True Negative: 06

False Positive: 02

False Negative: 00
Chart 1: Represents the total number of images with classification of malignant and benign.

Accuracy = (21 + 6) / 28 × 100 = 96.43

Precision = 21 / (21 + 2) × 100 = 91.30

Mammogram images    Accuracy    Precision
Classification      96.43%      91.30%

Table 2: Result analysis for mammogram classification using SVM classifier.

Based on these values, the accuracy obtained was 96.43% and the precision obtained was
91.30%. We therefore conclude that the SVM classifies the mammogram images accurately, at a
rate of 96.43% (refer to chart 2).
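
The two metrics can be recomputed from the confusion counts with the short Python sketch
below. It follows equations (1) and (2) as defined in this chapter, with N passed in as the
total number of input images; the function names accuracy and precision are chosen for this
example.

```python
def accuracy(tp, tn, n):
    """Equation (1): share of correctly identified images, in percent."""
    return (tp + tn) / n * 100


def precision(tp, fp):
    """Equation (2): share of positive calls that are correct, in percent."""
    return tp / (tp + fp) * 100


if __name__ == "__main__":
    # Counts taken from the calculation listed in this chapter.
    tp, tn, fp, n = 21, 6, 2, 28
    print(round(accuracy(tp, tn, n), 2))
    print(round(precision(tp, fp), 2))
```

Keeping the counts as named arguments makes it easy to recheck the figures if the confusion
counts are revised.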

Chart 2: Represents the classification rate of SVM classifier in classifying the mammogram images.
5.1 Snapshots

Figure 5.1: Snapshot of K-means segmentation

The K-means clustering output is shown in fig 5.1. K-means clustering produces an output
image from which the pectoral muscle has been removed, together with the values of all the
extracted features.

Figure 5.2: Snapshot of Feature Extraction Output


The K-means clustering output is produced in steps, following the algorithm described in
this report. The first step removes the pectoral muscle and divides the image into 8 clusters.
The next step gives the exact region of the breast where the tumor is located.

The output of feature extraction is shown in figure 5.2. Measures such as accuracy,
sensitivity and specificity are calculated by the feature extraction code. The values of the
different features are calculated for each image of the given database.

The last step outputs the classification of the tumor as either benign or malignant. A
benign result is non-cancerous; a malignant result is cancerous.
CHAPTER 6
CONCLUSION & FUTURE SCOPE
In this research, “The early detection and classification of Mammogram images using
SVM classifier”, we took mammogram images as input and enhanced them using morphological
preprocessing techniques. The preprocessed mammogram image was segmented using the region
growing algorithm, preparing it for the K-means clustering algorithm.

Statistical features such as mean, standard deviation, skewness, kurtosis and asymmetry
were extracted from the segmented mammogram image. A Support Vector Machine classifier was
trained and tested on the extracted features and classifies each image as either benign or
malignant. The proposed SVM-based methodology gave good results, with a classification rate
of 96%. With this method doctors can diagnose breast cancer in women earlier and more easily,
which can help save many lives.

In future, different segmentation techniques can be used to identify the cancer-affected
regions more precisely. To increase the accuracy, better classification methods can be used to
separate the benign and malignant classes, and additional, more discriminative quantitative
features can be extracted.
REFERENCES
1. American cancer society, cancer facts and figures. American Cancer Society, Atlanta,
Ga,2005-2015.
2. Tang, J., Rangayyan, R.M., Xu, J., Naqa, I.E., and Yang, Y., Computer-aided detection and
diagnosis of breast cancer with mammography: recent advances. IEEE Trans. Inf. Technol.
Biomed. 13(2):236–251, 2009.
3. Juan Shan, H. D. Cheng, Yuxuan Wang, “A completely automatic segmentation method for
breast ultrasound images using region growing”.
4. http://www.wiau.man.ac.uk/services/MIAS/MIASweb.html.
5. http://marathon.csee.usf.edu/Mammography/Database.html.
6. Haowen You and George Rumbe, “Comparative Study of Classification Techniques on
Breast Cancer FNA Biopsy Data”, International Journal of Artificial Intelligence and
Interactive Multimedia, Vol. 1, 2010.
7. Anu Appukuttan, Sindhu.L, “Breast Cancer-Early Detection and Classification Techniques:
A Survey”, International Journal of Computer Applications (0975 – 8887) Volume 132 –
No.11, December 2015.
8. Gorgel, P., Sertbas, A., and Ucan, O.N., A wavelet-based mammographic image denoising
and enhancement with homomorphic filtering. J. Med. Syst. 34(6):993–1002, 2010.
9. Mencattini, A., Salmeri, M., Lojacono, R., Frigerio, M., and Caselli, F., Mammographic
images enhancement and denoising for breast cancer detection using dyadic wavelet
processing. IEEE Trans. Instrum. Meas. 57(7):1422–1430, 2008.
10. Gao, X., Wang, Y., Li, X., and Tao, D., On combining morphological component analysis
and concentric morphology model for mammographic mass detection. IEEE Trans. Inf.
Technol. Biomed. (2):266–273, 2010.
11. Mazurowski, M.A., Lo, J.Y., Harrawood, B.P., and Tourassi, G.D., Mutual information-
based template matching scheme for detection of breast masses: From mammography to
digital breast tomosynthesis. J. Biomed. Inform. 44(5):815–823, 2011.
12. A. Satish et.al, “A Comparative Study on K-Means and Fuzzy C-Means Algorithm for Breast
Cancer Analysis”, International Journal of Computational Intelligence and Informatics, Vol.
4: No. 1, April - June 2014.
13. Digambar A Kulkarni et.al, “Detection of Breast Cancer Using K Means Algorithm”,
International Journal of Emerging Technology and Advanced Engineering (ISSN 2250-2459,
ISO 9001:2008 Certified Journal), Volume 6, Issue 4, April 2016.
14. Shweta Kansal, Pradeep Jain, “Automatic Seed Selection Algorithm For Image Segmentation
Using Region Growing”, International Journal of Advances in Engineering & Technology,
June, 2015.
15. Ebrahim Edriss Ebrahim Ali, Wu Zhi Feng, “Breast Cancer Classification using Support
Vector Machine and Neural Network”, International Journal of Science and Research (IJSR)
ISSN (Online): 2319-7064.
16. Angayarkanni.N, Kumar.D and Arunachalam.G, “The Application of Image Processing
Techniques for Detection and Classification of Cancerous Tissue in Digital Mammograms”,
Journal of Pharmaceutical Science and Research, ISSN 0975-1459, Vol. 8(10), 1179-1183,
2016.
17. R. Ramani et.al, “The Pre-Processing Techniques for Breast Cancer Detection in
Mammography Images”, I.J. Image, Graphics and Signal Processing, Vol. 5, 47-54, 2013.
18. K.Menaka, S.Karpagavalli, “Breast Cancer Classification using Support Vector Machine and
Genetic Programming”, International Journal of Innovative Research in Computer and
Communication Engineering (An ISO 3297: 2007 Certified Organization) Vol. 1, Issue 7,
September 2013.
19. V. Vishrutha and M. Ravishankar, “Early Detection and Classification of Breast Cancer”
https://link.springer.com/book/10.1007/978-3-319-11933, 2014.
20. Varsha J. Gaikwad, “Detection of Breast Cancer in Mammogram using Support Vector
Machine”, International Journal of Scientific Engineering and Research (IJSER) ISSN
(Online): 2347-3878 Volume 3 Issue 2, February 2015.
21. Moumena Al-Bayati, Ali El-Zaart, “Mammogram Images Thresholding for Breast Cancer
Detection Using Different Thresholding Methods”, Advances in Breast Cancer Research,
2013.
22. Navjot kaur et.al, “A review of detection of Breast cancer using Mammography”,
International journals of innovations in Engineering and Technology, ISSN 2319-1058, Vol 7,
Issue 2, August 2016.
23. Armen Sahakyan, et.al, “Segmentation of the Breast Region in Digital Mammograms and
Detection of Masses”, International Journal of Advanced Computer Science and Applications,
Vol. 3, No.2, 2012.
