IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 28, NO. 8, AUGUST 2019
Abstract— Due to the unpredictable location, fuzzy texture, and diverse shape, accurate segmentation of kidney tumors in CT images is an important yet challenging task. To this end, in this paper we present a cascaded trainable segmentation model termed Crossbar-Net. Our method combines two novel schemes: 1) We originally propose the crossbar patches, which consist of two orthogonal non-squared patches (i.e., the vertical patch and the horizontal patch). The crossbar patches are able to capture both the global and local appearance information of the kidney tumors from both the vertical and horizontal directions simultaneously. 2) With the obtained crossbar patches, we iteratively train two sub-models (i.e., the horizontal sub-model and the vertical sub-model) in a cascaded training manner. During the training, the trained sub-models are encouraged to automatically focus on the difficult parts of the tumor (i.e., mis-segmented regions). Specifically, the vertical (horizontal) sub-model is required to help segment the mis-segmented regions for the horizontal (vertical) sub-model. Thus, the two sub-models can complement each other to achieve self-improvement until convergence. In the experiments, we evaluate our method on a real CT kidney tumor dataset collected from 94 different patients, including 3,500 CT slices. Compared with state-of-the-art segmentation methods, the results demonstrate the superior performance of our method on the Dice similarity coefficient, true positive fraction, centroid distance, and Hausdorff distance. Moreover, to explore the generalization to other segmentation tasks, we also extend our Crossbar-Net to two related segmentation tasks: 1) cardiac segmentation in MR images and 2) breast mass segmentation in X-ray images, showing promising results for these two tasks. Our implementation is released at https://github.com/Qianyu1226/Crossbar-Net.

Index Terms— Deep convolutional neural network, kidney tumors, Crossbar-Net, image segmentation, CT images.
Manuscript received August 8, 2018; revised February 6, 2019; accepted March 7, 2019. Date of publication March 18, 2019; date of current version June 20, 2019. This work was supported in part by the NSFC under Grant 61432008 and Grant 61673203, in part by the Young Elite Scientists Sponsorship Program by CAST under Grant YESS 2016QNRC001, in part by the CCF-Tencent Open Research Fund under Grant RAGR 20180114, in part by the Projects of the Shandong Province Higher Educational Science and Technology Program under Grant J18KA370 and Grant J15LN58, in part by the Project of the Shandong Medicine and Health Science Technology Development Plan under Grant 2017WSB04071, in part by the Shandong Province Science and Technology Development Plan Project under Grant 2014GSF118086, in part by the Zhejiang Key Technology Research Development Program under Grant 2018C03024, in part by the Jiangsu Key Technology Research Development Program under Grant BE2017664, in part by the Suzhou Science and Technology Projects for People's Livelihood under Grant SYS2018010, in part by the Suzhou Science and Technology Development Project under Grant SZS201818, and in part by the SND Medical Plan Project under Grant 2016Z010 and Grant 2017Z005. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Jocelyn Chanussot. (Corresponding authors: Yinghuan Shi; Yang Gao.)

Q. Yu is with the State Key Laboratory for Novel Software Technology, National Research Institute for Big Data Science in Health and Medicine, Nanjing University, Nanjing 210008, China, also with the Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing University, Nanjing 210008, China, and also with the School of Data and Computer Science, Shandong Women's University, Jinan 250000, China (e-mail: yuqian@sdwu.edu.cn).

Y. Shi, J. Sun, and Y. Gao are with the State Key Laboratory for Novel Software Technology, National Research Institute for Big Data Science in Health and Medicine, Nanjing University, Nanjing 210008, China, and also with the Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing University, Nanjing 210008, China (e-mail: syh@nju.edu.cn; jinquansun@gmail.com; gaoy@nju.edu.cn).

J. Zhu is with the Suzhou Science and Technology Town Hospital, Suzhou 215153, China (e-mail: zeno1839@126.com).

Y. Dai is with the Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou 215163, China (e-mail: daiyk@sibet.ac.cn).

Digital Object Identifier 10.1109/TIP.2019.2905537

1057-7149 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

RENAL cell carcinoma is a common urologic cancer arising from the renal cortex [3]–[5]. Accurate quantification and correct classification of tumors could largely influence the effect of the subsequent computer-aided treatment of renal cell carcinoma [6]. In this sense, accurate kidney tumor segmentation is a significant prerequisite for the quantification and classification. Traditional manual delineation for kidney tumor segmentation is not desirable in clinical practice, due to both subjective (e.g., incorrect delineation) and objective (e.g., a large number of images) factors. Thus, computer-aided automatic segmentation methods for kidney tumors (in CT images) are in high demand.

However, segmenting kidney tumors automatically in CT images is a very challenging task. According to clinical and experimental observation,
• The location of different kidney tumors in medical images is difficult to predict, since the tumors could appear in very different places in different patients.
• Different tumors in different patients usually show very diverse shape, appearance, and volumetric size according to their different growth stages.
• The tumors and their surrounding tissues have very similar texture due to the low contrast of CT images.

Although several works have been proposed recently [1], [2], [6]–[9], their segmentation performance could not be robustly guaranteed in different cases (see Fig. 1). Both (1)
YU et al.: CROSSBAR-NET: NOVEL CNN FOR KIDNEY TUMOR SEGMENTATION IN CT IMAGES 4061
In particular, during the training process, the trained vertical and horizontal sub-models are encouraged to help each other, each asking the other to help segment its difficult parts. Taking the vertical sub-model as an example, if it cannot segment a region correctly in the vertical direction, the horizontal sub-model could complement the unsatisfactory segmentation in the horizontal direction. Also, the vertical sub-model is required to help the horizontal sub-model in the same way. Thus, the two sub-models could complement each other to achieve self-improvement until convergence. Additionally, to make the training process more effective and efficient, we propose two sampling strategies (i.e., the basic sampling strategy and the covering re-sampling strategy). The former samples discriminative patches and balances the different classes (i.e., tumor or non-tumor) to allow efficient model training with less patch redundancy, while the latter guarantees the complementary help between different sub-models. Together, these two strategies facilitate the self-improvement of the sub-models.

Since our proposed method involves crossbar patches sampled from two directions, and a cascaded training process that iteratively trains the vertical and horizontal sub-models, we name our method Crossbar-Net in the following. Overall, the contributions of our work can be summarized in the following four points:
• Our crossbar patches could capture both local detail information and global contextual information. Also, these patches are easy to sample without introducing any additional parameters to train.
• Our cascaded training process could provide complementary information between different sub-models to enhance the final segmentation. In our training process, the sub-models can perform self-improvement iteratively.
• Our model is easy to implement and extend. Beyond kidney tumor segmentation in CT images, we have evaluated our method on cardiac segmentation in MR images and breast mass segmentation in X-ray images, showing promising results and good generalization ability.
• Our method is fast to train and test, although it is trained in a cascaded manner. Taking kidney tumor as an example, the training time is about 1 h and the testing time is about 1.5 s on a regular GPU.

The rest of this paper is organized as follows. Related work on kidney tumor segmentation in recent years is briefly introduced in Section II. We then describe the framework and technical details of Crossbar-Net in Section III. Experimental results are reported in Section IV. Finally, we conclude our paper in Section V.

II. RELATED WORK

For kidney tumor segmentation, according to the way of feature representation, most of the previous methods belong to the low-level methods. The low-level methods either employ energy minimization-based models or learn the segmentation model on extracted hand-crafted features. For example, Skalski et al. [2] first located the kidney region through a hybrid level set method with an ellipsoidal shape constraint, then calculated low-level features (e.g., mean value, standard deviation, histogram of oriented gradients [25]) on the obtained region, and finally applied a decision tree to distinguish the kidney tumor and arterial blood vessel regions. Linguraru et al. [1] first extracted kidney tumors by region growing for initial segmentation, and then applied geodesic active contours to refine the segmentation result. Also, Linguraru et al. [9] described the kidney tumors using low-level visual features (e.g., histograms of curvature features). Similarly, Lee et al. [7] first detected regions-of-interest by analyzing textural and contextual information, and then extracted mass candidates with region growing and active contours. Hodgdon et al. [8] first extracted texture features (i.e., gray-level histogram mean and variance, gray-level co-occurrence and run-length matrix features [26]), and then trained a support vector machine (SVM) to distinguish fat-poor angiomyolipoma from renal cell carcinoma in CT images. These low-level methods perform well in the simple case when the tumors show a different appearance from the surrounding tissues. However, their performance could not be fully guaranteed when the shape and texture of the tumors are close to those of the surrounding tissues.

Recent trends of using deep features or deep models have demonstrated their effectiveness in several segmentation tasks. Although attempts at developing specific deep feature-based methods for segmenting kidney tumors are very limited, related deep feature-based segmentation methods for other medical organs [10], [15], [23], [24], [27] can be borrowed to segment the kidney tumor. For instance, Ciresan et al. [23] employed multiple deep networks to segment biological neuron membranes by extracting squared patches at multiple scales with a sliding window. Prasoon et al. [27] designed a triplanar CNN model to segment cartilage with multiple-view patches. Moeskops et al. [15] proposed a multi-scale CNN model to segment brain MR images. Wang et al. [24] devised a multi-branch CNN model to segment lung nodules. Shi et al. [10] proposed a cascaded deep domain adaptation model to segment the prostate in CT images. However, we notice that the squared patches used in the above works are one of the major bottlenecks when borrowing them to segment kidney tumors, as experimentally demonstrated in our experiments. To address the limitation of the local squared patch, several attempts at image-level segmentation have been explored in recent years. Mortazi et al. [12] presented a novel multi-view CNN model to fuse information from the axial, sagittal, and coronal views of cardiac MRI. This model performed well in the task of left atrium and proximal pulmonary vein segmentation. Ronneberger et al. [19] proposed a widely used model in medical image segmentation tasks, namely U-Net, to solve the cell tracking problem. Recently, a new image-based model, SegCaps [22], was developed, which achieved promising results in the task of segmenting pathological lungs from low-dose CT scans. Also, He et al. [28] introduced a fully convolutional network with distinctive curve to segment the pelvic organ.
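The low-level pipelines surveyed above share a common shape: compute hand-crafted intensity statistics on a candidate region, then feed them to a classifier. The following is our own minimal sketch of that shape, not the exact features or classifiers of the cited works; the synthetic data and the nearest-centroid rule merely stand in for their decision-tree/SVM classifiers.

```python
import numpy as np

def intensity_features(region):
    """Hand-crafted low-level features of a candidate region: gray-level
    mean and standard deviation plus a coarse intensity histogram
    (a simplification of the descriptors used by the methods above)."""
    region = np.asarray(region, dtype=np.float64).ravel()
    hist, _ = np.histogram(region, bins=8, range=(0.0, 255.0), density=True)
    return np.concatenate(([region.mean(), region.std()], hist))

# Hypothetical example: two region classes with different gray levels.
rng = np.random.default_rng(0)
tumor_like = [rng.normal(90, 5, (16, 16)) for _ in range(20)]
vessel_like = [rng.normal(160, 5, (16, 16)) for _ in range(20)]
X = np.stack([intensity_features(r) for r in tumor_like + vessel_like])
y = np.array([1] * 20 + [0] * 20)

# Nearest-centroid decision rule on the feature vectors (a stand-in
# for the decision tree / SVM classifiers cited above).
c1, c0 = X[y == 1].mean(0), X[y == 0].mean(0)
pred = (np.linalg.norm(X - c1, axis=1) < np.linalg.norm(X - c0, axis=1)).astype(int)
print("training accuracy:", (pred == y).mean())
```

As the survey notes, such features separate regions easily when the gray-level statistics differ, but degrade once tumor and background share texture and intensity.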
Fig. 3. Framework of the proposed method. Vi and Hi are the vertical and horizontal sub-models of the i-th round, respectively, i = 1, · · · , T.
In summary, the previous segmentation methods all employ either image-level or squared patch-level segmentation, while our Crossbar-Net involves crossbar patches (non-squared patches) to capture both (1) local detail information and (2) global context information from the vertical and horizontal directions.

Recently, training deep models in a cascaded or boosting-like manner for better performance has aroused considerable interest [29]–[33]. For example, Shalev-Shwartz et al. [32] proposed a SelfieBoost model, which boosted the performance of a single network based on minimizing the maximal loss. Karianakis et al. [30] proposed an object detection framework based on boosted hierarchical features. Walach and Wolf [29] introduced a boosted-CNN model in which each subsequent CNN was added according to the error of the former CNN, and finally all CNNs were joined via a sum layer. Similarly, Havaei et al. [33] presented a cascaded architecture consisting of two CNNs to segment glioblastomas in MR images, where the output probabilities of the first CNN were added to the layers of the second CNN. For the above methods, multi-modal fusion and enhancement in a boosting-like manner are the common choices. Also, our Crossbar-Net consists of the fusion of vertical and horizontal sub-models. As for what to enhance, in addition to hierarchical features [30], [33], the misclassified samples are enhanced in subsequent rounds to raise the attention of the classifier. Adaboost [34] and co-training models [35]–[39] are typical misclassified-sample-enhancing methods. Inspired by this, we train a cascaded model composed of vertical and horizontal sub-models by enhancing the regions around misclassified pixels.

Compared with the previous cascaded methods, the major distinctions of Crossbar-Net are that (1) our model learns both local detail features and global context features from two directions simultaneously, and (2) our model is composed of sub-models from two directions, in which the sub-models can perform self-improvement during different rounds to complement each other iteratively.

III. METHOD

In this section, we first introduce our methodology and the sampling strategies for crossbar patches, then present the sub-model setup and illustrate the training process, and finally discuss the testing process.

A. Our Methodology

The framework of Crossbar-Net is schematized in Fig. 3, which includes the training and testing stages. Note that we convert our segmentation task into a pixel-wise classification problem, which intends to predict a patch to be of the tumor or non-tumor class.

In the training stage, the crossbar patches are first extracted from the training CT images under the basic sampling strategy (detailed in Section III-B), with the manual segmentation available as the ground truth. Also, the vertical and horizontal sub-models are initially trained in the 1-st round, denoted as V1 and H1, respectively. Then, regarding the cascaded training process, at the t-th round, we evaluate the segmentation performance of the currently trained vertical and horizontal sub-models, and select the mis-segmented regions of each sub-model. Formally, a mis-segmented region indicates a vertical or horizontal patch whose central pixel is misclassified, i.e., if a central pixel along with its vertical patch is misclassified by a vertical sub-model, then its corresponding vertical patch is a mis-segmented region. We then re-sample the mis-segmented regions using the covering re-sampling strategy (detailed in Section III-B) to obtain the corresponding re-sampled patches. Then, we feed these re-sampled patches to the other sub-model for its model training. Meanwhile, beyond the aforementioned re-sampling
TABLE I
DETAILS OF SUB-MODEL ARCHITECTURES. CONV AND POOLING DENOTE CONVOLUTIONAL LAYER AND POOLING LAYER, RESPECTIVELY
the vertical and horizontal sub-models in the i-th round as Vi and Hi, respectively. The training process can be detailed as follows:

First, we extract crossbar patches with the basic sampling strategy and train the initial vertical and horizontal sub-models, i.e., V1 and H1, respectively. We use an example to illustrate the detailed implementation of the basic sampling strategy: as shown in Fig. 4(b), in this round, we first sample pixels on the green contours as the centers of non-tumor patches. Then, we select the red points on the odd rows inside the tumor as the centers of tumor patches. The sampling interval on contours near the tumor is smaller than that on contours far away from the tumor, with the goal of sampling more patches near the tumor boundary. Here, we denote the set of these pixels (i.e., their locations) and their ground-truth labels as X and Y, and the corresponding vertical and horizontal training patches as P_basic^{V1} and P_basic^{H1}, respectively.

Second, we continuously update our Crossbar-Net based on the currently obtained sub-models. Specifically, in the i-th round (i > 1):
• Performing the evaluation on Hi−1. In particular, we input P_basic^{H1} to Hi−1 for the evaluation at the i-th round, by predicting the label of the central pixel in each patch from P_basic^{H1}. In this sense, the mis-segmented regions in Hi−1 are determined according to the predicted labels. Formally, denoting all these predicted labels as Ŷ^H, we define the misclassified pixels as:

C_H^{i−1} = { x_j | x_j ∈ X ∧ I(ŷ_j ≠ y_j) = 1, j = 1, . . . , N }     (2)

where x_j is the central pixel of the j-th patch, ŷ_j and y_j are the predicted and ground-truth labels of x_j, with ŷ_j ∈ Ŷ^H and y_j ∈ Y. N is the number of horizontal training patches. I(s) is an indicator function: I(s) = 1 if and only if the statement s is true, and I(s) = 0 otherwise.
• Performing both the covering re-sampling strategy (on the mis-segmented regions in Hi−1) and the basic sampling strategy, to obtain the newly generated vertical patches. First, the covering re-sampling is performed. We also record the positions of the patches obtained by the covering re-sampling strategy in the current round, denoted as P_re^{Vi}. Then, the basic sampling strategy is performed. Specifically, as i varies, as shown in Fig. 4(b), we sample central pixels for non-tumor patches on different contours, which are labeled by different colors. For the tumor patches, we sample the red points on different rows or columns as the central pixels. The sampling intervals for these two types of patches are different from those in the previous rounds. Meanwhile, if a patch has already been extracted in the covering re-sampling, it cannot be sampled in the basic sampling in the same round. The advantage of this scheme is to avoid redundant sampling, which might cause over-fitting. Here, the covering re-sampling strategy aims to enhance the role of mis-segmented regions, while the basic sampling strategy is intended to control the amount and distribution of the training samples and prevent the sub-model from over-emphasizing the misclassified pixels.
• Employing these newly generated patches to train Vi−1 to obtain Vi.
• Updating Hi from Hi−1, similarly.

Finally, we repeat the aforementioned steps by updating Hi and Vi from Hi−1 and Vi−1 iteratively, until (1) the maximum number of training rounds is reached or (2) the training error of each sub-model can no longer be reduced significantly.

Overall, the advantages of our cascaded training can be summarized as follows. The vertical and horizontal sub-models could complement each other during the cascaded training. When the features in one direction are not very discriminative for segmentation, the features in the other direction could make up for the current direction. For example, if the boundaries are blurred in the vertical direction, they might be sharp in the horizontal direction. In the remaining rounds, the vertical (horizontal) sub-model iteratively feeds the crossbar patches generated by the covering re-sampling strategy to the horizontal (vertical) sub-model in the same round, until convergence. This can emphasize the learning on the mis-segmented regions and guarantee that the sub-models complement each other.

As a boosting-like algorithm, both the horizontal and vertical sub-models can perform self-improvement with both the basic sampling patches and the covering re-sampling patches as training samples. Here, self-improvement means that the performance of the sub-model in one direction can be improved along with the increase of rounds. Note that the basic sampling patches in the current round are sampled at different intervals from the previous rounds, which is equivalent to adding new training data to the sub-model. This might be a major cause of the self-improvement. In addition, if both sub-models in the same round fail on the same pixel, the corresponding mis-segmented region of this pixel in the current round will be definitely enhanced in both the vertical and horizontal
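The evaluation step of a training round, with Eq. (2) as the set of misclassified central pixels, can be sketched as follows. This is a hedged illustration: `ThresholdModel`, the synthetic patches, and the threshold are hypothetical stand-ins (the real sub-models are CNNs), but `misclassified` follows Eq. (2) directly.

```python
import numpy as np

class ThresholdModel:
    """Toy stand-in for a sub-model: predicts 'tumor' (1) when the
    patch mean exceeds a threshold. The real sub-models are CNNs."""
    def __init__(self, thr):
        self.thr = thr
    def predict(self, patch):
        return int(patch.mean() > self.thr)

def misclassified(model, patches, centers, labels):
    """Eq. (2): C = {x_j | x_j in X and I(yhat_j != y_j) = 1} --
    the central pixels whose surrounding patch is mis-segmented."""
    return [c for p, c, y in zip(patches, centers, labels)
            if model.predict(p) != y]

# Hypothetical mini-example: four 20x100 horizontal patches, three tumor.
rng = np.random.default_rng(0)
patches = [rng.normal(m, 1.0, (20, 100)) for m in (10, 10, 50, 90)]
centers = [(5, 5), (5, 40), (30, 30), (60, 60)]
labels = [0, 1, 1, 1]          # the second patch is a hard tumor example
hard = misclassified(ThresholdModel(thr=30), patches, centers, labels)
print(hard)   # → [(5, 40)]
```

In Crossbar-Net, these misclassified centers are exactly what the covering re-sampling strategy re-samples, in the orthogonal direction, as training patches for the other sub-model.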
E. Testing

In the testing phase, for a new incoming image, we first extract the crossbar patches for each pixel. Then, we input these extracted patches to the trained vertical and horizontal sub-models in each round, to predict whether the central pixel of each patch belongs to the tumor region or not. Each sub-model outputs a segmentation result, and the final result is generated by majority voting on all obtained results. Formally, with T denoting the maximum number of rounds, we obtain 2T sub-models as V1, · · · , VT and H1, · · · , HT. The result is determined by these 2T sub-models.

IV. EXPERIMENTAL RESULTS

We now validate the advantages of Crossbar-Net qualitatively and quantitatively. After the introduction of the datasets, evaluation criteria, and implementation details, we first investigate the characteristics of Crossbar-Net. Then, we present comparisons between Crossbar-Net and baseline methods on the kidney tumor dataset. Finally, we apply Crossbar-Net to the cardiac and breast mass segmentation tasks to show that our model can be extended to other organ segmentation.

A. Data

1) Kidney Tumor Dataset: This dataset is independently collected from Suzhou Science and Technology Town Hospital. 3,500 CT slices of 94 subjects in total are used for performance evaluation, with one tumor per slice. The resolution of the images is 512 × 512 with 1 × 1 mm2/pixel, and the spacing between slices is 1 mm. For each image, the diameter of the tumor ranges from 7 pixels to 90 pixels, and the tumor is manually annotated by a physician as the ground truth for training. We randomly partitioned the dataset into three subsets, i.e., the training, validation, and testing sets, which consist of 50, 8, and 36 subjects, respectively. The sub-models in the 1-st round are trained using about 580,000 patches extracted by the basic sampling strategy. The epoch number of each sub-model is automatically determined by the validation set, and the performance of each sub-model in each round is still evaluated by the training set.

2) Breast Masses Dataset: A subset of DDSM [44], [45] is introduced to investigate the performance of our method. The images in this dataset are in LJPEG format, and we convert them into PNG format as in [46]. There are 1,923 malignant and benign cases in DDSM, and each case includes two images of each breast. The Regions of Interest (ROIs) are given in images containing the suspicious areas. Since the ROIs are not the accurate boundary of a tumor, the boundary of each tumor is annotated again as the ground truth by experienced radiologists. There are in total 1,000 selected images, among which 600 and 100 images are the training and validation sets and the remaining 300 images are the testing set. In most breast mass segmentation methods, the image is cropped to the bounding box of the ROI [47], [48] or to 1.2 times the bounding box [49]. We follow the cropping manner in [49]. An example is shown in Fig. 6(a). According to the basic sampling strategy, some patches might be extracted outside the image (Fig. 6(b)). For these patches, the parts outside the image are filled with black. When implementing the covering re-sampling strategy, the black parts of mis-segmented regions would not be re-sampled. Most tumors in these images are smaller than 450 pixels in diameter, with very few of them having a diameter of about 600 pixels.

Fig. 6. Example of the cropped breast mass image and the extracted crossbar patch. (a) is the cropped image. (b) shows the crossbar patch extracted outside the image.

3) Cardiac Dataset: This dataset is available from [50] and comprises cardiac MRI sequences of 33 subjects with 7,980 MR slices in total. The image resolution is 256 × 256 with an in-plane pixel size of 0.9-1.6 mm2 and an inter-slice thickness of 6-13 mm. In each image, the endocardial and epicardial contours of the left ventricle (LV) are provided as the ground truth. We randomly select 20 and 3 subjects as the training and validation sets to train the sub-models. The images of the remaining 10 subjects form the testing set for evaluation. The number of training patches is about 350,000 in the 1-st round, and about 100,000 and 50,000 samples are sampled in the 2-nd and 3-rd round, respectively.

B. Evaluation Criteria

We employ the Dice similarity coefficient (DSC) and the true positive fraction (TPF) as the primary evaluation criteria for assessing the segmentation performance. DSC is usually employed to measure the overlap between the prediction and the manual segmentation. A large DSC indicates a high segmentation accuracy. TPF indicates the percentage of correctly predicted tumor pixels in the manually segmented region. The higher the TPF, the larger the coverage of the true tumor region. We also introduce the centroid distance (CD) and the Hausdorff distance (HD) to evaluate the segmentation accuracy. CD indicates the distance between the central pixels of the final segmentation and the manual segmentation, i.e., the Euclidean distance between the two central points in 3-D space. Similar to CD, a smaller HD indicates
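The four criteria can be written down directly in NumPy. This is our own minimal 2-D sketch over binary masks: the paper computes CD between central points in 3-D space, while here the mask centroids of a single slice stand in.

```python
import numpy as np

def dsc(pred, gt):
    """Dice similarity coefficient: 2|P ∩ G| / (|P| + |G|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0

def tpf(pred, gt):
    """True positive fraction: |P ∩ G| / |G|, the covered share of the
    manually segmented region."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    return np.logical_and(pred, gt).sum() / gt.sum()

def centroid_distance(pred, gt):
    """CD: Euclidean distance between the centroids of the two masks."""
    c_p = np.mean(np.argwhere(pred), axis=0)
    c_g = np.mean(np.argwhere(gt), axis=0)
    return float(np.linalg.norm(c_p - c_g))

def hausdorff(pred, gt):
    """Symmetric Hausdorff distance between the two mask point sets."""
    P, G = np.argwhere(pred), np.argwhere(gt)
    d = np.linalg.norm(P[:, None, :] - G[None, :, :], axis=-1)
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

gt = np.zeros((8, 8), int); gt[2:6, 2:6] = 1      # 16-pixel square "tumor"
pred = np.zeros((8, 8), int); pred[2:6, 2:4] = 1  # only its left half predicted
print(dsc(pred, gt), tpf(pred, gt))   # → 0.666..., 0.5
```

Note that TPF alone rewards over-segmentation, which is why the overlap-based DSC and the two distance-based criteria are reported together.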
C. Implementation Details

For the scale of the crossbar patches on all datasets, we set the size of the horizontal patch as 20 × 100 and that of the vertical patch as 100 × 20 on the kidney and cardiac data, and as 50 × 500 and 500 × 50 on DDSM, respectively. We implement our networks on the MatConvNet toolbox [53]. In order to improve the credibility of the segmentation, the training and testing procedure is repeated 3 times in all experiments. Each time, the training, validation, and testing subsets are selected randomly, and we report the final average performance. Each sub-model reaches convergence within 20, 25, and 25 epochs on the kidney tumor, cardiac, and DDSM data, respectively. We run all deep models on an NVIDIA GTX 1080 Ti.

D. Characteristics of Crossbar-Net

We analyze four aspects of the characteristics of the proposed Crossbar-Net, including: (1) the sub-model can perform self-improvement, (2) the sub-models in the same training iteration can complement and benefit each other, (3) the advantage of combining all the sub-models for the final segmentation, and (4) the effectiveness of crossbar patches.

1) Self-Improvement: The purpose of this experiment is to show whether the vertical sub-model can self-improve without the involvement of the horizontal sub-model, and vice versa. We first train the vertical sub-model separately to obtain V1. Then, we re-sample the mis-segmented regions with the covering re-sampling strategy, where the re-sampled patches are vertical patches instead of horizontal patches. Then, we train V1 with these patches, together with those obtained from the basic sampling strategy, to get V2. Similarly, we can obtain the following V3, V4, · · · in the same way. We repeat training the sub-model (10 times here) until the training segmentation error converges. The horizontal sub-model is excluded throughout the process. The same process is also employed on the horizontal sub-model. As illustrated in Fig. 7, the training error of each sub-model decreases with the increase of rounds. This experiment also shows that although each sub-model can self-improve, it takes 10 rounds for the vertical (horizontal) sub-model to converge. In fact, only 3 rounds are needed if we train the sub-models in our manner shown in Fig. 3 instead of in this separate manner.

Fig. 7. Illustration of the training error of each sub-model. The sub-models are trained separately.

2) Complement Between Sub-Models: In this experiment, we train the sub-models in the manner illustrated in Fig. 3. After 3 rounds, the training error of the vertical and horizontal sub-models converges. Thus, the sub-models in these 3 rounds, which are V1, V2, V3 and H1, H2, H3, will be used in all the following kidney tumor experiments. As shown in Fig. 8, we illustrate a typical case for visualization in the first row. In this case, we can observe that V1 fails to properly segment the upper and lower parts of the tumor from the vertical direction, while H1 performs well from the horizontal direction. Similarly, H1 cannot segment the tumor with its left and right background correctly, while V1 is better. At the same time, the degree of disagreement between V2 and H2 is reduced. V3 and H3 both achieve promising results eventually. For the other tumors in the remaining rows, in addition to complementary regions, there are more or less common misclassified pixels. For example, in the second tumor, the upper right boundary and the lower right boundary are incorrectly segmented by both sub-models in the first two rounds. In this case, V2 and H2 are obviously superior to their former sub-models. Also, H3 and V3 achieve the best performance. This observation verifies that the sub-models can benefit and complement each other. Basically, in all cases, the performance of the later sub-models is superior to that of the former sub-models.

Fig. 8. Performance of each sub-model on a kidney tumor. The left column indicates the ground truth image, the second to fourth columns indicate the results of the three vertical sub-models, and the last three columns indicate the results of the three horizontal sub-models, respectively.

TABLE II
AVERAGE METRICS FOR EACH SUB-MODEL ON THE TESTING SET

3) The Way of Combination: Table II is the corresponding quantitative result of Fig. 8, which confirms that the later sub-model works better than the previous sub-model. However,
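The crossbar patch extraction with the sizes given in the implementation details (20 × 100 horizontal and 100 × 20 vertical) can be sketched as below. The function name and the zero-padding at image borders are our assumptions; the paper states only that parts falling outside the image are filled with black (for DDSM).

```python
import numpy as np

def crossbar_patches(image, center, h_size=(20, 100), v_size=(100, 20)):
    """Extract the horizontal and vertical patches centered at a pixel,
    using the kidney/cardiac sizes as defaults. Regions outside the
    image are zero-filled (our assumed padding scheme)."""
    def crop(img, cy, cx, h, w):
        pad_y, pad_x = h // 2, w // 2
        padded = np.pad(img, ((pad_y, pad_y), (pad_x, pad_x)))  # zeros outside
        return padded[cy:cy + h, cx:cx + w]
    cy, cx = center
    return crop(image, cy, cx, *h_size), crop(image, cy, cx, *v_size)

# Hypothetical 512x512 CT slice; both patches share the same central pixel.
img = np.arange(512 * 512, dtype=np.float32).reshape(512, 512)
h_patch, v_patch = crossbar_patches(img, (256, 256))
print(h_patch.shape, v_patch.shape)   # → (20, 100) (100, 20)
```

Each patch is long in one direction and narrow in the other, which is how the crossbar pair captures local detail and longer-range context from the two directions at once without the quadratic growth of a large squared patch.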
Fig. 10. t-SNE visualization of the high-level representations in Crossbar-Net. (a) The test image. (b) Cluster of squared patches. (c) Cluster of vertical patches. (d) Cluster of horizontal patches.
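A visualization like Fig. 10 embeds high-level patch representations into 2-D with t-SNE [54]. A minimal sketch of how such an embedding is typically produced, assuming scikit-learn; the random feature vectors here are mere stand-ins for the network's learned representations:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-ins for high-level patch representations: 30 feature vectors
# per patch type (squared / vertical / horizontal), 64-D each.
features = np.vstack([rng.normal(loc=k, size=(30, 64)) for k in range(3)])
labels = np.repeat(["squared", "vertical", "horizontal"], 30)

# Embed into 2-D; each point can then be scattered and colored by `labels`.
embedding = TSNE(n_components=2, perplexity=10, init="random",
                 random_state=0).fit_transform(features)
print(embedding.shape)  # (90, 2)
```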
TABLE III
COMPARISON AMONG DIFFERENT METHODS ON KIDNEY TUMORS
Fig. 12. Examples of segmentation results with ground truth on the kidney tumor dataset. The red, blue, yellow, and green curves are the manual annotation, Crossbar-Net, multi-scale 3D-CNN [55], and SegCaps [22] contours, respectively.
cases, even though the tumor has a distinctive texture. The first and second images in the first row of Fig. 12 are the unsatisfactory cases of SegCaps. The multi-scale 3D-CNN is competitive with SegCaps, especially on small tumors, which may be related to its small 3 × 3 × 3 convolutional kernels.
YU et al.: CROSSBAR-NET: NOVEL CNN FOR KIDNEY TUMOR SEGMENTATION IN CT IMAGES 4071
TABLE IV
AVERAGE DSC OF EACH SUB-MODEL IN LV SEGMENTATION AND BREAST MASS SEGMENTATION
TABLE V
DSC OF EACH METHOD IN BREAST MASS SEGMENTATION
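The quantitative comparisons in Tables III–V rest on overlap metrics. A minimal sketch of the two overlap measures reported in the paper, the Dice similarity coefficient (DSC) and the true positive fraction (TPF); the helper names and toy masks are ours, not taken from the released code:

```python
import numpy as np

def dice(pred, gt):
    """Dice similarity coefficient: 2|P ∩ G| / (|P| + |G|)."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def true_positive_fraction(pred, gt):
    """TPF (sensitivity): |P ∩ G| / |G|."""
    return np.logical_and(pred, gt).sum() / gt.sum()

# Toy masks: ground truth has 16 tumor pixels, prediction covers its left half.
gt = np.zeros((8, 8), dtype=bool); gt[2:6, 2:6] = True
pred = np.zeros((8, 8), dtype=bool); pred[2:6, 2:4] = True

print(dice(pred, gt))                    # 2*8 / (8+16) ≈ 0.667
print(true_positive_fraction(pred, gt))  # 8 / 16 = 0.5
```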
Fig. 15. Performance of each sub-model in LV cavity segmentation on the cardiac dataset. The left column shows the ground-truth images, the second to fourth columns show the three vertical sub-models, and the last three columns show the three horizontal sub-models.
Fig. 16. Examples of resulting segmentations for subjects 24-33 of the cardiac dataset [65].
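The crossbar patches at the heart of the method are simply two orthogonal non-squared crops centered at the same pixel: a tall vertical patch and a wide horizontal patch. A minimal numpy sketch; the patch sizes and the boundary handling (a center far enough from the image border) are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def crossbar_patches(image, r, c, short=20, long_=60):
    """Cut the two orthogonal non-squared patches centered at (r, c):
    a vertical (tall) patch of size long_ x short and a horizontal (wide)
    patch of size short x long_. Assumes (r, c) lies far enough from the
    image border for both crops to fit."""
    hs, hl = short // 2, long_ // 2
    vertical = image[r - hl:r + hl, c - hs:c + hs]    # many rows, few cols
    horizontal = image[r - hs:r + hs, c - hl:c + hl]  # few rows, many cols
    return vertical, horizontal

img = np.arange(200 * 200, dtype=float).reshape(200, 200)
v, h = crossbar_patches(img, 100, 100)
print(v.shape, h.shape)  # (60, 20) (20, 60)
```

Feeding both crops to the directional sub-models is what lets the network see global context along one axis while staying local along the other.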
information simultaneously. For the way of sampling patches, the basic sampling strategy and the covering re-sampling strategy are proposed. The combination of these two strategies not only enhances the role of mis-segmented regions but also prevents the sub-models from over-emphasizing them. For the cascaded training style, the segmentation result of the sub-models in one direction can be complemented by the sub-models in the other direction, and each sub-model can perform self-improvement by re-sampling the mis-segmented regions. Our model can simultaneously learn a variety of information and achieves promising segmentation results on kidney tumors of different sizes, shapes, contrasts, and appearances. Moreover, the successful application to cardiac and breast mass segmentation shows that Crossbar-Net has a wide range of applications. Future work is to extend the direction of symmetric information from the horizontal and vertical axes to other axes.

REFERENCES

[1] M. G. Linguraru et al., “Automated noninvasive classification of renal cancer on multiphase CT,” Med. Phys., vol. 38, no. 10, pp. 5738–5746, 2011.
[2] A. Skalski, J. Jakubowski, and T. Drewniak, “Kidney tumor segmentation and detection on computed tomography data,” in Proc. IEEE Int. Conf. Imag. Syst. Techn. (IST), Oct. 2016, pp. 238–242.
[3] J. J. Oh et al., “Accurate risk assessment of patients with pathologic T3aN0M0 renal cell carcinoma,” Sci. Rep., vol. 8, no. 1, 2018, Art. no. 13914.
[4] American Cancer Society. (2018). Cancer Facts & Figures. [Online]. Available: https://www.cancer.org/research/cancer-facts-statistics/all-cancer-facts-figures/cancer-facts-figures-2018.html
[5] D. Xiang et al., “CorteXpert: A model-based method for automatic renal cortex segmentation,” Med. Image Anal., vol. 42, pp. 257–273, Dec. 2017.
[6] D.-Y. Kim and J.-W. Park, “Computer-aided detection of kidney tumor on abdominal computed tomography scans,” Acta Radiologica, vol. 45, no. 7, pp. 791–795, 2004.
[7] H. S. Lee, H. Hong, and J. Kim, “Detection and segmentation of small renal masses in contrast-enhanced CT images using texture and context feature classification,” in Proc. IEEE 14th Int. Symp. Biomed. Imag. (ISBI), Apr. 2017, pp. 583–586.
[8] T. Hodgdon, M. D. McInnes, N. Schieda, T. A. Flood, L. Lamb, and R. E. Thornhill, “Can quantitative CT texture analysis be used to differentiate fat-poor renal angiomyolipoma from renal cell carcinoma on unenhanced CT images?” Radiology, vol. 276, no. 3, pp. 787–796, 2015.
[9] M. G. Linguraru et al., “Computer-aided renal cancer quantification and classification from contrast-enhanced CT via histograms of curvature-related features,” in Proc. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., Sep. 2009, pp. 6679–6682.
[10] Y. Shi, W. Yang, Y. Gao, and D. Shen, “Does manual delineation only provide the side information in CT prostate segmentation?” in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent., 2017, pp. 692–700.
[11] B. He, D. Xiao, Q. Hu, and F. Jia, “Automatic magnetic resonance image prostate segmentation based on adaptive feature learning probability boosting tree initialization and CNN-ASM refinement,” IEEE Access, vol. 6, pp. 2005–2015, Feb. 2018.
[12] A. Mortazi, R. Karim, K. S. Rhode, J. Burt, and U. Bagci, “CardiacNET: Segmentation of left atrium and proximal pulmonary veins from MRI using multi-view CNN,” in Proc. Med. Image Comput. Comput.-Assist. Intervent., 2017, pp. 377–385.
[13] M. Khened, V. Alex, and G. Krishnamurthi, “Densely connected fully convolutional network for short-axis cardiac cine MR image segmentation and heart diagnosis using random forest,” in Proc. Int. Workshop Stat. Atlases Comput. Models Heart, 2018, pp. 140–151.
[14] J. Patravali, S. Jain, and S. Chilamkurthy, “2D-3D fully convolutional neural networks for cardiac MR segmentation,” in Proc. Int. Workshop Stat. Atlases Comput. Models Heart, 2018, pp. 130–139.
[15] P. Moeskops, M. A. Viergever, A. M. Mendrik, L. S. de Vries, M. J. Benders, and I. Išgum, “Automatic segmentation of MR brain images with a convolutional neural network,” IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1252–1261, May 2016.
[16] G. Wang et al., “Interactive medical image segmentation using deep learning with image-specific fine tuning,” IEEE Trans. Med. Imag., vol. 37, no. 7, pp. 1562–1573, Jul. 2018.
[17] R. Zhang et al., “Automatic segmentation of acute ischemic stroke from DWI using 3-D fully convolutional DenseNets,” IEEE Trans. Med. Imag., vol. 37, no. 9, pp. 2149–2160, Sep. 2018.
[18] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 3431–3440.
[19] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent., 2015, pp. 234–241.
[20] F. Milletari, N. Navab, and S.-A. Ahmadi, “V-Net: Fully convolutional neural networks for volumetric medical image segmentation,” in Proc. 4th Int. Conf. 3D Vis. (3DV), Oct. 2016, pp. 565–571.
[21] L. Zhang, M. Sonka, L. Lu, R. M. Summers, and J. Yao, “Combining fully convolutional networks and graph-based approach for automated segmentation of cervical cell nuclei,” in Proc. IEEE 14th Int. Symp. Biomed. Imag. (ISBI), Apr. 2017, pp. 406–409.
[22] R. LaLonde and U. Bagci. (Apr. 11, 2018). “Capsules for object segmentation.” [Online]. Available: https://arxiv.org/abs/1804.04241
[23] D. Ciresan, A. Giusti, L. M. Gambardella, and J. Schmidhuber, “Deep neural networks segment neuronal membranes in electron microscopy images,” in Proc. Adv. Neural Inf. Process. Syst. (NIPS), 2012, pp. 2843–2851.
[24] S. Wang et al., “Central focused convolutional neural networks: Developing a data-driven model for lung nodule segmentation,” Med. Image Anal., vol. 40, pp. 172–183, 2017.
[25] G. Tsai, Histogram of Oriented Gradients, vol. 1, no. 1. Ann Arbor, MI, USA: Univ. Michigan Press, 2010, pp. 1–17.
[26] X. Tang, “Texture information in run-length matrices,” IEEE Trans. Image Process., vol. 7, no. 11, pp. 1602–1609, Nov. 1998.
[27] A. Prasoon, K. Petersen, C. Igel, F. Lauze, E. Dam, and M. Nielsen, “Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent., 2013, pp. 246–253.
[28] K. He, X. Cao, Y. Shi, D. Nie, Y. Gao, and D. Shen, “Pelvic organ segmentation using distinctive curve guided fully convolutional networks,” IEEE Trans. Med. Imag., vol. 38, no. 2, pp. 585–595, Feb. 2019.
[29] E. Walach and L. Wolf, “Learning to count with CNN boosting,” in Proc. Eur. Conf. Comput. Vis., 2016, pp. 660–676.
[30] N. Karianakis, T. J. Fuchs, and S. Soatto. (Mar. 21, 2015). “Boosting convolutional features for robust object proposals.” [Online]. Available: https://arxiv.org/abs/1503.06350
[31] H. Schwenk and Y. Bengio, “Boosting neural networks,” Neural Comput., vol. 12, no. 8, pp. 1869–1887, 2000.
[32] S. Shalev-Shwartz. (Nov. 14, 2014). “SelfieBoost: A boosting algorithm for deep learning.” [Online]. Available: https://arxiv.org/abs/1411.3436
[33] M. Havaei et al., “Brain tumor segmentation with deep neural networks,” Med. Image Anal., vol. 35, pp. 18–31, Jan. 2017.
[34] Y. Freund, R. Schapire, and N. Abe, “A short introduction to boosting,” J. Jpn. Soc. Artif. Intell., vol. 14, no. 5, pp. 771–780, 1999.
[35] A. Blum and T. M. Mitchell, “Combining labeled and unlabeled data with co-training,” in Proc. 11th Annu. Conf. Comput. Learn. Theory (COLT), 1998, pp. 92–100.
[36] S. A. Goldman and Y. Zhou, “Enhancing supervised learning with unlabeled data,” in Proc. 17th Int. Conf. Mach. Learn., 2000, pp. 327–334.
[37] I. Muslea, S. Minton, and C. A. Knoblock, “Selective sampling with redundant views,” in Proc. Nat. Conf. Artif. Intell., 2000, pp. 621–626.
[38] Z.-H. Zhou, K.-J. Chen, and H. Dai, “Enhancing relevance feedback in image retrieval using unlabeled data,” ACM Trans. Inf. Syst., vol. 24, no. 2, pp. 219–244, 2006.
[39] E. M. Ardehaly and A. Culotta. (Sep. 13, 2017). “Co-training for demographic classification using deep learning from label proportions.” [Online]. Available: https://arxiv.org/abs/1709.04108
[40] Y. Shi, Y. Gao, S. Liao, D. Zhang, Y. Gao, and D. Shen, “Semi-automatic segmentation of prostate in CT images via coupled feature representation and spatial-constrained transductive lasso,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 11, pp. 2286–2303, 2015.
[41] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proc. 27th Int. Conf. Mach. Learn. (ICML), 2010, pp. 807–814.
[42] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov. (Jul. 3, 2012). “Improving neural networks by preventing co-adaptation of feature detectors.” [Online]. Available: https://arxiv.org/abs/1207.0580
[43] Y. A. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, “Efficient backprop,” in Neural Networks: Tricks of the Trade, 2012, pp. 9–48.
[44] M. Heath, K. Bowyer, D. Kopans, R. Moore, and P. Kegelmeyer, “The digital database for screening mammography,” in Proc. Int. Workshop Digit. Mammogr., 2000, pp. 212–218.
[45] M. Heath et al., “Current status of the digital database for screening mammography,” in Proc. 4th Int. Workshop Digit. Mammogr., 1998, pp. 457–460.
[46] A. Sharma. (2015). DDSM Utility. [Online]. Available: https://github.com/trane293/DDSMUtility
[47] N. Dhungel, G. Carneiro, and A. P. Bradley, “Deep structured learning for mass segmentation from mammograms,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2015, pp. 2950–2954.
[48] W. Zhu, X. Xiang, T. D. Tran, G. D. Hager, and X. Xie, “Adversarial deep structured nets for mass segmentation from mammograms,” in Proc. IEEE Int. Symp. Biomed. Imag. (ISBI), Apr. 2018, pp. 847–850.
[49] J. S. Cardoso, N. Marques, N. Dhungel, G. Carneiro, and A. P. Bradley, “Mass segmentation in mammograms: A cross-sensor comparison of deep and tailored features,” in Proc. IEEE Int. Conf. Image Process., Sep. 2017, pp. 1737–1741.
[50] A. Andreopoulos and J. K. Tsotsos, “Efficient and generalizable statistical models of shape and appearance for analysis of cardiac MRI,” Med. Image Anal., vol. 12, no. 3, pp. 335–357, 2008.
[51] W. Zhang et al., “Deep convolutional neural networks for multi-modality isointense infant brain image segmentation,” NeuroImage, vol. 108, pp. 214–224, Mar. 2015.
[52] Y. Shi, Y. Gao, S. Liao, D. Zhang, Y. Gao, and D. Shen, “A learning-based CT prostate segmentation method via joint transductive feature selection and regression,” Neurocomputing, vol. 173, pp. 317–331, 2016.
[53] A. Vedaldi and K. Lenc, “MatConvNet: Convolutional neural networks for MATLAB,” in Proc. 23rd ACM Int. Conf. Multimedia, 2015, pp. 689–692.
[54] L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” J. Mach. Learn. Res., vol. 9, pp. 2579–2605, 2008.
[55] R. Dey, Z. Lu, and Y. Hong, “Diagnostic classification of lung nodules using 3D neural networks,” in Proc. IEEE 15th Int. Symp. Biomed. Imag. (ISBI), Apr. 2018, pp. 774–778.
[56] K. Simonyan and A. Zisserman. (Sep. 4, 2014). “Very deep convolutional networks for large-scale image recognition.” [Online]. Available: https://arxiv.org/abs/1409.1556
[57] G. van Rossum and F. L. Drake, Python 3 Reference Manual. Paramount, CA, USA: CreateSpace, 2009.
[58] A. Paszke et al., “Automatic differentiation in PyTorch,” in Proc. 31st Conf. Neural Inf. Process. Syst. Workshop Autodiff, Long Beach, CA, USA, Dec. 2017.
[59] SegCaps. (2018). GitHub. [Online]. Available: https://github.com/lalonderodney/SegCaps
[60] M. Abadi et al. (Mar. 14, 2016). “TensorFlow: Large-scale machine learning on heterogeneous distributed systems.” [Online]. Available: https://arxiv.org/abs/1603.04467
[61] J. Zhang, B. Chen, M. Zhou, H. Lan, and F. Gao, “Photoacoustic image classification and segmentation of breast cancer: A feasibility study,” IEEE Access, vol. 7, pp. 5457–5466, 2019.
[62] C. Santiago, J. C. Nascimento, and J. S. Marques, “Fast and accurate segmentation of the LV in MR volumes using a deformable model with dynamic programming,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2017, pp. 1747–1751.
[63] C. Santiago, J. C. Nascimento, and J. S. Marques, “Combining an active shape and motion models for object segmentation in image sequences,” in Proc. 25th IEEE Int. Conf. Image Process., 2018, pp. 3703–3707.
[64] C. Zotti, Z. Luo, O. Humbert, A. Lalande, and P.-M. Jodoin, “GridNet with automatic shape prior registration for automatic MRI cardiac segmentation,” in Proc. Int. Workshop Stat. Atlases Comput. Models Heart, 2018, pp. 73–81.
[65] C. Santiago, J. C. Nascimento, and J. S. Marques, “A new robust active shape model formulation for cardiac MRI segmentation,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2016, pp. 4112–4115.

Qian Yu received the master’s degree from the School of Computer Science and Technology, Shandong University, China, in 2009. She is currently pursuing the Ph.D. degree with Nanjing University, China. Her research interests include computer vision and medical image analysis.

Yinghuan Shi received the B.Sc. and Ph.D. degrees from the Department of Computer Science, Nanjing University, China, in 2007 and 2013, respectively. He was a Visiting Scholar with the University of North Carolina at Chapel Hill and the University of Technology Sydney. He is currently an Associate Professor with the Department of Computer Science and Technology, Nanjing University. His research interests include computer vision and medical image analysis. He has published more than 50 research papers in related journals and conferences. He was elected as a Young Elite Scientist by the China Association for Science and Technology in 2016 and as an ACM Rising Star, Nanjing, in 2017, and received the Science and Technology Award for Youth from the Jiangsu Computer Society in 2017.

Jinquan Sun received the B.Sc. degree from the School of Computer Science and Technology, Harbin Institute of Technology, China, in 2016. He is currently pursuing the master’s degree with Nanjing University, China.

Yang Gao received the Ph.D. degree from the Department of Computer Science and Technology, Nanjing University, China, in 2000. He is currently a Professor and the Deputy Director of the Department of Computer Science and Technology, Nanjing University, where he directs the Reasoning and Learning Research Group. He has published more than 100 papers in top-tier conferences and journals. His current research interests include artificial intelligence and machine learning. He also serves as a program chair and area chair for many international conferences.

Jianbing Zhu is currently a Chief Physician and the Associate Director of the Radiology Department, Suzhou Science and Technology Town Hospital (The Affiliated Suzhou Hospital, Nanjing Medical University), with more than 20 years of practical and research experience, including CT and MRI diagnosis, with a focus on abdominal imaging, tumor imaging, and medical image analysis. He has published more than 30 papers.

Yakang Dai is currently a Professor and the Associate Director of the Medical Imaging Department, Suzhou Institute of Biomedical Engineering and Technology (SIBET), Chinese Academy of Sciences. He has published more than 40 papers and holds 30 China innovation patents. In addition, he has developed several open medical image/signal analysis software toolboxes (including MITK, 3DMed, eConnectome, iBEAT, and aBEAT). His research interests include medical image analysis (MRI, CT, PET, and MEG/EEG).