Article info

Article history:
Received 7 April 2016
Accepted 30 June 2016
Available online 7 July 2016
Communicated by X. Wu

Keywords:
Security in digital systems
Software engineering
Performance evaluation

Abstract

With the rapid development of computer technology, people pay more attention to the security of computer data and the computer virus has become a chief threat to computer data security. By using an antivirus system that can identify randomly generated computer viruses and on the basis of the basic characteristics of the computer code, this paper investigates the heuristic scanning technique. This paper proposes the minimum distance classifier and detection model through the analysis of the malicious code. This model can identify unknown feature codes of illegal procedures and construct a healthy network environment by using a combination of model and experimental method, which can intercept the illegal virus program in the installation and operation stages.

© 2016 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.ipl.2016.06.014
0020-0190/© 2016 Elsevier B.V. All rights reserved.
20 B. Zhang et al. / Information Processing Letters 117 (2017) 19–24
Table 2
Relationship between four types of ROC indices.
Total a + c the actual number of white samples b + d the actual number of black samples Quantity
Total Malware negatives Malware positives All subjects
$$d_k^2 = (X - U_k)^{T} (X - U_k) \qquad (5)$$
Table 3
Malicious code behavior vector table.
Table 4
Table of statistical characteristics.

Sample ID   EVENT1   EVENT2   ···   EVENT N   Sample type
1           1        0        ···   1         0
2           1        1        ···   1         1
3           0        0        ···   0
···
N           1        1        ···   0         0

Note: sample type 1 denotes the black sample set; 0 denotes the normal process.

Table 5
Data of the experimental sample.

Sample type   Test set   Train set   Sample space
Sample 1      576        242         818
Sample 0      387        243         630
Total         963        485         1448

parameters to study the host behavior of the malicious code [10].

File-write operations are one of the steps performed by an illegal program, and on their own they make it difficult to determine whether the program is malicious code. If the call parameters are also considered, the system can show the differences between the folder names involved. Malicious code is represented by a program that is more likely to participate in the computer system [11].

In this paper, we treat the host behavior characteristics of each sample as a tuple in a statistical table, and we define its features as a vector in a multidimensional space.

The malicious code detection model classifies programs into normal programs and malicious code programs. In this paper, mathematical models are trained on these two types of samples and used to analyze them, in order to address the malicious code classification problem. A two-dimensional table (Table 4) is therefore used to describe the behavior characteristics of each sample. The tuple $\langle \alpha_1, \alpha_2, \ldots, \alpha_n, c \rangle$ denotes the conjunction of characteristic quantities, where each $\alpha_i$ represents a host behavior characteristic and $c$ is the sample type, which takes a value in $\{0, 1\}$: 1 represents the black sample set and 0 the normal program samples, as in the note to Table 4. As shown earlier, this feature does not change with the number of calls to the API and the value of 1.

3.3.3. Experimental procedure

The black sample $U$ is the malicious code sample, and the normal application sample is denoted by $U'$. The sample data are given in Table 5: the total number of samples is 1448, divided into samples 1 (black samples) and samples 0 (normal programs). A data tuple $X = [x_1, x_2, \ldots, x_n, c]^{T}$ is written as a feature vector, where $x_i$ ($i = 1, 2, \ldots, n$) is a behavior characteristic and $c \in \{0, 1\}$ is the category.

The experimental steps are as follows:

(1) Choose the samples 0 and 1 from the training set.
(2) Set the center vectors of samples 0 and 1 as $U_k$ and $U_k'$ ($k = 0, 1$), respectively.
(3) Calculate the attribute variance $\sigma_{ki}$.
(4) Calculate the total variance $\sigma_i$.
(5) Calculate $\sigma_i = \sum_{k=0}^{1} \sigma_{ki}$, where $k = 0, 1$ and $i = 1, 2, \ldots, n$.

Calculating the distance $d_k^2$ between $X$ and $U_k$ is the key step of the experiment. The coefficient of $(x_i - u_{ki})^2$ has a decisive effect on the contribution of each attribute value to the total distance $d_k^2$. The coefficients of the non-normalized and normalized Euclidean distances are 1 and $1/\sigma_{ki}$, respectively.

The first experimental model is the standardized Euclidean distance model. The variance of attribute $i$ for class $k$ in the training set is $\sigma_{ki}$; thus

$$d_k^2 = \sum_{i=1}^{n} \left( \frac{1}{\sigma_{ki} + \varepsilon} \times (x_i - u_{ki}) \right)^2. \qquad (10)$$

The second model extends the first experimental model. The variance of attribute $i$ over the whole training set is $\sigma_i$; thus

$$d_k^2 = \sum_{i=1}^{n} \left( \frac{1}{\sigma_i + \varepsilon} \times (x_i - u_{ki}) \right)^2.$$

The third model refines the second experimental model; thus

$$\sigma_i = \sum_{k=0}^{1} \sigma_{ki}, \qquad d_k^2 = \sum_{i=1}^{n} \left( \frac{1}{\sigma_i + \varepsilon} \times (x_i - u_{ki}) \right)^2. \qquad (11)$$
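The minimum distance classification with the three variance-weighted models can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the toy behavior vectors, and the value of the auxiliary constant `EPS` are hypothetical, and the weighting (each attribute difference is scaled by 1/(variance + constant) before squaring) follows one reading of Eq. (10). The plain Euclidean case (coefficient 1) is included for comparison.

```python
import numpy as np

EPS = 0.01  # hypothetical auxiliary constant; keeps the denominator non-zero


def fit(X, y):
    """Estimate the class centers and the variance terms used by the models."""
    centers = np.stack([X[y == k].mean(axis=0) for k in (0, 1)])   # u_ki
    class_var = np.stack([X[y == k].var(axis=0) for k in (0, 1)])  # sigma_ki
    return {
        "centers": centers,
        "class_var": class_var,
        "total_var": X.var(axis=0),           # variance over the whole training set
        "summed_var": class_var.sum(axis=0),  # sigma_i = sum over k of sigma_ki
    }


def squared_distance(x, k, params, model, eps=EPS):
    """Squared distance d_k^2 from sample x to the center of class k."""
    diff = x - params["centers"][k]
    if model == "euclidean":        # non-normalized: coefficient 1
        scale = np.ones_like(diff)
    elif model == "class_var":      # first model: per-class variance
        scale = 1.0 / (params["class_var"][k] + eps)
    elif model == "total_var":      # second model: whole-training-set variance
        scale = 1.0 / (params["total_var"] + eps)
    elif model == "summed_var":     # third model: summed class variances
        scale = 1.0 / (params["summed_var"] + eps)
    else:
        raise ValueError(f"unknown model: {model}")
    return float(np.sum((scale * diff) ** 2))


def classify(x, params, model, eps=EPS):
    """Assign x to the class (0 or 1) whose center is nearest."""
    distances = [squared_distance(x, k, params, model, eps) for k in (0, 1)]
    return int(np.argmin(distances))


# Toy behavior vectors (made up): rows are samples, columns are events.
X_train = np.array(
    [[1, 0, 1, 1], [1, 1, 1, 0], [1, 1, 0, 1],   # class 1 (black samples)
     [0, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0]],  # class 0 (normal programs)
    dtype=float,
)
y_train = np.array([1, 1, 1, 0, 0, 0])
params = fit(X_train, y_train)

MODELS = ("euclidean", "class_var", "total_var", "summed_var")
for model in MODELS:
    print(model, classify(np.array([1.0, 1.0, 1.0, 1.0]), params, model))
```

On this toy data all four variants assign the vector `[1, 1, 1, 1]` to class 1. The variants differ only in the `scale` term, so the choice of variance estimate can be compared without changing the rest of the classifier.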
Table 6
Choice of auxiliary constants.
In this paper, two different training samples, four types of feature classifiers, and variances of the four attributes are established. The aim is to detect the behavior characteristics of malicious code using a user-friendly model. To determine the difference between a sample and the overall sample, we conduct the training experiment 10 times. It can be seen from Fig. 2 that after 10 experimental runs, the evaluation indices of all four classifier models are stable. This confirms that the type and quantity of the training set are reasonably distributed, and it makes the center of the sample features clearer and more objectively reflected.

The study results show that the detection index of the standardized minimum Euclidean distance classifier is higher than that of the other methods by about 4%, which indicates that this method performs better than the other methods. The deficiency of this method is that its false detection rate for dialog samples is higher than that of the other methods by 0.6%, and hence, this method needs to be improved.

4. Conclusions

This paper focuses on the dynamic heuristic scanning technique and the malicious code detection model. First, the dynamic heuristic scanning technique is analyzed and summarized, because this technique is widely used in the field of antivirus software and can detect malicious code.

This study was supported by the Fundamental Research Funds for the Central Universities (No. 3091601510).

References

[1] Ryuiti Koike, Naoshi Nakaya, Yuji Koi, Development of system for the automatic generation of unknown virus extermination software, in: Proceedings of the 2007 International Symposium on Applications and the Internet, Hiroshima, Japan, 2007.
[2] Hassan Salmani, Mohammad Tehranipoor, Jim Plusquellic, A novel technique for improving hardware Trojan detection and reducing Trojan activation time, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 6 (2011).
[3] Christopher Kruegel, Increase dynamic coverage, Secure Systems Lab, Technical University, Vienna, Sep. 2007.
[4] Francesco Di Cerbo, Andrea Girardello, Florian Michahelles, Svetlana Voronkova, Detection of malicious applications on Android OS, IEEE Comput. Soc. 11 (2010).
[5] Po-Ching Lin, Ying-Dar Lin, Yuan-Cheng Lai, A hybrid algorithm of Backward Hashing and automaton tracking for virus scanning, IEEE Trans. Comput. (2011) 594–601.
[6] U. Bayer, C. Kruegel, E. Kirda, TT analyze: a tool for analyzing malware, in: 15th Annual Conference of the European Institute for Computer Antivirus Research, Vienna, 2006.
[7] Wook Shin, Shinsaku Kiyomoto, Kazuhide Fukushima, Toshiaki Tanaka, A formal model to analyze the permission authorization and enforcement in the Android framework, IEEE Comput. Soc. (2010).
[8] Symantec Security Response Center, http://www.symantec.com/business/security_response, 2010.
[9] Thomas Blasing, Leonid Batyuk, Aubrey-Derrick Schmidt, Seyit Ahmet Camtepe, Sahin Albayrak, An Android application sandbox system for suspicious software detection, in: 2010 IEEE International Conference on Malicious and Unwanted Software, MALWARE, October 2010, pp. 55–62.
[10] Bobby D. Birrer, Richard A. Raines, Rusty O. Baldwin, Mark E. Oxley, Using qualia and multi-layered relationships in malware detection, in: IEEE Symposium on Computational Intelligence in Cyber Security, CICS’09, vol. 7, 2009, pp. 91–98.
[11] K. Ashcraft, D. Engler, Using programmer-written compiler extensions to catch security holes, in: 2002 IEEE Symposium on Security and Privacy, May 2002, pp. 143–159.