In 2009, Majumdar et al. [15] proposed a comprehensive database intrusion detection system that integrates different types of evidence using an extended Dempster-Shafer theory. Besides combining evidence, they also incorporate learning in their system through application of prior knowledge and observed data on suspicious users. In 2016, Bertino et al. [14] tackled the insider threat problem from a data-driven systemic view. User actions are recorded as historical log data in a system, and the evaluation investigates the data that users actually process. From the horizontal view, users are grouped together according to their responsibilities and a normal pattern is learned from the group behaviours. From the diachronic view, a suspected user is also investigated by comparing his/her historical behaviours with the historical average of the same group.

Anomaly detection has been an important research problem in security analysis, therefore development of methods that can detect malicious insider behavior with high accuracy and low false […]

T. Rashid et al. describe the parameter learning task in HMMs as finding, given an output sequence or a set of such sequences, the best set of state transition and emission probabilities. The task is usually to derive the maximum likelihood estimate of the parameters of the HMM given the set of output sequences. No tractable algorithm is known for solving this problem exactly, but a local maximum likelihood can be derived efficiently using the Baum–Welch algorithm or the Baldi–Chauvin algorithm. The Baum–Welch algorithm is a special case of the expectation-maximization algorithm. If the HMMs are used for time series prediction, more sophisticated Bayesian inference methods, like Markov chain Monte Carlo (MCMC) sampling, have been reported to be favorable over finding a single maximum likelihood model both in terms of accuracy and stability [12]. Log data are considered high-dimensional data which contain irrelevant and redundant features. Feature selection methods can be applied to reduce dimensionality, decrease training time and enhance learning performance.
3. OUR APPROACH
3.1 Basic Notations
Large organisations deal with tremendous amounts of data whose security is of prime interest. The data in databases comprise attributes describing real-life objects called entities. The attributes have varying levels of sensitivity, i.e. not all attributes are equally important to the integrity of the database. As an example, signatures and other biometric data are highly sensitive attributes for a financial organisation like a bank in comparison to others like name, gender etc. So, unauthorised access to the crucial attributes is of greater concern. Only certain employees may have access to such data elements, and access by all others must be blocked instantaneously to ensure confidentiality and consistency of data.

[…] each set of query patterns. Each transaction T is denoted as

<Uid, Tid, <q1, q2, … qn>>

where qi denotes the ith query, i ∈ [1 … n].

For example, suppose a user has id 1001. He/she then executes the following set of SQL queries:

q1: SELECT a,b,c
    FROM R1,R2
    WHERE R1.A>R2.B

q2: SELECT P
    FROM R5
    WHERE R5.P=10
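The transaction notation <Uid, Tid, <q1, q2, … qn>> can be modelled as a small immutable record. The following is only a minimal sketch: the class name `Transaction`, the helper `query` and the transaction id "T1" are hypothetical names chosen for illustration and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Transaction:
    """One transaction <Uid, Tid, <q1, q2, ..., qn>>."""
    uid: str                  # user identifier
    tid: str                  # transaction identifier
    queries: Tuple[str, ...]  # ordered queries q1 .. qn

    def query(self, i: int) -> str:
        """Return q_i, using the 1-based index of the notation."""
        if not 1 <= i <= len(self.queries):
            raise IndexError(f"i must be in [1 .. {len(self.queries)}]")
        return self.queries[i - 1]


# User 1001 from the example; the transaction id "T1" is a placeholder.
example = Transaction(
    uid="1001",
    tid="T1",
    queries=(
        "SELECT a, b, c FROM R1, R2 WHERE R1.A > R2.B",
        "SELECT P FROM R5 WHERE R5.P = 10",
    ),
)
```

Keeping the queries in an ordered tuple preserves the sequence q1 … qn, which matters if later stages mine patterns from the order in which queries are issued.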
< T1234,U1001,<R(Account_number),R(balance)>>
where,

$$ p(a_i) = \frac{w_i}{\sum_{j} w_j} $$
where CID represents the cluster centroid, and {R} is a set of rules formed by taking the union of all the rules that the members of the given fuzzy cluster abide by.

We have used Fuzzy c-means [26] clustering to create the clusters. Each user belongs to a cluster to a certain degree wij, where wij represents the membership coefficient of the ith user (ui) with the jth cluster.

The centre of a cluster (α) is the mean of all points, weighted by their membership coefficients [28]. Mathematically,

$$ \alpha_j = \frac{\sum_{i=1}^{n} w_{ij}^{m}\, u_i}{\sum_{i=1}^{n} w_{ij}^{m}}, \qquad w_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \frac{\lVert u_i - \alpha_j \rVert}{\lVert u_i - \alpha_k \rVert} \right)^{\frac{2}{m-1}}} $$

where
n is the total number of users,
C is the number of clusters, and
m is the fuzzifier.

[…] of equal length n, the modified Jensen-Shannon [27] distance is computed as

$$ D(UV_1 \,\|\, UV_2) = \frac{1}{2} \left[ \bigl(1 + p_1(a_i)\, w(a_i)\bigr) \log \frac{1 + p_1(a_i)\, w(a_i)}{1 + p_2(a_i)\, w(a_i)} + \bigl(1 + p_2(a_i)\, w(a_i)\bigr) \log \frac{1 + p_2(a_i)\, w(a_i)}{1 + p_1(a_i)\, w(a_i)} \right] $$

where w(ai) is the semantic weight associated with the ith attribute ai.

User profile generator: This module takes user vectors and the cluster profiles as input and generates user profiles. A user profile is of the form

Ui = <UID, <p(a1), p(a2), p(a3) … p(ak)>, <c1, c2, … cC>>

As an example, consider a system with 4 fuzzy clusters and 4 attributes; the given table illustrates the profile of user U1001.

Table 3.5 User profile for the given Example
Uid φ
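The fuzzy c-means membership coefficients and the modified Jensen-Shannon distance described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: the membership function follows the standard fuzzy c-means formula, the per-attribute terms of the distance are assumed to be summed over the n attributes, the natural logarithm is assumed, and all function names and numeric values are hypothetical.

```python
import math
from typing import List, Sequence


def fcm_memberships(user: Sequence[float],
                    centres: Sequence[Sequence[float]],
                    m: float = 2.0) -> List[float]:
    """Membership w_ij of one user vector in each of the C clusters
    (standard fuzzy c-means formula, fuzzifier m > 1)."""
    if m <= 1.0:
        raise ValueError("the fuzzifier m must be greater than 1")
    dists = [math.dist(user, centre) for centre in centres]
    # A user lying exactly on one or more centres belongs only to those.
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        return [1.0 / sum(zeros) if z else 0.0 for z in zeros]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]


def modified_js_distance(p1: Sequence[float],
                         p2: Sequence[float],
                         w: Sequence[float]) -> float:
    """Modified Jensen-Shannon distance between two user vectors.

    p1, p2: attribute probabilities p(a_i) of the two users.
    w:      semantic weights w(a_i).
    Assumption: the per-attribute terms are summed over all n attributes.
    """
    if not (len(p1) == len(p2) == len(w)):
        raise ValueError("all vectors must have the same length n")
    total = 0.0
    for a, b, weight in zip(p1, p2, w):
        x = 1.0 + a * weight
        y = 1.0 + b * weight
        total += x * math.log(x / y) + y * math.log(y / x)
    return total / 2.0


# Hypothetical data: 4 attributes, 2 users, 2 cluster centres.
weights = [3.0, 1.0, 2.0, 1.0]
user_a = [0.5, 0.3, 0.2, 0.0]
user_b = [0.1, 0.4, 0.4, 0.1]
distance = modified_js_distance(user_a, user_b, weights)
memberships = fcm_memberships([0.0, 0.0], [[1.0, 0.0], [-1.0, 0.0]])
```

Because every term has the form 1 + p(ai) ∗ w(ai), the arguments of the logarithm stay strictly positive even when an attribute probability is zero, so no smoothing is needed. The distance is symmetric in its two arguments and is zero for identical vectors.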
The variation of precision, recall, TNR and accuracy with the various thresholds, namely 𝛿1, 𝛿2, фUT and фLT, that were defined in section 3 is shown in the following figures:

Fig 6(c) shows the variation of precision, recall, TNR and accuracy with 𝛿1. It can be observed from the graph that precision, TNR and accuracy increase with the […]

Approach 1. Our approach using modified Jensen-Shannon distance and modified Jaccard index.
Approach 2. Using unmodified Jaccard index with Jensen-Shannon distance.
Approach 3. Using Euclidean distance with unmodified Jaccard index.
S.No.  Performance measure  Formula
1      TNR                  TN / (TN + FP)
2      Precision            TP / (TP + FP)
3      Accuracy             (TP + TN) / (TN + FP + TP + FN)
4      F1 Score             (2 ∗ Precision ∗ Recall) / (Precision + Recall)
5      PPV                  TP / (TP + FP)
6      ACC                  (TP + TN) / (TP + TN + FP + FN)
7      NPV                  TN / (TN + FN)
8      FDR                  FP / (FP + TP)
9      FOR                  FN / (TN + FN)
10     BM                   TPR + TNR − 1
11     FPR                  FP / (FP + TN)
12     FNR                  FN / (FN + TP)
13     MK                   PPV + NPV − 1
14     MCC                  (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))
Table 1 (Performance Measures)
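The measures of Table 1 all follow from the four confusion-matrix counts. The sketch below computes them in one place; the function name and the example counts are hypothetical, and it assumes every denominator is non-zero.

```python
import math
from typing import Dict


def performance_measures(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute the measures of Table 1 from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # recall
    tnr = tn / (tn + fp)
    ppv = tp / (tp + fp)  # precision
    npv = tn / (tn + fn)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "TPR": tpr,
        "TNR": tnr,
        "PPV": ppv,
        "NPV": npv,
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "F1": 2 * ppv * tpr / (ppv + tpr),
        "FDR": fp / (fp + tp),
        "FOR": fn / (tn + fn),
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
        "BM": tpr + tnr - 1,
        "MK": ppv + npv - 1,
        "MCC": (tp * tn - fp * fn) / mcc_denominator,
    }


# Hypothetical counts, not taken from the experiments.
measures = performance_measures(tp=80, tn=90, fp=10, fn=20)
```

The same formulas can be used to sanity-check reported values. With the Approach 1 figures of Table 2, BM = 0.81 + 0.96 − 1 = 0.77, MK = 0.96 + 0.83 − 1 = 0.79 and F1 = (2 ∗ 0.96 ∗ 0.81) / (0.96 + 0.81) ≈ 0.88, all of which agree with the tabulated values.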
In Table 2 we have compared the three approaches with each other.

Sensitivity Measures   Approach 1   Approach 2   Approach 3
PPV                    0.96         0.73         0.74
TPR                    0.81         0.95         1.00
ACC                    0.89         0.80         0.83
F1 Score               0.88         0.83         0.85
NPV                    0.83         0.93         1.00
FDR                    0.04         0.27         0.26
FOR                    0.17         0.07         0.00
BM                     0.77         0.60         0.65
FPR                    0.03         0.34         0.34
TNR                    0.96         0.65         0.65
FNR                    0.19         0.05         0.00
MK                     0.79         0.66         0.74
MCC                    0.78         0.63         0.70
Table 2 (Comparison of our approaches)

From the table, the following observations can be made.

If we compare Approach 1 with Approach 2, we can observe that:
- The TNR and precision of Approach 1 are a lot better than those of Approach 2.
- Approach 1 also has better accuracy than Approach 2.
- Approach 1 has much lower FPR and FDR scores than Approach 2.
- Amongst other performance measures, the MK and MCC values of Approach 1 are also better than those of Approach 2.
- Approach 2, on the other hand, has better TPR, NPV and FOR measures than Approach 1.
- Approach 1 and Approach 2 have somewhat similar F1 scores.

In measures like FPR and TNR, where Approach 1 has good performance, Approach 2 performs rather poorly. However, in measures like TPR and NPV, where Approach 2 performs better, Approach 1 also has good performance. For example, both Approach 1 and Approach 2 have similar NPV scores, with Approach 2 performing slightly better. As Approach 1 performs far better than Approach 2 in most of the measures, we can conclude that the overall performance of Approach 1 is better than that of Approach 2.

If we compare Approach 1 with Approach 3, we observe that:
- The TNR and precision of Approach 1 are a lot better than those of Approach 3.
- Approach 1 also has better accuracy than Approach 3.
- Approach 1 also has much lower FPR and FDR scores than Approach 3.
- Amongst other performance measures, the MK and MCC values of Approach 1 are also slightly better than those of Approach 3.
- Approach 3, on the other hand, has better TPR, NPV and FOR measures than Approach 1. In fact, it has the best values for these parameters in the entire table.
- Approach 1 and Approach 3 have somewhat similar F1 scores.

In measures like TNR and precision, where Approach 1 has one of the best scores in the entire table, Approach 3 performs rather poorly. Approach 3 also lags far behind in measures like FPR and FDR. On the other hand, in the measures in which Approach 3 performs better than Approach 1, Approach 1 also performs quite well. For example, in the case of NPV, both approaches have good scores, with Approach 3 performing better. Similar trends are observed for all other measures except FNR, where Approach 3 is far superior. Considering all of the above, we can say that even though Approach 3 has the best values for some performance measures, its poor performance in the other measures is a clear disadvantage, due to which Approach 1 is better overall than Approach 3.

Table 3 shows a comparison of our approaches with various other related works. If we compare our approach with other related approaches, we observe that:
- In comparison to Hu and Panda, our approach works better with respect to all the performance measures considered for the purpose of comparison.
- In comparison to the work of Mostafa et al., our approach performs better with respect to all the performance measures that are considered for comparison.
Sensitivity  Approach  Approach  Approach  Hu and   Hashemi  Mostafa  Mina Sohrabi  Majumdar       Elisa Bertino  UP Rao
Measures     1         2         3         Panda    et al.   et al.   et al.        et al. (2006)  et al.         et al. (2016)
PPV          0.96      0.73      0.74      0.88     0.97     0.94     0.93          0.88           0.94           0.61
TPR          0.81      0.95      1.00      0.73     0.71     0.75     0.66          0.70           0.91           0.70
ACC          0.89      0.80      0.83      0.81     0.84     0.85     0.80          0.80           0.93           0.64
F1 Score     0.88      0.83      0.85      0.79     0.82     0.83     0.77          0.78           0.92           0.65
NPV          0.83      0.93      1.00      0.77     0.77     0.79     0.73          0.75           0.91           0.68
FDR          0.04      0.27      0.26      0.12     0.03     0.06     0.07          0.13           0.06           0.39
FOR          0.17      0.07      0.00      0.23     0.23     0.21     0.27          0.25           0.09           0.32
BM           0.77      0.60      0.65      0.63     0.69     0.70     0.60          0.60           0.85           0.35
FPR          0.03      0.34      0.34      0.10     0.02     0.05     0.05          0.10           0.06           0.45
TNR          0.96      0.65      0.65      0.90     0.98     0.95     0.94          0.90           0.94           0.65
FNR          0.19      0.05      0.00      0.28     0.29     0.25     0.35          0.30           0.09           0.30
MK           0.79      0.66      0.74      0.65     0.74     0.73     0.66          0.63           0.85           0.29
MCC          0.78      0.63      0.70      0.63     0.72     0.71     0.63          0.61           0.85           0.29
- In comparison to the work of Hashemi et al., even though our approach scores just a little less in measures like TNR and precision, it scores a lot better with respect to the rest of the performance measures.
- If we consider the work of Mina Sohrabi et al., our approach performs better with respect to all the performance measures that are present in the table.
- In comparison to the work of Majumdar et al., our approach performs better with respect to all the performance measures that we have considered for the purpose of comparison.
- In comparison to the work of UP Rao et al., our approach performs better with respect to all the measures that are considered in the table for comparison.
- In comparison to the work of Elisa Bertino et al., our approach gives better TNR and precision scores. It also gives comparatively better FDR and FPR scores. In the other measures, except TPR (recall), both approaches have somewhat similar scores. Since our work is mostly related to finding critical data items in a dataset, higher TNR and precision scores are more desirable than the other performance measures. Since our approach performs quite well with respect to the other performance measures as well, the better TNR and precision scores can compensate for the lower recall values.

7. Analysis and Conclusion

In this paper we have tried to detect malicious transactions keeping in mind that certain data elements hold more critical information. We also take user behaviour patterns into consideration in this approach. A user who regularly behaves as a normal user will gradually improve his or her suspicion score. We then analyse the approach with respect to different parameters by conducting experiments. Finally, we conclude that the approach works efficiently in determining the nature of a transaction.
References
1. I-Yuan Lin, Xin-Mao Huang, Ming-Syan Chen, "Capturing user access patterns in
the Web for data mining", IEEE, Proceedings 11th International
Conference on Tools with Artificial Intelligence, pp 9-11 Nov. 1999
2. R.S. Sandhu, P. Samarati, "Access control: principle and practice",
IEEE Communications Magazine, Volume 32, Issue 9, Sept. 1994
3. Denning, D.E. (1987) An Intrusion Detection Model. IEEE Transactions on Software
Engineering, Vol. SE-13, 222-232.
4. Knuth, Donald E., James H. Morris, Jr, and Vaughan R. Pratt. "Fast pattern
matching in strings." SIAM journal on computing 6.2 (1977): 323-350.
5. Wang, Ke. "Anomalous Payload-Based Network Intrusion Detection". Recent
Advances in Intrusion Detection. Springer Berlin. doi:10.1007/978-3-540-30143-1_11
6. Douligeris, Christos; Serpanos, Dimitrios N. (2007-02-09). Network Security:
Current Status and Future Directions. John Wiley & Sons. ISBN 9780470099735.
7. Christina Yip Chung, Michael Gertz and Karl Levitt (2000), “DEMIDS: a misuse
detection system for database systems”, Integrity and internal control information
systems: strategic views on the need for control, Kluwer Academic Publishers,
Norwell, MA.
8. A. S. McGough, D. Wall, J. Brennan, G. Theodoropoulos, E. Ruck-Keene, B. Arief, et
al., "Insider Threats: Identifying Anomalous Human Behaviour in Heterogeneous
Systems Using Beneficial Intelligent Software (Ben-ware)," presented at the
Proceedings of the 7th ACM CCS International Workshop on Managing Insider
Security Threats, Denver, Colorado, USA, 2015.
9. S. D. Bhattacharjee, J. Yuan, Z. Jiaqi, and Y.-P. Tan, "Context-aware graph-based
analysis for detecting anomalous activities," presented at the Multimedia and Expo
(ICME), 2017 IEEE International Conference on, 2017.
10. P. A. Legg, O. Buckley, M. Goldsmith, and S. Creese, "Automated insider threat
detection system using user and role-based profile assessment," IEEE Systems
Journal, vol. 11, pp. 503-512, 2015.
11. I. Agrafiotis, A. Erola, J. Happa, M. Goldsmith, and S. Creese, "Validating an
Insider Threat Detection System: A Real Scenario Perspective," presented at the
2016 IEEE Security and Privacy Workshops (SPW), 2016.
12. T. Rashid, I. Agrafiotis, and J. R. C. Nurse, "A New Take on Detecting Insider
Threats: Exploring the Use of Hidden Markov Models," presented at the Proceedings
of the 8th ACM CCS International Workshop on Managing Insider Security Threats,
Vienna, Austria, 2016.
13. Zamanian Z., Feizollah A., Anuar N.B., Kiah L.B.M., Srikanth K., Kumar S. (2019)
User Profiling in Anomaly Detection of Authorization Logs. In: Alfred R., Lim Y.,
Ibrahim A., Anthony P. (eds) Computational Science and Technology. Lecture Notes
in Electrical Engineering, vol 481. Springer, Singapore
14. Yuqing Sun, Haoran Xu, Elisa Bertino, and Chao Sun. 2016. A Data-Driven
Evaluation for Insider Threats. Data Science and Engineering Vol. 1, 2 (2016), 73--85.
[doi>10.1007/s41019-016-0009-x]
15. S. Panigrahi, S. Sural and A. K. Majumdar, "Detection of intrusive activity in
databases by combining multiple evidences and belief update," 2009 IEEE
Symposium on Computational Intelligence in Cyber Security, Nashville, TN, 2009, pp.
83-90. doi: 10.1109/CICYBS.2009.4925094
16. Yi Hu, Brajendra Panda, "A data mining approach for database intrusion detection",
SAC '04 Proceedings of the 2004 ACM symposium on Applied computing, pp. 711-
716, doi:10.1145/967900.968048
17. Abhinav Srivastava, Shamik Sural, A. K. Majumdar, "Weighted intra-
transactional rule mining for database intrusion detection", Proceedings of the 10th
Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining, April
09-12, 2006, Singapore, doi:10.1007/11731139_71
18. TPC-C benchmark: http://www.tpc.org/tpcc/default.asp
19. Mina Sohrabi, M. M. Javidi, S. Hashemi, "Detecting intrusion transactions in
database systems: a novel approach", Journal of Intelligent Info Systems 42:619-644
DOI 10.1007 Springer 2014.
20. UP Rao et al., "Weighted Role Based Data Dependency Approach for Intrusion
Detection in Database", International Journal of Network Security, Vol.19, No.3,
pp. 358-370, May 2017 (DOI: 10.6633/IJNS.201703.19(3).05).
21. R. Agrawal, T. Imielinski, and A. Swami, "Mining Association Rules between Sets
of Items in Large Databases", in Proceedings of the 1993 ACM SIGMOD International
Conference on Management of data, 1993.
22. Sattar Hashemi, Ying Yang, Davoud Zabihzadeh and Mohammadreza Kangavari,
"Detecting intrusion transactions in databases using data item dependencies and
anomaly analysis", Expert Systems 25(5):460-473, November 2008, DOI:
10.1111/j.1468-0394.2008.00467.
23. Mostafa Doroudian, Hamid Reza Shahriari, "A Hybrid Approach for Database
Intrusion Detection at Transaction and Inter-transaction Levels", 6th Conference on
Information and Knowledge Technology (IKT 2014), May 28-30, 2014, Shahrood
University of Technology, Tehran, Iran.
24. E. Bertino, A. Kamra, E. Terzi and A. Vakali (2005), "Intrusion detection in RBAC
administered databases", in Proceedings of the Annual Computer Security
Applications Conference (ACSAC).
25. Lee, V. C.S., Stankovic, J. A., Son, S. H. "Intrusion Detection in Real-time Database
Systems Via Time Signatures". In Proceedings of the Sixth IEEE Real Time Technology
and Applications Symposium, 2000.
26. Weina Wang, Yunjie Zhang, Yi Li and Xiaona Zhang (2006), "The Global Fuzzy C-
Means Clustering Algorithm," 2006 6th World Congress on Intelligent Control and
Automation, Dalian, 2006, pp. 3604- 3607.
27. Fuglede, Bent; Topsøe, Flemming (2004). "Jensen-Shannon divergence and
Hilbert space embedding". ieeexplore.ieee.org.
28. Dunn, J. C. (1973-01-01). "A Fuzzy Relative of the ISODATA Process and Its Use in
Detecting Compact Well-Separated Clusters". Journal of Cybernetics. 3 (3): 32–57.
doi:10.1080/01969727308546046. ISSN 0022-0280.
29. A. Mangalampalli and V. Pudi (2009), "Fuzzy association rule mining algorithm for
fast and efficient performance on very large datasets," 2009 IEEE International
Conference on Fuzzy Systems, Jeju Island, 2009, pp. 1163-1168
30. Vorontsov, I.E., Kulakovskiy, I.V. & Makeev, V.J. "Jaccard index based similarity
measure to compare transcription factor binding site models". Algorithms Mol Biol
(2013) 8: 23. https://doi.org/10.1186/1748-7188-8-23