
The growth of Machine Learning in cybersecurity: Evolution of processes.


Zachary Standridge

Middle Georgia State University

Zachary Standridge, School of Information Technology, Middle Georgia State University

Correspondence concerning this article should be addressed to Zachary Standridge,

School of Information Technology, Middle Georgia State University, Macon GA, 31206

Contact: zachary.standridge@mga.edu
Machine Learning cybersecurity

With the threat of system intrusion growing at a geometric rate each day, the use of
computerized systems for automated security monitoring was inevitable. With phones,
machines, and IoT devices all open to being taken over for nefarious purposes, the
signals on a network no longer come only from humans. Machine learning programs have
been developed to help systems be more efficient and safe, but how do they learn the
difference between a user and an attacker? Unfortunately, programmers are not there yet,
and the "yet" should be stressed. Machine Learning (ML) programs are just another tool in
the toolbox of network security personnel. Attackers are using ML programs to attack
systems at the same time as ML programs are protecting them. The flaw is that there will
always be someone looking for a hole in the firewalls or in the algorithms, or simply for a
way to bypass the security measures set down by the administration. As stated before, ML
is a tool. It is a tool that is very good at regression, prediction, and classification of big
data.

What exactly is Machine Learning (ML)? Machine learning is a method of data analysis that
automates analytical model building. It is a branch of AI in that the system can learn from
data, identify patterns, and make decisions with minimal human intervention (SAS, 2019).
ML is a way of applying AI so that a protective system can look at the history of what
happened and how humans reacted, interpret present threats, and act accordingly. The
science-fiction version of AI is not the goal of ML; the fantasy of Skynet taking over the
world by sending out robot Terminators is not the idea. Creators of ML are looking for ways
to reduce effort by making a machine do the largest share of the work of analyzing data.
ML is supposed to find patterns in historical data instead of having a programmer tell it
what to do in each instance. As the system continually takes in information and sees what
needs to be done, each new instance updates the existing model. That is Machine Learning.
Deep Learning (DL) is a set of techniques that ML uses to recognize patterns. Using DL,
a system can identify the lines and shapes that specific objects have. It sees the edges,
the structure, and the type, and finally it can identify the object for what it is. There is
more to DL than that, though: deep neural networks, and algorithms built on them such as
Deep Q-Learning, can be used to improve the speed and quality of recognizing these
patterns.
See Figure 1.
The types of Machine Learning vary in complexity and in use, depending on the situation
or environment into which they will be deployed. There are three groups of learning
types: Supervised, Unsupervised, and Reinforcement. These three types of ML can be
broken down further into more specific methods of learning, which include Regression,
Classification, Clustering, Association, Dimensionality reduction, and Generative models.
Regression is a task that takes past data and predicts the next value based on that data.
The data that has been taken in has to be used in some way to gain a new idea or process.
Regression has several different methods as well. Logistic Regression is used to classify
binary targets; it is used primarily to answer yes or no questions. Ordinal Logistic
Regression is used to predict an ordinal dependent variable, one whose categories have a
natural order, from one or more independent variables. It can be viewed as a
generalization of binomial logistic regression. The Ordinary Least Squares (OLS) method
is mainly used to estimate the parameters of a linear regression model. It takes the
differences between the observed values and the predicted values, then minimizes the sum
of their squares. Regularization techniques are used to prevent overfitting by penalizing
overly complex models.
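The OLS idea described above can be sketched in a few lines of Python. This is a minimal illustration for a single predictor, not a production method; the function names `fit_ols` and `predict` and the sample numbers (failed logins per hour) are assumptions made up for the example.

```python
from statistics import mean

def fit_ols(xs, ys):
    """Fit y = slope * x + intercept by minimizing the sum of squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Closed-form OLS solution for one predictor.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def predict(model, x):
    """Predict the next value from a fitted (slope, intercept) pair."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical data: failed logins counted in five consecutive hours.
hours = [1, 2, 3, 4, 5]
failed_logins = [3, 5, 7, 9, 11]
model = fit_ols(hours, failed_logins)
next_value = predict(model, 6)
```

A monitoring system could compare `next_value` with what is actually observed in the sixth hour and treat a large gap as a reason to look closer.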
Classification is used for separating different things into categories by predicting the
category to which the data belongs. This method was used in rudimentary form in the
first spam filters. Teaching the machine, under supervision, which mail was spam and
which was legitimate changes the algorithm so that in the future, mail that matches what
was sent to the spam folder is no longer even seen by the user and is sent directly to the
junk folder; thus learning has happened. In supervised classification the categories are
defined from the start and built upon as time goes by. For every event in a monitored
system there has to be a set of terms (nouns, verbs, adverbs) set down as primers to
follow in the future, for example: this IP address accessed the company’s asset from this
location at this time. A model built this way works on a scoring system and, like others,
builds upon the past to become a better filter over time with access to new and old data.
Using the history of activities in a log, the machine can correlate old data and compare it
to ongoing and in-process activities before damage is done. Machine learning methods
for this type of filter include Logistic Regression (LR), K-Nearest Neighbor (K-NN),
Support Vector Machine (SVM), Kernel SVM, and Naive Bayes. Logistic Regression and
K-Nearest Neighbor are models that come from the field of statistics, as most do. A
Support Vector Machine represents the data as points in space and finds a boundary that
separates the classes with a clear gap between them. When new data comes in, the points
are mapped into the same space so that a prediction can be made from the side of the
boundary on which they fall. The problem with this algorithm is that unless the training
data has been labelled by a human, the machine cannot properly categorize it, rendering
it useless. The most widely accepted algorithms that work for both classification and
regression are Decision Tree and Random Forest. These algorithms are formed by setting
one decision down and recording the historical decisions made from the base options.
The longer the list of options, the more detailed the final process can be. The problem
with this type of algorithm is that, with time, the choices become too specific to the
training data (overfitting) and new data falls outside them, creating a generalization
error. Deep learning methods have been shown to work best when the data pool
is considerably larger. The downside is that DL methods are notorious for consuming
more resources than classical ML methods and algorithms. In production, retraining DL
models is also going to take more time. Artificial Neural Networks and Convolutional
Neural Networks are based on biological thinking processes. They bring together many
different methods and algorithms to classify the data given to them. These models start
from simple building blocks called artificial neurons. These neurons receive input, change
their internal state based on that input, then send out data based on that state.
Connected together, the neurons form a weighted graph that is used for mapping future
data. The weights and the functions applied in the process are modified according to the
learning rules set down when the network was programmed. Deep learning networks work
better when more data is available because of this adaptation with each data point.
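As an illustration of supervised classification, the sketch below uses K-Nearest Neighbor, one of the methods listed above, to label an access event as coming from a user or an attacker. The two features (hour of access, number of failed attempts), the labels, and the function name `knn_predict` are hypothetical choices for the example, not something taken from a real system.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest labelled neighbours."""
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical labelled history: (hour of access, failed attempts) -> label.
training_events = [
    ((9, 0), "user"), ((10, 1), "user"), ((14, 0), "user"), ((16, 1), "user"),
    ((2, 8), "attacker"), ((3, 12), "attacker"), ((1, 9), "attacker"),
]

label_day = knn_predict(training_events, (11, 0))     # daytime, no failures
label_night = knn_predict(training_events, (2, 10))   # night, many failures
```

The labels in `training_events` are the human-supplied names the paragraph refers to: without them the classifier has nothing to vote with.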
Similar to classification, clustering uses the same methods to create models, but it is
used when the classes of the incoming data are unknown. This type of method is best used
in investigative fields such as finance, health care, and data science. It uses
unsupervised learning to find the data points that fall outside the range of the others
without apparent cause. Because clustering is adept at grouping data points with similar
aspects together, it is best for seeing where the data belongs within a particular group
and for solving a particular problem without outside influence. Algorithm models for ML
clustering include K-Means, Mixture Model (LDA), DBSCAN, Bayesian, Gaussian Mixture
Model, and Agglomerative models. The K-means model originated in signal processing and
is widely used in data mining. The model assigns each observation to the cluster with the
closest mean and then uses the recomputed means as the prototypes for the next round
of assignments. LDA, DBSCAN, and Bayesian models are used to define the parameters of
the model data. They take in regularly distributed data points from within a population
and differentiate them into subgroups. Hierarchical clustering takes in data using either
an Agglomerative (bottom-up) or a Divisive (top-down) approach to cluster the data into
hierarchical groups. To cluster data with Deep Learning, a Self-Organizing Map (SOM),
also called a Kohonen network, is used. A SOM takes in high-dimensional data and outputs
a low-dimensional view for easier use. These networks were designed for classifying data
without any supervision. They compare each input vector with the weight vectors of the
map and move the closest weight vectors toward the input.
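The K-means loop described above can be sketched as follows. This is a bare-bones version under stated assumptions: the starting centroids are passed in by hand so the result is reproducible (real implementations usually choose them randomly), and the sample points and the name `kmeans` are made up for the example.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute the means."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster; keep the old one if empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical two-feature observations forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
```

No labels are supplied anywhere: the grouping comes only from the distances between the points.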
Association Rule Learning is very important and widely used in market basket analysis.
This concept sorts seemingly random data points into smaller groups of similar items
based on their predetermined use. Think of the concept as a shopping center. All the data
is under one roof (the store), and the data can then be separated into aisles such as the
baking aisle, the cereal aisle, and the canned foods aisle. Putting these different foods
into groups based on their use not only reduces shopping time but can also be used to
raise the shopper’s interest in a similar item. Machine Learning algorithm models for this
kind of grouping include Apriori and FP-Growth. Apriori uses prior knowledge of which
data points are frequently associated together, then uses Boolean association rules to
group the points into overlapping circles of knowledge. FP-Growth (Frequent Pattern
Growth) calculates the frequency of each item, then uses a tree structure to encode the
data without generating candidate item sets. The result can be read as: if you buy bread
and milk, you usually buy butter. The DL networks for this type of grouping include the
Restricted Boltzmann Machine (RBM) and the Deep Belief Network (DBN). An RBM consists
of two different sets of variables, hidden and visible. For this machine, the visible
variables have to be kept separate from the hidden ones. Building on the RBM, when
multiple machines are stacked together, a Deep Belief Network is created. These RBM
stacks all use the hidden data inferred from the visible points in each RBM to find
associations within the points.
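The bread, milk, and butter rule can be made concrete with the two measures that association rule learning relies on, support and confidence. The sketch below computes only those measures; it is not an implementation of Apriori or FP-Growth, and the baskets and function names are invented for the example.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated share of baskets with `antecedent` that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk", "butter"},
    {"bread", "milk", "butter"},
    {"bread", "milk"},
    {"bread", "eggs"},
    {"milk", "cereal"},
]

rule_support = support(baskets, {"bread", "milk", "butter"})
rule_confidence = confidence(baskets, {"bread", "milk"}, {"butter"})
```

Here two of the three baskets that contain bread and milk also contain butter, which is the kind of evidence behind a rule such as "bread and milk usually means butter".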
When dealing with overly complex systems, dimensionality reduction is needed. Many
times unlabeled data with too many features makes clustering impossible, since regular
models require the algorithm to restrict features or simply do not work. Dimensionality
reduction reduces the number of features the model has to handle. This type of learning
is usually used in conjunction with other models to streamline the process of organizing
data. It is most commonly used in phones and other monitoring devices for facial
detection and other cybersecurity purposes.

Machine learning dimensionality reduction

• Principal Component Analysis (PCA)
• Singular-value decomposition (SVD)
• Linear Discriminant Analysis (LDA)
• Latent Semantic Analysis (LSA)
• Factor Analysis (FA)
• Independent Component Analysis (ICA)
• Non-negative Matrix Factorization (NMF)

• Dimensionality reduction, or generalization: a task of searching for the common and
most important features in multiple examples.
• Generative models: a task of creating something based on previous knowledge of the
distribution.
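As a rough sketch of what dimensionality reduction does, the code below finds the first principal component of a small data set using power iteration on the covariance matrix, then projects a two-feature sample onto that single direction. It is a simplified stand-in for PCA written without external libraries; the sample rows and function names are assumptions for illustration only.

```python
from math import sqrt

def first_principal_component(rows, iterations=100):
    """Return (feature means, unit vector of the direction with the largest variance)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        # Power iteration: repeatedly multiply by the covariance matrix and normalize.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def project(row, means, component):
    """Reduce a sample to a single coordinate along the principal component."""
    return sum((x - m) * c for x, m, c in zip(row, means, component))

# Hypothetical samples whose two features move together.
samples = [(1, 2), (2, 4), (3, 6), (4, 8)]
means, component = first_principal_component(samples)
reduced = project((4, 8), means, component)
```

Because the two features here carry the same information, one coordinate per sample is enough to describe the data, which is the point of the reduction.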

Approaches to Solving ML Tasks

Past ways

• Supervised learning. Task-driven approach. First of all, you label the data, for example
by feeding a model examples of executable files and saying whether each file is malware
or not. Based on this labelled data, the model can make decisions about new data. The
disadvantage is the limited supply of labelled data.
• Ensemble learning. This is an extension of supervised learning that mixes different
simple models to solve the task. There are different methods of combining simple models.

Current ways

• Unsupervised learning. Data-driven approach. The approach can be used when there is
no labelled data and the model should somehow mark it by itself based on its properties.
Usually it is intended to find anomalies in data, and it is considered more powerful in
general, as it is almost impossible to label all data. Currently it works less precisely than
supervised approaches.
• Semi-supervised learning. As the name implies, semi-supervised learning tries to
combine the benefits of both supervised and unsupervised approaches when there is some
labelled data.

Possible future trends (well, probably)

• Reinforcement learning. Environment-driven approach. It can be used when the
behavior should somehow react to a changing environment. It is like a kid who learns
about the environment by trial and error.
• Active learning. It is more like a subclass of reinforcement learning that will probably
grow into a separate class. Active learning resembles a teacher who can help correct
errors and behavior in addition to environment changes.

Machine Learning tasks and Cybersecurity

Regression

Classification

Clustering

Dimensionality Reduction

Generative Models

The task of generative models differs from the above-mentioned ones. While those tasks
deal with existing information and the associated decisions, generative models are
designed to simulate the actual data (not decisions) based on previous decisions.

A simple task in offensive cybersecurity is to generate a list of input parameters to test
a particular application for injection vulnerabilities.

Alternatively, you can have a vulnerability scanning tool for web applications. One of its
modules tests files for unauthorized access. These tests are able to mutate existing
filenames to identify new ones. For example, if a crawler detects a file called login.php,
it is worth checking for any backup or test copies by trying names like login_1.php,
login_backup.php, and login.php.2017. Generative models are good at this.
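To show the kind of output such a module produces, here is a small hand-written filename mutator. It is a rule-based sketch, not a trained generative model: a generative model would learn patterns like these from observed filenames instead of having them listed by hand. The function name and the suffix and extension lists are assumptions for the example.

```python
from pathlib import PurePosixPath

def mutate_filename(name, suffixes=("_1", "_backup", "_old"), extensions=(".bak", ".2017")):
    """Return candidate backup or test copies of `name` built from fixed rules."""
    path = PurePosixPath(name)
    stem, ext = path.stem, path.suffix
    # Insert a marker before the extension: login.php -> login_backup.php
    candidates = [f"{stem}{s}{ext}" for s in suffixes]
    # Append an extra extension after the full name: login.php -> login.php.bak
    candidates += [f"{name}{e}" for e in extensions]
    return candidates

candidates = mutate_filename("login.php")
```

A scanner would then request each name in `candidates` and record which ones exist.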

Machine learning generative models

• Markov Chains
• Genetic algorithms

Deep learning generative models

• Variational Autoencoders
• Generative adversarial networks (GANs)
• Boltzmann Machines

Recently, GANs have shown impressive results. They can successfully mimic a video.
Imagine how this could be used to generate examples for fuzzing.

Cybersecurity Tasks and Machine Learning

Instead of looking at ML tasks and trying to apply them to cybersecurity, let’s look at
the common cybersecurity tasks and the machine learning opportunities. There are three
dimensions (Why, What, and How).

The first dimension is a goal, or a task (e.g., detect threats, predict attacks, etc.).
According to Gartner’s PPDR model, all security tasks can be divided into five categories:

• prediction;
• prevention;
• detection;
• response;
• monitoring.

The second dimension is a technical layer and an answer to the “What” question (e.g., at
which level to monitor issues). Here is the list of layers for this dimension:

• network (network traffic analysis and intrusion detection);
• endpoint (anti-malware);
• application (WAF or database firewalls);
• user (UBA);
• process (anti-fraud).

Each layer has different subcategories. For example, network security can be wired,
wireless, or cloud. Rest assured that you can’t apply the same algorithms with the same
hyperparameters to all three areas, at least in the near future. The reason is the lack of
data and of algorithms that capture the dependencies among the three areas well enough
for one algorithm to be adapted to the others.

The third dimension is a question of “How” (e.g., how to check the security of a
particular area):

• in transit in real time;
• at rest;
• historically;
• etc.

For example, if you are working on endpoint protection and looking for intrusions, you
can monitor the processes of an executable file, do static binary analysis, analyze the
history of actions on this endpoint, etc.

Some tasks should be solved in all three dimensions. Sometimes, there are no values in
some dimensions for certain tasks. Approaches can be the same in one dimension.
Nonetheless, each particular point of this three-dimensional space of cybersecurity
tasks has its intricacies.

It’s difficult to detail them all, so let’s focus on the most important dimension:
technology layers. Look at cybersecurity solutions from this perspective.

Machine learning for Network Protection

Network protection is not a single area but a set of different solutions that focus on a
protocol such as Ethernet, wireless, SCADA, or even virtual networks like SDNs.

Network protection refers to well-known Intrusion Detection System (IDS) solutions.
Some of them used a kind of ML years ago and mostly dealt with signature-based
approaches.

ML in network security implies new solutions called Network Traffic Analytics (NTA),
aimed at in-depth analysis of all the traffic at each layer to detect attacks and anomalies.

How can ML help here? Here are some examples:

• regression to predict the network packet parameters and compare them with the
normal ones;
• classification to identify different classes of network attacks such as scanning and
spoofing;
• clustering for forensic analysis.
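The "compare them with the normal ones" step can be illustrated with a simple statistical baseline. The sketch below is not a trained regression model; it only flags a packet size that lies far from the historical mean, measured in standard deviations. The threshold of 3, the sample sizes, and the function name are assumptions for the example.

```python
from statistics import mean, stdev

def is_anomalous(history, observed, threshold=3.0):
    """Flag `observed` if it is more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        # All historical values are identical, so any different value stands out.
        return observed != mu
    return abs(observed - mu) / sigma > threshold

# Hypothetical packet sizes (bytes) seen on a link during normal operation.
packet_sizes = [500, 520, 480, 510, 490, 505, 495]

normal = is_anomalous(packet_sizes, 515)
suspicious = is_anomalous(packet_sizes, 1500)
```

A regression-based detector would replace the fixed mean with a predicted value for each packet, but the comparison step would look much the same.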

Machine learning for Endpoint Protection

The new generation of anti-virus products is Endpoint Detection and Response. It’s
better to learn features in executable files or in process behavior. Keep in mind that if
you deal with machine learning at the endpoint layer, your solution may differ depending
on the type of endpoint (e.g., workstation, server, container, cloud instance, mobile,
PLC, IoT device). Every endpoint has its own specifics, but the tasks are common:

• regression to predict the next system call for an executable process and compare it
with the real ones;
• classification to divide programs into such categories as malware, spyware, and
ransomware;
• clustering for malware protection on secure email gateways (e.g., to separate legal
file attachments from outliers).

Academic papers about endpoint protection, and malware specifically, are gaining
popularity. Here are a few examples:

• Malware Detection by Eating a Whole EXE
• Deep learning at the shallow end: Malware classification for non-domain experts
• TESSERACT: Eliminating Experimental Bias in Malware Classification across Space
and Time
Machine learning for Application Security

Application security is my favourite area, by the way, especially ERP security.

Where can ML be used in app security? In WAFs or in code analysis, both static and
dynamic. As a reminder, application security can differ. There are web applications,
databases, ERP systems, SaaS applications, microservices, etc. It’s almost impossible to
build a universal ML model to deal with all threats effectively in the near future.
However, you can try to solve some of the tasks.

Here are examples of what you can do with machine learning for application security:

• regression to detect anomalies in HTTP requests (for example, XXE and SSRF attacks
and auth bypass);
• classification to detect known types of attacks like injections (SQLi, XSS, RCE, etc.);
• clustering user activity to detect DDoS attacks and mass exploitation.
Machine learning for User Behavior

This started as Security Information and Event Management (SIEM).

SIEM was able to solve numerous tasks if configured properly, including user behavior
search and ML. Then the UEBA solutions declared that SIEM couldn’t handle new, more
advanced types of attacks and constant behavior change.

The market has accepted the point that a special solution is required if the threats are
regarded from the user level.

However, even UEBA tools don’t cover everything connected with different user
behavior. There are domain users, application users, SaaS users, social networks,
messengers, and other accounts that should be monitored.

Unlike malware detection, which focuses on common attacks and offers the possibility of
training a classifier, user behavior is one of the most complex layers and an unsupervised
learning problem. As a rule, there is no labelled dataset, nor any idea of what to look for.
Therefore, creating a universal algorithm for all types of users is tricky in the user
behavior area. Here are the tasks that companies solve with the help of ML:

• regression to detect anomalies in user actions (e.g., login at an unusual time);
• classification to group different users for peer-group analysis;
• clustering to separate groups of users and detect outliers.
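The "login at an unusual time" example needs no labelled data, only the user's own history. The sketch below is a deliberately simple frequency check rather than a trained model: it flags a login hour that accounts for a very small share of the user's past logins. The 5% cut-off, the sample history, and the function name are assumptions for the example.

```python
from collections import Counter

def unusual_login(past_hours, hour, min_share=0.05):
    """Return True if `hour` accounts for less than `min_share` of the user's past logins."""
    counts = Counter(past_hours)
    return counts[hour] / len(past_hours) < min_share

# Hypothetical login hours (24-hour clock) recorded for one user.
history = [9, 9, 10, 8, 9, 10, 17, 9, 8, 10]

at_nine = unusual_login(history, 9)    # a habitual hour for this user
at_three = unusual_login(history, 3)   # an hour never seen before
```

Keeping one history per user is what makes the check personal: three in the morning may be normal for a night-shift operator and unusual for everyone else.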
Machine learning for Process Behavior

The process area is last but not least. While dealing with it, it’s necessary to know the
business process in order to find something anomalous. Business processes can differ
significantly. You can look for fraud in a banking or retail system, or on a plant floor in
manufacturing. The two are totally different, and they demand a lot of domain
knowledge. In machine learning, feature engineering (the way you represent data to your
algorithm) is essential to achieving results. Likewise, features are different in every
process.

In general, here are examples of tasks in the process area:

• regression to predict the next user action and detect outliers such as credit card
fraud;
• classification to detect known types of fraud;
• clustering to compare business processes and detect outliers.

Conclusion

There are more areas left; I have outlined the basics. On the one hand, machine learning
is definitely not a silver-bullet solution if you want to protect your systems.
Undoubtedly, there are many issues with interpretability (particularly for deep learning
algorithms), but humans also cannot interpret their own decisions, right?

On the other hand, with the growing amount of data and the decreasing number of
experts, ML is the only remedy. It works now and will be mandatory soon. It is better to
start right now.

Keep in mind that hackers are also starting to use ML in their attacks. My next article
will reveal exactly how attackers can utilize ML.
Figure 1. [Diagram not recovered. Surviving labels: Data, Features, data science, data
mining, Classical Programming, ML, Algorithms]
Bibliography

SAS. (2019). Retrieved March 29, 2019, from
https://www.sas.com/en_us/insights/analytics/machine-learning.html

Juvonen, Antti, & Tuomo. (2014, October 28). Anomaly Detection Framework Using
Rule Extraction for Efficient Intrusion Detection. Retrieved from
https://arxiv.org/abs/1410.7709v1

Loganathan, G., Samarabandu, J., & Wang, X. (2018). Sequence to Sequence Pattern
Learning Algorithm for Real-Time Anomaly Detection in Network Traffic. 2018 IEEE
Canadian Conference on Electrical & Computer Engineering (CCECE).
doi:10.1109/ccece.2018.8447597

Obst, O. (2013). Distributed Fault Detection in Sensor Networks using a Recurrent
Neural Network. Neural Processing Letters, 40(3), 261-273.
doi:10.1007/s11063-013-9327-4

Ogino, T. (2015). Evaluation of Machine Learning Method for Intrusion Detection
System on Jubatus. International Journal of Machine Learning and Computing, 5(2),
137-141. doi:10.7763/ijmlc.2015.v5.497

Qayyum, A., Islam, M., & Jamil, M. (n.d.). Taxonomy of statistical based anomaly
detection techniques for intrusion detection. Proceedings of the IEEE Symposium on
Emerging Technologies, 2005. doi:10.1109/icet.2005.1558893

Shah, S. A., & Issac, B. (2018). Performance comparison of intrusion detection systems
and application of machine learning to Snort system. Future Generation Computer
Systems, 80, 157-170. doi:10.1016/j.future.2017.10.016

Thi, N. N., Cao, V. L., & Le-Khac, N. (2017). One-Class Collective Anomaly Detection
Based on LSTM-RNNs. Transactions on Large-Scale Data- and Knowledge-Centered
Systems XXXVI, Lecture Notes in Computer Science, 73-85.
doi:10.1007/978-3-662-56266-6_4

Traffic Anomaly Detection. (2016). Network Security, 2016(6), 4.
doi:10.1016/s1353-4858(16)30055-1

Vinayakumar, R., Soman, K. P., & Poornachandran, P. (2017). Evaluating effectiveness
of shallow and deep networks to intrusion detection system. 2017 International
Conference on Advances in Computing, Communications and Informatics (ICACCI).
doi:10.1109/icacci.2017.8126018

Zamani, M., & Movahedi, M. (2015, May 09). Machine Learning Techniques for Intrusion
Detection. Retrieved from https://arxiv.org/abs/1312.2177v2
