Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Department: Computer
4) Abstract:
Phishing is the most unsafe criminal exercises in cyber space.
Since most of the users go online to access the services provided by
government and financial institutions, there has been a significant
increase in phishing attacks for the past few years. Phishers started to
earn money and they are doing this as a successful business. Various
methods are used by phishers to attack the vulnerable users such as
messaging, VOIP, spoofed link and counterfeit websites. It is very easy
to create counterfeit websites, which looks like a genuine website in
terms of layout and content. Even, the content of these websites would
be identical to their legitimate websites. The reasonfor creating these
websites is to get private data from users like account numbers, login
id, passwords of debit and credit card, etc. Moreover, attackers ask
security questions to answer to posing as a high level security measure
providing to users. When users respond to those questions, they get
easily trapped into phishing attacks. Many researches have been going
on to prevent phishing attacks by different communities around the
world. Phishing attacks can be prevented by detecting the websites
and creating awareness to users to identify the phishing websites.
Machine learning algorithms have been one of the powerful techniques
in detecting phishing websites.
Keywords —machine learning, B2B sales forecasting, sales prediction,
XGBoost regressor
A. Motivation: After collecting and integrating the dataset that is acquired, the
nature of data needs to be understood. For doing so correlations, general
trends and outliers need to be identified by calculating the mathematical
statistics (mean, median, range, standard deviation, etc.) for each feature. By
doing so the features that affect the sales of a product can be shortlisted.
Visualization techniques are also used to support the needed. After the
preliminary analysis missing, duplicate and inconsistent values are checked
for and corrected. The features that can be grouped together or those that are
not needed are filtered. Dimension analysis further enhances the feature
selection approach. If this step is not performed when a lot of unrequired
features would be analysed in the further steps and might produce a major
difference in the result obtained. This makes working with the dataset easier
and faults tolerant.
.
B. Data Cleaning: the nature of the data and finding a correlation between
different After understanding features and target variable i.e. sales. The
erroneous values in the dataset need to be replaced with values that make
sense, the missing values need to be replaced with appropriate numerical or
categorical value depending on the type of feature [5]. The redundant
information in the dataset is to be removed. This fills the gaps within the
dataset and makes it wholesome, which enables better results.
C. Feature Transformation: Data cleansing gives us a wholesome error-free
dataset to work with, Feature Transformation is the family of algorithms
used to create new features from existing features, in this we use a linear
combination of two or more features to make a new feature, this new feature
gives better results with respect to target variable i.e. sales. This system also
uses categorical feature transformation to numerical feature transformation.
The redundant features are dropped from the dataset for the new ones.