Abstract: The Central Statistics Agency (BPS) is a government agency that works in the field of household economic and social needs. Every two years BPS conducts Susenas (the National Socio-Economic Survey) to estimate poverty levels in Indonesia, and every year BPS is tasked with providing information on the economic and social condition of the community. Many methods are now available for predicting poverty levels. One of them draws on the rapid development of e-commerce in Indonesia, whose data can indicate the current level of poverty. Therefore, the authors built an application to complement BPS in predicting poverty levels in an area: a poverty rate prediction application based on e-commerce data that uses the K-Nearest Neighbor method and Information Theoretical Based feature selection. The application was built following the waterfall model, using the Python programming language and the MySQL database. It is expected to complement the BPS Census and Susenas in predicting poverty levels in an area.
Keywords: Central Statistics Agency (BPS), waterfall, Python, MySQL, K-Nearest Neighbor, Information Theoretical Based.
in Indonesia in the last 10 years increased by 17 percent, with the total number of e-commerce businesses reaching 26.2 million units. Over the past 4 years, the growth of e-commerce in Indonesia has reached 500 percent. Besides these figures, the potential of the e-commerce industry in Indonesia is also influenced by online shopping habits, such as those of the millennial generation [3].

From the problems above, a solution to the high cost and long duration of the National Socio-Economic Survey (Susenas) is to use the kNN (K-Nearest Neighbor) machine learning method with Information Theoretical Based feature selection. Machine learning is used specifically to predict poverty levels in an area from the e-commerce data generated by Indonesians. E-commerce data is well suited to predicting poverty rates because it reflects the average history of purchases of goods or houses in an area. The e-commerce data was obtained from Pulse Lab Jakarta - United Nations Global Pulse, using a dataset taken from the OLX e-commerce platform (olx.com). Applying machine learning to e-commerce data can make poverty prediction more accurate and less time-consuming, and can complement a census in predicting poverty levels in an area.

1.2 Formulations of Problem

Based on the background above, the existing problems are:

1. How can the results of the survey and census in an area in Indonesia be complemented, based on e-commerce data, without requiring much time and money?

2. How can the results of poverty data prediction based on e-commerce data be presented with the application of the K-Nearest Neighbor and Information Theoretical Based methods?

2. Building an application that displays graphs of the poverty-level prediction results based on e-commerce data, using the K-Nearest Neighbor machine learning method and Information Theoretical Based statistics.

3. Using a feature selection algorithm, namely Information Theoretical Based.

2. Related Work
2.1 Poverty

Table 1 This table shows other methods for predicting an area's poverty level

| No | Dataset | Method | Country | Result | References |
|---|---|---|---|---|---|
| 1 | Landsat 7 satellite, 2000-2010 | CNN to predict sunlight intensity classes (0, 1, or 2) | Malawi, Nigeria, Rwanda, Tanzania, and Uganda | VGG-F model and GBT2 model | A. Perez, C. Yeh, G. Azzari, M. Burke, D. Lobell, and S. Ermon, "Poverty Prediction with Public Landsat 7 Satellite Imagery and Machine Learning," no. Nips, 2017. |
| 2 | Cell phone records, 2010 | Calculate a vector of features for each BTS (Base Transceiver Station) | Latin America | Voronoi tessellation | V. Soto, V. Frias-Martinez, J. Virseda, and E. Frias-Martinez, "Prediction of socioeconomic levels using cell phone records," Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 6787 LNCS, no. 1, pp. 377–388, 2011. |
| | | | , United States, Canada, and Czech | | |
2.2 K-Nearest Neighbor
Table 2 This table shows predictions for various problems using the K-Nearest Neighbor method

No | Name and Year | Problems | Method | Result

Figure 1 Shows the performance of the proposed method. This shows the Pre-Processing

In the figure above, several processes take place before data processing in the e-commerce dataset-based application for predicting poverty levels using the K-Nearest Neighbor method and the Information Theoretical Based algorithm.

1. Data Process
13 sum_price_motor 63 std_price_apt_rent
14 avg_price_motor 64 sum_sold_apt_rent
15 std_price_motor 65 avg_sold_apt_rent
16 sum_sold_motor 66 std_sold_apt_rent
17 avg_sold_motor 67 sum_viewer_apt_rent
18 std_sold_motor 68 avg_viewer_apt_rent
19 sum_viewer_motor 69 std_viewer_apt_rent
20 avg_viewer_motor 70 sum_buyer_apt_rent
21 std_viewer_motor 71 avg_buyer_apt_rent
22 sum_buyer_motor 72 std_buyer_apt_rent
23 avg_buyer_motor 73 sum_price_land_sell
24 std_buyer_motor 74 avg_price_land_sell
25 sum_price_rumah_sell 75 std_price_land_sell
26 avg_price_rumah_sell 76 sum_sold_land_sell
27 std_price_rumah_sell 77 avg_sold_land_sell
28 sum_sold_rumah_sell 78 std_sold_land_sell
29 avg_sold_rumah_sell 79 sum_viewer_land_sell
30 std_sold_rumah_sell 80 avg_viewer_land_sell
31 sum_viewer_rumah_sell 81 std_viewer_land_sell
32 avg_viewer_rumah_sell 82 sum_buyer_land_sell
33 std_viewer_rumah_sell 83 avg_buyer_land_sell
34 sum_buyer_rumah_sell 84 std_buyer_land_sell
35 avg_buyer_rumah_sell 85 sum_price_land_rent
36 std_buyer_rumah_sell 86 avg_price_land_rent
37 sum_price_rumah_rent 87 std_price_land_rent
38 avg_price_rumah_rent 88 sum_sold_land_rent
39 std_price_rumah_rent 89 avg_sold_land_rent
40 sum_sold_rumah_rent 90 std_sold_land_rent
41 avg_sold_rumah_rent 91 sum_viewer_land_rent
42 std_sold_rumah_rent 92 avg_viewer_land_rent
43 sum_viewer_rumah_rent 93 std_viewer_land_rent
44 avg_viewer_rumah_rent 94 sum_buyer_land_rent
45 std_viewer_rumah_rent 95 avg_buyer_land_rent
46 sum_buyer_rumah_rent 96 std_buyer_land_rent
47 avg_buyer_rumah_rent
48 std_buyer_rumah_rent
49 sum_price_apt_sell
50 avg_price_apt_sell
51 std_price_apt_sell
52 sum_sold_apt_sell
53 avg_sold_apt_sell
54 std_sold_apt_sell
55 sum_viewer_apt_sell
56 avg_viewer_apt_sell
57 std_viewer_apt_sell
58 sum_buyer_apt_sell
59 avg_buyer_apt_sell
60 std_buyer_apt_sell
61 sum_price_apt_rent
62 avg_price_apt_rent

From the features contained in the e-commerce dataset above, feature selection is performed to find the features that yield the highest prediction accuracy.

2. Normalization

After data processing, the next step is data normalization, in which the selected data are scaled to a common range of values. This normalization uses the Rescaling method (min-max normalization). The basic formula of the Rescaling method is:

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$
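As an illustration of the Rescaling (min-max) step, the following is a minimal sketch rather than the application's actual code. The function name `min_max_rescale` and its `new_min`/`new_max` parameters are assumptions introduced here; the target range is left as a parameter because the choice of range is a design decision.

```python
def min_max_rescale(values, new_min=0.0, new_max=1.0):
    """Rescale a list of numbers linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every value to new_min
        # instead of dividing by zero (a choice made for this sketch).
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


# Hypothetical values for one feature column, e.g. avg_price_motor.
column = [10, 20, 30]
print(min_max_rescale(column))
print(min_max_rescale(column, new_min=0, new_max=10))
```

Rescaling each feature separately keeps features with large raw magnitudes, such as summed prices, from dominating the distance computation used by K-Nearest Neighbor.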
After feature selection has identified the relevant features, the data are passed to the machine learning model. The model provided is k-nearest neighbor (KNN), which uses the following distance:

$$d(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}$$

where d(x, y) is read as the distance between x and y. The idea of this formula comes from the Pythagorean formula:

$$c = \sqrt{a^2 + b^2}$$

At the data pre-processing stage, data cleaning is carried out to eliminate ambiguous data that is not in line with expectations, noisy data such as values of -2, and inconsistent data, all of which can hinder the next process. This process also changes null values to 0.

2. Data Normalization Stage
After the data have been cleaned and null values changed to 0, the data normalization process follows. This process rescales all data to the range 0-10.

4.2 Implementation of Feature Selection
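Information-theoretic feature selection criteria such as MRMR build on mutual information between features and the target. The following is a minimal sketch of a greedy MRMR-style ranking (relevance minus mean redundancy), not the application's implementation. It assumes the features and the poverty target have already been discretized into bins, which the text does not describe; the function names and the toy values are invented for illustration, and only the column names come from the feature list.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        # p(x,y) * log(p(x,y) / (p(x) * p(y))), expressed with raw counts
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi


def mrmr_rank(features, target, k):
    """Greedily pick k feature names by relevance minus mean redundancy."""
    remaining = list(features)
    selected = []
    while remaining and len(selected) < k:
        def score(name):
            relevance = mutual_information(features[name], target)
            if not selected:
                return relevance
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy, already-binned data (hypothetical values).
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "avg_price_motor": [0, 0, 0, 1, 1, 1, 1, 1],
    "sum_price_motor": [0, 0, 0, 1, 1, 1, 1, 1],  # duplicate of the column above
    "avg_viewer_motor": [0, 1, 0, 1, 0, 1, 1, 1],
}
print(mrmr_rank(features, target, 2))
```

In this toy example the duplicated column is penalized by the redundancy term, so the second pick is the less relevant but non-redundant feature. DISR and CIFE use different scoring terms and are not shown here.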
Figure 4 Shows the performance of the proposed method. This shows the feature ranking result of DISR feature selection

Figure 6 Shows the performance of the proposed method. This shows the RMSE and RSquare of CIFE Feature Selection
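The RMSE and RSquare values named in these captions, together with the Euclidean distance used by KNN, can be sketched as follows. This is an illustrative example under stated assumptions, not the application's code: it assumes KNN is used as a regressor that averages the target values of the k nearest training points, and the function names and toy data are hypothetical.

```python
import math


def euclidean(x, y):
    """Euclidean distance d(x, y) between two equal-length feature vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k training points closest to the query."""
    neighbors = sorted(
        zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query)
    )[:k]
    return sum(value for _, value in neighbors) / len(neighbors)


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Toy data: one normalized feature per area and a made-up poverty rate.
train_X = [[0.0], [1.0], [2.0], [10.0]]
train_y = [1.0, 2.0, 3.0, 20.0]
test_X = [[1.0], [9.0]]
test_y = [2.0, 18.0]

predictions = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
print(predictions, rmse(test_y, predictions), r_squared(test_y, predictions))
```

A lower RMSE and an RSquare closer to 1 indicate predictions that track the reference poverty figures more closely, which is how the feature selection variants can be compared.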
Figure 9 Shows the performance of the proposed method. This shows the RMSE and RSquare graph of MRMR Feature Selection

3. Prediction Testing with DISR Feature Selection

The following is an accuracy test using R² and RMSE on KNN machine learning with DISR feature selection.

5. Conclusion
After carrying out the stages of application development with the chosen method (waterfall), namely needs analysis, design, system design, program code implementation, and testing, for the Poverty Prediction Application based on E-Commerce Data Using the KNN Method and Information Theoretical Based Algorithms, the following conclusions can be drawn:
1. This application meets the needs of users in poverty prediction using the KNN method and the Information Theoretical Based algorithm.
2. This application is able to display a graph of the prediction results and of the accuracy of the e-commerce-based poverty data.

References
[1] B. P. Statistik, "Badan Pusat Statistik," [Online]. Available: https://www.bps.go.id/subject/23/kemiskinan-dan-ketimpangan.html. [Accessed 29 September 2019].
B. P. Statistik, "