
Research Topics

Open High Quality E-Commerce Dataset


One of the major obstacles to advancing research in many fields is the limited
availability of standard datasets. With millions of visits daily and petabytes
of data, Bukalapak can serve as a data provider for many research problems.
However, standard datasets must be of high quality, carefully collected, and
well organized. Some of them may also have to be annotated or labeled before
others can use them.

Dataset: search keywords, product summaries and descriptions, product images,
complaints and resolutions, product reviews, chats, transactions, etc.


Developing Bukalapak Annotation Platform
Today, the majority of successful machine learning solutions are based on
supervised learning. However, its reliance on labeled data has hindered
many from making significant progress. We aim to build a platform for
high-quality data annotation (labeling) to help researchers and
practitioners collect training data for supervised learning.

Dataset: product images, search keywords, product descriptions and titles


Training Set Expansion
Some problems may have only a small amount of training data. Can we still
train a model in such a situation? One approach is to expand the training
set automatically from a few high-confidence seeds. We are looking into
applying this method to machine learning problems at Bukalapak.

Dataset: product description, images, reviews
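
One way to picture seed-based expansion is a self-training loop: train on the seeds, label the unlabeled pool, keep only predictions above a confidence threshold, and repeat. The sketch below is illustrative only. The bag-of-words scorer, the function names, and the threshold value are assumptions made for this example, not a method described in these slides.

```python
from collections import Counter


def train_centroids(labeled):
    """Build one bag-of-words Counter per label from (text, label) pairs."""
    centroids = {}
    for text, label in labeled:
        centroids.setdefault(label, Counter()).update(text.lower().split())
    return centroids


def predict(centroids, text):
    """Return (label, confidence).

    Confidence is the winning label's share of all matched word counts
    (a toy scorer, assumed here only to keep the sketch self-contained).
    """
    tokens = text.lower().split()
    scores = {label: sum(bag[t] for t in tokens) for label, bag in centroids.items()}
    total = sum(scores.values())
    if total == 0:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / total


def expand_training_set(seeds, unlabeled, threshold=0.8, max_rounds=5):
    """Grow the labeled set from seeds by accepting confident predictions."""
    labeled = list(seeds)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        centroids = train_centroids(labeled)
        accepted, remaining = [], []
        for text in pool:
            label, confidence = predict(centroids, text)
            if label is not None and confidence >= threshold:
                accepted.append((text, label))
            else:
                remaining.append(text)
        if not accepted:  # nothing new was confident enough; stop early
            break
        labeled.extend(accepted)
        pool = remaining
    return labeled, pool
```

Items that never reach the threshold stay in the returned pool, which makes them natural candidates for manual annotation.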


Automatic Tagging for Bukalapak Blogs
Bukalapak has published a large amount of content on its blog. Typically,
each post has several tags that associate it with related products or
entities. With this research, we want to automatically infer relevant tags
from the information in the text.

Dataset: blog entries, product tags
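
As a simple baseline for tag inference, known tags can be ranked by their TF-IDF weight inside a post. This is a minimal sketch under stated assumptions: tags are single lowercase words, and `suggest_tags`, `tokenize`, and the smoothing formula are hypothetical choices for illustration rather than an existing Bukalapak component.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def suggest_tags(post, corpus, tag_vocabulary, top_k=3):
    """Rank known tags by the TF-IDF weight of the tag word inside `post`."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenize(doc)))  # count each word once per document

    term_freq = Counter(tokenize(post))
    scores = {}
    for tag in tag_vocabulary:
        tf = term_freq[tag]
        if tf == 0:
            continue  # tag word does not occur in the post
        # Smoothed inverse document frequency; rarer words weigh more.
        idf = math.log((1 + n_docs) / (1 + doc_freq[tag])) + 1.0
        scores[tag] = tf * idf
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
```

A baseline like this only finds tags that literally appear in the text. Inferring tags that are implied but not mentioned would need a learned model trained on the existing tagged posts.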


Identifying Middlemen Using Transaction Graph
Dropshipping is a common practice on many e-commerce sites. Although it
is not prohibited, it does affect the buyer's experience. If we can build
a graph that models transactions in Bukalapak, identifying dropshippers
should become much simpler. Such a graph would also be useful for many
other purposes, such as fraud prevention and consumer targeting.

Dataset: transactions, users
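
To illustrate the idea, each transaction can be treated as a directed edge from buyer to seller, labeled with the product. One possible heuristic, assumed here purely for illustration, flags users who both sell and buy the same products. The tuple layout, the `product_key` field, and the `min_matches` threshold are hypothetical.

```python
from collections import defaultdict


def find_middleman_candidates(transactions, min_matches=2):
    """Flag users who resell products they also buy.

    transactions: iterable of (buyer_id, seller_id, product_key) tuples,
    i.e. labeled edges buyer -> seller of the transaction graph.
    Returns {user_id: sorted list of overlapping product keys}.
    """
    sold = defaultdict(set)    # user -> product keys on incoming edges
    bought = defaultdict(set)  # user -> product keys on outgoing edges
    for buyer, seller, product in transactions:
        bought[buyer].add(product)
        sold[seller].add(product)

    candidates = {}
    for user, sold_items in sold.items():
        overlap = sold_items & bought.get(user, set())
        if len(overlap) >= min_matches:
            candidates[user] = sorted(overlap)
    return candidates
```

A real detector would likely also compare timestamps and shipping addresses, since buying and selling the same product is not by itself proof of dropshipping.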


Product Categorization using Deep CNN Models
Many products in Bukalapak are listed in incorrect categories. We want
to reduce the number of wrong category selections by suggesting the
correct category when a seller creates a product page. The product image
will be used as the input signal for a detection or classification model.

Dataset: product images, categories, Google Open Images Dataset


Developing Bukalapak Products Ontology
We believe that the Semantic Web can provide a framework to enhance the
discoverability of our products and services. It may also increase the
reusability of product catalogs across the Web. This would benefit not
only us as a company but also the research community and the e-commerce
industry at large.

Dataset: products, GoodRelations ontology


Collaborative Filtering
We have evidence that recommendations can be made more accurate
through personalization. For many years, collaborative filtering has been
used to build personalized recommendations. With millions of users and
billions of recorded behavioral events, the challenge is how to make
collaborative filtering methods scalable.

Dataset: users by transactions, search patterns.
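
For readers new to the technique, the following is a minimal sketch of item-based collaborative filtering on implicit feedback, using cosine similarity over co-occurrence counts. The data layout (a dict mapping each user to a set of items) and the function names are assumptions made for this example, and the in-memory approach shown does not address the scalability challenge itself.

```python
import math
from collections import defaultdict
from itertools import combinations


def item_similarities(user_items):
    """Cosine similarity between items, from {user: set_of_items}."""
    item_count = defaultdict(int)  # number of users per item
    co_count = defaultdict(int)    # number of users per item pair
    for items in user_items.values():
        for item in items:
            item_count[item] += 1
        for a, b in combinations(sorted(items), 2):
            co_count[(a, b)] += 1

    sims = defaultdict(dict)
    for (a, b), together in co_count.items():
        score = together / math.sqrt(item_count[a] * item_count[b])
        sims[a][b] = score
        sims[b][a] = score
    return sims


def recommend(user, user_items, sims, top_k=3):
    """Score unseen items by summed similarity to the user's items."""
    seen = user_items[user]
    scores = defaultdict(float)
    for item in seen:
        for other, score in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += score
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_k]
```

The pair-counting step is the expensive part. Because it is a sum over independent users, it is the piece that would typically be sharded across workers or approximated at larger scale.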


Learning-to-Rank for Recommender System
Although our current recommender system has already generated a
significant GMV increase, we are continuously working to improve it.
Ranking is one of the most crucial and challenging problems in making
better recommendations. Given the massive amount of data available,
learning-to-rank is a promising area that we want to explore.

Dataset: product detail views, user clicks
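
As a rough illustration of the pairwise flavor of learning-to-rank, the sketch below learns linear weights so that a clicked product scores higher than a product that was shown but skipped. The feature vectors, the perceptron-style update, and all names here are assumptions for illustration and do not describe the current recommender system.

```python
def score(weights, features):
    """Linear relevance score: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def train_pairwise_ranker(pairs, n_features, epochs=10, learning_rate=0.1):
    """Learn weights from (clicked_features, skipped_features) pairs.

    Whenever the clicked item does not outscore the skipped one, the
    weights are nudged toward the feature difference (perceptron update).
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for clicked, skipped in pairs:
            diff = [c - s for c, s in zip(clicked, skipped)]
            if score(weights, diff) <= 0:  # pair is mis-ordered or tied
                weights = [w + learning_rate * d for w, d in zip(weights, diff)]
    return weights


def rank(weights, candidates):
    """Sort {name: features} candidates by descending score."""
    return sorted(candidates, key=lambda name: -score(weights, candidates[name]))
```

Here each pair would come from one impression list in the click logs: the product the user clicked versus one they saw and ignored.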


Product Discovery Sequence Prediction
The user experience when searching for products can be enhanced if we are
able to predict which products are highly likely to be viewed next.
These next n products are determined from the sequence of previous
product views (similar to time series prediction). We can extend this
capability to build an infinite product discovery queue.

Dataset: product detail views
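
The simplest baseline for next-view prediction is a first-order Markov model that counts which product tends to follow which. The sketch below is a deliberately small example of that idea. The session format (a list of product IDs per browsing session) and the function names are assumptions, and a production model would likely condition on more than the last view.

```python
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Count product -> next-product transitions over view sequences."""
    transitions = defaultdict(Counter)
    for views in sessions:
        for current, following in zip(views, views[1:]):
            transitions[current][following] += 1
    return transitions


def predict_next(transitions, last_viewed, n=3):
    """Return up to n products most often viewed right after `last_viewed`."""
    counts = transitions.get(last_viewed)
    if not counts:
        return []  # product never seen as a predecessor
    # Most frequent first; ties broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, _ in ranked[:n]]
```

Feeding the top prediction back in as the new `last_viewed` gives a crude version of a continuously refilled discovery queue.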


Open Topics
The list of topics mentioned here is not exhaustive. We are open to
proposals for new topics or areas that we have not considered. Both
basic and applied research are welcome.


Contacts @Telegram, Email
Ibam: @iarief, ibam@bukalapak.com
Pray: @prynglh, prayana.galih@bukalapak.com
Diko: @mardiko, rahmatri.mardiko@bukalapak.com
Zulva: @zulvacupa, zulva.fachrina@bukalapak.com
Thank You
