on the history of previously answered questions. A similarity metric is established and experts are detected by comparing their affinity with new questions in the system. Methods
based on Query Likelihood Language Model (Li and King
2010), TF-IDF (Riahi et al. 2012), pLSA (Xu, Ji, and Wang
2012), and probabilistic topic models (Liu, Liu, and Yang
2010; Riahi et al. 2012; Ni et al. 2012) have been proposed
using this general framework. Another text-based approach
poses expert detection as a classification problem where the
user space is split into two classes according to each user's ability
to answer a specific question (Zhou, Lyu, and King 2012).
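The similarity-based framework described above can be illustrated with a minimal TF-IDF sketch. This is not the implementation of any of the cited methods: the tokenizer, the smoothed IDF formula, the helper names (`build_idf`, `rank_experts`), and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def build_idf(documents):
    """Smoothed inverse document frequency over a list of token lists."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()}

def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector (dict) for one token list; unseen terms are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in counts.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_experts(user_history, new_question):
    """Rank users by similarity between their answered questions and a new one."""
    profiles = {user: tokenize(" ".join(texts))
                for user, texts in user_history.items()}
    idf = build_idf(list(profiles.values()))
    query = tfidf_vector(tokenize(new_question), idf)
    scores = {user: cosine(tfidf_vector(tokens, idf), query)
              for user, tokens in profiles.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy history: user -> questions they answered before.
history = {
    "alice": ["how to reset router password", "router keeps dropping wifi"],
    "bob": ["billing cycle date change", "refund for duplicate billing charge"],
}
ranking = rank_experts(history, "wifi router not connecting")
```

Each user profile is the concatenation of the questions that user answered, so the ranking depends only on textual overlap and ignores answer quality.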
In our approach, we build a prediction model for users
based on the latent topics of the questions they replied to and
the answers they provided. We use a supervised learning paradigm, where
topic assignments and prediction parameters are learned
concurrently by means of Bayesian inference. This
method lets us jointly leverage textual features
and answer quality metrics.
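The prediction stage can be sketched as follows. In standard sLDA (Blei and McAuliffe 2010) the response is linear in the empirical topic frequencies of a document; the sketch assumes the multi-task case simply keeps one coefficient vector per user. The function names, the weights, and the topic assignments are hypothetical, and the Bayesian inference that would learn them is not shown.

```python
def topic_proportions(topic_assignments, num_topics):
    """Empirical topic frequencies (z_bar) for one document."""
    if not topic_assignments:
        raise ValueError("document has no topic assignments")
    counts = [0] * num_topics
    for k in topic_assignments:
        counts[k] += 1
    total = len(topic_assignments)
    return [c / total for c in counts]

def predict_scores(z_bar, user_weights):
    """One linear prediction per user: the dot product eta_a . z_bar."""
    return {user: sum(w * z for w, z in zip(eta, z_bar))
            for user, eta in user_weights.items()}

def route_question(z_bar, user_weights, top_k=1):
    """Return the top_k users with the highest predicted score."""
    scores = predict_scores(z_bar, user_weights)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical values: four words assigned to topics 0, 0, 1, 0 out of 3 topics,
# and made-up per-user coefficient vectors (one weight per topic).
z_bar = topic_proportions([0, 0, 1, 0], num_topics=3)
user_weights = {"alice": [2.0, 0.0, 0.5], "bob": [0.0, 3.0, 1.0]}
scores = predict_scores(z_bar, user_weights)
routed = route_question(z_bar, user_weights, top_k=1)
```

Because every user has a separate weight vector over a shared topic space, a single topic representation of a new question yields a score for every user at once.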
Abstract
This paper presents a supervised Bayesian approach
to model expertise in online forums with application
to question routing. The proposed method extends the
well-known sLDA model to the multi-task case, accounting
for a supervised stage with multiple outputs
per document corresponding to the users of the system.
A study of the characteristics of real-world data revealed
a number of challenges in the practical application of
this model, relevant to the research community.
[Figure: plate diagram of the model. Nodes: W_{d,n}, Z_{d,n}, Y_{d,a}; plates: n ∈ N_d, k ∈ K, d ∈ D, a ∈ A.]
The sparsity of this particular dataset renders the application of linear regression modeling methods infeasible. Nevertheless, we experimented with Generalized Linear Models using several approaches (Ordinary Least Squares, Weighted
Least Squares, Multitask Learning and Poisson Regression).
In all cases, the model performed significantly better
than the TF-IDF baseline, but was clearly outperformed by a
second baseline based on popularity ranking, in which users
are ranked exclusively by their number of kudos, independently of the question content.
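The popularity baseline is simple enough to sketch in a few lines. The input format (a list of `(user, kudos)` pairs) and the alphabetical tie-breaking rule are assumptions made for illustration; the source only states that users are ranked by their number of kudos.

```python
from collections import Counter

def popularity_ranking(kudos_log):
    """Rank users by total kudos; the question text is never consulted."""
    totals = Counter()
    for user, kudos in kudos_log:
        totals[user] += kudos
    # Sort by descending kudos, breaking ties alphabetically for determinism.
    return sorted(totals, key=lambda user: (-totals[user], user))

# Hypothetical log of kudos received per answer.
kudos_log = [("alice", 3), ("bob", 10), ("alice", 1), ("carol", 0)]
ranking = popularity_ranking(kudos_log)
```

The same ranking is returned for every incoming question, which is what makes this baseline content-independent.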
Our next step will consider the use of zero-inflated
models for the regression stage. In particular, Zero-Inflated
Negative Binomial (ZINB) regression can be used to model
count variables with an excessive number of zeros, which fits
the sparsity characteristics of our collection well. This
method assumes that the excess of zeros is generated by a
separate process that is modeled independently, normally by
means of logistic regression.
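To make the two-process assumption concrete, the sketch below evaluates a ZINB probability mass function. It uses a common textbook parameterization (negative binomial with mean `mu` and dispersion `r`, a log link for the count mean, and a logistic link for the zero-inflation probability); the coefficient vectors `beta` and `gamma` and the feature vector `x` are hypothetical values, not parameters fitted to the collection discussed here.

```python
import math

def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def sigmoid(t):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-t))

def nb_pmf(y, mu, r):
    """Negative binomial pmf with mean mu > 0 and dispersion r > 0."""
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu))
             + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def zinb_pmf(y, x, beta, gamma, r):
    """Zero-inflated negative binomial pmf for count y given features x.

    pi is the probability that the separate zero-generating process fires
    (logistic regression on x); otherwise the count is negative binomial
    with mean exp(beta . x).
    """
    mu = math.exp(dot(beta, x))
    pi = sigmoid(dot(gamma, x))
    count_part = (1.0 - pi) * nb_pmf(y, mu, r)
    return pi + count_part if y == 0 else count_part

# Hypothetical feature vector and coefficients.
x = [1.0, 0.5]
beta = [0.2, 0.4]    # count-model coefficients (log link)
gamma = [0.0, 0.0]   # zero-inflation coefficients (logit link), so pi = 0.5
r = 2.0
p_zero = zinb_pmf(0, x, beta, gamma, r)
```

Zeros can arise from either process, so the probability of a zero under ZINB is always at least as large as under the plain negative binomial with the same `mu` and `r`.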
The proposed approach provides a flexible and powerful framework to model multiple independent outcomes in an
sLDA context, with the ability to incorporate not just textual
and quality features, but also a priori knowledge about users.
Although conceived for expert detection in CQA contexts,
this model has many additional applications, such
as targeted advertising.
References
Blei, D. M., and McAuliffe, J. D. 2010. Supervised Topic
Models. ArXiv preprint arXiv:1003.0783.
Graber, J. B., and Resnik, P. 2010. Holistic sentiment analysis across languages: multilingual supervised latent Dirichlet
allocation. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP
'10, 45-55. Stroudsburg, PA, USA: Association for Computational Linguistics.
Jurczyk, P., and Agichtein, E. 2007. Discovering authorities in question answer communities by using link analysis.
In Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, CIKM
'07, 919-922. New York, NY, USA: ACM.
Li, B., and King, I. 2010. Routing questions to appropriate
answerers in community question answering services. In
Proceedings of the 19th ACM international conference on
Information and knowledge management, CIKM '10,
1585-1588. New York, NY, USA: ACM.
Liu, M.; Liu, Y.; and Yang, Q. 2010. Predicting best answerers for new questions in community question answering. In
Proceedings of the 11th international conference on Web-age information management, WAIM '10, 127-138. Berlin,
Heidelberg: Springer-Verlag.
Liu, J.; Song, Y. I.; and Lin, C. Y. 2011. Competition-based
user expertise score estimation. In Proceedings of the 34th
international ACM SIGIR conference on Research and development in Information Retrieval, SIGIR '11, 425-434.
New York, NY, USA: ACM.
Ni, X.; Lu, Y.; Quan, X.; Wenyin, L.; and Hua, B. 2012. User
interest modeling and its application for question recommendation in user-interactive question answering systems.
Information Processing & Management 48(2):218-233.
Riahi, F.; Zolaktaf, Z.; Shafiei, M.; and Milios, E. 2012.
Finding expert users in community question answering. In
Proceedings of the 21st international conference companion
on World Wide Web, WWW '12 Companion, 791-798. New
York, NY, USA: ACM.
Xu, F.; Ji, Z.; and Wang, B. 2012. Dual role model for question recommendation in community question answering. In
Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval,
SIGIR '12, 771-780. New York, NY, USA: ACM.
Zhang, J.; Ackerman, M. S.; and Adamic, L. 2007. Expertise networks in online communities: structure and algorithms. In Proceedings of the 16th international conference
on World Wide Web, WWW '07, 221-230. New York, NY,
USA: ACM.
Zhou, T. C.; Lyu, M. R.; and King, I. 2012. A classification-based approach to question routing in community question
answering. In Proceedings of the 21st international conference companion on World Wide Web, WWW '12 Companion, 783-790. New York, NY, USA: ACM.