<i>XVIII Congreso de la Asociación Española para el Procesamiento del Lenguaje Natural</i>, edited by Llavorí, Rafael Berlanga, et al., Universitat Jaume I. Servei de
Comunicació i Publicacions, 2012. ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/bibliotecauptsp/detail.action?docID=4184256.
Created from bibliotecauptsp on 2019-09-28 09:35:08.
XVIII CONGRESO DE LA SOCIEDAD ESPAÑOLA PARA EL PROCESAMIENTO DEL LENGUAJE NATURAL 82
such as PMI (Church and Hanks, 1990; Hearst, 1992; Pantel and Pennacchiotti, 2006), the measurement of entropy between word pairs (Ryu and Choi, 2005), and vector computations that measure the conceptual distance between words (Ritter, Soderland and Etzioni, 2009).

As a supporting mechanism to verify whether hyponym and hypernym candidates hold a canonical relation, authors such as Hearst (1992) and Ritter, Soderland and Etzioni (2009) use the lexical database WordNet (Fellbaum, 1998) as a reference source.

Once the information content that a word pair shares as hyponym and hypernym has been verified, an evaluation follows to determine the degree of precision and recall (Van Rijsbergen, 1979) achieved by the method, with adjustments via an F-measure where required (Ortega, Villaseñor and Montes, 2007; Ortega et al., 2011).

3 Problems in selecting hyponyms and hypernyms relevant to a knowledge domain

Taking into account the extraction methods […] the role played by specific phenomena when choosing hyponyms and hypernyms that are relevant for a knowledge domain.

On a linguistic level of analysis, most of these methods have focused on finding new instances of hyponyms and hypernyms from a set of seed instances that are recognizable in sentence contexts (Hearst, 1992; Pantel and Pennacchiotti, 2006; Ritter, Soderland and Etzioni, 2009; Ortega, Villaseñor and Montes, 2007; Ortega et al., 2011). However, the potential of the hyponymy relations that a hypernym can generate in its function as the head of a noun phrase has not yet been considered.

In line with Croft and Cruse (2004), we believe that a single-word hypernym plus a semantic feature can generate relevant hyponyms that account for the structure of a knowledge domain, and likewise reflect the classification perspectives of a hypernym.

Following this idea, in this work we focus on noun + adjective phrases, bearing in mind the semantic function of adjectives as units that express and prioritize conceptual features, whose selection may be conditioned by the knowledge domain in which they occur, as is the case in medical terminology.

We therefore consider that, if the observation made by Croft and Cruse is not taken into account,
whereby a good hyponym is not a good taxonym of a hypernym, there is a direct definition of the hyponym in terms of the hypernym plus a simple semantic feature, e.g.:

  Semental ('stallion') = Caballo ('horse') + macho ('male')

Despite this, it is not possible to explain why in some cases certain hyponyms with a simple semantic feature could indeed represent a good taxonomy, while in other cases they do not. An example of this is:

  Compositionality C(cuchara, w) = (de té, de café, de sopa, …)
  ['spoon': teaspoon, coffee spoon, soup spoon, …]

The above taxonomy emphasizes the function of the object cuchara ('spoon'), and so may be relevant for some purposes. On the other hand, in the following case:

  Compositionality C(cuchara, w) = (redonda, profunda, grande, …)
  ['round', 'deep', 'large', …]

we have features that are conceptually simple and of little or no use for building a classification of the hypernym. A question that arises here is: could conceptually simple features be indicative of non-relevant hyponyms? If the answer is affirmative, then it is necessary to discern whether a given relation exhibits conceptually simple or complex features, so as to help locate hyponyms that express general assessments versus those that configure a hierarchized conceptual network underlying a domain in general (e.g., enfermedades gástricas versus enfermedades del estómago, 'gastric diseases' versus 'diseases of the stomach').

Following Demonte (1999), there are two classes of adjectives that assign properties to nouns: qualifying adjectives and relational adjectives. The difference between the two lies in the number of properties each one carries, as well as in the way they are linked to the noun. On the one hand, qualifying adjectives refer to a constitutive feature of the modified noun, exhibiting or characterizing a single physical property: color, shape, character, etc.: libro azul ('blue book'), señora delgada ('slim lady'), hombre simpático ('friendly man'), and the like.

On the other hand, relational adjectives refer to a set of properties or characteristics that can be linked to a concrete entity or event, e.g.: puerto marítimo ('seaport'), vaca lechera ('dairy cow'), paseo campestre ('country walk'), etc.

Given the above, our proposal consists in focusing attention on relational adjectives and, to discern them from qualifying ones, we take into account the observations made by Demonte for differentiating them.

5.1 Semantic compositionality

The alternation between relational and qualifying adjectives is explained in terms of semantic compositionality. For our purposes, we understand semantic compositionality as a principle that regulates the assignment of specific meanings to each […]
[…] this specific meaning is due to a process of semantic compositionality, introduced to establish differences between related concepts.

5.2 Hypernyms and their lexical fields

The hypernym, given its status as a generic category, can stand in a direct relation with more than one modifier reflecting specific concepts or categories (e.g., enfermedad cardiovascular, 'cardiovascular disease'), or simply context-sensitive assessments (e.g., enfermedad rara, 'rare disease'). Thus, for the hypernym enfermedad, we find a set of 132 relations, of which 76 (58%) can be considered relevant. If we consider an association measure such as the normalized pointwise mutual information (PMI) proposed by Bouma (2009), traditionally used in collocation extraction, the 10 most relevant relations are shown in Table 1:

Table 1. Adjectives with the highest PMI

  C(enfermedad, wi)   PMI
  transmisible        0.59
  prevenible          0.52
  diarreica           0.45
  diverticular        0.44
  indicadora          0.41
  autoinmunitaria     0.39
  aterosclerótica     0.39
  meningocócica       0.39
  cardiovascular      0.38
  pulmonar            0.37

On the other hand, if we consider a relational adjective from Table 2, for example cardiovascular, we find that it likewise modifies a set of nouns, as shown in Table 3:

Table 3. Nouns modified by the relational adjective cardiovascular

  C(wi, cardiovascular): efecto, problema, congreso, función, evento, relación, examen, inestabilidad, trastorno, enfermedad, bypass, causa, beneficio, sistema, reparador, meconio, descompensación, cirugía, derecha, operación, mortalidad, aparato, educación, síntoma, eficiencia, episodio, riesgo, investigación, manifestación, afección, medicamento, director, muerte, salud

  C(wi, rara): televisión, enfermedad, complicación, infancia, niño, color, obesidad, mhc, nucleótido, sustancia, mutación, trastorno, grupo, epistaxis, síndrome, cáncer, alelo, forma, caso, párpado

Hence, both the hypernym and the adjective, whether relational or qualifying, can be linked to other elements, a situation that shows how the compositionality principle operates here, reducing the precision of association measures in detecting useful relations.

6 Linguistic heuristics for filtering relevant hyponyms

With the aim of obtaining a stop list of
(Schmid, 1994).

7.3 Automatic extraction of DCs

Following the methodology proposed by Sierra et al. (2010), as well as Acosta, Sierra and Aguilar (2011), we extract a set of the most frequent hypernyms detected in definitional contexts (DCs), and take them to a second stage of hyponym extraction, considering only adjectives as modifiers of the hypernym.

7.4 Building a stop list of non-relevant adjectives

At this point we assume that qualifying adjectives present conceptually simple features of little use for generating relevant hyponyms. Given the above, we automatically obtain a set of […]

  Hypernym      PR   RRs    P
  Enfermedad   132    76   58
  Infección    125    69   55
  Tratamiento  112    39   35
  Vacuna        79    41   52
  Problema      67    40   60
  Afección      64    38   59
  Trastorno     61    45   74
  Examen        60    33   55
  Dolor         54    26   48
  Célula        47    22   47

8.2 Ranking by PMI

We consider the PMI measure in the normalized version proposed by Bouma (2009), whose normalization addresses two fundamental concerns: using association measures whose values have a fixed interpretation, and reducing sensitivity to low occurrence frequencies in the data. The normalized PMI formula is the following:
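The formula itself is not legible in this reproduction; as defined by Bouma (2009), normalized PMI is npmi(x, y) = pmi(x, y) / (-log p(x, y)), with pmi(x, y) = log(p(x, y) / (p(x) p(y))), bounding values in [-1, 1]. A minimal sketch of ranking adjectives for a hypernym this way follows; the pair and marginal counts are invented for illustration, not the paper's corpus figures:

```python
import math

def npmi(pair_count, x_count, y_count, total):
    """Normalized PMI (Bouma, 2009): pmi(x, y) / -log p(x, y), in [-1, 1]."""
    p_xy = pair_count / total
    pmi = math.log(p_xy / ((x_count / total) * (y_count / total)))
    return pmi / -math.log(p_xy)

# Illustrative counts (NOT the paper's data): pair frequency of each
# adjective with the hypernym, the adjective's marginal frequency, and
# the total number of observed noun-adjective pairs.
total_pairs = 10_000
hypernym_count = 500                # occurrences of "enfermedad" as head
candidates = {                      # adjective: (pair count, adjective count)
    "transmisible": (40, 60),
    "rara": (25, 300),
    "grande": (5, 900),
}

ranking = sorted(
    ((adj, npmi(pair, hypernym_count, adj_total, total_pairs))
     for adj, (pair, adj_total) in candidates.items()),
    key=lambda item: item[1], reverse=True)

for adj, score in ranking:
    print(f"{adj}\t{score:+.2f}")
```

With these toy counts the domain-bound relational adjective (transmisible) outranks the frequent, weakly associated qualifying one (grande), which even scores below zero, mirroring the filtering rationale of Section 6.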
Church, K., and Hanks, P. 1990. Word Association Norms, Mutual Information and Lexicography. Computational Linguistics, 16(1): 22-29.

Demonte, V. 1999. El adjetivo. Clases y usos. La posición del adjetivo en el sintagma nominal. In: Gramática descriptiva de la lengua española, Vol. 1, Chap. 3, pages 129-215, Espasa-Calpe (Madrid).

Galicia, S., and Gelbukh, A. 2007. Investigaciones en análisis sintáctico del español. Instituto Politécnico Nacional (México DF).

Girju, R., A. Badulescu, and D. Moldovan. 2006. Automatic Discovery of Part-Whole Relations. Computational Linguistics, 32(1): 83-135.

Hearst, M. 1992. Automatic Acquisition of Hyponyms from Large Text Corpora. In: Proceedings of COLING-92, pages 539-545, Nantes (France).

Jackendoff, R. 2002. Foundations of Language: Brain, Meaning, Grammar, Evolution. Oxford University Press (Oxford, UK).

Kilgarriff, A., P. Rychly, P. Smrz, and D. Tugwell. 2004. The Sketch Engine. In: Proceedings of

[…] Approach to Taxonomy Extraction for Ontology Learning. In: Ontology Learning from Text: Methods, Evaluation and Applications, pages 15-28, IOS Press (Amsterdam).

Saurí, R. 1997. Tractament lexicogràfic dels adjectius: aspectes a considerar. Papers de l'IULA: Monografies, Universitat Pompeu Fabra (Barcelona).

Schmid, H. 1994. Probabilistic Part-of-Speech Tagging Using Decision Trees. In: Proceedings of the International Conference on New Methods in Language Processing: www.ims.uni-stuttgart.de/~schmid/TreeTagger.

Sierra, G., R. Alarcón, C. Aguilar, and C. Bach. 2010. Definitional verbal patterns for semantic relation extraction. In: Probing Semantic Relations: Exploration and Identification in Specialized Texts, pages 73-96. John Benjamins Publishing (Amsterdam/Philadelphia).

Snow, R., D. Jurafsky, and A. Ng. 2006. Semantic Taxonomy Induction from Heterogeneous Evidence. In: Proceedings of the 21st International Conference on Computational
2nd International Workshop on Exploiting Large
Knowledge Repositories (E‐LKR)
1st International Workshop on Automatic Text
Summarization for the Future (ATSF)
Organizers:
Ernesto Jiménez‐Ruiz (University of Oxford)
María José Aramburu (Universitat Jaume I)
Roxana Dánger (Imperial College London)
Antonio Jimeno‐Yepes (National Library of Medicine, USA)
Horacio Saggion (Universitat Pompeu Fabra)
Elena Lloret (Universidad de Alicante)
Manuel Palomar (Universidad de Alicante)
The main goal of this workshop is to bring together researchers that are working on the
creation of new LKRs on any domain, or on their exploitation for specific information
processing tasks such as data analysis, text mining, natural language processing and
visualization, as well as for knowledge engineering issues, like knowledge acquisition,
validation and personalization.
Research, demo and position papers showing the benefits that exploiting LKRs can bring to the
information processing area will be especially welcome to this workshop.
Information on the Web is constantly updated, sometimes without any quality control; an important proportion of it is informal and ephemeral, a typical example being opinions and messages on the Internet.
What techniques can be used to produce appropriate summaries in this context?
How to measure relevance of ill‐formed input?
How to produce understandable summaries from noisy texts? How to identify the
most relevant information in a set of opinions?
High-quality documentation, such as technical/scientific articles and patents, has not received all the attention the field deserves. Given the explosion of technical documentation available on the Web and in intranets, scientists and research-and-development facilities face a true scientific information deluge: summarization should be a key instrument not only for reducing information content but also for measuring information relevance in context, providing users with adequate answers in context.
What techniques can be used to extract knowledge from complex technical
documents?
How to compile back the information in a well formed summary?
How to measure relevance in a network of scientific articles, beyond mere citation
counts?
Another underlying summarization research topic is non-extractive summarization: the generation of a concise summary that is not a set of sentences from the input. This is a very difficult problem, since summarization systems must be able to adapt easily from one domain to another in order to recognize what is important, and to produce a coherent text from a textual or conceptual representation.
The workshop "Automatic Text Summarization for the Future" aims to bring together researchers and practitioners of natural language processing to address the aforementioned and related issues.
The full papers of this workshop have been published at: http://ceur‐ws.org/Vol‐882/
PROGRAM
A Challenge for Automatic Text Summarization
Leo Wanner, ICREA and DTIC, UPF
Towards an ontology based large repository for managing heterogeneous knowledge
resources
Nizar Ghoula, Gilles Falquet
Enhancing the expressiveness of linguistic structures
José Mora, José A. Ramos, Guadalupe Aguado de Cea
Integrating large knowledge repositories in multiagent ontologies
Herlina Jayadianti, Carlos B. Sousa Pinto, Lukito Nugroho, Paulus Insap Santosa
A proposal for a European large knowledge repository in advanced food composition tables
for assessing dietary intake
Oscar Coltell, Francisco Madueño, Zoe Falomir, Dolores Corella
Redundancy reduction for multi‐document summaries using A* search and discriminative
training
Ahmet Aker, Trevor Cohn, Robert Gaizauskas
A dependency relation‐based method to identify attributive relations and its application in
text summarization
Shamima Mithun, Leila Kosseim
Short Papers
Using biomedical databases as knowledge sources for large‐scale text mining
Fabio Rinaldi
Exploiting the UMLS metathesaurus in the ontology alignment evaluation initiative
Ernesto Jiménez‐Ruiz, Bernardo Cuenca Grau, Ian Horrocks
Statements of interest
KB_Bio_101: a repository of graph‐structured knowledge
Vinay K Chaudhri, Michael Wessel, Stijn Heymans
If it's on web it's yours!
Abdul Mateen Rajput
TASS ‐ Taller de Análisis de Sentimientos en la SEPLN (Workshop on Sentiment Analysis at SEPLN)
Organizers:
Julio Villena (Daedalus, SA)
Sara Lana (Universidad Politécnica de Madrid)
Alfonso Ureña (Universidad de Jaén)
According to Merriam‐Webster dictionary, reputation is the overall quality or character of a
given person or organization as seen or judged by people in general, or, in other words, the
general recognition by other people of some characteristics or abilities for a given entity.
Specifically, in business, reputation comprises the actions of a company and its internal
stakeholders along with the perception of consumers about the business. Reputation affects
attitudes like satisfaction, commitment and trust, and drives behaviour like loyalty and
support. In turn, reputation analysis is the process of tracking, investigating and reporting an
entity's actions and other entities' opinions about those actions. It covers many factors to
calculate the market value of reputation. Reputation analysis has come into wide use as a
major factor of competitiveness in the increasingly complex marketplace of personal and
business relationships among people and companies.
Currently, market research is typically performed using user surveys. However, the rise of social media such as blogs and social networks, and the increasing amount of user-generated content in the form of reviews, recommendations, ratings and any other form of opinion, has led to an emerging trend towards online reputation analysis. The so-called sentiment analysis, i.e., the application of natural language processing and text analytics to identify and extract subjective information from texts, which is the first step towards online reputation analysis, is becoming a promising topic in the field of marketing and customer relationship management, as social media and its associated word-of-mouth effect is turning out to be the most important source of information about companies and their customers' sentiments towards their brands and products.
Sentiment analysis is a major technological challenge. The task is so hard that even humans often disagree on the sentiment of a given text. The fact that issues one individual finds acceptable or relevant may not be the same for others, along with multilingual aspects, cultural factors and different contexts, makes it very hard to classify a text written in a natural language into a positive or negative sentiment. And the shorter the text, for example when analyzing Twitter messages or short comments on Facebook, the harder the task becomes.
[…] sentiment analysis based on short text opinions extracted from social media messages (specifically Twitter) published by a series of representative personalities.
The challenge task is intended to provide a benchmark forum for comparing the latest approaches in this field. In addition, with the creation and release of the fully tagged corpus, we aim to provide a benchmark dataset that enables researchers to compare their algorithms and systems.
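As a concrete illustration of the kind of system the challenge targets, polarity classification can be sketched with a minimal lexicon-based scorer; the tiny Spanish lexicon and the three-way label set below are invented for illustration only (real TASS systems use far richer resources and handle negation, intensifiers and context):

```python
# Minimal lexicon-based polarity sketch. The lexicon entries are
# illustrative, not a published resource.
POLARITY_LEXICON = {
    "bueno": 1.0, "excelente": 2.0, "feliz": 1.0,
    "malo": -1.0, "terrible": -2.0, "triste": -1.0,
}

def polarity(text: str) -> str:
    """Classify a short text as P (positive), N (negative) or NEU (neutral)."""
    tokens = text.lower().split()
    score = sum(POLARITY_LEXICON.get(tok, 0.0) for tok in tokens)
    if score > 0:
        return "P"
    if score < 0:
        return "N"
    return "NEU"

print(polarity("un día excelente"))           # P
print(polarity("qué servicio tan terrible"))  # N
```

Summing unweighted lexicon scores over whitespace tokens is the simplest possible baseline; the difficulty described above (short, noisy, context-dependent texts) is precisely what this sketch does not capture.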
PROGRAM
Overview of TASS 2012 ‐ Workshop on Sentiment Analysis at SEPLN
Julio Villena‐Román, Janine García‐Morera, Cristina Moreno‐García, Linda Ferrer‐Ureña, Sara
Lana‐Serrano, José Carlos González‐Cristóbal, Adam Westerski, Eugenio Martínez‐Cámara, M.
Ángel García‐Cumbreras, M. Teresa Martín‐Valdivia, L. Alfonso Ureña‐López .......................... 94
TASS: Detecting Sentiments in Spanish Tweets
Xabier Saralegi Urizar, Iñaki San Vicente Roncal ...................................................................... 103
Techniques for Sentiment Analysis and Topic Detection of Spanish Tweets: Preliminary
Report
Antonio Fernández Anta, Philippe Morere, Luis Núñez Chiroque, Agustín Santos .................... 112
The L2F Strategy for Sentiment Analysis and Topic Classification
Fernando Batista, Ricardo Ribeiro ............................................................................................. 125
Sentiment Analysis of Twitter messages based on Multinomial Naive Bayes
Alexandre Trilla, Francesc Alías ................................................................................................. 129
UNED at TASS 2012: Polarity Classification and Trending Topic System
Tamara Martín‐Wanton, Jorge Carrillo de Albornoz ................................................................. 131
UNED @ TASS: Using IR techniques for topic‐based sentiment analysis through divergence
models
Angel Castellano González, Juan Cigarrán Recuero, Ana García Serrano ................................. 140
SINAI en TASS 2012
Eugenio Martínez Cámara, M. Angel García Cumbreras, M. Teresa Martín Valdivia, L. Alfonso
Ureña López ............................................................................................................................... 147
Lexicon‐Based Sentiment Analysis of Twitter Messages in Spanish
Antonio Moreno‐Ortiz, Chantal Pérez‐Hernández .................................................................... 156
TASS - Workshop on Sentiment Analysis at SEPLN
Abstract: This paper describes TASS, an experimental evaluation workshop within SEPLN to foster research in the field of sentiment analysis in social media, specifically focused on the Spanish language. The main objective is to promote the application of existing state-of-the-art algorithms and techniques, and the design of new ones, for the implementation of complex systems able to perform sentiment analysis on short text opinions extracted from social media messages (specifically Twitter) published by representative personalities. The paper presents the proposed tasks; the contents, format and main statistics of the generated corpus; the participant groups and their different approaches; and, finally, the overall results achieved.
Keywords: TASS, reputation analysis, sentiment analysis, social media.
about those actions. It covers many factors to calculate the market value of reputation.
Reputation analysis has come into wide use as a major factor of competitiveness in the increasingly complex marketplace of personal and business relationships among people and companies.
Currently, market research is typically performed using user surveys. However, the rise of social media such as blogs and social networks, and the increasing amount of user-generated content in the form of reviews, recommendations, ratings and any other form of opinion, has led to an emerging trend towards online reputation analysis.
The so-called sentiment analysis, i.e., the application of natural language processing and text analytics to identify and extract subjective information from texts, which is the first step towards online reputation analysis, is becoming a promising topic in the field of marketing and customer relationship management, as social media and its associated word-of-mouth effect is turning out to be the most important source of information about companies and their customers' sentiments towards their brands and products.
Sentiment analysis is a major technological challenge. The task is so hard that even humans often disagree on the sentiment of a given text. The fact that issues one individual finds acceptable or relevant may not be the same for others, along with multilingual aspects, cultural factors and different contexts, makes it very hard to classify a text written in a natural language into a positive or negative sentiment. And the shorter the text, for example when analyzing Twitter messages or short comments on Facebook, the harder the task becomes.
Within this context, TASS (Taller de Análisis de Sentimientos en la SEPLN; in English, Workshop on Sentiment Analysis at SEPLN; http://www.daedalus.es/TASS) is an experimental evaluation workshop, organized as a satellite event of the SEPLN 2012 Conference, held on September 7th, 2012 at Universitat Jaume I in Castellón de la Plana, Comunidad Valenciana, Spain, to promote research in the field of sentiment analysis in social media, initially focused on Spanish, although it could be extended to any language.
The main objective is to improve existing techniques and algorithms, and to design new ones, in order to perform sentiment analysis on short text opinions extracted from social media messages (specifically Twitter) published by a series of important personalities.
The challenge task is intended to provide a benchmark forum for comparing the latest approaches in this field. In addition, with the creation and release of the fully tagged corpus, we aim to provide a benchmark dataset that enables researchers to compare their algorithms and systems.

2 Description of tasks

Two tasks are proposed for the participants in this first edition: sentiment analysis and trending topic coverage. Groups may participate in both tasks or just in one of them.
Along with the submission of experiments, participants are encouraged to submit a paper to the workshop in order to describe their systems to the audience in a regular workshop session together with special invited speakers. Submitted papers are reviewed by the program committee.

2.1 Task 1: Sentiment Analysis

This task consists in performing an automatic sentiment analysis to determine the polarity of each message in the test corpus.
The evaluation metrics used to evaluate and compare the different systems are the usual measurements of precision (1), recall (2) and F-measure (3), calculated over the full test set, as shown in Figure 1:

  precision = tp / (tp + fp)                            (1)
  recall    = tp / (tp + fn)                            (2)
  F = 2 · precision · recall / (precision + recall)     (3)

Figure 1: Evaluation metrics

2.2 Task 2: Trending topic coverage

In this case, the technological challenge is to build a classifier to identify the topic of the text,
and then apply the polarity analysis to get the assessment for each topic.
The evaluation metrics are the same as in Task 1 (Figure 1).

3 Corpus

The corpus provided to participants contains over 70 000 tweets, written in Spanish by nearly 200 well-known personalities and celebrities of the world of politics, economy, communication, mass media and culture, between November 2011 and March 2012. Although the context of extraction has a Spain-focused bias, the diverse nationality of the authors, including people from Spain, Mexico, Colombia, Puerto Rico, USA and many other countries, gives the corpus a global coverage of the Spanish-speaking world.
Each Twitter message includes its ID (twitid), the creation date (date) and the user ID (user).
Due to restrictions in the Twitter API Terms of Service³, it is forbidden to redistribute a corpus that includes text contents or information about users. However, it is valid if those fields are removed and IDs (including Tweet IDs and user IDs) are provided instead. The actual message content can be easily obtained by making queries to the Twitter API using the twitid. In addition, using the user ID it is possible to extract information such as the user name, registration date and geographical information.
A message expressing a positive and a negative sentiment at the same time, such as "Peter is a very good friend but I cannot stand John", is considered NEU with DISAGREEMENT, where Peter is regarded as P+ and John as N+.
On the other hand, a selection of a set of 10 topics has been made based on the thematic areas covered by the corpus, such as politics (política), soccer (fútbol), literature (literatura) or entertainment (entretenimiento).
Each message of the corpus has been semiautomatically assigned to one or several of these topics (most messages are associated to just one topic, due to the short length of the text).
This tagged corpus has been divided into two sets: training and test. The training corpus was released along with the corresponding tags so that participants may train and validate their models for classification and sentiment analysis. The test corpus was provided without any tag and was used to evaluate the results provided by the different systems.
Table 1 shows a summary of the training data provided to participants.

    Attribute        Value
    Twits            7 219
    Topics           10
    Twit languages   1
    Users            154
    User types       3
    User languages   1
    Date start       2011-12-02T00:47:55
    Date end         2012-04-10T23:40:36

Table 1: Summary of the training data
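The twitid-based retrieval of message contents described above can be sketched as follows; the concrete endpoint is an assumption (the 2012-era REST API), and authentication is omitted:

```python
import urllib.parse

# Hypothetical hydration helper: the paper only states that contents are
# fetched from the Twitter API using the twitid; the endpoint below is an
# assumption based on the v1.1 REST API.
API_URL = "https://api.twitter.com/1.1/statuses/show.json"

def hydration_url(twitid):
    """Build the lookup URL for recovering a tweet's text from its ID."""
    return API_URL + "?" + urllib.parse.urlencode({"id": twitid})
```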
The list of topics that have been selected is shown in Table 3:

    Economy (economía)
    Music (música)
    Soccer (fútbol)
    Films (cine)
    Technology (tecnología)
    Sports (deportes)
    Literature (literatura)

Table 3: Topic list

The corpus is encoded in XML as defined by the schema shown in Figure 2.

Figure 2: XML schema (date: xsd:string; lang: xsd:language; sentiments containing polarity elements with value, type and an optional entity [0..1]: xsd:string; topic [1..*]: xsd:string)

    <twit>
      <twitid>0000000000</twitid>
      <topics>
        <topic>entretenimiento</topic>
      </topics>
      ...
    </twit>

    <twit>
      <twitid>0000000001</twitid>
      <user>usuario1</user>
      <content><![CDATA[@UPyD contará casi seguro con grupo
      gracias al Foro Asturias.]]></content>
      <date>2011-12-02T00:21:01</date>
      <lang>es</lang>
      <sentiments>
        <polarity>
          <value>P</value>
          <type>AGREEMENT</type>
        </polarity>
        <polarity>
          <entity>UPyD</entity>
          <value>P</value>
          ...
        </polarity>
      </sentiments>
    </twit>

Figure 3: Sample twits

Two sample twits are shown in Figure 3. The second one is tagged with both the global polarity of the message and the polarity associated to each one of the entities that appear in the text (UPyD and Foro Asturias), while the first one is tagged only with the global polarity, as the text contains no mentions to any entity.
The full corpus will be made public after the workshop so that any group interested in the field of sentiment analysis in Spanish can use it.
Groups had to register for the task(s) in order to obtain the corpus. Results should be submitted in a plain text file with the following format:

    twitid \t polarity \t topic

where twitid is the twit ID for every message in the test corpus, the polarity contains one of the 6 valid tags (P+, P, NEU, N, N+ and NONE), and the same for topic.
Although the polarity level must be classified into those levels and the results were primarily evaluated for the 5 of them, the evaluation results also include metrics that consider just 3 levels (positive, neutral and negative).
Participants could submit results for one or both tasks. Several results for the same task were allowed too.
15 groups registered and finally 8 groups sent their submissions for one of the two tasks. The list of active groups is shown in Table 4.
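Reading the corpus format of Figure 3 and producing the tab-separated submission lines described above can be sketched as follows; the `<twits>` wrapper element and the exact nesting are assumptions based on the sample:

```python
import xml.etree.ElementTree as ET

# Toy corpus fragment modelled on Figure 3; the <twits> root element
# is an assumption (the figure only shows individual <twit> records).
SAMPLE = """<twits>
  <twit>
    <twitid>0000000001</twitid>
    <lang>es</lang>
    <sentiments><polarity><value>P</value></polarity></sentiments>
    <topics><topic>entretenimiento</topic></topics>
  </twit>
</twits>"""

def submission_lines(xml_text):
    """Emit one 'twitid \\t polarity \\t topic' line per twit."""
    root = ET.fromstring(xml_text)
    lines = []
    for twit in root.iter("twit"):
        twitid = twit.findtext("twitid")
        polarity = twit.findtext("sentiments/polarity/value") or "NONE"
        topic = twit.findtext("topics/topic") or ""
        lines.append(f"{twitid}\t{polarity}\t{topic}")
    return lines
```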
All of them submitted results for the sentiment analysis task and most of them (6 out of 8, 75%) participated in both tasks.

    Group               Task 1   Task 2
    Elhuyar Fundazioa   Yes      No
    IMDEA               Yes      Yes
    L2F-INESC           Yes      Yes
    La Salle-URL        Yes      Yes
    LSI UNED            Yes      Yes
    LSI UNED 2          Yes      Yes
    SINAI-UJAEN         Yes      Yes
    UMA                 Yes      No

Table 4: List of active groups

They present an interesting comparative analysis of different approaches and classification techniques for these problems using the provided corpus of Spanish tweets. The data is preprocessed using well-known techniques and tools proposed in the literature, together with others specifically proposed here that take into account the characteristics of Twitter.
Then, popular classifiers have been used (in particular, all classifiers of WEKA have been evaluated). Their report describes some of the results obtained in their preliminary research.
dataset and it achieves a maximum macro-averaged F1 measure rate of 36.28%.

4.5 LSI – UNED

(Martín-Wanton and Carrillo de Albornoz, 2012) presents the participation of the UNED group in TASS.
For polarity classification, they propose an emotional concept-based method. The original method makes use of an affective lexicon to represent the text as the set of emotional meanings it expresses, along with advanced syntactic techniques to identify negations and intensifiers, their scope and their effect on the emotions affected by them.
Besides, the method addresses the problem of word ambiguity, taking into account the contextual meaning of terms by using a word sense disambiguation algorithm.
On the other hand, for topic detection, their system is based on a probabilistic model (Twitter-LDA). They first build for each topic of the task a lexicon of words that best describe it, thus representing each topic as a ranking of discriminative words. Moreover, a set of events is retrieved based on a probabilistic approach that was adapted to the characteristics of Twitter.
To determine which of the topics corresponds to each event, the topic with the highest statistical correlation was obtained by comparing the ranking of words of each topic with the ranking of words most likely to belong to the event.
The experimental results achieved show the adequacy of their approach for the task, as shown later.

4.6 LSI – UNED 2

(Castellano, Cigarrán and García Serrano, 2012) describes the research done for the workshop by the second team component of the LSI group at UNED.
Their proposal addresses sentiment and topic detection from an Information Retrieval (IR) perspective, based on language divergences. Kullback-Leibler Divergence (KLD) is used to generate both polarity and topic models, which are then used in the IR process.
In order to improve the accuracy of the results, they propose several approaches focused on building language models that consider not only the textual content associated to each tweet but, as an alternative, the named entities or adjectives detected as well.
Results show that modeling the tweet set using named entities and adjectives improves the final precision results and, as a consequence, their representativeness in the model compared with the use of common terms.
General results are promising (fifth and fourth position in each of the proposed tasks), indicating that an IR and language-model based approach may be an alternative to other classical proposals focused on the application of classification techniques.

4.7 SINAI – Universidad de Jaén

The participation of the SINAI research group of the University of Jaén is described in (Martínez Cámara et al., 2012).
For the first task, they have chosen a supervised machine learning approach, in which they have used SVM for classifying the polarity. Text features included are unigrams, emoticons, positive and negative words, and intensity markers.
In the second task, they have also used SVM for the topic classification, but several bags of words (BoW) have been used with the goal of improving the classification performance.
One BoW has been obtained using the Google AdWords Keyword Tool, which allows to enter a term and directly returns the top N related concepts. The second BoW has been generated based on the hash tags of the training tweets, for each category.

4.8 Universidad de Málaga (UMA)

(Moreno-Ortiz and Pérez-Hernández, 2012) describes the participation of the group at Facultad de Filosofía y Letras in Universidad de Málaga.
They use a lexicon-based approach to Sentiment Analysis (SA). These approaches differ from the more common machine-learning based approaches in that the former rely solely on previously generated lexical resources that store polarity information for lexical items, which are then identified in the texts, assigned a polarity tag, and finally weighed, to come up with an overall score for the text.
Such SA systems have been proved to perform on par with supervised, statistical systems, with the added benefit of not requiring a training set. However, it remains to be seen
whether such lexically-motivated systems can cope equally well with extremely short texts, as generated on social networking sites such as Twitter.
In their paper they perform such an evaluation using Sentitext, a lexicon-based SA tool for Spanish. One conclusion is that performance is affected by the number of lexical units available in the text (or the lack of them, rather). On the other hand, they also found a tendency to assign middle-of-the-scale ratings, or at least avoid extreme values, which is reflected in its poor performance for the N+ and P+ classes, most of which were assigned to the more neutral N and P classes.
Another interesting conclusion, drawn from their analysis of the average number of polarity lexical segments and Affect Intensity, is that Twitter users employ highly emotional language.
Besides, results for different submissions from the same group are typically very similar, except for the case of the SINAI-UJAEN group.

    Run Id             Group           Precision
    pol-elhuyar-1-5l   Elhuyar Fund.   65.29%
    pol-l2f-1-5l       L2F-INESC       63.37%
    pol-l2f-3-5l       L2F-INESC       63.27%
    pol-l2f-2-5l       L2F-INESC       62.16%
    pol-atrilla-1-5l   La Salle-URL    57.01%
    pol-sinai-4-5l     SINAI-UJAEN     54.68%
    pol-uned1-2-5l     LSI UNED        53.82%
    pol-uned1-1-5l     LSI UNED        52.54%
    pol-uned2-2-5l     LSI UNED 2      40.41%
    pol-uned2-1-5l     LSI UNED 2      39.98%
    pol-uned2-3-5l     LSI UNED 2      39.47%
    pol-uned2-4-5l     LSI UNED 2      38.59%
    pol-imdea-1-5l     IMDEA           36.04%
    pol-sinai-2-5l     SINAI-UJAEN     35.65%
    pol-sinai-1-5l     SINAI-UJAEN     35.28%
    pol-sinai-3-5l     SINAI-UJAEN     34.97%
    pol-uma-1-5l       UMA             16.73%

Table 6: Results for task 1
In this case, precision values improve, as expected. The precision obtained now ranges from 71.12% to 35.11%. In this case, 9 submissions have a precision value over 50% and 6 groups have at least one result over this percentage.
Table 7 shows the results for Task 2 (Trending topic coverage). 13 experiments were submitted in all (plus 2 experiments from TUDelft).

    Run Id           Group          Precision
    top-l2f-2        L2F-INESC      65.37%
    top-l2f-1y3      L2F-INESC      64.92%
    top-atrilla-1    La Salle-URL   60.16%
    pol-uned2-5a8    LSI UNED 2     45.26%
    top-imdea-1      IMDEA          45.24%
    pol-uned2-9a12   LSI UNED 2     42.24%
    pol-uned2-1a4    LSI UNED 2     40.51%
    top-sinai-5      SINAI-UJAEN    39.37%
    top-sinai-4      SINAI-UJAEN    37.79%
    top-sinai-2      SINAI-UJAEN    34.76%
    top-sinai-3      SINAI-UJAEN    34.06%
    top-sinai-1      SINAI-UJAEN    32.34%
    pol-uned1-1y2    LSI UNED       30.98%

Table 7: Results for task 2 (Trending topic coverage)

In this task, precision ranges from 65.37% to 30.98% and only 4 of 15 submissions are above 50% (2 groups). As in task 1, different submissions from the same group usually have similar results.
… trending topic within the information technology field.
Some participants expressed some concerns about the quality of both the annotation of the training corpus and of the gold standard (the test corpus). In case of future editions of TASS and the reuse of the corpus, more effort must be invested in filtering errors and improving the annotation of the corpora.
Furthermore, as expressed by (Moreno-Ortiz and Pérez-Hernández), there is a need for further discussion about whether differentiating between neutral and no polarity is the best decision, since it is not always clear what the difference is, and, moreover, whether this distinction is interesting from a practical perspective.
In future editions of the workshop, it would be interesting to extend the corpus to other languages (English in particular) to compare the performance of the different approaches on different languages.

References

Saralegi Urizar, Xabier; San Vicente Roncal, Iñaki. 2012. TASS: Detecting Sentiments in Spanish Tweets. TASS 2012 Working Notes.

Fernández Anta, Antonio; Morere, Philippe; Núñez Chiroque, Luis; Santos, Agustín. 2012. Techniques for Sentiment Analysis and Topic Detection of Spanish Tweets: Preliminary Report. TASS 2012 Working Notes.
TASS: Detecting Sentiments in Spanish Tweets
TASS: Detección de Sentimientos en Tuits en Español
Resumen: Este artı́culo describe el sistema presentado por nuestro grupo para la
tarea de análisis de sentimiento enmarcada en la campaña de evaluación TASS 2012.
Adoptamos una aproximación supervisada que hace uso de conocimiento lingüı́stico.
Este conocimiento lingüı́stico comprende lematización, etiquetado POS, etiquetado
de palabras de polaridad, tratamiento de emoticonos, tratamiento de negación, y
ponderación de polaridad según el nivel de anidamiento sintáctico. También se lleva
a cabo un preprocesado para el tratamiento de errores ortográficos. La detección
de las palabras de polaridad se hace de acuerdo a un léxico de polaridad para el
castellano creado en base a dos estrategias: Proyección o traducción de un léxico de
polaridad de inglés al castellano, y extracción de palabras divergentes entre los tuits
positivos y negativos correspondientes al corpus de entrenamiento. Los resultados de
la evaluación final muestran un buen rendimiento del sistema ası́ como una notable
robustez tanto para la detección de polaridad a alta granularidad (65% de exactitud)
como a baja granularidad (71% de exactitud).
Palabras clave: TASS, Análisis de sentimiento, Minerı́a de opiniones, Detección
de polaridad
Abstract: This article describes the system presented for the task of sentiment
analysis in the TASS 2012 evaluation campaign. We adopted a supervised approach
that includes some linguistic knowledge-based processing for preparing the features.
The processing comprises lemmatisation, POS tagging, tagging of polarity words, treatment of emoticons, treatment of negation, and weighting of polarity words according to their syntactic nesting level. A preprocessing step also deals with spelling errors. Polarity words are detected by means of a polarity lexicon for Spanish built following two strategies: projection or translation of an English polarity lexicon into Spanish, and extraction of divergent words between the positive and negative tweets of the training corpus. The final evaluation results show a good performance of the system, as well as notable robustness both for fine-grained polarity detection (65% accuracy) and coarse-grained polarity detection (71% accuracy).
Keywords: TASS, sentiment analysis, opinion mining, polarity detection
the tweets. The TASS evaluation workshop aims "to provide a benchmark forum for comparing the latest approaches in this field".
Our team only took part in the first task, which involved predicting the polarity of a number of tweets with respect to a 6-category classification, indicating whether the text expresses a positive, negative or neutral sentiment, or no sentiment at all. It must be noted that most works in the literature only classify sentiments as positive or negative, and only in a few papers are neutral and/or objective categories included. We developed a supervised system based on a polarity lexicon and a series of additional linguistic features.
The rest of the paper is organized as follows. Section 2 reviews the state of the art in the polarity detection field, placing special interest on sentence-level detection and on Twitter messages in particular. The third section describes the system we developed, the features we included in our supervised system and the experiments we carried out over the training data. The next section presents the results we obtained with our system, first on the training set and later on the test data set. The last section draws some conclusions and future directions.

2 State of the Art

Much work has been done in the last decade in the field of sentiment labelling. Most of these works are limited to polarity detection. Determining the polarity of a text unit (e.g., a sentence or a document) usually includes using a lexicon composed of words and expressions annotated with prior polarities (Turney, 2002; Kim and Hovy, 2004; Riloff, Wiebe, and Phillips, 2005; Godbole, Srinivasaiah, and Skiena, 2007). Much research has been done on the automatic or semi-automatic construction of such polarity lexicons (Riloff and Wiebe, 2003; Esuli and Sebastiani, 2006; Rao and Ravichandran, 2009; Velikovich et al., 2010).
Regarding the algorithms used in sentiment classification, although there are approaches based on averaging the polarity of the words appearing in the text (Turney, 2002; Kim and Hovy, 2004; Hu and Liu, 2004; Choi and Cardie, 2009), machine learning methods have become the more widely used approach. Pang et al. (2002) proposed a unigram model using Support Vector Machines which does not need any prior lexicon to classify movie reviews. Read (2005) confirmed the necessity to adapt the models to the application domain, and (Choi and Cardie, 2009) address the same problem for polarity lexicons.
In the last few years many researchers have turned their efforts to microblogging sites such as Twitter. As an example, (Bollen, Mao, and Zeng, 2010) have studied the possibility of predicting stock market results by measuring the sentiments expressed in Twitter about it. The special characteristics of the language of Twitter require a special treatment when analyzing the messages. A special syntax (RT, @user, #tag, ...), emoticons, ungrammatical sentences, vocabulary variations and other phenomena lead to a drop in the performance of traditional NLP tools (Foster et al., 2011; Liu et al., 2011). In order to solve this problem, many authors have proposed a normalization of the text as a pre-process of any analysis, reporting an improvement in the results. Brody (2011) deals with the word lengthening phenomenon, which is especially important for sentiment analysis because it usually expresses emphasis of the message. (Han and Baldwin, 2011) use morphophonemic similarity to match variations with their standard vocabulary words, although only 1:1 equivalences are treated; e.g., 'imo = in my opinion' would not be identified. Instead, they use an Internet slang dictionary to translate some of those expressions and acronyms. Liu et al. (2012) propose combining three strategies, including letter transformation, "priming" effect, and misspelling corrections.
Once the normalization has been performed, traditional NLP tools may be used to analyse the tweets and extract features such as lemmas or POS tags (Barbosa and Feng, 2010). Emoticons are also good indicators of polarity (O'Connor et al., 2010). Other features analyzed in sentiment analysis, such as discourse information (Somasundaran et al., 2009), can also be helpful. (Speriosu et al., 2011) explore the possibility of exploiting the Twitter follower graph to improve polarity classification, under the assumption that people influence one another or have shared affinities about topics. (Barbosa and Feng, 2010; Kouloumpis, Wilson, and Moore, 2011) combined polarity lexicons with machine learning for labelling the sentiment of tweets. Sindhwani and Melville (2008) adopt a semi-
NONE are higher than the rest. NEU is the class including the least tweets. In addition, each message includes its Twitter ID, the creation date and the Twitter user ID.

    Polarity   #tweets   % of #tweets
    P+         1,764     24.44%
    P          1,019     14.12%
    NEU        610       8.45%
    N          1,221     16.91%
    N+         903       12.51%
    NONE       1,702     23.58%
    Total      7,219     100%

Table 1: Polarity classes distribution in corpus Ct.

To extract the words most associated with a certain polarity, let us say positive, we divided the corpus into two parts: positive tweets and the rest of the corpus. Using the log-likelihood ratio (LLR) we obtained the ranking of the most salient words in the positive part with respect to the rest of the corpus. The same process was conducted to obtain negative candidates. The top 1,000 negative and top 1,000 positive words were manually checked. Among them, 338 negative and 271 positive words were selected for the polarity lexicon (see sixth column in Table 3). We found a higher concentration of good candidates among the best ranked candidates (see Figure 1).
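The LLR ranking described above can be sketched with the standard (Dunning-style) log-likelihood ratio over word counts; the corpora below are toy stand-ins for the positive tweets and the rest of the corpus:

```python
import math
from collections import Counter

def llr(k1, n1, k2, n2):
    """Dunning log-likelihood ratio for a word occurring k1 times in a
    target corpus of n1 tokens and k2 times in a reference corpus of n2."""
    def term(k, n):
        expected = n * (k1 + k2) / (n1 + n2)   # expected count under H0
        return k * math.log(k / expected) if k > 0 else 0.0
    return 2.0 * (term(k1, n1) + term(k2, n2))

def salient_words(target, reference, top=5):
    """Rank the words of `target` by salience w.r.t. `reference`."""
    t, r = Counter(target), Counter(reference)
    n1, n2 = sum(t.values()), sum(r.values())
    scores = {w: llr(t[w], n1, r[w], n2) for w in t}
    return sorted(scores, key=scores.get, reverse=True)[:top]
```

A word frequent in the positive tweets but rare elsewhere ranks high, which is exactly the candidate-selection behaviour the authors exploit.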
As mentioned in section 2, microblogging in general, and Twitter in particular, suffers from a high presence of spelling errors. This hampers any knowledge-based processing as well as supervised methods. We rejected the use of spell-correctors such as the Google spell-checker because they try to treat many correct words that they do not know. Therefore, we apply some heuristics in order to preprocess the tweets and solve the main problems we detected in the training corpus:

• Replication of characters (e.g., "Sueñooo"): sequences of the same character are replaced by a single character when the pre-edited word is not included in Freeling's¹ dictionary and the post-edited word appears in Freeling's dictionary.

• Abbreviations (e.g., "q", "dl", ...): a list of abbreviations is created from the

¹ http://nlp.lsi.upc.edu/freeling

3.2) were selected as features. This allows the system to focus on features that express the polarity, without further noise. Another effect is that the number of features decreases significantly (from 15,069 to 3,730), thus reducing the computational costs of the model.
In our experiments, relying on the polarity lexicon (see Table 4) clearly outperforms the unigram-based baseline. The rest of the features were tested on top of this configuration.

3.3.3 Emoticons and Interjections (EM)

Emoticons and interjections are very strong expressions of sentiments. A list of emoticons was collected from a Wikipedia article about emoticons, and all of them were classified as positive (e.g., ":)", ":D", ...) or negative (e.g., ":(", "u u", ...). 23 emoticons were classified as positive and 35 as negative. A list of 54 negative (e.g., "mecachis", "sniff", ...) and 28 positive (e.g., "hurra", "jeje", ...) interjections, including variants modelled by regular
expressions were also collected from different citly, the polarity words included in the Pes
webs as well as from the training corpora. but not in the training corpus will be used
The frequency of each emoticon and interjec- by the classifier. By dealing with those OOV
tion type (positive or negative) is included as polarity words, our intention is to make our
a feature of the classifier. system more robust.
The number of upper-case letters in the Two new features are created to be
tweet was also used as an orthographical clue. included in the polarity information: a score
In Twitter where it is not possible to use let- of the positivity and a score of the negativity
ter styling, people often use the upper case of a tweet. In principle, positive words
to emphasize their sentiments (e.g., GRA- in Pes add 1 to the positivity score and
CIAS), and hence, a large number of upper- negative words add 1 to the negativity score.
case letters would denote subjectivity. So, However, depending on various phenomena,
the relative number of upper-case letters in a the score of a word can be altered. These
tweet is also included as a feature. phenomena are explained below.
According to the results (see Table 4),
these clues did not provide a significant im- Treatment of Negations and Adverbs
provement. Nevertheless, they did show a The polarity of a word changes if it is
slight improvement. Moreover, other literat- included in a negative clause. Syntactic
ure shows that such features indeed help to information provided by Freeling is used
detect the polarity (Koulompis, 2011). The for detecting those cases. The polarity of a
low impact of these features could be ex- word increases or decreases depending on the
plained by the low density of such elements adverb which modifies it. We created a list of
in our data-set: only 622 out of 7,219 tweets increasing (e.g., “mucho”, “absolutamente”,
in the training data (8.6%) include emoticons ...) and decreasing (e.g., “apenas”, “poco”,
or interjections. Emoticon, interjection and ...) adverbs. If an increasing adverb modify-
capitalization features were included in our ing a polarity word is detected, the polarity
final model. is increased (+1). If it is a decreasing adverb,
3.3.4 POS Information (PO) the polarity of the words is decreased (−1).
Results obtained in the literature are not clear as to whether POS information helps to determine the polarity of texts (Kouloumpis, 2011), but POS tags are useful for distinguishing between subjective and objective texts. Our hypothesis is that certain POS tags, e.g., adjectives, are more frequent in opinion messages. In our experiments we used the POS tags provided by Freeling, taking the frequency of each POS tag in a message as a feature. Results in Table 4 show that this feature provides a notable improvement and is especially helpful for detecting objective messages (see the difference in F-score between SP and SP+PO for the NONE class).

3.3.5 Frequency of Polarity Words (FP)

The SP classifier does not interpret the polarity information included in the lexicon. We explicitly provide that information as a feature to the classifier. Furthermore, without the polarity information, the classifier would be built taking into account only those polarity words appearing in the training data. Including the polarity frequency information explicitly

Syntactic information provided by Freeling is used for detecting these cases.

Syntactic Nesting Level

The importance of a word in the tweet determines the influence it can have on the polarity of the whole tweet. We measured the importance of each word w by calculating its relative syntactic nesting level ln(w). The lower the syntactic level, the less important it is. The relative syntactic nesting level is computed as the inverse of the syntactic nesting level (1/ln(w)).

Features/Metric   Acc. (6 cat.)   P+      P       NEU     N       N+      NONE
Baseline          0.45            0.574   0.267   0.137   0.368   0.385   0.578
SP                0.484           0.594   0.254   0.098   0.397   0.422   0.598
SP+PO             0.496           0.596   0.245   0.093   0.414   0.438   0.634
SP+EM             0.49            0.612   0.253   0.097   0.402   0.428   0.6
SP+FP             0.514           0.633   0.261   0.115   0.455   0.438   0.613
ALL               0.523           0.648   0.246   0.111   0.463   0.452   0.657
ALL+AC1           0.523           0.647   0.248   0.116   0.46    0.451   0.655

Table 4: Accuracy results obtained on the evaluation of the training data. Columns 3 to 8 show F-scores for each of the class values.

Copyright © 2012. Universitat Jaume I. Servei de Comunicació i Publicacions. All rights reserved.
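The relative syntactic nesting level described above can be sketched as follows. The paper does not give the exact representation of Freeling's syntactic output, so this sketch assumes a toy nested-list parse tree (hypothetical), taking ln(w) as the depth of the constituent that contains w and the importance as 1/ln(w):

```python
def nesting_levels(tree, depth=1):
    """Return {word: syntactic nesting level ln(w)} for a parse tree
    given as nested lists, e.g. ["es", ["la", "pelicula"], "buena"].
    Depth 1 is the top-level clause; nested constituents are deeper."""
    levels = {}
    for node in tree:
        if isinstance(node, list):
            levels.update(nesting_levels(node, depth + 1))
        else:
            levels[node] = depth
    return levels

def importance(word, levels):
    """Relative syntactic nesting level 1/ln(w): words in deeply
    nested constituents receive a lower weight."""
    return 1.0 / levels[word]
```

The tree format and the word-per-leaf assumption are illustrative only; the actual system derives nesting levels from Freeling's analysis.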
<i>XVIII Congreso de la Asociación Española para el Procesamiento del Lenguaje Natural</i>, edited by Llavorí, Rafael Berlanga, et al., Universitat Jaume I. Servei de
Comunicació i Publicacions, 2012. ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/bibliotecauptsp/detail.action?docID=4184256.
Created from bibliotecauptsp on 2019-09-28 09:35:08.
XVIII CONGRESO DE LA SOCIEDAD ESPAÑOLA PARA EL PROCESAMIENTO DEL LENGUAJE NATURAL 108
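The polarity-frequency (FP) feature of Section 3.3.5 can be sketched as follows: the counts of positive and negative lexicon words in a message are passed to the classifier as explicit features. The mini-lexicon here is illustrative, not the actual Pes:

```python
# Illustrative mini-lexicon; the real system uses the semi-automatically
# built Spanish polarity lexicon Pes (Section 3.2).
LEXICON = {"bueno": "P", "feliz": "P", "malo": "N", "triste": "N"}

def polarity_frequencies(tokens, lexicon=LEXICON):
    """FP feature: frequencies of positive and negative lexicon words."""
    pos = sum(1 for t in tokens if lexicon.get(t) == "P")
    neg = sum(1 for t in tokens if lexicon.get(t) == "N")
    return {"freq_pos": pos, "freq_neg": neg}
```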
3.3.6 Using Additional Corpora (AC)

Additional training data were retrieved using the Perl Net::Twitter API. Different searches were conducted during June 2012 using the attitude feature of the Twitter search. Using this feature, users can search for tweets expressing either a positive or a negative opinion. The search is based on emoticons, as in (Go et al., 2009). Retrieved tweets were classified according to their attitude.

Corpora/Tweets   P        N       Total
Ctw              11,363   9,865   21,228

Table 5: Characteristics of the tweet corpus collected from Twitter.

The corpus Ctw of retrieved tweets (see Table 5) was used in two ways. On the one hand, we used it to find new words for our polarity lexicon Pes, by using the automatic method described in section 3.2. The first 500 positive candidates and 500 negative candidates were manually checked. Altogether, 110 positive words and 95 negative ones (AC1) were included in the polarity lexicon Pes. According to the results (see ALL+AC1 in Table 4), these new polarity words do not provide any improvement. The reason is that the most relevant polarity words included in the training corpus Ct are already included in Pes, as explained in section 3.2. On the other hand, we added to the training data Ct-train only those tweets of Ctw containing at least one word w from Pes that does not appear in the training corpus (w ∈ Pes ∧ freq(w, Ct-train) = 0). Only 7.9% of the retrieved tweets were added. Results were still unsatisfactory, and so additional training data were left out of the final model.

It must be noted that the tweet retrieval effort was very simple, due to the limited time we had to develop the system. We conclude that these additional training data were unhelpful due to the differences with the original data provided: Ctw contained many more ungrammatical structures and nonstandard tokens than the original data; the dates of the tweets were different, which could even lead to topic and vocabulary differences; and, especially, the additional data collected did not include neutral or objective tweets, nor different degrees of polarity for positive and negative tweets.

Features/Metric   #training examples   Accuracy
ALL               6,137                0.573
ALL+AC2           27,365               0.507
ALL+AC2-OOV       7,807                0.569

Table 6: Results obtained by including additional examples in the training data.
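The selection rule for the additional training examples, keep a retrieved tweet only if it contains some lexicon word that is absent from the training corpus (w ∈ Pes ∧ freq(w, Ct-train) = 0), can be sketched as:

```python
def select_additional(tweets, lexicon, train_vocab):
    """Keep the tweets of Ctw containing at least one polarity-lexicon
    word that never occurs in the training corpus Ct-train."""
    selected = []
    for tweet in tweets:
        tokens = set(tweet.split())
        if any(w in lexicon and w not in train_vocab for w in tokens):
            selected.append(tweet)
    return selected
```

Tokenization by whitespace is a simplification; the actual system tokenizes tweets with its own preprocessing pipeline.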
creases significantly in the test data with respect to the training data. By contrast, the NEU and P classes decreased dramatically. The distribution difference, together with the performance of the system on specific classes, could explain the difference in accuracy between the test and training evaluations. It remains unclear to us why the F-scores for all the classes improved with respect to the training phase. We should analyse the characteristics of the training and test corpora, looking for differences in the samples and annotation.

As for the results of the individual classes, it is worth mentioning that neutral tweets are very difficult to classify because they do contain polarity words. We looked at the confusion matrix for this class (both for training and test evaluations), and it shows that wrongly classified NEU tweets are evenly distributed among the other classes, except for the NONE class, with almost no NEU tweets classified as NONE. Most of the NEU

system effectively combines several features based on linguistic knowledge. In our case, using a semi-automatically built polarity lexicon improves the system performance significantly over a unigram model. Other features, such as POS tags and especially word polarity statistics, were also found to be helpful. In our experiments, including external training data was unsuccessful. However, our approach was very simple, and so a more exhaustive experimentation should be carried out in order to obtain conclusive results. In any case, the system shows robust performance when it is evaluated against test data different from the training data.

There is still much room for improvement. Tweet normalization was naïvely implemented. Some authors (Pang and Lee, 2004; Barbosa and Feng, 2010) have obtained positive results by including a subjectivity analysis phase before the polarity detection step. We would like to explore that line of work. Lastly, it would be worthwhile conducting
in-depth research into the creation of polarity lexicons, including domain adaptation and treatment of word senses.

Acknowledgments

This work has been partially funded by the Industry Department of the Basque Government under grant IE11-305 (knowTOUR project).

References

Barbosa, Luciano and Junlan Feng. 2010. Robust sentiment detection on Twitter from biased and noisy data. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, COLING '10, pages 36–44, Stroudsburg, PA, USA. Association for Computational Linguistics.

Bollen, Johan, Huina Mao, and Xiao-Jun Zeng. 2010. Twitter mood predicts the stock market. arXiv:1010.3003, October.

Brody, Samuel and Nicholas Diakopoulos. 2011. Cooooooooooooooollllllllllllll!!!!!!!!!!!: Using word lengthening to detect sentiment in microblogs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '11, pages 562–570. Association for Computational Linguistics.

Conference on Artificial Intelligence, August.

Go, A., R. Bhayani, and L. Huang. 2009. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, pages 1–12.

Godbole, N., M. Srinivasaiah, and S. Skiena. 2007. Large-scale sentiment analysis for news and blogs. In Proceedings of the International Conference on Weblogs and Social Media (ICWSM), pages 219–222.

Hall, Mark, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. 2009. The WEKA data mining software: An update. SIGKDD Explorations Newsletter, 11(1):10–18, November.

Han, Bo and Timothy Baldwin. 2011. Lexical normalisation of short text messages: Makn sens a #twitter. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 368–378, Portland, Oregon, USA, June. Association for Computational Linguistics.

Hu, M. and B. Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Liu, X., S. Zhang, F. Wei, and M. Zhou. 2011. Recognizing named entities in tweets. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT 2011), Portland, Oregon.

O'Connor, Brendan, Ramnath Balasubramanyan, Bryan R. Routledge, and Noah A. Smith. 2010. From tweets to polls: Linking text sentiment to public opinion time series. In Fourth International AAAI Conference on Weblogs and Social Media, May.

Pang, Bo and Lillian Lee. 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, ACL '04, Stroudsburg, PA, USA. Association for Computational Linguistics.

Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up?: Sentiment classification using machine learning techniques. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10, EMNLP '02, pages 79–86, Stroudsburg, PA, USA. Association for Computational Linguistics.

Rao, Delip and Deepak Ravichandran. 2009. Semi-supervised polarity lexicon induction. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, EACL '09, pages 675–682, Stroudsburg, PA, USA. Association for Computational Linguistics.

Read, Jonathon. 2005. Using emoticons to reduce dependency in machine learning techniques for sentiment classification. In Proceedings of the ACL Student Research Workshop, ACLstudent '05, pages 43–48, Stroudsburg, PA, USA. Association for Computational Linguistics.

Riloff, E., J. Wiebe, and W. Phillips. 2005. Exploiting subjectivity classification to improve information extraction. In Proceedings of the National Conference on Artificial Intelligence, volume 20, page 1106.

Riloff, Ellen and Janyce Wiebe. 2003. Learning extraction patterns for subjective expressions. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pages 105–112.

Somasundaran, Swapna, Galileo Namata, Janyce Wiebe, and Lise Getoor. 2009. Supervised and unsupervised methods in employing discourse relations for improving opinion polarity classification. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1, EMNLP '09, pages 170–179, Stroudsburg, PA, USA. Association for Computational Linguistics.

Speriosu, Michael, Nikita Sudan, Sid Upadhyay, and Jason Baldridge. 2011. Twitter polarity classification with label propagation over lexical links and the follower graph. In Proceedings of the First Workshop on Unsupervised Learning in NLP, EMNLP '11, pages 53–63, Stroudsburg, PA, USA. Association for Computational Linguistics.

Turney, Peter D. 2002. Thumbs up or thumbs down?: Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, ACL '02, page 417, Philadelphia, Pennsylvania.

Velikovich, Leonid, Sasha Blair-Goldensohn, Kerry Hannan, and Ryan McDonald. 2010. The viability of web-derived polarity lexicons. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, HLT '10, pages 777–785, Stroudsburg, PA, USA. Association for Computational Linguistics.

Wilson, Theresa, Paul Hoffmann, Swapna Somasundaran, Jason Kessler, Janyce Wiebe, Yejin Choi, Claire Cardie, Ellen Riloff, and Siddharth Patwardhan. 2005. OpinionFinder. In Proceedings of HLT/EMNLP on Interactive Demonstrations, pages 34–35, Vancouver, British Columbia, Canada.
Techniques for Sentiment Analysis and Topic Detection of Spanish Tweets: Preliminary Report∗

Técnicas de análisis de sentimientos y detección de asunto de tweets en español: informe preliminar
ber of methods and techniques have been proposed in the literature to solve them. Most of these techniques focus on English texts and study large documents. In our work, we are interested in languages other than English and in micro-texts. In particular, we are interested in sentiment and topic classification applied to Spanish Twitter micro-blogs. Spanish is increasingly present on the Internet, and Twitter has become a popular medium to publish thoughts and information, with its own characteristics. For instance, publications in Twitter take the form of tweets (i.e., Twitter messages), which are micro-texts with a maximum of 140 characters. In Spanish tweets it is common to find specific Spanish elements (SMS abbreviations, hashtags, slang). The combination of these two aspects makes this a distinctive research topic, with potentially deep industrial applications.

The motivation of our research is twofold. On the one hand, we would like to know whether the usual approaches that have proved to be effective with English text are also effective with Spanish tweets. On the other, we would like to identify the best (or at least a good) technique for Spanish tweets. For this second question, we would like to evaluate the techniques proposed in the literature, and possibly propose new ad hoc techniques for our specific context. In our study, we try to sketch out a comparative study of several schemes of term weighting, linguistic preprocessing (stemming and lemmatization), term definition (e.g., based on uni-grams or n-grams), the combination of several dictionaries (sentiment, SMS abbreviations, emoticons, spelling, etc.) and the use of several classification methods. When possible, we have used freely available tools, like the Waikato Environment for Knowledge Analysis (WEKA, an open source software package which consists of a collection of machine learning algorithms for data mining) (at University of Waikato, 2012).

1.1 Related Work

As mentioned above, sentiment analysis, also known as opinion mining, is a challenging Natural Language Processing (NLP) problem. Due to its tremendous value for practical applications, it has received a lot of attention, and it is perhaps one of the most widely studied topics in the NLP field. Pang and Lee (Pang and Lee, 2008) provide a comprehensive survey of sentiment analysis and opinion mining research. Liu (Liu, 2010), for his part, reviews and discusses a wide collection of related works. Although most of the research conducted focuses on English texts, the number of papers on the treatment of other languages is increasing every day. Examples of research papers on Spanish texts are (Brooke, Tofiloski, and Taboada, 2009; Martínez-Cámara, Martín-Valdivia, and Ureña-López, 2011; Martínez Cámara et al., 2011).

Most of the algorithms for sentiment analysis and topic detection use a collection of data to train a classifier that is later used to process the real data. The (training and real) data is processed before being used for (building or applying) the classifier, in order to correct errors and extract the main features (to reduce the required processing time or memory). Many different techniques have been proposed for these phases. For instance, different classification methods have been proposed, like Naive Bayes, Maximum Entropy, Support Vector Machines (SVM), BBR, KNN, or C4.5. In fact, there is no final agreement on which of these classifiers is the best. For instance, Go et al. (Go, Bhayani, and Huang, 2009) report similar accuracy with classifiers based on Naive Bayes, Maximum Entropy, and SVM.

Regarding preprocessing of the data (texts in our case), one of the first decisions to be made is which elements will be used as basic terms. Laboreiro et al. (Laboreiro et al., 2010) explore tweet tokenization (or symbol segmentation) as the first key task for text processing. Once single words or terms are available, typical choices are using uni-grams, bi-grams, n-grams, or parts of speech (POS). Again, there is no clear conclusion on which is the best option, since Pak and Paroubek (Pak and Paroubek, 2010) report the best performance with bi-grams, while Go (Go, Bhayani, and Huang, 2009) present better results with unigrams. The preprocessing phase may also involve word-level processing of the input texts: stemming, spelling and/or semantic analysis. Tweets are usually very short, containing emoticons like :) or :-), or abbreviated (SMS) words like "Bss" for "Besos" ("kisses"). Agarwal et al. (Agarwal et al., 2011) propose the use of several dictionaries: an emoticon dictionary and an acronym dictionary.
Other preprocessing tasks that have been proposed are contextual spell-checking and name normalization (Kukich, 1992).

One important question is whether the algorithms and techniques proposed for one type of data can be directly applied to tweets. This would be very convenient, since a corpus of Spanish movie reviews (from Muchocine¹) has already been collected and studied (Cruz et al., 2008; Martínez Cámara et al., 2011). Unfortunately, Twitter data poses new and different challenges, as discussed by Agarwal et al. (Agarwal et al., 2011) when reviewing some early and recent results on sentiment analysis of Twitter data (e.g., (Go, Bhayani, and Huang, 2009; Bermingham and Smeaton, 2010; Pak and Paroubek, 2010)). Engström (Engström, 2004) has also shown that the bag-of-features approach is topic-dependent, and Read (Read, 2005) demonstrated how models are also domain-dependent.

These papers, as expected, use a broad spectrum of tools for the extraction and classification processes. For feature extraction, FreeLing (Padró et al., 2010) has been proposed, which is a powerful open-source language processing software package. We use it as analyzer and for lemmatization. For classification, Justin et al. (Justin et al., 2010) report very good results using WEKA (at University of Waikato, 2012; Hall et al., ), which is one of the most widely used tools for the classification phase. Other authors propose the use of additional libraries like LibSVM (Chang and Lin, 2011). In contrast, some authors (e.g., (Phuvipadawat and Murata, 2010)) propose the utilization of Lucene (Lucene, 2005) as index and text search engine.

Most of the references above have to do with sentiment analysis, since this is a very popular problem. However, the problem of topic detection is also becoming popular (Sriram et al., 2010), among other reasons to identify trending topics (Allan, 2002; Bermingham and Smeaton, 2010; Lee et al., 2011). Due to the real-time nature of Twitter data, most works (Mathioudakis and Koudas, 2010; Sankaranarayanan et al., 2009; Vakali, Giatsoglou, and Antaris, 2012; Phuvipadawat and Murata, 2010) are interested in breaking-news detection and tracking. They propose methods for the classification of tweets in an open (dynamic) set of topics. Instead, in our work we are interested in a closed (fixed) set of topics. However, we explore all the index and clustering techniques proposed, since most of them could be applied to the sentiment analysis process.

1.2 Contributions

In this paper we have explored the performance of several preprocessing, feature extraction, and classification methods on a corpus of Spanish tweets, both for sentiment analysis and for topic detection. The different methods considered can be classified into almost orthogonal families, so that a different method can be selected from each family to form a different configuration. In particular, we have explored the following families of methods.

Term definition and counting In this family it is decided what constitutes a basic term to be considered by the classification algorithm. The alternatives are using single words (uni-grams) or groups of words (bi-grams, tri-grams, n-grams) as basic terms. Of course, the aggregation of all these alternatives is possible, but it is typically never used because it results in a huge number of different terms, which makes the processing hard or even impossible. Each of the different terms that appears in the input data is called an attribute by classification algorithms. Once the term formation is defined, the list of attributes in the input data is found, and the occurrences of each attribute are counted.

Stemming and lemmatization One of the main differences between Spanish and English is that English is a weakly inflected language, in contrast to Spanish, a highly inflected one. A part of our work is the stemming and lemmatization process. In order to reduce the feature dimension (number of attributes), each word can be reduced either to its lemma (canonical form) (e.g., "cantábamos" is reduced to its infinitive "cantar") or to its stem (e.g., "cantábamos" is reduced to "cant"). One interesting question is to compare how well the usual stemming and lemmatization processes perform with Spanish words.

Word processing and correction Several dictionaries are available to correct the

¹ http://www.muchocine.net
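The term definition and counting family described above can be sketched as follows: a toy attribute extractor that turns a tokenized tweet into n-gram terms and counts their occurrences. This is an illustrative sketch, not the system's actual implementation:

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined with underscores
    so each n-gram becomes a single attribute name."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_attributes(tokens, n=1):
    """Attribute vector of a tweet: occurrence count of each n-gram term."""
    return Counter(ngrams(tokens, n))
```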
we limit n to be no larger than 3. Of course, it is possible to combine n-grams with several values of n. We only consider the possibility of combining two such values, and one has to be n = 1. This is controlled with the flag Only n-gram (see Table 1), which says whether only n-grams (with n > 1) are considered as terms or whether individual words (unigrams) are also considered. In the latter case, the lists of attributes of both cases are merged. The drawback of merging is the high number of entries in the final attribute list. Hence, when doing this, a

dictionary (see below) we may not use the input data. This is controlled with a parameter that we denote Use input data (see Table 1). Moreover, even if the input data is processed, we may filter it and only keep some of it; for instance, we may decide to use only nouns. This can be controlled with the parameter Word types (see Table 1), which is described below. In summary, the list of attributes is built from the input data (if so decided), preprocessed as determined by the rest of the parameters (e.g., filtered Word types), and potentially from additional data (like the af-
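The Only n-gram option described above can be sketched as follows; the parameter name comes from the text, while the function itself is illustrative:

```python
def build_terms(tokens, n, only_ngram):
    """Terms of a tweet for the classifier: n-grams alone when
    only_ngram is True, otherwise n-grams merged with unigrams."""
    grams = ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if only_ngram or n == 1:
        return grams
    # Merging enlarges the attribute list considerably (see text).
    return tokens + grams
```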
cess of extracting it is called lemmatization or stemming, respectively. Lemmatization preserves the meaning and type of a word (e.g., the words "buenas" and "buenos" become "bueno"). We have used the FreeLing software to perform this processing, since it can provide the lemma of those words that are in its dictionary. After lemmatization, there are no plurals or other inflected forms, but two words with the same root but different type may still appear. Stemming, in turn, reduces the list of attributes even more. A stem is a word whose affixes have been removed. Stemming might lose the meaning and any morphological information that the original word had (e.g., the words "aparca", a verb, and "aparcamiento", a noun, both become "aparc"). The Snowball (Sno, 2012) software stemmer has been used in our experiments.

We have decided to always use one of the two processes. Which one is used in a particular configuration is controlled with the parameter Lemma/Stem (see Table 1).

2.3 Word Processing and Correction

As mentioned above, one of the possible preprocessing steps of the data before extracting attributes and vectors is to correct spelling errors. Whether or not this step is taken is controlled with the flag Correct words (see Table 1). If correction is done, the algorithm uses the Hunspell dictionary (Hun, 2012) (an open source spell-checker) to perform it.

Another optional preprocessing step (controlled with the flag SMS) expands the emoticons, shorthand notations, and slang commonly used in SMS messages, which are not understandable by the Hunspell dictionary. The use of these abbreviations is common in tweets, given the limitation to 140 characters. An SMS dictionary (dic, 2012) is used to do this preprocessing. It transforms the SMS notations into words understandable by the main dictionary. Also, the emoticons are replaced by words that describe their meaning. For example, :-) is replaced by feliz ("happy") and :-( by triste ("sad"). Emoticons tend to have strong emotional semantics. Hence, this process helps in estimating the sentiment of tweets with emoticons.

We have observed that the information of a sentence is mainly located in a few keywords. These keywords have a different type according to the information we are interested in. For topic estimation, the keywords are mainly nouns and verbs, whereas for sentiment analysis they are adjectives and verbs. For example, in the sentence La pelicula es buena ("The movie is good"), the only word carrying topic information is the noun pelicula, which is very specific to the cinema topic. Besides, the word that best reflects the sentiment of the sentence is the adjective buena, which is positive. Also, in the sentence El equipo ganó el partido ("The team won the match"), the verb ganó carries information for both topic and sentiment analysis: the verb ganar is used very often in the soccer and sport topics and has a positive sentiment. We allow filtering the words of the input data by their type with the parameter Word types (see Table 1). The filtering is done using the FreeLing software, which is used to retrieve the type of each word.

When performing sentiment analysis, we have found it useful to have an affective dictionary, whose use is controlled with the flag Affective dictionary (see Table 1). We have used an affective dictionary developed by Martín García (García, 2009). This dictionary consists of a list of words that have a positive or negative meaning, expanded with their polarity "P" or "N" and their strength "+" or "-". For example, the words bueno ("good") and malo ("bad") are respectively positive and negative with no strength, whereas the words mejor ("best") and peor ("worse") are respectively positive and negative with a positive strength. As a first approach, we have not intensively used the polarity and the strength of the affective words in the dictionary. Its use only forces the words it contains to be added as attributes. This has the advantage of drastically reducing the size of the attribute list, especially if the input data is filtered. Observe that the use of this dictionary for sentiment analysis is very pertinent, since the affective words carry the tweet polarity information. In a more advanced future approach, the characteristics of the words could be used to compute weights. Since not all the words in our affective dictionary may appear in the corpus we have used, we have built artificial vectors for the learning machine. There is one artificial vector per sentiment analysis category (positive+, positive, negative, negative+, none), which has been built by counting one occurrence of those words
whose polarity and strength match with the which is the hashtag of the Barcelona
appropriate category. soccer team, it can almost doubtlessly
be classified in a soccer tweet.
2.4 Valence Shifters
• References (a “@” followed by the user-
There are two different aspects of valence
name of the referenced user). It is used
shifting that are used in our methods. First,
to reference other Twitter users. Any
we may take into account negations that
user can be referenced. For example,
can invert the sentiment of positive and neg-
@username means the tweet is answer-
ative terms in a tweet. Second, we may
ing a tweet of username, or referring to
take weighted words, which are intensifiers
his/her. References are interesting be-
or weakeners, into account. Whether these
cause some users appear more frequently
cases are processed is controlled by the flags
in certain topics and will more likely
Negation and Weight (see Table 1).
tweet about them. A similar behaviour
Negations are words that reverse the senti-
can be found for sentiment.
ment of other words. For example, in the sen-
tence La pelicula no es buena (“The movie • Links (a URL). Because of the charac-
is not good”), the word buena is positive ter limitation of the tweets, users often
whereas it should be negative because of the include URLs of webpages where more
negation no. The way we process negations details about the message can be found.
is as follows. Whenever a negative word is This may help obtaining more context,
found, the sign of the 3 terms that follow it specially for topic detection.
is reversed. This allows us to differentiate a
In our algorithms, we have the possibil-
positive buena from a negative buena. The
ity of including hashtags and references as
area of effect of the negation is restricted to
attributes. This is controlled by the flags
avoid false negative words in more sophisti-
Hashtags and Author tags (see Table 1), re-
cated sentences.
spectively. We believe that these options are
Other valence shifters are words that
just a complement to previous methods and
change the degree of the expressed senti-
cannot be used alone, because we have found
ment. Examples of these are, for instance
that the number of hashtags and references
muy (“very”), which increases the degree,
in the tweets is too small.
or poco (“little”), which decreases it. These
We also provide the possibility of adding
words were included in the dictionary de-
to the terms of a tweet the terms obtained
veloped by Martı́n Garcı́a (Garcı́a, 2009) as
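The negation and weighting scheme described in Section 2.4 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tiny dictionaries are invented, and only the window size of 3 terms is taken from the text.

```python
# Sketch of the valence-shifter handling of Section 2.4: a negation word
# reverses the sign of the 3 terms that follow it, and an intensifier or
# weakener scales the score of the next term. Dictionaries are toy examples.

NEGATIONS = {"no", "nunca", "jamas"}
WEIGHTS = {"muy": 2.0, "poco": 0.5}          # intensifier / weakener
POLARITY = {"buena": 1.0, "mala": -1.0}      # toy affective dictionary

NEGATION_WINDOW = 3  # the paper reverses the sign of the 3 following terms

def tweet_score(tokens):
    score = 0.0
    negated_left = 0   # how many upcoming terms still get their sign flipped
    weight = 1.0       # multiplier set by a preceding intensifier/weakener
    for tok in tokens:
        if tok in NEGATIONS:
            negated_left = NEGATION_WINDOW
            continue
        if tok in WEIGHTS:
            weight = WEIGHTS[tok]
            continue
        term = POLARITY.get(tok, 0.0) * weight
        if negated_left > 0:     # every term consumes a slot of the window
            term = -term
            negated_left -= 1
        score += term
        weight = 1.0
    return score

# buena falls inside the 3-term window that follows "no"
print(tweet_score("la pelicula no es buena".split()))   # -1.0
print(tweet_score("la pelicula es muy buena".split()))  # 2.0
```

Restricting the window to 3 terms is what limits the "area of effect" mentioned in the text, so a negation early in a long sentence does not flip unrelated words at its end.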
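The artificial training vectors described at the start of this section (one bag-of-words vector per sentiment category, counting one occurrence of each dictionary word whose polarity and strength match that category) could be built along these lines. The dictionary entries and data structures here are illustrative assumptions, not the actual resources:

```python
from collections import Counter

# Toy affective dictionary: word -> sentiment category.
# The real system uses the categories positive+, positive,
# negative, negative+ and none, with polarity and strength.
DICTIONARY = {
    "genial":   "positive+",
    "buena":    "positive",
    "mala":     "negative",
    "horrible": "negative+",
}

CATEGORIES = ["positive+", "positive", "negative", "negative+", "none"]

def artificial_vectors(dictionary):
    """One artificial bag-of-words vector per sentiment category, counting
    one occurrence of every dictionary word that matches the category."""
    vectors = {cat: Counter() for cat in CATEGORIES}
    for word, category in dictionary.items():
        vectors[category][word] += 1
    return vectors

vecs = artificial_vectors(DICTIONARY)
print(vecs["positive+"])  # Counter({'genial': 1})
```

Each of these synthetic vectors is then handed to the learning machine as an extra labelled instance of its category.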
[…] mine of information, and search engines can be used to retrieve it. We have used this technique to obtain many keywords and a context from just a few words taken from the tweets. For implementation reasons, Bing (Bin, 2012) was chosen for the process. The title and description of the first 10 results of the search are kept and processed in the same way as the words of the tweet. We found that we obtain better results by searching in Bing with only the nouns contained in the tweet; therefore, this is the option we chose. The activation of this option is controlled with the flag Search engine.

2.6 Classification Methods

The Waikato Environment for Knowledge Analysis (WEKA) (at University of Waikato, 2012) is a collection of machine learning algorithms that can be used for classification and clustering. The workbench includes algorithms for classification, regression, clustering, attribute selection and association rule mining. Almost all popular classification algorithms are included. WEKA includes several Bayesian methods, decision tree learners, random trees and forests, etc. It also provides several separating-hyperplane approaches and lazy learning methods.

Since we use WEKA as the learning machine, it is worth knowing that each element in the learning machine data set will be called an […]

[…] noticed that WEKA is more efficient when there is a smaller number of attributes. Second, a smaller file avoids lack-of-memory issues: a great amount of memory, proportional to the file size, is needed while WEKA builds a model.

Once the ARFF file is available, we are able to run all the classification algorithms that WEKA provides. However, due to time limits we will concentrate on only a few below.

3 Experimental Results

3.1 Data Sets

We have used a corpus of tweets provided for the TASS workshop at the SEPLN 2012 conference (TAS, 2012) as the input data set. This set contains about 70,000 tweets provided as tuples ID, date, userID. Additionally, over 7,000 of the tweets were given as a small training set with both a topic classification (chosen from politics, economy, technology, literature, music, cinema, entertainment, sports, soccer or others) and a sentiment (or polarity) classification (chosen from strong positive, positive, neutral, negative, strong negative or none). The data set was shuffled for the topics and sentiments to be randomly distributed. Due to the long time taken by the experiments with the large data set, most of the experiments presented used the small data set, with 5,000 tweets for training and 2,000 for evaluation.
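The web-expansion step described in the paragraph on Bing above (query with only the nouns of the tweet, keep the title and description of the first 10 results, and process that text like the tweet's own words) can be sketched as follows. The search function is injected as a parameter because the actual Bing API calls are not shown in the text; everything here is an illustrative stand-in:

```python
# Sketch of the query-expansion step: search the web using only the nouns
# of the tweet and add the words of the first results to the tweet's terms.
# `search` is a placeholder for the Bing query used in the paper.

def expand_tweet(tokens, pos_tags, search, n_results=10):
    # Keep only nouns for the query; the paper reports this gives
    # better results than querying with the whole tweet.
    nouns = [t for t, tag in zip(tokens, pos_tags) if tag.startswith("N")]
    extra = []
    for title, description in search(" ".join(nouns), n_results):
        # Result text is processed in the same way as the tweet's words.
        extra.extend(title.split())
        extra.extend(description.split())
    return tokens + extra

# Example with a stand-in search function returning (title, description) pairs:
fake_search = lambda query, n: [("Estreno de la pelicula", "Critica y reparto")]
print(expand_tweet(["gran", "pelicula"], ["ADJ", "NOUN"], fake_search))
# original terms followed by the words of the search results
```

In the real pipeline the returned terms would then go through the same dictionary lookup and valence-shifter processing as the original tweet words.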
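For reference, the ARFF file mentioned in Section 2.6 follows WEKA's standard format. A minimal, invented example (the real files plausibly use one numeric attribute per term plus the class attribute):

```
@relation tweets

@attribute buena numeric
@attribute mala numeric
@attribute sentiment {positive+,positive,neutral,negative,negative+,none}

@data
1,0,positive
0,2,negative
```

Since every distinct term becomes an attribute, the file size grows quickly with the vocabulary, which is the memory concern raised above.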
[…] subset. The latter used Naive Bayes Multinomial on data preprocessed by using the affective dictionary, filtering words and keeping only adjectives and verbs (adjectives were stemmed, and verbs were lemmatized), using the SMS dictionary, and processing negations at the sentence level. The accuracy reported on the large data set was 36.04%.

Since the mentioned results were submitted, we have worked on making the algorithm more flexible, so that it is simpler to activate and deactivate certain processes. This has led to a slightly different behaviour from the submitted version, but we believe it has resulted in an improvement in accuracy.

3.3 Process to Obtain the New Experimental Results

As mentioned, the algorithm used for obtaining the new experimental results is more flexible and can be configured with the parameters defined in Table 1. In addition, all classification methods of WEKA can be used. Unfortunately, it is unfeasible to execute all possible configurations with all possible classification methods. Hence, we have made some decisions to limit the number of experiments.

First, we have chosen only five classification algorithms from those provided by WEKA. In particular, we have chosen the methods IBk, Complement Naive Bayes, Naive Bayes Multinomial, Random Committee, and SMO. This set tries to cover the most popular classification techniques. Several configurations of the parameters from Table 1 will be evaluated with these 5 methods.

Second, we have chosen for each of the two problems (topic and sentiment) a basic configuration. In each case, the basic configuration is as close as possible to the configuration used to obtain the submitted results. (Since the algorithm has been modified to add flexibility, the exact submitted configuration could not be used.) The reason for choosing these as basic configurations is that they were found to be the most accurate among those explored before submission. Then, starting from this basic configuration, a sequence of derived configurations is tested. In each derived configuration, one of the parameters of the basic configuration was changed, in order to explore the effect of that parameter on the performance. Finally, for each classification method a new configuration is created and tested with the parameter settings that maximized the accuracy.

The accuracy values computed in each of the configurations with the five methods on the small data set are presented in Figures 1 and 2. In both figures, Configuration 1 is the basic configuration. The derived configurations are numbered 2 to 9. (Observe that each accuracy value that improves over the accuracy of the basic configuration is shown in boldface.) Finally, the last 5 configurations of each figure correspond to the parameter settings that gave the highest accuracy in the prior configurations for a method (in the order IBk, Complement Naive Bayes, Naive Bayes Multinomial, Random Committee, and SMO).

3.4 Topic Estimation Results

As mentioned, Figure 1 presents the accuracy results for topic detection on the small data set, under the basic configuration (Configuration 1), configurations derived from this one by toggling every parameter one by one (Configurations 2 to 9), and the seemingly best parameter settings for each classification method (Configurations 10 to 14). Observe that there is no derived configuration with the search engine flag set. This is because the ARFF file generated in that configuration after searching the web as described above (even for the small data set) was extremely large, and the experiment could not be completed.

The first fact to be observed in Figure 1 is that Configuration 1, which is supposed to be similar to the one used for the submitted results, seems to have a better accuracy with some methods (more than 56% versus 45.24%). However, it must be noted that this accuracy has been computed with the small data set (while the value of 45.24% was obtained with the large one). A second observation is that in the derived configurations there is no parameter that, by changing its setting, drastically improves the accuracy. This also applies to the rightmost configurations, which combine the best collection of parameter settings.

Finally, it can be observed that the largest accuracy is obtained by Configuration 2 with Complement Naive Bayes. This configuration is obtained from the basic one by simply removing the word filter that allows only