Parlor Slides

GENDER/GENRE:
GENDER DIFFERENCES IN
PROFESSIONAL WRITING
Image: flickr/srqpix CC BY 2.0
Brian N. Larson
29 October 2014
Current Research in Writing Studies
Housekeeping
www.Rhetoricked.com (these slides + some
additional)
Communicate with me:
@Rhetoricked
Larson@Rhetoricked.com
Research supported by:

Graduate Research Partnership Program fellowship (U of M
CLA), 2012
James I. Brown Summer Research Fellowship, 2014
www.Rhetoricked.com
@Rhetoricked
Gender, sex,
and research constructs
When I talk about my own data, I'll refer to
Gender F authors/writers
Gender M authors/writers
These categories may or may not
correspond to other researchers'
{woman, female, feminine}
{man, male, masculine}
That's the subject of another talk (or for
Q&A)
Many researchers have asked

Do men and women communicate
differently?
Much work inspired by Robin Lakoff (1975)
Scholarly and popular works by Deborah
Tannen (e.g. 1990[2001]) and others
Much of this research in oral/face-to-face
communication
Writing:
Process and product
In writing studies, we can (roughly)
divide process and product
Do men and women produce writing using
different processes?
Is the writing they produce distinguishable
based on author gender?
Previous studies:
Process research
Focus on interpersonal communications
in mixed-gender contexts
Lay, 1989 (Schuster); Rehling, 1996; Raign
& Sims, 1993; Tong & Klecun, 2004; Wolfe
& Alexander, 2005; Brown & Burnett, 2006;
Wolfe & Powell, 2006, 2009.
Previous studies:
Product research
In technical and professional
communication
Sterkel, 1988 (20 stylistic characteristics)
Smeltzer & Werbel, 1986 (16 stylistic and
evaluative measures)
Tebeaux, 1990 (quality of responses)
Allen, 1994 (markers of authoritativeness)
Manual methods, small samples

Enter computational methods

Natural language processing (NLP)
Allows processing of large quantities of
text data
Study that attracted my attention
Koppel, Argamon & Shimoni, 2002
(machine-learning algorithms)
Argamon et al., 2003 (statistical analysis)
I'll focus on Argamon et al. in this talk
Argamon et al. 2003

Used 500 published texts from BNC
Mean 34,000 words (tokens) per text
Statistical analysis showed
correspondence to Biber's (1995)
informational/involved dimension
Gender in computer-mediated
communication (CMC)
CMC popular for NLP studies
Data are readily available
Data are voluminous
Examples
Herring & Paolillo, 2006 (blog posts, stat analysis)
Yan & Yan, 2006 (blog posts, MLA analysis)
Argamon et al., 2007 (blog posts, MLA analysis)
Rao et al., 2010 (Twitter, MLA analysis)
Burger et al., 2011 (Twitter, MLA analysis)
Rationale:
Why is the question important?
Lend support to one or more theories of
gender
Two cultures (Maltz & Borker, 1982)
Standpoint (Barker & Zifcak, 1999)
Performative (Butler 1993, 1999, 2004)
Others
Sorting out methodological problems,
particularly use of gender as a variable
Study design goals

Research questions
Did Gender F and Gender M writers in a disciplinary
genre in which they are being trained use lexical and
quasi-syntactic stylistic features with relative
frequencies that varied with their genders?
If so, did the differences appear in interpretable
patterns?
Examine a corpus of texts

All of the same genre
Where we can be confident of single authorship
Where author gender is self-identified
Data collection
Major writing project at end of first year of
law school
Students address hypothetical problem
(writing in same genre)
Students not allowed to collaborate
Plagiarism difficult (but still possible)
Students self-identified gender*

193 texts (mean word tokens = 3764)
*This study IRB-approved (UMN Study #1202E10685)
Text genre: Memorandum
regarding motion to dismiss
Written to hypothetical court
Supporting or opposing a motion before
the court
High-level organization is formulaic
Memorandum Sections
Caption**
Introduction/summary*
Facts
Legal standard of review*
Argument
Conclusion
Signature block**
* Not always present.
**I did not analyze (content is highly formulaic)
Feature (variable)
selection
For now, those of Argamon et al. 2003
Relative frequencies of
429 function words (Argamon used 405)
45 parts of speech from the Penn
Treebank tagset (Argamon used 76 BNC
POS tags)
100 common part-of-speech bigrams
500 common POS trigrams
Part-of-speech tags?
Bigrams & trigrams?
First, tokenize each sentence
(automated):
My aunt's pen is on the table.
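The tokenizing step can be sketched as follows. This is a minimal illustration, assuming a simplified regular-expression tokenizer in the spirit of Penn Treebank conventions (the possessive clitic 's and the final period become separate tokens); it is not the tokenizer used in the study, and the names `TOKEN_PATTERN` and `tokenize` are hypothetical.

```python
import re

# Simplified pattern (an assumption, not the study's tokenizer):
# a clitic such as 's, a run of word characters, or one punctuation mark.
TOKEN_PATTERN = re.compile(r"'\w+|\w+|[^\w\s]")

def tokenize(sentence):
    """Split a sentence into word, clitic, and punctuation tokens."""
    return TOKEN_PATTERN.findall(sentence)

tokens = tokenize("My aunt's pen is on the table.")
print(tokens)
print(len(tokens), "tokens")
```

A production tokenizer handles many more cases (for example, splitting "don't" into "do" and "n't"), which this sketch does not attempt.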
POS tags
Purple words are function words
Tag the parts of speech (automated)

Then calculate relative frequency of
function words and POS tags
(automated)
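A rough sketch of the relative-frequency step, assuming the example sentence has already been tagged with Penn Treebank tags. The tags below were assigned by hand for illustration (an automated tagger would supply them), the four-word function-word list is a stand-in for the 429 words used in the study, and counting punctuation toward the token total is an assumption.

```python
from collections import Counter

# Hand-tagged example sentence (token, Penn Treebank tag).
tagged = [("My", "PRP$"), ("aunt", "NN"), ("'s", "POS"), ("pen", "NN"),
          ("is", "VBZ"), ("on", "IN"), ("the", "DT"), ("table", "NN"),
          (".", ".")]

# Tiny illustrative list; hypothetical stand-in for the full list.
FUNCTION_WORDS = {"my", "is", "on", "the"}

def relative_frequencies(items, total):
    """Map each distinct item to its count divided by total."""
    counts = Counter(items)
    return {item: n / total for item, n in counts.items()}

total_tokens = len(tagged)  # assumption: punctuation counts as a token

pos_freq = relative_frequencies((tag for _, tag in tagged), total_tokens)
fw_freq = relative_frequencies(
    (tok.lower() for tok, _ in tagged if tok.lower() in FUNCTION_WORDS),
    total_tokens,
)

print(pos_freq["NN"])   # share of tokens tagged as singular common noun
print(fw_freq["the"])   # share of tokens that are the function word "the"
```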
POS bigrams and trigrams

A bigram or trigram is a 2- or 3-token
window on the sentence.
Automated calculation
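The sliding-window idea can be sketched like this, assuming the POS tags for a sentence are already available as a list. The helper name `ngrams` is hypothetical, and the tag sequence is the hand-tagged example sentence rather than output from the study's pipeline.

```python
from collections import Counter

def ngrams(sequence, n):
    """Return every run of n consecutive items as a tuple."""
    return list(zip(*(sequence[i:] for i in range(n))))

# Hand-assigned Penn Treebank tags for "My aunt's pen is on the table."
tags = ["PRP$", "NN", "POS", "NN", "VBZ", "IN", "DT", "NN", "."]

bigrams = ngrams(tags, 2)    # 2-token windows
trigrams = ngrams(tags, 3)   # 3-token windows

# Relative frequency of each bigram among all bigrams in the text.
bigram_freq = {bg: c / len(bigrams) for bg, c in Counter(bigrams).items()}

print(bigrams[:3])
print(len(bigrams), "bigrams;", len(trigrams), "trigrams")
```

A sequence of k tags yields k - 1 bigrams and k - 2 trigrams, which is why the denominators for these features differ from the token total.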
Feature (variable)
selection
First-person pronouns (total)
Singular: I, me, my, mine, myself.
Plural: We, us, our, ours, ourselves.
Second-person pronouns: You, your, yours, yourself.

Third-person pronouns (total)
Singular (total)
Feminine: She, her, hers, herself.
Masculine: He, him, his, himself.
Plural: They, them, their, theirs, themselves.
Contractions: Including all instances of n't, 'ld, 've, etc.

All relative frequencies calculated (automated)
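One way the pronoun categories might be counted, sketched under the assumption that a text is already tokenized. The word lists come from the slide; the function name, the dictionary keys, and the sample tokens are hypothetical, and contractions are left out of this sketch.

```python
PRONOUNS = {
    "first_singular": {"i", "me", "my", "mine", "myself"},
    "first_plural": {"we", "us", "our", "ours", "ourselves"},
    "second": {"you", "your", "yours", "yourself"},
    "third_sing_fem": {"she", "her", "hers", "herself"},
    "third_sing_masc": {"he", "him", "his", "himself"},
    "third_plural": {"they", "them", "their", "theirs", "themselves"},
}

def pronoun_frequencies(tokens):
    """Relative frequency of each pronoun category, plus the totals."""
    if not tokens:
        raise ValueError("cannot compute frequencies for an empty text")
    lowered = [t.lower() for t in tokens]
    counts = {name: sum(t in words for t in lowered)
              for name, words in PRONOUNS.items()}
    # Roll the leaf categories up into the totals named on the slide.
    counts["first_total"] = counts["first_singular"] + counts["first_plural"]
    counts["third_singular_total"] = (counts["third_sing_fem"]
                                      + counts["third_sing_masc"])
    counts["third_total"] = (counts["third_singular_total"]
                             + counts["third_plural"])
    return {name: c / len(tokens) for name, c in counts.items()}

# Invented sample tokens, for illustration only.
sample = ["She", "told", "us", "that", "their", "motion",
          "cited", "his", "brief", "."]
freqs = pronoun_frequencies(sample)
print(freqs["third_total"])
```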
Each student's text is
represented by variables
A series of numerical values expressing each
feature (variable), i.e., the relative frequency of:
Function words / total tokens
POS tags / total tokens
Bigrams / total bigrams*
Trigrams / total trigrams*
Pronouns
Automated calculation
*Multiplied by a factor.
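The ratios listed above can be written compactly. The notation here is an assumption introduced for illustration: c is a raw count in text d, N_d is the number of tokens in d, B_d is the number of bigrams in d, and k stands for the scaling factor that the slide mentions without specifying (trigrams follow the same pattern with the trigram total).

```latex
f_{w,d} = \frac{c_{w,d}}{N_d}
\qquad\text{(function word or POS tag } w\text{)}

f_{b,d} = k \cdot \frac{c_{b,d}}{B_d}
\qquad\text{(POS bigram } b\text{)}
```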
Example 1
Tokens of the function word-type "all" in
paper 1007 account for less than 7/100
of 1% of all tokens in that paper.
Example 2
Bigrams made up of
a plural common
noun (NNS) followed
by a coordinating
conjunction (CC)
accounted for 1/10
of 1% of bigrams in
paper 1009.
Mean relative frequencies
calculated
For each feature
Mean frequency (SD) for Gender F authors
Mean frequency (SD) for Gender M
authors
Statistical significance assessed with
Mann-Whitney U test (expressed as p-value)
A priori threshold for significance: 0.05
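For readers unfamiliar with the test, here is a minimal pure-Python sketch of a two-sided Mann-Whitney U test. It uses the normal approximation without tie or continuity corrections, so its p-values are approximate; it is not the software used in the study, and the relative frequencies below are invented for illustration.

```python
import math

def rank_with_ties(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U, approximate two-sided p-value) for samples x and y."""
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u - mu) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))

# Invented relative frequencies of one feature, for illustration only.
gender_f = [0.012, 0.015, 0.011, 0.017, 0.014]
gender_m = [0.016, 0.019, 0.013, 0.021, 0.018]

u_stat, p_value = mann_whitney_u(gender_f, gender_m)
print(u_stat, p_value, p_value < 0.05)  # compare with the 0.05 threshold
```

A rank-based test like this makes no assumption that the relative frequencies are normally distributed, which suits features that are zero in many texts.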

What Argamon et al. 2003
found: Men
Males used significantly more
Determiners, a, the, these
Determiner+noun bigrams: the books, a
dog, these Tories
Attributive-adjective+noun bigrams: great
leaders, old form
Prepositions: at, from, for, of, behind
Its
What Argamon et al. 2003
found: Women
Females used significantly more
Pronouns (all)
1st person sing.: I, my, mine
2nd person: you, yours
3rd person: they, them, theirs
Present tense verbs: walks, eradicates

Contractions
Negation with not
Informational/involved
Biber (1995) labeled this a dimension of
register variation after doing cluster
analyses on frequencies to identify
co-varying features as dimensions
Consistent with popular conceptions
and works such as Tannen (1990
[2001]) that characterize women as
affiliative and men as informative
What I found:
Nouns & determiners
Nouns
Some categories showed non-significant
Gender F preference (weakly contradicting
Argamon)
Determiners and determiner+noun

Only significant: DET-NNP (proper noun)
But all showed non-significant Gender M
preference
(Overall, weakly supporting Argamon)
What I found:
Adjectives & prepositions
Attributive-adjective+noun
Non-significant Gender M preference
(weakly supporting Argamon)
Prepositions
Non-significant Gender M preference
(weakly supporting Argamon)
What I found:
Pronouns (i.e., a mess)
All pronouns: Non-significant Gender M
preference (weakly contradicting Argamon)
1st p sing., 2nd p., 3rd p. overall, 3rd s. fem:
Non-significant Gender F preference (weakly
supporting Argamon)
3rd p. plural: Significant Gender M preference
(contradicting Argamon)
Its: Non-significant Gender F preference
(weakly contradicting Argamon)
What I found:
Verbs, contractions, not
Present-tense verbs
Significant Gender M preference for 3rd p.
singular (contradicting Argamon)
Non-significant Gender M preference for the
rest (weakly contradicting Argamon)
Contractions: Non-significant Gender F
preference (weakly supporting Argamon)
Negation with not: Non-significant Gender F
preference (weakly supporting Argamon)
The take-away?
Statistics: The non-significant differences
should probably be regarded as non-significant
In that case, M-informational/F-involved is not
confirmed in this study
If the non-significant differences are real,
evidence for M-informational/F-involved is
still mixed, especially in pronouns and
present-tense verbs
Explaining the findings with
relevance theory
Relevance theory (Sperber & Wilson 1995)
recognizes the effects of habituation
If boys and girls are acculturated to writing
in certain genres and certain topics in their
youths . . .
. . . they may unconsciously habituate to
certain (appropriate) word choices
. . . and may not be completely free to
vary their word choices consciously later.
Situating the findings within
gender & language theories
Findings weakly support or contradict
Two sociolinguistic cultures view (Maltz &
Borker 1982; Tannen 1990 [2001])
Intersectionality/performativity views (Barker &
Zifcak 1999; Butler; many others)
Some gendered linguistic habits appeared
to resist retraining and conscious efforts to
conform to register conventions . . .
. . . others were apparently overcome.
I'm left with more questions
than answers . . .
But you are entitled to ask some
questions now . . .
THANK YOU!
Works cited
Allen, J. (1994). Women and authority in business/technical
communication scholarship: An analysis of writing... Technical
Communication Quarterly, 3(3), 271.
Argamon, S., Koppel, M., Fine, J., & Shimoni, A. R. (2003). Gender,
genre, and writing style in formal written texts. Text, 23(3), 321-346.
Argamon, S., Koppel, M., Pennebaker, J. W., & Schler, J. (2007).
Mining the Blogosphere: Age, gender and the varieties of
self-expression. First Monday, 12(9). Retrieved from
http://firstmonday.org/issues/issue12_9/argamon/index.html
Armstrong, C. L., & McAdams, M. J. (2009). Blogs of information: How
gender cues and individual motivations influence perceptions of
credibility. Journal of Computer-Mediated Communication, 14(3),
435-456.
Barker, R. T., & Zifcak, L. (1999). Communication and gender in
workplace 2000: creating a contextually-based integrated paradigm.
Journal of Technical Writing & Communication, 29(4), 335.
Biber, D. (1995). Dimensions of register variation: a cross-linguistic
comparison. Cambridge; New York: Cambridge University Press.
Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing
with Python (1st ed.). O'Reilly Media.
Brown, S. M., & Burnett, R. E. (2006). Women hardly talk. Really!
Communication practices of women in undergraduate engineering
classes (pp. T3F1-T3F9). Presented at the 9th International
Conference on Engineering Education, San Juan, Puerto Rico:
International Network for Engineering Education & Research. Retrieved
from http://ineer.org/Events/ICEE2006/papers/3219.pdf
Burger, J., Henderson, J., Kim, G., & Zarrella, G. (2011). Discriminating
gender on Twitter. Bedford, MA: MITRE Corporation. Retrieved from
http://www.mitre.org/work/tech_papers/2011/11_0170/
Butler, J. (1993). Bodies that matter: on the discursive limits of sex.
New York: Routledge.
Butler, J. (1999). Gender trouble. New York: Routledge.
Butler, J. (2004). Undoing gender. New York: Routledge.
Cunningham, H., Maynard, D., Bontcheva, K., Tablan, V., Aswani,
N., Roberts, I., Peters, W. (2012, December 28). Developing
Language Processing Components with GATE Version 7 (a User
Guide). GATE: General Architecture for Text Engineering. Retrieved
January 1, 2013, from http://gate.ac.uk/sale/tao/split.html
Cunningham, H., Tablan, V., Roberts, A., & Bontcheva, K. (2013).
Getting More Out of Biomedical Documents with GATE's Full Lifecycle
Open Source Text Analytics. PLoS Computational Biology, 9(2),
e1002854.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., &
Witten, I. H. (2009). The WEKA Data Mining Software: An Update.
SIGKDD Explorations, 11(1), 10-18.
Herring, S. C., & Paolillo, J. C. (2006). Gender and genre variation in
weblogs. Journal of Sociolinguistics, 10(4), 439-459.
Koppel, M., Argamon, S., & Shimoni, A. R. (2002). Automatically
categorizing written texts by author gender. Literary and Linguistic
Computing, 17(4), 401-412.
Lakoff, R. T. (1975/2004). Language and Woman's Place: Text and
Commentaries. (M. Bucholtz, Ed.) (Revised and expanded ed.). New
York: Oxford University Press.
Lay, M. M. (1989). Interpersonal conflict in collaborative writing: What
we can learn from gender studies. Journal of Business and Technical
Communication, 3(2), 5-28.
Maltz, D. N., & Borker, R. (1982). A cultural approach to male-female
miscommunication. In J. J. Gumperz (Ed.), Language and social
identity (pp. 196-216). Cambridge U.K.: Cambridge University Press.
Pakhomov, S. V., Hanson, P. L., Bjornsen, S. S., & Smith, S. A. (2008).
Automatic classification of foot examination findings using clinical notes
and machine learning. Journal of the American Medical Informatics
Association, 15, 198-202.
Raign, K. R., & Sims, B. R. (1993). Gender, persuasion techniques, and
collaboration. Technical Communication Quarterly, 2(1), 89-104.
Rao, D., Yarowsky, D., Shreevats, A., & Gupta, M. (2010). Classifying
latent user attributes in Twitter. In Proceedings of the 2nd international
workshop on Search and mining user-generated contents (pp. 37-44).
Toronto, ON, Canada: ACM.
Rehling, L. (1996). Writing together: Gender's effect on collaboration.
Journal of Technical Writing and Communication, 26(2), 163-176.
Smeltzer, L. R., & Werbel, J. D. (1986). Gender differences in
managerial communication: Fact or folk-linguistics? Journal of Business
Communication, 23(2), 41-50.
Sperber, D., & Wilson, D. (1995). Relevance: Communication and
Cognition (2nd ed.). Wiley-Blackwell.
Sterkel, K. S. (1988). The relationship between gender and writing style
in business communications. Journal of Business Communication,
25(4), 17-38.
Tannen, D. (2001). You Just Don't Understand: Women and Men in
Conversation. William Morrow Paperbacks.
Tebeaux, E. (1990). Toward an understanding of gender differences in
written business communications: A suggested perspective for future
research. Journal of Business and Technical Communication, 4(1),
25-43.
Tong, A., & Klecun, E. (2004). Toward accommodating gender
differences in multimedia communication. Professional Communication,
IEEE Transactions on, 47(2), 118-129.
Wolfe, J., & Alexander, K. P. (2005). The computer expert in
mixed-gendered collaborative writing groups. Journal of Business and
Technical Communication, 19(2), 135-170.
Wolfe, J., & Powell, B. (2006). Gender and expressions of
dissatisfaction: A study of complaining in mixed-gendered student work
groups. Women & Language, 29(2), 13-20.
Wolfe, J., & Powell, E. (2009). Biases in interpersonal communication:
How engineering students perceive gender typical speech acts in
teamwork. Journal of Engineering Education, 98(1), 5-16.
Yan, X., & Yan, L. (2006). Gender classification of weblog authors. In
AAAI Spring Symposium: Computational Approaches to Analyzing
Weblogs (pp. 228-230).