
The University of Queensland

Faculty of Business, Economics & Law


Department of Commerce

Information Request Ambiguity and End User Query

Performance: Theory and Empirical Evidence

A Thesis submitted to the Department of Commerce, the University of


Queensland, in partial fulfilment of the requirements for the degree of
Master of Information Systems.

By Micheal Axelsen

15th June 2000

Supervisor: Dr Paul Bowen


Acknowledgments

I wish to express my appreciation and thanks to my supervisor, Dr Paul Bowen, for his

assistance, advice, and patience in the preparation of this thesis. To my mother I offer thanks

for making it all possible. I also express sincere gratitude to my wife, Leeanne Klan, whose

obstinate patience continues to assist in putting the world in focus.

I also thank workshop participants at Nanyang Technological University in Singapore for

their comments and contributions to this thesis.

Abstract

The increasing reliance of organisations on information technology and the persistent

shortage of IT/IS professionals require end users to satisfy many information requests by

querying complex information systems. Because many business decisions are now based on

the results of the end users' queries, information request ambiguity has extensive

ramifications for business practices. Where the queries do not match the requirements of the

information requests, the business decisions are likely to be fundamentally flawed.

This paper develops a theory of ambiguity in information requests and reports the results of

an initial empirical investigation of that theory. The theory identifies seven types of ambiguity:

lexical, syntactical, inflective, pragmatic, extraneous, emphatic, and suggestive. A laboratory

experiment with sixty-six participants was used to investigate the empirical effect of

ambiguity on end user query performance. End user query performance was measured by the

number of total errors in the proposed solution, the time taken to complete the solution, and

the end user's confidence in the solution.

The results indicate that ambiguity significantly degrades end user query performance. The

seven types of ambiguity were analysed to determine their individual effects on end user

query performance. Actual (pragmatic, extraneous) and imaginary (emphatic, suggestive)

ambiguities show significant relationships with total errors and duration. In general, potential

(lexical, syntactical, and inflective) ambiguities were not significantly associated with total

errors or end user confidence. The results should have important implications for consulting

firms, for organisations with ad hoc work groups, and for entities that make extensive use of

electronic mail for information requests.

Table of Contents

1. Introduction............................................................................................................................. 1

2. Information Request Ambiguity and End User Query Performance ..................................... 3


2.1 A Theoretical Model of Information Request Ambiguity .......................................... 3
2.2 The Nature of Ambiguity ......................................................................................................... 5
2.2.1 Potential Ambiguity ..................................................................................................... 7
Lexical Ambiguity........................................................................................................... 7
Syntactical Ambiguity ..................................................................................................... 8
Inflective Ambiguity........................................................................................................ 9
2.2.2 Actual Ambiguity ...................................................................................................... 10
Pragmatic Ambiguity ..................................................................................................... 11
Extraneous Ambiguity ................................................................................................... 12
2.2.3 Imaginary Ambiguity................................................................................................. 14
Emphatic Ambiguity...................................................................................................... 14
Suggestive Ambiguity.................................................................................................... 15
2.2.4 Ambiguity in Practice ................................................................................................ 17
2.3 Task Complexity ................................................................................................................... 18
2.4 Theoretical Model Summary.................................................................................................. 19
3. Methodology .......................................................................................................................... 20
3.1 Experimental Design ............................................................................................................. 20
3.2 Experiment Participants ......................................................................................................... 21
3.3 Assessment of Participant Responses ..................................................................................... 21
4. Results and Discussion .......................................................................................................... 23
4.1 Overview of Experimental Results ......................................................................................... 23
4.2 Regression Analysis............................................................................................................... 27
4.3 Ambiguity Treatment Multiple Linear Regression Model Results ........................................... 29
4.4 Multiple Linear Regression Model: Seven Types of Ambiguity .............................................. 31
4.5 Summary of Results............................................................................................................... 32
4.5.1 Potential Ambiguity ................................................................................................... 34
4.5.2 Actual Ambiguity ...................................................................................................... 35
4.5.3 Imaginary Ambiguity................................................................................................. 36
4.5.4 Complexity................................................................................................................ 37
5. Implications For Business Practice ....................................................................................... 38
5.1.1 Electronic Mail .......................................................................................................... 38
5.1.2 Personnel Turnover and Work Teams......................................................................... 39
6. Contributions, Limitations, and Future Research ................................................................ 41
6.1 Research Contributions .......................................................................................................... 41
6.2 Research Limitations ............................................................................................................. 41
6.3 Future Research ..................................................................................................................... 42
References....................................................................................................................................... 44

Appendix A: Experiment Information Requests and Model Answers .......................................... 47

Appendix B: Experiment Instruction Sheet ................................................................................... 52

Appendix C: Command Interpreter Unix Shell Script .................................................................. 58

Appendix D: Experiment Entity-Relationship Diagram ............................................................... 65

Appendix E: Experimental Design ................................................................................................ 68

Appendix F: Error Marking Sheets ............................................................................................... 72

Appendix G: Annotated Corrected Participant Response ............................................................. 75

Appendix H: Pearson Correlation Matrix of Variables................................................................ 77

Appendix I: Analysis of Ambiguity's Effect On Error Type ........................................................ 78

Appendix J: Seven Ambiguity Types Question Assessment Ratings ............................................ 84

Appendix K: Ambiguity Assessment Instrument .......................................................................... 85

Appendix L: Internal Validity of the Experiment ........................................................................ 94

Figures

Figure 1 Types of Ambiguity (adapted from Walton 1996) 7

Figure 2 The Theoretical Model of Ambiguity, Complexity, and End User Query Performance 19

Figure 3 The relationship between the treatment received (ambiguous or clear information request)
and the total errors in the participant's response 25

Figure 4 The relationship between the treatment received (ambiguous or clear information request)
and the duration taken for the participant to prepare the response 26

Figure 5 The relationship between the treatment received (ambiguous or clear information request)
and the participant's confidence in the response 26

Tables

Table 1 Summary and Examples of the Seven Types of Ambiguity in Natural Language
Information Requests 17

Table 2 Participant Demographic Information and Descriptive Statistics: Course Background
of Group A and Group B 23

Table 3 Participant Demographic Information and Descriptive Statistics: Academic Record of
Group A and Group B 23

Table 4 Participant Demographic Information and Descriptive Statistics: Participant Age in
Group A and Group B 24

Table 5 Comparative Statistics for all Participant Responses Grouped by Question (Q) and
Treatment (T). Note that for T, a = ambiguous, c = clear 25

Table 6 Confidence Rating Transformation to a Numerical Scale 28

Table 7 Ambiguity Assessment Scale for the analysis of the Seven Ambiguity Types
Regression Model 29

Table 8 Regression Analysis Results for the General Ambiguity Regression Model 30

Table 9 Regression Analysis Results for the Seven Ambiguity Types Regression Model 31

Table 10 Summary of Analysis' Support for Hypotheses 32

Table 11 Participant Strata Classes 69

1. Introduction

Keen (1993) predicts that innovative applications of information technology will change the

competitive landscape to such an extent that fifty percent of companies in some industries

may not survive the next decade. This rise in the importance of information technology

innovation and application has led to an increased need for relevant, timely information at

the point where that information is used and understood (Conger 1994; Delligatta and

Umbaugh 1994; Nath and Lederer 1996).

The demand for information systems (IS) professionals vastly exceeds the available

supply, both now and for the foreseeable future (Freeman et al. 2000; Rosenthal and

Jategaonkar 1995; Australian Bureau of Statistics 1997). Hence, the use of computerised

information systems by end users has become compulsory in most business organisations

(Cardinali 1992; Athey and Wickham 1995-1996). To provide appropriate, relevant

information requires identifying and eliminating ambiguities in communication between the

stakeholders or managers requesting information, and the end users querying the information

systems.

Traditional structured methodologies reduce ambiguity at the expense of timeliness,

flexibility, and learning. The insights that end users can achieve during interactive, iterative

query sessions are also of benefit. The need for timeliness, flexibility, learning and end user

insights, as well as the shortage of IS professionals, have led to the general decline of

structured reports (Ryan 1993). The use of ad hoc and iterative end user reports has

increased (Tayntor 1994). Nonetheless, many end users now use more formalised processes

in developing their reports than previously (Conger 1994; Tayntor 1994).

Information request ambiguity has potentially real and large impacts on business

organisations. An ambiguous information request can result in a report that, although it

appears acceptable to the person making the information request, does not contain the desired

information. If that wrong report is then used to make business decisions that the correct

report would not have supported, then information request ambiguity can cause substantial

negative impacts.

This paper develops a theory of the impact of ambiguity in information requests on end user

query performance, and tests that theory empirically. It empirically examines the strength

and direction of the relationships between ambiguity types (lexical, syntactical, inflective,

pragmatic, extraneous, emphatic, and suggestive), complexity, and end user query

performance. The current study extends previous work (Suh and Jenkins 1992; Borthick et

al. 1997; Rho and March 1997; Borthick et al. 2000) and builds upon the theory of end users'

query performance in the tradition of Dubin (1978).

2. Information Request Ambiguity and End User Query Performance

Different forms of ambiguity can be present in a natural language information request. The

primary aim of this research is to explore the impact of ambiguity on end user query

performance. This chapter develops a theory of the relationship between information request

ambiguity and end user query performance.

2.1 A Theoretical Model of Information Request Ambiguity

The development of an accurate SQL query by an end user depends on the user's knowledge

of the information needed, the database structure, and the query language (Ogden et al. 1986).

A lack of skill in any of these three domains will lead to inaccurate SQL queries (Ogden et al.

1986).

A natural language information request requires end users to transform the natural language

constructs into the query components consisting of lexical items (Katzeff 1990). End users

must conceptualise the information requirement and then mentally map this conceptualisation

to their understanding of the database structure. Reisner (1977) proposed a template model

for the manner in which users create SQL queries from a natural language information

request. Each query's operator components (Halstead 1977) are drawn from a set of known

query language components to address the requirements of the natural language information

request.
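
As a concrete sketch of this mapping, consider the clear request "A report of the names and addresses of all clients of the Tax and Business Services department". Assuming a hypothetical client table with client_name, client_address, and department columns (these names are illustrative only and are not drawn from the experimental database), the request maps onto known query language components as follows:

    -- Hypothetical schema: client(client_name, client_address, department)
    SELECT client_name, client_address          -- lexical items named in the request
    FROM   client                               -- mapped to the user's model of the database
    WHERE  department = 'Tax and Business Services';

Each clause is an operator component selected from the end user's repertoire of SQL constructs and matched to a construct in the natural language information request.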

Ambiguity affects the user's interpretation of the information needed. Because information

requests are expressed using a natural language, they are ambiguous and uncertain. End users

must interpret and analyse the information requests to develop queries that meet the

requestors' needs. The end users' uncertainty in determining the required response affects the

required cognitive effort because multiple interpretations of the actual information required

may be legitimately constructed (Almuallim et al. 1997).

The impact of natural language's seven types of ambiguity has not previously been examined

in the context of end user query performance. These seven types of ambiguity are lexical,

syntactical, inflective, pragmatic, extraneous, emphatic and suggestive (Walton 1996; Fowler

and Aaron 1998). These ambiguities affect the number of legitimate interpretations of the

natural language statement of the information request. The information request has

"multiplicity of meaning" (Walton 1996).

Tasks that are more complex require increased cognitive effort (Campbell 1988). In the

context of database queries, task complexity generally negatively impacts end user query

performance (Borthick et al. 1997; Borthick et al. 2000). Task complexity is included in this

research to control for complexity's established impact on end user query performance.

Query performance can be measured on a number of dimensions including correctness, time

required, and confidence.

Hence, the following hypotheses are proposed:

H1a: Higher ambiguity in the information request leads to an increase in the total errors

in the query formulation.

H1b: Higher ambiguity in the information request leads to an increase in the time taken

to complete the query formulation.

H1c: Higher ambiguity in the information request leads to lower end user confidence in

the accuracy of the query formulation.

2.2 The Nature of Ambiguity

Ambiguity is an inherent property of all natural languages, including English (Jespersen

1922; Williamson 1994). Absolute precision of a language is pragmatically undesirable,

because a perfectly precise language would be unable to adapt to new concepts (Williamson 1994). The

communication needed to ensure effective and efficient report production, however, requires

complete clarity. Hence, a tension exists between the natural language's need for flexibility

in the long term and the need for precision in the short term. Natural language is at once both

dysfunctional and poorly adapted to the functions language needs to perform, yet flexible and

broad-based such that it is usable in practice (Chomsky 1990).

Interest in linguistic ambiguity has an extensive history, and has been recognised as a

separate branch of study since at least Aristotle's time (Kooij 1971). Aristotle noted that

language must be ambiguous, as a language has limited words but an infinite number of

things and concepts to which those words must apply (Kooij 1971).

Russell (1923) recognised that all natural languages are vague and ambiguous. Excluding the

realm of mathematical symbolism, constructing completely unambiguous expressions is not

possible with the syntax and vocabulary tools available within natural languages (Williamson

1994). To endure and survive, language requires the flexibility to communicate new

concepts. Ambiguity necessarily derives from the flexibility of natural language.

Kooij (1971) states that ambiguity arises where a sentence can be interpreted in more than

one way. Similarly, Walton (1996) considers a sentence or statement to be more ambiguous

as the number of legitimate interpretations of the sentence (or paragraph) increases.

Ambiguity implies multiplicity of meaning (Walton 1996).

In classical analysis, the multiplex (Latin for "multiple meaning") categorisation of

Alexander of Aphrodisius (Hamblin 1970) suggests a basis for the identification of categories

of ambiguity. In classical literature, Alexander of Aphrodisius identified three categories of

ambiguity: potential, actual, and imaginary. Walton (1996) adapts this classical multiplex

categorisation to his identified types of ambiguity.

Walton (1996) identifies six classical types of ambiguity in natural language: lexical,

syntactical, inflective, pragmatic, emphatic, and suggestive. In addition to Walton's (1996)

taxonomy, extraneous information and noise in the communication can also be a source of

ambiguity. Extraneous ambiguity arises where the communication is not parsimonious, or

the communication includes information that is not directly relevant to the message being

communicated (Fowler and Aaron 1998). Extraneous ambiguity is an actual ambiguity

within the Walton (1996) taxonomy.

Each ambiguity type can be independently present within the communication. Walton's

(1996) modified taxonomy and model of ambiguity is presented in Figure 1.

[Diagram: the three multiplex categories of ambiguity (potential, actual, and imaginary) and the
seven types within them: lexical, syntactical, and inflective (potential); pragmatic and
extraneous (actual); emphatic and suggestive (imaginary).]

Figure 1
Types of Ambiguity (adapted from Walton 1996)

2.2.1 Potential Ambiguity

Potential ambiguity arises when a term or a sentence is ambiguous in and of itself, that

is, before its use in the context of a sentence or paragraph. Three types of ambiguity

are categorised as potential ambiguity: lexical, syntactical, and inflective.

Lexical Ambiguity

Lexical ambiguity is the most commonly known form of ambiguity (Reilly 1991; Walton

1996). It occurs when words have more than one meaning as commonly defined and

understood. Considerable potential ambiguity arises when a word with various meanings is

used in a statement of information request. For example, "bank" may variously mean the

"bank" of a river (noun), to "bank" as related to aeroplane or a roller-coaster (verb), a savings

"bank" (noun), to "bank" money (verb), or a "bank" of computer terminals (noun) (Turner

1987). Lexical ambiguity is often reduced or mitigated by the context of the sentence.

In the case of an information request, lexical ambiguity exists in the statement "A report of

our clients for our marketing brochure mail-out". The word "report" may have several

meanings, independent of its context. A gunshot report may echo across the hillside. A

student can report to the lecturer. A heavy report can be dropped on the foot. Although the

context may make the meaning clear, the lexical ambiguity contributes to the overall

ambiguity of the statement and increases cognitive effort.

The following hypotheses are proposed:

H2a: Higher lexical ambiguity in the information request leads to an increase in the total

errors in the query formulation.

H2b: Higher lexical ambiguity in the information request leads to an increase in the time

taken to complete the query formulation.

H2c: Higher lexical ambiguity in the information request leads to lower end user

confidence in the accuracy of the query formulation.

Syntactical Ambiguity

Syntactical ambiguity is a structural or grammatical ambiguity of a whole sentence or of

a sub-part of a sentence (Reilly 1991; Walton 1996). Syntactical ambiguity is a

grammatical construct, and results from the difficulty of applying universal grammatical laws

to sentence structure. An example of syntactical ambiguity is "Bob hit the man with the

stick". This phrasing is unclear as to whether a man was hit with a stick, or whether a man

with a stick was struck by Bob. The context can substantially reduce syntactical ambiguity.

For example, knowing that either Bob, or the man, but not both, had a stick resolves the

syntactical ambiguity.

Comparing the phrase "Bob hit the man with the stick" to the analogous "Bob hit the man

with the scar" provides some insights. As a scar is little suited to physical, violent use, the

latter formulation clearly conveys that the man with the scar was struck by Bob (Kooij 1971).

In the case of an information request, syntactical ambiguity exists in the request "A report of

poor-paying clients and client managers. Determine their effect on our profitability for the

last twelve months." The request is syntactically ambiguous because the end user can

interpret "their" to mean the poor paying clients, the client managers, or both. Although the

context may reduce or negate the ambiguity, syntactically the request is ambiguous.

The following hypotheses are proposed:

H3a: Higher syntactical ambiguity in the information request leads to an increase in the

total errors in the query formulation.

H3b: Higher syntactical ambiguity in the information request leads to an increase in the

time taken to complete the query formulation.

H3c: Higher syntactical ambiguity in the information request leads to lower end user

confidence in the accuracy of the query formulation.

Inflective Ambiguity

As Walton (1996) notes, inflective ambiguity is a composite ambiguity, containing elements

of both lexical and syntactical ambiguity. Like syntactical ambiguity, inflective ambiguity is

grammatical in nature. Inflection arises where a word is used more than once in a sentence or

paragraph, but with different meanings each time (Walton 1996). An example of inflective

ambiguity is to use the word "scheme" with two different meanings in the fallacious

argument, "Bob has devised a scheme to save costs by recycling paper. Therefore, Bob is a

schemer, and should not be trusted" (Ryle 1971; Walton 1996).

In the case of an information request, inflective ambiguity exists in the example, "A report

showing the product of our marketing campaign for our accounting software product".

Ambiguity derives from using the word "product" in two different senses in the one statement

(Walton 1996; Fowler and Aaron 1998).

The following hypotheses are proposed:

H4a: Higher inflective ambiguity in the information request leads to an increase in the

total errors in the query formulation.

H4b: Higher inflective ambiguity in the information request leads to an increase in the

time taken to complete the query formulation.

H4c: Higher inflective ambiguity in the information request leads to lower end user

confidence in the accuracy of the query formulation.

2.2.2 Actual Ambiguity

Actual ambiguity refers to ambiguity that occurs in the act of speaking. It arises when a word

or phrase, without variation either in itself or in the way the word is put forward, has different

meanings. The statement does not contain adequate information to resolve the ambiguity,

resulting in a number of legitimate interpretations. Two distinct types of ambiguity are

categorised as actual ambiguity: pragmatic and extraneous.

Pragmatic Ambiguity

Pragmatic ambiguity arises when the statement is not specific, and the context does not

provide the information needed to clarify the statement. Information is missing, and must be

inferred. An example of pragmatic ambiguity is the story of King Croesus and the Oracle of

Delphi (adapted from Copi and Cohen 1990):

"King Croesus consulted the Oracle of Delphi before warring with Cyrus of
Persia. The Oracle replied that, "If Croesus went to war with Cyrus, he would
destroy a mighty kingdom". Delighted, Croesus attacked Persia, and Croesus'
army and kingdom were crushed. Croesus complained bitterly to the Oracle's
priests, who replied that the Oracle had been entirely right. By going to war with
Persia, Croesus had destroyed a mighty kingdom - his own."

The information necessary to clearly understand the message is omitted (Walton 1996).

Due to the need to infer the missing

information, pragmatically ambiguous statements have multiple possible interpretations

(Walton 1996). Croesus interpreted the Oracle's statement as indicating his success in battle -

the response he desired. As noted by Hamblin (1970), Croesus' logical response to the

oracular reply would have been to immediately ask the Oracle, "Which kingdom?" Further

information is needed to resolve pragmatic ambiguity.

In the case of an information request, pragmatic ambiguity exists in the request for "A report

of all the clients for a department." The ambiguity is that the request does not refer to a

specific department. The end user could legitimately prepare a report for any department.

Further information is needed to resolve this actual ambiguity in this case.

The following hypotheses are proposed:

H5a: Higher pragmatic ambiguity in the information request leads to an increase in the

total errors in the query formulation.

H5b: Higher pragmatic ambiguity in the information request leads to an increase in the

time taken to complete the query formulation.

H5c: Higher pragmatic ambiguity in the information request leads to lower end user

confidence in the accuracy of the query formulation.

Extraneous Ambiguity

In contrast to pragmatic ambiguity, in which information necessary to clearly understand the

message is omitted, extraneous ambiguity arises from an excess of information. Clearer

communication arises where the minimally sufficient words needed to convey the message of

the statement are used (Fowler and Aaron 1998). Where more words are used than

necessary, or where unnecessary detail is provided in the communication that is not part of

the message, ambiguity arises. The excess detail obscures the essential message and

contributes to different emphases or interpretations.

The use of passive voice, vacuous words, or the repetition of phrases with the same meaning

all contribute to lack of clarity (Fowler and Aaron 1998). The use of clichés and the over-use

of figures of speech add volume to the statement, but add little or no meaning. Pretentious

and indirect writing also adds to the bulk of the statement, but without adding meaning.

Fowler and Aaron (1998) provide the following comparative example:

Pretentious: To perpetuate our endeavour of providing funds for our elderly citizens as
we do at the present moment, we will face the exigency of enhanced
contributions from all our citizens.

Revised: We cannot continue to fund Social Security and Medicare for the elderly
unless we raise taxes.

The extra volume contributes to vagueness in the first statement, and adds to the multiplicity

of legitimate interpretations of the statement. The first statement exhibits extraneous

ambiguity. The second statement communicates forcefully and concisely.

An example of extraneous ambiguity in an information request is "A report of all clients (and

their names and addresses only) for the Tax and Business Services department. Some of

those clients are our biggest earners, you know". The last sentence is extraneous, and

contains detail that is redundant, uninformative, or misleading relative to the fundamental

message. In information theoretic terms, extraneous ambiguity is "noise" in the

communication (Axley 1984; Eisenberg and Phillips 1991; Severin and Tankard 1997).

The following hypotheses are proposed:

H6a: Higher extraneous ambiguity in the information request leads to an increase in the

total errors in the query formulation.

H6b: Higher extraneous ambiguity in the information request leads to an increase in the

time taken to complete the query formulation.

H6c: Higher extraneous ambiguity in the information request leads to lower end user

confidence in the accuracy of the query formulation.

2.2.3 Imaginary Ambiguity

Imaginary ambiguity occurs when a word with a fixed meaning seems to have a different one.

Imaginary ambiguity derives from the optional interpretation that the recipient of the

communication places on the information received. Two distinct types of ambiguity can be

categorised as imaginary ambiguity: emphatic and suggestive.

Emphatic Ambiguity

The question of ambiguity deriving from accent, or emphasis in speaking, is an ancient one

(Hamblin 1970). When a phrasing is rendered in the written form, the verbal emphasis may

only be crudely indicated. Significant meaning and context is lost. Rescher (1964) provides

the following example of emphatic ambiguity:

The intended meaning of the democratic credo "Men were created equal" can be
altered by stressing the word "created" (implying "that's how men started out, but
they are no longer so").

The verbal emphasis creates an inference of meaning that is a legitimate interpretation of the

phrasing. That is, changes in intonation can yield different interpretations.

In the case of an information request, emphatic ambiguity occurs in the example information

request of "A report of our good clients". Ambiguity can derive from placing different

emphases on the words. Depending on the context or on emphasis used, "good clients" could

be legitimately interpreted to be clients that pay on time or clients that have the highest

dollar-value sales. Indeed, with an ironic emphasis on the word "good", this request could be

interpreted as a list of our worst clients - those that do not pay. The information necessary to

resolve the ambiguity is often difficult to convey using only printed media.

The following hypotheses are proposed:

H7a: Higher emphatic ambiguity in the information request leads to an increase in the

total errors in the query formulation.

H7b: Higher emphatic ambiguity in the information request leads to an increase in the

time taken to complete the query formulation.

H7c: Higher emphatic ambiguity in the information request leads to lower end user

confidence in the accuracy of the query formulation.

Suggestive Ambiguity

Despite the apparent clarity of the sentence in question, suggestive ambiguity creates diverse

implications and innuendos that can produce different interpretations (Walton 1996). Fischer

(1970) provides an example:

The First Mate of a ship docked in China returned drunk from shore leave, and
was unable to write up the ship's log. The displeased Captain completed the log,
adding, "The Mate was drunk all day". The next day, the now-sober Mate
challenged the Captain over the entry, as it would reflect poorly on him. The
Captain responded that the comment was true, and must stand. Whereupon the
mate added to that day's log, "The Captain was sober all day". In reply to the
Captain's challenge, the mate responded "the comment is true, and must stand"
(derived from Trow 1905, pp 14-15).

The phrase "The Captain was sober all day" contains suggestive ambiguity. As a further

example, the statement, "The President is now an honest man", is perfectly clear, and yet

considerable innuendo exists. The fact that the President's current honesty is worthy of

comment implies that the President was previously dishonest.

Both phrases are perfectly clear, and, indeed, true. However, considerable innuendo exists.

The fact that the Captain's sobriety, or the President's honesty, is singled out for special

comment implies that such a state of affairs is unusual (Walton 1996). The statements are

suggestively ambiguous.

In the case of an information request, an example of this ambiguity is, "A report of the clients

of this accounting practice that have lodged taxation returns in the past five years in

accordance with the requirements of the Australian Taxation Office". The request for

information is quite clear. By definition, however, all taxation returns should be lodged in

accordance with the Australian Taxation Office's requirements. The extra phrase introduces

suggestive ambiguity into the information request by suggesting that the report will not

consist of all taxation clients, because some clients may not have complied with the Tax

Office's requirements.

The following hypotheses are proposed:

H8a: Higher suggestive ambiguity in the information request leads to an increase in the

total errors in the query formulation.

H8b: Higher suggestive ambiguity in the information request leads to an increase in the

time taken to complete the query formulation.

H8c: Higher suggestive ambiguity in the information request leads to lower end user

confidence in the accuracy of the query formulation.

2.2.4 Ambiguity in Practice

Table 1 provides examples of the types of ambiguity identified in this paper. The table also

summarises, and provides examples for, each type of ambiguity.

Table 1
Summary and Examples of the Seven Types of Ambiguity in Natural Language Information Requests
Ambiguity Type      Information Request
Lexical A report of our clients for our marketing brochure mail-out.
The word "report" may have several meanings, independent of its context.
For example, there may be: a gunshot report echoing through the hillside;
the Lieutenant reported to the Captain; I dropped the heavy report on my toe,
etc. Although the context may make the meaning clear, the lexical
ambiguity adds to cognitive effort and contributes to ambiguity overall.
Syntactical A report of poor-paying clients and client managers. Determine their effect
on our profitability for the last twelve months.
It is not clear whose effect on profitability is meant. Another example is
"Bob hit the man with a stick". It is not clear, syntactically, whether the man
with a stick was hit, or whether the man was hit, by Bob, with a stick.
Inflective A report showing what the product of our last marketing campaign for sales
of our accounting software product in the last month was.
Ambiguity here derives from the use of the word "product" with two
different meanings in the one information request.
Pragmatic A report of all the clients for a department.
The ambiguity here is that the department has not been specified.
Information necessary to clearly understand the message is omitted. It would
be legitimate to prepare a report for any department. Further information is
needed to resolve this actual ambiguity.
Extraneous A report of all clients (and their names and addresses only) for the Tax and
Business Services department. Some of those clients are our biggest earners,
you know.
The last sentence is extraneous. Unlike pragmatic ambiguity, the sentence
contains information that is redundant, uninformative, or not necessary to
derive the statement's message. "Noise" in the communication exists. More
words are used than are necessary to make the statement.
Emphatic A report of our good clients.
Ambiguity here could derive from the inability to convey emphasis of
the words in written form. Depending on the emphasis used, "good
clients" could be legitimately interpreted to be clients that pay on time,
clients that have the most dollar-value sales, or even, with the correct ironic
emphasis on the spoken word, our worst clients - those that do not pay.

Suggestive A report of the clients of this accounting practice that have lodged taxation
returns in the past five years in accordance with the requirements of the
Australian Taxation Office.
The request for information is quite clear until the phrase "in accordance
with the requirements of the Australian Taxation Office". By definition, all
taxation returns should be lodged in accordance with these requirements.
The extra phrase introduces suggestive ambiguity into the information
request by suggesting that the report will not necessarily consist of all
taxation clients.

2.3 Task Complexity

More complex tasks require more cognitive effort and hence have a generally negative

impact on the user's performance in deriving database queries (Campbell 1988; Borthick et

al. 1997; Borthick et al. 2000). Task complexity, in the context of query development,

consists of the inherent task complexity associated with the query syntax, and the data

structure complexity associated with the organisation of the tables and attributes (Liew 1995).

Campbell (1988) and Wood (1986) document the general impact of task complexity. Jih et

al. (1989) studied task complexity and user performance in the context of the use of entity-

relationship diagrams and relational data models. Complexity in this context is generally

measured as a function of the total number of elementary mental discriminations required to

write a query (Halstead 1977).
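
For reference, Halstead's measures are defined over the operators and operands of a query. The standard formulation of the difficulty and effort metrics is shown below; this is the textbook definition rather than a restatement of this study's exact operationalisation:

$$D = \frac{\eta_1}{2} \cdot \frac{N_2}{\eta_2}, \qquad V = (N_1 + N_2)\log_2(\eta_1 + \eta_2), \qquad E = D \cdot V,$$

where $\eta_1$ and $\eta_2$ are the numbers of distinct operators and operands, $N_1$ and $N_2$ are their total occurrences, $V$ is the volume, and the effort $E$ is interpreted as the number of elementary mental discriminations required.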

The following hypotheses are proposed:

H9a: Higher complexity in the information request leads to more total errors in the query

formulation.

H9b: Higher complexity in the information request leads to more time taken to complete

the query formulation.

H9c: Higher complexity in the information request leads to lower end user confidence in

the accuracy of the query formulation.

2.4 Theoretical Model Summary

Figure 2 summarises the theoretical model presented in this paper. Complexity and the seven

types of ambiguity have a negative impact on end user query performance as they increase.

Hypotheses 1 through 9 are derived from these hypothesised relationships.

[Diagram: the seven types of ambiguity (lexical, syntactical, inflective, pragmatic, extraneous,
emphatic, and suggestive) within the information request, together with complexity, each
hypothesised to have a negative relationship with end user query performance.]

Figure 2
The Theoretical Model of Ambiguity, Complexity, and End User Query Performance

3. Methodology

3.1 Experimental Design

A laboratory experiment was conducted to test the hypotheses presented in this study. A two-

factor, within-groups experimental design was used (Huck et al. 1974). Participants were

randomly assigned to two groups (Group A and Group B). Each participant was presented

with up to sixteen questions. Each question was presented in either a clear or ambiguous

formulation.

Group A's question formulations were alternately ambiguous and clear. Group B's question

formulations were alternately clear and ambiguous. Using alternating formulations helped

promote equitable treatment of the two groups. That is, the alternating formulations ensured

that both groups would complete approximately the same number of questions during the

allotted time, expend approximately the same amount of cognitive effort, and would

experience approximately the same level of frustration in dealing with ambiguous

information requests. All participants spent two hours on the experiment. Appendix A

shows the questions presented to students together with the model answers.

A set of instructions (Appendix B), including a synopsis of the query language syntax, was

provided to the participants. A Unix shell script (Appendix C) presented the questions

electronically to the participants and automatically captured their responses in text files. An

entity-relationship diagram describing the database is presented in Appendix D, and was

available to subjects. Further details regarding the experimental process are provided in

Appendix E.

3.2 Experiment Participants

Forty-seven undergraduate and nineteen postgraduate students participated in the experiment.

Participating students were enrolled either in an advanced undergraduate or in a post-graduate

database subject within the business school at the University of Queensland. All students

enrolled in the two database subjects participated in the experiment.

The motivation for student participation was the receipt of five percent of the students' final

mark for the subject (2.5% for participation, 2.5% for performance). Participants were aware

that they were participating in an experiment.

Participants had been previously trained in the use of the SQL query language, and had been

afforded the opportunity to practice SQL on the university systems. All practice took place

on different databases than used for the experiment. Generally, student expertise with SQL

was low to intermediate. The experiment, for most students, was the first practical

application of their SQL skills.

3.3 Assessment of Participant Responses

Participant responses were captured in text files that showed each interactive response and

captured the start and end time of each question. This file was edited into a suitable format

for marking by two examiners. Each response was independently assessed by each examiner

to determine whether the response was the participant's final complete response. Responses

where participants did not finish the query formulation were removed from the study.

In some instances, the state of completion of the response was indeterminate. If the response

could only be corrected with substantial rework of the submitted response, the examiners

erred on the side of caution and removed these responses from the study.

Examiners then corrected the answers according to the model answers (Appendix A), using

the Semantic Error Counting, SQL Challenge Error Counting, and Intermediate Error

Counting Forms shown in Appendix F. Each examiner independently assessed the

participant responses and corrected the response. Each discrete alteration (addition or

deletion of a query component) counted as one "micro error" in the Semantic Error Counting

Form (Appendix F).
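
As a purely illustrative example of this counting scheme (the response below is hypothetical, not drawn from the experimental data), suppose the model answer required the names of clients of a single department:

    -- Hypothetical participant response
    SELECT client_id
    FROM   client;

    -- Minimal correction that produces the required result set
    SELECT client_name
    FROM   client
    WHERE  department = 'Tax and Business Services';

Here the deletion of client_id, the addition of client_name, and each component added in the WHERE clause would be tallied as micro errors on the Semantic Error Counting Form; the exact decomposition follows the counting forms in Appendix F.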

The corrected response that determined the total error count was the response that required

the fewest changes to the participant's response, and still produced the required result set.

This approach ensured a lower error count than a strict modification of the response to ensure

an exact match to the model answer. Appendix G provides an example corrected response.

The examiners then compared their independent assessments to ensure that all errors had

been found and corrected and that the proposed formulations or corrected formulations

produced the correct output. If more than one correction method was found to produce a

correct query, the correction method that produced the smallest number of errors was used.

A diary of common errors and their corrections was kept to ensure consistency throughout the

assessment process. The final, moderated, error sheets were transcribed to a relational

database for analysis.

4. Results and Discussion

4.1 Overview of Experimental Results

Participant demographic information and statistics are presented in Tables 2, 3, and 4. The

demographic information indicates that the assignment of participants to ensure homogeneity

between Group A and Group B was successful. The groups are relatively homogeneous in

terms of course background, grade point average (GPA), and age. In any case, both Group A

and Group B received the treatment effect of ambiguity on alternate questions, mitigating

concerns of the effect of a selection bias on experimental results.

Table 2
Participant Demographic Information and Descriptive Statistics:
Course Background of Group A and Group B
Enrolled Degree                                              Group A   Group B   Total
Undergraduate Arts 3 3 6
Undergraduate Business 20 18 38
Undergraduate Computer Science/Information systems 3 0 3
Postgraduate Business 2 1 3
Postgraduate Computer Science/Information Systems 5 11 16

Total Participants: 33 33 66

Table 3
Participant Demographic Information and Descriptive Statistics:
Academic Record of Group A and Group B
Academic Record                                      Average   Standard Deviation   Min    Max
GPA (65 students with academic records)              4.94      0.90                 3.26   7.00
GPA (Group A: 33 students with academic records)     5.04      0.83                 3.26   6.84
GPA (Group B: 32 students with academic records)     4.83      0.97                 3.29   7.00

Table 4
Participant Demographic Information and Descriptive Statistics:
Participant Age in Group A and Group B

Age (in Years)                                                       Average   Standard Deviation   Min     Max
Average Age (65 students with date of birth available)               24.94     7.72                 18.74   61.25
Average Age (Group A: 33 students with date of birth available)      24.76     7.29                 19.50   48.53
Average Age (Group B: 32 students with date of birth available)      25.13     8.26                 18.74   61.25

Participants completed 425 responses in the experiment. The experiment contained sixteen

questions, each with both an ambiguous and a clear formulation. Due to the two-hour time

constraint, no participant completed more than twelve questions. Forty participants (60.61%

of the sample population) completed six questions. On average, participants completed 6.44

questions, with a standard deviation of 1.75.

Table 5 provides an overview of the participants' results in the experiment. Total errors is

calculated as the average of the micro errors counted using the Semantic Error Counting

Sheet shown in Appendix F. Appendix H provides a Pearson correlation matrix of the

dependent and independent variables measured in the experiment. Appendix I provides

detailed reports of the errors participants made on each individual question.

Table 5
Comparative Statistics for all Participant Responses
Grouped by Question (Q) and Treatment (T). Note that for T, a = ambiguous, c = clear
Q  T  Halstead's Complexity  Group  Response Count  Attempts Avg  Attempts SD  Confidence Avg  Confidence SD  Duration Avg  Duration SD  Total Errors Avg  Total Errors SD
1 a 1.6927 A 32 3.31 1.99 6.22 1.36 10.51 4.63 1.59 3.66
1 c 1.6927 B 33 3.18 2.16 6.42 0.87 11.63 6.60 1.12 2.48
2 a 5.4186 B 33 9.21 8.88 5.21 1.47 20.74 11.30 4.27 8.18
2 c 5.4186 A 33 3.61 3.43 6.30 1.05 9.03 6.89 0.30 0.81
3 a 6.8908 A 33 7.94 6.04 5.91 1.57 11.84 7.72 3.97 3.50
3 c 6.8908 B 33 5.09 6.18 6.27 1.42 8.63 5.29 1.03 2.86
4 a 4.4697 B 32 7.31 4.75 5.38 1.64 15.57 8.95 4.03 5.54
4 c 4.4697 A 33 6.52 7.36 6.21 1.47 10.95 8.46 0.67 2.23
5 a 12.2917 A 33 9.24 6.63 5.24 2.21 18.54 11.06 9.42 10.39
5 c 12.2917 B 30 7.07 5.98 5.37 2.16 15.65 9.74 5.20 7.70
6 a 18.8000 B 17 11.41 7.21 5.59 1.33 23.59 7.93 32.94 13.21
6 c 18.8000 A 23 14.91 9.36 4.87 1.91 25.63 10.13 8.00 10.49
7 a 16.0076 A 15 11.07 6.10 5.07 1.49 18.78 5.46 7.27 8.65
7 c 16.0076 B 15 7.67 4.20 5.07 1.98 15.31 7.86 6.13 7.41
8 a 16.2684 B 6 6.83 8.42 5.83 1.60 13.24 8.36 2.33 4.08
8 c 16.2684 A 10 6.40 2.46 5.00 1.94 12.53 5.35 6.40 6.52
9 a 23.8970 A 3 12.33 2.08 3.00 1.73 16.43 7.77 18.00 10.54
9 c 23.8970 B 2 6.50 3.54 6.50 0.71 15.36 2.51 15.50 21.92
10 a 19.4819 B 1 7.00 - 5.00 - 9.93 - 20.00 -
10 c 19.4819 A 4 7.25 3.20 4.25 2.50 9.56 1.40 5.00 2.58
11 a 22.4000 A 2 7.00 4.24 5.00 2.83 8.53 2.13 22.50 13.44
11 c 22.4000 B 1 4.00 - 7.00 - 9.45 - 8.00 -
12 c 29.1633 B 1 14.00 - 4.00 - 10.10 - 8.00 -

The relationships between the dependent variables (duration, confidence, and total errors) and

the independent variables (complexity, ambiguity) are graphically depicted in Figures 3, 4,

and 5. These figures illustrate that the hypothesised relationships for complexity and

ambiguity were supported for most measures by most queries.

[Chart: Questions by Treatment and Error. Average total errors for each question (Questions 1-12),
ambiguous versus clear formulation.]

Figure 3
The relationship between the treatment received (ambiguous or clear information request) and the
total errors in the participant's response.

[Chart: Questions by Treatment and Duration. Average duration (in minutes) for each question
(Questions 1-12), ambiguous versus clear formulation.]

Figure 4
The relationship between the treatment received (ambiguous or clear information request) and the
duration taken for the participant to prepare the response.

[Chart: Questions by Treatment and Confidence. Average confidence rating for each question
(Questions 1-12), ambiguous versus clear formulation.]

Figure 5
The relationship between the treatment received (ambiguous or clear information request) and the
participant's confidence in the response.

Question Six, with an average of 32.94 errors (standard deviation of 13.21), caused the most

problems for participants in its ambiguous formulation. Nonetheless, the seventeen

respondents to Question Six in its ambiguous formulation took on average slightly less time

to complete the response (23.59 average minutes, 7.93 standard deviation) than the twenty-

three respondents for the clear formulation (25.63 average minutes, 10.13 standard

deviation).

Participants that completed Question Eight in the clear formulation made more errors on average

(6.40, standard deviation of 6.52) than those with the ambiguous formulation (average of 2.33

and standard deviation of 4.08). Participants also exhibited higher average confidence ratings

for the ambiguous formulation of this question (5.83, standard deviation of 1.60) than

participants receiving the clear formulation (5.00, standard deviation of 1.94).

A reason for these results may be that extraneous ambiguity is apparent in the clear

formulation due to the formulation's length. Question Eight, however, had only sixteen completed

responses (six for the ambiguous formulation, ten for the clear formulation),

which limits the weight that can be placed on this question's result. Because of the

small number of participants completing Questions Nine through Twelve, analysis of

differences in these individual questions is not appropriate.

4.2 Regression Analysis

Two multiple linear regression models were used to analyse the experimental results. The

model used to test H1a-c, and H9a-c for the effects of ambiguity and complexity respectively

was:

(1) Performance = Ambiguity + Complexity

where ambiguity was a dichotomous variable and complexity was measured using the

Halstead (1977) complexity measure for difficulty.

The model used to test the seven individual types of ambiguity in H2a-c to H8a-c was:

(2) Performance = Lexical + Syntactical + Inflective + Pragmatic +

Extraneous + Emphatic + Suggestive + Complexity

where the ambiguity types were measured as shown in Appendix J, according to the

ambiguity assessment instrument presented in Appendix K.
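
Written out in full, models (1) and (2) are multiple linear regressions of the following form, with the intercept, coefficients, and error term left implicit in the shorthand above:

$$\text{Performance}_i = \beta_0 + \beta_1\,\text{Ambiguity}_i + \beta_2\,\text{Complexity}_i + \varepsilon_i$$

$$\text{Performance}_i = \beta_0 + \sum_{k=1}^{7} \beta_k\,\text{AmbiguityType}_{k,i} + \beta_8\,\text{Complexity}_i + \varepsilon_i$$

where $i$ indexes the individual responses and Performance is, in turn, each of the three dependent variables described below. This is simply the standard expansion of the shorthand models; the estimated coefficients are reported in Tables 8 and 9.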

Performance is end user query performance. The dependent variables that proxy for end user

query performance are total errors, duration, and confidence. Duration was measured as

decimal minutes. The Confidence Rating was self-assessed by participants and was

transformed to a numerical rating in accordance with Table 6. The numerical rating was used

as the measure for confidence in the regression analysis.

Table 6
Confidence Rating Transformation to a Numerical Scale
Confidence Rating Numerical Rating
>85-100% 7
70-85% 6
55-70% 5
40-55% 4
25-40% 3
10-25% 2
<10% 1

In all regression models, the Halstead (1977) complexity measure for difficulty was used to

assess the complexity of the required model answer. This measure has been used in several

end user query performance studies (Jih et al. 1989).

For testing H1a-c and H9a-c, a dichotomous variable of 0 (clear formulation, or pseudo-SQL)

and 1 (ambiguous formulation, or manager-English) was used to indicate whether the

participant had received a clear formulation or an ambiguous formulation of the information

request. For testing H2a-c to H8a-c, the seven independent ambiguity parameters were

assessed in accordance with the scale presented in Table 7. Each question was assessed by

two independent non-researchers who had been briefed in the definitions of the seven types

of ambiguity. The initial scores were moderated by discussion and consideration between the

independent third parties and the researcher to ensure consistent and correct interpretation of

the seven ambiguity definitions. Cronbach's alpha (Cronbach 1951) for the two third parties'

ambiguity measurement scores was 0.6887, indicating that a moderately reliable measure for

ambiguity across the two raters was achieved.
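
For reference, Cronbach's alpha for $k$ raters is conventionally computed as

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right),$$

where $\sigma^{2}_{Y_i}$ is the variance of rater $i$'s scores and $\sigma^{2}_{X}$ is the variance of the summed scores; here $k = 2$. This is the standard formula rather than a detail reported with the result above.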

Table 7
Ambiguity Assessment Scale for the analysis of the Seven Ambiguity Types Regression Model
Ambiguity Assessment Rating Meaning
0 No ambiguity of this type present
1 A little ambiguity of this type present
2 Some ambiguity of this type present
3 Much ambiguity of this type present
4 A great deal of ambiguity of this type present

Each question formulation, clear and ambiguous, for each information request was assessed

to provide a scale of ambiguity. The instrument used to undertake this finer assessment of

ambiguity for questions for which responses exist is reproduced in Appendix K. Using a

five-point scale for the ambiguity assessment rating provides a finer measure than would a

dichotomous variable.

4.3 Ambiguity Treatment Multiple Linear Regression Model Results

Table 8 provides the results of the multiple linear regression (Newbold 1984) shown for

model (1) for the Total Errors, Duration, and Confidence measures of end user query

performance. These results provide evidence regarding H1a-c and H9a-c. All relationships

are in the hypothesised direction (positive for H1a, H1b, H9a, and H9b, and negative for H1c

and H9c), and indicate strong support for each hypothesis.

Table 8
Regression Analysis Results for the General Ambiguity Regression Model
Source (n=425)        DF    Mean Square    F-Value    Pr > T (2-tailed)    Parameter Estimate    R2
Model (Total Errors) 2 5430.30 88.44 0.0001 0.2954
Error 422 61.40
Ambiguity (H1a) 1 2447.98 39.87 0.0001 4.8042
Complexity (H9a) 1 8705.38 141.78 0.0001 0.7582

Model (Duration) 2 2236.60 28.59 0.0001 0.1193
Error 422 78.23
Ambiguity (H1b) 1 1250.63 15.99 0.0001 3.4339
Complexity (H9b) 1 3352.81 42.86 0.0001 0.4705

Model (Confidence) 2 42.87 16.25 0.0001 0.0715
Error 422 2.64
Ambiguity (H1c) 1 13.03 4.94 0.0268 -0.3505
Complexity (H9c) 1 74.68 28.31 0.0001 -0.0702

Ambiguity in an information request has a strong impact on the three measures of end user

query performance presented in H1a, H1b, and H1c. Total errors, duration, and end user

confidence are significantly and strongly affected by the presence of ambiguity in the

information request. This result confirms the general hypothesis of the model

presented in this paper: that an ambiguous information request is likely to result in a query

formulation that is less accurate, takes longer to prepare, and in which the end user is less

confident. Ceteris paribus, a clearly formulated information request is more effective and

efficient than an information request that is ambiguous and poorly specified.

The relationship between ambiguity and end user confidence, however, is generally weaker

than expected, although still significant at the 5% level. The small R2 (0.0715) for the

confidence model indicates that the ambiguity and complexity of an information request had

little impact on each participant's confidence in their query formulation.

Ambiguity is significant for all three models. The R2 for each model (0.2954, 0.1193, and

0.0715) provides strong support for the assertion that ambiguity and complexity negatively

impact end user query performance.
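
As an illustration of the estimation underlying Table 8, model (1) regresses each performance measure on the ambiguity treatment dummy and the Halstead complexity of the model answer. The ordinary least squares sketch below uses invented response values to show the mechanics only; it is not the statistical package or data used in this study.

import numpy as np

ambiguity  = np.array([0, 0, 0, 1, 1, 1, 0, 1], dtype=float)         # 0 = clear, 1 = ambiguous
complexity = np.array([1.7, 5.4, 6.9, 4.5, 12.3, 18.8, 16.0, 16.3])  # Halstead difficulty of the model answer
total_errors = np.array([2, 5, 6, 9, 15, 24, 14, 22], dtype=float)   # invented responses

# Design matrix with an intercept; the coefficients on the second and third
# columns correspond to the Ambiguity (H1a) and Complexity (H9a) estimates.
X = np.column_stack([np.ones_like(ambiguity), ambiguity, complexity])
(b0, b1, b2), *_ = np.linalg.lstsq(X, total_errors, rcond=None)
print(f"intercept={b0:.3f}, ambiguity={b1:.3f}, complexity={b2:.3f}")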

4.4 Multiple Linear Regression Model: Seven Types of Ambiguity

Table 9 provides the results of the multiple linear regression model shown for model (2) for

the Total Errors, Duration, and Confidence measures of end user query performance. This

testing examines hypotheses H2a-c through H8a-c for individual types of ambiguity.

Table 9
Regression Analysis Results for the Seven Ambiguity Types Regression Model
Source (n=425) DF Mean Square F-Value Pr > T (2-tailed) Parameter Estimate R2
Model (Total Errors) 8 2177.52 46.81 0.0001 0.4737
Error 416 46.52
Lexical (H2a) 1 78.41 1.69 0.1949 -1.5545
Syntactical (H3a) 1 7.99 0.17 0.6789 -0.2274
Inflective (H4a) 1 0.79 0.02 0.8963 -0.4143
Pragmatic (H5a) 1 385.36 8.28 0.0042 1.2621
Extraneous (H6a) 1 254.77 5.48 0.0197 3.3940
Emphatic (H7a) 1 394.51 8.48 0.0038 2.6906
Suggestive (H8a) 1 167.54 3.60 0.0584 2.9079
Complexity 1 2605.34 56.01 0.0001 0.4899

Model (Duration) 8 832.24 11.23 0.0001 0.1776


Error 416 74.10
Lexical (H2b) 1 1272.66 17.17 0.0001 6.2626
Syntactical (H3b) 1 600.95 8.11 0.0046 1.9725
Inflective (H4b) 1 780.00 10.53 0.0013 -13.0021
Pragmatic (H5b) 1 4.65 0.06 0.8023 -0.1387
Extraneous (H6b) 1 1008.31 13.61 0.0003 6.7520
Emphatic (H7b) 1 129.85 1.75 0.1863 -1.5436
Suggestive (H8b) 1 457.05 6.17 0.0134 -4.8029
Complexity 1 1926.10 25.99 0.0001 0.4213

Model (Confidence) 8 14.66 5.64 0.0001 0.0978


Error 417 2.60
Lexical (H2c) 1 8.81 3.39 0.0664 -0.5211
Syntactical (H3c) 1 0.02 0.01 0.9292 -0.0115
Inflective (H4c) 1 1.27 0.49 0.4844 0.5253
Pragmatic (H5c) 1 2.83 1.09 0.2973 -0.1082
Extraneous (H6c) 1 0.10 0.04 0.8435 0.0677
Emphatic (H7c) 1 0.07 0.03 0.8697 -0.0358
Suggestive (H8c) 1 1.91 0.74 0.3915 0.3107
Complexity 1 76.02 29.24 0.0001 -0.0837

4.5 Summary of Results

The experimental results indicate that the taxonomy presented in this paper explains a great deal of the effect of ambiguity on end user query performance, although they also indicate that further refinement of the theory is required. Table 10 provides a

summary of the results obtained in this experiment. All hypotheses indicated as "supported"

are significant at the p = 0.05 level or below according to a one-tailed test. The two-tailed p-

value is shown, and is immediately followed by the one-tailed p-value in brackets.
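
The support decisions in Table 10 therefore reduce to a simple rule: halve the two-tailed p-value, and record a hypothesis as supported only if that one-tailed value is at or below 0.05 and the parameter estimate has the hypothesised sign. The following sketch restates that rule using two values from Tables 9 and 10.

def hypothesis_supported(two_tailed_p, estimate, hypothesised_positive, alpha=0.05):
    one_tailed_p = two_tailed_p / 2
    right_direction = (estimate > 0) == hypothesised_positive
    return right_direction and one_tailed_p <= alpha

# H6a: extraneous ambiguity and total errors (estimate 3.3940, two-tailed p = 0.0197)
print(hypothesis_supported(0.0197, 3.3940, hypothesised_positive=True))    # True: supported
# H4b: inflective ambiguity and duration (estimate -13.0021, two-tailed p = 0.0013)
print(hypothesis_supported(0.0013, -13.0021, hypothesised_positive=True))  # False: wrong direction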

Table 10
Summary of Analysis' Support for Hypotheses
Hypothesis Statement Result
H1a Higher ambiguity in the information request leads to an Supported
increase in the total errors in the query formulation. p=0.0001 (0.0001)
H1b Higher ambiguity in the information request leads to an Supported
increase in the time taken to complete the query formulation. p=0.0001 (0.0001)
H1c Higher ambiguity in the information request leads to lower Supported
end user confidence in the accuracy of the query formulation. p=0.0268 (0.0134)
H2a Higher levels of lexical ambiguity in the information request Not Supported
lead to more total errors in the query formulation. p=0.1949 (0.0975)
(negative parameter)
H2b Higher levels of lexical ambiguity in the information request Supported
lead to more time taken to complete the query formulation. p=0.0001 (0.0001)
H2c Higher levels of lexical ambiguity in the information request Supported
lead to lower end user confidence in the accuracy of the p=0.0664 (0.0332)
query formulation.
H3a Higher levels of syntactical ambiguity in the information Not Supported
request lead to more total errors in the query formulation. p=0.6789 (0.3395)
H3b Higher levels of syntactical ambiguity in the information Supported
request lead to more time taken to complete the query p=0.0046 (0.0023)
formulation.
H3c Higher levels of syntactical ambiguity in the information Not Supported
request lead to lower end user confidence in the accuracy of p=0.9292 (0.4646)
the query formulation.
H4a Higher levels of inflective ambiguity in the information Not Supported
request lead to more total errors in the query formulation. p=0.8963 (0.4482)
H4b Higher levels of inflective ambiguity in the information Not Supported
request lead to more time taken to complete the query p=0.0013 (0.0007)
formulation. (negative parameter)

H4c Higher levels of inflective ambiguity in the information Not Supported
request lead to lower end user confidence in the accuracy of p=0.4844 (0.2422)
the query formulation.
H5a Higher levels of pragmatic ambiguity in the information Supported
request lead to more total errors in the query formulation. p=0.0042 (0.0021)
H5b Higher levels of pragmatic ambiguity in the information Not Supported
request lead to more time taken to complete the query p=0.8023 (0.4012)
formulation.
H5c Higher levels of pragmatic ambiguity in the information Not Supported
request lead to lower end user confidence in the accuracy of p=0.2973 (0.1487)
the query formulation.
H6a Higher levels of extraneous ambiguity in the information Supported
request lead to more total errors in the query formulation. p=0.0197 (0.0099)
H6b Higher levels of extraneous ambiguity in the information Supported
request lead to more time taken to complete the query p=0.0003 (0.0002)
formulation.
H6c Higher levels of extraneous ambiguity in the information Not Supported
request lead to lower end user confidence in the accuracy of p=0.8435 (0.4218)
the query formulation.
H7a Higher levels of emphatic ambiguity in the information Supported
request lead to more total errors in the query formulation. p=0.0038 (0.0019)
H7b Higher levels of emphatic ambiguity in the information Not Supported
request lead to more time taken to complete the query p=0.1863 (0.0932)
formulation. (negative parameter)
H7c Higher levels of emphatic ambiguity in the information Not Supported
request lead to lower end user confidence in the accuracy of p=0.8697 (0.4349)
the query formulation.
H8a Higher levels of suggestive ambiguity in the information Supported
request lead to more total errors in the query formulation. p=0.0584 (0.0292)
H8b Higher levels of suggestive ambiguity in the information Not Supported
request lead to more time taken to complete the query p=0.0134 (0.0067)
formulation. (negative parameter)
H8c Higher levels of suggestive ambiguity in the information Not Supported
request lead to lower end user confidence in the accuracy of p=0.3915 (0.1958)
the query formulation.
H9a Higher complexity in the information request leads to more Supported
total errors in the query formulation. p=0.0001 (0.0001)
H9b Higher complexity in the information request leads to more Supported
time taken to complete the query formulation. p=0.0001 (0.0001)
H9c Higher complexity in the information request leads to lower Supported
end user confidence in the accuracy of the query formulation. p=0.0001 (0.0001)

4.5.1 Potential Ambiguity

The generally weak measured effects for the potential ambiguities assessed by the experiment

(lexical and syntactical) do not support the hypotheses presented in this paper. As the

theoretical model indicates, potential ambiguities derive their ambiguity independently of the

context of the statement. A statement may contain lexical or syntactical ambiguity, but the

context of the statement often resolves that ambiguity. The hypothesised effects may not have been measurable because the context clarified the ambiguity.

Lexical ambiguity did not show a statistically significant relationship with total errors (H2a).

Lexical ambiguity did demonstrate a statistically significant relationship with duration (H2b,

p=0.0001) and confidence (H2c, p=0.0332 for a one-tailed t-test). The implication of these

results is that lexical ambiguity requires more cognitive effort by the end users to determine

the meaning of the request. Once the meaning of the request has been determined, however,

users do not make significantly more errors in their query formulations. Lexical ambiguity

did result in end users being slightly less confident in their queries.

Although in the hypothesised direction (positive), the relationship between syntactical

ambiguity and total errors (H3a) is not significant (p=0.6789). Syntactical ambiguity does

show a significant relationship with the time taken to complete the query (H3b), which indicates that

greater cognitive effort is required to resolve the contextual ambiguity. Syntactical

ambiguity's relationship with end user confidence is not significant (H3c, p=0.9292).

Inflective ambiguity does not show a significant relationship in the hypothesised direction for

H4a (p=0.8963), H4b (negative parameter, p=0.0013), or H4c (p=0.4844). Interestingly,

inflective ambiguity shows a significant negative relationship with duration, which is in the

opposite direction to that hypothesised. This result must be considered with caution,

however, as the level of inflective ambiguity present in the questions presented to subjects

was low (Appendix J).

4.5.2 Actual Ambiguity

The role of the actual ambiguity types (pragmatic and extraneous) in the theoretical model is

strongly supported by the empirical results. Actual ambiguities are not clarified by the

context of the statement, i.e., the context does not resolve pragmatic and extraneous

ambiguities. Actual ambiguities generally show a strong relationship with total errors, and

extraneous ambiguity (but not pragmatic ambiguity) displays a strong relationship with

duration. Neither pragmatic nor extraneous ambiguity shows a significant relationship with

end user confidence.

Pragmatic ambiguities are not clarified by context, and arise where information necessary to

properly answer the information request is missing. The hypothesised relationship between

pragmatic ambiguity and total errors is strongly supported (H5a, p=0.0042). The

hypothesised effects of pragmatic ambiguity on duration (H5b, negative parameter,

p=0.8023), and end user confidence (H5c, p=0.2973) were not significant. Pragmatic

ambiguity may require the end user to infer the missing information, thereby increasing total errors.

In the current experiment, the need to infer missing information did not significantly affect

the time necessary to complete the query response or end user confidence in their query.

Extraneous ambiguity occurs when more information than is required is provided or when the

information request is indirectly and pretentiously written. Extraneous ambiguity misleads

end users as to the required response. The hypothesised effects of extraneous ambiguity on total errors (H6a, p=0.0197) and duration (H6b, p=0.0003) were strongly supported. Extraneous ambiguity, where

more information is provided than is required, appears to require more time and cognitive

effort to resolve the ambiguity, and the query response is more likely to be inaccurate.

The parameter estimates (Table 9) for total errors (3.3940) and for duration (6.7520) indicate

that extraneous information produces severe negative impacts on end user query efficiency

and effectiveness. The result for H6c, which hypothesised that extraneous ambiguity

decreases end user confidence, is not significant (p=0.8435). Where information needs to be

inferred (pragmatic ambiguity), end users appear to recognise and grapple with the

ambiguity. End users appeared less able to recognise and adjust for extraneous ambiguity

than pragmatic ambiguity.

4.5.3 Imaginary Ambiguity

The results for imaginary ambiguities support the hypothesised relationships between these

ambiguities and query errors. The results do not support the hypothesised relationships with

duration or end user confidence. Imaginary ambiguities result in more total errors, but appear

to result in less time taken to complete the requests. These outcomes are important, because,

although not hypothesised, imaginary ambiguities appear to lead end users to infer the

requirements of the question more quickly (leading to a shorter duration required) and to

formulate the query response on that basis (leading to higher total errors). This result should

be treated with caution, as the imaginary ambiguities were not at a high level in this

experiment (Appendix J).

Emphatic ambiguity arises from the limited ability to convey intonation in written form. The

hypothesis regarding the effect of emphatic ambiguity on total errors (H7a) is strongly

supported (p=0.0038). Neither H7b (duration) nor H7c (confidence) were statistically

significant. Where the emphasis of the information request cannot be clearly expressed, end

users are required to supply their own emphasis when interpreting the meaning of the

information request. While they appear to make their interpretation quickly, the end users did

not recognise that their queries were more likely to contain errors.

The hypothesised relationship between suggestive ambiguity and total errors (H8a) is

strongly supported (p=0.0292 for a one-tailed t-test). The relationship between suggestive

ambiguity and duration (H8b), however, is opposite to the hypothesised direction, and

significant (negative parameter, p=0.0134). The hypothesised relationship with end user

confidence (H8c) is not supported (p=0.3915). As with extraneous ambiguity, these results indicate that end users are not able to recognise the negative impact of suggestive ambiguity on their query formulations. This anomalous result requires further research to determine its cause and to identify ways to ameliorate these problems for end user query formulations.

4.5.4 Complexity

The results indicate strong support for the hypotheses regarding complexity (H9a, H9b, and

H9c all with p=0.0001). Task complexity increases total errors and duration, and decreases

the end user's overall confidence in the query formulation. These results are consistent with

previous research (e.g., Borthick et al. 1997; Borthick et al. 2000).

5. Implications For Business Practice

This research has developed an initial theory of ambiguity and end user queries. It

empirically investigated seven ambiguities, and measured how they differentially affect end

user query performance. Some ambiguities, e.g., lexical, extraneous, pragmatic, and

emphatic, affect end user query performance more than others. Some ambiguities, i.e., extraneous and suggestive, indicate that end users will potentially make decisions based on

results that are inaccurate or misleading.

5.1.1 Electronic Mail

In the business world, electronic mail is often used to transmit information requests,

frequently without the benefit of other channels of communication (Star 1995). Furthermore,

these information requests are hurriedly written (Star 1995; Fowler and Aaron 1998). Such

haste contributes to syntactical, lexical, and inflective ambiguities. The use of shorthand

notations often miscommunicates the intended message. Electronic mails frequently leave

assumptions about the business process unstated. These omissions contribute to

pragmatic ambiguity. The hurried state of the specification, and the lack of a formal

specification process also contribute to extraneous ambiguity (Fowler and Aaron 1998).

Lexical, syntactical, inflective, and, to some extent, extraneous ambiguities are functions

of the grammar used to write the information request. The longer the request, the more likely

the request is to contain these ambiguities (Fowler and Aaron 1998). Concise writing is

important to reduce ambiguity. Good written communication skills on the part of the

individual making the information request are required.

All seven ambiguities arise in the daily business specification of reports. Several strategies

are available to reduce their impact. Electronic mails containing information requests need to

be concisely drafted and proofread to reduce pragmatic ambiguity. Providing concise

specifications and avoiding indirect writing, e.g., pretentious writing and passive voice,

reduce the lexical, syntactical, inflective, and extraneous ambiguity of information requests

(Fowler and Aaron 1998).

Emoticons (Sanderson 1993) and generally accepted formatting styles can be used to add

emphasis to electronic mail. These techniques can reduce emphatic ambiguity.

Reading the information request objectively to remove innuendo addresses suggestive ambiguity. Explaining the reason for the information request as much as possible will

enhance clarity and reduce the perception of hidden agendas.

Each of the above techniques enhances the clarity of the information request and thus increases

the effectiveness and efficiency of the response received. These techniques initially increase

the time necessary to write the information request. Nonetheless, this paper's results indicate that the reward will be an increase in the timeliness, accuracy, and relevance of the information

received.

5.1.2 Personnel Turnover and Work Teams

Information systems personnel and end users are frequently engaged on short-term contracts.

Turnover in many organisations, and especially within work groups, is high (Moore 2000).

As turnover increases, the ambiguity of information requests also tends to increase. End

users have less experience and understanding of the organisational culture and thus do not

understand the context and assumptions made in information requests. Especially when

faced with high turnover of information systems personnel and end users, strategies for

reducing the seven ambiguities can significantly benefit the organisation.

Jessup and Valacich (1993) suggest strategies for retaining group memory and enhancing

organisational learning. For work teams that often have new members, a library of previous

information requests and associated query responses will assist team members to reduce

information request ambiguity by providing a context for the request. To function properly,

new team members must understand the organisational procedures and have a context within

which to function.

Businesses would benefit from candidly assessing their methodology for making information

requests. Using methodologies that result in less ambiguity through formalisation of the

information request will reduce errors and improve the efficient use of the time of skilled end

users.

6. Contributions, Limitations, and Future Research

6.1 Research Contributions

This paper makes significant, unique contributions to the theory of ambiguity, complexity,

and end user query performance. The theory of communication linguistics has been applied

to end user query performance theory. The theory identified seven ambiguities: lexical,

syntactical, inflective, pragmatic, extraneous, emphatic and suggestive. The empirical results

obtained for the developed theory are robust, and indicate substantial support.

An instrument to measure ambiguity in an information request, at a finer level than

previously available, was developed and applied. Although requiring further refinement, this

instrument is a significant advance in the measurement of information request ambiguity.

This paper identifies areas for future research, and examines the implications for business

practices. This paper represents a significant advance in the theory and its application to the efficient and effective development of queries by end users.

6.2 Research Limitations

Huck et al. (1974) identify seven issues for the internal validity of experiments. Appendix L

provides a detailed analysis of these issues and outlines how this experiment's design controlled for each one.

As with most controlled laboratory experiments with student participants for subjects, there

are external validity issues. Generalisation from student subjects to the business setting may

be invalid. Students' motivations to obtain a high grade may be different to the business end

user. This experiment's use of advanced business and systems undergraduate students as

subjects, however, implies that this generalisation to the business setting is meaningful, as

these subjects are reflective of the skill levels of end users in a business context.

Generalising from this paper's results to a business setting is invalid to the extent that the

experimental information requests are not representative of information requests made in a

business setting. The information requests are nonetheless based on a close model of the business world and reflect likely real-world tasks.

Another limitation is the need to extend the results to more extreme levels of ambiguity. The

ambiguity present in the experiment's questions was not extreme. Hence, generalising from

the results of the current experiment to more extreme levels of ambiguity may not be valid.

6.3 Future Research

Replication of this experiment, with more ambiguous information requests than those of the

current experiment, would strengthen the theoretical model. An experiment designed to

examine contextual reduction of the potential ambiguities (lexical, syntactical, and inflective)

would also be valuable. The weaker results of the current experiment may derive from a lack

of variation in some of the seven types of ambiguity. Instantiating a greater range and variation of ambiguity in the experiment's information requests would add empirical insight into the theoretical model.

This paper presents what initially appear to be anomalous results for inflective and suggestive

ambiguity in the context of duration. A future experiment would do well to investigate the

circumstances of these results, and to empirically analyse the relationship between inflective

ambiguity, suggestive ambiguity, and duration.

A future experiment having particular regard to end user confidence would significantly

assist the development of the theoretical model. None of the hypotheses, with the exception

of lexical ambiguity (H2c), is supported for end user confidence. On the basis of the current

results, end user confidence often does not reflect the actual accuracy of the query response. End users do not appear to know when the query response is inaccurate.

Outside of the domain of laboratory research, an avenue for future research would be a field

experiment of ambiguity and the performance of business end users. This experiment would

allow the researcher to examine the prevalence and effects of the seven types of ambiguity in

actual business settings. Such a study would also make a contribution by assessing the extent

to which the current experimental results generalise to the business setting.

An experiment designed to analyse the empirical effectiveness of strategies to mitigate each

ambiguity in a business setting would hold considerable value for research and business

practice. This would allow the development and subsequent assessment of strategies to

reduce the effect of ambiguity on end user query performance.

Further development and empirical testing of the ambiguity assessment instrument (Appendix K)

would provide the opportunity to refine and enhance the current initial instrument. Future

research is necessary to develop a reliable and robust instrument for the measurement of

ambiguity in information requests.

References

Almuallim, H., Akiba, Y., Yamazaki, T., and Kaneda, S. "Learning Verb Translation Rules
from Ambiguous Examples and a Large Semantic Hierarchy," Computational Learning
Theory and Natural Learning Systems, (4), 1997, pp. 323-336.
Athey, S., and Wickham, M. "Required Skills for Information Systems Jobs in Australia".
Journal of Computer Information Systems, (36:2), 1995-1996.
Australian Bureau of Statistics. "8669.0 Computing Services Industry, Australia, 1995-96".
Australian Bureau of Statistics. 1997.
Axley, S.R. "Managerial and organizational communication in terms of the conduit
metaphor," Academy of Management Review, (9), 1984, pp. 428-437.
Borthick, A.F., Bowen, P.L., and Diery, R.G. "Complexity and Errors in SQL Queries:
Development and Empirical Comparison of Complexity Measures." Workshop on
Information Technologies and Systems (WITS '97), pp. 31-40, December 13-14 1997.
Borthick, A.F., Bowen, P.L., Jones, D.R., and Tse, M.H.K. "The Effects of Information
Request Ambiguity and Construct Incongruence on Query Development," Proceedings of the
Pacific Asia Conference on Information Systems, June 2000.
Campbell, D. J. "Task Complexity: A Review and Analysis," Academy of Management
Review, (13:1), 1988, pp. 40-52.
Cardinali, R. "Information Systems - A Key Ingredient to Achieving Organizational
Competitive Strategy," Computer in Industry, (18:3), 1992, pp. 241-245.
Chomsky, N. "Language and Mind," in Ways of Communicating, Cambridge University
Press, Cambridge, 1991, pp. 56-80.
Conger, S. The New Software Engineering, Wadsworth Publishing, Belmont, California.
1994.
Copi, I. M., and Cohen, C. Introduction to Logic (8th ed.), Macmillan, New York, New York,
1990.
Cronbach, L. J. "Coefficient Alpha and the Internal Structure of Tests," Psychometrika, (16),
1951, pp. 297-334.
Delligatta, A., and Umbaugh, R. E. "EUC Becomes Enterprise Computing," Information
Systems Management, Fall 1993, pp. 53-55.
Dubin, R. Theory Building, Collier Macmillan Publishers, London, 1978.
Eisenberg, E.M., and Phillips, S.R. "Miscommunication in Organizations," in
"Miscommunication" and Problematic Talk, Sage Publications, London, 1991.
Fischer, D. H. Historians' Fallacies, Harper & Row, New York, 1970.
Fowler, H. R., and Aaron, J. E. The Little, Brown Handbook (7th ed.), Addison-Wesley
Publishers Inc., New York, New York, 1998.
Freeman, L.A., Jarvenpaa, S.L., and Wheeler, B. C. "The Supply and Demand of
Information Systems Doctorates: Past, Present and Future," MIS Quarterly, (24:2), June
2000.

Halstead, M. H. Elements of Software Science, Elsevier North-Holland Inc, Purdue
University, 1977.
Hamblin, C. L. Fallacies, Methuen, London, 1970.
Huck, S. W., Cormier, W. H., and Bounds, W. G. Jr. Reading Statistics and Research,
Harper & Row, New York, New York, 1974.
Jespersen, O. Language: its nature, development and origin, Allen & Unwin, London, 1922.
Jessup, L.M., and Valacich, J.S. Group Support Systems, Macmillan Publishing Company,
New York, New York, 1993.
Jih, W.J.K., Bradbard, D.A., Snyder, C.A., and Thompson, N.G.A. "The Effects of
Relational and Entity-Relationship Data Models on Query Performance of End Users,"
International Journal of Man-Machine Studies, (31), 1989, pp. 257-267.
Katzeff, C. "Systems Demands on Mental Models for a Fulltext Database," International
Journal of Man-Machine Studies, (32), 1990, pp. 483-509.
Keen, P.G.W. "Information Technology and the Management Difference: A Fusion Map,"
IBM Systems Journal, (32:1), 1993, pp. 17-38.
Kooij, J.G. Ambiguity in Natural Language, North-Holland Publishing Company,
Amsterdam, Holland, 1971.
Liew, S.T. "The Effects of Normalization on Query Errors: An Experimental Evaluation,"
Unpublished Thesis, University of Queensland, 1995.
Moore, J.E. "One Road to Turnover: An Examination of Work Exhaustion in Technology
Professionals," MIS Quarterly, (24:1), March 2000, pp. 141-168.
Nath, R., and Lederer, A.L. "Team Building for IS Success," Information Systems
Management, Spring 1996, pp. 32-37.
Newbold, P. Statistics for Business and Economics, Prentice-Hall Inc, Englewood Cliffs,
New Jersey, 1984.
Ogden, W.C., Korenstein, R., and Smelcer, J.B. An Intelligent Front-End for SQL, IBM
General Products Division, San Jose, California, 1986.
Reilly, R.G. "Miscommunication at the Person-Machine Interface," in "Miscommunication"
and Problematic Talk, Sage Publications, London, 1991.
Reisner, P. "Use of Psychological Experimentation as an Aid to Development of a Query
Language," IEEE Transactions on Software Engineering, SE3:3, 1977, pp. 218-299.
Rescher, N. Introduction to Logic, St Martin's Press, New York, New York, 1964.
Rho, S., and March, S.T. "An Analysis of Semantic Overload in Database Access Systems
using Multi-Table Query Formulation," Journal of Database Management, (8:2), Spring
1997, pp. 3-14.
Rosenthal, D.A., and Jategaonkar, V.A. "Wanted: Qualified IS Professionals," Information
Systems Management, Spring 1995, pp. 27-31.
Russell, B.A.W. "Vagueness," Australasian Journal of Philosophy and Psychology, (1),
1923, pp. 84-92.
Ryan, H.W. "User-Driven Systems Development: Defining a New Role for IS," Information
Systems Management, Summer 1993, pp. 66-68.

Ryle, G. Collected Papers, (2), Hutchinson, London, 1971.
Sanderson, D. Smileys, O'Reilly, Sebastapol, California, 1993.
Sekine, S., Carroll, J.J., Ananiadou, S., and Tsujii, J. "Automatic learning for Semantic
Collocation," Third Conference on Applied Natural Language Processing, 1992, pp. 104-
100.
Severin, W.J., and Tankard, J.W. "Communication Theories: Origins, Methods, and Uses in
the Mass Media," Addison Wesley Longman, Inc., New York, New York, 1997.
Star, S.L. The Cultures of Computing, Blackwell Publishers/The Sociological Review,
Oxford, U.K., 1995.
Suh, K.S., and Jenkins, A.M. "A Comparison of Linear Keyword and Restricted Natural
Language Database Interfaces for Novice Users," Information Systems Research, (3:3), 1992,
pp. 252-272.
Tayntor, C.B. "New Challenges or the End of EUC?," Information Systems Management,
Summer 1994, pp. 86-88.
Trow, C.E. The Old Shipmasters of Salem, New York, New York, 1905.
Turner, G.W. (Editor). The Australian Concise Oxford Dictionary of Current English,
Oxford University Press, Melbourne, 1987.
Walton, D. Fallacies Arising from Ambiguity, Kluwer Academic Publishers, Dordrecht,
1996.
Williamson, T. Vagueness, Routledge, New York, New York, 1994.
Wood, R.E. "Task Complexity: Definition of the Construct," Organizational Behaviour and
Human Decision Processes, (37), 1986, pp. 60-82.

Appendix A: Experiment Information Requests and Model Answers

No. Formulation Information Request


1. Ambiguous Management wants a list of each of our suppliers with no
duplicates in the list.
Clear List the distinct suppliers of the items we stock.
Model Answer (Halstead’s Complexity: 1.6927):
Select distinct(item_maker) from inventory;
2. Ambiguous Produce a report that lists the inventory items where the quantity
on hand is much larger, on a percentage basis, than the quantity
ordered.
Clear List item number, item name, quantity on hand, quantity on order
where quantity on hand is greater than 2 * quantity ordered.
Model Answer (Halstead’s Complexity: 5.4186):
Select item_no, item_name, qty_hand, qty_ordered from inventory where qty_hand > 2 *
qty_ordered;
3. Ambiguous Management wants a list of all Japanese customers and customers
with credit limits over $15,000.
Clear List customer numbers, customer names, country, and credit limit
of customers with credit limits greater than $15,000 or of
customers in Japan.
Model Answer (Halstead’s Complexity: 6.8908):
Select cust_no, cust_name, country, credit_limit from customer where country = 'Japan' or
credit_limit > 15000;
4. Ambiguous Produce a report that statistically compares the credit limits for
customers in different countries.
Clear List country, average credit limit, and standard deviation of
customer credit limit grouped by country.
Model Answer (Halstead’s Complexity: 4.4697):
Select country, avg(credit_limit), stddev(credit_limit) from customer group by country;
5. Ambiguous Produce a report of clients that prefer the Speedair carrier and
addresses.
Clear List customer number, customer name, street, city, post code, and
country where the customer's preferred carrier is Speedair.
Model Answer (Halstead’s Complexity: 12.2917):
Select cust_no, cust_name, street, city, state, post_code, country From customer, carrier
where customer.pref_carrier_code = carrier.carrier_code and carrier_name = ‘Speedair’;

6. Ambiguous We're wondering if some of our winemakers are using poor quality
packaging and bottles - we've had a few complaints. Can you get
us a report that gives us some sort of idea about what items we are
shipping compared to what the customers are taking delivery of?
It would probably be a good idea while you're at it to give a
comparative percentage of the stuff shipped that doesn't make it -
just so the vintners won't try and weasel their way out of it, you
understand, they're good at that.
Clear List item maker, item number, item name, and 100 * (sum of
quantity shipped less sum of quantity accepted) / (sum of quantity
shipped) where the type of alcohol is wine.
Model Answer (Halstead’s Complexity: 18.8):
Select item_maker, inventory.item_no, item_name, 100 * (sum(qty_shipped - qty_accepted) /
sum(qty_shipped)) From inventory, invoiceitem where inventory.item_no =
invoiceitem.item_no and type_of_alc = "wine" Group by item_maker, inventory.item_no,
item_name;
7. Ambiguous Prepare a report that provides *all* customer's details and
indicates the number of different products they have ordered from
us.
Clear List customer number, and customer name for *all* customers,
and, if they have ordered anything, a count of unique items
ordered.
Model Answer (Halstead’s Complexity: 16.0076):
Select customer.cust_no, cust_name, count(distinct(item_no)) from customer, invoice,
invoiceitem where customer.cust_no = invoice.cust_no (+) and invoice.invoice_no =
invoiceitem.invoice_no (+) group by customer.cust_no, cust_name;
8. Ambiguous Management wants to know which customers we've shipped goods
more than 10 times to them by the shipper that they requested.
Clear List customer number, name, and count of invoices, where the
actual carrier is the same as the customer's preferred carrier,
having more than 10 shipments.
Model Answer (Halstead’s Complexity: 16.2684):
Select customer.cust_no, cust_name, count(*) from Invoice, Customer where
invoice.cust_no = customer.cust_no and invoice.carrier_code =
customer.pref_carrier_code group by customer.cust_no, cust_name having count(*) > 10;

9. Ambiguous Produce a report, with best items first, on the gross contribution to
profitability of each inventory item for July 1999.
Clear List item number, item description, and (unit price less unit cost)
multiplied by units sold in July 1999. Sort your output by
descending gross contribution to profitability.
Model Answer (Halstead’s Complexity: 23.897):
select inventory.item_no, item_name, avg(avg_unit_price - avg_unit_cost) *
sum(qty_accepted) from invoice, invoiceitem, inventory where invoice.invoice_no =
invoiceitem.invoice_no and invoiceitem.item_no = inventory.item_no and deliver_date
between '1-Jul-99' and '31-Jul-99' group by inventory.item_no, item_name order by 3 desc;
10. Ambiguous Produce a report with the relevant customer details that gives us an
idea of how much of our business is exposed to foreign currency
fluctuations.
Clear List customer number, customer name, customer country, and a
total of the amount paid where the settlement currency code for the
invoice is not equal to the currency code for Australian dollars.
Group results by customer number.
Model Answer (Halstead’s Complexity: 19.4819):
Select customer.cust_no, cust_name, country, sum(amt_paid) from customer, invoice,
currency where customer.cust_no = invoice.cust_no and invoice.currency_code =
currency.currency_code and currency.currency_name <> ‘Australian Dollar’ Group by
customer.cust_no, cust_name, country;
11. Ambiguous Management is concerned about current slow-moving inventory
items, based on shipments since 1 June 1999. Produce a report of
the items that they might be most concerned about.
Clear List inventory item number, item description, quantity on hand,
and sum(quantity shipped) with ship dates greater than 1 June
1999 that have sums of the quantity shipped less than the sums of
the quantity on hand.
Model Answer (Halstead’s Complexity: 22.4):
Select inventory.item_no, item_name, sum(qty_hand), sum(qty_shipped) from inventory,
invoiceitem, invoice where inventory.item_no = invoiceitem.item_no and
invoiceitem.invoice_no = invoice.invoice_no and ship_date > ‘1-Jun-99’ group by
inventory.item_no, item_name having sum(qty_shipped) < sum(qty_hand);

12. Ambiguous Produce a report that gives some idea about our best USA export
items where the amount since March is bigger than $5,000.
Clear List item numbers, item descriptions and the total accepted
quantity times agreed price of each item for items shipped to US
customers since 1 March 1999 and having a total accepted quantity
times agreed price greater than $5,000.
Model Answer (Halstead’s Complexity: 29.1633):
select inventory.item_no, item_name, sum(qty_accepted * agreed_unit_price) from invoice,
invoiceitem, inventory, customer where invoice.invoice_no = invoiceitem.invoice_no and
invoiceitem.item_no = inventory.item_no and customer.cust_no = invoice.cust_no and
ship_date > '1-Mar-99' and country = ‘USA’ group by inventory.item_no, item_name
having sum(qty_accepted * agreed_unit_price) > 5000;
13. Ambiguous Produce a report showing our Japanese client base that didn't order
anything in July. We're going to need an idea of how many
invoices and things like that that we have for them. We're
concerned about why our orders have dropped off. Can you use
that statistical thing (you know, the one that gives an idea of how
the numbers are varying, not variance, the other one) to show
whether the date the stuff is delivered is different to the date they
wanted the stuff?
Clear List customer number, customer name, number of invoices, and
standard deviation of the difference between the deliver date and
the want date for Japanese customers who did not place an order in
July 1999.
Model Answer (Halstead’s Complexity: 24.0168):
select customer.cust_no, cust_name, count(invoice_no), stddev(deliver_date - want_date)
from customer, invoice where customer.cust_no = invoice.cust_no and country = 'Japan'
and customer.cust_no not in (select cust_no from invoice where order_date between '1-Jul-
99' and '31-Jul-99') group by customer.cust_no, cust_name;
14. Ambiguous We want to have a mail-out to our best customers (say, those who
paid us more than $5000 or so recently, and those with credit
limits over $20,000). We're interested in seeing if we can move
that new Hunter Valley shipment. Can you get us a mailing list?
Clear List customer number, name, street, city, state, post code, and
country for those customers with credit limits greater than $20,000
or since 1 July 1999 have total paid invoices of more than $5,000.
Model Answer (Halstead’s Complexity: 29.9607):
select customer.cust_no, cust_name, street, city, state, post_code, country from customer,
invoice where customer.cust_no = invoice.cust_no group by customer.cust_no, cust_name,
street, city, state, post_code, country having sum(amt_paid) > 5000
UNION
select customer.cust_no, cust_name, street, city, state, post_code, country from customer
where credit_limit > 20000;

15. Ambiguous Produce a report that shows the percentage of orders where we're
not meeting customers' delivery date expectations in each country.
Clear Count all invoices, where the date the order was delivered was
larger than the date the customer wanted the order. Group by
country. Calculate the percentage of late orders by country.
Model Answer (Halstead’s Complexity: 34.992):
Create View TotalOrders as select country, count(*) Total_Orders from customer, invoice
where customer.cust_no = invoice.cust_no group by country;
Create view LateOrders as select country, count(*) Late_Orders from customer, invoice
where customer.cust_no = invoice.cust_no and deliver_date > want_date group by country;
Select total_orders.country, 100*(late_orders / total_orders) Percent_Late_Orders from
lateorders, totalorders where totalorders.country = lateorders.country;
16. Ambiguous Produce a report that shows, by country, which carriers are, on
average, not meeting their expected delivery times.
Clear List carrier code, carrier name, country, and average of (delivery
days less the difference between delivery date and ship date) by
country having that average difference greater than 1 day.
Model Answer (Halstead’s Complexity: 40.1661):
select carrier.carrier_code, carrier_name, delivdays.country, avg((deliver_date - ship_date)
- deliver_days) from carrier, invoice, customer, delivdays where carrier.carrier_code =
invoice.carrier_code and invoice.cust_no = customer.cust_no and carrier.carrier_code =
delivdays.carrier_code and customer.city = delivdays.city and customer.state =
delivdays.state and customer.country = delivdays.country group by carrier.carrier_code,
carrier_name, delivdays.country having avg((deliver_date - ship_date) - deliver_days) > 1;

Appendix B: Experiment Instruction Sheet

INSTRUCTIONS
This laboratory session requires you to execute command files and query a database.

Please follow the instructions carefully.

Part 1 - Scenario

George Harford Wine Merchant distributes wines throughout the world. They predominantly
trade with customers in France, Japan, the USA, and the UK. Customers place orders for
wines which employees process, pack, and ship to the customers via an appropriate carrier.
The packers attach an invoice created by the Accounts Receivable department to the goods
when shipped. These invoices contain all relevant information generated from the invoice and
inventory databases. The data structures for the relevant tables are attached.

Part 2 - SQL Syntax Reminder

The SQL syntax for SELECT commands follows. Items in square brackets [ ] are optional,
and items in braces { } can be repeated zero or more times:

SELECT [DISTINCT]*|(((table. | view.)column | expression) [alias]


{, ((table. | view.)column | expression)[alias]})
FROM (table|view)[alias]{,(table | view)[alias]}
WHERE condition {, condition}
[GROUP BY expression{,expression} [HAVING condition{,condition}]]
[(UNION|UNIONALL|INTERSECT|MINUS) SELECT command]
[ORDER BY (expression|position)[DESC]{,(expression|position)
[DESC]}];

Only under highly unusual circumstances should you formulate a select command that
contains more than one table in the FROM clause without a join in the WHERE clause. As a
general rule, the number of joins should equal the number of foreign key attributes. Except
for extremely rare queries that usually produce only summary results (such as counting the
number of records in a table), all SQL queries, even those involving only one table, should
include WHERE conditions.

You may need to use some of the following keywords

AND MIN SYSDATE


AVG NOT UNIQUE
COUNT NULL VARIANCE
DISTINCT OR (+) (outer join)
IN STDDEV
MAX SUM

The SQL syntax for VIEW commands follows

CREATE VIEW viewname AS (SELECT command);

When you create a view with the same name as an already-existing view (for example, you
rerun your query), you will need to drop the already-existing view:

DROP VIEW viewname;

Reminders:
 Aliases for columns in views should not be enclosed in quotes.
 If you have multiple join conditions, i.e., more than one foreign key or a concatenated
foreign key, you may need to put the outer join symbol on other join conditions.

Part 3 - Getting started

Log into your area on valinor. For the purposes of assessment, everything you do in this
laboratory session needs to be recorded and sent to the instructor. Follow the instructions
carefully. In particular, please refrain from running more than one session on valinor because
running more than one session will mean that all your query attempts will not be recorded. To
begin this quiz, type the following at the valinor prompt:

valinor> ksh
valinor> /home/staff/bowen/startqz199b

Follow the instructions given by the program carefully. You can attempt each query as many
times as you wish.

You should note that once you accept a query, you cannot return to the question again.

Part 3 - Getting started

Log into your area on valinor. For the purposes of assessment, everything you do in this
laboratory session needs to be recorded and sent to the instructor. Follow the instructions
carefully. In particular, please refrain from running more than one session on valinor because
running more than one session will mean that all your query attempts will not be recorded. To
begin this quiz, type the following at the valinor prompt:

valinor> ksh
valinor> /home/staff/bowen/startqz199a

Follow the instructions given by the program carefully. You can attempt each query as many
times as you wish.

You should note that once you accept a query, you cannot return to the question again.

Part 4 - Your Mission

You are an internal auditor at George Harford. On 16 August 1999, your supervisor
approaches you with a list of questions. Some questions were designed by the supervisor,
who knows SQL well. Your supervisor was also given questions from management, who do
not know SQL all that well.

Your task is to formulate and execute SQL queries to answer these questions.

Your supervisor is gone for the day and getting answers for these questions is urgent.
Therefore, you need to make your best interpretation of the questions from management. You
can discuss with your supervisor the assumptions you made after she returns. However, she
will be most annoyed if you do not make an attempt to answer as many of the questions as
you can prior to her return.

The questions have been structured so that easier questions appear first and then become
progressively more difficult.

Your supervisor wants to see the complete SQL queries that you use. When the question is
phrased asking for a name, your query should use criteria that include that name i.e. you
should not look up the code to avoid joining to the table that contains the name.

Appendix C: Command Interpreter Unix Shell Script

Two Unix Shell Scripts were used to operate the experiment. The two scripts were essentially

identical except that they used different source data depending on the treatment initially

received by the different experimental groups (the variable $quizfile). This script has been

developed, modified, and enhanced from previous experiments undertaken within the Faculty

of Commerce at the University of Queensland (Borthick et al. 1997; Borthick et al. 2000).

The interface source code had been previously developed by Mr Andrew Jones.

#!/bin/ksh
## /\ndy. 28/08/98. version 0.02

## NB. this script requires ksh because it uses "read -u".


## The rest of it should run in any sh-compatible shell (sh, bash, ksh etc)

## DoLog() - A utility function to append a message to our log file.


## As it stands, each line contains the username, process ID, date, time,
## and a message
## eg.
## [jones] <4268> 28/08 11:41:09: Displaying question 3
## [jones] <4298> 28/08 11:41:12: Attempting question 3 Attempt number 1

DoLog()
{
## %a = day, %e = date, %m = month. %T = time.
now=`date +"[$username] <$$> %e/%m %T:"`
echo "$now $*" >> $logfile
}

## Obtain the username of the person running this program, for the log file.
## No need to change this.
###username=${USER:-$LOGNAME}
username=`whoami`

## CONFIGURE THIS:
## "quizfile" is a variable which contains the name of the file with the
## questions you wish to present to the students. You should edit this
## script to set this variable to the appropriate value.
## If this variable is null, then the program will expect a single
## command-line argument, which will be the filename of the question file.
##
## The question file should contain questions, one per line.
##
## Note that the user running this program must have access privs to the
## question file and the directories above it...

## eg. quizfile="/home/staff/bowen/questions"
quizfile="/home/staff/bowen/questions99qz1b"

## CONFIGURE THIS:
## Location of the log file to record what people do.
## You can reset this to whatever you like, but make sure that everyone
## can append to it. Also note that files in /tmp disappear when
## valinor is restarted. /var/tmp might be safer, but who knows.
##
## Probably best if you make a logfile directory in your home dir,
## chmod it to mode 1777 and put the log files in there...
##
## Note: If the log file does not already exist, this program will now
## create it. This better allows per-user log files to work.
## However, if you are using only one log file, it is a better idea
## if you create and chmod it yourself...

#logfile="/var/tmp/sql.log" # one log for all users..

#logfile="/var/tmp/sql.$username.log" # one log per user...


logfile="/home/staff/bowen/logfile/qz199/$username.log"

## Editor to use. pico is the easiest.. esp if we run it in "tool" mode...


editor="pico -t"

## temporary filenames.
tmp="/tmp/qn-$username.$$"
attfile="$HOME/answer.$$"

qnum=1 # question number


attnum=0 # attempt number

## Set up a clean up routine to clean up after ourselves in case we die..


trap 'rm -f "$attfile" "$tmp"; exit 1' 1 3 15 8

## "echo -n" is supposed to print without a newline.


## This little hack ensures it will on valinor...
PATH=/usr/ucb:${PATH}

## ---------------------------------------------------------------------
## End of configuration section: Start of program.

## Create the log file if it doesn't exist...


if [ ! -f "$logfile" ]
then
> $logfile
chmod 666 $logfile
DoLog "StartUp: Created this Log file."
fi

if [ -z "$quizfile" ]
then
## No $quizfile, so we expect a question file command-line argument.
if [ $# != 1 ]
then
echo "Usage: `basename $0` file-with-questions"
DoLog "Error: No quizfile and no cmd line argument."
exit 1
fi

quizfile="$1"
fi

## Make sure we can read the file. NB. this requires some permissions on the
## directory containing the file, and that directory's parent, and ...
if [ ! -f "$quizfile" ]
then
echo "Error: Unable to read file: \"$quizfile\"."
DoLog "Error: Can't open file $question (pwd=`pwd`)"

exit 2

fi

## Splash screen telling them what will happen.


DoLog "Startup: Showing splash screen."
clear
cat <<ENDOFBLURB
CO365 DATABASE MANAGEMENT SYSTEMS IN BUSINESS

QUIZ ONE

In this exercise, you will be presented with a series of problems.

The first problem will be displayed, and then the system will wait
for you to hit the <RETURN> (aka the <ENTER>) key.
This gives you time to read and absorb the problem.

After you hit the <ENTER> key, you will be taken into the user-friendly
editor "pico", where you can compose a solution. When you are satisfied,
quit the editor with the Control-X command. Your solution will be run,
and any output will be displayed on your screen.

You will then be asked whether you are happy with your solution.
If you are not, then you can re-edit your first attempt and try again.
Otherwise, you will be asked to rank your confidence in your solution.

You then continue on to the second problem, and so on...

ENDOFBLURB

echo -n "Hit the <RETURN> key to continue."


read junk
echo
echo

clear

DoLog "Startup: Finished showing splash screen."

exec 3<"$quizfile"

qnum=1

## This is the main loop of the program.


while read -u3 question
do
## if we are between questions, make the screen tidier.
if [ "$qnum" -gt 1 ]
then
clear
## echo
echo "Ok. Onto the next question."
echo
fi

thisattmpt="retry"
attnum=0 # attempt number
> $attfile

## attempt the current question.
while [ "$thisattmpt" != "accept" ]
do
attnum=`expr $attnum + 1`
clear
echo "Question #$qnum:"
echo
echo "$question"
echo

if [ $attnum = 1 ]
then
echo
echo "--------------------------------------------------"
echo "When you are finished reading the question, hit the
<ENTER> key, to start"
echo -n "using an editor to create your solution. "
DoLog "Displaying question $qnum"
else
echo
echo "--------------------------------------------------"
echo "Your current solution is ..."
sed -e 's/^/| /' < $attfile

echo
echo -n "Hit the <ENTER> key to re-edit this... "
fi

# pause here until they hit RETURN


read junk

DoLog " Attempting question $qnum Attempt number $attnum"


$editor $attfile

## cp $attfile $username.sql
## echo "quit" >> $username.sql
echo
echo "Ok. Now testing this solution..."
echo

## FIXME: Need to make sure that the Oracle environment


## is properly set up so that they can run sqlplus...
## Plus, the /dev/null thing is crude, but probably enough to
## prevent them getting into an interactive oracle session...
sqlplus / @$attfile < /dev/null

## Reformat of output allows users to use data more


## interactively. Micheal Axelsen 1999.
## Disabled since they can then end up in a cartesian
## product join.

## echo "Attempting Question: $qnum" > $username.lst


## echo "" >> $username.lst

## cat "$question" >> output_screen


## echo "" >> $username.lst

## echo "Your SQL Query:" >> $username.lst


## echo >> $username.lst
## cat $attfile >> $username.lst
## echo "" >> $username.lst

## echo "Results:" >> $username.lst
## sqlplus / @$username.sql >> $username.lst
## $editor $username.lst

## Should we pipe output into less for them to see?

echo

## Should we capture their attempt?


DoLog " The attempt was ..."
sed -e "s/^/[$username] <$$> Qn: $qnum Att: $attnum /" <
$attfile >> $logfile

## ask if happy with this attempt or not


echo "Are you happy with this attempt, or do you want to try
again?"
PS3="Choice: "
select thisattmpt in retry accept
do
if [ -n "$thisattmpt" ]
then
echo "Ok."
break
fi
echo "Invalid response. Try again."
done

echo

done

DoLog "Completed question $qnum Number of attempts was $attnum"

## DoLog "The final solution was ..."


## sed -e 's/^/| /' < $attfile >> $logfile

## Ask here how confident they are...


echo "How confident are you about your solution?"

PS3="Confidence? "
select conf in "85-100%" "70-85%" "55-70%" "40-55%" "25-40%" "10-25%" "<10%"
do
if [ -n "$conf" ]
then
echo "Ok."
break
fi
done

DoLog "Confidence for question $qnum was $conf"

echo
echo "Ok. Now what?"
PS3="What now? "
select whatnow in "Continue to next question" "Quit"
do
if [ -n "$whatnow" ]

63
then
break
fi
done

if [ "$whatnow" = "Quit" ]
then
echo
echo "Are you sure you want to quit?"
PS3="Confirm quit: "
select confirm in yes no
do
if [ -n "$confirm" ]
then
break
fi
done

if [ "$confirm" = "yes" ]
then
echo "Ok. Quitting now."
break
else
echo "Ok. Not quitting."
fi
fi

## NB. It's more efficient to use the shell's built in arithmetic...


qnum=`expr $qnum + 1`
done

DoLog "Quitting."
rm -f "$attfile" "$tmp"

echo "Bye..."

Appendix D: Experiment Entity-Relationship Diagram

[The entity-relationship diagram could not be reproduced in text form; its content is summarised below. A "+" marks a primary key attribute; FK notes indicate foreign key relationships.]

Carrier: Carrier_code+, Carrier_name, Carrier_type

Delivdays: Carrier_code+, City+, State+, Country+, Deliver_days
(FK: Carrier_code references Carrier)

Customer: Cust_no+, Cust_name, Phone_no, Street, City, State, Post_code, Country, Credit_limit, Outstanding_bal, Pref_carrier_code
(FK: Pref_carrier_code references Carrier.Carrier_code)

Employee: Emp_no+, Emp_name

Currency: Currency_code+, Currency_name, Currency_date+, Currency_rate

Fob: Fob_code+, Fob_name

Invoice: Invoice_no+, Order_date, Cust_no, Ship_date, Want_date, Deliver_date, Paid_date, Fob_code, Disc_pct, Disc_days, Currency_code, Amt_paid, Carrier_code, Emp_no
(FK: Cust_no references Customer; Fob_code references Fob; Currency_code, with the appropriate currency dates, references Currency; Carrier_code references Carrier; Emp_no references Employee)

Inventory: Item_no+, Item_name, Item_maker, Item_package, Item_year, Type_of_alc, Alc_category, Alc_content, Avg_unit_cost, Unit_meas, Avg_unit_price, Qty_hand, Qty_ordered

Invoiceitem: Invoice_no+, Item_no+, Unit_meas, Quoted_unit_price, Agreed_unit_price, Qty_shipped, Qty_accepted, Diff_cause
(FK: Invoice_no references Invoice; Item_no references Inventory)

FK = Foreign Key
+ = Primary Key

Abbreviation Type Description

Table: Invoice
Invoice_no Char(7) Invoice number
Order_date Date Date the order was placed
Cust_no Char(5) Customer number
Ship_date Date Date the order was shipped
Want_date Date Date the order was wanted by the customer
Deliver_date Date Date the order was delivered
Paid_date Date Date the invoice was paid
Fob_code Char(1) FOB code {1,2}
Disc_pct Number Discount percent, e.g. 1, 1.5, 2, 2.25
Disc_days Number Discount days - start day depends on FOB
Currency_code Char(1) Settlement currency code
Amt_paid Number Amount paid in Australian dollars
Carrier_code Char(5) Carrier code of carrier that delivered the order
Emp_no Char(4) Employee number of person who packed the order

Table: Customer
Cust_no Char(5) Customer number
Cust_name Char(20) Customer's name
Phone_no Char(15) Customer's telephone number
Street Char(30) Customer's street address
City Char(20) Customer's city
State Char(20) Customer's state
Post_code Char(10) Customer's post code
Country Char(20) Customer's country
Credit_limit Number Customer's credit limit
Outstanding_bal Number Customer's outstanding balance (amount owing)
Pref_carrier_code Char(5) Customer's preferred carrier

Table: Carrier
Carrier_code Char(5) Carrier code
Carrier_name Char(20) Carrier's name
Carrier_type Char(8) Type of carrier {air, surface}

Table: Currency
Currency_code Char(1) Currency code
Currency_name Char(15) Name of currency
Currency_date Date Date for which the currency rate applies
Currency_rate Number Currency rate as of the currency date, i.e. the
number of units of the currency that one Australian
dollar will purchase, e.g., one Australian dollar can
be exchanged for approximately 0.65 US dollars.

Table: Delivdays
Carrier_code Char(5) Carrier code
City Char(20) Deliver to city
State Char(20) Deliver to state
Country Char(20) Deliver to country
Deliver_days Number Expected number of calendar days for the carrier to
deliver merchandise to the city, state, and country,
i.e., the carrier's estimate of the time required to
deliver an order to the destination described by city,
state, and country.

Table: Employee
Emp_no Char(4) Employee number
Emp_name Char(20) Employee's name

Table: Invoiceitem
Invoice_no Char(7) Invoice number
Item_no Char(7) Inventory item number
Unit_meas Char(5) Unit of measure for item {case, each}
Quoted_unit_price Number Quoted unit cost of the item in Australian dollars
Agreed_unit_price Number Agreed unit cost of the item in Australian dollars
Qty_shipped Number Quantity of the item shipped to the customer
Qty_accepted Number Quantity of the item accepted by the customer
Diff_cause Char(15) Reason for differences in costs or quantities {broken
bottle, damaged cork, late delivery, no diff,
shortage, sugary, vinegary}

Table: Inventory
Item_no Char(7) Inventory item number
Item_name Char(20) Name or description of the item
Item_maker Char(20) Maker of the item, e.g. the vintner
Item_package Char(15) How each component of the item is packaged
{bottle, can, cardboard box}
Item_year Number Year the item was produced.
Type_of_alc Char(5) Type of alcohol {beer, wine}
Alc_category Char(15) Alcohol category {dark, dry, full strength, light,
mid-strength, red, sparkling, white}
Alc_content Number Alcohol content e.g. full strength beers are typically
about 5.0 (percent) and wines are typically between
12 and 14 (percent)
Avg_unit_cost Number Average price per unit at which the item was
purchased from the item maker
Unit_meas Char(5) Unit of measure for item {case, each}
Avg_unit_price Number Average price per unit at which the item is sold to
customers
Qty_hand Number Quantity of the item on hand
Qty_ordered Number Quantity of the item ordered in the last 12 months
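
For illustration only, the Customer table definition above might be expressed in Oracle SQL DDL along the following lines. The column names, types, and sizes are taken directly from the data dictionary; the actual DDL used to create the experimental database is not reproduced here.

Create table customer (
    cust_no           char(5) primary key,
    cust_name         char(20),
    phone_no          char(15),
    street            char(30),
    city              char(20),
    state             char(20),
    post_code         char(10),
    country           char(20),
    credit_limit      number,
    outstanding_bal   number,
    pref_carrier_code char(5));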

Appendix E: Experimental Design

Stratification Into Group A and Group B

To control for a testing effect (Huck et al. 1974), and to ensure even representation of skill

sets across Group A and Group B, participants were stratified into classes. This stratification

was in accordance with participants' previous subject enrolments. Participants within each

strata class were then ranked according to their current enrolment subject, their

performance in earlier subjects, and their experience with database query languages. Thirteen

groups were used to classify participants. Table 11 shows the final strata class ordering, and

the number of participants in each strata class.

This process resulted in a ranked listing of participants from one to sixty-six. The

experimental treatment effect of manager-English (ambiguous) and pseudo-SQL (clear) was

assigned randomly to the first student on this list and then alternately to each student

thereafter. This resulted in two student groups with equivalent participant counts: Group A

and Group B. Group A's first question formulation was ambiguous, and then alternately clear

and ambiguous thereafter. Group B's first question formulation was clear, and then

alternately ambiguous and clear thereafter.

Table 11
Participant Strata Classes
Strata Class Participant Count Description
865(1) 4 Students in the postgraduate Database Design
subject who had previously participated in more
than one similar experiment.
365(1) 1 Students in the undergraduate Database Design
subject who had previously participated in more
than one similar experiment.
365(2) 1 Computer Science students in the undergraduate
Database Design subject who had previously
participated in a similar experiment.
865(2) 15 Students who had undertaken a database design
course previously and enrolled in the
postgraduate database design subject.
365(3) 10 Students who had undertaken a database design
course previously and undertaking the
undergraduate database design course.
865(3) 2 Students who had undertaken a database design
course previously (but not at University of
Queensland) and undertaking the postgraduate
database design course.
365(4) 13 Students who had undertaken advanced
information systems courses previously and
undertaking the undergraduate database design
course.
865(4) 3 Students who had undertaken information
systems courses previously and undertaking the
postgraduate database design course.
365(5) 6 Students who had undertaken introductory
computer courses previously and undertaking the
undergraduate database design course.
365(6) 3 Students who had undertaken no information
system or computer courses previously and
undertaking the undergraduate database design
course.
865(5) 6 Students undertaking the postgraduate database
design course with no available academic
history.
365(7) 2 Students undertaking the undergraduate database
design course with no available academic
history.

The Experiment

The experiment was held over two days during the fourth week of instruction. Students

undertook a two-hour closed-book (no reference material allowed) experiment on computer,

with no perusal time, in their normal classes. The random assignment of membership to

Group A and Group B had the purpose and effect of ensuring an even representation of

Group A and Group B in each class.

Participants knew before the experiment that questions increased in complexity, that there

were sixteen questions in total, and that, once a question had been completed, they could not

return to their answer. Participants were also aware that the number of attempts they made

on the question did not affect their mark.

An instruction sheet was provided to participants (refer Appendix B), depending on the

treatment group (A or B) to which the participant had been previously assigned. The only

point of difference between the two groups' instruction sheet was the name of the Unix

command script file to use: startqz199a for Group A and startqz199b for Group B. The

instruction sheet contained an overview of SQL syntax as a reference for participants.

Further, an entity-relationship diagram was provided to describe the database being used, as

reproduced in Appendix D.

Participants could make reference notes on working paper if they required. Participants

returned these materials to the examiner at the end of the experiment. The question

formulations used in the experiment and model answers are reproduced in Appendix A.

There were two examiners present (the course lecturer and the researcher). Assistance was

provided to participants in the operation of the experimental program (the Unix command

script). Assistance was also provided on some technical aspects of SQL on request.

User Interface and Query Development Process

Appendix C contains an example of the Unix command interpreter script used by participants

to enter information using the relatively easy-to-use Pico editor, with which they were

familiar. The command interpreter presented the question to the participant. On the

completion of an attempt, the SQL result set was displayed. If the participant did not

consider the results presented to be their final response, the participant could return to the

SQL formulation. If the participant considered the result satisfactory, the participant would

be prompted to rank their confidence in the solution, and proceed to the next question.

Hence, the participant was able to interactively build and test their response until they were

confident in their answer. This confidence was self-assigned on the following scale: 85-100%, 70-85%, 55-70%, 40-55%, 25-40%, 10-25%, and <10%.

The questions were only available electronically and were presented alternately in ambiguous (natural language) and clear (pseudo-SQL) formulations. A participant in Group A received an

ambiguous formulation for Question One, clear for Question Two, ambiguous for Question

Three, and so on. A participant in Group B had clear for Question One, ambiguous for

Question Two, clear for Question Three, and so on. The required answer was identical for

both formulations of the same question.

Appendix F: Error Marking Sheets

Semantic Error Counting Form

User Name Question Number Attempts

Confidence:
Duration:

MICRO ERRORS
Keywords
View Select From Where Join Where Cond Group by Having Order by

Symbols
View Select From Where Join Where Cond Group by Having Order by

Logical Operators
View Select From Where Join Where Cond Group by Having Order by

Relational Operators
View Select From Where Join Where Cond Group by Having Order by

Tables
View Select From Where Join Where Cond Group by Having Order by

Attributes
View Select From Where Join Where Cond Group by Having Order by

Values
View Select From Where Join Where Cond Group by Having Order by

Set Operators
Where Union Intersect Minus

MACRO ERRORS
Columns Rows Aggregation

SQL Challenge Error Counting Form

User Name Question Number Attempts

Confidence:

SQL CHALLENGE EXPRESSION


Present Challenge Response Comment

Distinct Keyword in Select Clause P/A 1 2 3 4 5 6 7

Built-in Function (Avg, Sum, Std Dev, etc) P/A 1 2 3 4 5 6 7

Mathematical Expression in Select Clause P/A 1 2 3 4 5 6 7

Mathematical Expression in Where Clause P/A 1 2 3 4 5 6 7

Mathematical Expression in Having Clause P/A 1 2 3 4 5 6 7

ERD (Join not shown on ERD) P/A 1 2 3 4 5 6 7

Join P/A 1 2 3 4 5 6 7

Outer Join P/A 1 2 3 4 5 6 7

Subquery P/A 1 2 3 4 5 6 7

Or (Where or Having) P/A 1 2 3 4 5 6 7

Between P/A 1 2 3 4 5 6 7

Not Equal P/A 1 2 3 4 5 6 7

Group By P/A 1 2 3 4 5 6 7

Having P/A 1 2 3 4 5 6 7

View P/A 1 2 3 4 5 6 7

Intermediate Error Counting Form

User Name Question Number Attempts

Confidence:

Column Errors
Missing
Extra
Wrong (in contrast with missing & extra columns)

Table Errors
Missing
Extra
Wrong

Row Restriction
Missing
Extra
Wrong
Logical Operator

Join Restrictions
Missing
Extra
Wrong

Aggregation Level (Group by/Aggregation in Select)


Missing
Extra
Wrong

Aggregation Restriction (Having)


Missing
Extra
Wrong

Sort/Order by
Missing
Wrong Attribute Order
Wrong Direction (ascending,
descending)
Wrong

Appendix G: Annotated Corrected Participant Response

This appendix provides an annotated example of the process used to correct participant

responses according to the model answer. This question was chosen to provide a flavour of

the methodology used to determine and classify errors. The response shown here is the fifth

participant's response (in order of assessment) to the third question.

Model Answer:

Select cust_no, cust_name, country, credit_limit from customer where credit_limit > 15000

or country = 'Japan';

Actual Response:

Select cust_no, cust_name, country, credit_limit from customer where credit_limit > 15000

and country = 'japan';

Annotated Response:

Select cust_no, cust_name, country, credit_limit from customer where credit_limit > 15000

and (1) or (2) country = 'j (3) J(4)apan';

In this annotated response, the superscript number in brackets indicates the error count. In

this response there are four micro errors.

Micro Error Sheet:

Errors (1) and (2) result in a total of two logical operator errors in the WHERE COND clause.

Errors (3) and (4) result in a total of two value errors in the WHERE COND clause.

Macro Error Sheet

There are two row errors here, as there are two errors in the WHERE COND clause.

SQL Challenge Sheet

The SQL Challenge presented in this question is the "Or (Where or Having)" challenge. The

challenge is present, and the participant's response to the challenge was poor, resulting in a

"1" assessment.

Intermediate Error Counting Sheet

In this response there are two row restriction errors, one "wrong" row restriction and one

"logical operator" error.

Appendix H: Pearson Correlation Matrix of Variables

The column variables appear in the same order as the row variables: Ambiguity, Complexity, Attempts, Confidence, Duration, Total Errors, Lexical, Syntactical, Inflective, Pragmatic, Extraneous, Emphatic, Suggestive, and GPA.
Ambiguity 1.0000
one-sided p 0.0000
Complexity -0.0330 1.0000
one-sided p 0.2488 0.0000
Attempts 0.1247 0.3312 1.0000
one-sided p 0.0050 0.0000 0.0000
Confidence -0.0961 -0.2463 -0.4242 1.0000
one-sided p 0.0239 0.0000 0.0000 0.0000
Duration 0.1729 0.2932 0.6905 -0.4282 1.0000
one-sided p 0.0002 0.0000 0.0000 0.0000 0.0000
Total Errors 0.2421 0.4783 0.2742 -0.3241 0.3653 1.0000
one-sided p 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
Lexical 0.7169 -0.0593 0.0847 -0.1213 0.2241 0.2165 1.0000
one-sided p 0.0000 0.1114 0.0406 0.0062 0.0000 0.0000 0.0000
Syntactical 0.6103 -0.1196 0.0532 0.0153 -0.0122 -0.0491 0.0855 1.0000
one-sided p 0.0000 0.0068 0.1367 0.3769 0.4007 0.1564 0.0391 0.0000
Inflective 0.3957 -0.0219 -0.0602 0.0698 0.0118 0.2534 0.2816 0.1606 1.0000
one-sided p 0.0000 0.3266 0.1079 0.0754 0.4045 0.0000 0.0000 0.0004 0.0000
Pragmatic 0.4735 -0.1131 0.0877 -0.0403 0.1057 0.2521 0.4378 0.1257 0.2299 1.0000
one-sided p 0.0000 0.0098 0.0354 0.2035 0.0146 0.0000 0.0000 0.0048 0.0000 0.0000
Extraneous 0.1855 0.3333 0.1410 -0.0223 0.2183 0.5764 0.2616 -0.2611 0.5837 0.3314 1.0000
one-sided p 0.0001 0.0000 0.0018 0.3234 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
Emphatic 0.7173 0.1914 0.1886 -0.1490 0.2482 0.3588 0.7100 0.2746 0.1177 0.2486 0.2870 1.0000
one-sided p 0.0000 0.0000 0.0000 0.0010 0.0000 0.0000 0.0000 0.0000 0.0076 0.0000 0.0000 0.0000
Suggestive 0.4930 0.2863 0.1432 -0.0270 0.1927 0.5611 0.3881 0.1127 0.5723 0.4139 0.8347 0.4058 1.0000
one-sided p 0.0000 0.0000 0.0015 0.2893 0.0000 0.0000 0.0000 0.0101 0.0000 0.0000 0.0000 0.0000 0.0000
GPA (n=420) 0.0000 0.1256 -0.0842 0.1764 -0.1256 -0.1313 -0.0282 0.0336 0.0099 0.0079 -0.0013 0.0010 0.0275 1.0000
one-sided p 0.4999 0.0050 0.0424 0.0001 0.0050 0.0035 0.2820 0.2463 0.4196 0.4358 0.4891 0.4919 0.2869 0.0000

Appendix I: Analysis of Ambiguity's Effect On Error Type
Question One
SQL Type Keywords Symbols Logical Relational Tables Attributes Values Total:
Component Operators Operators
View A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Select A 0.156 0.406 0.000 0.000 0.000 0.344 0.000 0.906
Select C 0.091 0.273 0.000 0.000 0.000 0.182 0.000 0.545
From A 0.000 0.000 0.000 0.000 0.250 0.000 0.000 0.250
From C 0.000 0.061 0.000 0.000 0.091 0.000 0.000 0.152
Where Join A 0.031 0.063 0.000 0.031 0.063 0.063 0.000 0.250
Where Join C 0.030 0.061 0.000 0.030 0.061 0.061 0.000 0.242
Where Cond A 0.031 0.063 0.000 0.031 0.000 0.031 0.031 0.188
Where Cond C 0.000 0.000 0.000 0.000 0.000 0.000 0.061 0.061
Group By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Group By C 0.000 0.030 0.000 0.000 0.000 0.030 0.000 0.061
Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Having C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By C 0.030 0.000 0.000 0.000 0.000 0.030 0.000 0.061
Total A 0.219 0.531 0.000 0.063 0.313 0.438 0.031 1.594
Total C 0.152 0.424 0.000 0.030 0.152 0.303 0.061 1.121

SQL Type Set Summary Error Response


Component Operators Average Count
Where A 0.000 Ambiguous 1.594 32
Where C 0.000 Clear 1.121 33
Union A 0.000
Union C 0.000
Intersect A 0.000
Intersect C 0.000
Minus A 0.000
Minus C 0.000
Total A 0.000
Total C 0.000

Question Two
SQL Type Keywords Symbols Logical Relational Tables Attributes Values Total:
Component Operators Operators
View A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Select A 0.091 1.394 0.030 0.000 0.061 1.212 0.030 2.818
Select C 0.000 0.061 0.000 0.000 0.000 0.000 0.000 0.061
From A 0.061 0.030 0.000 0.000 0.121 0.000 0.000 0.212
From C 0.000 0.000 0.000 0.000 0.030 0.000 0.000 0.030
Where Join A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Join C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Cond A 0.182 0.364 0.000 0.091 0.121 0.364 0.121 1.242
Where Cond C 0.030 0.152 0.000 0.000 0.000 0.030 0.000 0.212
Group By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Group By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Having C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Total A 0.333 1.788 0.030 0.091 0.303 1.576 0.152 4.273
Total C 0.030 0.212 0.000 0.000 0.030 0.030 0.000 0.303

SQL Type Set Summary Error Response


Component Operators Average Count
Where A 0.000 Ambiguous 4.273 33
Where C 0.000 Clear 0.303 33
Union A 0.000
Union C 0.000
Intersect A 0.000
Intersect C 0.000
Minus A 0.000
Minus C 0.000
Total A 0.000
Total C 0.000

Question Three
SQL Type Keywords Symbols Logical Relational Tables Attributes Values Total:
Component Operators Operators
View A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Select A 0.061 1.182 0.000 0.000 0.000 1.182 0.000 2.424
Select C 0.000 0.212 0.000 0.000 0.000 0.152 0.000 0.364
From A 0.030 0.000 0.000 0.000 0.030 0.000 0.000 0.061
From C 0.000 0.000 0.000 0.000 0.061 0.000 0.000 0.061
Where Join A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Join C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Cond A 0.061 0.182 0.788 0.091 0.000 0.030 0.303 1.455
Where Cond C 0.000 0.152 0.152 0.030 0.000 0.091 0.152 0.576
Group By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Group By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Having C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By C 0.000 0.000 0.000 0.000 0.000 0.030 0.000 0.030
Total A 0.152 1.364 0.788 0.091 0.030 1.212 0.303 3.939
Total C 0.000 0.364 0.152 0.030 0.061 0.273 0.152 1.030

SQL Type Set Summary Error Response


Component Operators Average Count
Where A 0.000 Ambiguous 3.970 33
Where C 0.000 Clear 1.030 33
Union A 0.030
Union C 0.000
Intersect A 0.000
Intersect C 0.000
Minus A 0.000
Minus C 0.000
Total A 0.030
Total C 0.000

Question Four
SQL Type Keywords Symbols Logical Relational Tables Attributes Values Total:
Component Operators Operators
View A 0.031 0.000 0.000 0.000 0.000 0.000 0.000 0.031
View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Select A 0.813 1.031 0.000 0.000 0.094 0.500 0.000 2.438
Select C 0.121 0.182 0.000 0.000 0.000 0.121 0.000 0.424
From A 0.063 0.094 0.000 0.000 0.219 0.000 0.000 0.375
From C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Join A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Join C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Cond A 0.094 0.000 0.031 0.000 0.000 0.000 0.000 0.125
Where Cond C 0.061 0.061 0.000 0.000 0.000 0.061 0.000 0.182
Group By A 0.313 0.125 0.000 0.000 0.000 0.438 0.000 0.875
Group By C 0.030 0.000 0.000 0.000 0.000 0.000 0.000 0.030
Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Having C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By A 0.031 0.063 0.000 0.000 0.000 0.094 0.000 0.188
Order By C 0.030 0.000 0.000 0.000 0.000 0.000 0.000 0.030
Total A 1.344 1.313 0.031 0.000 0.313 1.031 0.000 4.031
Total C 0.242 0.242 0.000 0.000 0.000 0.182 0.000 0.667

SQL Type Set Summary Error Response


Component Operators Average Count
Where A 0.000 Ambiguous 4.031 32
Where C 0.000 Clear 0.667 33
Union A 0.000
Union C 0.000
Intersect A 0.000
Intersect C 0.000
Minus A 0.000
Minus C 0.000
Total A 0.000
Total C 0.000

Question Five
SQL Type Keywords Symbols Logical Relational Tables Attributes Values Total:
Component Operators Operators
View A 0.091 0.000 0.000 0.000 0.030 0.000 0.000 0.121
View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Select A 0.121 1.273 0.000 0.000 0.030 1.273 0.000 2.697
Select C 0.033 0.267 0.000 0.000 0.067 0.333 0.000 0.700
From A 0.030 0.303 0.000 0.000 0.333 0.000 0.000 0.667
From C 0.033 0.233 0.000 0.000 0.333 0.000 0.000 0.600
Where Join A 0.030 1.212 0.333 0.515 1.273 1.303 0.000 4.667
Where Join C 0.033 0.700 0.200 0.233 0.667 0.733 0.000 2.567
Where Cond A 0.030 0.212 0.091 0.212 0.000 0.273 0.364 1.182
Where Cond C 0.000 0.233 0.100 0.200 0.067 0.433 0.267 1.300
Group By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Group By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Having C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Total A 0.303 3.000 0.424 0.727 1.667 2.848 0.364 9.333
Total C 0.100 1.433 0.300 0.433 1.133 1.500 0.267 5.167

SQL Type Set Summary Error Response


Component Operators Average Count
Where A 0.091 Ambiguous 9.424 33
Where C 0.033 Clear 5.200 30
Union A 0.000
Union C 0.000
Intersect A 0.000
Intersect C 0.000
Minus A 0.000
Minus C 0.000
Total A 0.091
Total C 0.033

Question Six
SQL Type Keywords Symbols Logical Relational Tables Attributes Values Total:
Component Operators Operators
View A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
View C 0.043 0.000 0.000 0.000 0.000 0.000 0.000 0.043
Select A 2.235 5.529 0.000 0.000 0.765 3.412 0.353 12.294
Select C 0.174 1.435 0.000 0.000 0.217 0.652 0.043 2.522
From A 0.000 0.471 0.000 0.000 0.588 0.000 0.000 1.059
From C 0.000 0.087 0.000 0.000 0.087 0.000 0.000 0.174
Where Join A 0.235 1.353 0.176 0.647 1.294 1.294 0.000 5.000
Where Join C 0.000 0.391 0.174 0.174 0.391 0.522 0.000 1.652
Where Cond A 0.176 2.118 0.941 1.647 0.294 2.118 1.471 8.765
Where Cond C 0.000 0.217 0.130 0.130 0.000 0.130 0.130 0.739
Group By A 0.706 2.059 0.000 0.000 0.647 2.353 0.000 5.765
Group By C 0.261 1.130 0.000 0.000 0.391 1.087 0.000 2.870
Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Having C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Total A 3.353 11.529 1.118 2.294 3.588 9.176 1.824 32.882
Total C 0.478 3.261 0.304 0.304 1.087 2.391 0.174 8.000

SQL Type Set Summary Error Response


Component Operators Average Count
Where A 0.059 Ambiguous 32.941 17
Where C 0.000 Clear 8.000 23
Union A 0.000
Union C 0.000
Intersect A 0.000
Intersect C 0.000
Minus A 0.000
Minus C 0.000
Total A 0.059
Total C 0.000

Question Seven
SQL Type Keywords Symbols Logical Relational Tables Attributes Values Total:
Component Operators Operators
View A 0.200 0.000 0.000 0.000 0.067 0.000 0.000 0.267
View C 0.000 0.133 0.000 0.000 0.000 0.000 0.000 0.133
Select A 0.533 1.200 0.000 0.000 0.333 0.600 0.000 2.667
Select C 0.733 0.800 0.000 0.000 0.133 0.533 0.000 2.200
From A 0.133 0.067 0.000 0.000 0.200 0.000 0.000 0.400
From C 0.000 0.067 0.000 0.000 0.067 0.000 0.000 0.133
Where Join A 0.200 1.667 0.067 0.133 0.133 0.133 0.000 2.333
Where Join C 0.067 1.067 0.067 0.067 0.133 0.133 0.000 1.533
Where Cond A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Cond C 0.000 0.067 0.067 0.067 0.067 0.067 0.067 0.400
Group By A 0.267 0.400 0.000 0.000 0.200 0.467 0.000 1.333
Group By C 0.133 0.467 0.000 0.000 0.200 0.400 0.000 1.200
Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Having C 0.067 0.133 0.000 0.067 0.133 0.133 0.000 0.533
Order By A 0.000 0.000 0.000 0.000 0.000 0.267 0.000 0.267
Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Total A 1.333 3.333 0.067 0.133 0.933 1.467 0.000 7.267
Total C 1.000 2.733 0.133 0.200 0.733 1.267 0.067 6.133

SQL Type Set Summary Error Response


Component Operators Average Count
Where A 0.000 Ambiguous 7.267 15
Where C 0.000 Clear 6.133 15
Union A 0.000
Union C 0.000
Intersect A 0.000
Intersect C 0.000
Minus A 0.000
Minus C 0.000
Total A 0.000
Total C 0.000

Question Eight
SQL Type Keywords Symbols Logical Relational Tables Attributes Values Total:
Component Operators Operators
View A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Select A 0.167 0.667 0.000 0.000 0.000 0.167 0.000 1.000
Select C 0.200 0.400 0.000 0.000 0.000 0.100 0.000 0.700
From A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
From C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Join A 0.000 0.333 0.167 0.167 0.333 0.333 0.000 1.333
Where Join C 0.000 0.600 0.300 0.300 0.600 0.800 0.000 2.600
Where Cond A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Cond C 0.000 0.000 0.100 0.000 0.000 0.000 0.000 0.100
Group By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Group By C 0.100 0.400 0.000 0.000 0.100 0.400 0.000 1.000
Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Having C 0.400 0.600 0.000 0.100 0.200 0.600 0.100 2.000
Order By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Total A 0.167 1.000 0.167 0.167 0.333 0.500 0.000 2.333
Total C 0.700 2.000 0.400 0.400 0.900 1.900 0.100 6.400

SQL Type Set Summary Error Response


Component Operators Average Count
Where A 0.000 Ambiguous 2.333 6
Where C 0.000 Clear 6.400 10
Union A 0.000
Union C 0.000
Intersect A 0.000
Intersect C 0.000
Minus A 0.000
Minus C 0.000
Total A 0.000
Total C 0.000

Question Nine
SQL Type Keywords Symbols Logical Relational Tables Attributes Values Total:
Component Operators Operators
View A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Select A 2.000 1.000 0.000 0.000 0.000 1.000 0.000 4.000
Select C 1.500 1.000 0.000 0.000 0.000 2.000 0.000 4.500
From A 0.000 0.333 0.000 0.000 0.333 0.000 0.000 0.667
From C 0.000 0.500 0.000 0.000 0.500 0.000 0.000 1.000
Where Join A 0.000 0.667 0.333 0.333 0.667 0.667 0.000 2.667
Where Join C 0.000 1.000 0.500 0.500 1.000 1.000 0.000 4.000
Where Cond A 0.000 1.333 1.333 0.667 0.000 0.667 1.333 5.333
Where Cond C 0.000 1.000 1.000 0.500 0.000 0.500 1.000 4.000
Group By A 0.333 1.333 0.000 0.000 0.333 1.333 0.000 3.333
Group By C 0.000 1.000 0.000 0.000 0.000 1.000 0.000 2.000
Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Having C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By A 1.000 0.333 0.000 0.000 0.000 0.667 0.000 2.000
Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Total A 3.333 5.000 1.667 1.000 1.333 4.333 1.333 18.000
Total C 1.500 4.500 1.500 1.000 1.500 4.500 1.000 15.500

SQL Type Set Summary Error Response


Component Operators Average Count
Where A 0.000 Ambiguous 18.000 3
Where C 0.000 Clear 15.500 2
Union A 0.000
Union C 0.000
Intersect A 0.000
Intersect C 0.000
Minus A 0.000
Minus C 0.000
Total A 0.000
Total C 0.000

Question Ten
SQL Type Keywords Symbols Logical Relational Tables Attributes Values Total:
Component Operators Operators
View A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Select A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Select C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
From A 0.000 1.000 0.000 0.000 1.000 0.000 0.000 2.000
From C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Join A 0.000 2.000 1.000 1.000 2.000 2.000 0.000 8.000
Where Join C 0.000 0.500 0.250 0.250 0.500 1.000 0.000 2.500
Where Cond A 1.000 3.000 0.000 2.000 0.000 2.000 2.000 10.000
Where Cond C 0.000 0.500 0.000 0.500 0.000 0.000 0.500 1.500
Group By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Group By C 0.000 0.500 0.000 0.000 0.000 0.500 0.000 1.000
Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Having C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Total A 1.000 6.000 1.000 3.000 3.000 4.000 2.000 20.000
Total C 0.000 1.500 0.250 0.750 0.500 1.500 0.500 5.000

SQL Type Set Summary Error Response


Component Operators Average Count
Where A 0.000 Ambiguous 20.000 1
Where C 0.000 Clear 5.000 4
Union A 0.000
Union C 0.000
Intersect A 0.000
Intersect C 0.000
Minus A 0.000
Minus C 0.000
Total A 0.000
Total C 0.000

Question Eleven
SQL Type Keywords Symbols Logical Relational Tables Attributes Values Total:
Component Operators Operators
View A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Select A 2.500 4.000 0.000 0.000 0.500 2.500 0.000 9.500
Select C 1.000 1.000 0.000 0.000 0.000 0.000 0.000 2.000
From A 0.500 0.000 0.000 0.000 0.500 0.000 0.000 1.000
From C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Join A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Join C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Cond A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Cond C 0.000 0.000 0.000 0.000 0.000 0.000 2.000 2.000
Group By A 0.500 1.000 0.000 0.000 0.500 1.000 0.000 3.000
Group By C 0.000 1.000 0.000 0.000 0.000 1.000 0.000 2.000
Having A 3.000 2.000 0.000 1.000 0.000 2.000 0.000 8.000
Having C 1.000 1.000 0.000 0.000 0.000 0.000 0.000 2.000
Order By A 0.500 0.000 0.000 0.000 0.000 0.000 0.000 0.500
Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Total A 7.000 7.000 0.000 1.000 1.500 5.500 0.000 22.000
Total C 2.000 3.000 0.000 0.000 0.000 1.000 2.000 8.000

SQL Type Set Summary Error Response


Component Operators Average Count
Where A 0.000 Ambiguous 22.500 2
Where C 0.000 Clear 8.000 1
Union A 0.000
Union C 0.000
Intersect A 0.000
Intersect C 0.000
Minus A 0.500
Minus C 0.000
Total A 0.500
Total C 0.000

Question Twelve
SQL Type Keywords Symbols Logical Relational Tables Attributes Values Total:
Component Operators Operators
View A
View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Select A
Select C 0.000 2.000 0.000 0.000 0.000 0.000 0.000 2.000
From A
From C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Join A
Where Join C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Where Cond A
Where Cond C 0.000 0.000 0.000 0.000 0.000 0.000 2.000 2.000
Group By A
Group By C 0.000 1.000 0.000 0.000 0.000 1.000 0.000 2.000
Having A
Having C 0.000 2.000 0.000 0.000 0.000 0.000 0.000 2.000
Order By A
Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Total A
Total C 0.000 5.000 0.000 0.000 0.000 1.000 2.000 8.000

SQL Type Set Summary Error Response


Component Operators Average Count
Where A Ambiguous
Where C 0.000 Clear 8.000 1
Union A
Union C 0.000
Intersect A
Intersect C 0.000
Minus A
Minus C 0.000
Total A
Total C 0.000

Appendix J: Seven Ambiguity Types Question Assessment Ratings

This table displays the average of ambiguity assessments provided by two independent non-

researchers. The scale used to assess the presence of the different types of ambiguity is:

0 1 2 3 4
None A little Some Much A Great Deal

Question Formulation Lexical Syntactical Inflective Pragmatic Extraneous Emphatic Suggestive

1 Ambiguous 1.5 2 0.5 1 0.5 0.5 0.5


1 Clear 0.5 0.5 0 0 0 0 0
2 Ambiguous 2 1 0 1.5 0.5 1.5 0.5
2 Clear 1 0.5 0 1 0 0 0
3 Ambiguous 0.5 3.5 0 1 0 0.5 0.5
3 Clear 0.5 0 0 0 0.5 0.5 0
4 Ambiguous 1.5 1 0 3 0 0.5 0
4 Clear 0.5 2 0 2 0 0 0
5 Ambiguous 1.5 2.5 0 0.5 0 2 0
5 Clear 1 0.5 0 0 0.5 0 0
6 Ambiguous 1.5 0.5 0.5 3 3.5 1.5 2.5
6 Clear 0.5 0.5 0 0.5 0.5 0 0
7 Ambiguous 1.5 2.5 0 0.5 0 1 1
7 Clear 0.5 0.5 0 0 0 0 0
8 Ambiguous 0.5 3.5 0.5 0.5 0 0 0
8 Clear 0.5 0.5 0 1 0 0 0
9 Ambiguous 1.5 0.5 0 2.5 0 3 0
9 Clear 0.5 0.5 0 0.5 0 0 0.5
10 Ambiguous 2 0 0 2 0.5 1 1.5
10 Clear 0.5 0 0 0 0 0 0
11 Ambiguous 1.5 1 0 2 1 1 1.5
11 Clear 0.5 0 0 0 0 0 0
12 Clear 0.5 0 0 0.5 0.5 0 0

Appendix K: Ambiguity Assessment Instrument

Ambiguity Measurement Questionnaire

Type Information Request


Lexical A report of our clients for our marketing brochure mail-out.
The word "report" may have several meanings, independent of its context.
Consider: a gunshot report echoing through the hillside; the Lieutenant reported to the Captain; I dropped the heavy report on my toe; etc.
Although the context may make the meaning clear, the lexical ambiguity
that is present adds to cognitive effort and contributes to ambiguity overall
in that manner.
Syntactical A report of clients in Brisbane and on our Gold list.
The natural language "and" does not map well to its Boolean equivalent. A
legitimate interpretation would be to assume that this request is for clients
that satisfy both conditions (Brisbane-based and on the Gold List), or for
clients that satisfy either condition (Brisbane-based or on the Gold list).

Another formulation is "Bob hit the man with a stick". It is not clear, syntactically, whether it was the man with a stick that was hit, or whether the man was hit with a stick by Bob. (An SQL sketch of the two readings of the Brisbane and Gold list request follows these examples.)
Inflective A report that is the product of our last marketing campaign regarding sales
of our accounting software product in the last month.
Inflective ambiguity here derives from the use of the word "product" with
two different meanings in the one information request. Inflective
ambiguity is where the same word is used in the one grammatical structure
(paragraph, sentence, phrase) with different meanings. Natural writing
tends to avoid this.
Pragmatic A report of all the clients for a department.
The ambiguity here is that the department has not been specified. It would
be legitimate to prepare a report for any department, although it is likely
that this will not address the needs of the person making the information
request. Further information is needed to resolve this actual ambiguity.
Extraneous A report of all clients (and their names and addresses only) for the Tax and
Business Services department. Some of those clients are our biggest
earners, you know.
The last sentence is extraneous - unlike pragmatic ambiguity, it contains
information that is redundant, uninformative, or not necessary to meet the
needs of the question or task asked in the statement. It is "noise" in the
communication - where more words are used than are necessary to make
the statement.
Emphatic A report of our good clients.
Ambiguity here could derive from the inability of the written form to convey the verbal emphasis of the spoken words. Depending on the emphasis
used, "good clients" could be legitimately interpreted to be clients that pay
on time, clients that have the most dollar-value sales, our very best clients

(a much shorter list than if based on dollar-value), or even, with the correct
sarcastic or ironic emphasis on the spoken word, our worst clients - those
that do not pay.
Suggestive A report of the clients of this accounting practice that have lodged taxation
returns in the past five years in accordance with the requirements of the
Australian Taxation Office.
The request for information is quite clear until the phrase "in accordance
with the requirements of the Australian Taxation Office". By definition, all
taxation returns should be lodged in accordance with these requirements.
The extra phrase introduces suggestive ambiguity into the information
request by suggesting that the report will not necessarily consist of all
taxation clients.
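
To make the syntactical example above concrete, the two legitimate readings of "a report of clients in Brisbane and on our Gold list" map to different SQL predicates. The sketch below assumes a hypothetical client table with city and list_name columns; it is illustrative only and is not part of the experimental database.

Reading 1 (clients satisfying both conditions):
Select client_name from client where city = 'Brisbane' and list_name = 'Gold';

Reading 2 (clients satisfying either condition):
Select client_name from client where city = 'Brisbane' or list_name = 'Gold';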

Mark all Information Requests in Accordance with the Following Scale

0 1 2 3 4
None A little Some Much A Great Deal

No. Ambiguity Information Request


Type (Scale)
1. Management wants a list of each of our suppliers with no
duplicates in the list.
Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4
List the distinct suppliers of the items we stock.
Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4
2. Produce a report that lists the inventory items where the quantity
on hand is much larger, on a percentage basis, than the quantity
ordered.
Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4
List item number, item name, quantity on hand, quantity on order
where quantity on hand is greater than 2 * quantity ordered.
Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4

3. Management wants a list of all Japanese customers and customers
with credit limits over $15,000.
Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4
List customer numbers, customer names, country, and credit limit
of customers with credit limits greater than $15,000 or of
customers in Japan.
Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4
4. Produce a report that statistically compares the credit limits for
customers in different countries.
Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4
List country, average credit limit, and standard deviation of
customer credit limit grouped by country.
Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4
5. Produce a report of clients that prefer the Speedair carrier and
addresses.
Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4

List customer number, customer name, street, city, post code, and
country where the customer's preferred carrier is Speedair.
Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4
6. We're wondering if some of our winemakers are using poor quality
packaging and bottles - we've had a few complaints. Can you get
us a report that gives us some sort of idea about what items we are
shipping compared to what the customers are taking delivery of?

It would probably be a good idea while you're at it to give a


comparative percentage of the stuff shipped that doesn't make it -
just so the vintners won't try and weasel their way out of it, you
understand, they're good at that.
Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4
List item maker, item number, item name, and 100 * (sum of
quantity shipped less sum of quantity accepted) / (sum of quantity
shipped) where the type of alcohol is wine.
Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4
7. Prepare a report that provides *all* customer's details and
indicates the number of different products they have ordered from
us.
Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4

List customer number, and customer name for *all* customers,
and, if they have ordered anything, a count of unique items
ordered.
Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4
8. Management wants to know which customers we've shipped goods
more than 10 times to them by the shipper that they requested.
Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4
List customer number, name, and count of invoices, where the
actual carrier is the same as the customer's preferred carrier,
having more than 10 shipments.
Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4
9. Produce a report, with best items first, on the gross contribution to
profitability of each inventory item for July 1999.
Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4
List item number, item description, and (unit price less unit cost)
multiplied by units sold in July 1999. Sort your output by
descending gross contribution to profitability.
Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4

Suggestive 0 1 2 3 4

10. Produce a report with the relevant customer details that gives us an
idea of how much of our business is exposed to foreign currency
fluctuations.
Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4
List customer number, customer name, customer country, and a
total of the amount paid where the settlement currency code for the
invoice is not equal to the currency code for Australian dollars.
Group results by customer number.
Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4
11. Management is concerned about current slow-moving inventory
items, based on shipments since 1 June 1999. Produce a report of
the items that they might be most concerned about.
Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4
List inventory item number, item description, quantity on hand,
and sum(quantity shipped) with ship dates greater than 1 June
1999 that have sums of the quantity shipped less than the sums of
the quantity on hand.
Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4

12. Produce a report that gives some idea about our best USA export
items where the amount since March is bigger than $5,000.
Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4
List item numbers, item descriptions and the total accepted
quantity times agreed price of each item for items shipped to US
customers since 1 March 1999 and having a total accepted quantity
times agreed price greater than $5,000.
Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4

Appendix L: Internal Validity of the Experiment

A full explanation of the recognised seven "threats" for the internal validity of experiments is

contained in Huck et al. (1974). The comments made below have their basis in the discussion

presented in Huck et al. (1974).

History

The history threat to internal validity arises where an event outside of the domain of the

experiment occurs that may affect the dependent variable. As the experiment took place over a two-hour period in a controlled setting, across two days of experimental testing, no history threat to internal validity is considered to exist for this experiment.

Maturation

Maturation occurs where the participants mature, grow, and learn during the course of the

experiment; the mere passage of time may increase the recorded end user query performance. Any

maturation effect is adequately controlled for in this instance, as the experiment was two

hours in duration, homogeneous groups were used, and each tutorial group tested contained

both Group A and Group B participants. Further, both groups received the ambiguity

treatment on alternate questions. Any residual maturation effect (such as learning the use of

the SQL experimental tool or increased proficiency in SQL during the experiment) applies

equally to the clear and ambiguous treatment effects.

Testing

Testing occurs where individuals score higher on a later sitting of a test than on their first sitting. Within this experiment, the possibility exists that participants learned more about the

use of the experimental tools and process (the SQL editor). Subsequent questions (for

example, question one compared to question six) might result in superior performance

(particularly time for completion) due to the testing effect. Due to the factors cited for the

maturation effect, any experimental testing effect - should there be any - applies equally to

both the clear and ambiguous formulations of the question. Additionally, participants who

had undertaken similar experiments previously are stratified into separate classes. Group A

and Group B were homogeneous in this respect. Therefore, both within the experiment, and

from previous experiments, any testing effect that exists in this experiment from these

sources applies equally to both treatment effects.

Instrumentation

Instrumentation is identified by Huck et al. (1974) as the effect of any change in the

observational technique accounting for any experimentally observed difference. This could

arise in the current experiment with a maturation change in the assessors over the time taken

to assess student responses. Assessors could correct later participant responses differently to

earlier participant responses.

This effect is controlled for in several ways. Firstly, when assessing responses, assessors had

no means to identify participant responses by student name, only student number. This

avoided assessors' preconceptions about students' performance. The use of two independent

assessors controlled for some differences in marking strategies, as did the use of diary notes

to ensure consistency of marking over time. An exhaustive cross-checking and data

correctness procedure also mitigated this effect.

Responses were assessed by student in no particular order. Group A and Group B participant

responses were evenly distributed in the marking order, with a calculated non-parametric runs

test z statistic of 0.9924 (Newbold 1984). This weak z-statistic (two-tailed p of approximately 0.32) implies that any residual instrumentation effect,

should it exist, is evenly applicable to either question formulation. Overall, the threat of

instrumentation to experimental results in this regard is controlled for.

Statistical Regression

Statistical regression occurs where the analysis of the experiment is on extreme scores, such

that subsequent tests tend to regress to the mean (Huck et al. 1974). The current experiment

is not exposed to this threat to internal validity, as extreme scores are not the focus of the

experiment. Furthermore, the experimental design and assessment process used adequately

controls for this threat to internal validity, as previously described.

Mortality

Mortality occurs where participants drop out of the experiment during its course. As this

experiment is short in duration (two hours), participant mortality did not occur during the

experiment. In addition, all sixty-six students enrolled in the subjects participated in the

experiment. The mortality effect is of some concern, however, in that incomplete participant

responses were removed from the analysis. There were 506 participant responses, of which

425 responses were completed and statistically analysed in the experiment.

The effect of this acknowledged experimental bias is to reduce the total number of responses examined and, in general, to remove from analysis responses with a significant number of errors. As this bias tends to be against the direction of the hypotheses made in this

paper, any conclusions drawn in this regard are strengthened, and the mortality effect on

interpretation of results is lessened. Overall, the mortality effect strengthens any conclusions

drawn, and thus is less of an internal validity issue for the current experiment.

Selection Bias

The selection process resulted in two homogeneous groups, Group A and Group B, drawn

from the entire student population of two information systems subjects. There is no evident

selection bias between Group A and Group B. In any case, both Group A and Group B

received the treatment effect of ambiguity on alternate questions, further mitigating concerns

of the effect of a selection bias on experimental results.

