DOI 10.1007/s11135-005-3150-6
Abstract. Design has long been recognized both as art and as science. In the nineteen-sixties design-oriented research began to draw the attention of scientific researchers and methodologists, not only in technical engineering but also in the social sciences. However, only a rather limited methodology for design-oriented research has been developed, especially in the social sciences. In this article we introduce evaluation methodology and research methodology as systematic inputs to the process of designing. A designing cycle with six stages is formulated, and for each of these stages operations, guidelines and criteria for evaluation are defined. All this may considerably improve the process and product of designing.
1. Introduction
Until the last decade, most research methodology in the social sciences was primarily concerned with theory-oriented research, as at that stage most of these disciplines aimed at knowledge for its own sake (l'art pour l'art). As a consequence of a push from society, scientific researchers and methodologists have since paid increasing attention to practice-oriented research. In this challenge they are mainly focussed on improvements of existing reality. More specifically, they aim at the solution of what may be called improvement problems. Over the last few decades, however, another type of practice-oriented research gradually came into being, aimed at the creation of a new artefact. Here the researcher aims at the solution of a so-called construction problem. In design literature these improvement and construction problems are labelled “normal” and “radical” (Vincenti, 1990) or “inventive” (Dasgupta, 1996) problems, respectively.
In this article, we will focus on research aiming at solving construction or inventive problems, which we will call design-oriented research. This type of research has existed for a long time in the technical disciplines.
∗ Author for correspondence: Piet Verschuren, Department of Methodology, Nijmegen School of Management, Radboud University, P.O. Box 9104, 6500 HE Nijmegen, The Netherlands. Tel: 31243611469, 31243581324; Fax: 31243612351; E-mail: p.verschuren@fm.ru.nl
But in the social sciences it is rather new, so that it largely lacks the support of a design methodology. Moreover, in our view existing design methodology does not provide sufficiently explicit rules on evaluation as an integral part of a designing process. Literature on designing (Alexander, 1979; Brown, 1988; Simon, 1996) indicates that designers should be well aware that designing involves more “perspiration than inspiration”. That is, the designer must be very critical as to the utility of the artefact for the future users and the other stakeholders, and their satisfaction with it. So the artefact to be designed, once realized, should satisfy a set of design criteria. From this it follows that evaluation should play an important role in the process of designing. And this in its turn means that the designer may benefit from existing
research methodology in general (Yin, 1984; Creswell, 1994; Denzin and Lincoln, 1994; Yin, 1994; Babbie, 1998), and from evaluation methodology in particular (Mohr, 1995; Patton, 1997; Rossi et al., 1998; Pawson and Tilley, 1997). For that reason both evaluation and empirical research will have a central role in the process of designing in this paper. We will match design-oriented research on the one hand with existing know-how on evaluation research and research methodology in general on the other.
We first unravel the process of designing into six stages, the so-called designing cycle, as a counterpart of the intervention or policy cycle in business and policy administration (Section 2). Next we give a short overview of different types of evaluation that are relevant for designing (Section 3). Then criteria for evaluation of processes and products of designing are formulated, ready for use as touchstones in each instance of evaluation (Section 4). Finally, with the aid of these tools we elaborate on evaluation within each of the six stages of the designing cycle (Section 5), followed by conclusions.
(1). First hunch: The very first stage of a designing process is the appearance of a first hunch and initiative for constructing a new material or immaterial artefact. The main result of this stage should be a small set of goals [G] to be realised with the artefact to be designed. For instance, the goal of an aircraft designer may be the construction of a new type of aircraft aimed at the transportation of flowers from Africa to the northern hemisphere. Or a manager may want to have designed a helpdesk system that supports all employees of an organisation with respect to the use of office applications such as a spreadsheet program.
After this overview a few remarks on the designing cycle have to be made.
First of all it should be noted that very often evaluation in stage 6 points out that the artefact does not yet fully come up to the goals [G] and the expectations or requirements of the stakeholders. This may be an occasion to
3. Types of Evaluation
There is quite a bit of literature on design methodology, although this
is more the case in the domains of technical engineering (Asimow, 1962;
Cross, 2000), architecture (Alexander, 1964) and the building of informa-
tion systems (Dasgupta, 1991; Gamma et al., 1994) than in the social sci-
ences. However, this literature is surprisingly implicit on the subject of
evaluation. What we need is an explicit role of evaluation in the process
of designing, as well as a conception of evaluation that goes far beyond
common sense. For that reason in this and the following section, we will
link the designing process to existing evaluation methodology and research
methodology.
In this article, by evaluation we mean: “to compare separate parts of a
designing process with selected touchstones or criteria (in the broadest sense
of the word), and to draw a conclusion in the sense of satisfactory or unsat-
isfactory”. Within the context of designing we make a distinction between
the following three rough stages of designing: (a) The plan (on paper) of
the design, i.e., the product of the first three stages of the designing cycle,
(b) the realisation or carrying out of this plan, which roughly regards the
stages 4 and 5, and (c) the effects that the use or the presence of the
artefact has, i.e., stage 6. This threefold grouping of stages coincides with
a well known distinction in evaluation methodology, i.e., plan, process and
product evaluation, respectively. So in the first three stages (1–3) the methodology of plan evaluation should be used, in stages 4 and 5 we may make use of process evaluation, and in the last stage (6) product evaluation is at stake.
Plan evaluation implies an assessment of the quality of the design on
paper. If we call the combined set of requirements [R], the assumptions [A]
and the specifications [S] the means to achieve the goal [G], then a plan
evaluation involves mainly a separate test of the adequacy of (1) the goal,
(2) the means and (3) the relationship between the goal and the means.
More details about plan evaluation follow in Section 4, where criteria for
evaluation are formulated.
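As a minimal illustration (a sketch in Python; the class, the judgement functions and the example entries are invented here, not taken from the article), a plan and its threefold evaluation could be represented as follows:

from dataclasses import dataclass

@dataclass
class DesignPlan:
    goals: list[str]           # [G]
    requirements: list[str]    # [R]
    assumptions: list[str]     # [A]
    specifications: list[str]  # [S]

def evaluate_plan(plan, goal_ok, means_ok, relation_ok):
    # Each *_ok argument is a judgement supplied by the evaluator, for instance
    # an expert review reduced to True (satisfactory) or False (unsatisfactory).
    return {
        "goal": goal_ok(plan.goals),
        "means": means_ok(plan.requirements, plan.assumptions, plan.specifications),
        "goal-means relation": relation_ok(plan),
    }

plan = DesignPlan(
    goals=["transport flowers from Africa to the northern hemisphere"],
    requirements=["rapid loading of cargo", "limited kerosene consumption"],
    assumptions=["airports provide cooled storage"],   # invented assumption
    specifications=["large cargo door"],
)
verdict = evaluate_plan(
    plan,
    goal_ok=lambda g: len(g) > 0,                       # placeholder judgements
    means_ok=lambda r, a, s: all([r, a, s]),
    relation_ok=lambda p: True,
)
print(verdict)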
For the second group of stages a focus on process evaluation implies
that the issues and objects of consideration are the constructive activities
and the means that are used in realizing the plan that was the result of
stage (3).
Product evaluation, finally, involves finding out what the results of the designing process are, what the value of these results is, and what the short- and long-term effects of the artefact are once it has come into being.
Although the three types of evaluation are linked above with three rough stages in the designing cycle, we may also apply process and product evaluation to each of the separate stages 1–6. That is, besides an evaluation of the designing activity as a whole, each separate stage requires activities to be carried out (process) and should end up with a result (product). So each of the six stages should also be evaluated on its own merits by means of process and product evaluation. Especially process evaluation of this type is very important.
Plan, process and product evaluation differ considerably in their aims and in the way the evaluation has to be carried out. The aim of a
plan evaluation is a logical, ethical and empirical check of (the quality and
appropriateness of) all separate design requirements [R], design assump-
tions [A], structural specifications [S], and the design goal(s) [G]. It should
also be evaluated whether they form a coherent and balanced whole. One
reason for the latter is that the whole in principle is, or at least should be,
more than the sum of its parts.
In general the aim of a process evaluation is to improve the process,
and via this the product, of designing. Very often process evaluation is also
essential in order to prevent defects that will be hard to detect, let alone
to repair, in the very last stage of the designing process. For instance, the number of tests that would have to be done on the final version of the software of the helpdesk system in order to ensure that it contains no errors is, from a practical point of view, far too large and too time consuming.
Besides, the reader should keep in mind that the concept of ‘ex post’ is
relative. Ex post evaluation may be relevant for further improving the arte-
fact after the first run of the designing cycle is finished. That is, it can be
used as an input for a second run aimed at a further development of the
artefact or for adapting it to changing conditions. This situation may hap-
pen quite often, as most (construction) problems are so complex that they
cannot be solved in one single run. In that case it has a formative function
as opposed to summative evaluation. And, last but not least, ex post evaluation is important for improving design methodology. That is, especially from the faults made during the process of designing as a whole we can learn how to do better next time.
A fourth distinction is the one between goal based and goal free eval-
uation. In a goal based evaluation we judge a design or parts thereof as
to the extent that it contributes to achieving the design goal(s) [G]. So,
as a generic example, effectiveness assessment is goal based by definition.
In contrast, in a goal free evaluation the evaluator observes whether the
design satisfies general professional or practical criteria or standards not
directly linked with the design goal [G].
For most people goal-based evaluation is the normal case. However, there are at least two handicaps in goal-based evaluation in the context of designing that may make it a difficult if not impossible job. The first, as already said, is that the design goal [G] is seldom quite clear at the beginning of the designing process. And if it is clear, it is often defined only at a conceptual level instead of in operational terms. The reason for this is that disagreement between the stakeholders as to the design goal(s) arises very easily. Keeping the goals abstract or vague is one of the most frequently used strategies to achieve consensus. Secondly, there may be several goals without an indication of their order of priority. For obvious reasons, in a goal-based evaluation we need operationally defined, stable goals that have a predetermined order of priority.
Finally, a reason why goals in general and design goals in particular are often not operationally defined is that people in general are less “goal rational” than they seem to be at first sight. Accordingly, the design goals [G] are very often not clear at the start, and the designer further develops them during the designing process. Unfortunately for the evaluator this is rather common practice (Verschuren and Zsolnai, 1998). He or she may try to solve this by carrying out a goal-free evaluation. This may mean that instead of using design goals as a standard, the designer/evaluator has to use other general criteria, such as whether future users will accept the artefact to be designed, or whether the artefact comes up to general professional standards or to the expectations of the public.
ity for crosswind during take-off and landing”, and “response time to the
control wheel”. These in their turn may ask for further unravelling. For
instance, “behaviour in turbulent conditions” may regard different catego-
ries of users. It may include the effects on the control wheel and rud-
der pedals (pilot), the shaking of the fuselage (passengers) or the allowed
maximum speed in specified conditions (all).
As to type 1 operationalisation, an example can be taken from instructional design, more specifically from course design. If the goal [G] in a course “decision theory” is that students become motivated to achieve generic insights into the phenomenon of decision-making, this is by far not an operationally defined goal. An operational specification is, for instance, that 90% or more of the students who attend at least 80% of the lectures should have at least an 80% score on a measurement scale for motivation. The reader should note that here again a list of additional conditions must be specified as to the type of students, the type of teachers, all kinds of learning aids et cetera, in order to make the requirement fully operational. Once the requirements [Rf], [Ru] and [Rc] have been unravelled and operationalised, i.e., translated into observable or measurable terms, they are labelled (operational) design criteria [C]. Depending on what type they are, they are denoted [Cf], [Cu] and [Cc]. They are the result of unravelling and an operational specification of more roughly defined functional, user and contextual requirements, respectively.
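As an illustration, such an operational criterion can be checked mechanically. The following Python sketch uses invented student records and field names; only the thresholds (90%, 80%, 80%) come from the example above:

def motivation_criterion_met(students, total_lectures,
                             min_attendance=0.80, min_score=0.80, min_share=0.90):
    # Students with at least 80% attendance form the reference group.
    attendees = [s for s in students
                 if s["lectures_attended"] / total_lectures >= min_attendance]
    if not attendees:
        return False  # undecidable without qualifying students
    # At least 90% of that group should reach an 80% motivation score.
    passing = [s for s in attendees if s["motivation_score"] >= min_score]
    return len(passing) / len(attendees) >= min_share

# Hypothetical records:
students = [
    {"lectures_attended": 12, "motivation_score": 0.85},
    {"lectures_attended": 13, "motivation_score": 0.90},
    {"lectures_attended": 7,  "motivation_score": 0.40},  # below attendance threshold
]
print(motivation_criterion_met(students, total_lectures=14))  # True in this toy case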
As to type 2 operationalisation, the following is relevant. Defining a mode or a score is not enough when it comes to formative evaluation. If for
instance the aircraft turns out to consume more kerosene than was speci-
fied in advance, this fact alone does not provide any clue as to what can
be done in order to come closer to the required criterion in the next test.
However, a design requirement regarding the composition of the exhaust
fumes may point to incomplete combustion or combustion at certain tem-
peratures, and thus give information about the efficiency of the combus-
tion. Also if the students of the course “decision theory” receive a low
score for motivation this in itself does not tell the designer how to improve
the learning material. For a formative evaluation we also need information
as to which parts of the learning material are appreciated most and which
least by the students.
Besides this operational clearness, for an evaluation of the requirements [Rf] roughly the same criteria may be used as for the goal(s) [G]. As to [Ru] and [Rc], by far the most important criterion for evaluation is validity, i.e., the question whether these requirements correspond to reality. Next, criteria such as clearness, feasibility, affordability, consensus and ethical acceptability may also be used.
As to the second instance of plan evaluation concerning the relations
between the several demands, we can be short. Here the central criterion
line that can be derived from decision theory is that the scope of decisions taken during the designing process should be explicitly formulated and underpinned by arguments as well. This helps to reduce the number of design decisions with overlapping scope, and thus helps to reduce the number of potentially conflicting design decisions.
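Once decision scopes are written down explicitly, overlaps between them, and thus potentially conflicting decisions, can be listed mechanically. A Python sketch with invented decision names and scope variables:

from itertools import combinations

decision_scopes = {
    "fuselage material": {"mass", "cost", "fuselage form"},
    "wing type":         {"mass", "lift", "fuel consumption"},
    "fuel tank form":    {"fuel consumption", "tank volume"},
}

# Report every pair of decisions whose explicitly stated scopes intersect.
for (d1, s1), (d2, s2) in combinations(decision_scopes.items(), 2):
    overlap = s1 & s2
    if overlap:
        print(f"{d1!r} and {d2!r} overlap on: {sorted(overlap)}")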
Another criterion for process evaluation borrowed from decision theory
is whether the designer makes a distinction between strategic, tactical and
operational decisions, and whether he or she does not make a mistake as
to the order in which these decisions have to be taken. Strategic design
decisions are essentially decisions about the artefact as a whole. They will
be taken in the initial stages of the designing process. Reversing the order
of strategic and tactical decisions incurs relatively high costs. In our air-
craft example a strategic decision is the choice of both the type and the
material of the fuselage and wings. This decision is strategic for several rea-
sons. First of all it requires an overall view of the field of aircraft design.
This is in contrast to decisions about the form and the material of the fuel tanks, whose domain is rather narrow. The latter is a tactical decision that can be delegated to someone whose expertise is much more limited, but maybe also much more detailed within its limitations. Second, changing a decision about the fuselage and wings will have consequences for many other decisions: another fuselage will have more or less mass, the mass will be distributed differently, the form of the aircraft will be different, the way it behaves in air currents may be different, et cetera. Finally, operational decisions are decisions of a very detailed and recurrent nature, such as decisions about the layout of the hydraulics throughout the aircraft.
Next, we have to decide how to check process guidelines in the context
of a process evaluation. Empirical measurement of departures from design guidelines may be difficult because of the many aspects involved, often not formalised and sometimes moral, that ask for human judgement.
Currently there seem to be two options: we can invite experts to check if
predefined process guidelines are being obeyed during the process or we
can translate guidelines into requirements for intermediate products and
check if the intermediate products satisfy these requirements. The latter is
often feasible as has been shown for instance in Quality Function Deploy-
ment (Hauser and Clausing, 1988). However, it may slow down the design
process considerably because of the extra effort that must be invested in the
production of intermediate products. The modular design guideline is easy
to test by checking if intermediate products are modular. But it is difficult
to check whether the designing process is constantly aimed towards a mod-
ular design, i.e., by checking if the designer consciously thinks in terms of
modules with minimal interdependencies and maximum internal coherence.
On the other hand this form of expert review is exactly what an advisor of
5. Evaluation in Stages
Once it is clear how the designing process should be structured, what types
of evaluation are relevant, what is the role of design guidelines and require-
ments, and what type of criteria may be used, we can start a discussion on
evaluation as part of the designing process. For each stage in the designing cycle we discuss the evaluation that should at least be part of the designing process. At stake here may be local process and product evaluation on the one hand, and overall plan, process and product evaluation on the other.
1. First hunch: In this first stage of the designing cycle the designer/evaluator has to answer the question whether all conditions were fulfilled to have a fruitful idea about the creation of a new artefact. He or she must especially check whether the design goal [G] really covers the desires of the stakeholders. If the desires were studied by means of empirical research, such as interviews, a questionnaire or documents, then the evaluators have to check the validity, reliability, researcher-independence and verifiability of the research. These are standard scientific quality criteria, extensively elaborated in methodological handbooks.
Although many people may take this first stage of the designing cycle for granted, considering it more a matter of common sense, intuition and art than of systematic thought, rationality and science, it is worthwhile to consider this stage from a methodological and an evaluative point of view. For obvious reasons it is very important that, right from the start, the designer has a clear idea and overview of all the recent social and technological, material and immaterial commodities, raw materials, (half-)manufactures, modules and subroutines that are available and of which he or she can make use in the process of designing, especially in stages 4 and 5. It is very unlikely that new and fruitful ideas come out of the blue (Alexander, 1964; Brown, 1988; Nonaka and Takeuchi, 1995; Csikszentmihalyi, 1996; Simon, 1996). If the designer is not up to date and well informed in this respect, the artefact to be designed and produced will most probably not be sufficiently innovative (Csikszentmihalyi, 1996).
It may even be superseded before it is produced. Thus a design guideline for this stage is that the designer(s) should invest sufficient effort in acquiring knowledge and information about all those aspects and details that may be important for constructing the prototype, i.e., for realising the design. If, in the helpdesk example, the business process designers do not understand in detail the variety of problems with respect to office programs, if they are not familiar with office programs, and neither with the many different types
of users, nor with the different forms of pressure that can be exerted on
help desk employees, it is very unlikely that their concepts will turn out
to be effective. In short, local process evaluation during stage (1) involves
at least a check whether the designer is knowledgeable or invests efforts in
knowledge acquisition.
However, a test whether relevant fields and disciplines were taken
into consideration is not enough. An unexpected and innovative hunch
becomes more probable whenever we bring together experts from (totally)
different fields who normally have no contact with each other. For an iter-
ative brainstorm of experts the researcher may make use of participatory
research techniques such as Delphi, workshop techniques, focus group
interviewing, gaming and scenario building with experts to elicit their
relevant knowledge and information. The choice of experts and the com-
munication and fine-tuning between them, must be balanced against the
importance of the design problem and the opportunities of the designer. In many cases this implies that the construction problem at hand must already be formulated sufficiently well to entice a number of people to invest attention in it.
However, at this initial stage of the designing process the problem is seldom satisfactorily formulated. Thus it is advisable to invest effort in reformulating the design challenge and looking at the problem from different perspectives. It is clear that ex post evaluation of the result of stage (1) will not easily convey to what extent the designers have looked at the problem from different perspectives. So we have to ask them questions (interview or questionnaire) as part of the evaluation. At the same time it is clear that, during the initial stages of the process of designing, the guideline to look at the design challenge from different perspectives may lead to the formulation of different, partially mutually exclusive sets of functional requirements [Rf]. For instance, in the case of the helpdesk, one
line of reasoning is that the helpdesk provides information to employees
who encounter a problem. Another line of reasoning is that the helpdesk
should monitor what problems cost much time in the organisation, and
then should come up with proposals for alternative ways of proceeding.
If many employees ask for helpdesk support with respect to the data handling options in the spreadsheet program, the helpdesk can try to find out
why so many employees do individual data handling. It also may (help)
answer the question why they want to do this with a spreadsheet pro-
gram and what might be an alternative for the organisation as a whole.
A helpdesk with this function will differ considerably from a helpdesk that
only gives information with respect to a specific detail of an office applica-
tion.
A designer looking for input from different perspectives should realise that these perspectives often stem from different underlying paradigms.
arguments? See also other criteria to be used at this stage, as these were
formulated in the last section.
2. Requirements and assumptions: In this stage empirical research should be carried out in order to find out what the user requirements [Ru] and the contextual requirements [Rc] are. Thus standard criteria for evaluation here are the empirical validity and reliability, as well as the researcher-independence and verifiability of the results. As to [Rf], at this stage the designer/evaluator has to carry out a logical test of whether the functional demands really fit the goal(s) [G] set in stage 1. The set of functional requirements [Rf] should cover this goal or these goals, no more (error of commission) and no less (error of omission) (see Verschuren and Zsolnai, 1998). Next the designer needs insight into the user requirements [Ru]. It stands to reason that whenever there is a very large group of potential users, we may draw a random sample from the target population and send them a questionnaire by post or by e-mail. Interviews may be better because they offer an opportunity to interact with the respondent, but they are more time consuming. Again, compliance with standard scientific criteria is required. The questionnaire or interviews may yield a large number of different and even contrasting demands. If these demands cannot be reconciled in a satisfactory way, the designer has to reduce the target group, to design different variants of the artefact, or both. The same holds for an operational translation of initial formulations of requirements into design criteria [C].
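As a small illustration of the sampling step, the following Python sketch draws a simple random sample from a (here invented) roster of potential users to whom the questionnaire would be sent; fixing the seed keeps the draw verifiable:

import random

# Invented roster of the target population of potential users.
population = [f"employee_{i:04d}@example.org" for i in range(1, 2501)]

random.seed(42)                       # reproducible draw, supports verifiability
sample = random.sample(population, k=200)
print(len(sample), sample[:3])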
If the members of the users group are supposed to interact and com-
municate intensively in employing the artefact once it is realised, then once
more in many cases the researcher preferably uses participatory techniques
such as focus group interviews and workshop techniques. Because interaction and communication between the participants play an important role in these methods, they give a better opportunity for obtaining relevant data than individual face-to-face interviews. A good option is gaming, a form of human simulation. By building a game around something that resembles the artefact, followed by playing the game with the future users of the artefact, the researcher in principle obtains a clear and detailed insight into what is important and what is not. However, gaming is
very expensive and time consuming. In any case it is important at this stage
that the stakeholders get a clear idea of the artefact to be designed and of
the context in which and the purposes for which it will be used.
Of course, all this is an instance of ex ante evaluation, as the artefact is not yet realised. Thus, if gaming is not opportune, the questions must be answered on the basis of imagination and a “mental eye” on a future state of affairs. In principle, methods and strategies for empirical research are of little use here. In fact this means that the specification of user requirements [Ru] may remain a problem (Dasgupta, 1991). Experience has shown that for innovative design users often do not know what
they want, which makes validation of user requirements ex ante very diffi-
cult if not impossible. This is also the main reason for a designing strategy
that involves forms of rapid prototyping in order to enable the prospective
users to experience the opportunities and threats of the proposed innova-
tions (Stapleton, 1997).
Finally, the researcher has to find out what the relevant contextual requirements [Rc] are. What practical, social, political or juridical side conditions are at stake? Here, to a certain extent, the same arguments are valid as in the assessment of the user requirements [Ru]. Besides, as requirements are sometimes laid down in official documents, the researcher should gather these documents and carry out a systematic content analysis.
Deriving and verifying the separate design requirements [R] is a necessary but not a sufficient task at this stage. On top of it, as part of a plan evaluation the designer has to check the logic of the combination of, and the relations between, the three classes of exogenous requirements [Rf], [Ru] and [Rc]. Here not only expert knowledge may be used; some special procedures are also available. One of these is Quality Function Deployment (QFD) (Hauser and Clausing, 1988). This is a methodology that supports the process of making explicit the relations between [G], [Ru] and [Rc]. Central in the description of these relations is a series of matrix-like constructs. Such a construct is called “the house of quality”, in which the rows describe detailed user requirements in the language of the user, and the columns describe engineering variables in the language of the engineer. The roof makes a connection between engineering variables from different engineering aspects. This methodology may function as a design guideline and thus as a testing criterion for evaluation at this stage. However, as long as there is no experience with QFD in a sufficiently wide range of different design areas, its application is certainly not a trivial exercise (Costa et al., 2001).
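The core of the house of quality can be illustrated with a small sketch. In the following Python fragment the user requirements, engineering variables and cell values are invented; the 0/1/3/9 weighting is a convention often used in QFD practice, not a prescription from this article:

user_requirements = ["easy to load", "low operating cost", "quiet cabin"]
engineering_variables = ["cargo door area", "fuel consumption", "insulation mass"]

# Rows: user requirements in the language of the user.
# Columns: engineering variables in the language of the engineer.
relationship_matrix = [
    # door  fuel  insulation
    [9,     1,    0],   # easy to load
    [1,     9,    3],   # low operating cost
    [0,     3,    9],   # quiet cabin
]

# One simple use of the matrix: flag user requirements that no engineering
# variable addresses strongly (no cell of at least 9 in the row).
for req, row in zip(user_requirements, relationship_matrix):
    if max(row) < 9:
        print(f"user requirement {req!r} has no strong engineering counterpart")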
Another approach in the process of specifying exogenous requirements is to use a pattern language (Alexander et al., 1977; Gamma et al., 1994) to describe design patterns. A pattern language enables us to make explicit those design requirements that became clear during a series of previous design efforts in the same design area. Alexander (1964) sees “the process of achieving good fit between two entities as a negative process of neutralising the incongruities, or irritants, or forces, which cause misfit”. His approach of defining design patterns aims at describing advice for neutralising those incongruities and forces. As an example we present one of the design patterns that are typical for a farmhouse in the Bernese Oberland: “North south axis//west facing entrance down the slope//two floors//hay loft at the back//bedrooms in the front//garden to the south//pitched roof//half hipped end//balcony toward the garden//carved ornaments//” (Alexander, 1979). The use of design patterns as a checklist for evaluation has attracted growing interest in completely different fields (Gamma et al., 1994). The
minimal form of process evaluation for this stage is a check whether the designers have used a methodology in order to establish the functional requirements [Rf], the user requirements [Ru] and the contextual requirements [Rc], and whether they have used a methodology to translate these requirements into operationally defined requirements, i.e., design criteria [Cf], [Cu] and [Cc]. As pointed out in Section 3, this entails unravelling key variables into parts and aspects, setting modalities or scores on these variables that should be achieved, and finding criteria that make clear how a design can be improved if formative evaluation urges this. The next step is to decide whether the designers selected a useful methodology for this stage, followed by a check whether they correctly followed the guidelines in the chosen methodology.
Product evaluation in this stage involves answering the question whether the output of this stage consists of operationally defined design requirements, and whether these requirements really cover the exogenous requirements and at the same time match the goals [G].
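Such a coverage check can be made mechanical once each requirement is explicitly traced to the goal(s) it serves. The following Python sketch, with invented goals and requirements, lists errors of omission and commission:

goals = {"G1: transport flowers", "G2: low operating cost"}

# Invented traceability mapping: which goal(s) does each requirement serve?
requirement_serves = {
    "R1: rapid loading":        {"G1: transport flowers"},
    "R2: low fuel consumption": {"G2: low operating cost"},
    "R3: luxury seating":       set(),   # serves no stated goal
}

covered = set().union(*requirement_serves.values())
omissions = goals - covered                                   # goals no requirement serves
commissions = [r for r, g in requirement_serves.items() if not g & goals]

print("errors of omission:", sorted(omissions) or "none")
print("errors of commission:", commissions or "none")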
If all requirements are operationally defined, and if it could be estab-
lished unambiguously that the requirements fully cover the goal(s) [G], then
empirical evaluation in the later stages can be straightforward. In that case,
the soft spot in the evaluation is localised in stage 2. In practice the output
of stage 2 seldom completely satisfies all stakeholders. A number of design requirements are often formulated ambiguously, or there is doubt whether these requirements [R] cover the exogenous demands and the goals. Both
shortcomings of the outcome of stage 2 will lead to proliferation of soft
spots to other stages of the designing cycle and make evaluation in other
stages more difficult. This may force the designer to go back to stage 2 and
to improve the formulations of the requirements. This again is an instance
of an iterative designing strategy.
Finally at this stage, the designer/evaluator has to check the credibility
and acceptance of the assumptions [Af ], [As ] and [Ac ]. This of course pri-
marily is a matter of empirical research. In case of insufficient credibility
the designer either has to induce changes in reality, for instance by giving
information and instructions, or to adapt the design or both.
3. Structural specifications: In this stage evaluation aims at an assessment
of the quality of the translation of the design requirements into the struc-
tural specifications. This is a logical rather than an empirical test. Here,
we especially have to look at structural alternatives. That is, a given functional requirement [Rf] may mostly be served by several alternative structural characteristics of the artefact to be designed. For instance, in our example in Section 2, the functional requirement of rapid charging of the aircraft may not only be satisfied by a large cargo door, as has been proposed. A structural alternative is the use of containers on a rail system that may be charged in advance. For several reasons it is seldom feasible to select one and only one alternative as “the best”. For that reason Simon (1996) introduced the term “satisficing” as the most typical characteristic of this stage of the design activity. The choice of the alternative depends on the infrastructural circumstances, desires of the users and stakeholders, and financial costs. So these too may be used as criteria for evaluation.
Besides, in this stage we have to evaluate as part of an iterative design-
ing strategy whether the functional requirements [Rf ] from stage 2 can be
mapped to a composition of (sub) systems or should be adapted. The lat-
ter essentially implies a reiteration of earlier stages. Such iteration is quite
common.
As already said, the output of stage 3 is a design on paper, i.e., a
detailed outline of the artefact. It has the form of a blueprint that allows
direct implementation of the outline into a prototype in the next stage. Pro-
cess evaluation at this stage involves a methodological check of the specifications that are used during this stage, after they have been unravelled and operationalised (if necessary). A product evaluation implies a check
whether the results of this stage are compliant with the design criteria [Cf ],
[Cu ] and [Cc ], and a check whether the results of this stage are sufficiently
clear for those who have to work with the structural specifications in stage
4 (feed forward).
One of the differences between experienced and young designers is that the former are mostly able to directly evaluate and exclude possible structural alternatives at face value. Thus they efficiently allocate their resources to the set of alternatives that matters. A guideline that may be helpful, notably to inexperienced designers, is the following: first look at those constraints that cut off the largest parts of the set of alternatives but at the same time leave as many options open as possible. In other words, search for those constraints that make many solutions not worth considering. But also be sure that you do not throw away a solution before you are convinced that the solution is not feasible.
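The guideline can be illustrated with a small sketch that orders (invented) constraints by how many alternatives they eliminate and then filters the set of alternatives step by step; in Python:

alternatives = [
    {"name": "large cargo door", "cost": 5, "turnaround_min": 20},
    {"name": "preloaded containers on rails", "cost": 8, "turnaround_min": 10},
    {"name": "manual loading", "cost": 1, "turnaround_min": 90},
]

constraints = [
    ("turnaround under 30 min", lambda a: a["turnaround_min"] < 30),
    ("cost at most 8",          lambda a: a["cost"] <= 8),
]

# Apply the most selective constraints first (those cutting off most alternatives).
constraints.sort(key=lambda c: sum(not c[1](a) for a in alternatives), reverse=True)

remaining = alternatives
for label, ok in constraints:
    remaining = [a for a in remaining if ok(a)]
    print(f"after {label!r}: {[a['name'] for a in remaining]}")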
By means of a continuous reflection on guidelines for efficient evaluation of possible alternatives (“the problem space”; Simon, 1996), we may ensure that the design resources are not wasted on unpromising corners of this space. It is unlikely that a good design will result if most design efforts are spent in the wrong parts of the problem space.
With respect to product evaluation in this stage, it is clear that the
results of both the preliminary and the detailed design stage must be eval-
uated against the design criteria. As the design is not yet implemented in
this stage, an empirical test of these criteria is still not feasible.
As to a possible modular structure of the design, evaluation may involve a check by experts in order to make sure that the interfaces of the modules have been defined according to a specific formalism. Thus, for a modular course in higher education the interface of a module usually
the case, then the researcher has to adapt the structural specifications or
the way these are realised, or both.
In short, product evaluation at this stage regards at least the relation between the design (on paper) on the one hand, and the prototype on the other. This evaluation is analogous to the verification step in modelling (Schlesinger et al., 1979). A mismatch between the symbolic representation and the prototype may be detected in an expert review or as a result of an empirical test of the behaviour of the prototype in the context where it should function in the next stage of the designing cycle. A focus group interview with the future users may also shed light on the question of how to improve the prototype.
Although other forms of prototyping do not fit the scope of this article, a few remarks should be made here. If the design does not aim at mass production (such as the design of a skyscraper, a nuclear waste storage facility, or a law), or in case the costs of mass production are virtually zero (such as the design of digital materials), it does not make sense to realise a full-blown prototype. In such cases a scale model or partial product may take over the role of a prototype. Testing whether a scale model or a partial product satisfies the design requirements is then the only realistic option, even though such a test is based on assumptions that relate the test results to the behaviour of the full-blown product. Evaluation of the process that led to the scale model or to a partial product implies an assessment of the theoretical line of reasoning that leads to the conclusion that the scale model or partial product fits the structural specifications.
5. Implementation: In this stage, the process and the outcome of the implementation of the prototype have to be evaluated. In the context of a process evaluation, primarily formative rather than summative, we try to answer the question whether this implementation process was properly carried out. The designer next has to follow the adapted implementation process guidelines, leading to an improvement of the prototype.
Evaluation in stage 5 means that we must check whether the conditions under which the prototype is supposed to operate have all been realised. This boils down to a check whether the elements in the set [A] have been satisfied. In the example of the helpdesk it is likely that the designers have made assumptions [A] about the way employees will access the helpdesk. Thus, the implementation of the helpdesk may imply that everyone will receive instructions or follow a training in how to access the helpdesk. Thus, implementation guidelines will involve a systematic check on all assumptions, and some action of the designer whenever an assumed condition turns out not to be fulfilled.
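Such a systematic check on the set [A] can be as simple as a checklist. A Python sketch with invented assumptions, whose status would in practice be established by observation, interviews or document analysis:

# Assumptions [A] with their (invented) observed status.
assumptions = {
    "every employee can access the helpdesk": True,
    "every desktop has a sound card": False,
    "every user understands English": True,
}

unfulfilled = [a for a, fulfilled in assumptions.items() if not fulfilled]
if unfulfilled:
    print("designer action needed for:", unfulfilled)
else:
    print("all assumed conditions [A] are fulfilled")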
During the implementation stage or even during the test stage we often
will detect assumptions which were made implicitly and must first be made
explicit, or assumptions that were fulfilled when we started the design but
are not fulfilled any more because the environment has changed while we
were in the designing process.
Evaluation of the implementation stage thus involves a check whether all the contextual design criteria [Cc] and contextual assumptions [Ac] have been satisfied. For instance, as to the latter, in our helpdesk example one of the assumptions [Ac] may be that every employee has a sound card in his or her desktop. Other assumptions may be that every user speaks and understands English, or that employees are willing to transfer responsibility for decisions at a detailed level to the helpdesk. For the latter they must have the right attitude. All these assumptions should have been made explicit.
Sometimes a problem with assumptions about the environment can be solved directly. For instance, in our example a sound card can be added to those desktops that turned out to have no sound card after all. Of course, it should be evaluated whether the employees who will take part in the prototype test received the right and sufficient instructions. These instructions must also raise the right expectations among these employees. Again, these “right expectations” should have been formulated as elements in the set [A].
At this stage, the evaluator often has to rely on the opinion of practical and theoretical experts in relevant domains. For doing this in a methodologically sound way we may bring these experts together in a workshop or a focus group, thereby making use of appropriate participatory techniques. Besides, qualitative research methods such as systematic observation, in-depth interviews and qualitative content analysis of written and audio-visual documents may be useful, rather than quantitative methods. The reason for this is that several aspects of the context have to be balanced. This can hardly be done in a quantitative or reductionistic way, such as by means of paired comparisons of experts. We rather need a holistic approach, i.e., the use of group techniques and qualitative research methods (Verschuren, 2001, 2003).
At this stage, the evaluator should also check whether the right users were selected with respect to knowledge, skill, experience and attitude, and also whether the users have access to a relevant infrastructure and logistics. For checking compliance with these guidelines in a professional way the designer again preferably uses qualitative and participatory methods for data gathering.
At the end of this stage, the prototype is set into operation in an
environment that is compliant with [A]. Then the behaviour of the prototype and its environment is compared with the design criteria mentioned
in Section 3. (Notice that some design criteria may refer to environmen-
tal variables! For instance, in case of the aircraft there could be a design
criterion defining maximum wake turbulence and in case of the helpdesk
results and effects of this use on the other. For establishing this causal relation we need a randomised experiment. Next best is a correlational design in which we keep constant suspect variables that may bias a causal conclusion, or which we analyse by calculating partial correlations. Still another possibility is a case study design, where an intensive and qualitative study of the process of causation must test the plausibility of the causal hypothesis.
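For the correlational option, the partial correlation between use of the artefact and its observed effect, holding a suspect third variable constant, can be computed as in the following Python sketch with invented data:

import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=200)                      # suspect variable, e.g. prior skill
x = 0.6 * z + rng.normal(size=200)            # intensity of use of the artefact
y = 0.5 * x + 0.4 * z + rng.normal(size=200)  # observed effect

def partial_corr(x, y, z):
    # First-order partial correlation of x and y, controlling for z.
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

print("zero-order r(x, y):", round(np.corrcoef(x, y)[0, 1], 2))
print("partial r(x, y | z):", round(partial_corr(x, y, z), 2))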
Assessment of the direct effect(s) may be followed by an evaluation after the artefact has been used for a while in its real-life context, in order to assess its long-term qualities. This is still goal-based evaluation. However, because of rapidly changing conditions and circumstances, and/or in case of goals [G] that are not operationally defined, the evaluator may be forced to carry out a goal-free evaluation. In that case he or she sets aside the “official” design goal or problem(s) to be solved [G]. Instead he or she sets other professional criteria derived from theory or proposed by stakeholders and/or experts in the field.
The designer/evaluator has to ask him- or herself whether the structural specifications make a design that comes up to the criteria [Cf], [Cu] and [Cc], as well as to the assumptions [Au] and [Ac]. Of course, these are mainly criteria to be checked at face value by means of logical reasoning.
Several types of evaluation appear to be relevant for the designing process. In particular, the distinction between goal-based and goal-free evaluation is important for those designs for which insufficient operationally defined design requirements can be formulated. To the extent that the goals are fully captured in design requirements, goal-based evaluation is essentially empirical requirement testing. However, in much design-oriented research the proof that the design requirements are a correct reformulation of the goals is not trivial. One of the reasons is that the design requirements are often much more detailed than the goals. Given a goal [G], more than one set of design requirements and structural specifications is usually possible.
Designers/evaluators should be well aware that exogenous requirements [R] and structural specifications [S] must be unravelled into different dimensions and aspects, and that from these operationally well-defined criteria must be derived. As long as we do not succeed in defining such criteria, adequate formative evaluation will either involve expert reviews or will not be possible at all.
For empirical evaluation of designs the normal scientific criteria should be used, i.e., validity, reliability, researcher independence and verifiability. A priority on the methodological research agenda should be how to evaluate the validity of the process that leads to operationally defined design requirements and the validity of goal-free evaluation.
References
Alexander, C. (1964). Notes on the Synthesis of Form. Cambridge: Harvard University Press.
Alexander, C. (1979). The Timeless Way of Building. New York: Oxford University Press.
Alexander, C., Ishikawa, S. et al. (1977). A Pattern Language: Towns, Buildings, Construction. New York: Oxford University Press.
Asimow, M. (1962). Introduction To Design. Englewood Cliffs, NJ: Prentice-Hall.
Babbie, E. (1998). The Practice of Social Research. Belmont, CA: Wadsworth.
Booch, G., Rumbaugh, J. et al. (1998). The Unified Modeling Language User Guide. Amsterdam: Addison-Wesley.
Brown, K. A. (1988). Inventors at Work : Interviews with 16 Notable American Inventors.
Redmond, Washington: Tempus Books of Microsoft Press.
Chandrasekaran, B. (1990). Design problem solving: a task analysis. AI Magazine 11: 59–71.
Chen, P. P.-S. (1976). The entity-relationship model: toward a unified view of data. ACM TODS 1.
Costa, A. I. A., Dekker, M. et al. (2001). Quality function deployment in the food industry: a review. Trends in Food Science & Technology 11: 306–314.